Posts

Showing posts from April, 2021

Use case: Migrating an on-premise data lake to the cloud

In this new series of blog posts called Use Cases, I'm going to go over projects that we've worked on with clients to address specific needs, how each project went, and the lessons we learned from the experience. I won't go into every detail, but I'll cover the important parts. Hopefully this will be helpful if your organization is considering a similar project. This week, I'll talk about a project where we had to migrate a large data lake from an on-premise Hadoop infrastructure to the cloud on AWS. Before any project can start, the business needs have to be analyzed and a design drawn up in order to decide what the solution will be. In this particular case, we weren't involved in the initial decision process, but the solution architect came up with a pretty good design. The company had around 30 TB of data sitting in an on-premise data lake. Data engineers and developers would use Hadoop clusters to run their ETL pipelines, an