Most companies (and most likely yours too) want to be data-driven and/or monetize their vast amounts of data. But before you start using advanced technologies like AI or Machine Learning, this data has to be prepared first. It’s a complex process, and this series explains what it involves.
Nowadays, a Modern Data Warehouse is a necessity for virtually every company. The internet is full of suggestions, reference architectures, and recommendations on how to do the job well. The theory is a good starting point, but it doesn’t always correspond with reality.
In this brand new article series, we will share our experience from real Data Warehouse modernization projects. We will cover many aspects, not just those related to development.
When it comes to large projects, development is not the biggest challenge, especially when you have a mature and experienced team. So, over the course of six dedicated posts, we will shed some light on the following topics:
Today, I will focus on the Modern Data Warehouse architecture. I will also touch upon other areas that we will elaborate on later in the series. Let’s begin.
First, we need to clarify the concept of a data warehouse. According to Wikipedia:
(Source)
Data Warehouses are well-known and widely used. But how is a modernized version different?
Here are the two main reasons why you need to consider an upgrade:
Microsoft describes this solution in the following way:
(Source)
In other words, a Modern Data Warehouse can handle much larger volumes of data and perform complex operations on multiple types of data, giving you in-depth insights.
The Microsoft concept of a Modern Data Warehouse is based on multiple Azure cloud services:
This is a visualization of the concept:
The services and components that form a Modern Data Warehouse (adapted from the Microsoft model)
It’s hardly possible to find an enterprise that doesn’t possess some kind of Enterprise Data Warehouse (or EDW for short) solution, or at least some elements of it.
Every business needs some sort of reporting and/or dashboarding. Typically, it also uses many different systems to conduct its day-to-day operations, so the data must be acquired, cleansed, transformed, and integrated beforehand.
However, more often than not, an existing EDW is a result of years of constant development due to the evolution of a company itself, and adaptation to an ever-changing external business environment.
A Business Intelligence project is never finished. (We know that BI is not a very popular term right now. These days, people talk about Big Data or Advanced Analytics solutions aided by Machine Learning and/or Artificial Intelligence instead.)
Unfortunately, the pre-cloud era in IT did not support an agile and flexible approach to any system development or deployment. But let’s be clear, it does not mean all past deployments are a complete mess. However, the combination of:
makes most EDWs look a bit like monsters: unfriendly, not useful, not eager to cooperate, yet fighting to maintain the status quo.
You have probably heard this already.
We even wrote it ourselves once before.
And yet, data professionals, engineers, and scientists point out that only a small amount of the available data is being utilized. And that’s despite the fact that companies are exposed to an unbelievable explosion of information.
Therefore, given the availability and maturity of public cloud services and Platform as a Service (PaaS) computing, and considering the data processing needs of virtually every business, now is the best time to modernize existing data warehouses.
We’ve already mentioned the scalability and deep insights available with this solution. What’s more:
There are of course many concerns that need to be taken into account, such as:
A Modern Data Warehouse can’t exist without modern infrastructure. As a Microsoft Partner, we work within the Microsoft ecosystem, especially the Microsoft Azure cloud. The majority of our development is done using the Microsoft stack.
We focus on Azure PaaS data services whenever possible, our intention being to reduce overhead related to infrastructure maintenance. Of course, during a project like this, we have to touch upon many other systems beyond the Microsoft ecosystem.
Azure PaaS is crucial here. It helps us save hundreds, if not thousands, of working hours.
Selecting and buying the right hardware, then setup, tuning, high availability and disaster recovery setup, troubleshooting, etc. – it’s usually a nightmare. And this is just the tip of the iceberg compared to what you have to take care of in on-premises environments.
Moreover, you can’t be 100% sure that your hardware estimation is accurate. Therefore, to be on the safe side, you buy more than necessary, and part of your infrastructure might sit idle.
On the other hand, within a few months, it might not be sufficient. Then you can try to scale up/out. No matter what you do in this case, it is not something that you can achieve within a few minutes or on demand.
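The trade-off described above can be made concrete with a small back-of-the-envelope sketch. Every number here is invented for illustration (the demand figures, the 100 purchased units, and the tier size of 10 are assumptions, not data from our projects), and the "elastic" case simply assumes capacity can be resized once a month to the next available tier.

```python
# Hypothetical monthly peak demand, in arbitrary "compute units".
# These figures are illustrative only, not measurements from a real project.
monthly_demand = [40, 45, 50, 60, 75, 90, 110, 130]


def fixed_capacity_report(demand, purchased_units):
    """Hardware bought up front: per-month idle units and unmet demand."""
    idle = [max(purchased_units - d, 0) for d in demand]
    shortfall = [max(d - purchased_units, 0) for d in demand]
    return idle, shortfall


def elastic_capacity(demand, step=10):
    """Capacity resized each month: round demand up to the next tier of `step` units."""
    return [((d + step - 1) // step) * step for d in demand]


# On-premises scenario: 100 units purchased "for safety" at the start.
idle, shortfall = fixed_capacity_report(monthly_demand, purchased_units=100)

# Cloud-like scenario: capacity follows demand in tiers of 10 units.
elastic = elastic_capacity(monthly_demand)
elastic_idle = [c - d for c, d in zip(elastic, monthly_demand)]

print("fixed: idle unit-months =", sum(idle), "| unmet unit-months =", sum(shortfall))
print("elastic: idle unit-months =", sum(elastic_idle), "| unmet unit-months = 0")
```

With these made-up inputs, the fixed purchase is both oversized in the early months and undersized in the last two, which is exactly the double risk of estimating hardware up front.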
It is extremely difficult to estimate the project scope and the business users’ needs and requirements for such a comprehensive project, especially at the beginning of a digital transformation in a large organization.
These kinds of engagements don’t just take a couple of weeks. Many assumptions will change from phase to phase. Internal and/or external factors (like COVID-19) influence the business. The cloud gives you a number of advantages.
The benefits of using the cloud
In today’s world, we operate in a continuously changing environment, and you need to be flexible enough to adjust to a new reality fast. The infrastructure has to be just as flexible to keep up with demand, business needs, growing data volumes, and a growing number of queries and users. Fast adaptation is a must.
Thanks to the cloud environment, we can now react instantly. Need more computing resources? No problem, let’s scale up the service or provision more database instances to meet the demand. Without this ability, any advanced data project is likely to fail.
Another thing worth noting is environment provisioning and configuration. With the Azure public cloud, we can build a set of environments (development, test, and production) in an automated manner. With a single button click, we can provision the whole environment from scratch using automation pipelines.
Additionally, all environments are consistent, which means we avoid issues related to software versions when deploying the solution to test and production.
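The idea of deriving every environment from one definition can be sketched in a few lines. This is only a minimal illustration, not our actual pipeline: the `EnvironmentSpec` type, the tier names, the version string, and the naming scheme are all hypothetical, and in a real project the same role would be played by infrastructure-as-code templates executed by an automation pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentSpec:
    """Hypothetical description of one environment (all field names are illustrative)."""
    name: str
    resource_prefix: str
    database_tier: str
    runtime_version: str


# Settings shared by every environment live in one place, so they cannot drift apart.
SHARED_RUNTIME_VERSION = "1.4.2"  # made-up version number

# Only the settings that are allowed to differ are listed per environment.
DATABASE_TIERS = {"dev": "small", "test": "small", "prod": "large"}


def build_environments(project: str) -> list[EnvironmentSpec]:
    """Generate dev, test, and prod specs from the single shared definition."""
    return [
        EnvironmentSpec(
            name=env,
            resource_prefix=f"{project}-{env}",
            database_tier=tier,
            runtime_version=SHARED_RUNTIME_VERSION,
        )
        for env, tier in DATABASE_TIERS.items()
    ]


environments = build_environments("edw")
for spec in environments:
    print(spec)

# Consistency check: every environment runs the same software version.
assert len({spec.runtime_version for spec in environments}) == 1
```

Because the environments differ only in the explicitly listed sizing, a version mismatch between test and production cannot be introduced by hand.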
The conclusion? If you are not restricted by law or regulations, move to the cloud. Do not experiment with on-premises, and save your time and money.
In the next article, we’ll start explaining how to implement a Modern Enterprise Data Warehouse, based on the projects we delivered. But before we do this, let’s add some background information from those engagements. Here are the figures for our typical modern EDW projects:
From our experience, clients decide to undertake Data Warehouse Modernization projects due to many business and technical challenges they faced with the previous versions of the solution. Here are the most important reasons:
Last but not least, most clients want to transform their business with digital services aided by Machine Learning and Artificial Intelligence. The first step toward this ultimate goal is an Enterprise Data Warehouse modernization project.
We’ll end here for now. In the next article, we will start looking at what conducting such a project involves.
Written in collaboration with Paweł Borowiecki.