    Everything You Need To Know About Modern Data Warehouse (Part 1)

    Most companies (and most likely yours too) want to be data-driven and/or monetize their vast amounts of data. But before you start using advanced technologies like AI or Machine Learning, this data has to be prepared first. It’s a complex process, but from this series, you can learn what it involves.

    Nowadays, a Modern Data Warehouse is a necessity for virtually any company. The internet is full of suggestions, reference architectures, and recommendations on how to do the job well. The theory is a good starting point, but it doesn’t always correspond with reality.

    In this brand new article series, Paweł Borowiecki and I will share our experience from real Data Warehouse modernization projects. We will cover many aspects, not just those related to development.

    Key points:

    • What is a Data Warehouse?
    • Why is it necessary to modernize it?
    • How does the cloud help with its implementation?
    • What does a typical modernization project look like?

    Which topics will we cover?

    When it comes to large projects, development is not the biggest challenge, especially when you have a mature and experienced team. So, over the course of 6 dedicated posts, we will shed some light on the following topics:

    • Modern Data Warehouse architecture and challenges
    • Predica Data Domain Framework (PDDF)
    • project organization, tools, process structure, infrastructure and environments, automation
    • Master Data Management.

    Today, I will focus on the Modern Data Warehouse architecture. I will also touch upon other areas that we will elaborate on later in the series. Let’s begin.

    What is a Modern Data Warehouse?

    First, we need to clarify the concept of a data warehouse. According to Wikipedia:

    “a data warehouse . . . is a system used for reporting and data analysis . . . [Data Warehouses] are central repositories of integrated data from one or more disparate data sources. They store current and historical data in one single place that are used for creating analytical reports for workers throughout the enterprise.”

    (Source)

    Data Warehouses are well known and widely used. But how is a modernized version different?

    Why do you need a Modern Data Warehouse and what does it allow you to achieve?

    Here are the two main reasons why you need to consider an upgrade:

    • You have Big Data and unstructured / semi-structured data that must be integrated with structured data
    • You want to use modern public cloud data services (keep reading to find out why this is a good idea).

    Microsoft describes this solution in the following way:

    “A modern data warehouse lets you bring together all your data at any scale easily, and to get insights through analytical dashboards, operational reports, or advanced analytics for all your users. . . [It] combine[s] all your structured, unstructured and semi-structured data”

    (Source)

    In other words, a Modern Data Warehouse can handle much larger volumes of data and perform complex operations on multiple types of data, giving you in-depth insights.

    The Microsoft concept of a Modern Data Warehouse is based on multiple Azure cloud services:

    • Azure Data Factory
    • Azure Data Lake Storage
    • Azure Databricks
    • Azure Synapse Analytics
    • Azure Analysis Services
    • … and others.

    This is a visualization of the concept:

    Figure: The services and components that form a Modern Data Warehouse (adapted from the Microsoft model)

    Why do you need a Data Warehouse modernization?

    It’s hardly possible to find an enterprise that doesn’t possess some kind of Enterprise Data Warehouse (or EDW for short) solution, or at least some elements of it.

    Every business needs some sort of reporting and/or dashboarding. Typically, it also uses many different systems to conduct its day to day operations, so the data must be acquired, cleansed, transformed, and integrated beforehand.
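    The acquire, cleanse, transform, and integrate steps mentioned above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the two source systems (`crm_rows`, `billing_rows`), their fields, and all helper function names are hypothetical and do not come from any real project or Azure service.

```python
# Hypothetical extracts from two operational systems with inconsistent conventions.
crm_rows = [
    {"email": " Anna@Example.com ", "country": "pl"},
    {"email": "bob@example.com", "country": "DE"},
    {"email": "", "country": "PL"},  # unusable: no join key
]
billing_rows = [
    {"email": "anna@example.com", "amount_cents": 12500},
    {"email": "BOB@example.com", "amount_cents": 4000},
    {"email": "anna@example.com", "amount_cents": 500},
]

def cleanse(rows):
    """Normalize the join key (trim, lower-case) and drop rows that lack one."""
    cleaned = []
    for row in rows:
        email = row["email"].strip().lower()
        if email:
            cleaned.append({**row, "email": email})
    return cleaned

def transform_crm(rows):
    """Standardize country codes to upper case."""
    return [{**row, "country": row["country"].upper()} for row in rows]

def transform_billing(rows):
    """Convert integer cents into a decimal amount column."""
    return [{"email": r["email"], "amount": r["amount_cents"] / 100} for r in rows]

def integrate(crm, billing):
    """Combine both sources into one record per customer with total revenue."""
    result = {r["email"]: {"country": r["country"], "revenue": 0.0} for r in crm}
    for r in billing:
        if r["email"] in result:
            result[r["email"]]["revenue"] += r["amount"]
    return result

warehouse = integrate(
    transform_crm(cleanse(crm_rows)),
    transform_billing(cleanse(billing_rows)),
)
```

    In a real Modern Data Warehouse, each of these functions would typically be a pipeline activity or a notebook working on far larger volumes, but the order of the steps stays the same.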

    However, more often than not, an existing EDW is a result of years of constant development due to the evolution of a company itself, and adaptation to an ever-changing external business environment.

    A Business Intelligence project is never finished. (We know that BI is not a very popular term right now; these days, the talk is of Big Data or Advanced Analytics solutions aided by Machine Learning and/or Artificial Intelligence.)

    Unfortunately, the pre-cloud era in IT did not support an agile and flexible approach to any system development or deployment. But let’s be clear, it does not mean all past deployments are a complete mess. However, the combination of:

    • the ever-changing business and technical requirements
    • serious scalability challenges of on-prem hardware and software platforms, and
    • projects of this kind not being suitable for the waterfall approach

    makes most EDWs look a bit like monsters: unfriendly, not useful, not eager to cooperate, yet fighting to maintain the status quo.

    Why is the change important?

    You have probably heard this already.

    “Data is the new oil of the digital economy”

    We even wrote it ourselves once before.

    And yet, data professionals, engineers, and scientists point out that only a small amount of the available data is being utilized. And that’s despite the fact that companies are exposed to an unbelievable explosion of information.

    Therefore, given the availability and maturity of public cloud services and Platform as a Service (PaaS) computing, and considering the data processing needs of virtually every business, now is the best time to modernize existing data warehouses.

    Why is PaaS the right choice?

    We’ve already mentioned the scalability and deep insights available with this solution. What’s more:

    • PaaS computing requires less administration and management from the infrastructure personnel as there are fewer maintenance responsibilities (version updates, patching, etc.)
    • PaaS offers better scalability, easy and fast provisioning and configuration
    • High Availability (HA) and Disaster Recovery (DR) features are available out-of-the-box
    • From the financial point of view, the public cloud offers a low entry cost and a pay-as-you-go model.

    There are of course many concerns that need to be taken into account, such as:

    • privacy and compliance
    • no previous public cloud experience, or
    • a lack of knowledge about on-prem/public cloud integration (many of the existing data sources, mainly Line of Business systems, are on-prem solutions).
    Note: Predica Cloud Governance Framework (CGF) may be used to address Business, People, and Technology concerns regarding using the public cloud.

    How does the Azure cloud help?

    A Modern Data Warehouse can’t exist without modern infrastructure. As a Microsoft Partner, we work within the Microsoft ecosystem, especially the Microsoft Azure cloud. The majority of our development is done using the Microsoft stack.

    We focus on Azure PaaS data services whenever possible, our intention being to reduce overhead related to infrastructure maintenance. Of course, during a project like this, we have to touch upon many other systems beyond the Microsoft ecosystem.

    Azure PaaS is crucial here. It helps us save hundreds, if not thousands, of working hours.

    Selecting and buying the right hardware, then setup, tuning, high availability and disaster recovery setup, troubleshooting, etc. – it’s usually a nightmare. And this is just the tip of the iceberg compared to what you have to take care of in on-premises environments.

    Moreover, you can’t be 100% sure that your hardware estimation is accurate. Therefore, for safety reasons, you buy more than necessary and your infrastructure might lie idle.

    On the other hand, within a few months, it might not be sufficient. Then you can try to scale up/out. No matter what you do in this case, it is not something that you can achieve within a few minutes or on demand.

    What are the advantages of using the cloud?

    It is extremely difficult to estimate the scope, and the business users’ needs and requirements, for such a comprehensive project, especially at the beginning of a digital transformation in a large organization.

    These kinds of engagements don’t just take a couple of weeks. Many assumptions will change from phase to phase. Internal and/or external factors (like COVID-19) influence the business. The cloud gives you a number of advantages.

    Figure: The benefits of using the cloud

    In today’s world, we operate in a continuously changing environment. You need to be flexible to adjust to a new reality fast. The infrastructure needs to be flexible too, so that it can meet demand, business needs, and growth in data volume, queries, and users. Fast adaptation is a must.

    Thanks to the cloud environment, we can now react instantly. Need more computing resources? No problem, let’s scale up the service or provision more database instances to meet the demand. Without this ability, any advanced data project is likely to fail.

    Another thing worth noting is environment provisioning and configuration. With the Azure public cloud, we can build a set of environments (development, test, and production) in an automated manner. With a single button click, we can provision a whole environment from scratch using automation pipelines.

    Additionally, all environments are consistent, which means we avoid issues related to software versions when deploying the solution to test and production.
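    The reason automated provisioning keeps environments consistent is that every environment is derived from one shared definition, with only deliberate differences listed separately. Here is a minimal sketch of that idea in Python. It does not call Azure or any real provisioning tool; the template keys, sizes, and version number are hypothetical values chosen purely for illustration.

```python
import copy

# Hypothetical baseline shared by every environment; names and values are illustrative only.
BASE_TEMPLATE = {
    "storage": {"kind": "data_lake", "redundancy": "local"},
    "warehouse": {"kind": "sql_pool", "size": "small"},
    "runtime_version": "1.4.2",
}

# Only the deliberate differences between environments are listed here.
OVERRIDES = {
    "dev": {},
    "test": {"warehouse": {"size": "medium"}},
    "prod": {"warehouse": {"size": "large"}, "storage": {"redundancy": "geo"}},
}

def merge(base, override):
    """Return a deep copy of base with override values applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def build_environments(template, overrides):
    """Produce one full configuration per environment from the shared template."""
    return {name: merge(template, delta) for name, delta in overrides.items()}

environments = build_environments(BASE_TEMPLATE, OVERRIDES)
```

    Because `runtime_version` is defined once in the shared template, development, test, and production cannot drift apart on it unless someone adds an explicit override.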

    The conclusion? If you are not restricted by law or regulations, move to the cloud. Do not experiment with on-premises infrastructure; save your time and money.

    What does a typical project involve?

    With the next article, we’ll start explaining how to implement a Modern Enterprise Data Warehouse, based on the projects we delivered. But before we do this, let’s add some background information from those engagements. Here are the figures for our typical modern EDW projects:

    Figure: Our Modern Data Warehouse projects in numbers

    What are the frequent reasons for modernization?

    From our experience, clients decide to undertake Data Warehouse Modernization projects due to many business and technical challenges they faced with the previous versions of the solution. Here are the most important reasons:

    • The data sits in different sources, and extracting it takes hours, days, or even weeks. This results in missed opportunities
    • There is no single view on spending habits, which leads to costly procurement
    • It is hardly possible to join social data with the company’s own. Consequently, marketing campaigns are less effective and less efficient
    • It isn’t possible to identify the most valuable customers and find out what would allow us to retain them
    • Data quality isn’t good enough and leads to unstable reporting from the business perspective
    • The import of actual data packages from Line-of-Business (LoB) systems often fails due to scalability and availability issues
    • The window for data acquisition, reconciliation, recalculation, and reporting is often too short to deliver.

    Last but not least, most clients want to transform their business with digital services aided by Machine Learning and Artificial Intelligence. The first step toward this ultimate goal is an Enterprise Data Warehouse modernization project.

    We’ll end here for now. In the next article, we will start looking at what conducting such a project involves.

    Written in collaboration with Paweł Borowiecki.

    Key takeaways:

    1. A data warehouse is a structure which consolidates and analyzes data to gain valuable business information. A Modern Data Warehouse uses cloud technology to deliver more powerful functionalities and deeper insights.
    2. Using the cloud, you can integrate more diverse sources of data and take advantage of Machine Learning and AI to identify trends and predict future performance.
    3. Using cloud services to modernize your data warehouse means you can take advantage of its flexibility and scalability. As your business grows or the market changes, you can adjust quickly as needed.
    4. We have gained our experience across a range of projects where companies had vast amounts of data spanning up to 20 years. Implementing our solution enabled them to discover and take advantage of new business opportunities.