
Everything you need to know about Modern Data Warehouse (Part 3): Data Warehouse Design for Analytics

How to use the cloud to deliver actionable business insights?

Welcome to the third part of our data journey. Now that you know what a Modern Data Warehouse is and how to successfully migrate data to the cloud, we can talk about getting insights out of it!

We will continue going through a typical Modern Data Warehouse project. If you missed the previous part, get up to speed here.

KEY TAKEAWAYS:

  • How to fit large data models into a limited memory space?
  • What is the key to building a user-friendly dashboard?
  • How to easily secure the available data?

Today we will find out how to analyze data, create user-friendly visualizations, and keep data safe for your users. Let’s begin!

Challenge 4: How to fit large amounts of data into Azure Analysis Services?

Once the dimensions and facts are calculated, data is exposed for analysis using a dedicated service, such as Azure Analysis Services (AAS).

AAS provides additional analytical features that let users slice and dice data from different contexts, define aggregation measures, play with time intelligence, and look at the data from a hierarchical perspective.

In general, analytical databases form another abstraction layer that enhances your data analytics and reporting capabilities.
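To give a feel for what this analytical layer adds, here is a minimal pandas sketch of the same ideas outside of Analysis Services: slicing the same facts along a hierarchy and computing a simple time-intelligence style measure. The data and column names are made up for illustration only; this is a conceptual stand-in, not how AAS itself works.

```python
# Conceptual sketch: slicing by a hierarchy and a toy year-to-date measure.
# Data and column names are invented for illustration.
import pandas as pd

sales = pd.DataFrame({
    "date":   pd.to_datetime(["2023-01-15", "2023-02-20", "2023-02-25", "2023-03-05"]),
    "region": ["North", "North", "South", "South"],
    "store":  ["N-01", "N-02", "S-01", "S-01"],
    "amount": [100.0, 250.0, 80.0, 120.0],
})

# "Slice and dice": aggregate the same facts at different hierarchy levels.
by_region = sales.groupby("region")["amount"].sum()
by_store = sales.groupby(["region", "store"])["amount"].sum()

# A toy year-to-date measure: running total of monthly sales within each year.
monthly = sales.set_index("date").resample("MS")["amount"].sum()
ytd = monthly.groupby(monthly.index.year).cumsum()

print(by_region, by_store, ytd, sep="\n\n")
```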

What are the storage capacities for Azure cloud services?

Most data storage services on the Azure cloud offer plenty of space. For example, Azure SQL Database Hyperscale can store up to 100 TB of data, and Azure Synapse Analytics up to 1 PB. In the case of Azure Analysis Services, however, the biggest available service tier can hold up to 400 GB.

The important thing here is that you should leave some memory free (about 20-25%), otherwise data processing will fail. With the largest 400 GB tier, that leaves roughly 300 GB of usable memory. As a result, we had yet another challenge – how to fit several terabytes of data into ~0.3 TB of space.

The answer turned out to be quite simple. Removing a single unnecessary column reduced the model size by 70%.

The implemented model stores 5 billion records and occupies around 200 GB. That might not be a tremendous number of records, yet without a few adjustments it would not have fit even within 1 TB. Much of the credit goes to the VertiPaq engine, which compresses the data at a high ratio.
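The effect of one high-cardinality column on model size is easy to reproduce at a small scale. Below is a minimal pandas sketch with made-up column names; pandas does not compress data the way VertiPaq does, so treat the numbers as directional only.

```python
# Why one high-cardinality column can dominate model size.
# Column names and data are invented; no VertiPaq compression is involved.
import uuid
import numpy as np
import pandas as pd

n = 100_000
df = pd.DataFrame({
    "transaction_guid": [str(uuid.uuid4()) for _ in range(n)],  # unique per row
    "store_id": np.random.randint(0, 500, size=n),              # low cardinality
    "amount": np.random.rand(n) * 100,
})

print(df.memory_usage(deep=True))          # the GUID column dwarfs the others
slim = df.drop(columns=["transaction_guid"])
print(slim.memory_usage(deep=True).sum() / df.memory_usage(deep=True).sum())
```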

However, that is not everything. With very large analytical models you need to look at the problem from a wider perspective: if you successfully fit the data inside the model, you are only halfway there. Good query performance is the next milestone to reach.

What have we learned from this challenge?

Here are the main takeaways from this stage.

Lessons from challenge 4:

  • Understand which measures and columns you actually need
  • Consider your data types, as they affect model size
  • Avoid high-cardinality columns
  • Aggregate data up front (see the sketch below)
  • Process in increments

Lessons learned from preparing the data model for analysis
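As an illustration of "aggregate data up front", here is a hedged pandas sketch of the idea: instead of loading every transaction into the analytical model, load one row per store per day. All names, columns, and the storage layer below are assumptions made for the example, not our actual pipeline.

```python
# Sketch of pre-aggregation: one row per store per day instead of raw transactions.
import pandas as pd

transactions = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2023-05-01 09:12", "2023-05-01 17:40", "2023-05-02 11:03"]),
    "store_id": [10, 10, 12],
    "amount": [19.99, 5.00, 42.50],
})

daily = (
    transactions
    .assign(date=transactions["timestamp"].dt.date)
    .groupby(["date", "store_id"], as_index=False)
    .agg(total_amount=("amount", "sum"), transaction_count=("amount", "size"))
)
print(daily)

# Processing in increments: persist one partition per day (for example with
# daily.to_parquet("daily_sales", partition_cols=["date"])), so only new or
# changed days need to be reprocessed and refreshed in the model.
```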

Challenge 5: How to build a fast and responsive dashboard?

Fast and responsive reports are probably the most crucial element of the whole Modern Data Warehouse engagement. This is the major connection point where the users gain business insights from all the data collected.

Everything behind the reports and dashboards is barely visible to end users, yet it consumes perhaps 90-95% of the time and effort needed to produce and display the results. However, it is the remaining 5-10% that determines the success or failure of the project.

If your reports contain errors or do not refresh within a few seconds, your project will fail and 100% of your effort will land in the trash. Users will lose interest and go back to the old way of reporting, however old-fashioned it might be.

Model optimization is key

A large amount of data always leaves its mark on query performance. If the model is not optimized and measure definitions are poor, query performance can degrade dramatically as the data volume increases.

A well-designed model, on the other hand, will not suffer a significant performance hit even if the data grows from 2 to 4 to 8 billion records.

Even once the model is done, it may still need further improvements to produce a fast and responsive report. Each visualization and its related query might require additional analysis and optimization, or changes at the model level.

How many visualizations you have and how they are organized also matters. Sometimes it is better to combine several visualizations into one: this reduces the number of queries sent to the model and lowers the load on it.

In one of our cases, users needed Power BI dashboards with a live connection to Azure Analysis Services. The reports were used at every organizational level, from employees at individual locations all the way up to top management. In the end, they were used by several thousand users daily.

The expectation was to get the reports refreshed within 3 seconds after an interaction, such as changing a filter value. Following the tips above, we were able to achieve this result even though the queried model contained 5 billion rows.

Note: A well-optimized and responsive analytical model is only one of the steps required for success. The next is an eye-catching, well-developed report, which calls for two different skill sets (development and graphic design). The final step is adoption within the organization – a completely different story and a topic for another blog series.

What have we learned from this challenge?

Here are our key findings from this step.

Lessons learned from challenge 5:

  • Try to avoid resource-intensive calculations (see the sketch below)
  • Narrow the data scope and the number of visualizations
  • Use a standard filter panel if possible

Lessons learned from creating dashboards
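To make "avoid resource-intensive calculations" concrete, one common trick is to pre-compute flags or buckets during data preparation rather than re-evaluating the logic in every report query. A hedged pandas sketch with invented column names and thresholds:

```python
# Pre-computing an expensive classification once at load time, so report queries
# only filter or group by a ready-made column. Names and thresholds are made up.
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "amount": [40.0, 260.0, 900.0, 15.0],
})

# Done once during data preparation...
orders["size_bucket"] = pd.cut(
    orders["amount"],
    bins=[0, 50, 500, float("inf")],
    labels=["small", "medium", "large"],
)

# ...so the "report query" is a cheap group-by on a plain column.
print(orders.groupby("size_bucket", observed=True)["amount"].sum())
```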

Challenge 6: How to secure data for a large number of users?

It seems that everything is done. Data flow pipelines are stable and process data at set time intervals. Insights are available on time and can be consumed from dedicated reports or through analytical models.

But there is one more important thing to do: we need to grant access to this information to a large number of users. Moreover, access levels should differ depending on each person's role within the organization.

In one of our cases, the daily number of active users stood at several thousand. The expectation was to grant access to data based on the organization’s hierarchical structure. Each user could have a unique set of requirements.

Altogether, there were around 500,000 different security combinations that needed to be applied. More importantly, with over 25,000 employees, the company saw constant personnel changes – people leaving, new hires joining, promotions, and so on.

Security measures needed to take these constant changes into account. They had to be reflected in the security model, and all adjustments had to be made automatically.


How to manage user lifecycles automatically?

Creating dedicated Active Directory groups for this many cases is not a good idea. There may be external solutions that could handle and automate them, but we went for a more convenient option.

Our solution was to implement row-level security in the analytical model, combined with the organizational structure. Security is driven by the location and position to which a particular employee is assigned, with the central employee database as the data source.

As a result, security requirements are evaluated automatically, based on changes within the organization. We also made it possible to add exceptions manually for the cases that required different security rules. Nevertheless, everything is managed and driven from one central place.

Using this approach, every employee has access to the information they need. An employee at a local branch sees data related to that branch. District Managers can see all data from their districts, and Brand Managers see all data related to their brand.
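The security rules themselves live in the analytical model as row-level security filters, but the idea of deriving permissions from the organizational structure can be sketched as plain code. Everything below (the table layout, role names, and identifiers) is a made-up illustration under our own assumptions, not the actual implementation.

```python
# Made-up sketch: an employee's role and assignment determine visible branches.
# The real solution expressed this as row-level security in the analytical model.
from dataclasses import dataclass

@dataclass
class Employee:
    email: str
    role: str         # "branch", "district_manager", "brand_manager", ...
    assigned_to: str  # branch, district, or brand identifier

BRANCHES = {
    "B-001": {"district": "D-01", "brand": "Alpha"},
    "B-002": {"district": "D-01", "brand": "Beta"},
    "B-003": {"district": "D-02", "brand": "Alpha"},
}

def visible_branches(emp: Employee) -> set[str]:
    if emp.role == "branch":
        return {emp.assigned_to}
    if emp.role == "district_manager":
        return {b for b, a in BRANCHES.items() if a["district"] == emp.assigned_to}
    if emp.role == "brand_manager":
        return {b for b, a in BRANCHES.items() if a["brand"] == emp.assigned_to}
    return set()  # unknown roles see nothing by default

print(visible_branches(Employee("dm@example.com", "district_manager", "D-01")))
# Expected: {'B-001', 'B-002'} (set order may vary)
```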

What have we learned from this challenge?

Here are our conclusions from this final stage.

Lessons learned from challenge 6:

  • Don't leave security as an afterthought – build it into the project
  • Consider implementing row-level security
  • Leverage the organizational structure

Lessons learned from securing data

Don’t miss part 4!

And that’s a wrap on the first stage of our journey! Now you know what a Data Warehouse is, why it’s important to modernize it, and how the Azure cloud can help with it. We’ve also gone over an MVP phase of a Data Warehouse modernization project, and how to tackle the most common challenges.

Click here for part 4, where we go over a Data Domain Framework, to show you how you can make managing data projects much easier.

Got any questions? Just click here to send them – we will get back to you!

Written in collaboration with Paweł Borowiecki.

Key takeaways

  1. Select only the necessary measures and columns for your data model. It will help you keep the model lean and efficient.
  2. Choose your visualizations carefully. Narrowing the scope down to exactly what you need will improve report performance.
  3. Introduce robust security measures to protect the data. See if you can leverage the organizational structure to set up permissions.

See other articles in our Modern Data Warehouse series:

  1. Introduction to Modern Data Warehouse
  2. How to design a Modern Data Warehouse architecture
  3. How to run a Modern Data Warehouse project effectively
  4. How to keep development under control
  5. How to incorporate Master Data Management
