Is your data really helping your business?

DataOps

The information we gather as organizations can give us a lot of answers, but only if we know how to listen.

After exploring the worlds of DevOps and security, let’s talk about data.

Key points:

  • Why is data important?
  • What is data operations (DataOps)?
  • How do you integrate it into your processes?

Data is the new oil! But is it just any data?

How do you know if the information you are using is the right kind of fuel for your organization?

Every company has three main assets:

  • Organization: people, culture, the business it operates
  • Data: information it gathers while doing business and from the market
  • Technology: what enables the organization to leverage this data and mitigate business risk.

DevOps and SecOps are about ensuring that the technology needed to meet business goals is deployed and secure.

What about the data that flows through the systems?

Why do we gather and process data?

Organizations use data for one of three reasons:

  • Making improvements to their business
  • Developing new offerings
  • Inventing business models

How advanced a company is in using its information depends on its maturity stage. But almost all business use cases for data fall into one of these three categories.

The stages of advancement in data use

First, the organization becomes “data-aware.” There is a feeling that there is more to your data than what you are using at the moment. Questions are asked and answered by individuals digging through the information in Excel files.

There is no standard data model. A lot of time is spent working out how to get the right data from many sources and how to use it.

When an organization grows, the amount of information it gathers and processes grows as well. The way data is collected and used also matures.

After the initial stage of processing data in silos comes the next one. Technology is used to build data warehouses and data marts. New people are onboarded with the task of building data capabilities. You have, or even are yourself, a Chief Data Officer.

Data usage in the organization matures further. There are more business cases. More people use data daily to find new insights or control operations.

The company becomes “data-guided”!

Here’s the catch. As the usage of information matures, its maintenance becomes more complicated.

  • The number of data sources and data flows between them grows.
  • The growing complexity of data raises questions about its correctness.

Does it sound familiar? Aren’t those the same issues that troubled application development? When speed and integration became an issue, we turned to DevOps to solve it.

Why not turn to DataOps? Does such a thing even exist?

Yes, it does!

What is data operations, or DataOps?

DataOps is a process, like DevOps, used by data and analytics teams. Its purpose is “to improve the quality and reduce the cycle time of data analytics” (the quote comes from Wikipedia).

Let’s try to explain it in a more straightforward way. Think about your organization. You know its business. You have an idea that, based on data, you could improve a process and provide more value for the core business.

How do you test and confirm it? What do you need to check it? Your data warehouse test environment? A copy of your production data model and data itself?

How long will it take for you to get it? Hours? Days? Weeks?

Let’s assume you’ve got a test environment. You developed your change to the data model, and the move is brilliant!

Now you want everyone to use it. How do you roll it out, and how long does it take?

Do you have to go through a test environment deployment and manual validation? Would it take days? Weeks?

As an alternative: can you commit your change to a code repository? Can you have it deployed to your data warehouse by tomorrow?

When the change is deployed, there might be some side-effects. Errors, even.

How do you know whether your data flow still makes sense from the business perspective? The fact that it runs without errors doesn’t mean the results are logically correct or reflect your business rules.

What else does your current data process involve?

The answers to these questions are what you need to take care of to make sure your business uses the correct data.

In the standard model, all of this takes time to develop, test, and deploy.

That time delays your business in taking advantage of its data.

And all of these issues raise your business risk.

Ready for a change? This is where DataOps steps in.

Data operations is about innovating your data value chain. It makes it easy to test and validate ideas and then turn them into value for your organization.

Let’s discuss how you can introduce it to your processes.

#1 Standardize your data environment

Every journey starts with directions, and a common language helps you navigate it. Build a shared repository of information and practices around data engineering in your organization.

Knowledge is no longer spread across many places and in people’s heads. It lives in a shared repository, where everyone can find and update it.

What is the goal? Your data team starts to organize and standardize its approach to your solutions. Instead of hundreds of different ways of doing something, you now have your own data framework!

We’ve built our own Predica Data Domain Framework: a shared repository of best practices, available for everyone working with data projects.

Figure: a screenshot of our shared repository for everyone working on data projects

Pro tip: don’t aim to standardize everything in one shot. Don’t let this effort stop you from doing work. It is a living standard. Build it along the way when delivering value from your data projects.

#2 Turn your data environment into code

You can’t speed things up if you don’t turn your data projects into code. Everything lives as code nowadays. The Azure cloud makes it much more manageable. Look at this reference data journey architecture:

Figure: example Azure data platform architecture

All of these are Azure services. Anything that runs in the cloud is exposed through APIs, which means it can be automated. Once your data flows and the infrastructure that processes them are expressed as code, you can proceed to the next step, which is delivering two crucial elements of your data operations:

  • Orchestrate and test
  • Deploy.
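To make “environment as code” more concrete, here is a minimal Python sketch. It is not how Azure or our framework defines environments; the `DataFlow` and `DataEnvironment` classes, the `render` and `derive` helpers, and all names such as `dwh-prod` are hypothetical and only illustrate the idea that an environment definition can be plain text stored in a repository and reused to create another environment.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class DataFlow:
    """One data flow: where the data comes from and where it lands."""
    name: str
    source: str
    target: str


@dataclass(frozen=True)
class DataEnvironment:
    """A whole environment described as data that can live in a repository."""
    name: str
    warehouse: str
    flows: tuple = field(default_factory=tuple)


def render(env: DataEnvironment) -> str:
    """Serialize the definition deterministically so diffs stay readable."""
    return json.dumps(asdict(env), indent=2, sort_keys=True)


def derive(env: DataEnvironment, name: str, warehouse: str) -> DataEnvironment:
    """Create another environment (for example, test) with the same flows."""
    return DataEnvironment(name=name, warehouse=warehouse, flows=env.flows)


# Hypothetical example values, not real resource names.
production = DataEnvironment(
    name="production",
    warehouse="dwh-prod",
    flows=(DataFlow("sales_daily", "crm.orders", "dwh.fact_sales"),),
)
test = derive(production, name="test", warehouse="dwh-test")
print(render(test))
```

Because the definition is ordinary text, a change to a data flow becomes a reviewable commit, and a test environment can be derived from the production definition instead of being rebuilt by hand.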

#3 Automate data flow testing and monitoring

Data flows can be described as code and deployed to your Azure services. This makes building new environments easy. It also brings standardization and automation to such deployments.

Once data flows are treated as code, you can also build automated tests to verify the streams of data and their quality.

Instead of spending days executing manual data validation tests, you can make them part of the pipeline and run them every day or at every deployment.
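As an illustration of what such automated data tests might look like, here is a small Python sketch. It assumes the rows have already been loaded into a list of dictionaries; the check functions (`not_null`, `unique`, `in_range`), the `run_checks` helper, and the `orders` sample are all hypothetical and stand in for whatever testing tool you actually use.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Check = Callable[[List[Row]], List[str]]


def not_null(column: str) -> Check:
    """Report every row where the column is missing or None."""
    def check(rows: List[Row]) -> List[str]:
        return [f"row {i}: {column} is null"
                for i, row in enumerate(rows) if row.get(column) is None]
    return check


def unique(column: str) -> Check:
    """Report every row whose value was already seen in an earlier row."""
    def check(rows: List[Row]) -> List[str]:
        seen, problems = set(), []
        for i, row in enumerate(rows):
            value = row.get(column)
            if value in seen:
                problems.append(f"row {i}: duplicate {column}={value!r}")
            seen.add(value)
        return problems
    return check


def in_range(column: str, low: float, high: float) -> Check:
    """Report non-null values that fall outside the closed range."""
    def check(rows: List[Row]) -> List[str]:
        return [f"row {i}: {column}={row[column]!r} outside [{low}, {high}]"
                for i, row in enumerate(rows)
                if row.get(column) is not None
                and not (low <= row[column] <= high)]
    return check


def run_checks(rows: List[Row], checks: List[Check]) -> List[str]:
    """Run every check and collect all problems in one list."""
    problems: List[str] = []
    for check in checks:
        problems.extend(check(rows))
    return problems


# Hypothetical sample data with three deliberate defects.
orders = [
    {"order_id": 1, "amount": 120.0},
    {"order_id": 2, "amount": -5.0},
    {"order_id": 2, "amount": None},
]
checks = [not_null("amount"), unique("order_id"), in_range("amount", 0, 1_000_000)]
problems = run_checks(orders, checks)
for problem in problems:
    print(problem)
```

A pipeline step could fail the run whenever `problems` is not empty, so the same rules are applied on every deployment rather than only when someone remembers to validate manually.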

Figure: a sample dashboard for automated data testing

Automated testing provides much-needed trust in data and saves time. Now you can check whether your information is right every time a change is made to data flows or models. It also lets you find out if something has changed at the source. And all of this from a single dashboard!
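One way to notice that something changed at the source is to compare the schema you expect with the schema you actually observe. The sketch below is only an assumption about how such a check could work: the `detect_drift` function and the column names and type labels are hypothetical, and in practice the observed schema would be read from the source system rather than written by hand.

```python
from typing import Dict, List


def detect_drift(expected: Dict[str, str], observed: Dict[str, str]) -> List[str]:
    """Compare two column -> type mappings and describe every difference."""
    findings: List[str] = []
    for column in sorted(expected.keys() - observed.keys()):
        findings.append(f"missing column: {column}")
    for column in sorted(observed.keys() - expected.keys()):
        findings.append(f"unexpected column: {column}")
    for column in sorted(expected.keys() & observed.keys()):
        if expected[column] != observed[column]:
            findings.append(
                f"type changed: {column} {expected[column]} -> {observed[column]}"
            )
    return findings


# Hypothetical schemas: the expected one would be versioned in the repository.
expected_schema = {"order_id": "int", "amount": "decimal", "created_at": "timestamp"}
observed_schema = {"order_id": "int", "amount": "varchar", "channel": "varchar"}

findings = detect_drift(expected_schema, observed_schema)
for finding in findings:
    print(finding)
```

Keeping the expected schema next to the data flow code means a source change shows up as a failed check instead of a silent error in a report.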

#4 Automate data environment deployments

With your data project living as code, built on standard practices, and covered by automated tests, you are ready for the next step: deployment.

A typical data project deployment is time-consuming and laborious. But after a transformation into a DataOps project, you can deploy it like any other code, with a CI/CD pipeline.
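The order of work in such a pipeline can be sketched in a few lines of Python. This is not a real CI/CD tool; the `run_pipeline` function and the three stage names are hypothetical, and the stage actions are stubs that simply return success or failure. The point is the gating: deployment only happens if the environment builds and the data tests pass.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order; after the first failure, skip the rest."""
    results: List[Tuple[str, str]] = []
    failed = False
    for name, action in stages:
        if failed:
            results.append((name, "skipped"))
            continue
        ok = action()
        results.append((name, "passed" if ok else "failed"))
        failed = not ok
    return results


# Stub stages: in a real pipeline each would call your tooling.
stages: List[Stage] = [
    ("build_environment", lambda: True),
    ("run_data_tests", lambda: False),  # simulate a failing data test
    ("deploy", lambda: True),
]

results = run_pipeline(stages)
for name, status in results:
    print(f"{name}: {status}")
```

In this run the simulated test failure stops the pipeline before the deploy stage, which is exactly the protection you want before a change reaches the data your business relies on.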

Figure: a CI/CD pipeline for data

So, what are the benefits of DataOps?

It allows you to build environments from code.

It allows you to run data flow tests as part of every deployment.

Finally, it allows you to merge new changes and deliver value.

See what your data can do!

With those four elements in place, you have started your organization’s journey into DataOps. Now you are ready to iterate and innovate on data to reduce your business risk. It also makes your data project fun and interesting for the teams working on it. They are no longer SQL people, but the DataOps team!

Let’s end with some resources

To help you on your journey into DataOps, here are a few links that will let you dig deeper into the topic:

  • DataOps Manifesto – read it with your team and find out where you are on this journey
  • DataOps resources on Wikipedia
  • DataKitchen DataOps book and blog – they provide excellent support to get started with data operations
  • DataOps is not DevOps for data – a great blog post to understand the concepts behind DataOps in a nutshell.

If you’d like to take a step back and look more into the topic of DevOps, you can complete this questionnaire to find out how your teams are doing right now and where you could improve.

And if you have any questions about the tools we use, the Predica Data Domain Framework, or even the meaning of life, feel free to ask me. Simply contact me or post your question below. I will be sure to answer it.

Next time, we will talk about money!

Summary:

  1. The information your business gathers can help you grow your organization and improve the products or services it offers.
  2. Data operations (DataOps) is an approach that standardizes data processing through the use of DevOps practices, to increase the value derived from information.
  3. The key elements of DataOps are: standardizing the data environment through a shared repository, turning the data environment into code, and automating testing, monitoring, and deployments.