3 project areas that will help you keep development under control
This series is focused on how to help keep your Enterprise Data Warehouse project in line. Today, we’ll talk about how to organize it – from structuring your team and preparing your tools, to managing your infrastructure.
In a hurry? Focus on the sentences in bold font for a summary!
Organizing large projects isn’t just about code complexity, challenges, unification, or structure. It also means managing large teams that span development and business, on both the client’s and the vendor’s side.
This means you’ll need to look after many people from multiple departments and areas, with different skill sets, people who might work in different time zones. The project will also involve setting up infrastructure, multiple environments, and automation.
There are three key aspects of a project to take care of: framework, tools, and infrastructure.
Frameworks like Scrum or Agile are a good starting point. However, in practice, doing everything 100% by the book might be impossible, very difficult, or even, on occasion, a bad choice.
This is not because people misunderstand the rules of the framework or act in bad faith. The organization might not be ready, pushback from stakeholders might be strong, or continuously changing requirements and priorities might interfere.
In such cases, some exceptions might help and allow us to adapt. The worst thing you can do is force specific framework rules and treat them as the only acceptable way.
Someone might suggest simply using a different framework. But it’s likely that it won’t change much, and you’ll still be left with something that doesn’t quite work for you.
To make it clear, we’re not claiming that changing framework rules entirely is a solution (in that case, it is probably better to create your own framework).
What we know from our experience is that slight modification or deviation from the rules might be helpful, and will allow for easier adjustment to project reality and better control of it.
When a delivery team counts 2 to 4 people, there are no major issues with organizing work. One person can easily have everything under control.
But when you have 15 team members on board or more, the scale is completely different. One person is no longer able to control everything without a good setup.
However, in a large project, there might be multiple smaller engagements which at some stage should be developed independently.
This is because they concern completely different areas that are completely or highly separated, e.g. sales and human resources, compliance and customer relationships, etc.
In these situations, development might be done in parallel by two or more delivery teams.
How does a parallel team setup work?
Assuming 15 team members, where:
the suggested team split could be the following:
The delivery team is self-organized (like in Scrum). Team leaders support the lead architect/Project Owner in requirement collection and business analysis. Based on the conducted analysis, together with all team members, they define the tasks and estimations.
During development, the delivery team leader and sub-leaders support other team members in development activities, but they also often communicate directly with the client’s SME when some requirements need more clarification or refinement.
This approach works well, especially when it comes to clarifying subtle technical details.
It also helps avoid multi-stage communication and unnecessary involvement of the lead architect/Project Owner. They just need to focus on high-level topics and engage when an important architectural decision needs to be made.
There are probably many other setups and possibilities for how to organize your team. The key point is that you should not be afraid to change the rules, team setup, or approach if these make things easier and better for the project.
Flexibility is not a bad thing. In fact, it is necessary in difficult and complex engagements. Still, you should avoid changing your mind too often, as experimenting too much and for too long might introduce chaos. Start with some baseline (framework) and adjust it slightly. Don’t change everything completely, and don’t try to invent anything from scratch.
The right tools and solutions can improve and boost team collaboration and simplify project management. It won’t come as a surprise that we use Microsoft solutions daily.
In essence, Azure DevOps is a service that enables us to manage code (Git repositories), plan the scope of work, manage and measure progress, and build automation pipelines (Continuous Integration or Continuous Delivery).
Keeping all the code in one shared place is standard in any IT project, no matter the engagement size. However, this is not just about storage.
The code needs to be protected and specific policies need to be applied – changes in the code should be made under some control. Nobody should have the ability to take any part of the code, update it, and then commit it to the repository without any reviews and checks.
A good branching strategy should also be developed. It makes code merging easier and helps prevent issues.
This is especially true when the solution is already in production. We might need to develop additional new features between releases or apply hotfixes to production systems.
Scope management and progress measurement
Scope and progress are two elements that we need to manage and monitor throughout the project.
Scrum is probably the most popular framework for iterative code development, but other frameworks, like AgilePM, go beyond iterations and embrace entire programs.
In Azure DevOps, you can choose from templates for iterative development approaches like Scrum or Agile. Depending on the selected type, specific artifacts (i.e. work items like User Story, Product Backlog Item, Task, Bug, etc.) are defined.
Our approach to methodology is more flexible, in that we do not follow the framework 100%. We apply some modifications, e.g. by extending work items with additional fields or adding new states. Improvements are introduced to better fit the project.
Azure DevOps gives a lot of flexibility in this area, which means that project scope management is much more efficient.
During the engagement, there are a lot of smaller parts and features that can be scoped, traced, and measured.
For this purpose, we use burndown charts. They show whether estimations were accurate, whether delivery is delayed, and when specific work is likely to be done.
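To make the idea behind a burndown chart concrete, here is a minimal sketch of the numbers it plots. It is not how Azure DevOps computes its charts; the function names and the sample figures are our own assumptions for illustration.

```python
from typing import List, Optional


def ideal_burndown(total_points: float, days: int) -> List[float]:
    """Remaining work per day if the team burned work at a constant rate."""
    return [total_points * (1 - day / days) for day in range(days + 1)]


def actual_burndown(total_points: float, completed_per_day: List[float]) -> List[float]:
    """Remaining work after each day, given the work completed on that day."""
    remaining = [total_points]
    for done in completed_per_day:
        remaining.append(remaining[-1] - done)
    return remaining


def projected_days_to_finish(total_points: float,
                             completed_per_day: List[float]) -> Optional[float]:
    """Total days needed at the average pace so far (None if nothing is done yet)."""
    done = sum(completed_per_day)
    if done <= 0:
        return None
    velocity = done / len(completed_per_day)
    return total_points / velocity


if __name__ == "__main__":
    # Hypothetical iteration: 40 points of scope planned for 4 days.
    print(ideal_burndown(40, 4))
    print(actual_burndown(40, [5, 5, 10]))
    print(projected_days_to_finish(40, [5, 5, 10]))
```

When the actual line stays above the ideal line, the iteration is behind plan, and the projection gives a rough idea of when the work will be done at the current pace.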
Of course, Azure DevOps provides many other charts and tools which can help with running a project.
Continuous integration and continuous delivery
The last feature that we use intensively is CI/CD pipelines. They automate environment creation as well as code builds and deployments.
In large projects where multiple environments are used, deployment automation and control are vital. Depending on configuration and project setup, you might have 2-4 environments (development, test, pre-production, production).
Automation pipelines (Infrastructure as Code) ensure that all environments are identical, i.e. you can easily make sure that services and hardware setup are the same. This way you can often avoid issues related to environment setup.
A small difference in e.g. the service pack version might lead to deployment failure because one environment has a newer version of a specific service installed.
Another benefit is easy environment recreation with a single click.
There are situations when you might need to recreate the environment, e.g. in case of damage, or because something critical needs to be validated or checked. When that happens, you just run a defined pipeline, and you can be sure that the environment setup will be correct.
How to work with multiple services?
The solutions we build run on defined infrastructure made up of multiple services. For each service, some code is developed, then updates and changes are continuously applied. Each service therefore needs its own approach to code deployment.
To simplify and improve this process, Azure DevOps allows us to build automation pipelines that take care of the code build and deploy the result to specific services in particular environments.
For example, for every commit to the development branch on the Git repository, an automated code build is triggered and is ready for deployment.
It is possible to configure automated deployment. Alternatively, you can set up a permission-based process, where the change will be deployed to a test environment once manually approved. The same policy could be configured for other environments.
For each larger project, where multiple developers continuously add new features and make updates, automated continuous delivery (CD) has a significant impact on delivery time and quality. Automated pipelines save us a massive amount of manual work and help us avoid many deployment mistakes or problems.
How to work with Microsoft OneNote?
During a project, we take a lot of notes and make a lot of sketches. It is good to organize all of them in Microsoft OneNote, rather than Microsoft Word, for easy access.
It is worth spending some additional time to elaborate on the notes’ structure and discuss it with the team as a part of the kick-off meeting. It will be much easier for everyone to place and find notes in the correct place.
Nowadays, with teams distributed across the country (or countries), remote work is no longer unusual. Daily stand-ups, meetings, ad-hoc calls, or workshop sessions can be conducted easily over the internet.
Remote meetings are one of the main features of Microsoft Teams, as well as many other products on the market.
During the project, especially during the analysis phase, a lot of documents and content are exchanged and created. Keeping the documentation aligned and organized in a logical structure helps team members to collaborate effectively.
Instead of generating multiple places or storage locations for keeping project-related assets, it’s best to keep them in one place. Microsoft Teams allows us to do this.
As with OneNote, it is good to spend some time planning and developing the Microsoft Teams organization and structure, e.g. folders, team structure, naming convention, etc.
The simpler, the better: it is worth organizing particular tools similarly if you can, e.g. in line with the structure defined for OneNote. This way everyone will have a better view of what is going on.
This part is about infrastructure, i.e. working environments where your solution is developed, tested, or used daily.
Engagements from data areas are continuously evolving and need to adapt to new situations. In addition, production environments must work consistently, without interruptions, especially during business hours when data consumption is high.
Deploy uniform environments
There are several different configurations of deployment environments. Sometimes it’s just two (i.e. one environment for development & testing, and one for production). More advanced setups might consist of even five or six different working environments. Most often, we end up with three or four.
Of course, everything depends on your project complexity and scale. But regardless of that, it is crucial to ensure that environments are the same in terms of installed software, configuration, versions, patches, etc.
These conditions might have a significant impact when moving code versions from development to testing.
Any discrepancies may lead to some solution components not being deployable because the patch version between testing and development is slightly different, and certain functions aren’t working in the same way.
Another case might be that everything is deployed without any issues, but when the code is run, some functions or procedures are not working because the software version is different.
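A simple way to spot such discrepancies early is to compare the versions recorded for each environment. The sketch below assumes you already have an inventory of component versions per environment; the component names and version numbers are hypothetical.

```python
from typing import Dict, Optional


def find_drift(
    environments: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, Optional[str]]]:
    """Return components whose version differs (or is missing) between environments."""
    components = set()
    for inventory in environments.values():
        components.update(inventory)

    drift = {}
    for component in sorted(components):
        # None marks a component that is not installed in that environment.
        versions = {name: inv.get(component) for name, inv in environments.items()}
        if len(set(versions.values())) > 1:
            drift[component] = versions
    return drift


if __name__ == "__main__":
    # Hypothetical inventories for two environments.
    inventory = {
        "development": {"sql-engine": "15.0.2", "etl-runtime": "4.2"},
        "test": {"sql-engine": "15.0.1", "etl-runtime": "4.2"},
    }
    for component, versions in find_drift(inventory).items():
        print(component, versions)
```

Running a check like this before a release points at the mismatched patch level before it shows up as a failed deployment.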
Handling and maintaining large environments to ensure their uniformity needs to be automated. This can be achieved with ARM (Azure Resource Manager) Templates.
What are ARM Templates?
In essence, ARM templates are JSON files that store the definition and configuration (i.e. service tier, disk size, etc.) of specific Azure services.
A good practice is to use parameters within the ARM template. This is useful when the same service needs to be deployed to different environments with a slightly different configuration.
For example, your development environment might not need to be very powerful, so a specific service tier might be lower than in the test environment. By using parameters, the same template can be reused to deploy both environments.
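To illustrate, the sketch below builds one parameterized template and a separate parameter file per environment, and prints them as JSON. ARM templates are plain JSON files; we generate them from Python here only to show the structure. The resource type, API version, resource names, and SKU values are assumptions chosen for the example, and the template assumes the logical SQL server already exists, so check them against your own setup.

```python
import json

# One template shared by all environments; only the parameter values differ.
TEMPLATE = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "serverName": {"type": "string"},
        "databaseName": {"type": "string"},
        "skuName": {"type": "string", "defaultValue": "S0"},
    },
    "resources": [
        {
            "type": "Microsoft.Sql/servers/databases",
            "apiVersion": "2021-11-01",  # assumed version, verify before use
            "name": "[format('{0}/{1}', parameters('serverName'), parameters('databaseName'))]",
            "location": "[resourceGroup().location]",
            "sku": {"name": "[parameters('skuName')]"},
        }
    ],
}

# Hypothetical tiers: a small one for development, a larger one for test.
ENVIRONMENT_SKUS = {"dev": "S0", "test": "S3"}


def parameter_file(environment: str, sku_name: str) -> dict:
    """Build the per-environment parameter file for the shared template."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "serverName": {"value": f"sql-edw-{environment}"},  # hypothetical name
            "databaseName": {"value": "edw"},  # hypothetical name
            "skuName": {"value": sku_name},
        },
    }


if __name__ == "__main__":
    print(json.dumps(TEMPLATE, indent=2))
    for env, sku in ENVIRONMENT_SKUS.items():
        print(json.dumps(parameter_file(env, sku), indent=2))
```

The template itself never mentions an environment. Deploying to development or test means pairing the same template with a different parameter file.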
Additionally, it is good practice to use ARM templates to make entire environments automatically deployable through the CI/CD pipelines in Azure DevOps described in the earlier section.
The last thing which should be considered when it comes to Azure infrastructure components is automation, which can help us achieve savings on consumption costs.
Imagine a situation where three environments are deployed, i.e. development, test, and production. Each of them might run a service that is not used 100% of the time. We could switch it off for some part of the time, or the service tier might be lowered to a minimum.
Why automate your resource usage?
A good example is Azure Synapse Analytics (formerly Azure SQL Data Warehouse). For instance, ETL processes are run every day between 3 AM and 7 AM. After that period, the warehouse does not perform any intensive workloads.
Therefore, the service should work on a higher tier for just four hours, and for the rest of the day, it can run on the lowest tier.
This process can be automated, i.e. we can scale up the service at the beginning of the ETL pipeline, and scale it down at the end of the ETL pipeline.
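The scale-up and scale-down steps can be wrapped around the ETL run so that the service always returns to the low tier, even when the load fails. The sketch below is one possible shape for this, not a definitive implementation: `set_tier` is a hypothetical callable standing in for whatever actually changes the tier (a pipeline activity, a script, or an SDK call), and the tier names are illustrative.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def scaled_up(set_tier: Callable[[str], None], high: str, low: str) -> Iterator[None]:
    """Run the enclosed work on the high tier, then always return to the low tier."""
    set_tier(high)
    try:
        yield
    finally:
        # Runs on success and on failure, so a broken ETL run
        # does not leave the warehouse on the expensive tier.
        set_tier(low)


if __name__ == "__main__":
    def set_tier(tier: str) -> None:
        # Placeholder: replace with the real scaling call for your service.
        print(f"scaling warehouse to {tier}")

    with scaled_up(set_tier, high="DW1000c", low="DW100c"):
        print("running ETL load")
```

Putting the scale-down in a `finally` block is the important design choice here: a failed run is exactly the case where nobody is watching the bill.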
Or, let’s take the development environment as our example.
Most of the team might work between 7 AM and 5 PM from Monday to Friday. During the rest of the day and on weekends, the environment is not utilized.
Therefore, to save costs, all services (where possible) should be automatically switched off (Virtual Machines), paused (e.g. Azure Analysis Services, Azure Synapse Analytics), or scaled down (Azure SQL Databases) at, say, 5:30 PM.
The opposite should be done in the morning, e.g. at 6:50 AM, the whole development environment can be automatically activated to be fully operational when the team starts the job.
With such a simple move, no one has to worry about how to switch on the environment (less maintenance). At the same time, costs can be reduced significantly.
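The schedule described above can be expressed as a small rule that an automation job evaluates to decide whether the development environment should be up. This is a minimal sketch that uses the example times from the text (6:50 AM to 5:30 PM, Monday to Friday); the function names are our own.

```python
from datetime import datetime, time

START = time(6, 50)   # environment is switched on
STOP = time(17, 30)   # environment is switched off


def should_be_running(now: datetime) -> bool:
    """True on weekdays between the start and stop times."""
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and START <= now.time() < STOP


def weekly_running_hours() -> float:
    """Hours per week the environment is up under this schedule."""
    minutes_per_day = (STOP.hour * 60 + STOP.minute) - (START.hour * 60 + START.minute)
    return 5 * minutes_per_day / 60


if __name__ == "__main__":
    hours = weekly_running_hours()
    print(f"{hours:.1f} of 168 hours per week ({hours / 168:.0%})")
```

Under this schedule the environment runs for roughly 53 of the 168 hours in a week, or about a third of the time. The actual saving depends on how each service is billed while it is stopped, paused, or scaled down.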
For each environment, it is advisable to do an additional exercise and analyze when and at which tier your particular services should be running.
The production environment should probably always be available, but there might be time windows where particular services might work at a lower tier.
As for the testing environment, everything depends on the solution and release plan.
Sometimes, it might need to run at full power around the clock, but at other times, it might be completely switched off. The development environment will probably only be operational during working hours.
That’s not all!
Now you know how to structure and organize your Modern Data Warehouse project in terms of teams, tools, and infrastructure. Previously, we have also shared some information on a project management framework which might help you avoid some of the common pitfalls during this engagement.
There is still one more aspect of this engagement ahead of us, which is Master Data Management. Click here to read it now!
Written in collaboration with Paweł Borowiecki.