As I promised you last time, here is a short guide on how to start spending smarter (notice that it’s not “spending less”) on your cloud. In a way, it’s all about designing for cost – that is, building your services with cost in mind right from the beginning.
I will explain it using the Azure Well-Architected Framework, but all major vendors provide similar guidelines, so you can apply these principles to any cloud you use. For example, here is the framework for AWS, and there is also one for GCP. They may use different terms, but the idea is largely the same.
If you prefer videos, here’s a good series by Microsoft that goes over the principles in more detail:
At the core of the Azure Well-Architected Framework are 5 pillars:

- Operational excellence
- Reliability
- Performance efficiency
- Security
- Cost optimization
Each of these pillars can represent a different stage in your cloud application lifecycle. And at each stage you can take action to optimize your spending.
Before we dive in, a question: What is the nightmare scenario all of us fear when using the public cloud?
For most of us, it is waking up in the morning and seeing that your cloud bill has jumped to hundreds of thousands of dollars overnight because someone left resources running!
That happens in real life more often than you might think.
At the end of this article, I will share one Azure feature that will let you sleep peacefully at night, knowing that nothing like this will happen to you.
Read on…
I will start with operational excellence because you likely have some cloud services already in place. So, before you begin to look at anything new, you need to take stock of what you have and make sure you understand your current environment.
The key here is resource monitoring – that is, a proper analysis of your environment. Identifying your assets, for example by tagging your resources so they can be found quickly, is essential if you want to understand what you’re paying for.
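As a minimal sketch of what that tagging can look like in practice – assuming the azure-identity and azure-mgmt-resource Python packages, with a resource group name and tag keys that are purely my examples – you could stamp cost-tracking tags on a resource group like this:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

credential = DefaultAzureCredential()
resources = ResourceManagementClient(credential, "<subscription-id>")

# "cost-center" and "env" are example tag keys; pick a naming convention
# and apply it consistently so cost reports can be sliced by team or stage.
resources.resource_groups.create_or_update(
    "rg-billing-demo",  # placeholder resource group name
    {
        "location": "westeurope",
        "tags": {"cost-center": "marketing", "env": "dev"},
    },
)
```

Once your resources carry consistent tags, cost reports can be grouped and filtered by them – which is exactly what makes the bill understandable.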
As a next step, you should introduce consistent practices and policies for implementing changes or new resources, and a standardized procedure for deployments.
In short, operational excellence comes down to bringing order to your environment. It’s about making sure everything runs smoothly, in a unified way. How?
Cloud governance is your friend here. It serves to standardize your environment so that your processes are repeatable and measurable. To take it a step further, adopting DevOps can raise your game. It’s not just about automation but about a way of working where teams across departments cooperate and communicate seamlessly.
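To make the governance point concrete, here is a hedged sketch of a simple tag audit – reusing the client from the tagging example above, with the mandatory "cost-center" tag again being my own example:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

credential = DefaultAzureCredential()
resources = ResourceManagementClient(credential, "<subscription-id>")

# Flag anything missing the (example) mandatory "cost-center" tag,
# so it can be fixed before it muddies the cost reports.
for res in resources.resources.list():
    if "cost-center" not in (res.tags or {}):
        print(f"Missing cost-center tag: {res.type} {res.name}")
```

In a real environment you would enforce a rule like this with Azure Policy rather than an ad-hoc script, but a quick audit like this is a good way to see how big the gap is.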
And speaking of DevOps, we have a cool new guide to help you implement it – download the ebook now.
No one likes to visit a website that’s down. Sure, most of the time you can come back later and everything will be fine again. But not everyone will “try again later”, and every minute of downtime means lost business.
If your organization offers services that customers use, then any disruption can be very costly for your company. And with so much competition around, it’s very easy for customers to switch to an alternative. That’s why reliability is so important – it specifies the availability of your solution in production and defines how fast your system can recover in case of failure.
The architecture you choose for your service plays a large part here. Will you use IaaS, PaaS, SaaS, or containers? Each approach comes with upsides and downsides, so before you answer this question, you need to be clear on what your solution must be able to do. From there, you can determine the availability you need, which will also factor into the price.
For example, you may be able to run your application more cheaply at a slightly lower availability. If it’s a business-critical solution, you will probably need to keep it operational at all times (or at least at the classic “five nines” – 99.999%), which will cost you more. But you likely won’t need full capacity for everything all the time – which is where room for savings appears.
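To put those numbers in perspective, here is a quick back-of-the-envelope calculation (plain Python, nothing assumed beyond the arithmetic) of the yearly downtime budget each availability level allows:

```python
# Downtime allowed per year for common availability targets.
MINUTES_PER_YEAR = 365 * 24 * 60

for sla in (0.99, 0.999, 0.9999, 0.99999):
    downtime = (1 - sla) * MINUTES_PER_YEAR
    print(f"{sla:.3%} availability -> {downtime:,.1f} minutes of downtime per year")
```

Two nines leaves you over three and a half days of downtime per year; five nines leaves barely five minutes. That difference is exactly what you pay extra for.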
Performance efficiency essentially comes down to scalability. How flexible is your application when it comes to handling varying loads? Some services operate at a fairly stable level. Others have some variation – for example, you might not need all of your VMs running at full capacity over the weekend if your dev teams work Monday to Friday.
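As a minimal sketch of that weekend scenario – assuming the azure-identity and azure-mgmt-compute packages, and an "rg-dev" resource group of my own invention – a scheduled job (an Azure Automation runbook, a cron job, whatever you already run) could deallocate the dev VMs on Friday evening:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

credential = DefaultAzureCredential()
compute = ComputeManagementClient(credential, "<subscription-id>")

# Deallocating (not just stopping) releases the underlying hardware,
# so the VMs stop accruing compute charges; disks still incur cost.
for vm in compute.virtual_machines.list("rg-dev"):  # placeholder group
    compute.virtual_machines.begin_deallocate("rg-dev", vm.name).result()
    print(f"Deallocated {vm.name}")
```

A mirror-image job on Monday morning calling begin_start would bring them back before the team logs in.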
Then again, if you’ve got an e-commerce solution, you may need to prepare for an occasional increase in transaction volume, e.g., due to holidays, sales, or seasonal changes. An efficient system will scale automatically as and when needed, while retaining its performance level.
Of course, cost is also a function of scale. If you’re scaling up or out, you need to be prepared for higher costs. If you can scale down, you can also cut your bill. But the extent to which you can scale also depends on your architecture.
For example, as my colleague Daniel wrote recently, you may be able to use containers rather than VMs to run your application. However, depending on the service you use, you will have different scalability options, which will also impact the cost.
I have covered this topic a lot in the past, but really, it’s never enough. Security is something you should keep in mind at every lifecycle stage. From making sure your dependencies don’t introduce any vulnerabilities or expose any secrets, to enabling secure access to processed data, security applies to every area of your application.
And it doesn’t end when you ship to production. You still need to monitor all the signals, both incoming and outgoing, looking for anomalies that could alert you to a possible breach. That’s impossible to do at scale by hand, and with security incidents it’s no longer a matter of if but when. That is why ongoing monitoring and up-to-date security features are a must-have.
How to make it happen? For secure development, a lot of tools are already built into your services. Both Azure DevOps and GitHub have advanced features to help you write and deploy high-quality code.
As for monitoring – your vendor will also provide features for it. I find that a good 9 out of 10 companies fail to take advantage of the services they already pay for, which would go a long way toward protecting their resources. But if you’ve covered the basics and want to take it a step further, an automated service like Managed SOC will ensure that you’re well equipped to manage any incident.
And here we are, at the last stage. Why is cost optimization at the very end? Hopefully, if you’ve read through the previous points, it’s clear by now: cost optimization is not a one-off action but a process that should happen at every step of your IT development.
Of course, you can also treat it as a stand-alone measure. Then it comes down to reviewing your resources and making sure they’re not running when they’re not needed. Services like Azure Advisor can help you out with that.
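For illustration, here is a hedged sketch – assuming the azure-mgmt-advisor package, with the OData filter string being my assumption – that pulls Advisor’s cost recommendations for a subscription:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.advisor import AdvisorManagementClient

credential = DefaultAzureCredential()
advisor = AdvisorManagementClient(credential, "<subscription-id>")

# List only the cost-category recommendations, e.g. underused VMs
# that Advisor suggests shutting down or right-sizing.
for rec in advisor.recommendations.list(filter="Category eq 'Cost'"):
    print(rec.impact, "-", rec.short_description.problem)
```

Reviewing a list like this once a month is a cheap habit with a very direct payoff.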
You may even be able to save money just by changing the way you buy your resources, from one model (e.g., volume buying) to another (e.g., CSP). Here’s a video explaining how that works:
And now… what about the tip on how to sleep well at night, knowing that your cloud bill will not go through the roof? Here it is – simply use Azure’s alerts for anomalies and unexpected costs, provided free of charge as part of Azure services.
Here are all the details you need: Identify anomalies and unexpected changes in cost
Set it up, and it will spare you nightmares about unexpected Azure costs.
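If you also want to sanity-check the numbers yourself, here is a minimal sketch – assuming the azure-mgmt-costmanagement package; the query body follows the Cost Management query API, but treat the details as illustrative – that pulls month-to-date spend, day by day, for a subscription:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.costmanagement import CostManagementClient

credential = DefaultAzureCredential()
cost = CostManagementClient(credential)

scope = "/subscriptions/<subscription-id>"  # placeholder subscription
result = cost.query.usage(
    scope,
    {
        "type": "ActualCost",
        "timeframe": "MonthToDate",
        "dataset": {
            "granularity": "Daily",
            "aggregation": {"totalCost": {"name": "Cost", "function": "Sum"}},
        },
    },
)

# Each row matches result.columns (typically cost, date, currency).
for row in result.rows:
    print(row)
```

A daily glance at output like this, on top of the anomaly alerts, means a runaway resource gets caught in hours rather than at the end of the billing month.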
And as per tradition, here are a few readings on the topic you may find helpful.
Cloud vendor frameworks:
Additional resources: