The role of the cloud is growing, and in the new reality of the pandemic and country-wide quarantines it is becoming critical for many businesses. Companies today are ready (and sometimes forced) to go the extra mile to quickly implement changes they had been putting off for a long time. This allows them not only to withstand turbulent times, but also to gain additional competitive advantages.
I will share my view of why this is happening, what needs companies now have, and what new opportunities the crisis opens up for cloud service companies and DevOps.
What changes did the pandemic cause?
It is important for service companies to understand what is happening to customers, what their current needs are, and how we can improve their business to get the most out of the current situation.
Customers can now be divided into three rough categories:
- Unexpectedly fast-growing – the number of their customers has increased a hundredfold in a matter of days. Their IT infrastructures were not ready for this and cannot withstand such a load. This category primarily includes retailers with a strong online component, online services, and e-learning platforms.
- Unexpectedly shrinking – demand has fallen to almost zero and revenue is dropping. The most important thing for these companies is to keep operating costs to a minimum, save every dollar to preserve the business, and lose as few people as possible. Reducing the cost of maintaining IT infrastructure plays an important role here, and this is where such companies need our help. The category includes, in particular, the tourism industry and offline retail.
- Stable so far – the current situation has had minimal impact on the core business of these companies, but they do not know what awaits them next.
Uncertainty about the future is common to all three categories. No one knows how the situation will unfold, and this leads to the following consequences:
- Even companies that are currently winning are not ready for significant transformations and investments in IT infrastructure. Expensive and complex services, such as architecture changes or software upgrades, will be relevant in the near future only for the small number of companies that truly need digital transformation. The rest try not to take risks and invest only in what will bring immediate benefit.
- The focus is shifting from long-term strategies, which bring greater benefits over 1–2 years, to short-term ones, whose results are weaker but noticeable within weeks or months.
- Scalability and efficiency, both of the infrastructure and of the business in general, are a priority. Current demand, whether sharply increased or decreased, will likely return to its previous level once the situation stabilizes – but another sharp jump is also possible. Companies understand that their infrastructure must adapt quickly so the business can operate under a variety of conditions, and that this must happen fast and without significant investment (for example, without building a data center or purchasing servers).
A virtual server in the cloud is the way out: you can rent one in a few minutes and give it up just as quickly. This approach has been well known for a long time, but until now few were ready to abandon previous investments, retrain staff, and change hundreds of processes in the company.
So here are some key ideas on where the DevOps and cloud services market is heading.
Infrastructure cost optimization
According to the State of the Cloud Report, in 2020 82% of enterprise companies consider optimizing cloud infrastructure costs a top priority, and for more than a third of them this is a serious challenge. The situation with traditional data centers is even worse: most companies admit they are not optimized, which leads to an overuse of roughly 30% of resources.
We regularly receive customer requests for these services, but right now optimization is becoming even more important. For some companies it is a matter of survival, and even those whose situation is more or less stable understand that it is time to reconsider their practices.
Briefly, what optimization looks like
First, we need to understand the current state of the client’s cloud infrastructure. To do this, we study the documentation, billing, the tools the company uses, and its automation, and we interview key stakeholders. This allows us to prepare a detailed report and develop a strategy based on it, which usually includes short-term and long-term cost optimization plans.
What does the client receive? In practice, implementing the short-term plan can reduce the budget by 15–30%, and the long-term plan by 20–50%. Even for declining businesses, cost optimization is not only about saving money; it is also an investment in how quickly and efficiently they can restore processes when the situation normalizes and it is time to return to active work.
One of our latest cases: a client was spending more than $300,000 a month on its Azure cloud infrastructure, which was already well optimized and followed most best practices. The company came to us at the end of February asking to reduce this amount by at least 35% in order to keep the team. By the end of March we had brought the figure down to $243,000, and by the end of April to $157,000. Among the main steps that made this possible, I would highlight the following:
- Consolidated the regional Dev/QA/UAT environments into one global, tiered Kubernetes cluster.
- Left on the dedicated server pool only the workloads that cannot tolerate a restart; most of the resources in the cluster now run on spot instances.
- Using automation, moved most QA/UAT environments to an on-demand model, where an environment starts only when it is needed and automatically stops after a period of inactivity.
- Made many changes to resource profiles to reduce their performance. This affected metrics such as Build Time and Test Time, but since development switched to “priority products only” during the crisis, the total number of commits and builds decreased, the load on the entire system dropped as well, and Time To Production (Market) barely changed.
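The on-demand model for QA/UAT environments described above boils down to a simple policy: stop anything that has been idle longer than some timeout. A minimal sketch of that decision logic is below; the `Environment` class, field names, and the two-hour timeout are illustrative assumptions, not the client’s actual tooling (in practice, such a script would call the cloud or Kubernetes API to actually stop the environment).

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class Environment:
    name: str                 # hypothetical name, e.g. "qa-eu-1"
    last_activity: datetime   # last deploy, test run, or login
    always_on: bool = False   # some UAT environments must stay up

def environments_to_stop(envs, now, idle_timeout=timedelta(hours=2)):
    """Return the environments that have been idle longer than the timeout."""
    return [
        e for e in envs
        if not e.always_on and now - e.last_activity > idle_timeout
    ]

# Example: one environment idle for 3 hours, one active 20 minutes ago
now = datetime(2020, 4, 1, 12, 0)
envs = [
    Environment("qa-1", last_activity=now - timedelta(hours=3)),
    Environment("qa-2", last_activity=now - timedelta(minutes=20)),
]
idle = environments_to_stop(envs, now)
print([e.name for e in idle])  # ['qa-1']
```

Run periodically (for example from a cron job), this kind of check is what lets environments cost nothing outside working hours.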
You may ask why this optimization has only now been achieved – after all, everything could have been done in quieter times. And you would be right: this infrastructure and these processes should have been optimized long ago, and some of these improvements were even included in this year’s plan. Unfortunately, only the crisis has made businesses understand how important the efficiency of their IT infrastructure and processes really is. Only the threat of losing the business made it a priority for developers to spend time on the necessary code changes and focus on self-tests, and for DevOps teams to develop a new approach to testing products and infrastructure. Management is already actively planning to roll this new approach out to Production, which will save another $30,000–40,000 a month.
As a result, the company’s infrastructure will be twice as cost-efficient as before.
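The arithmetic behind the case above is worth spelling out, since the client’s target was framed as a percentage. Using the figures from the text ($300k baseline, $243k in March, $157k in April):

```python
# Monthly Azure spend, taken from the case study in the text
baseline = 300_000  # before optimization
march = 243_000     # end of March
april = 157_000     # end of April

def reduction_pct(before, after):
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100

print(round(reduction_pct(baseline, march), 1))  # 19.0
print(round(reduction_pct(baseline, april), 1))  # 47.7

# The client asked for at least a 35% cut; the April figure clears it
print(reduction_pct(baseline, april) >= 35)      # True
```

So the short-term work in March delivered about 19%, and the full two-month effort roughly 48%, well past the 35% the client needed to keep the team.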
Infrastructure expansion through the use of public clouds
Keeping the core business in your own data centers has significant drawbacks, including a high TCO (total cost of ownership) with low ROI (return on investment) and the difficulty of scaling capacity. That is why more and more companies have been switching to hybrid clouds. According to IDG, the share of organizations running at least one application or part of their infrastructure in the public cloud grew from 51% in 2011 to 73% in 2018, and today exceeds 90%. About 44% of organizations already use both private and public clouds to deliver at least one of their services.
This trend continues to win over businesses, as it is the best way to increase the efficiency of existing infrastructure in terms of money spent versus profit earned. More importantly, it no longer requires completely changing your solution architecture, team skills, and management tools.
Many companies have run into the problem that their infrastructure, and the business in general, cannot respond to challenges as quickly as needed. Accordingly, even more companies will move to hybrid clouds even faster. The trend itself is not new – so what has changed?
About two-thirds of the companies that adopted the hybrid model put only new projects in the cloud and were not ready to invest in relocating the most profitable core business, which is now under heavy load. In such cases the hybrid cloud looks like this: the entire core business runs in the data center, while new projects are launched in the cloud but still reach back to the data center for the data stored there. As a result, the mere fact that these companies have resources in the cloud does not help when they need to scale the core business quickly.
About multicloud
Another pressing issue is that it is not only companies that are experiencing increased server load, but cloud providers as well. For example, the load on Azure in March grew by more than 700%. This affects its users, some of whom depend heavily on resources being available for short-term bursts. The best solution in this situation is to expand into another public cloud.
One of Europe’s largest customers of a certain public cloud provider is an online retailer that needed to scale rapidly as the epidemic began, but it faced a shortage of free resources in its provider’s data centers. So it returned to the multi-cloud strategy we had proposed last year. We are now building a solution for this company that will shift part of the load to another provider’s cloud in the short term, and in the long run will allow it to balance services freely between several providers.
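At its core, balancing between providers means routing each new workload to wherever free capacity remains. A minimal sketch of that placement decision is below; the provider names and capacity units are hypothetical, and a real solution would pull these numbers from provider quota APIs and also weigh price, latency, and data locality:

```python
def pick_provider(capacity, load):
    """Pick the provider with the most headroom (free capacity).

    `capacity` and `load` map a provider name to units of compute;
    both dictionaries and their values are illustrative assumptions.
    """
    headroom = {p: capacity[p] - load.get(p, 0) for p in capacity}
    return max(headroom, key=headroom.get)

# Example: the primary provider is nearly full, the secondary is not
capacity = {"primary": 100, "secondary": 80}
load = {"primary": 95, "secondary": 40}
print(pick_provider(capacity, load))  # 'secondary' (40 free vs 5)
```

Even this crude greedy rule captures the short-term goal from the case: when the primary provider runs out of resources, new load automatically lands on the other cloud.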
What these changes mean for DevOps engineers
No new technologies or trends are emerging right now; rather, the focus is changing and changes that began long ago are accelerating. Engineers must be ready for this, and must keep learning and acquiring new skills.
One of the key skills: expertise in working with clouds and container platforms is a must-have, without which it is difficult to find a project. In the not-so-distant future: solutions for hybrid/multicloud and workload mobility such as Google Anthos, OpenShift, and VMware Tanzu.
As for learning sources, I will not presume to give advice, because there is nothing here that you can study once and immediately become an expert in. Besides, we all absorb information differently. I am in favor of understanding the direction in which to move and choosing the formats that suit you best – technical articles, YouTube videos, Coursera courses, and more. You need to study a lot and constantly follow news and trends. Yes, it is difficult and time-consuming, but it increases your chances of succeeding in the profession.