By now almost everyone has dusted off their expertise and predicted a life lived “online”, as forced on us as the self-isolation regime itself. But traffic really has begun to grow, and taking the “holidays” into account, sites offering goods delivery, online education and especially online entertainment may not be ready for the flow of visitors in the new reality before the end of April.
Drawing on our 12 years of experience in technical support of web projects and remote server administration, we have prepared a kind of “training manual”: what is worth checking and what you need to take care of if you want to be sure that your site will cope with any load. Well, almost any.
So, here are 10 points that are critical to the active life of your web project in the coming days and weeks:
1) Monitor your infrastructure
First of all, you should know what is going on with your site. If you know how to work with Prometheus / Grafana, use them. If you have no experience with them, that is not a problem: a hosted service such as Datadog can be set up very quickly. Still too complicated? Take a look at Pingdom or Site24x7: they are not full-fledged monitoring services, but they are enough to confirm that your site is available and to warn you when it is close to going down.
Remember: you can only control what you can measure. If you do not know what is happening inside your system, and especially where it is happening, you cannot fix it when something breaks.
There are many options for what could go wrong when traffic piles up:
- You are limited by processor resources.
- You are limited by RAM.
- You are limited by disk / storage performance.
- You are limited by the bandwidth of your cloud instance / cluster / server.
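The limits listed above can be checked with a very small script if no monitoring service is in place yet. The following is a minimal sketch that uses only the Python standard library; the 80% threshold and the function names are illustrative assumptions, and the CPU figure is only a rough proxy (1-minute load average per core, available on Unix-like systems).

```python
import os
import shutil

# Assumed warning threshold: 80% utilization.
WARN_AT = 0.80


def disk_utilization(path="/"):
    """Fraction of the filesystem at `path` that is in use (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def cpu_pressure():
    """Rough CPU pressure: 1-minute load average divided by core count (Unix only)."""
    load_1min, _, _ = os.getloadavg()
    return load_1min / (os.cpu_count() or 1)


def over_threshold(readings, warn_at=WARN_AT):
    """Return the sorted names of resources whose reading is at or above warn_at."""
    return sorted(name for name, value in readings.items() if value >= warn_at)


if __name__ == "__main__":
    readings = {"disk": disk_utilization("/"), "cpu": cpu_pressure()}
    print("Resources over threshold:", over_threshold(readings))
```

A script like this is no substitute for proper monitoring, but run from cron it at least tells you which of the limits you are approaching.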
2) Get ready to scale
As soon as you see that you have reached 80% of a resource limit, start scaling immediately. If you wait until you hit 100%, the site will go down, and you will need time to bring the project back up. Not to mention that it is all rather nerve-racking …
You must act quickly, because otherwise you will lose visitors and, under pressure, possibly make even more mistakes. So when you reach 80% of a resource limit, scale until utilization drops to 40%.
Repeat as necessary.
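The 80% to 40% rule is simple arithmetic. Here is a minimal sketch, assuming the load is spread evenly across identical instances and grows linearly with traffic; the function name and the integer-percent convention are illustrative choices, not part of any tool.

```python
def instances_needed(current_instances, current_percent, target_percent=40):
    """Number of identical instances needed to bring utilization down to target_percent.

    Utilization is given in whole percent so the calculation stays in exact
    integer arithmetic. Assumes load is spread evenly across instances.
    """
    total_load = current_instances * current_percent
    # Integer ceiling division: round up, because a partial instance does not exist.
    return (total_load + target_percent - 1) // target_percent


# 4 instances running at 80% need to become 8 instances to run at 40%.
print(instances_needed(4, 80))
```

In other words, going from 80% to 40% means doubling capacity, which leaves room for traffic to double again before you are back at the warning threshold.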
3) Do not forget to monitor disk performance and network bandwidth
It is much harder to work out what is happening when the system is sluggish because of high disk load or bandwidth limits than when it runs out of CPU or RAM.
4) Monitor the performance of your databases
Especially if you use cloud databases. RDS, Cloud SQL, MongoDB Atlas and similar services are managed for you by the provider, but they still have their own limits, and you must monitor them in order to scale in time.
5) When your database puts a heavy load on the CPU, check how indexes are used: it really helps
Adding indexes can reduce the load on the processor dramatically. Suppose the CPU of your database is at 90%. You probably want to double its capacity so it can handle twice the load. But if most of your queries do not use an index, adding the right indexes can cut CPU load by as much as a factor of 10. An analysis of index usage is worth the effort!
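To see what "checking index usage" looks like in practice, here is a minimal sketch that uses SQLite from the Python standard library as a stand-in for whatever database you actually run. The `orders` table, its columns and the index name are hypothetical; other databases offer their own `EXPLAIN` commands with different output.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for `sql` as a single readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); the detail column is the readable part.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(10000)],
)

sql = "SELECT * FROM orders WHERE customer_id = ?"

# Without an index the database has to scan every row of the table.
plan_before = query_plan(conn, sql, (42,))

# With an index on the filtered column it can jump straight to the matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The plan changes from a full table scan to a search that uses `idx_orders_customer`. A full scan on a frequently executed query is exactly the kind of thing that burns database CPU under load.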
6) Keep an eye on your cloud provider bills
It is easy to forget about them when you are short of time. So set up alerts in your billing system to be notified of possible cost overruns. Bandwidth is especially costly. If you cannot move content to a CDN or to hosting companies that offer large volumes of traffic cheaply, like 100tb.com or Leaseweb, get ready for serious costs.
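Most cloud providers have built-in budget alerts, and those are the first thing to set up. The logic behind such an alert can be sketched as follows, assuming a simple linear projection of month-to-date spend; the function names and the figures are hypothetical.

```python
import calendar
from datetime import date


def projected_month_end_spend(spend_to_date, today):
    """Linearly project month-to-date spend to the end of the current month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


def should_alert(spend_to_date, monthly_budget, today):
    """True if the projected month-end spend exceeds the monthly budget."""
    return projected_month_end_spend(spend_to_date, today) > monthly_budget


# 300 spent by April 10 projects to 900 by April 30.
print(projected_month_end_spend(300.0, date(2020, 4, 10)))
print(should_alert(300.0, 800.0, date(2020, 4, 10)))
```

Keep in mind that a linear projection underestimates the final bill when traffic is still growing, so set the budget threshold conservatively.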
7) Avoid state in your application
Although you can add CPU or RAM to a machine in the cloud, there are still limits that cannot be crossed. Beyond them you have to scale horizontally by adding new instances of the same application, and the application must be ready for that. Once you run multiple instances, user requests are distributed among several servers, so you can no longer keep user data (sessions, uploads and so on) on an instance's local disk: the next request may land on a server that does not have it.
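The idea can be illustrated with a toy model. In this minimal sketch the `AppInstance` class, the round-robin "load balancer" and the plain dictionary standing in for an external store (in production this would be something like a database or a shared cache) are all illustrative assumptions.

```python
import itertools


class AppInstance:
    """One copy of the application; it keeps no user data of its own."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store  # lives outside the instance

    def add_to_cart(self, session_id, item):
        """Add an item to the user's cart and return the cart contents."""
        cart = self.sessions.setdefault(session_id, [])
        cart.append(item)
        return list(cart)


# Stand-in for an external, shared store that every instance can reach.
shared_store = {}
instances = [AppInstance("app-1", shared_store), AppInstance("app-2", shared_store)]

# Round-robin balancing: consecutive requests hit different instances.
balancer = itertools.cycle(instances)

next(balancer).add_to_cart("user-42", "book")        # handled by app-1
cart = next(balancer).add_to_cart("user-42", "pen")  # handled by app-2
print(cart)  # ['book', 'pen']
```

If each instance were given its own private dictionary instead, the second request would see an empty cart, which is exactly the failure that local state causes once requests are spread across servers.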
8) Consider moving to the cloud if the site is on dedicated hosting
On dedicated hosting you cannot scale easily: adding servers takes anywhere from a couple of hours to a couple of days. On top of that, you usually pay monthly rather than hourly. You obviously do not want to wait hours, let alone days, while your site is down. It is much easier to scale in the cloud.
9) Tune the infrastructure
Several basic optimizations are disabled by default at the OS, network, application manager and application levels. Enabling and tuning them can significantly reduce resource consumption. Search for tuning guides for your technology stack and follow the basic recommendations.
10) Get ready to launch the cached version
Despite all your efforts, a hundredfold increase in traffic will bring the site down, because scaling takes time. So be ready to serve a static cached version. The CloudFront / Cloudflare cache, your CDN cache, nginx's cache or any other will do. Just make sure the option is available when you need it.
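The fallback behavior itself is simple: remember the last good copy of each page and serve it when the origin fails. Here is a minimal sketch of that logic; the `StaleCache` class and its interface are hypothetical, and in practice the CDN or nginx would do this for you (nginx, for instance, has the `proxy_cache_use_stale` directive).

```python
class StaleCache:
    """Serve fresh pages when the origin works, the last good copy when it does not."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin: callable taking a path and returning the page body,
        # or raising an exception when the origin is down or overloaded.
        self._fetch = fetch_from_origin
        self._pages = {}

    def get(self, path):
        """Return (body, status), where status is 'fresh' or 'stale'."""
        try:
            body = self._fetch(path)
        except Exception:
            if path in self._pages:
                return self._pages[path], "stale"
            raise  # nothing cached yet, so the failure cannot be hidden
        self._pages[path] = body  # remember the last good copy
        return body, "fresh"
```

Note that the cache can only help with pages it has already seen, which is why the cached version has to be prepared before the traffic spike rather than during it.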
Need help with scaling or moving to the cloud? Feel free to contact the System Admins PRO team.