Schedule of monthly maintenance work
If you expect to read about how to organize periodic maintenance, this material is not exactly what you are looking for. I am going to talk about the realities I have encountered and about the experience I gained while trying to make all of it systematic and automated.
First of all, here is a list of the work that ends up being performed periodically:
- Viewing the logs of systems, network equipment and the backup system.
- Operating system and software updates.
- Analysis of consumed resources (disk space, memory and CPU).
- Test restores from backups.
- Viewing the hardware status (including direct visual inspection).
- Changing service account passwords. Dealing with weak user passwords.
- Disabling outdated user accounts, computers and services.
- Network traffic analysis.
- Updating certificates and certificate revocation lists (CRLs).
To talk about each type of work in more detail, I first need to describe the environment in which it all runs. In fact, there have been several environments, so I will mention only the features that are useful for understanding today's material.
My infrastructure includes up to 10 physical servers, a virtualization system (up to 50 virtual servers), PCs running control systems for technical services (lighting, television, etc.), data storage systems of different sizes, a number of switches and routers, a firewall, a monitoring system and Active Directory. The vast majority of the servers run Windows.
Viewing the logs of systems, network equipment and the backup system
Performed daily. Key messages (including reports of success) arrive as e-mail notifications. For example, the absence of a report on successfully created backups already indicates a problem. Logs are reviewed on the principle of centralization, that is, the logs of all systems are collected in a single place.
In my case, this role is performed by the monitoring system, which generates warnings based on the Windows event log and syslog, and can also read text files. When a problem is detected, a task for it is placed in the system administrator's queue with the appropriate priority. If necessary, the administrator can get recommendations for fixing the problem.
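The point that a missing success report is itself a warning sign can be illustrated with a small sketch. It is not the monitoring system described above. The job names, the 26-hour threshold and the function name missing_reports are assumptions made for the example.

```python
from datetime import datetime, timedelta

def missing_reports(expected_jobs, last_success, now, max_age=timedelta(hours=26)):
    """Return the jobs that have no success report newer than max_age.

    expected_jobs: names of the jobs that should report every day (hypothetical)
    last_success:  dict mapping job name -> datetime of its last success report
    """
    missing = []
    for job in expected_jobs:
        seen = last_success.get(job)
        # No report at all, or a report that is too old, both count as a problem.
        if seen is None or now - seen > max_age:
            missing.append(job)
    return missing

now = datetime(2024, 1, 10, 8, 0)
last_success = {
    "fileserver": datetime(2024, 1, 10, 2, 0),
    "mail-db": datetime(2024, 1, 8, 2, 0),
}
print(missing_reports(["fileserver", "mail-db", "vm-host1"], last_success, now))
# prints ['mail-db', 'vm-host1']
```

The check starts from the list of jobs that are expected to report, not from the messages that arrived, so silence is flagged in the same way as an explicit failure.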
Operating systems and software updates
Performed monthly for each server or product. In practice, operating system updates work as follows. A list of servers is drawn up, including the administrators' PCs. An update day is reserved for each server or PC, which works out to 2-3 servers per day. Depending on the server's tasks, the installation and the subsequent restart are done during working or non-working hours (for example, during a break). With this in mind, it is important to choose the right “neighbors” to be updated on the same day.
In addition, servers that perform the same kind of task (for example, the nodes of a fault-tolerant cluster) should be updated more than one week apart. This gives the updates some testing, because in our reality there are not enough technical and human resources to build a test environment. Software is updated as new versions are released, and it is convenient to receive notifications of their availability by mail. When an update appears, a task for it goes into the system administrator's queue.
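The scheduling rules above (a few servers per day, servers with the same role spaced more than a week apart) can be sketched as a simple greedy planner. This is only an illustration under assumptions: days are plain day numbers within the month, weekends and working hours are ignored, and the server names, group labels and the function name plan_updates are invented for the example.

```python
from collections import defaultdict

def plan_updates(servers, per_day=3, group_gap_days=8, horizon=31):
    """Assign each server the earliest day that satisfies both rules.

    servers: list of (name, group) pairs; group is None for standalone servers.
    Servers sharing a group are placed at least group_gap_days apart,
    i.e. more than one week with the default of 8.
    Returns {day_number: [server names]}.
    """
    plan = defaultdict(list)
    group_days = defaultdict(list)
    for name, group in servers:
        for day in range(1, horizon + 1):
            if len(plan[day]) >= per_day:
                continue  # the day is already full
            if group is not None and any(
                abs(day - used) < group_gap_days for used in group_days[group]
            ):
                continue  # too close to another server with the same role
            plan[day].append(name)
            if group is not None:
                group_days[group].append(day)
            break
        else:
            raise ValueError(f"no free slot for {name} within {horizon} days")
    # Drop the empty days that the lookups above created.
    return {day: names for day, names in sorted(plan.items()) if names}

servers = [
    ("sql-node1", "sql"), ("sql-node2", "sql"), ("file1", None),
    ("dc1", "dc"), ("dc2", "dc"), ("print1", None),
]
print(plan_updates(servers, per_day=2))
# prints {1: ['sql-node1', 'file1'], 2: ['dc1', 'print1'], 9: ['sql-node2'], 10: ['dc2']}
```

A greedy pass like this does not pick good “neighbors” by itself. The order of the input list decides who shares a day, so that judgment still belongs to the administrator.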
Analysis of consumed resources (disk space, memory and CPU)
Performed monthly. Despite fairly competent planning (even with a margin), more than once in my practice I have had to deal with excessive resource consumption. Apart from the way particular systems behave, the cause was often the human factor and excessive confidence in the monitoring system, which does not always manage to react to a change in the situation before the server crashes.
That is how a monthly task appeared: reviewing reports on consumed resources, analyzing trends and finding potentially critical points. The reports can come from different places: the monitoring system, analysis tools on the servers or the data storage systems.
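As a rough illustration of what analyzing trends can mean for disk space, the sketch below fits a straight line through a few usage samples and estimates how many days remain until the volume is full. The sample values, the units and the function name days_until_full are assumptions. Real growth is rarely this linear, so the result is a hint, not a forecast.

```python
def days_until_full(samples, capacity):
    """Estimate the days left until usage reaches capacity.

    samples:  list of (day_number, used) pairs, oldest first
    capacity: total size in the same unit as 'used'
    Returns the estimated number of days after the last sample,
    or None if usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    if var_x == 0:
        raise ValueError("samples must cover more than one day")
    # Ordinary least-squares slope: growth per day.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / var_x
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    full_at = (capacity - intercept) / slope
    return max(0.0, full_at - samples[-1][0])

# Weekly readings of used space in GB on a 500 GB volume (made-up numbers).
weekly = [(0, 400), (7, 414), (14, 428), (21, 442)]
print(days_until_full(weekly, capacity=500))
# prints 29.0
```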
Test restores from backups
The idea that having a backup does not mean you can successfully recover data or a service from it is far from new. However, from my experience, only those administrators who have already been burned by this problem actually check their backups. The backup system contains many different kinds of objects: virtual machines, databases, user PCs, mailboxes and databases, file structures, etc. Since each kind of object has its own recovery mechanism, backups have to be tested! Step-by-step instructions make this task much easier.
Once you have written them, you protect yourself against losing information through plain forgetfulness, and you can save your precious working time by handing the backup checks over to an intern. Just keep in mind that in this case you have to think about the security and privacy of the servers and information that will fall into the intern's hands. The task is performed monthly or quarterly, depending on the number of backup objects and the system administrators' workload.
Viewing the hardware status (including the directly visual)
It is not always possible to add a device to the monitoring system. Sometimes, among 20 settings on 10 servers, a couple of settings get lost, and sometimes the system, for whatever reason, did not manage to report in time or could not report at all. In my case, there is an equipment inspection register with a list of what is worth paying attention to.
The specialists who visit the “sanctuary” go through all the points and record the inspection date in the register. It is preferable to carry out this type of maintenance for servers and other equipment at least monthly. Reminders in the mail system can be used to keep the inspections on schedule.
Disabling outdated user accounts, computers and services
Here everything depends on how work processes are organized in the enterprise as a whole. Our company has established processes for hiring and dismissing employees, so by the time an employee leaves the office for good, he no longer has access to corporate resources and data. However, every few months I still check for Active Directory objects that are not locked but whose last domain logon was a long time ago. This keeps the working and test accounts in order, prevents problems with PC naming, and so on. The tools for this check were developed in-house.
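The in-house tools are not described here, so the following is only a sketch of the filtering step. It assumes the account data has already been exported from the directory into a list of records. The field names, the 90-day threshold and the function name stale_accounts are hypothetical.

```python
from datetime import datetime, timedelta

def stale_accounts(accounts, now, max_idle=timedelta(days=90)):
    """Return the names of enabled accounts with no recent logon.

    accounts: iterable of dicts with the keys 'name', 'enabled' and
              'last_logon' (a datetime, or None if the account never logged on).
    """
    stale = []
    for account in accounts:
        if not account["enabled"]:
            continue  # already disabled, nothing to do
        last = account["last_logon"]
        if last is None or now - last > max_idle:
            stale.append(account["name"])
    return sorted(stale)

now = datetime(2024, 6, 1)
exported = [
    {"name": "jsmith", "enabled": True, "last_logon": datetime(2024, 5, 20)},
    {"name": "test-pc-07", "enabled": True, "last_logon": datetime(2023, 11, 2)},
    {"name": "svc-old", "enabled": True, "last_logon": None},
    {"name": "former-emp", "enabled": False, "last_logon": datetime(2022, 1, 1)},
]
print(stale_accounts(exported, now))
# prints ['svc-old', 'test-pc-07']
```

The function only produces a list for review. Whether an object is really abandoned (a seasonal workstation, a rarely used service account) is a decision for the administrator.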
Changing service account passwords. Dealing with weak user passwords
This type of work concerns the security and protection of servers and systems. Its frequency should be set by your company's security policy. In my case, it is done rarely. However, do not neglect this type of work, especially if a large share of your users are careless about passwords or there is high turnover in the IT department. Remember that in order to maintain security, the system administrator must periodically try to hack his own system. It's entertaining!
Network traffic analysis
Another interesting kind of work. It lets you check that the restrictions you configured actually work, detect possible mistakes or even security flaws, and assess the real load on the data channels. Regular checks of this kind give a lot of food for thought. Besides, managers tend to ask quite suddenly for a report on the Internet resources visited by users. Having the right information at the right time often has a positive effect on the IT department's bonuses.
Updating certificates and CRLs
This is the last type of work. It is performed quite rarely (depending on your settings), which is why it often slips out of sight. An expired certificate or an expired certificate revocation list can make services temporarily unavailable or even bring them down, and automatic reissue is, for a number of reasons, not always possible. Unfortunately, the monitoring systems used in my infrastructure do not track this either. The solution is simple: I use reminders.
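As an addition to plain reminders, the expiry dates can also be compared against the calendar by a small script. The sketch below assumes the “not after” dates have already been collected as strings in the format Python's ssl module uses for certificate times. The service names, the 30-day warning window and the function names are invented for the example. A CRL's next-update date could be checked in the same way.

```python
import ssl
from datetime import datetime, timezone

def days_left(not_after, now):
    """Whole days from 'now' until a certificate's notAfter time.

    not_after: string such as 'Mar 15 12:00:00 2025 GMT', the format
               accepted by ssl.cert_time_to_seconds().
    now:       timezone-aware datetime.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days

def expiring(certs, now, warn_days=30):
    """Return (name, days_left) pairs for certificates inside the warning window."""
    soon = []
    for name, not_after in certs.items():
        remaining = days_left(not_after, now)
        if remaining <= warn_days:
            soon.append((name, remaining))
    return sorted(soon, key=lambda item: item[1])

now = datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc)
certs = {
    "mail": "Mar 15 12:00:00 2025 GMT",
    "intranet": "Dec  1 00:00:00 2025 GMT",
}
print(expiring(certs, now))
# prints [('mail', 14)]
```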
In conclusion, I will say that, in my opinion, periodic maintenance is a thankless duty, because its effect becomes visible only when it is not done on time and something breaks. It seems to me that the best solution is to automate these processes as far as possible and to hand the routine control of the systems to a nerdy student administrator, or maybe even to an outsourcer.
After all, managers often pull proactive lead administrators (like me and, hopefully, you) into extremely important, highly urgent and complicated tasks, constantly forgetting that neglecting the daily routine work slowly but surely leads to disaster.