If you came here for a guide on how to organize periodic work, this material is somewhat different from what you were looking for. I will describe the realities I happened to encounter and the experience I gained while trying to systematize and automate all of it.
First of all, here is a list of the tasks that end up being performed on a schedule:
- View logs of systems, network equipment, backup system.
- Update of the operating systems and software.
- Perform analysis of consumed resources (disk space, memory and CPU)
- Test deployment of backups.
- View the status of the hardware (including the visual status).
- Change of passwords for service accounts. Work with weak user passwords.
- Disabling obsolete user, computer, and service accounts.
- Perform network traffic analysis.
- Updating certificates and Certificate Revocation Lists.
Before discussing each type of work in more detail, I need to briefly describe the environment in which all of this runs. In fact, there were several environments, so I will mention only the features that are useful for understanding today’s material.
So, my infrastructure has up to 10 physical servers, a virtualization system (up to 50 virtual servers), PCs running control systems for technical services (lighting, television, etc.), storage systems of various sizes, a number of switches and routers, a firewall, a monitoring system, and Active Directory. The vast majority of the servers run Windows.
Viewing logs of systems, network equipment, backup system
This is done daily. Key messages (including reports of success) are sent as email notifications; the absence of a report about successfully created backups, for example, is itself a sign of a problem. Logs are viewed on the principle of centralization: all systems are configured to send their logs to a single place. In my case this role is played by the monitoring system, which raises warnings based on, among other sources, the Windows event log and syslog, and can also read text files. When a problem is detected, it is placed with the appropriate priority in the task queue of the system administrator responsible for that area. If necessary, recommendations on how to fix the problem are attached.
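The point that a missing success report is itself an alarm can be sketched in a few lines of Python. This is only an illustration under assumptions: the job names, the 24-hour window, and the `missing_reports` function are hypothetical, and the `received` dictionary stands in for whatever your mail system or monitoring system actually collects.

```python
from datetime import datetime, timedelta


def missing_reports(expected_jobs, received, now, max_age=timedelta(hours=24)):
    """Return the expected jobs that have no success report newer than max_age.

    expected_jobs: list of job names that should report every day (hypothetical)
    received:      dict mapping job name -> time of its last success report
    """
    missing = []
    for job in expected_jobs:
        last_ok = received.get(job)  # None if the job has never reported
        if last_ok is None or now - last_ok > max_age:
            missing.append(job)
    return missing


# Hypothetical data: one fresh report, one stale report, one job that is silent.
now = datetime(2024, 1, 10, 8, 0)
received = {
    "sql-backup": datetime(2024, 1, 10, 2, 15),
    "file-backup": datetime(2024, 1, 8, 2, 30),
}
print(missing_reports(["sql-backup", "file-backup", "mail-backup"], received, now))
```

With this sample data the stale job and the silent job are both reported, which is exactly the case that a "notify on failure only" setup would miss.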
Update of the operating systems and software
This is done monthly for each server or product. In practice, operating system updates work as follows. A list of servers is drawn up, with the control PCs included. Each server or PC is assigned an update day, so 2-3 servers are updated per day. Depending on the server’s role, the installation and subsequent reboot take place either during working hours or outside them (for example, during a break). With that in mind, the “neighbors” that are updated on the same day need to be chosen carefully.
Also, servers that perform the same task (for example, nodes of a failover cluster) are best updated more than one week apart. This gives the updates some degree of testing, since in our circumstances there are not enough technical or human resources to build a test environment. Application software is updated as new versions are released; it is convenient to receive release notifications by email. When an update appears, a task is added to the queue of the system administrator responsible for that area.
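The two scheduling rules above (no more than a few servers per day, and more than a week between servers with the same role) can be sketched as a simple greedy planner. This is a rough illustration, not the procedure I actually used: the server names, the `plan_updates` function, and the use of plain day numbers instead of calendar dates are all assumptions, and the choice of working hours versus after hours is left out.

```python
from collections import defaultdict


def plan_updates(servers, per_day=3, min_gap_days=8):
    """Greedily assign each (name, group) pair to the earliest allowed day.

    Servers sharing a group (for example, nodes of one cluster) are placed
    at least min_gap_days apart, i.e. more than one week with the default.
    Servers without a peer use group=None.
    """
    load = defaultdict(int)         # day number -> servers already planned
    group_days = defaultdict(list)  # group -> days already taken by its members
    plan = {}
    for name, group in servers:
        day = 1
        while load[day] >= per_day or (
            group is not None
            and any(abs(day - d) < min_gap_days for d in group_days[group])
        ):
            day += 1
        plan[name] = day
        load[day] += 1
        if group is not None:
            group_days[group].append(day)
    return plan


# Hypothetical inventory: two cluster pairs and two standalone machines.
servers = [
    ("fs1", None),
    ("sql-a", "sql"), ("sql-b", "sql"),
    ("dc1", "dc"), ("dc2", "dc"),
    ("web1", None),
]
for name, day in plan_updates(servers).items():
    print(f"day {day:2d}: {name}")
```

A greedy pass like this is enough for a few dozen machines; it does not try to balance the load evenly across the month.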
Analysis of consumed resources (disk space, memory and CPU)
This is done monthly. Despite competent design (even with a reserve), more than once in my practice I have had to deal with problems of excessive resource consumption. Apart from the peculiarities of the systems themselves, the cause was often the human factor and excessive trust in the monitoring system, which does not always manage to react to a changing situation before the server shuts down in an emergency. That is how a dedicated monthly task came about. It centers on reviewing reports on consumed resources, analyzing trends, and finding potentially critical points. The reports can come from different sources: the monitoring system, or analysis tools on the servers or storage systems.
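As an illustration of what "analyzing trends" can mean in the simplest case, here is a sketch that fits a straight line through disk usage samples and estimates how many days remain until the volume is full. The sample values, the `days_until_full` function, and the assumption of linear growth are mine; real consumption is often bursty, so treat the result as a hint rather than a forecast.

```python
def days_until_full(samples, capacity_gb):
    """Estimate days left until capacity is reached from (day, used_gb) samples.

    Fits a least-squares line through the samples. Returns None when usage
    is flat or shrinking, because no exhaustion date can be projected then.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx  # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    last_day = max(x for x, _ in samples)
    full_day = (capacity_gb - intercept) / slope
    return max(0.0, full_day - last_day)


# Hypothetical monthly readings for one volume: (day number, used GB).
history = [(0, 400), (30, 430), (60, 460), (90, 490)]
print(days_until_full(history, capacity_gb=550))
```

For the sample readings the growth is 1 GB per day, so the 550 GB volume has about 60 days left after the last reading.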
Test deployment of backups
The idea that having a backup does not mean you can successfully restore data or a service from it is far from new. However, in my experience, only administrators who have already met this problem face to face actually verify their backups. The backup system holds many different kinds of objects: virtual machines, databases, user PCs, mailboxes and databases, file structures, etc.
Since each kind of object has its own recovery mechanism, you have to test the backup copies! Step-by-step instructions make this much easier. Once you have written such instructions, you protect yourself against that classic cause of information loss, forgetfulness, and you can save precious working time by delegating the check of the backups to a trainee. Just keep in mind the security and confidentiality of the servers and information that will end up in your trainee’s hands. The task is performed monthly or quarterly, depending on the number of backup objects and the availability of system administrators.
View the status of the hardware (including the visual status)
It is not always possible to add a device to the monitoring system. Sometimes, among 20 settings on 10 servers, a couple get missed. And sometimes the system, for whatever reason, did not manage to report in time or could not report at all. In my case there is an equipment inspection log with a checklist of what deserves attention. Specialists visiting the holy of holies go through every item and record the inspection date in the log. It is advisable to inspect servers and other equipment this way at least monthly. To keep to the schedule, I use reminders in the mail system.
Disabling obsolete user, computer, and service accounts
Here everything depends on how work processes are organized in the enterprise as a whole. Our company has defined processes for hiring and dismissing employees, so by the time an employee leaves the office for good, they no longer have access to corporate resources and data. Nevertheless, every few months I still check for enabled Active Directory objects that have not logged on to the domain for a fairly long time. This keeps working and test accounts in order, prevents problems with PC naming, and so on. The checking tools are ones I built myself.
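The core of such a home-made check is a simple filter. The sketch below assumes the account list has already been exported from the directory into plain records; the `Account` class, the 90-day threshold, and the sample names are hypothetical, and how you obtain the last logon time is left to your own tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Account:
    name: str
    enabled: bool
    last_logon: Optional[datetime]  # None means the account never logged on


def stale_accounts(accounts, now, max_idle=timedelta(days=90)):
    """Return names of enabled accounts that have been idle longer than max_idle.

    Accounts that never logged on are reported too, so freshly created
    accounts may show up and need a manual look.
    """
    return [
        a.name
        for a in accounts
        if a.enabled and (a.last_logon is None or now - a.last_logon > max_idle)
    ]


# Hypothetical export: users, a computer account, and a test account.
now = datetime(2024, 6, 1)
accounts = [
    Account("j.smith", True, datetime(2024, 5, 30)),
    Account("PC-OLD-017$", True, datetime(2023, 11, 2)),
    Account("test-import", True, None),
    Account("a.former", False, datetime(2023, 1, 15)),
]
print(stale_accounts(accounts, now))
```

Already disabled accounts are skipped on purpose, since the goal is to find objects that are still enabled but no longer used.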
Change of passwords for service accounts. Work with weak user passwords
This type of work concerns the security and protection of servers and systems. How often it is done should be set by your company’s security policy. In my case it is done rarely. Still, do not neglect it, especially if a large share of your users are lazy about computer matters or if the IT department has high staff turnover. Remember that, to maintain security, a system administrator must periodically try to break into their own systems. It’s entertaining!
Network traffic analysis
This is another entertaining kind of work. It lets you verify that the restrictions you have set are correct, detect possible flaws or even security holes, and assess the real load on the data links. Regular checks of this kind give plenty of food for thought. In addition, management tends to ask quite suddenly for a report on the Internet resources users visit. Having the right information at the right time often has a positive effect on the IT department’s bonuses.
Updating Certificates and CRLs
This is the last kind of work. It is performed quite rarely (how rarely depends on your settings), and for that reason it often drops out of sight. An expired certificate or certificate revocation list can make services temporarily unavailable or even stop them altogether, and automatic reissue is, for several reasons, not always possible. Unfortunately, the monitoring systems used in my infrastructure do not cover this aspect either. The solution is banal: I use reminders.
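If the expiry dates are kept in a simple inventory, the reminders can be generated rather than set by hand. The following is a minimal sketch under assumptions: the inventory entries, the 30-day warning window, and the `expiring_soon` function are hypothetical, and the dates are entered manually rather than read from the certificates themselves.

```python
from datetime import datetime, timedelta


def expiring_soon(items, now, warn=timedelta(days=30)):
    """Return (name, days_left) pairs for items expiring within warn, soonest first.

    items maps a certificate or CRL name to its expiry time. Items that
    have already expired are included with a negative days_left.
    """
    due = [
        (name, (not_after - now).days)
        for name, not_after in items.items()
        if not_after - now <= warn
    ]
    return sorted(due, key=lambda pair: pair[1])


# Hypothetical inventory of certificates and one CRL.
now = datetime(2024, 3, 1)
inventory = {
    "mail.example.internal": datetime(2024, 3, 15),
    "root-ca CRL": datetime(2024, 2, 28),
    "vpn.example.internal": datetime(2025, 1, 1),
}
for name, days_left in expiring_soon(inventory, now):
    print(f"{name}: {days_left} days left")
```

Listing already expired items first keeps the most urgent work at the top of the output.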
In conclusion, I would say that periodic work is, in my opinion, a thankless task, because its value becomes visible only when it is not done on time and something inevitably breaks as a result. The best solution, to my mind, is maximum automation of these processes plus a meticulous trainee admin, or perhaps even an outsourcer keeping an eye on the systems. After all, management tends to pull proactive senior administrators (such as me and, I hope, you) onto super-important, epoch-making, and super-tangled tasks, and constantly forgets that neglecting daily routine work leads slowly but surely to disaster.