With System Administrator Appreciation Day coming soon, my heart is broken because people say things like:
“SysAdmin Day is so cool, but we don’t have a SysAdmin, right?”
“We don’t have problems with VoIP/servers/computer stuff, these things never fail.”
“Don’t go to the site!!! Something lives there and it is deadly (and/or too sexy).”
People make these comments, and others like them, when you are a good SysAdmin or, in my case, when people can’t see me because I’m a ninja. So here are some bits of system administration:
Performing Backups:
Performing backups is perhaps the most important job of the SysAdmin. Backups are boring and time consuming but absolutely necessary, so we have a lot of tools, like:
- rsync, git-annex or the powerful taskd for file synchronization.
- duplicity, btar, dar and the legendary dump for incremental backups (chunk and file incremental).
- Bacula, BackupPC and burp for network, distributed and hardcore backups (I love Bacula).
Although these tools can automate the backup process, it is still the sysadmin’s job to make sure that backups are executed correctly and on schedule.
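As a minimal sketch of that last point, the Python snippet below checks whether the newest file in a backup directory is recent enough. The function names and the 26-hour threshold are assumptions made for illustration; they are not part of any of the tools listed above.

```python
import time
from pathlib import Path


def newest_backup_age_hours(backup_dir, now=None):
    """Return the age in hours of the newest file in backup_dir, or None if it holds no files."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).iterdir() if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600.0


def backup_is_fresh(backup_dir, max_age_hours=26):
    """True when at least one file exists and the newest one is within the age limit."""
    age = newest_backup_age_hours(backup_dir)
    return age is not None and age <= max_age_hours
```

Called from a scheduled job against your own backup directory, a False result is a reason to go look at the backup job. It only shows that a file was written recently; a real restore test is still the only proof that a backup works.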
Maintaining systems documentation:
A system is commonly changed to fit the organization’s needs, so our job is to track and document the changes from the plain vanilla version of the system. Backing up and guarding this documentation is also our job, and keeping it up to date can make the difference at a critical time (for example, when we need to request support). This is a small (and incomplete) list of important documentation:
- Hardware documentation (warranty, support phones, owner’s manual, physical location, etc)
- Software documentation (warranty, support phones, owner’s manual, local change logs, etc)
- Network (equipment, scripts, cabling, configurations, etc)
- Record of backup status (files, software, databases, etc)
- Local procedures and policies.
Installing and upgrading software
This is one of those things for which we need to keep a cool head: every piece of software, every patch, every update should be staged for testing before being deployed.
We often receive news about vulnerabilities, critical upgrades or simple bugfixes for our software versions, but never upgrade or patch software on a production system before testing it, no matter how critical the upgrade or patch is. A vulnerable system that is running has more value for the organization than one that isn’t working; it is a calculated risk. (Exception: the system is already down and the upgrade solves the problem.)
As patches and security updates are released they must be incorporated smoothly in the production systems.
When applying upgrades or patches, always:
- Work outside production hours.
- Have a rollback plan in case of failure.
- Document the day, hour, person in charge, and reason for the upgrade.
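The documentation step is easy to script. Here is a small Python sketch that appends one line per upgrade to a change log; the tab-separated format and the function names are my own assumptions, so adapt them to whatever your local procedures ask for.

```python
import getpass
from datetime import datetime


def format_change_entry(reason, person=None, when=None):
    """Build one tab-separated line: date and hour, person in charge, reason."""
    when = when or datetime.now()
    person = person or getpass.getuser()  # default: the user running the script
    return f"{when:%Y-%m-%d %H:%M}\t{person}\t{reason}"


def record_change(log_path, reason, person=None, when=None):
    """Append the entry to the change log file and return it."""
    entry = format_change_entry(reason, person, when)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(entry + "\n")
    return entry
```

A plain text file like this is easy to keep under version control and to back up together with the rest of the system documentation.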
Account provisioning
The process of adding and removing users can be automated, but certain administrative decisions must still be made before a user can be added or removed, for example:
- Follow the principle of least privilege when adding new accounts.
- Backup files from users before deleting their accounts.
- Disable a user’s account while they are on vacation.
- Be patient with the “forgotten passwords”.
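For the "backup before deleting" item, a sketch like the following could pack a user's home directory into a compressed archive before the account is removed. It uses only Python's standard tarfile module; the function name and the archive naming scheme are assumptions for illustration.

```python
import tarfile
from pathlib import Path


def archive_home(home_dir, dest_dir):
    """Pack home_dir into dest_dir/<dirname>.tar.gz and return the archive path."""
    home = Path(home_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"{home.name}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps the stored paths relative, e.g. "alice/notes.txt"
        tar.add(str(home), arcname=home.name)
    return archive
```

Keep the destination outside the home directory being archived, and delete the account only after you have checked that the archive can be listed and read.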
Adding and removing hardware
From the simple task of adding a printer to the complex job of adding a disk array, hardware support is a very important activity of the system administrator. As with software updates, we always need to be very careful and have a rollback plan. It is a vast and complex topic, so I’m only going to give you one piece of advice about it:
Always be aware of the end-of-life of your hardware (the time when the provider stops maintaining and producing parts for it), and before that date plan to get some spare parts or (better) to upgrade the hardware.
If you did not prepare, praying is a good option.
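To make the end-of-life advice concrete, here is a small Python sketch that flags hardware whose end-of-life date is near or already past. The inventory names and dates are invented for the example, and the 180-day default warning window is an assumption, not a recommendation.

```python
from datetime import date, timedelta


def approaching_end_of_life(inventory, today=None, warn_days=180):
    """Return names of hardware whose end-of-life date falls within warn_days or has passed."""
    today = today or date.today()
    limit = today + timedelta(days=warn_days)
    return sorted(name for name, eol in inventory.items() if eol <= limit)


# Hypothetical inventory: names and dates are made up for illustration.
inventory = {
    "disk-array-01": date(2016, 3, 1),
    "core-switch": date(2019, 12, 31),
    "old-printer": date(2014, 6, 30),
}
print(approaching_end_of_life(inventory, today=date(2015, 7, 31), warn_days=365))
```

With these made-up dates the disk array and the printer are reported and the switch is not. In real life the dates would come from your hardware documentation.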
Monitoring the system
I have two favorite protocols: the “Internet Protocol Control Protocol” and the “Simple Network Management Protocol”. The first because every time I say it people laugh, the second because it makes my life easier. From simple Cacti to monsters like Nagios and OpenNMS, we can track many indicators, such as:
- Network traffic.
- CPU load.
- Disk usage.
- Many others.
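Even without a full monitoring suite, two of these indicators can be read with Python's standard library alone. This is a rough sketch, not a replacement for SNMP-based tools; the 90% disk limit and the "load above CPU count" rule are assumptions to tune for your own systems.

```python
import os
import shutil


def disk_usage_percent(path="/"):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def check_host(path="/", disk_limit=90.0, load_limit=None):
    """Return a list of human-readable warnings; an empty list means nothing tripped."""
    warnings = []
    pct = disk_usage_percent(path)
    if pct >= disk_limit:
        warnings.append(f"disk {path} at {pct:.1f}%")
    try:
        load1 = os.getloadavg()[0]  # 1-minute load average (Unix only)
    except (AttributeError, OSError):
        load1 = None  # load average not available on this platform
    limit = load_limit if load_limit is not None else float(os.cpu_count() or 1)
    if load1 is not None and load1 >= limit:
        warnings.append(f"1-minute load {load1:.2f} >= {limit:.2f}")
    return warnings
```

Run periodically, anything returned by `check_host()` can be mailed or logged; the bigger tools add history, graphs and alert routing on top of the same idea.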
Now more than ever we have an arsenal of tools to check our system status. An example that I love is systemd, because:
- It is the default init system for the major GNU/Linux distributions.
- It can show the general status of the system’s services:
systemctl status
or systemctl --failed
- It can create containers with systemd-nspawn.
- It can schedule jobs with systemd timers (a cron alternative).
For a really good explanation of why to use systemd, check the blog of Lennart Poettering (one of the creators of systemd).
Another very good tool is the dmidecode command, which gets information about the hardware. It is useful if your servers are in the kingdom of Far Far Away and you can’t physically check hardware alarms (you know, the scariest blinking LEDs).
Vigilantly monitoring security
In these dark times always do routine checkups:
- Check password strength with John the Ripper
- Check open ports with nmap
- Review any changes to config files (if you create a git repository in the /etc directory, you can check changes and back them up at the same time).
- Implement an IDS or IPS (cough, cough: Nagios is open source, great idea).
- Subscribe to SysAdmin and security newsletters like SANS, LWN, SecurityFocus, etc.
- Check the industry best practices and adopt those that meet the requirements of your organization.
Security is a complex and holistic thing, but every SysAdmin must know the basics.
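To illustrate the config-review checkup, here is a Python sketch that compares two snapshots of a configuration directory by file hash. It is a stand-in technique, not the git approach mentioned in the list (git also gives you history and diffs), and the function names are my own.

```python
import hashlib
from pathlib import Path


def snapshot(config_dir):
    """Map each file's path (relative to config_dir) to the SHA-256 of its contents."""
    root = Path(config_dir)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(old, new):
    """Return (added, removed, changed) lists of relative paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return added, removed, changed
```

Take one snapshot after a known-good change and compare it against a later one; any path reported that nobody remembers touching deserves a closer look. On a real /etc the script needs permission to read every file, otherwise `read_bytes` will raise an error.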
Four more general tips
- I talk a lot about planning. Besides the IT certifications, if you have knowledge of project management you will not only boost your career, you will also be a better SysAdmin. Consider a PMI certification.
- Adopt and contribute to an open source project. If you use an open source tool daily you can suggest and make improvements; remember that nobody knows a piece of software better than its developers (suggestions: systemd, drill, oVirt, Docker, etc.).
- Master containers (Docker, systemd-nspawn); in a few years this knowledge will be indispensable.
- Embrace DevOps. Today’s organizations are seeking talent who can design, develop and deploy software for production environments. We (the SysAdmins) are the best option: how many Python/Perl/Bash scripts have we written and tested in our daily job? We are good programmers who also know how to deploy this software and keep it up and running.
So…
Happy System Administrator Appreciation Day!!!
Cheers!