For decades, SysAdmins have worked largely in the shadows to maintain the accessibility and uptime of your most important IT services. And, while the rise of DevOps and cloud computing has led to more people with a hybrid SysAdmin/Developer skillset, the primary duties of a system administrator will always be required. Today’s system administrators are knowledgeable in both hardware and software – configuring resilient, secure architecture to ensure the success of the business.
System administrators are normally tasked with the installation, maintenance, configuration and repair of servers, networks and other computer systems. They work in both hardware and software – learning enough programming and scripting to execute tasks and actions across their applications and infrastructure. In the world of DevOps, software developers are becoming more like SysAdmins and SysAdmins are becoming more like developers – leading to better collaboration and tighter feedback loops across all teams.
Because the system administrator role has changed so much in the last decade, we decided to build the definitive guide to being a system administrator in 2019. First, we’ll cover the basic roles and responsibilities of a SysAdmin before diving into some tips and resources for being highly effective in a SysAdmin role.
As a SysAdmin, you’re essentially maintaining the entire technology and IT stack. In the technology industry, this means you’re maintaining the system that holds up your entire business. Every second that your website or server is down means lost productivity, lost revenue and hefty downtime costs. So, above all, SysAdmins need to be efficient problem solvers. With numerous operating systems, network configurations and security concerns to keep in mind, being an effective system administrator means you can learn new things and maintain strong feedback loops with your development team.
But, to get more granular, let’s go over 12 common SysAdmin job responsibilities so you can better understand the skills and technologies you’ll need to be acquainted with.
Depending on your toolchain and technology stack, the system administrator is in charge of monitoring and alerting across your applications and infrastructure. Monitoring core server and network metrics like CPU, disk usage, DNS, latency and ETL can help SysAdmins detect an incident. Then, they can set up alerts based on monitoring thresholds to receive on-call notifications in case of any major incidents. It’s important that SysAdmins know how to use both external system outputs and metrics to determine the health of their systems – leading to more observable architecture.
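To make the idea of alerting on monitoring thresholds concrete, here is a minimal Python sketch. The metric names, threshold values and the `evaluate` function are assumptions made up for illustration; they don't come from any particular monitoring tool, and real thresholds depend on your own systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    warning: float
    critical: float


# Hypothetical thresholds, for illustration only.
THRESHOLDS = [
    Threshold("cpu_percent", warning=75.0, critical=90.0),
    Threshold("disk_percent", warning=80.0, critical=95.0),
    Threshold("latency_ms", warning=250.0, critical=1000.0),
]


def evaluate(sample):
    """Return (metric, severity, value) for every threshold the sample crosses."""
    alerts = []
    for t in THRESHOLDS:
        value = sample.get(t.metric)
        if value is None:
            continue  # metric was not reported in this sample
        if value >= t.critical:
            alerts.append((t.metric, "critical", value))
        elif value >= t.warning:
            alerts.append((t.metric, "warning", value))
    return alerts


# One made-up reading of server metrics.
sample = {"cpu_percent": 93.5, "disk_percent": 82.0, "latency_ms": 120.0}
alerts = evaluate(sample)
for metric, severity, value in alerts:
    print(f"{severity.upper()}: {metric} = {value}")
```

In practice, the resulting alerts would be routed to whoever is on call rather than printed, and the two severity levels let a team page someone only for critical conditions.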
System administrators are generally in charge of user permissions and administration for all applications and services. SysAdmins can assign user roles and manage the entire organization’s IT stack – allowing everyone the access they need to certain applications and services in a secure way.
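One common way to reason about user roles is a simple role-based lookup. The sketch below is only an illustration: the role names, user names and the `is_allowed` helper are hypothetical, and a real organization would normally rely on its identity provider or directory service rather than hard-coded tables.

```python
# Hypothetical roles and the actions each one grants.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "developer": {"read", "deploy"},
    "admin": {"read", "deploy", "manage_users"},
}

# Hypothetical assignment of users to roles.
USER_ROLES = {
    "alice": {"admin"},
    "bob": {"developer"},
    "carol": {"viewer"},
}


def is_allowed(user, action):
    """Return True if any of the user's roles grants the action."""
    roles = USER_ROLES.get(user, set())  # unknown users get no roles
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)


print(is_allowed("bob", "deploy"))
print(is_allowed("carol", "deploy"))
print(is_allowed("mallory", "read"))
```

Denying by default, as the lookup does for unknown users and unknown roles, keeps access limited to what has been explicitly granted.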
The SysAdmin is tasked with managing passwords and SSO policies and practices across the company. They are able to reset passwords and ensure security requirements are met everywhere. If the company uses SSO and/or two-factor authentication, the system administrator is in charge of managing these tools and helping employees get access to the systems they need when they need them.
To ensure data organization and consistency, the SysAdmin will usually put policies and procedures in place around the way files are organized and shared within the organization. As with most of the other SysAdmin responsibilities, the goal is to protect against external attacks while giving employees appropriate, easy access to files.
At a very general level, the system administrator will need to define best practices for working within the organization’s systems. This includes everything from proprietary software you’re building to different third-party IT applications and services. By showing employees how to use systems in a secure, productive way, SysAdmins are able to completely change the way work is conducted within an organization.
It’s the SysAdmin’s job to put policies and procedures in place to keep up with software installation and updates. If there are any errors with new updates or interdependencies between new versions of systems, the SysAdmin should be able to detect these issues and fix them.
SysAdmins should have active, updated plans for redundancies, rollovers and incident recovery. Through effective monitoring, alerting and cross-functional communication, the system administrator should be able to quickly detect any failures and remediate IT incidents.
Security should be top-of-mind across everything a system administrator works on. Whether it’s user permissions or the way the team maintains documentation, the SysAdmin needs to perform all actions in a secure way. As they set up networks, policies and servers, the SysAdmin should know how to do it in a technically sound, secure way.
SysAdmins are often tasked with maintaining documentation and keeping runbooks up-to-date. In a world of CI/CD, this can be a daunting task. System administrators need to know how they can leverage automation to keep runbooks and documentation accurate and updated without slowing the development lifecycle.
System administrators can’t simply throw their IT and security environment together. They need to build it with visibility and speed in mind. How can you set up a system to allow for rapid incident detection, response and remediation in case an issue does pop up? What kind of monitoring and alerting needs to be in place? What’s the communication strategy if you experience an outage? SysAdmins should be on top of all of these questions in order to make the most of their incident management practices.
Many times, system administrators will be in charge of conducting post-incident reviews for their affected systems. How long did it take to identify the issue? How long did it take to actually remediate the incident? Keeping up with post-incident reviews, collaborating with other affected teams and taking detailed post-incident notes can help improve IT and software developer relationships – leading to better feedback loops and more reliable deployments. Use post-incident reviews as a way to learn from your past mistakes and improve people, processes and technology for the future.
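The two timing questions above can be answered directly from incident timestamps. The following Python sketch assumes you have recorded when an incident started, when it was detected and when it was resolved; the `incident_durations` function and the example timestamps are made up for illustration.

```python
from datetime import datetime


def incident_durations(started, detected, resolved):
    """Return (time_to_detect, time_to_resolve) as timedeltas.

    Both durations are measured from the moment the incident started.
    """
    if not started <= detected <= resolved:
        raise ValueError("timestamps must be in chronological order")
    return detected - started, resolved - started


# Made-up timestamps for a single incident.
started = datetime(2019, 3, 4, 4, 0)
detected = datetime(2019, 3, 4, 4, 12)
resolved = datetime(2019, 3, 4, 5, 30)

ttd, ttr = incident_durations(started, detected, resolved)
print(f"time to detect: {ttd}")
print(f"time to resolve: {ttr}")
```

Tracking these two numbers across incidents gives a post-incident review something concrete to compare over time.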
At the core, a good system administrator will be an excellent problem solver who can find ways to prepare for unknowns. In the world of CI/CD and DevOps, teams are deploying more complex architecture faster – making a SysAdmin’s job more complicated than ever. So, finding ways to reduce bottlenecks in the deployment lifecycle while simultaneously reducing risks in your IT and security infrastructure will always make your life as a SysAdmin easier.
In order to be effective in the modern era, system administrators need to know more about programming, automation and cloud computing. SysAdmins aren’t simply rebooting servers and decommissioning old equipment – they maintain the reliability and uptime for all of your software and hardware. So, we wanted to cover a few of the more modern skills and technologies that system administrators should be familiar with:
Being comfortable with tools like Puppet, Chef, Ansible and Jenkins is paramount to SysAdmin success. These tools allow system administrators to automate a number of tasks and configurations along the release lifecycle – leading to fewer errors and faster deployments. So, developers can spend more time building new applications and services instead of reworking projects in the current pipeline or fixing support escalations.
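Configuration management tools such as Puppet, Chef and Ansible are generally built around describing a desired state and applying it repeatedly without side effects. The Python sketch below illustrates that idea only; it does not use any of those tools' APIs, and the `ensure_line` helper, file name and setting are hypothetical.

```python
import tempfile
from pathlib import Path


def ensure_line(path, line):
    """Make sure `line` is in the file; return True only if the file changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state, nothing to do
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "example.conf"  # hypothetical config file
    first = ensure_line(config, "max_connections = 100")
    second = ensure_line(config, "max_connections = 100")
    print(first, second)
```

Because the second call finds the line already present and changes nothing, the same task can be run on every deployment without the configuration drifting.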
Because of the rise of AWS, Azure and GCP, system administrators everywhere need to understand how to orchestrate systems in the cloud. What types of monitoring and alerting tools should you use? What’s the best way to manage your servers and networks now that your infrastructure is cloud-based? SysAdmins work on questions like these all the time, building redundancies and security into the entire system. As nearly every application and service moves to the cloud, this is becoming one of the most important skills for SysAdmins everywhere.
Git is the most commonly used version control system. Version control is a way to track changes in code and different versions of an application or service. This way, if there’s ever an issue with the current version of a service, SysAdmins can easily roll back deployments or updates to fix the problem. Version control is essential to maintaining a reliable CI/CD pipeline and providing visibility into projects across all of engineering and IT. SysAdmins need to understand version control so they can quickly see what developers are doing, identify issues and fix them – often before they ever affect customers.
As mentioned above, SysAdmins need to understand the ins and outs of server and network upkeep. These servers and networks are the pillars holding up your entire business and providing value to customers. So, system administrators need to be continuously improving on processes in order to maintain more reliable systems, avoid outages as much as possible and improve incident response when an incident does strike.
More and more, SysAdmins are writing scripts and programming to achieve their desired results. This need for system administrators who frequently write code has given way to a newer movement in site reliability engineering (SRE). Traditionally, SysAdmins have been highly reactive toward incidents in production caused by code that was passed to them by developers. But, as SysAdmins and SRE teams start to write code more often and collaborate with developers earlier in the deployment lifecycle, they’re able to proactively identify problems and fix them more often. SysAdmins who are effective at writing scripts and programming are highly coveted in today’s market because they can actively help improve system reliability and drive business value.
System administrators rarely get the glory they deserve. They frequently respond to on-call incidents at 4 AM and fix incidents that could potentially lead to millions of dollars in lost revenue and negative customer experiences. Within any good IT and engineering team, there’s a constant balance between speed and reliability. While developers are often pushing the boundaries on speed, SysAdmins are doing the hard job of slowing them down before they go too far – ensuring greater reliability and security across all of your applications and services.
VictorOps is purpose-built on-call scheduling, incident response and alerting software for all kinds of IT and engineering teams. See for yourself in a 14-day free trial or request a free personalized demo to see how VictorOps can make on-call suck less for anyone on-call – from system administrators to DevOps engineers.