Site Reliability Engineer

Frontdoor

Full-time

Remote

United States

$123,000 - $150,000 USD yearly

Software & Technology

Site Reliability Engineers (SREs) are responsible for maintaining the availability and uptime of infrastructure. SREs use software engineering principles to solve operational challenges and create reliable infrastructure. This position will reduce toil in our everyday work by automating as much as possible.

Responsibilities:

Researches and implements solutions to build always-up, always-available, resilient services.
Builds and maintains automation tooling for infrastructure, CI/CD and observability (monitoring, alerting, logging, tracing) pipelines.
Builds and maintains cloud and container orchestration infrastructure.
Collaborates with software engineering, security, and systems teams to help automate and streamline operations and processes.
Implements DevOps best practices across the organization to improve performance and efficiency.
Integrates and automates existing manual solutions and processes.
Participates in an on-call rotation for production issue escalations.
Troubleshoots and supports production issues.
Assists with planning for infrastructure growth and capacity.
Participates in cross-functional company project teams responsible for implementing technology.
Investigates anomalies/outages and determines steps to reproduce, root cause, and solution options.
Monitors environment performance and provides all necessary reporting and analysis.
Assists with the integration and automation of existing manual solutions and processes.
Attends relevant conferences/seminars to remain current on new and upcoming technology.
Self-directed with the ability to coordinate the work of others, both inside and external to the team.
May include other duties as assigned.

Qualifications

Required Skills:

Good understanding of Unix/Linux operating systems and their internals
Good understanding of core concepts of computer networking (TCP/UDP, IP Routing, DNS)
Well-versed in the Linux CLI
In addition to shell scripting (sh/bash), proficient with one other programming language (Python/Go)
Hands-on experience with cloud service providers (at least one of GCP, AWS, or Azure)
Hands-on experience with at least one configuration management tool (Terraform/Ansible/Chef/Puppet)
Working knowledge of containers and any one container orchestration platform (Kubernetes/Nomad/Mesos/Swarm)
Experience with Palo Alto, F5, cloud firewalls, load balancers and security groups, WAF, Akamai and related products and technologies.
Understanding of and experience with at least one CI/CD pipeline (Jenkins/Travis/CircleCI/GitLab, etc.)
Working knowledge of any one distributed version control system (git/bzr/hg)
Ability to write good technical user documents
Exposure to managing Infrastructure as Code with tools like Terraform/CloudFormation or using Cloud Provider SDKs
Experience with a CDN (e.g. Akamai)

Preferred Skills:

AWS & GCP
Terraform
Kafka
Git
GitLab
Kubernetes
Docker
Good working knowledge of Istio service mesh
Good working knowledge of Akamai
Experience working with AWS & GCP for VPC configuration, NAT, load balancing, and monitoring
Understanding of Kubernetes and networking in a microservice architecture
Palo Alto Networks, PAN-OS and Panorama devices, physical and virtual
Infoblox Grid Manager

Minimum Education, Licensure and Professional Certification requirements:

BA/BS required; Computer Science or Computer Engineering preferred

Minimum Experience required (number of years necessary to perform role):

5+ years of hands-on DevOps experience required.
2+ years of managing production infrastructure on any cloud.
2+ years of experience developing code, whether maintaining scripts or applications.
