Site Reliability Engineer

Frontdoor
Full-time
Remote
United States
$123,000 - $150,000 USD yearly
Software/IT

Site Reliability Engineers (SREs) are responsible for maintaining the availability and uptime of infrastructure. SREs apply software engineering principles to operational challenges to build reliable infrastructure. This position will reduce toil in our everyday work by automating as much as possible.

Responsibilities:

  • Performs research and implements solutions to build always-up, always-available, resilient services.
  • Builds and maintains automation tooling for infrastructure, CI/CD and observability (monitoring, alerting, logging, tracing) pipelines.
  • Builds and maintains cloud and container orchestration infrastructure.
  • Collaborates with software engineering, security, and systems teams to help automate and streamline operations and processes.
  • Implements DevOps best practices across the organization to improve performance and efficiency.
  • Integrates and automates existing manual solutions and processes.
  • Participates in an on-call rotation for production issue escalations.
  • Troubleshoots and supports production issues.
  • Assists with infrastructure growth and capacity planning.
  • Participates in cross-functional company project teams responsible for implementing technology.
  • Investigates anomalies and outages and determines steps to reproduce, root cause, and solution options.
  • Monitors environment performance and provides the necessary reporting and analysis.
  • Attends relevant conferences and seminars to stay current on new and emerging technology.
  • Works in a self-directed manner and coordinates the work of others, both inside and outside the team.
  • May include other duties as assigned.

Qualifications

Required Skills:

  • Good understanding of Unix/Linux operating systems and their internals
  • Good understanding of core concepts of computer networking (TCP/UDP, IP Routing, DNS)
  • Well-versed with Linux CLI
  • In addition to shell scripting (sh/bash), proficient with one other programming language (Python/Go)
  • Hands-on experience with at least one cloud service provider (GCP, AWS, or Azure)
  • Hands-on experience with at least one configuration management software (Terraform/Ansible/Chef/Puppet)
  • Working knowledge of containers and any one container orchestration platform (Kubernetes/Nomad/Mesos/Swarm)
  • Experience with Palo Alto, F5, cloud firewalls, load balancers and security groups, WAF, Akamai and related products and technologies.
  • Understanding of and experience with at least one CI/CD tool (Jenkins/Travis CI/CircleCI/GitLab, etc.)
  • Working knowledge of at least one distributed version control system (git/bzr/hg)
  • Ability to write clear technical user documentation
  • Exposure to managing Infrastructure as Code with tools like Terraform/CloudFormation or using Cloud Provider SDKs
  • Experience with a CDN (e.g. Akamai)

Preferred Skills:

  • AWS & GCP
  • Terraform
  • Kafka
  • Git
  • GitLab
  • Kubernetes
  • Docker
  • Good working knowledge of Istio service mesh
  • Good working knowledge of Akamai
  • Experience with AWS and GCP VPC configuration, NAT, load balancing, and monitoring
  • Understanding of Kubernetes and networking in a microservice architecture
  • Palo Alto Networks PAN-OS and Panorama devices, physical and virtual
  • Infoblox Grid Manager

Minimum Education, Licensure and Professional Certification requirements:

  • BA/BS required; Computer Science or Computer Engineering preferred

Minimum Experience Required (number of years necessary to perform role):

  • 5+ years of hands-on DevOps experience required.
  • 2+ years managing production infrastructure on any cloud.
  • 2+ years of experience developing code, whether maintaining scripts or applications.