As a Site Reliability Engineer within the APX SRE organization, you’ll focus on delivering practical, scalable solutions to support the reliability and performance of our mission-critical, cloud-native global Kubernetes platform and the services that run on it. You care deeply about system stability, clear documentation, and creating tools that improve the developer experience.
Location: This role is based out of our Boston, MA office and follows a hybrid schedule. We rely on in-person collaboration and ask that team members work onsite Tuesdays through Fridays, with the flexibility to work remotely on Mondays, unless there is an approved workplace accommodation. We believe that connection fuels innovation, and our in-office culture is designed to foster meaningful teamwork, mentorship, and shared success.
What You’ll Do
As an SRE, you’ll play a critical role in building the infrastructure and tools that power reliable, scalable, and secure engineering operations across the company. You will:
- Build robust, easy-to-use Kubernetes platforms and tools that enable engineering teams to provision and operate services rapidly, consistently, and securely.
- Exemplify cloud-native site reliability best practices.
- Write code that is performant, maintainable, clear, and concise.
- Apply strong problem-solving skills to debug issues in cloud-native distributed systems.
- Influence and educate the engineering organization to adopt new and improved architectural patterns.
- Provide robust documentation for use by engineers to promote self-service.
- Continually improve the reliability, operability, and cost efficiency of our Kubernetes platform.
- Take calculated risks, champion new ideas, and cultivate your craft.
What You Bring
Basic Qualifications
- Some applicable experience and/or coursework in platform engineering and container orchestration.
- Experience with cloud providers such as Azure and AWS.
- A genuine interest in building, operating, and innovating clustering solutions for Kubernetes platforms like AKS, EKS, or similar in production at scale.
- Experience with programming languages such as Python, Go, C#, Java, or similar.
- Experience with code collaboration tools such as GitHub, ArgoCD, or similar.
- Experience using observability tools such as APM, logging, and metrics to assist with debugging issues.
- Familiarity with infrastructure-as-code tools such as Terraform.
- Empathy to support the needs of software engineers.