Manager, Network DevOps

Vultr
Full-time
Remote
United States
$140,000 - $150,000 USD yearly
Engineering
  • Own the NetDevOps roadmap — spanning automation, observability, configuration validation, telemetry ingestion, and operational tooling for the global network.
  • Manage and grow a high-performing team of NetDevOps Engineers, providing technical guidance, career development, and hands-on mentorship.
  • Drive automation for complex environments, including EVPN-VXLAN data center fabrics, RoCEv2 lossless Ethernet, and global WAN/edge infrastructure.
  • Build and evolve operator tooling for Network Operations (Tier 1/2), including event correlation, intent validation, playbooks, and automated remediation workflows.
  • Ensure operational excellence across fleet-wide updates, config management, CI/CD pipelines, and reliability metrics for automation systems.
  • Partner closely with Cloud Networking (who own front-end networking, VPC automation, dataplane behavior) to unify automation interfaces and ensure clean separation of responsibilities.
  • Collaborate with Architecture, Platform, and GPU/AI Engineering on next-generation fabric design, automation hooks, observability, and provisioning flows.
  • Standardize telemetry ingestion and correlation pipelines (gNMI, Kafka, Prometheus, custom collectors) to generate actionable, real-time insights into network behavior.
  • Lead complex investigations across routing, switching, RDMA transport behavior, congestion, ECMP, and overlay/underlay interactions, especially where tooling or automation must evolve.
  • Define engineering standards, SLIs/SLOs for automation services, and operational maturity goals (testing, documentation, failure modes).

Qualifications

  • Strong experience building and leading high-performing engineering teams (NetDevOps, SRE, automation, or network engineering groups).
  • Deep understanding of modern data center networking: EVPN-VXLAN, BGP, QoS, telemetry, and config automation.
  • Familiarity with RoCEv2/RDMA fabrics, PFC/ECN tuning, congestion management, or GPU/AI fabric operations.
  • Hands-on experience with automation ecosystems: Ansible, Python, Go, Rust, CI/CD pipelines, config linting, and intent validation frameworks.
  • Experience integrating automation with a Source-of-Truth (NetBox, Nautobot, OpsMill, homegrown systems).
  • Strong understanding of telemetry and monitoring stacks (Prometheus/Grafana, Kafka, OpenTelemetry, custom collectors).
  • Ability to dive deep into Linux networking internals, namespaces, netlink, and distributed systems behavior.
  • Proven experience delivering reliable automation services at scale, with strong fundamentals in testing, versioning, rollback, and change management.