Own the NetDevOps roadmap, spanning automation, observability, configuration validation, telemetry ingestion, and operational tooling for the global network.
Manage and grow a high-performing team of NetDevOps Engineers, providing technical guidance, career development, and hands-on mentorship.
Drive automation for complex environments, including EVPN-VXLAN data center fabrics, RoCEv2 lossless Ethernet, and global WAN/edge infrastructure.
Build and evolve operator tooling for Network Operations (Tier 1/2), including event correlation, intent validation, playbooks, and automated remediation workflows.
Ensure operational excellence across fleet-wide updates, config management, CI/CD pipelines, and reliability metrics for automation systems.
Partner closely with Cloud Networking (which owns front-end networking, VPC automation, and dataplane behavior) to unify automation interfaces and ensure a clean separation of responsibilities.
Collaborate with Architecture, Platform, and GPU/AI Engineering on next-generation fabric design, automation hooks, observability, and provisioning flows.
Standardize telemetry ingestion and correlation pipelines (gNMI, Kafka, Prometheus, custom collectors) to generate actionable, real-time insights into network behavior.
Lead complex investigations across routing, switching, RDMA transport behavior, congestion, ECMP, and overlay/underlay interactions, especially where tooling or automation must evolve.
Define engineering standards, SLIs/SLOs for automation services, and operational maturity goals (testing, documentation, failure modes).
Qualifications
Strong experience building and leading high-performing engineering teams (NetDevOps, SRE, automation, or network engineering groups).
Deep understanding of modern data center networking: EVPN-VXLAN, BGP, QoS, telemetry, and config automation.
Familiarity with RoCEv2/RDMA fabrics, PFC/ECN tuning, congestion management, or GPU/AI fabric operations.
Hands-on experience with automation ecosystems: Ansible, Python, Go, Rust, CI/CD pipelines, config linting, and intent validation frameworks.
Experience integrating automation with a source of truth (NetBox, Nautobot, OpsMill, or homegrown systems).
Strong understanding of telemetry and monitoring stacks (Prometheus/Grafana, Kafka, OpenTelemetry, custom collectors).
Ability to dive deep into Linux networking internals (namespaces, netlink) and distributed systems behavior.
Proven experience delivering reliable automation services at scale, with strong fundamentals in testing, versioning, rollback, and change management.