Network Ops Engineer Load Balancers Federal
Listed on 2026-01-09
IT/Tech
Cloud Computing, Systems Engineer, Network Engineer
Job Description
Please Note:
This position will include supporting our US Public Sector customers.
This position requires passing a ServiceNow background screening, USFedPASS (US Federal Personnel Authorization Screening Standards). This includes a credit check, a criminal/misdemeanor check, and a drug test. Employment is contingent upon passing the screening.
Due to Federal requirements, only US citizens, US naturalized citizens, or US Permanent Residents holding a green card will be considered.
The Cloud Network Services (CNS) team at ServiceNow is responsible for ensuring reliable, high-performance application traffic delivery to every ServiceNow customer around the globe. We design and operate our own cloud networking solution, blending industry-standard technologies with the power of the ServiceNow platform to automate and scale host configurations from customer requirements. Our team continuously monitors, tests, and optimizes the platform to deliver new features and maintain top-tier performance.
As we evolve, our focus has expanded to embrace hybrid cloud strategies through deep integrations and partnerships with leading cloud providers, including Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure.
- Operate and maintain ServiceNow’s global cloud network infrastructure, including backbone routing, top-of-rack (TOR) switching, VPN services, and application delivery controller (ADC) systems.
- Troubleshoot and resolve network issues, including urgent operational events.
- Participate in a 24/7 on‑call rotation, including weekends, as part of the Network Operations Engineering team.
- Maintain software‑defined, declarative infrastructure at scale using automation tools such as Ansible and GitLab.
- Perform software upgrades, version control, and security patching across production systems.
- Proactively analyze network metrics such as capacity, latency, and availability to detect and prevent outages.
- Support network operations in private and hybrid multi‑cloud environments (e.g., Azure, AWS, GCP).
- Partner with the Site Reliability Engineering (SRE) team to improve operational processes and reliability.
- Review, consult, and prepare for planned changes and releases to the production environment.
- Create and maintain detailed documentation of infrastructure, automation, and standard operating procedures.
- Provide feedback to infrastructure architects and contribute to design discussions for new initiatives.
- Collaborate with peer teams building world‑class networking and orchestration solutions.
- Evaluate, adopt, and implement new open‑source and commercial tools and technologies.
- Contribute to processes and automation to build a low‑touch, continuous deployment pipeline with near‑zero downtime and high success rates.
- Drive automation to enable rapid deployment and updates across large‑scale environments.
- Experience in leveraging or critically thinking about how to integrate AI into work processes, decision‑making, or problem‑solving. This may include using AI‑powered tools, automating workflows, analyzing AI‑driven insights, or exploring AI's potential impact on the function or industry.
- 4+ years of experience in network operations, infrastructure engineering, or a similar role supporting large‑scale distributed systems.
- Strong hands‑on experience with load balancers (e.g., F5, NGINX), routing/switching (e.g., Juniper, Cisco), and security devices (e.g., Palo Alto, Radware) in production environments.
- Solid understanding of network protocols and services, including TCP/IP, BGP, DNS, TLS/mTLS, and VPNs.
- Experience managing hybrid and public cloud environments (AWS, GCP, Azure) in an operational capacity.
- Proficient in Linux systems administration and troubleshooting.
- Familiarity with container technologies (e.g., Docker, Kubernetes) and service mesh architectures.
- Experience with monitoring, observability, and alerting tools (e.g., Prometheus, Grafana, Splunk).
- Ability to respond to incidents and drive them to resolution, including root cause analysis and post‑mortems.
- Proficiency in infrastructure‑as‑code and automation…