DNS Engineer - SRE
Listed on 2026-05-30
IT/Tech
Systems Engineer, Cloud Computing, Network Security
Job Summary
The DNS Engineer – SRE is a high‑impact role responsible for the architecture, scalability, and reliability of the mission‑critical DNS infrastructure powering our ISP and core network services. This position is designed for an engineer who views infrastructure through the lens of Site Reliability Engineering (SRE), prioritizing automation, observability, and self‑healing systems over manual intervention. You will combine deep IP networking and DNS expertise with modern security protocols to keep our platforms resilient against evolving threats and performing at the highest level for millions of users.
The role serves as a technical authority, leading cross‑functional initiatives with Product, Security, and Service Assurance teams to deliver a carrier‑grade DNS ecosystem that balances cutting‑edge privacy standards with the uncompromising availability required by Tier‑1 network operations.
Responsibilities
Core Platform Strategy & Leadership
- Architectural Ownership: Lead the design and evolution of global DNS architectures, ensuring high availability through Anycast routing, multi‑provider redundancy, and automated failover mechanisms.
- Strategic Vendor Relations: Act as the primary technical authority in engagements with DNS and infrastructure vendors, driving roadmaps that align with our long‑term reliability and security goals.
- Lifecycle & Capacity Management: Oversee the full lifecycle of DNS platforms, including automated software deployments, hardware refreshes, and proactive capacity planning, to stay ahead of traffic growth.
- Standardization & Policy: Define, enforce, and refine organization‑wide standards for DNS record management, security protocols (DNSSEC), and traffic steering policies to minimize user latency.
- Reliability Engineering: Convert strategic design into operational reality by defining Service Level Objectives (SLOs) and Error Budgets for all core name services.
Cross‑Domain DNS Operations & SRE
- Protocol Management: Manage the nuances of UDP/TCP port 53, recursion vs. iteration, and complex record types (A, AAAA, CNAME, MX, TXT, SRV).
- Security & Mitigation: Implement and manage DNSSEC to prevent cache poisoning; act as a subject‑matter expert in mitigating DDoS and DNS amplification attacks.
- Automation (Eliminating Toil): Replace manual updates and “pool” management with automated workflows using Python, Go, Ansible, or Terraform.
- Performance Tuning: Perform Linux kernel tuning for high‑performance network throughput and conduct in‑depth log analysis on systems like BIND, Unbound, or PowerDNS.
- Observability: Utilize Prometheus, Grafana, and dnstap to monitor query rates and latency, providing actionable insights into error codes (NXDOMAIN, SERVFAIL).
Minimum Qualifications
- Education: Bachelor’s degree in Computer Science, Telecommunications, or a related field (or equivalent practical experience in networking and security).
- Experience: 5+ years in a networking or systems engineering role, with a focus on SRE principles (automation, reliability, and monitoring) in production environments.
- DNS Fundamentals: Hands‑on experience configuring and maintaining at least two of the following: BIND, Unbound, PowerDNS, AWS Route 53, or Azure DNS.
- Networking Protocols: Functional understanding of TCP/IP (IPv4/IPv6) and DNS‑specific protocols including DNSSEC and encrypted transport (DoH/DoT).
- Systems & Automation: Strong Linux/Unix administration skills and proficiency in at least one scripting language (Python, Bash, or Go) for task automation.
- Observability: Experience using Grafana and OpenTelemetry (or similar tools) to monitor service health and performance.
- Familiarity with AI tools and an AI‑first mindset.
Preferred Qualifications
- DNS Systems: Experience managing BIND, Unbound, or PowerDNS in high‑traffic environments, alongside cloud‑native solutions (AWS Route 53, Azure DNS, Google Cloud DNS).
- Protocol Expertise: Mastery of DNS‑specific protocols including DNSSEC, DoT, and DoH, with a firm grasp of underlying transport layers (UDP/TCP) and dual‑stack (IPv4/IPv6) networking.
- Observability: Experience building dashboards and alerts using Prometheus, ELK, or OpenTelemetry to…