Systems Engineer, Metrics and Alerting
Listed on 2026-01-01
Software Development
Software Engineer, DevOps
Position Title: Systems Engineer, Metrics and Alerting
About Us
At Cloudflare, we are on a mission to help build a better Internet. Today the company runs one of the world's largest networks that powers millions of websites and other Internet properties for customers ranging from individual bloggers to SMBs to Fortune 500 companies. Cloudflare protects and accelerates any Internet application online without adding hardware, installing software, or changing a line of code.
Internet properties powered by Cloudflare all have web traffic routed through its intelligent global network, which gets smarter with every request. As a result, they see significant improvement in performance and a decrease in spam and other attacks. Cloudflare was named to Entrepreneur Magazine's Top Company Cultures list and ranked among the World's Most Innovative Companies by Fast Company.
We realize people do not fit into neat boxes. We are looking for curious and empathetic individuals who are committed to developing themselves and learning new skills, and we are ready to help you do that. We cannot complete our mission without building a diverse and inclusive team. We hire the best people based on an evaluation of their potential and support them throughout their time at Cloudflare. Come join us!
Available Locations:
London or Lisbon
About the Department
Production Engineering is responsible for the world's most reliable, observable, performant, and safe network ecosystem. Our customers rely on our products and systems to safely modify, troubleshoot, and release products without external impact.
Our external customers rely on us to provide seamless and predictable incident, traffic, and policy management, resulting in the fastest and safest network services in the world.
We are accountable for the overall performance of internal and external facing services, guiding our product teams to optimal configurations and maximum efficiency. From the moment that a packet enters the Cloudflare ecosystem, we know exactly what its expected purpose and behaviour is and we are capable of determining and exposing anomalous behaviour.
The Cloudflare network makes it possible to solve challenges at massive scale and efficiency which would be impossible for almost any other organization.
About the Team
This role is for the internal Observability Team, responsible for the observability platform and stack to make our engineering teams productive. This includes (but is not limited to) areas like metrics, alerting, error tracking, logging, tracing, and more.
In this role, you can expect to:
- Design, deliver, and operate software and a platform that progresses Cloudflare's Observability competency
- Solve scaling bottlenecks in critical services in our Metrics & Alerting pipeline
- Work on highly distributed and scalable systems
- Participate in the constant cycle of knowledge sharing and mentoring
- Participate in the global on-call rotation for the services your team owns
- Research and introduce cutting‑edge technologies
- Contribute to open‑source
We are a small team, well-funded, growing, and focused on building an extraordinary company. This is a software engineering/systems engineering role and a superb opportunity to be part of a high-performing team, support Cloudflare's mission, and help build a better Internet.
You may be a good fit for our team if you have:
- A Software Engineering background and proficiency in high-level programming languages (e.g., Go)
- Proficiency in data structures and databases such as TSDBs, columnar stores, or related technologies
- Proficiency in distributed Linux environments
- Proficiency in designing high‑scale distributed systems
- Proficiency in Prometheus, Alertmanager, Thanos
- Experience working in a fast, high‑growth environment
- Experience working in a 24/7/365 service environment
- Exquisite written and verbal communication skills
- Familiarity with internetworking, networking protocols at Layers 2-7 of the OSI model, and BGP
- Strong bias for action
- Experience with high-bandwidth transit Internetworking and routing
- Passion for code simplicity and performance
We're not just a highly ambitious, large-scale technology company. We're a…