Site Reliability & DevOps Engineer
Listed on 2026-04-23
IT/Tech
Systems Engineer, SRE/Site Reliability, IT Support, Cloud Computing
Location: Hungary
Staff Site Reliability & DevOps Engineer - Observability
At Cision, we believe in empowering every individual to make an impact. Here, your voice is heard, your ideas are valued, and your unique perspective fuels our collective success. As part of our global team, you'll thrive in an environment that champions curiosity, collaboration, and innovation, all while making meaningful contributions to the brands we accelerate.
Join us in shaping the future of communication and building authentic connections that matter. Whether you're solving complex problems or driving bold innovations, your growth is our success, and together, we’ll create the conversations of tomorrow.
Empower your impact. Be seen, be understood, be you.
This role focuses on designing, operating, and evolving observability platforms, with a strong emphasis on metrics, logging, and alerting. The primary tooling is Grafana and Prometheus, and the role is responsible for ensuring that production systems are observable, reliable, and operable. The role works closely with platform, infrastructure, and application teams.
Key responsibilities
- Design, build, and operate observability platforms based on Grafana and Prometheus
- Define and maintain metrics standards, dashboards, alerts, and SLOs
- Improve signal quality: reduce alert noise, tune thresholds, and improve runbooks
- Support incident response by providing actionable telemetry and post-incident analysis
- Integrate metrics, logs, and traces across distributed systems
- Work with engineering teams to instrument services correctly
- Automate observability configuration using infrastructure as code
- Contribute to reliability improvements through capacity planning and performance analysis
Requirements
- Strong experience with Prometheus (scraping, federation, recording rules, alerting)
- Strong experience with Grafana (dashboards, alerting, templating, RBAC)
- Solid Linux and networking fundamentals
- Experience running observability stacks in Kubernetes environments
- Infrastructure as code experience (Terraform preferred)
- Familiarity with incident management and on-call practices
- Ability to debug production systems using metrics and logs
Nice to have
- Experience with logs and traces (e.g., Loki, Tempo, OpenTelemetry)
- Experience operating large-scale or multi-cluster Kubernetes platforms
- Experience with cloud platforms (GCP, AWS, OCI)
- Exposure to SRE concepts such as error budgets and SLO-driven prioritisation
What success looks like
- Engineers trust dashboards and alerts to reflect system health
- Incidents are detected earlier and diagnosed faster
- Alert fatigue is reduced and on-call quality improves
- Observability is treated as a first-class platform capability
Cision is proud to be an equal opportunity employer, seeking to create a welcoming and diverse environment. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender identity or expression, sexual orientation, national origin, genetics, disability, age, veteran status, or other protected statuses.
Cision is committed to the full inclusion of all qualified individuals. In keeping with this commitment, Cision will take steps to ensure that people with disabilities are provided reasonable accommodations. Accordingly, if reasonable accommodation is required to fully participate in the job application or interview process, to perform the essential functions of the position, and/or to receive all other benefits and privileges of employment, please contact