Site Reliability Engineer
Listed on 2025-12-02
IT/Tech
Cloud Computing, IT Support, SRE/Site Reliability, Systems Engineer
We are seeking a Site Reliability Engineer (SRE) to support and enhance reliability engineering, operations, and customer support for our ServiceNow SaaS platform. This is a hybrid role combining automation, process improvement, and production support, with a strong emphasis on building and maintaining reliable, scalable systems. As part of a global SRE community, you'll collaborate with diverse teams and stakeholders to optimize system performance, resolve incidents, and drive service excellence.
The ideal candidate brings a blend of development skills, a problem-solving mindset, and a passion for operational excellence. Whether you come from a development, infrastructure, or systems administration background, if you’re eager to apply SRE principles and deliver measurable improvements, we encourage you to apply.
Key Responsibilities:
- Drive improvements in availability, performance, and scalability for the ServiceNow SaaS platform by optimizing and automating operational tasks.
- Collaborate with global SRE colleagues to develop observability tools (metrics, logging, tracing, dashboards) that monitor and define product reliability.
- Engage in incident response and resolution, primarily for ServiceNow and occasionally for Linux-based on-premises infrastructure.
- Participate in a global on-call rotation, ensuring timely response and remediation during incidents (time off in lieu offered).
- Contribute to knowledge documentation and ongoing efforts to understand and map dependencies in ServiceNow and associated systems.
- Identify, prioritize, and address technical debt that hinders performance, reliability, or client satisfaction.
- Collaborate in architecture reviews, process delivery improvements, and operational tooling development to support SRE goals.
- Provide constructive feedback on policies and operational processes to continuously improve service delivery and team effectiveness.
Required Skills & Qualifications:
- Minimum 7 years of relevant experience in software development, system administration, or infrastructure operations.
- Strong proficiency in at least one programming/scripting language (e.g., Python).
- Excellent troubleshooting skills across ServiceNow and Linux-based systems.
- Strong interpersonal and communication skills; capable of building positive, productive relationships across teams.
- Proven dependability in handling time-sensitive or high-impact technical incidents.
- Commitment to continuous learning and improvement of reliability, efficiency, and customer satisfaction.
Preferred Skills:
- ServiceNow administration or development experience (training available if not already acquired).
- Familiarity with SRE principles such as task automation, technical debt reduction, capacity management, and monitoring.
- Experience in a production support or DevOps/SRE role in an enterprise-scale environment.
- Exposure to IT service management (ITSM), SaaS platforms, and enterprise toolchains.