Site Reliability Engineer
Job in
Saint Petersburg, Pinellas County, Florida, 33739, USA
Listed on 2026-05-09
Listing for:
Zelis Healthcare, LLC
Full Time position
Job specializations:
- IT/Tech
Systems Engineer, IT Support, Cloud Computing: Infrastructure & Operations, SRE/Site Reliability
Job Description & How to Apply Below
Position Overview
Job Title: Site Reliability Engineer
Location: Remote, In‑office, or Hybrid
Department: IT Operations
Reports To: Manager of Observability & Reliability
Job Type: Full‑Time Employee (FTE)
Job Summary: This role is responsible for establishing a consistent and scalable approach to monitoring and alerting, leveraging golden signals to enhance system reliability and operational efficiency. The successful candidate will collaborate closely with the ZEIT SRE team, engineering leads, and India‑based resources to build a unified observability strategy aligned with organizational goals.
Key Responsibilities
Observability Roadmap Development
- Define a unified vision for observability across all platforms, with golden signals as the foundation for monitoring and alerting.
- Develop and maintain a comprehensive roadmap to improve observability, reduce tool redundancy, and standardize practices across platforms.
- Establish and track key performance indicators (KPIs) to measure progress and ensure accountability for roadmap milestones.
- Partner with the ZEIT SRE team and engineering leads to break down silos and promote consistent observability practices.
- Drive cross‑platform collaboration to reduce operational inconsistencies and define a "north star" approach for observability.
- Facilitate knowledge sharing to ensure alignment on current and future observability initiatives.
- Standardize the implementation of golden signals across applications to improve system reliability and incident detection.
- Optimize alerting tools and reduce redundant or ineffective monitoring interfaces ("panes of glass").
- Lead efforts to enhance observability while minimizing operational overhead for platform teams.
- Maintain and enhance observability dashboards, delivering actionable insights into application health and performance.
- Identify and address gaps in existing observability practices, prioritizing long‑term scalability and reliability.
- Collaborate with India‑based resources to execute observability build‑outs efficiently and with high quality.
- Reduce issues raised by clients, providers, and print facilities through proactive monitoring and early detection.
- Measure and report on observability success metrics, including actionable alert volume and reduced issue escalations.
- Continuously evaluate and refine observability strategies based on stakeholder feedback and evolving organizational needs.
Qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).
- Minimum of 5 years of experience in Site Reliability Engineering, DevOps, or a related role with a strong focus on observability.
- 5+ years of hands‑on experience with .NET (C#), including advanced knowledge of ASP.NET Core, Web APIs, and performance optimization.
- Demonstrated success in designing and implementing monitoring and alerting solutions across complex IT environments.
- Deep understanding of SRE principles and golden signals for system monitoring.
- Proficiency with observability tools such as Prometheus, Grafana, Splunk, New Relic, or Datadog.
- Familiarity with cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).
- Advanced proficiency in scripting languages such as PowerShell.
- Experience in front‑end development using React.js.
- Advanced knowledge of .NET.
- Strong leadership and collaboration abilities, with a proven ability to align diverse teams toward common goals.
- Excellent analytical and problem‑solving skills, with a proactive approach to identifying and resolving issues.
- Clear and effective communication skills, capable of conveying technical concepts to stakeholders at all levels.
- Experience with building observability roadmaps and scaling solutions in enterprise environments.
- Certifications in cloud or DevOps‑related disciplines (e.g., AWS Certified DevOps Engineer, Kubernetes Administrator).
Please note at this time we are unable to proceed with candidates who require visa…