SRE/DevOps Engineer
Listed on 2026-06-05
IT/Tech
Cloud Computing, SRE/Site Reliability
About Us
Versana is an industry-backed data and technology company on a mission to make the syndicated loan market better. By digitally capturing agent banks’ data on a real‑time basis and centralizing it onto a single platform, Versana provides unprecedented transparency into loan‑level details and portfolio positions, bringing efficiency and velocity to the entire market. Through our platform, participants can rest assured they are accessing the loan market’s most credible source of deal information.
About You
Versana is seeking a motivated SRE/DevOps Engineer with strong observability experience to join our growing Platform Engineering squad. The squad’s goal is to manage public cloud, improve DevOps practices, and monitor Versana’s real‑time syndicated loan data platform. The ideal candidate will have a deep understanding of cloud‑native applications, distributed computing, CI/CD implementation, and observability tools and practices.
Key Responsibilities
- Design, implement and enhance system observability and monitoring tools.
- Monitor system performance, create incident response plans, and implement observability practices to gain insights into system behavior.
- Implement and monitor service‑level objectives (SLOs) and service‑level indicators (SLIs).
- Improve system reliability and resiliency.
- Conduct post‑incident reviews and implement necessary changes to prevent system failures.
- Assist teams in implementing observability tools and leveraging available telemetry data to troubleshoot and resolve incidents and problems.
- Leverage observability and event management to improve key incident management metrics, such as mean time to detect and mean time to restore services.
- Continually optimize systems and workflows by improving architecture, infrastructure, automation, CI/CD, and observability.
- Collaborate with developers to ensure applications are designed with DevOps best practices in mind.
- Participate in a rotating on‑call schedule for weekend releases and be available to respond to production issues outside of regular working hours, including weekends and holidays.
Qualifications
- 5+ years of experience as a Site Reliability Engineer or in a similar role.
- 3+ years of work experience with public cloud (Azure, AWS or GCP).
- 3+ years of direct experience with observability tools such as Datadog, Elasticsearch, or Grafana Labs.
- 3+ years of experience with containerization and orchestration technologies like Docker and Kubernetes.
- 2+ years of experience in development and management of CI/CD pipelines (e.g., Azure DevOps, GitLab CI/CD, GitHub Actions, Jenkins).
- 2+ years of experience with infrastructure‑as‑code tools such as Terraform, Azure Bicep, or CloudFormation.
- 1+ years of experience with site reliability tools like Gremlin, Chaos Mesh, or similar.
- Proven track record leveraging core observability concepts, end‑user monitoring, and infrastructure monitoring with SaaS solutions.
- Experience with messaging services like Kafka or Azure Event Hubs.
- Good understanding of the Linux operating system.
- Experience in at least one programming language or platform such as Java, JavaScript, Python, Go, or .NET.
Preferred Qualifications
- Certifications in cloud technologies.
- Experience with Azure cloud or Azure DevOps.
- Experience with Datadog or similar modern observability tools.
We are committed to providing equal employment opportunities to all employees and applicants for employment and prohibit discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation and training.