Senior Site Reliability Engineer Job, Austin area, Texas, USA, IT/Tech

About the Role

At Jamf, we believe in an open, flexible culture based on respect and trust. Our track record and thriving work environment all stem from the freedom we grant ourselves to get the job done right. We take pride in helping tens of thousands of customers around the globe succeed with Apple.

Locations

This role is offered as remote in the Minneapolis, MN; Eau Claire, WI; or Austin, TX metro areas. You may be required to work periodically at a Jamf office or collaborative work location with other Jamf employees in your area for certain events or moments that matter. We are only able to accept applications from those based in one of these locations.

What You'll Do

As a Senior Site Reliability Engineer, you'll help us balance development velocity with the reliability our customers depend on. You'll partner with engineering teams to shape how their services are measured, lead the work to improve them, and use what you learn from production to build the automation and agentic tooling that improves reliability globally. This is a senior individual contributor role at the intersection of Engineering, Product, Customer Success and Technical Support, where you'll play a meaningful part in shaping how we practice SRE at Jamf.

Job Responsibilities

Partner with engineering teams to define service‑level objectives, error budgets, and supporting indicators for their services, and help them use those measures to inform prioritization and reliability investment.
Investigate complex production issues end‑to‑end across application, data, infrastructure, and network layers, using AI to correlate logs, metrics, and code and to pressure‑test hypotheses before acting.
Produce clear technical documentation, runbooks, architecture notes, post‑mortems and proofs of concept for both technical and non‑technical audiences, in a form that engineers and AI tools can re‑use.
Identify systemic sources of toil and lead the work to eliminate them through automation, AI agents, tooling, and process change.
Set the conditions for AI agents to do reliable work in our environment, including repository context, well‑specified tasks, integrations such as MCP servers that give AI safe access to the systems it needs, and the tests and guardrails needed for AI‑authored change to be trusted.
Participate in team ceremonies to identify and refine work, communicate findings, and drive opportunities to collaborate.
Drive cross‑team and cross‑department collaboration on reliability initiatives by reviewing designs, influencing roadmaps, and mentoring engineers on SRE practices, including effective AI use in their reliability work.
Advise senior leadership and stakeholders during critical customer escalations, translating between technical reality and business impact.
Contribute to scaling the SRE practice itself: improving our standards, our tooling, and how we partner with product engineering teams.

What We Are Looking For

Minimum of 5 years of experience in software engineering, SRE or production operations roles.
Required
Strong production troubleshooting skills across the stack. Ability to diagnose issues from first principles using the tools available (profilers, heap and thread dumps, query plans, traces, logs, metrics).
Required
Experience working within some form of Agile development process.
Required
Hands‑on experience operating production services on AWS (e.g., EC2, S3, EKS, RDS/Aurora, CloudFront).
Required
Experience using observability tools (e.g., Grafana, Prometheus, LogicMonitor).
Required
Experience creating clear and concise technical documentation that is targeted at both technical and non‑technical audiences.
Required
Experience writing infrastructure as code.
Required
Experience writing automation in a general‑purpose language (e.g., Python, Go, Java, or similar) to a production standard.
Required
Strong judgement about how to apply AI effectively across the full range of SRE work, including high‑stakes areas such as production access and sensitive data, knowing how to scope and verify work to make it safe.
Required
Hands‑on experience using agentic development tools (e.g., Claude Code, Cursor, Copilot) to…