Senior Site Reliability Engineer Job Nashville area, Tennessee USA, IT/Tech

We deliver mission-critical IT/OT infrastructure, in cloud and on-prem, for industrial customers that can't afford downtime.

Small team. Hard problems. Practical solutions. No bureaucracy. No blame. No egos.

We ship it, own it, and make it better: blameless but accountable, shoulder to shoulder. We work hard. We stay human. We trust each other. We figure it out.

If you know what to do, delight in building it, and feel the ownership to support it—keep reading.

What You'll Do

Customer Delivery

Design complex IT/OT architectures—in cloud and on-prem—that are secure, recoverable, and sized appropriately
Work directly with customers to understand their environment and estimate effort
Build or use reusable modules when it makes sense; build bespoke when it doesn't
Deploy and manage Kubernetes-based infrastructure and stateful applications across diverse customer environments
Participate in on-call rotation alongside the rest of the team—everyone here supports what we ship
Own incidents through resolution, then drive root cause analysis that eliminates the class of problem, not just the symptom
Build the runbooks, alerts, and automation that make the next incident less likely or less painful

Infrastructure & Automation

Work with Infrastructure-as-Code tools to provision and manage diverse customer environments
Implement and maintain GitOps workflows for in-cluster deployments
Ensure all infrastructure and application changes are declarative and version-controlled
Automate self-healing and system updates—reduce manual intervention and keep environments current

Observability & Reliability

Build and maintain monitoring, alerting, and dashboards using Prometheus, Loki, and Grafana
Define SLIs and SLOs that reflect what actually matters to customers
Surface real problems, reduce noise, and continually improve reliability and team efficiency

Shape the Future

We don't have everything figured out.

You'll help build, create, and shape how we operate
Contribute to standards, patterns, and processes that make us better—not bureaucracy for its own sake
Bring the SRE mindset: automate toil, prefer boring/stable systems, and relentlessly improve

What We're Looking For

5+ years in SRE, DevOps, or Infrastructure Engineering
Strong Kubernetes skills in production environments—you'll troubleshoot real clusters, not just tutorials
Experience with GitOps tooling (ArgoCD, Rancher Fleet, FluxCD, or similar)
Solid understanding of Infrastructure-as-Code concepts (Terraform, Pulumi, Crossplane, or similar)
Real incident response experience: you've been on-call, stayed calm, and fixed things under pressure
Comfort with heterogeneous environments—every customer site is a little different and you need to adapt
Clear communication skills—you can write a useful runbook, gather requirements on a customer call, and document what you learned
Ability to operate in ambiguity: we're building clarity, not waiting for it

Strong Plus

Azure experience (our primary cloud)
Experience with SUSE ecosystem (SLE Micro, RKE2, Rancher, Longhorn)
Industrial, manufacturing, or OT environment experience
Familiarity with Inductive Automation's Ignition platform and MQTT
Experience in a startup or small-team environment where you wore many hats

The SRE Mindset

This matters here. We need someone who:

Sees repetitive manual work as a problem to automate, not a fact of life
Prefers stable, predictable, "boring" production over clever and fragile
Supports what they create—no throwing things over the wall
Treats incidents as opportunities for systemic improvement
Works well on a small team where everyone carries weight
Stays current with SRE practices, emerging technologies, and cloud/edge trends

This is a startup. Hours can be demanding. Priorities shift. You won't have a team of 30 backing you up.

What you will have: the autonomy to make real decisions, teammates who own their work, and customers who genuinely depend on what we build. We work hard because the work matters—and we have fun doing it.

If you want a structured 9-5, predictability, and a clear ladder, this probably isn't the right fit.

If you want to build, learn, and be part of something that's actually going somewhere—let's talk.

What We Offer

Fully remote: work from anywhere in the world
A team where it's safe to be honest, learn from mistakes, and get better together
