Site Reliability Engineer (Core AI Infrastructure)

Job in Sumter, Sumter County, South Carolina, 29150, USA
Listing for: Coinbase
Full Time position
Listed on 2026-06-29
Job specializations:
  • IT/Tech
    SRE/Site Reliability, Cloud Computing: Infrastructure & Operations, IT Project Manager
Salary/Wage Range or Industry Benchmark: 130,000 - 160,000 USD yearly
Job Description & How to Apply Below
Position: Staff Site Reliability Engineer (Core AI Infrastructure)

Job Overview

You’ll join a high-performing team of engineers driving AI transformation at Coinbase as a Staff Site Reliability Engineer on the IT Operations team.

Responsibilities
  • This team builds and scales the infrastructure powering Coinbase’s AI products, with direct exposure to senior leadership in a fast-paced, incubator-style environment.
  • You’ll own the reliability, monitoring, and incident response lifecycle for AI infrastructure services, including on-call support for AWS deployment pipelines, root cause analysis, and blameless retros.
  • Build automation and tooling to streamline operational IT workflows, eliminate manual tasks, and improve deployment velocity across CI/CD frameworks and Kubernetes environments.
  • Partner with the Coinbase Infrastructure team to extend CI/CD frameworks supporting IT services and enterprise network platforms, and with Security and Compliance to integrate surveillance tooling into deployment pipelines.
  • Strengthen observability and documentation standards across IT engineering by defining metrics, implementing monitoring solutions, and maintaining technical documentation that sets a standard of excellence.
  • Develop full-stack applications that power internal AI products and infrastructure with Go or Python.
Qualifications

Utilizes generative AI responsibly, maintaining human oversight to deliver business-ready outputs and drive measurable improvements in workflow efficiency, cost, and quality.

Proven experience deploying, managing, and troubleshooting containerized workloads using Docker and Kubernetes in production environments.

Track record of leading incident response in environments with strict SLAs, including root cause analysis, blameless retros, and measurable reliability improvements.

8+ years of experience automating and supporting cloud infrastructure (AWS) and network environments, with hands‑on use of infrastructure‑as‑code tools (Terraform, Ansible, Chef, Puppet, or Salt).

Proficiency in at least one scripting or programming language (Python, Bash, Ruby, or Go) and version control workflows using Git-based CI/CD pipelines.

Expertise with Linux, Bash, Ruby, Python, and/or Go.

Expertise automating EC2 or container deployments with Terraform.

Strong network security fundamentals.

Experience managing and leveraging log aggregation.

Experience in a fast‑paced, high‑growth company.

Experience working in a highly regulated environment.

Experience in a remote‑first IT environment.
