AI Reliability Engineer; AI SRE

Job in Long Beach, Los Angeles County, California, 90801, USA
Listing for: DeWinter Group
Contract position
Listed on 2026-05-30
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, SRE/Site Reliability, AI Engineer
Salary/Wage Range or Industry Benchmark: 50 - 175 USD hourly
Job Description & How to Apply Below
Title: AI Reliability Engineer (AI SRE)

Job Type: Contract

Contract Length: 12 Months

Pay Range: $50/hr – $175/hr

Start Date: ASAP

Location: Remote

About the Opportunity:

Our client, a leader in AI testing and Generative AI solutions, is looking for a skilled AI Reliability Engineer (AI SRE) to join their team for a 12-month engagement. This project involves ensuring the reliability, availability, and performance of mission‑critical AI systems by defining SLOs, implementing automated resilience measures, and leading incident response. This is a high‑impact role that requires a self‑motivated professional who can hit the ground running and deliver results quickly.

Key Responsibilities & Deliverables:
  • Defining and maintaining Service Level Objectives (SLOs) for AI inference latency and availability.
  • Building automated "circuit breakers" and fallback logic (e.g., switching to a smaller model if the primary fails).
  • Leading incident response and root-cause analysis (RCA) for complex AI system failures.
  • Developing stress‑testing and chaos engineering scenarios specifically for AI agent swarms.
  • Optimizing the "cold start" and scaling time for serverless AI functions.

Required Skills & Experience:
  • 4+ years of experience in Site Reliability Engineering (SRE).
  • Deep expertise in system monitoring, incident management, and cloud resilience. This isn't a learning role—you need to be a subject matter expert.
  • Demonstrated ability to work autonomously and manage your own time effectively to meet project goals.
  • Experience with Python/Go, Kubernetes, and observability stacks (Datadog, New Relic).
  • Strong communication skills to provide clear and concise status updates to the project team.