Job Description & How to Apply Below
You’ll be instrumental in shaping the technical architecture and reliability practices. Your role focuses on end-to-end reliability initiatives, including defining service-level objectives and leading incident management. Through collaboration and mentorship, you will elevate technical standards and advance team capabilities.
Key Responsibilities:
• Develop SLOs and manage production service reliability metrics
• Architect solutions for AI agent failure containment
• Mentor junior engineers and enhance team capabilities
• Drive continuous improvement in operational processes
• Utilize Datadog for service health monitoring and automation
Requirements:
• 6-8 years in site reliability engineering
• Bachelor's degree in Computer Science or a related field
• Proficiency with AWS services and multi-region patterns
• Strong skills in Terraform and operational tooling
• Experience managing CI/CD pipelines
Transform site reliability for AI operations at Tech Insights and drive impactful changes.