Site Reliability Engineer in Team
Join a dynamic infrastructure team as a Site Reliability Engineer. Focus on enhancing platform reliability, ensuring availability, and supporting AI workloads for improved system performance.
In this role, you'll directly impact the platform's operational performance and reliability. Collaborating with DevOps and engineering teams, you will help build scalable infrastructure and respond to incidents. You'll play a key role in implementing security measures and improving observability for AI systems.
Key Responsibilities:
• Maintain platform reliability and availability
• Optimize and secure infrastructure systems
• Proactively address scaling and reliability challenges
• Configure monitoring and incident response strategies
• Support AI/ML infrastructure and workloads
Requirements:
• Experience in Site Reliability Engineering or a similar role
• Proven skills with AWS, particularly EKS and RDS
• Familiarity with Kubernetes for production environments
• Proficient in Terraform for infrastructure development
• Strong background in PostgreSQL and observability tools
Enhance system performance and contribute to a vibrant engineering culture while supporting AI innovations.