Metrics Platform Site Reliability Engineer
Listed on 2025-11-27
IT/Tech
Cloud Computing, Systems Engineer, IT Support, SRE/Site Reliability
Choosing Capgemini means choosing a company where you will be empowered to shape your career in the way you’d like, where you’ll be supported and inspired by a collaborative community of colleagues around the world, and where you’ll be able to reimagine what’s possible. Join us and help the world’s leading organizations unlock the value of technology and build a more sustainable, more inclusive world.
Job Location: Atlanta, GA
Job Description
Key Responsibilities
Manage and mentor a team of Site Reliability Engineers
Define and implement SRE strategies and best practices in alignment with organizational objectives
Monitor clients' service level agreements (SLAs), service level objectives (SLOs), and service level indicators (SLIs)
Lead initiatives to improve system reliability, availability, scalability, and performance
Collaborate with development and operations teams to ensure reliability and resiliency goals are met
Implement and improve incident management processes to minimize downtime and ensure timely resolutions
Review and contribute to the architecture of critical systems, ensuring they meet reliability and performance goals
Drive observability practices by implementing robust monitoring, logging, and alerting systems
Skills Required
Proficiency in writing Splunk queries and alerts is a must
Hands-on experience with at least one APM tool (New Relic, AppDynamics, Honeycomb, or Datadog) is a must
Proficiency in a scripting language (Python or Node.js) is a must
Proficiency in at least one cloud platform (AWS, GCP, or Azure) is a must
Strong understanding of distributed systems, microservices architecture, and container orchestration tools (e.g., Kubernetes)
Experience with monitoring tools such as Prometheus and Grafana is a must
Job Description
Monitoring and Alerting: Implement and maintain monitoring systems to proactively identify potential issues and alert engineers to problems before they impact users
Respond to incidents and outages, diagnose problems, and implement solutions to minimize downtime and restore service
Automation: Automate repetitive tasks and processes to improve efficiency and reduce manual effort
Performance Optimization: Identify and address performance bottlenecks to ensure systems run efficiently and effectively
Infrastructure Management: Manage and maintain the underlying infrastructure, including servers, networks, and cloud resources
Capacity Planning: Plan for future capacity needs to ensure systems can handle anticipated workloads
Release Engineering: Develop and maintain processes for deploying software updates and releases
Work closely with developers, operations teams, and other stakeholders to ensure system reliability and availability
Documentation: Maintain clear and concise documentation of systems, processes, and procedures
Identify areas for improvement and implement changes to enhance system reliability and performance
Life at Capgemini
Capgemini supports all aspects of your well-being throughout the changing stages of your life and career. For eligible employees, we offer:
- Healthcare including dental, vision, mental health, and well-being programs
- Financial well-being programs such as 401(k) and Employee Share Ownership Plan
- Paid time off and paid holidays
- Paid parental leave
- Family building benefits like adoption assistance, surrogacy, and cryopreservation
- Social well-being benefits like subsidized back-up child/elder care and tutoring
- Mentoring, coaching, and learning programs
- Employee Resource Groups
- Disaster Relief
Capgemini is an Equal Opportunity Employer encouraging diversity in the workplace.
All qualified applicants will receive consideration for employment without regard to race, national origin, gender identity/expression, age, religion, disability, sexual orientation, genetics, veteran status, marital status or any other characteristic protected by law. This is a general description of the Duties, Responsibilities and Qualifications required for this position. Physical, mental, sensory or environmental demands may be referenced in an attempt to communicate the manner in which this position traditionally is performed.
Whenever necessary to provide individuals with disabilities an equal employment opportunity, Capgemini will consider…