SM_ SRE Job, Pune Area, Maharashtra, India, IT/Tech

The Role

The Site Reliability Engineer (SRE) is one of the critical roles in the technology team. The person in this role will be responsible for application performance, availability, reliability, and system uptime. The candidate is responsible for providing consultation and strategic recommendations by quickly assessing and remediating complex platform availability issues. The Site Reliability Engineer Lead will dive head-first into creating or applying innovative solutions and techniques that advance the reliability of Digital products.

Key Responsibilities

Install and deploy new releases and environments for applications.
Build and maintain highly scalable, large-scale deployments globally.
Co-create and maintain architecture for 100% uptime (e.g., by creating alternate connectivity).
Practice sustainable incident response/management and blameless post-mortems.
Monitor and maintain production environment stability.
Own entire platforms (production environments): deploy, automate, maintain, and manage production systems to ensure their availability, performance, scalability, and security.
Engage in and improve the whole lifecycle of services from inception and design, through deployment, operation and refinement.
Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
Collaborate with Agile teams to define technical requirements and best practices for containerized and cloud-native applications.
Represent production support and site reliability in stand-ups, planning sessions, code reviews, and architecture reviews.
Help evolve our configuration management (CM) efforts and our move to containers.
Help the operations head select an enthusiastic and technically knowledgeable team, and guide existing team members.

Skills Required

Should have good know-how of applications, middleware, databases (PostgreSQL, MongoDB, MySQL, etc.), infrastructure, and operating systems.
Should have a good understanding of Docker and Kubernetes.
Should have an understanding of CI/CD and DevOps tools such as Jenkins, Ansible, shell scripting, etc.
Monitoring and logging: experience with monitoring and logging tools (e.g., Nagios/AppDynamics, ELK, Prometheus).
Good experience with distributed systems such as RabbitMQ, Kafka, Redis, etc.
Should have experience working with Linux, WebLogic/Tomcat, JBoss, and other middleware technologies.
Should have worked on high-traffic, highly scalable systems in the past.
Knowledge of the fundamental aspects of release automation (packaging, dependencies, promotion, deployment, compliance).
Experience with project management tools such as JIRA, as well as insight into quality analysis.