Position: Site Reliability Engineer
Location: Hybrid (3 days per week on site) in Bengaluru, Karnataka 560064, India
Required Skills & Experience
~3–5 years of experience in a Site Reliability Engineer or similar role (5–8 years total IT experience)
Strong experience working in Azure cloud environments
Hands-on experience with monitoring and observability tools (e.g., Elastic, Prometheus, Grafana, or similar)
Experience supporting production applications (application-focused SRE rather than infrastructure-only)
Working knowledge of Kubernetes (AKS) for monitoring, alerting, or administration
Ability to troubleshoot and debug applications, including reading and understanding code
Familiarity with .NET/C# application environments
Experience with databases (SQL and/or NoSQL, such as PostgreSQL or Cosmos DB)
Nice to Have Skills & Experience
Experience in multi-cloud or hybrid-cloud environments (Azure + AWS/GCP)
Exposure to EKS or non-Azure Kubernetes environments
Experience supporting single-page applications (e.g., Angular)
Background in automation and scripting
Experience working in global or distributed teams
Job Description
We are seeking a Site Reliability Engineer (SRE) to support a modern, cloud-based application platform. This role will focus on improving reliability, scalability, and observability across core systems while helping transition teams from reactive support to proactive engineering practices.
The ideal candidate will bring solid experience in cloud environments, application monitoring, and automation, along with the ability to collaborate closely with development teams and contribute to the maturation of SRE practices.
Key Responsibilities
Support and enhance monitoring, alerting, and observability frameworks across production environments
Troubleshoot production issues and participate in root cause analysis to reduce recurrence
Support cloud-based applications and contribute to platform reliability improvements
Partner with engineering teams to identify and resolve performance, scalability, and resiliency gaps
Automate operational tasks and workflows to reduce manual intervention and improve efficiency
Support and monitor containerized environments (Kubernetes/AKS)
Contribute to incident response processes and help drive improvements in response time and downtime reduction
Assist in establishing and adhering to SRE best practices and standards