More jobs:
Job Description & How to Apply Below
Agile Engine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.
WHY JOIN US
If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you!
ABOUT
THE ROLE
We are looking for a Senior Site Reliability Engineering to strengthen our platform reliability and observability capabilities. You will own the design and operation of monitoring infrastructure — including Datadog APM, alerting, and distributed tracing — across Kubernetes-based microservices on AWS. The role spans backend engineering and SRE practice in roughly a 65/35 split, with direct involvement in CI/CD integration and observability automation.
You will also support internal teams in adopting monitoring best practices as we modernize our R&D platform.
WHAT YOU WILL DO
- Design, build, and maintain scalable backend and platform components;
- Implement and manage observability solutions across distributed systems;
- Configure dashboards, alerts, and APM for tracing, metrics, and logging;
- Monitor and improve system reliability, scalability, and performance;
- Deploy, operate, and maintain services in Kubernetes environments;
- Integrate observability tools into CI/CD pipelines and cloud infrastructure;
- Automate monitoring and operational workflows using scripting;
- Provide operational and training support for observability platforms, especially Datadog;
- Collaborate with engineering teams to improve system visibility and reliability practices.
MUST HAVES
- 4+ years of experience with Python, Node.js, or Java;
- Hands-on experience with API integrations;
- Strong experience in Kubernetes environments;
- Experience with Datadog or similar tools such as Prometheus and Grafana;
- Ability to configure dashboards, alerts, and APM;
- Experience monitoring containerized and microservices architectures;
- Hands-on experience with AWS;
- Experience integrating observability tools into cloud environments;
- Experience with CI/CD integrations for observability;
- Ability to automate monitoring and operational tasks using scripting;
- Upper-intermediate English level.
NICE TO HAVES
- Experience owning and operating an internal engineering platform, especially observability platforms;
- Demonstrated ownership of reliability, scalability, and performance;
- Ability to proactively lead maintenance and platform improvements;
- Experience installing and configuring Datadog agents and integrations;
- Experience managing API keys and secure configurations;
- Experience managing user roles and access controls;
- Familiarity with Go (Golang);
- Experience with additional observability tools such as New Relic, Dynatrace, Elastic Stack, or Splunk.
PERKS AND BENEFITS
- Remote work & Local connection: Work where you feel most productive and connect with your team in periodic meet-ups to strengthen your network and connect with other top experts.
- Legal presence in India: We ensure full local compliance with a structured, secure work environment tailored to Indian regulations.
- Competitive Compensation in INR:
Fair compensation in INR with dedicated budgets for your personal growth, education, and wellness.
- Innovative Projects: Leverage the latest tech and create cutting-edge solutions for world-recognized clients and the hottest startups.
Note that applications are not being accepted from your jurisdiction for this job currently via this jobsite. Candidate preferences are the decision of the Employer or Recruiting Agent, and are controlled by them alone.
To Search, View & Apply for jobs on this site that accept applications from your location or country, tap here to make a Search:
To Search, View & Apply for jobs on this site that accept applications from your location or country, tap here to make a Search:
Search for further Jobs Here:
×