
Site Reliability Engineer

Job in 243601, Gurgaon, Uttar Pradesh, India
Listing for: ITC Infotech
Full Time position
Listed on 2026-02-17
Job specializations:
  • IT/Tech
    IT Support, Cloud Computing, Systems Engineer
Key Responsibilities
Platform Design & Architecture
Define and evolve the architecture of the observability platform, integrating logs, metrics, traces, events, and alerts
Establish reference implementations and patterns for integrating observability into cloud-native and monolithic applications
Evaluate and integrate best-in-class tools for telemetry (e.g., OpenTelemetry, Prometheus, New Relic, Grafana, Elastic, Splunk)
Governance & Standards
Define enterprise-wide observability standards and maturity models (instrumentation guidelines, SLOs/SLIs, retention policies)
Drive instrumentation consistency across services through libraries, SDKs, and developer onboarding assets
Embed observability standards into CI/CD pipelines, golden paths, and developer enablement frameworks
Platform Engineering & Operations
Build and maintain core observability infrastructure as internal platform services
Ensure the observability platform is highly available, scalable, cost-optimized, and compliant with governance controls
Automate provisioning, onboarding, alerting configuration, and tenant lifecycle management for internal teams
Developer Enablement & Integration
Create self-service capabilities for developers and SREs:
  • Instrumentation kits
  • Dashboards and alert templates
  • Troubleshooting guides and observability sandboxes
Collaborate with Developer Experience and Platform teams to embed observability into the developer workflow and developer portal (Velocity)
Adoption & Support
Lead and support migration and onboarding efforts for application teams
Partner with GPS, ISS, and platform teams to define key use cases and integration paths
Define telemetry baselines and observability KPIs for portfolio-level measurement

Required:

6+ years of experience in Site Reliability Engineering, Platform Engineering, or DevOps roles
Deep understanding of observability concepts (logs, metrics, traces, events, SLOs, SLIs, RED/USE models)
Hands-on experience with one or more tools in the observability stack (Grafana, Elastic, Prometheus, Splunk, Datadog, OpenTelemetry)
Strong scripting or automation skills (Python, Go, Bash, Terraform, etc.)
Familiarity with Kubernetes, container orchestration, and cloud-native environments (AWS/Azure)
Preferred:
Experience designing or operating an enterprise-wide observability platform
Exposure to multi-tenant observability systems, billing or usage metering
Knowledge of developer experience workflows and developer portals
Previous work with standards enforcement and governance-as-code