
Site Reliability Engineer

Job in 500001, Hyderabad, Telangana, India
Listing for: NexionPro Services
Full Time position
Listed on 2026-06-06
Job specializations:
  • IT/Tech
    Systems Engineer, IT Support
Job Description
SRE Observability Developer

Location: Hyderabad | Experience: 5–10 years | Focus: Observability-as-Code & Automation

Role Overview
We are hiring a Site Reliability Engineer (SRE) to mature the observability and root cause analysis (RCA) capabilities of our high-scale UPI payment platforms. This is a hands-on, code-driven role focused on building reliable telemetry pipelines, transaction correlation, and automated alerting frameworks. You will treat monitoring configurations as code to ensure consistent, scalable operational intelligence.

Key Responsibilities
Telemetry Standardization: Build and standardize metrics, logs, and traces across the application, middleware, and infrastructure layers. Implement custom tags/attributes for unified drill-down dashboards.
Transaction Correlation: Enable transaction correlation across asynchronous UPI flows to provide end-to-end visibility across distributed services.
SLO & Alert Engineering: Define Golden Signals and SLIs for critical journeys (P2P, P2M). Implement Alert-as-Code using config-based anomaly detection and noise-reduction logic.
Observability-as-Code: Automate the provisioning of Grafana dashboards, alert rules, and collector configurations (OTel/Fluentd) using version-controlled scripts.
RCA & Intelligence: Build RCA-focused views for Redis, Kafka, YugabyteDB, and Nginx. Use synthetic monitoring and black-box exporters to gain visibility into partially controlled systems.
Operational Integration:  Convert incident learnings into automated telemetry patterns. Embed observability validation into deployment and release workflows.

Must-Have Skills
1. Observability Stack
Expertise: Prometheus/VictoriaMetrics, VictoriaLogs/VictoriaTraces, OpenTelemetry (OTel), and Fluentd.
Tooling: Advanced Grafana, Alertmanager, and various infrastructure exporters.
Development: Ability to develop custom exporters using OpenTelemetry SDKs for unique business/transaction metrics.
2. Systems & Middleware
Knowledge: Deep understanding of Redis, Kafka, Nginx, and YugabyteDB (or similar distributed databases).
App Tier:  Proficiency with JVM/Spring Boot Actuator metrics and asynchronous request/response patterns.
Environment: Experience with high-scale, low-latency platforms; UPI/payments domain experience is highly preferred.
3. Scripting & Automation
Core Skills: Strong Python and Shell/Bash for automating telemetry validation and collector lifecycle management.
Mindset:  Ability to treat all monitoring assets (dashboards, rules, configs) as code artifacts.

What We’re Looking For
An engineer who sees a dashboard as a product of code, not just a UI task.
Strong debugging skills across complex, on-prem distributed systems.
The ability to bridge the gap between what happened and where the code failed through advanced correlation.