
Site Reliability Engineer

Job in 243601, Gurgaon, Uttar Pradesh, India
Listing for: ValueFirst
Full Time position
Listed on 2026-02-17
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, SRE/Site Reliability, IT Support
Job Description & How to Apply Below
About the Job
The Site Reliability Engineering (SRE) team is responsible for ensuring the reliability, scalability, and performance of large-scale telecom and CPaaS platforms. This role combines software engineering and systems operations to build resilient, observable, and automated infrastructure that supports high-throughput messaging services. The team operates in a 24/7 environment and works closely with Engineering, CX and Products to maintain carrier-grade service reliability.

What you’ll be responsible for
Ensure high availability, performance, and reliability of CPaaS production systems spread across multiple locations, hosted in the cloud and in data centers.
Own and improve SLIs, SLOs, and SLAs for messaging platforms and supporting services.
Monitor system health, latency, TPS, error rates, and delivery metrics using observability tools.
Participate in on-call rotations and handle production incidents with a focus on fast recovery and root cause analysis.
Deploy, configure, and optimize systems for high-throughput messaging across multiple channels.
Troubleshoot telecom-specific issues including DLR failures, encoding problems, TPS drops, and routing issues.
Work directly with multiple teams for integrations, testing, and incident resolution.
Perform packet-level analysis using tcpdump and Wireshark to diagnose network and protocol-level issues.
Write and maintain shell scripts and automation to eliminate repetitive operational tasks and reduce human intervention.
Contribute to infrastructure automation using tools like Ansible and CI/CD pipelines where applicable.
Improve deployment, configuration, and rollback processes for messaging services.
Design and enhance monitoring, alerting, and dashboards using tools such as Datadog, Site24x7, ELK, and Grafana.
Administer and troubleshoot Linux-based servers in production environments.
Manage and optimize MySQL and MongoDB databases, including performance tuning, backups, and recovery.
Work on APIs and webhooks across products and services, including enhancements and troubleshooting.
Maintain web and application servers such as Apache, Nginx, and JBoss (WildFly).
Support cloud-based and virtualized environments with exposure to auto-scaling and containerization concepts.
Collaborate with engineering teams on release planning, production deployments, and post-release validation.
Lead or contribute to incident response and RCA, focusing on long-term reliability improvements.
Track issues, changes, and reliability work using Jira and related tools.

What you’d have
B.Tech / B.E in Computer Science or a related field with 2–3 years of experience in SRE, DevOps, telecom, or CPaaS operations.
Hands-on experience with SMS gateways and messaging workflows.
Solid understanding of Linux systems, networking fundamentals, and production troubleshooting.
Strong experience with MySQL and MongoDB administration, queries, and performance optimization.
Proficiency in shell scripting and a mindset toward automation and reliability engineering.
Hands-on experience with tcpdump, Wireshark, and protocol-level troubleshooting.
Experience with monitoring, logging, and alerting systems (Datadog, ELK, Grafana, Site24x7, etc.).
Familiarity with configuration management tools like Ansible and version control systems (Git).
Working knowledge of cloud platforms, virtualization, auto-scaling, and containerization.
Strong incident management, analytical thinking, and communication skills.
Certifications such as RHCE, AWS, or SRE-related credentials are a plus.