
Site Reliability Engineer

Job in 500001, Hyderabad, Telangana, India
Listing for: Sonata Software
Full Time position
Listed on 2026-02-04
Job specializations:
  • IT/Tech
    Cloud Computing, SRE/Site Reliability, Data Engineer, AWS
Role: Site Reliability Engineer

Location: Hyderabad

Notice Period: Immediate to 20 Days

Employment Type: Full Time

Experience

- 7–12 years in site reliability, cloud-based data infrastructure, data pipeline observability, automation, and high-availability engineering within EdTech platforms (2U)
- Primary Skills (Must-Have): AWS, CI/CD, Jenkins, IaC (infrastructure as code), Terraform, Kubernetes
- Secondary Skills (Good-to-Have): AWS systems, Dataiku data, platform updates and patching
- Tools & Platforms:
  - Data Warehousing & Processing: Snowflake, Redshift, Apache Airflow, dbt
  - CI/CD & Deployment: Jenkins, GitHub Actions, AWS CodePipeline, Terraform
  - Cloud & Event Processing: AWS Lambda, API Gateway, SNS/SQS, Kafka, Step Functions
  - Monitoring & Logging: Datadog, AWS CloudWatch, Prometheus, Splunk
  - Incident Management: PagerDuty, Opsgenie, AWS Health Dashboard
  - Collaboration & Code Review: GitHub, Jira, Confluence

Key Responsibilities

Data Pipeline Reliability & Observability:

- Maintain and optimize highly available, fault-tolerant infrastructure for data pipelines, ETL jobs, and real-time data processing

- Implement end-to-end monitoring of Airflow DAGs, Snowflake queries, and AWS-based data workflows

- Automate data pipeline health checks, error handling, and auto-remediation strategies

Infrastructure & Cloud Automation:

- Deploy and manage AWS-based data infrastructure using Terraform and CloudFormation

- Optimize Kubernetes (EKS) clusters for processing large-scale datasets and real-time analytics

- Ensure high availability and cost-efficient scaling for Redshift, Snowflake, and data storage solutions

Performance, Monitoring & Incident Response:

- Implement real-time monitoring, logging, and alerting using Datadog, AWS CloudWatch, and Prometheus

- Define and track SLOs, SLIs, and error budgets to improve data reliability and uptime

- Conduct Root Cause Analysis (RCA), security audits, and post-mortems for incidents

Security & Compliance:

- Ensure GDPR, CCPA, and SOC 2 compliance for data storage, access controls, and retention policies

- Implement AWS security best practices (IAM, KMS, Shield, WAF) to secure data access and encryption

- Secure API gateways, authentication mechanisms, and data lake permissions to prevent unauthorized access

Collaboration & Leadership:

- Work closely with data engineers, analytics teams, and DevOps engineers to enhance data platform reliability

- Participate in incident response drills, disaster recovery planning, and security compliance reviews

- Advocate for best practices in automation, cost optimization, and cloud-native data solutions