
AWS Cloud Ops SRE

Job in New York, New York County, New York, 10261, USA
Listing for: Tata Consultancy Services
Full Time position
Listed on 2026-05-16
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: 100000 - 120000 USD Yearly
Job Description & How to Apply Below
Location: New York

Job Description

The AWS Cloud Operations / Site Reliability Engineer (SRE) is responsible for delivering secure, reliable, and scalable cloud infrastructure. This role covers Infrastructure as a Service, AWS platform release activities, AMI lifecycle management, patching, infrastructure design documentation, Terraform scripting, and maintaining visibility into the application layer and how it functions in production environments. Experience with Harness for DevOps pipelines is a strong plus.

Key and Must‑Have Skills
  • Terraform IaC (mandatory)
  • EKS container management (mandatory)
  • Troubleshooting skills during priority incidents
  • Base skills, preferably in both Linux and Windows
Required Qualifications
  • 10+ years in SRE, Cloud Ops, or DevOps with heavy AWS experience.
  • Strong hands‑on experience with AWS compute (EC2, ASG, EKS/ECS, Lambda), networking (VPC, Route 53, SG/NACL, ALB/NLB), storage (S3, EBS, EFS), and databases (RDS, Aurora, DynamoDB).
  • Expertise in AMI pipeline management, image building, and OS‑level hardening.
  • Solid experience with Terraform or CloudFormation for IaC.
  • Demonstrated ability to troubleshoot AWS and application stack issues end‑to‑end.
AWS Platform Operations & Releases
  • Own and execute AWS platform release management across environments, including validation, regression checks, and readiness reviews.
  • Operate and evolve AWS core services: VPC, IAM, KMS, Route 53, networking baselines, proxy layers, and organizational guardrails.
Infrastructure as a Service (IaaS) using Terraform
  • Build, manage, and scale cloud infrastructure using Terraform as the primary IaC tooling.
  • Create reusable Terraform modules covering networking, compute, storage, EKS, and security.
  • Ensure IaC follows best practices: versioned, immutable, peer‑reviewed, and automated through CI/CD.
Amazon EKS (Kubernetes) Operations
  • Deploy, manage, and maintain production‑grade AWS EKS clusters, node groups, and cluster add‑ons.
  • Implement Kubernetes platform standards for security, networking, namespaces, RBAC, and secrets management.
  • Work closely with application teams to ensure workloads run reliably and securely within EKS.
  • Optimize cluster scaling, workload scheduling, resource limits, and performance tuning.
AMI Lifecycle & Image Management
  • Manage the complete AMI lifecycle: creation, CIS hardening, vulnerability scanning, tagging, publishing, and deprecation.
  • Build automated AMI pipelines using image builders, Packer (if applicable), and validation workflows.
  • Maintain golden images for EC2 fleets, containers, and hybrid workloads.
VIT (Vulnerability / Integration / Integrity Testing) & Patch Management
  • Lead the VIT process, including vulnerability assessments, remediation workflows, compliance tracking, and closure.
  • Own OS‑level and image patching using AWS Systems Manager (SSM) Patch Manager and automated maintenance windows.
  • Generate patch baselines, dashboards, and compliance reports, and ensure measurable SLA adherence.
Observability & Application Layer Insights
  • Build and maintain an observability stack with CloudWatch, X‑Ray, OpenTelemetry, and log analytics.
  • Establish deep visibility into application behavior, dependencies, performance, and error patterns.
  • Create “golden signals” dashboards covering latency, traffic, errors, and saturation for both infrastructure and applications.
CI/CD & DevOps Automation
  • Implement and maintain CI/CD pipelines for infrastructure and application deployments.
  • Harness experience is an added advantage, particularly with workflows, verification steps, and deployment strategies (canary, blue/green).
  • Integrate Terraform, AMI pipelines, EKS updates, and patch automation into CI/CD systems.
Reliability Engineering & Incident Response
  • Participate in on‑call rotation; lead incident triage and root‑cause analysis.
  • Build automation and runbooks to reduce operational toil.
  • Drive architectural improvements to increase availability, resilience, and performance.
Documentation & Architecture
  • Produce high‑quality Infrastructure Design Documents (IDDs), runbooks, DR procedures, release notes, and architectural diagrams.
  • Conduct operational readiness reviews, capacity planning, and cost‑optimization assessments.
Salary Range

$100,000–$120,000 a year

Qualifications

Bachelor of Computer Science
