
Cloud Site Reliability Engineer (SRE)

Job in Berkeley Heights, Union County, New Jersey, 07922, USA
Listing for: The Judge Group
Full Time position
Listed on 2026-06-03
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer
Salary/Wage Range or Industry Benchmark: 70 - 80 USD Hourly
Job Description & How to Apply Below
Position: Cloud Site Reliability Engineer (SRE) (1134534)

Job Title: Cloud Site Reliability Engineer (SRE)

Location: Berkeley Heights, NJ / Alpharetta, GA (Onsite 5 Days)

Duration: Contract-to-Hire

Salary: $70.00 USD Hourly - $80.00 USD Hourly

Position Overview:

We are seeking a Cloud Site Reliability Engineer (SRE) to drive the reliability, scalability, and performance of our cloud-based infrastructure. The ideal candidate combines software engineering expertise with advanced systems operations skills to maintain highly available systems while reducing operational toil. This role covers automation, monitoring, capacity planning, incident response, and cloud platform management across a dynamic, distributed environment. As a Cloud SRE, you will work closely with the Engineering, Architecture, DevOps, and Security teams to ensure seamless service experiences for our customers while contributing to platform design and operational efficiency.

Position Requirements:
Our engineers play a critical role in the success of our clients and are expected to communicate our recommended solutions effectively, acting as consultants to each client. A successful candidate will therefore bring a high degree of self‑management, personal accountability, strong communication skills, and a commitment to teamwork. The ability to interact, engineer, and communicate collaboratively at the highest technical levels with customers, vendors, partners, and all members of staff is required.

Key Responsibilities
  • System Reliability & Availability:
    Design and maintain fault‑tolerant, high‑availability architectures across AWS, Azure, and GCP. Implement redundancy, load balancing, and automated failover strategies.
  • Cloud Infrastructure Management:
    Deploy, manage, and optimize cloud resources using IaC tools such as Terraform and Ansible.
  • Monitoring & Observability:
    Implement monitoring, alerting, and logging frameworks using Splunk, Azure Monitor, Dynatrace, AWS CloudWatch, or similar tools to detect and resolve issues proactively.
  • Incident Management:
    Lead real‑time incident response, root‑cause analysis, and post‑mortems to continuously improve uptime and resilience.
  • Capacity Planning & Scaling:
    Predict traffic patterns, optimize resource utilization, and enforce autoscaling and performance best practices.
  • Automation & Tooling:
    Develop scripts and internal tooling that automate routine tasks and reduce manual intervention. Languages may include Python, PowerShell, or Bash.
  • Security & Compliance:
    Collaborate with security teams to implement secure infrastructure practices including encryption, role‑based access, auditing, and vulnerability management.
  • Collaboration & Mentorship:
    Work across engineering and DevOps teams, providing guidance on reliability best practices and mentoring junior SREs.
Required Skills & Qualifications
  • Programming & Scripting:
    Proficiency in Python, PowerShell, Bash, or an equivalent language for automation and system management.
  • Cloud Platforms:
    Hands‑on experience with AWS, Azure, or GCP; strong understanding of VPCs, IAM, serverless architectures, and managed Kubernetes services.
  • Containers & Orchestration:
    Experience with Docker and Kubernetes.
  • Infrastructure as Code (IaC):
    Proficiency in Terraform and Ansible.
  • Monitoring & Observability:
    Expertise with Splunk, Azure Monitor, Dynatrace, AWS CloudWatch, or similar tools.
  • Data Migration:
    Expert knowledge of, and practical experience with, cloud data migration tools.
  • Operating Systems:
    Advanced knowledge of Windows and Linux/Unix environments, with experience in system administration and networking fundamentals.
  • Incident Response:
    Strong problem‑solving skills under pressure, with experience managing outages and mitigating risk.
  • Collaboration & Communication:
    Ability to articulate technical insights, coordinate across teams, and contribute to a blameless culture to resolve issues and drive consistent results.
  • Gather feedback from the department on areas for improvement and provide solutions using Azure.
Preferred Qualifications
  • Industry certifications such as AWS Certified Solutions Architect, Google Cloud Professional Cloud DevOps Engineer, or Azure DevOps Engineer Expert.
  • Exposure to chaos engineering or resilience testing frameworks.
  • Prior experience with multicloud deployments or hybrid cloud environments.
  • Familiarity with SLOs, SLIs, and error budgets for service reliability.