Cloud Systems Engineer Job Reading area, Pennsylvania USA, IT/Tech

Overview

This is a hybrid role - 2 days remote and 3 days in the Malvern, PA office.

We are seeking a highly skilled Site Reliability & Cloud Systems Engineer to design, build, and operate scalable, secure, and highly automated cloud platforms in AWS. This role combines hands‑on reliability engineering with cloud architecture and automation expertise, with a strong emphasis on building immutable infrastructure and improving system resilience.

You will play a key role in evolving our AWS ecosystem into a “push‑button” platform: reducing manual operations, embedding security into every layer, and ensuring production systems are observable, performant, and self‑healing. This role is well‑suited for a proactive engineer who excels at the intersection of infrastructure, automation, and system reliability, blending responsibilities across SRE, DevOps, and Cloud Engineering.

Responsibilities

Reliability, Performance & Operations

Ensure uptime, reliability, and performance of AWS‑hosted, Linux‑based (Ubuntu) production systems and associated lower environments
Build and optimize observability using tools like Datadog, CloudWatch, Prometheus/Grafana, and PagerDuty
Work closely with development teams to diagnose site issues, mitigate impact, and restore system reliability while communicating clearly with stakeholders
Lead incident response, root cause analysis, and post‑incident reviews
Participate in on‑call rotations and support 24/7 production environments

Cloud Architecture & Automation

Architect and implement fully automated, ephemeral, and immutable AWS production and lower environments
Design scalable, resilient distributed systems using AWS best practices
Eliminate manual processes through Infrastructure as Code (Terraform, Ansible, Packer)
Build and maintain CI/CD and GitOps workflows (Jenkins, GitHub Actions, GitLab CI, ArgoCD/Flux)
Develop automation and tooling using Python and Bash to reduce operational toil

Infrastructure & Platform Engineering

Deploy and manage AWS services including EKS, ECS, Fargate, Lambda, RDS (Aurora PostgreSQL), OpenSearch, Redis, and ElastiCache
Design and manage networking components such as Transit Gateways, load balancers, and service meshes
Implement caching, microservices, and distributed system design patterns

Security & Governance

Architect and implement zero‑trust security models using IAM, SCPs, and OIDC
Embed security into CI/CD pipelines using SAST/DAST tools (e.g., Snyk)
Ensure compliance through automated auditing, backup strategies, and governance controls

Collaboration, Leadership & Strategy

Partner with development, security, and operations teams to build reliable, observable platforms
Document systems, runbooks, and operational procedures
Drive FinOps initiatives for cost optimization and forecasting
Integrate infrastructure changes into ITIL‑compliant workflows (e.g., Freshservice)
Influence architectural decisions and promote engineering best practices across teams

Qualifications

6–10+ years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering roles
Deep hands‑on expertise with AWS services and cloud architecture
Strong Linux systems engineering experience (Ubuntu preferred)
Proven experience with Infrastructure as Code (Terraform, Ansible, etc.)
Experience building and maintaining CI/CD pipelines
Proficiency in scripting/programming (Python, Bash)
Hands‑on experience with monitoring and observability platforms
Solid understanding of cloud security principles (IAM, KMS, Secrets Management, Ansible Vault, HashiCorp Vault)
Bachelor’s degree or equivalent practical experience

Preferred Qualifications

Experience with containerization and orchestration (Docker, Kubernetes, EKS/ECS)
Familiarity with GitOps tools such as ArgoCD or Flux
Experience with SAST/DAST tools and secure SDLC practices
Knowledge of distributed systems, caching, and microservices architectures
Experience with FinOps and cost optimization strategies
Exposure to ITIL processes and service management platforms
