DevOps Engineer; AWS
Listed on 2025-12-01
IT/Tech
Systems Engineer, Cloud Computing, IT Support, Cybersecurity
About Us and the Role
We are a fast-growing software development company delivering innovative solutions to clients around the world, with a strong focus on the US market. Headquartered in sunny San Diego, California, we’ve built a team of over 300 talented professionals across multiple countries. Our success comes from a deep commitment to results and long‑term partnerships built on trust. We’re launching a transformative partnership with a new US‑based client in the data quality and enrichment space that is embarking on a bold modernization initiative.
This client is transitioning from a fragmented legacy ecosystem to a fully cloud‑native, scalable platform built on AWS. The transformation is not just technical; it’s strategic, touching every layer of infrastructure, product delivery, and operational efficiency. As a DevOps Engineer, you’ll be at the center of this journey, helping architect the systems and practices that will support long‑term growth, automation, and innovation.
You’ll play a critical role in designing and implementing the infrastructure and DevOps strategy that enables this transformation. This includes solving deep‑rooted challenges around scalability, release orchestration, cost optimization, and environment consistency, while laying the foundation for a secure, observable, and automated cloud environment. This is a hands‑on role with strategic influence, ideal for someone who thrives in high‑autonomy environments and enjoys solving complex infrastructure challenges.
Stack Snapshot
- Cloud Provider: AWS
- Compute & Orchestration: ECS, Fargate, Elastic Beanstalk, Lambda, EC2 (legacy)
- Storage & Data: S3, RDS (PostgreSQL), Aurora
- IaC: Terraform (selective use, expanding)
- Containers & Configuration: ECR, AppConfig
- Messaging & Coordination: SNS, SQS
- Monitoring & Logging: CloudWatch, RDS Console
- CI/CD: GitHub Actions, manual deployments (transitioning to full automation)
- Design and provision isolated environments for development, QA, staging, and production using AWS best practices.
- Standardize infrastructure provisioning using Terraform, ensuring consistency and version control across services.
- Improve IAM role management and automate access provisioning to support secure and flexible operations.
- Define ownership and review protocols for infrastructure changes and environment templates.
- Architect and implement unified CI/CD pipelines across a diverse service landscape using GitHub Actions.
- Integrate automated testing, linting, security scanning, and deployment validation into every pipeline.
- Formalize and enforce a branching strategy to improve release management, collaboration, and CI/CD stability.
- Introduce rollback strategies (e.g., blue/green or canary deployments) to support safe and resilient releases.
- Establish isolated QA environments to support pre‑production testing and reduce deployment risk.
- Implement auto‑scaling policies based on real‑time load metrics using ECS, Fargate, and Lambda.
- Conduct performance simulations to validate scaling behavior and forecast capacity needs.
- Address architectural bottlenecks, particularly in the database layer (e.g., write throughput, replication latency).
- Define and monitor non‑functional scalability requirements (NFRs) and continuously improve system responsiveness.
- Introduce full‑stack observability tools (e.g., Datadog, AWS X‑Ray, New Relic) for distributed tracing and performance insights.
- Implement centralized logging using the ELK stack or CloudWatch Logs Insights.
- Define and monitor SLOs/SLIs for critical services and set up alerting and dashboards for ingestion pipelines and APIs.
- Participate in incident response and postmortems, driving continuous improvement in system reliability and recovery.
- Enforce secure configuration management and secrets handling across environments.
- Support SOC 2 and GDPR compliance efforts through infrastructure‑level controls and audit readiness.
- Evaluate and implement network segmentation, VPC isolation, and firewall…