Staff Engineer, Cloud Computing
Listed on 2026-05-25
IT/Tech
Cloud Computing, Systems Engineer, SRE/Site Reliability
Staff Engineer, Cloud & Platform Engineering (Application Modernization - Agentic AI)
Location: Onsite – Grapevine, TX (5 days/week)
Employment Type: Full-Time
Staff: 9 engineers
About the Role
We are seeking a Staff Engineer, Cloud & Platform Engineering to lead and influence the modernization of the technology platform, migrating legacy applications to a cloud-native architecture on AWS. This role is hands‑on and strategic, with a strong focus on application modernization, containerization, and cloud-native infrastructure.
As a Staff Engineer, you will play a critical role in modernizing legacy Windows-based applications, building and evolving our application deployment pipelines, and leading migrations to Linux-based, containerized workloads running on AWS EKS. You will partner closely with application teams, DevOps, and leadership to ensure platforms are scalable, reliable, secure, and cost‑efficient.
Key Responsibilities
Platform & Infrastructure Leadership
- Lead the design, optimization, and evolution of AWS-native infrastructure, ensuring scalability, reliability, and security.
- Serve as a technical authority for cloud and platform decisions, aligning infrastructure strategy with business objectives.
- Design, implement, and maintain agentic AI systems for self‑healing workflows, intelligent capacity planning, automated incident response, and infrastructure‑as‑code generation.
- Lead and support application modernization efforts, including refactoring and replatforming legacy applications.
- Drive migrations from Windows-based environments to Linux-based, containerized architectures.
- Partner with application teams to migrate legacy runtimes (including older .NET frameworks) into container-based deployments on AWS EKS.
- Design, manage, and optimize AWS EKS clusters, including capacity management, performance tuning, and cost optimization.
- Implement and maintain scalable deployment patterns using Kubernetes, autoscaling, and modern cloud-native practices.
- Design and evolve CI/CD pipelines using GitLab and related tooling to support modern application delivery.
- Leverage Terraform and Infrastructure as Code (IaC) to automate provisioning and ensure consistency across environments.
- Utilize Python and other scripting languages to improve automation and operational efficiency.
- Lead investigation and resolution of complex cloud and platform incidents.
- Ensure systems meet or exceed performance, reliability, and SLA expectations.
- Partner cross‑functionally to identify and implement cloud cost optimization strategies, including use of cloud financial management tools.
- Ensure platform designs follow operational best practices and support long‑term maintainability.
- Work independently while collaborating across engineering, application, and leadership teams.
- Mentor engineers and influence best practices across cloud, DevOps, and platform engineering functions.
- Senior/Staff-level experience designing and managing large-scale IT infrastructure in cloud environments.
- Agentic AI Experience: Hands‑on experience with agentic AI systems for infrastructure automation (AI‑driven runbook execution, LLM-based ops tooling, autonomous remediation pipelines).
- Minimum 3 years of experience with production‑grade Generative AI and Agentic AI implementations across cloud, DevOps, SRE, and platform engineering.
- Strong hands‑on experience with AWS, including services such as EKS, Lambda, and supporting AWS‑native tooling.
- Deep knowledge of Kubernetes, container orchestration, and cloud-native deployment patterns.
- Proven experience with application modernization and migration projects, particularly replatforming legacy applications.
- Strong experience with CI/CD pipelines, GitLab, and DevOps methodologies.
- Experience using Terraform and Infrastructure as Code to manage cloud environments.
- Solid understanding of Linux‑based environments, or strong Windows-to-cloud migration experience with the ability to operate on Linux-based platforms.
- Exper…