AI Platform Engineer
Listed on 2026-06-13
Software Development
AWS, Cloud Engineer - Software, DevOps
What We Do
We are the go-to eCommerce platform for auto care and maintenance. We provide drivers with quality parts at competitive prices and enable them to schedule appointments with trusted mechanics directly through our website. Using world‑class design principles and the latest technologies, we deliver a fast, intuitive digital experience backed by our company‑owned national distribution network.
Our Culture
Our culture goes beyond our core values of Safety First, Customer Focused, and Commitment to Excellence. We are a performance‑driven, data‑focused, and fast‑paced team where results matter and winning is expected.
- Hungry & Hardworking: We set ambitious goals, measure progress with clear metrics, and hold ourselves accountable to deliver results.
- Promote from Within: We reward top performers with opportunities for growth and advancement.
- Collaborative & In‑Person: We believe the best ideas and fastest execution happen face‑to‑face.
- High Standards: We move quickly, pay attention to details, and dig deep – whether it’s analyzing contracts, aggregating complex scenarios, or building clear, data‑driven presentations.
- No Passengers: We value grit, ownership, and the relentless pursuit of results.
One exceptional engineer. AI as the team.
This is not a standard DevOps posting. We are looking for one unusually capable, AI‑native engineer to own our entire platform engineering and SRE function – using autonomous agents, LLM‑powered pipelines, and MCP‑based tooling as force multipliers to do the work of a team, on‑site, in close partnership with our engineering leadership.
You will inherit a mature, fully containerized AWS estate (9 EKS clusters, 27 accounts, 228 Kubernetes nodes), an Akamai CDN layer managing live traffic splits, GitHub Actions + Jenkins CI/CD pipelines for a Webpack 5 micro‑frontend monorepo, and an operational AI agent platform – Ops Whisperer – already in production monitoring 25 AWS accounts with a 91% autonomous resolution rate.
Your job is to extend all of it, automate what remains manual, and be the person who makes every deployment, incident, and infrastructure change happen with speed, precision, and intelligence.
Scope of Ownership
What You’ll Own
AWS Multi‑Account Infrastructure
- EKS clusters across dedicated AWS accounts
- EC2 worker nodes via Auto Scaling Groups
- SQS pipelines
- AWS Bedrock (Claude) for AI agent workloads
- EKS clusters
- Node group management
- Kops clusters alongside EKS
- Multiple environment tiers with full blast‑radius isolation
- Multiple Repos
- GitHub Actions workflows + Jenkins pipeline management
- Turbo build system across multiple micro‑frontend packages
- Canary release gating and rollback automation
- Akamai Property Manager config
- Phased Release Cloudlet for Canary and Production split
- Security, throttling and monitoring
- Jenkins‑driven cache invalidation
- Elastic/Kibana
- CloudWatch across all AWS accounts
- Business performance monitoring
- SQS backlog + pipeline health alerting
- On‑call ownership with proactive, AI‑assisted triage
This is a role where AI fluency is not a bonus – it is how you do the job. We expect you to build, operate, and improve autonomous agents that handle monitoring, alerting, triage, and routine operational work. You are not just a consumer of AI tools; you are the person who builds them, deploys them into production, and iterates on them based on real operational data.
You will extend Ops Whisperer (AI Platform and Observability agent), contribute to the Axle platform, build MCP servers that give agents new capabilities, and apply LLM‑powered reasoning to infrastructure problems that previously required multiple humans. If you’ve never built an agent that runs in production unsupervised, this is not the right role.
What You’ll Inherit & Extend
The tech stack
Cloud & Orchestration
- AWS EKS
- Kubernetes
- Kops
- AWS Organizations
- Auto Scaling Groups
- AWS SQS
- AWS Bedrock
- CloudWatch
- Akamai Property Manager
- Phased Release Cloudlet
- Fast Purge
- Content Protector
- GitHub Actions
- Jenkins
- Turbo (monorepo)
- Webpack…