DevOps Engineer Job, Cambridge area, Massachusetts, USA, IT/Tech

Full‑time employment with Blitzy. We are committed to building a transformative AI platform that revolutionizes software development. Our goal is to enable you to have a long, impactful career at Blitzy with opportunities for advancement. If you want a role where you can shape the future of AI‑powered infrastructure, read on!

Blitzy is a Boston, MA‑based generative AI start‑up on a mission to automate custom software creation to unlock the next industrial revolution. We’re building an AI‑powered platform capable of autonomously generating enterprise‑grade software, powered by thousands of cooperative AI agents working in concert.

We’re backed by multiple tier 1 investors, have a track record of success as founders at our previous start‑up, and hold dozens of generative AI patents.

Compensation: $140,000 – $180,000/year

Location: 1 Kendall Square, Cambridge, MA (In‑person role)

About the Role

We’re looking for an exceptional DevOps Engineer to architect and maintain the infrastructure that powers our revolutionary AI agent ecosystem. You’ll be instrumental in building scalable, resilient systems that support both our cutting‑edge AI platform and modern applications. This role offers the unique opportunity to work at the intersection of traditional DevOps and emerging AI infrastructure, creating systems that enable thousands of AI agents to collaborate seamlessly.

As our DevOps Engineer, you’ll take ownership of our entire infrastructure stack, from Kubernetes orchestration to AI agent deployment pipelines. You’ll work directly with our engineering teams to ensure our platform can scale to support enterprise customers while maintaining the performance and reliability they demand.

What Success Looks Like

You architect and implement robust Kubernetes infrastructure that scales effortlessly to support our growing AI agent ecosystem
You create sophisticated CI/CD pipelines that enable rapid, reliable deployment of both traditional services and AI agents
You develop Python‑based automation that eliminates manual tasks and accelerates our development velocity
You design monitoring and observability systems that provide deep insights into both infrastructure and AI agent performance
You optimize our cloud infrastructure for cost‑efficiency while maintaining enterprise‑grade reliability
You collaborate effectively with development teams to improve developer experience and productivity
You proactively identify and resolve infrastructure bottlenecks before they impact customers
You establish infrastructure best practices that support our rapid growth
You build systems that can handle the unique challenges of AI workloads at scale
You maintain 99.9%+ uptime for critical production services

Areas of Ownership

Core Infrastructure:

Kubernetes cluster design, deployment, and management for AI and application workloads
Infrastructure as Code using Terraform for multi‑cloud environments
Container orchestration and optimization for AI agent deployment
Network architecture and security for distributed systems

Automation & Tooling:

Python‑based automation scripts for infrastructure management
Helm chart development and maintenance for application deployment
Developer productivity tooling and automation

Monitoring & Reliability:

Comprehensive monitoring, alerting, and tracing systems
Performance optimization for AI workloads
Incident response and disaster recovery planning
Cost optimization and resource management

AI Infrastructure (Unique to Blitzy):

Infrastructure for AI agent orchestration and management
Resource optimization for GPU/compute‑intensive workloads

Required Technical Experience

5-8 years of DevOps/infrastructure experience
Expert‑level Python proficiency for automation and scripting
Deep Kubernetes expertise: deployment, scaling, troubleshooting, and optimization
Strong experience with Helm for application package management
Proven track record designing and implementing CI/CD pipelines
Hands‑on experience with major cloud platforms (AWS, Azure, or GCP)
Terraform expertise for Infrastructure as Code
Strong Linux administration and containerization (Docker) skills
Experience with monitoring tools (Prometheus, Grafana, ELK stack)
Understanding of…

