DevOps Engineer
Listed on 2025-12-17
IT/Tech
Cloud Computing, Systems Engineer
Full Time Employment with Blitzy. We are committed to building a transformative AI platform that revolutionizes software development. Our goal is to enable you to have a long, impactful career at Blitzy with opportunity for advancement. If you want a role where you can shape the future of AI‑powered infrastructure, read on!
Blitzy is a Boston, MA based Generative AI Start‑up on a mission to automate custom software creation to unlock the next industrial revolution. We’re building an AI‑powered platform capable of autonomously generating enterprise‑grade software, powered by thousands of cooperative AI agents working in concert.
We’re backed by multiple tier 1 investors, have had success as founders at our previous start‑up, and hold dozens of Generative AI patents.
Compensation: $140,000 – $180,000/year
Location: 1 Kendall Square, Cambridge, MA (In‑person role)
About the Role
We’re looking for an exceptional DevOps Engineer to architect and maintain the infrastructure that powers our revolutionary AI agent ecosystem. You’ll be instrumental in building scalable, resilient systems that support both our cutting‑edge AI platform and modern applications. This role offers the unique opportunity to work at the intersection of traditional DevOps and emerging AI infrastructure, creating systems that enable thousands of AI agents to collaborate seamlessly.
As our DevOps Engineer, you’ll take ownership of our entire infrastructure stack, from Kubernetes orchestration to AI agent deployment pipelines. You’ll work directly with our engineering teams to ensure our platform can scale to support enterprise customers while maintaining the performance and reliability they demand.
What Success Looks Like
- You architect and implement robust Kubernetes infrastructure that scales effortlessly to support our growing AI agent ecosystem
- You create sophisticated CI/CD pipelines that enable rapid, reliable deployment of both traditional services and AI agents
- You develop Python‑based automation that eliminates manual tasks and accelerates our development velocity
- You design monitoring and observability systems that provide deep insights into both infrastructure and AI agent performance
- You optimize our cloud infrastructure for cost‑efficiency while maintaining enterprise‑grade reliability
- You collaborate effectively with development teams to improve developer experience and productivity
- You proactively identify and resolve infrastructure bottlenecks before they impact customers
- You establish infrastructure best practices that support our rapid growth
- You build systems that can handle the unique challenges of AI workloads at scale
- You maintain 99.9%+ uptime for critical production services
Core Infrastructure:
- Kubernetes cluster design, deployment, and management for AI and application workloads
- Infrastructure as Code using Terraform for multi‑cloud environments
- Container orchestration and optimization for AI agent deployment
- Network architecture and security for distributed systems
Automation & Tooling:
- Python‑based automation scripts for infrastructure management
- Helm chart development and maintenance for application deployment
- Developer productivity tooling and automation
Monitoring & Reliability:
- Comprehensive monitoring, alerting, and tracing systems
- Performance optimization for AI workloads
- Incident response and disaster recovery planning
- Cost optimization and resource management
AI Infrastructure (Unique to Blitzy):
- Infrastructure for AI agent orchestration and management
- Resource optimization for GPU/compute‑intensive workloads
Requirements:
- 5-8 years of DevOps/Infrastructure experience
- Expert‑level Python proficiency for automation and scripting
- Deep Kubernetes expertise: deployment, scaling, troubleshooting, and optimization
- Strong experience with Helm for application package management
- Proven track record designing and implementing CI/CD pipelines
- Hands‑on experience with major cloud platforms (AWS, Azure, or GCP)
- Terraform expertise for Infrastructure as Code
- Strong Linux administration and containerization (Docker) skills
- Experience with monitoring tools (Prometheus, Grafana, ELK stack)
- Understanding of…