Software Engineer - Infrastructure Job, San Francisco area, California, USA, Software Development

Emergent builds autonomous coding agents that replace traditional software development by generating, testing, and deploying production applications directly from plain-language intent. Our systems run in production at global scale and are used to build millions of real applications.

Since public launch, Emergent has reached $100M ARR in 8 months. 6M+ users across 190+ countries have built 6.5M+ applications on Emergent. We've raised $100M+, backed by Khosla Ventures, SoftBank, Google, Lightspeed, Prosus, Together, and Y Combinator.

We're solving the hard part of AI-driven software creation: correctness, reliability, security, and scale in real production systems. The team is built by repeat founders, Olympiad medalists, IIT & IIM alumni, and leaders from Google, Amazon, and Dropbox.

We're hiring builders who want ownership, speed, and impact at global scale.

What You'll Be Responsible For

Platform & Infrastructure

Maintain the stability of our platform, which consists of distributed microservices that interact closely with Kubernetes and cloud providers (GCP, AWS)
Manage Kubernetes workloads with ArgoCD (GitOps): deploy, monitor, and troubleshoot application syncs, resource trees, and rollouts
Debug and resolve complex Kubernetes issues across clusters
Manage CDN and edge infrastructure (Cloudflare) for performance, caching, and traffic management
Automate infrastructure lifecycle operations and workflows

Observability & Incident Response

Own the observability stack: Grafana (dashboards, Loki logs, Prometheus metrics) and New Relic (APM, golden metrics, transaction analysis)
Enhance monitoring, alerting, and distributed tracing across services
Participate in on-call rotation via PagerDuty, handle incident response, and perform root cause analysis
Proactively identify reliability risks before they become incidents

AI Agent Infrastructure

Support the platform that runs AI agent workloads: job scheduling, trajectory tracking, environment provisioning, deployments, and cost attribution
Develop Kubernetes controllers and operators to extend platform capabilities for agent orchestration

Collaboration & Internal Tooling

Work closely with product and backend teams to ensure platform scalability and reliability
Build internal tools, automate workflows, and integrate systems to improve team productivity
Stay current with Kubernetes releases, CNCF ecosystem updates, and cloud-native best practices

What We're Looking For

Core Requirements

4+ years of software/platform engineering experience with production systems
Strong proficiency in Go or Python: you write production code in at least one of them daily
Hands-on experience building and deploying services on Kubernetes: not just YAML, you've developed something that runs on K8s
Experience with GitOps tooling (ArgoCD, Flux, or similar)
Strong networking and DNS fundamentals: TCP/IP, HTTP, load balancing, DNS resolution, TLS, and debugging connectivity issues
Solid Linux/OS fundamentals: process management, file system, memory, systemd, and comfort debugging with tools like strace, tcpdump, and netstat

Data & Messaging Infrastructure

Relational databases: experience with PostgreSQL, MySQL, or similar; indexing, query optimization, replication, and backup/restore procedures
NoSQL databases: familiarity with MongoDB, DynamoDB, Redis, or similar for document/key-value workloads
Caching: experience with Redis, Memcached, or similar for application and infrastructure-level caching
Message queues & streaming: hands-on with Kafka, SQS, RabbitMQ, or similar for event-driven architectures
Strong SQL skills for debugging and operational queries

Infrastructure & Observability

Comfortable with the CNCF ecosystem: Helm, Kustomize, cert-manager, Ingress controllers, CNI/CSI interfaces
Hands-on with at least one observability stack (Grafana/Prometheus/Loki, New Relic, Datadog, or similar)
Familiarity with GCP and/or AWS: managed Kubernetes (GKE/EKS), networking, IAM, storage, and cloud-native services (SES, SQS, S3, etc.)
Experience with CDN/edge platforms (Cloudflare, CloudFront, or similar)

Nice to Have

Experience building Kubernetes Operators (kubebuilder, operator-sdk, or controller-runtime)
Familiarity with AI/LLM…