Infrastructure Software Engineer | San Francisco Area, California, USA | Software Development

Normal Computing | Incredible Opportunities

The Normal Team builds foundational software and hardware that help move technology forward, supporting the semiconductor industry, critical AI infrastructure, and the broader systems that power our world. We work as one team across New York, San Francisco, Copenhagen, and London.

Your Role in Our Mission

We’re looking for an Infrastructure Software Engineer to build the production systems behind Normal’s AI products.

This is an application engineering role focused on infrastructure‑shaped software: orchestration services, execution runtimes, internal APIs, persistence layers, observability, and developer experience. You’ll help define the runtime layer for a new class of AI products: systems where agents execute long‑running work, coordinate across distributed environments, interact with code and tools, and need to be reliable enough for real customer workflows.

This role sits between product engineering, AI engineering, and platform engineering. You will not primarily be managing Terraform, Helm charts, CI/CD, or company‑wide SaaS infrastructure. Instead, you’ll own the application‑level infrastructure that powers long‑running AI workflows: session lifecycle, sandboxed execution, workload orchestration, persistence, observability, reliability, and the internal interfaces other engineers build on.

The systems you build will be used directly by product, AI, research, and platform teams as new capabilities move from early ideas into production. Developer experience matters: APIs should be understandable, failure modes should be debuggable, and abstractions should make the right thing easy.

This is a highly cross‑functional role for someone who enjoys ambiguity, cares about clean abstractions, and wants to help shape how a frontier AI company builds and operates production systems. Strong engineering judgment and ownership matter more than rigid specialization.

On any given day, you might design the runtime architecture for a new AI product capability, build the orchestration layer for long‑running autonomous workflows, improve how workloads are scheduled and isolated across distributed environments, or create the systems abstractions that let engineers turn ambitious AI prototypes into reliable production products.

Responsibilities

Build and maintain production software infrastructure for Normal’s AI products, especially orchestration, execution, and runtime systems.
Design internal backend services and APIs used by product engineers, AI engineers, execution services, and other internal systems.
Improve the operational maturity of rapidly evolving systems through better state management, failure handling, metrics, tracing, and debugging tools.
Work with Kubernetes‑backed execution environments, including container lifecycle, scheduling behavior, autoscaling, resource isolation, and runtime reliability.
Build developer‑facing tools and abstractions that make it easier for other engineers to use and extend the systems you own.
Turn promising prototypes into durable production systems by designing clear abstractions, hardening critical paths, and creating operational patterns that scale with the product.
Collaborate closely with product, AI, research, and platform engineers to define the right interfaces between product features, AI workloads, and production infrastructure.
Lead design discussions for core runtime and orchestration systems, including API boundaries, state management, execution models, and operational tradeoffs.

What We’re Looking For

4+ years of experience in infrastructure software, backend infrastructure, production infrastructure, platform engineering, distributed systems, or related areas.
Strong software engineering fundamentals, including backend programming, APIs, data modeling, concurrency, debugging, and testing.
Experience building or operating production services where reliability, observability, and maintainability matter.
Practical experience with Docker and Kubernetes, including debugging containerized workloads, deployments, networking, resource limits, and lifecycle issues.
Comfort working with persistence systems such as Postgres,…