Software Engineer, Infrastructure
Job in
San Francisco, San Francisco County, California, 94199, USA
Listed on 2026-06-05
Listing for:
COL Limited
Full Time position
Job specializations:
- Software Development
- Software Engineer, DevOps
Job Description & How to Apply Below
Apollo Research's mission is to reduce the risks from scheming frontier AI systems. We work with and are trusted by every frontier AI lab. We test their models before deployment and collaborate with them on scheming mitigations.
We’re looking for a Software Engineer to build the platform that the rest of Apollo runs on. This platform determines how quickly we can take on new research and scale our operations, whilst keeping our most valuable assets secure.
Apollo is fast‑growing and fast‑moving, and this role is critical both to enabling that growth and to strengthening the trust of our partners, which is essential to our mission.
RESPONSIBILITIES
- Help set the vision for the platform. As Apollo takes on greater and more complex research, the platform must keep pace. You’ll talk with researchers and engineers, understand where things are heading, and propose a plan for what we build, why, and in what order.
- Build and maintain Apollo's cloud infrastructure. This means IaC, networking, environment management, observability, and cost control. Everything should be reproducible, auditable, and as automated as possible.
- Protect Apollo’s assets. Frontier labs trust us to evaluate their models pre‑release. It is essential to build our infrastructure in such a way that our researchers can do this work securely so that labs continue to trust us. You’ll be a key part of this work, designing and implementing controls to keep these assets safe.
- Keep things running. We run a collection of services for our team. When something breaks, you’ll be one of the first people they come to for help. You know when a quick fix is fine and when you need to take the time to do it properly.
- Create infrastructure for agent tooling. As more of our team builds and deploys agents, you’ll develop the platform that makes this safe, reliable, and maintainable. You’ll work closely with our security lead to make sure the infrastructure meets the bar our partnerships require.
- Strengthening Security with Agents: Apollo faces threats every day and we want to make sure our systems are battle‑hardened. You would build harnesses that use agents to test Apollo’s internal isolation and public‑facing infrastructure, finding flaws and then reporting and remediating them before an attacker has a chance.
- Observability platform: Stand up the metrics, logging, and health monitoring layer that any Apollo engineer can plug into and immediately get usage, health, and security signals for their service.
- Multi‑cloud GPU orchestration: Build the platform that finds and provisions GPUs across providers on demand, so that a researcher asking for 10 H200s gets a working cluster with packages, networking, and IAM in minutes.
- Internal service platform: All employees are now capable of building services with the help of coding agents. This comes with both massive upsides and security concerns. You would build the paved road that mitigates these risks and lets any Apollo engineer, researcher, or employee ship a service. This platform becomes a catalyst for all employees delivering more securely.
- 7+ years of experience in infrastructure, platform, or DevOps engineering
- Strong working experience with AWS, Azure, or GCP in a multi‑account and multi‑project setup
- Experience with Infrastructure as Code (Terraform, Pulumi, or similar)
- Experience with containerisation technologies like Docker
- Experience designing and owning infrastructure for a growing engineering team, not just contributing to an existing setup
- Experience building systems in programming languages such as Python, Rust, or Go
- You've built and operated services that real users depend on, and you've been responsible for their uptime, reliability, and scaling
It will be a bonus if you have the following:
- Exposure to agentic AI workflows or building platforms that support AI/ML workflows
- Familiarity with GPU workloads, ML training infrastructure, or research compute
- Demonstrated interest in AI safety (e.g. worked at an AI safety org, relevant coursework or research)
- This role offers a market‑competitive salary, equity, and competitive benefits.
- Salary: 215k - 265k USD
- Flexible work hours and schedule
- Unlimited…