Platform Engineer
Listed on 2026-06-13
Software Development
AWS
Location: San Francisco, CA (on-site)
Compensation: $200,000 to $250,000 base salary, plus bonus and equity
Overview
Shields Group Search is partnering with a fast-growing, Series A AI infrastructure company building the connective layer between AI agents and the tools people use every day, including GitHub, Gmail, Notion, Salesforce, and more.
The company is building core infrastructure that allows agents to safely and reliably communicate with external tools, execute workflows, manage authentication, run code, trigger actions, and operate across real‑world software environments.
They recently raised a $25M Series A from top‑tier investors and have seen rapid revenue growth, with customers ranging from early AI‑native startups to major technology companies.
This role is for a hands‑on Site Reliability Engineer / Platform Engineer who can help scale, harden, and own the company’s infrastructure as usage grows. The team is looking for someone with real production experience managing cloud infrastructure, reliability, observability, deployment systems, and high‑availability backend services.
This is an individual contributor role. Management experience is not required.
The ideal candidate has hands‑on experience across SRE, DevOps, backend engineering, infrastructure engineering, cloud platforms, distributed systems, and performance optimization. They should be comfortable owning infrastructure in a fast‑moving startup environment and should be able to show that they build, experiment, and go deep outside of assigned work.
What You’ll Do
- Own reliability, scalability, observability, and performance across core production infrastructure
- Manage and improve infrastructure across cloud platforms such as AWS, Vercel, and related systems
- Build and improve the platform infrastructure supporting AI agent workflows, tool execution, authentication, triggers, APIs, sandboxes, and runtime orchestration
- Design and operate reliable backend systems that interact with many third‑party tools and APIs
- Improve infrastructure supporting high‑throughput, distributed, cloud‑native services
- Work across cloud infrastructure, Linux systems, containers, deployment pipelines, service orchestration, CI/CD, and observability tooling
- Build automation that reduces operational burden and improves incident response
- Develop internal productivity tooling, runbooks, monitoring, alerting, dashboards, and reliability workflows
- Debug complex production issues across application, infrastructure, network, database, deployment, and runtime layers
- Improve system performance through tracing, profiling, database query optimization, workflow optimization, CPU/heap profiling, and deep root‑cause analysis
- Help manage and improve multiple execution environments, including serverless runtimes, sandboxed code execution, and related backend systems
- Partner closely with product engineers and customers to support important workloads and improve the platform in the process
- Write clear documentation that explains complex systems, operational patterns, and infrastructure decisions
- Help define the reliability culture, infrastructure standards, and technical bar for a small, high‑craft engineering team
Qualifications
- 4+ years of software engineering, site reliability engineering, infrastructure engineering, DevOps, platform engineering, or distributed systems experience preferred, but not a hard requirement for exceptional candidates
- Hands‑on experience managing production infrastructure across cloud environments
- Experience with AWS, Vercel, Kubernetes, Linux, containers, deployment systems, observability tools, or similar infrastructure
- Strong backend engineering fundamentals and ability to write production‑quality code
- Experience with monitoring, tracing, logging, alerting, incident response, and system performance
- Experience scaling and operating distributed systems, microservices, APIs, databases, queues, or high‑throughput backend services
- Ability to debug hard production issues across many layers of the stack
- Strong systems thinking and ability to understand how infrastructure, application code, databases, deployments, and customer‑facing workflows interact
- Ability…