Infrastructure Engineer
Listed on 2026-03-03
IT/Tech
Systems Engineer
MatX is designing a custom chip. Our engineering team works in Rust and SystemVerilog, builds with a hermetic build system, and runs compute-intensive verification workloads on a managed cluster. The infrastructure that supports this work (CI/CD, compute fleet, shared file systems, developer environments) is what you'd own.
This is a small team. There's no ops org, no ticket queue, no on-call rotation. You'd work directly with engineers who'll tell you "X is broken" or "we need Y", and you'd figure out how to make it happen. You'd use the same tools they do: git, SSH, the same VMs. The current infrastructure was built by engineers who needed it and moved on to other things; you'd inherit it, clean it up, improve it, and make it yours.
We use AI‑assisted development tools extensively as a force multiplier. You should be comfortable with that or willing to learn.
What You'll Do Here
You'd be the person who makes sure 20+ engineers can do their work without thinking about infrastructure. Concretely:
- Manage an HPC‑style compute cluster on GCP — job scheduling, autoscaling, node provisioning
- Provision and maintain developer VMs with consistent tooling, shared storage mounts, and remote desktop access
- Manage shared network storage for home directories, CAD tools, and IP libraries
- CI/CD
  - Maintain the self-hosted CI runner fleet on GCP (registration, scaling, image management)
  - Own the remote build and caching infrastructure
  - Debug CI failures that turn out to be infrastructure, not code: runner registration races, mount timing, network conflicts
- Keep the tool stack working: build system, EDA tools on shared storage, license servers, automounted shares
- Onboard new engineers onto the development environment
- Solve the kind of problems that start with "my build is slow" and end with tracing a metadata server timeout to a missing link‑local route
- Infrastructure as Code
  - All infrastructure is codified and version-controlled. You'd maintain and extend modules for VMs, networking, fleet policies, IAM, and DNS
  - Execute migrations: subnet changes, fleet resizing, blue/green cutovers
  - Review your own plans carefully, because a bad apply can take down the shared file system
- Deep Linux systems knowledge — you can debug from userspace down to syscalls and routing tables
- Infrastructure-as-code experience on a major cloud provider (we use GCP, so GCP experience is preferred)
- Comfort with networking fundamentals: VPCs, subnets, DNS, firewalls, shared file systems, SSH tunneling
- Experience managing HPC job schedulers or similar batch compute systems
- Git proficiency — you'll interact with the same repos and PR workflows as the engineering team
- Hands‑on proficiency with core networking and security concepts that influence infrastructure integrity
- Willingness to read code you didn't write to understand what infrastructure it needs
- You don't need to be a software engineer — but you should be able to read a build rule, a Rust error message, or a CI workflow and figure out what went wrong
- This is a hybrid role that requires you to work from our Mountain View, CA office three days a week, Tuesday through Thursday
- EDA/semiconductor toolchain familiarity (Synopsys, Cadence)
- Rust or Python scripting (for tooling, not product code)
- Experience with OS‑level fleet management (policies, images, package distribution)
- You don't need to write RTL or understand hardware architecture, but either is a plus
The US base salary for this full‑time position is determined based on a variety of factors, including role, experience, location, job‑related skills, and relevant education and training. Career length is only a guideline for compensation.
- $200,000 - $300,000 + equity
- A Stake in Our Success: a cash/equity mix that fits your needs, and the option to exercise early
- Health & Wellness: company-subsidized Health, Dental, Vision, and Life insurance; pre-tax Health Savings Accounts with a generous company contribution (even if you don't contribute)
- Time To Recharge: 4 weeks paid time off (accrued), 12 company holidays, and 3 weeks remote/flexible work per year
- Support to Parents: up to 12 weeks of paid parental leave, regardless of…