Infrastructure Engineer Job - Mountain View area, California, USA - IT/Tech

What MatX Is Building

We're a small engineering team designing a custom chip. The work is compute-heavy and tooling-heavy: hermetic builds, large verification jobs, custom developer environments, a self-hosted CI fleet, and a steadily growing collection of internal services that engineers depend on every day. The infrastructure that supports all of this - CI/CD, compute, shared file systems, networking, internal tooling - already exists and has a system owner.

We're hiring a second infrastructure engineer to broaden our capacity and add depth in areas adjacent to what we already have.

We're looking for a strong generalist with a network and systems bent. Someone who's comfortable debugging a Linux kernel issue in the morning, untangling a cloud networking problem at lunch, and writing a new MCP server for an unfamiliar protocol in the afternoon.

What You'll Do Here

Day to day, the work spans:

Linux and networking work (the core of the role)

* Diagnose and fix issues across the OS, network, and cloud stack

* Reason about routing, DNS, firewalls, VPCs, private connectivity, and trust boundaries

* Track down "permission denied" that's actually a mount option, or "build is slow" that's actually a metadata-server timeout

* Improve, harden, and extend the network and host configuration we already have

Building tools and integrations

* Write internal tools, scripts, and small services that make the engineering team faster

* Pick up unfamiliar protocols and codebases and ship working integrations against them

Supporting the infrastructure stack

* Pair with the system owner on compute, CI, shared storage, developer VMs, and the Terraform-managed cloud setup; take ownership of areas as you grow into them, and cover when they're out

* Execute and review production changes carefully - a bad apply can take down the shared file system

Helping engineers

* Onboard new hires and debug their environment problems

* Solve the kind of problems that start with "X is broken" and end with a fix three layers down the stack

Who You Are

We care more about instincts and pattern recognition than a checklist of tools. The right person has seen enough systems like ours to know which questions to ask.

* Deep Linux systems knowledge. You can debug from userspace down to syscalls and routing tables, and you've spent enough time with namespaces, mounts, and process semantics to recognize their failure modes on sight

* Deep networking. VPCs, DNS, firewalls, shared file systems, private connectivity. Has opinions on when to reach for peering vs a private-service endpoint vs an identity-aware proxy vs an overlay network - and can articulate which choices expand the trust boundary and which don't

* Strong generalist instincts. You don't need a paved path to make progress. You'll learn enough of a build system to debug a remote-cache miss, ship a small service against a protocol you've never seen, or read upstream source to verify a claim - preferring the source over the docs when it matters

* Infrastructure-as-code experience on a major cloud. Comfortable in production: reading plans, reasoning about drift, executing migrations without taking the cluster down. We use Terraform on GCP; depth there is a plus, but the principles transfer and we'll happily talk to people coming from AWS, Azure, or other IaC tools

* Conservative about new patterns. When introducing a new module or tool, reads a few siblings first to pick up conventions. Spots and questions inherited patterns that don't apply to the new use case

* Threat-modeling instincts for shared infrastructure. Reasons about who can talk to what, what gets cached and trusted by whom, and the blast radius when something goes wrong. Distinguishes load-bearing security choices from defense-in-depth

* Operational thinking. Reasons about apply ordering, coordination windows, and "what fails first if X is misconfigured"

* Surgical git workflow. Knows the rebase tooling well enough that rewriting a branch isn't scary. Splits unrelated work into separate PRs. Never resorts to --no-verify or destructive shortcuts to make a problem go away

* This is a hybrid role that requires you to work from our Mountain View, CA office three days a week, Tuesday through Thursday

Bonus Points If You Have

* GCP depth specifically: IAM, managed compute, identity-aware proxies

* Bazel and remote build/cache internals; buildbarn or equivalent

* Operating batch compute or job schedulers - HPC, Slurm, Nomad, Kubernetes batch, or similar

* Working understanding of token-based auth and cloud identity flows

* Rust or Python scripting for tooling (not product code)

* EDA/semiconductor toolchain familiarity (Synopsys, Cadence)

* Managing fleets at the OS level: policies, images, package distribution

* You don't need to write RTL or understand hardware architecture, but either is a plus

* You don't need to be a product-software engineer - but you should be able to read a build rule, a Rust error message, or a CI workflow and figure out what went wrong, and write small…