Senior SRE

Job in Palo Alto, Santa Clara County, California, 94306, USA
Listing for: Pylonlending
Full Time position
Listed on 2026-01-12
Job specializations:
  • IT/Tech
    SRE/Site Reliability, Systems Engineer
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 USD yearly
Job Description & How to Apply Below

At Pylon, we're a small team building a very ambitious product in the mortgage space.

At this early stage, we're looking for engineers who can see the opportunity of what we're building towards and want to have a hand in building it.

We're in search of people who find difficult problems invigorating and who fit well into a high-performing team built on mutual respect and reliance. If you like pushing yourself to learn a massive amount while shipping code that has a huge impact on the end product, Pylon Engineering could be a great place for you.

About the Job

You'll own reliability and operational excellence for Pylon's production systems. This means designing and implementing monitoring, alerting, and incident response processes that scale as we grow. You'll build tooling that makes the entire engineering team more effective, establish on-call rotations and runbooks, and ensure our platform can handle the demands of a regulated, high-stakes financial product.

This is not a pure ops role. At Pylon, we believe SRE work should be a maximum of 50% operational toil. If you're spending more than half your time firefighting and keeping things running, you're not doing SRE work, you're doing sysadmin work. The other 50%+ of your time should be spent writing code: building infrastructure tooling, automating away operational burden, making reliability improvements to core services, and creating internal developer productivity tools that make the entire team more effective.

SRE is about making things better, not just keeping them alive.

We're looking for someone who has operated production systems at scale in a professional engineering environment. You know what good looks like because you've built it before.

What We’re Looking For

Must-haves:

  • 4+ years of experience in SRE, infrastructure, or platform engineering roles
  • Experience working on a team of SREs at a company with mature SRE practices (not solo SRE roles)
  • Real on-call experience at scale in a large production environment (you've carried the pager and lived through incidents)
  • Deep AWS expertise (ECS, RDS, networking, security)
  • Strong experience with declarative infrastructure (Terraform, CDK, or similar)
  • Nix experience (we use it and want to expand its adoption)
  • Track record of building reliability tooling and automation
  • Can design and implement monitoring, alerting, and observability systems from first principles
  • Comfortable working in a regulated environment where "breaking things" is not an option

Nice-to-haves:

  • Experience at companies with strong SRE cultures (Google, Replit, Stripe, etc.)
  • Background in fintech, healthtech, or other regulated domains
  • Experience migrating monitoring systems or implementing SLOs
  • Contributions to infrastructure tooling or open source projects
Basics
  • Job title: Senior Site Reliability Engineer
  • Stock options: own a piece of the company and we all win together
  • Health insurance, 401K, dental, etc.
Our technology stack:

We don't require that you've worked with any of these technologies before; this is our stack, for your information:

  • Infrastructure: AWS (ECS, RDS, CloudFront, Lambda), CDK for infrastructure-as-code
  • Observability: Honeycomb, OpenTelemetry
  • CI/CD: GitHub Actions, Nix for builds and dev environments
  • Core platform: TypeScript/Node backend, PostgreSQL, React frontend
  • Languages: TypeScript, Python, Nix, SQL
About you

You:

Have operated production systems at scale. You've been on-call for a large, complex system. You know what 3am pages feel like and you've built systems to prevent them. You understand the difference between alerts that matter and noise.

Write code, not just YAML. You can build internal tools, automation, and reliability improvements. You're comfortable contributing to the core product when reliability requires it. You can read and understand the codebase you're responsible for keeping up.

Think in systems. You understand distributed systems, failure modes, cascading failures, and graceful degradation. You can diagnose production issues quickly and know when to escalate vs. when to fix.

Know your tools deeply. You've used observability platforms at scale and understand how to instrument systems properly. You can design…

Position Requirements
10+ Years work experience