Senior Software Engineer, Reliability Engineering

Job in Redwood City, San Mateo County, California, 94061, USA
Listing for: Box
Full Time position
Listed on 2026-05-01
Job specializations:
  • Software Development
    DevOps, Cloud Engineer - Software, Software Engineer
Salary/Wage Range or Industry Benchmark: 125000 - 150000 USD Yearly
Job Description & How to Apply Below

WHAT IS BOX?

Box (NYSE: BOX) is the leader in Intelligent Content Management. Our platform enables organizations to fuel collaboration, manage the entire content lifecycle, secure critical content, and transform business workflows with enterprise AI. We help companies thrive in the new AI‑first era of business. Founded in 2005, Box simplifies work for leading global organizations, including JLL, Morgan Stanley, and Nationwide. Box is headquartered in Redwood City, CA, with offices across the United States, Europe, and Asia.

WHY BOX NEEDS YOU

The Reliability Engineering team at Box ensures our platform delivers world‑class performance, scalability, and reliability as we continue to serve millions of users worldwide. As our business grows, so does the complexity of operating distributed systems. Our mission is to proactively identify and solve the hardest reliability and performance challenges across Box’s infrastructure, working closely with product and platform teams to build resilient, scalable, and highly performant services.

As a Senior Software Engineer on the Reliability Engineering team, you’ll have a direct impact on the performance and scalability of our most critical services.

What You’ll Do
  • Partner with product and platform engineering teams to assess service designs for scalability and performance risks, ensuring systems are built for long‑term growth.
  • Analyze production workloads, system metrics, and load test results to identify bottlenecks, resource inefficiencies, and architectural scaling limits.
  • Design and build frameworks for load testing, capacity modeling, and performance validation that enable teams to proactively address scale concerns.
  • Drive improvements in backend service efficiency, API response times, and resource utilization across Box’s globally distributed platform.
  • Collaborate with SRE, infrastructure, and platform teams to optimize scaling strategies, auto‑scaling policies, and resource allocation.
  • Build automation and tooling that integrate performance validation into CI/CD pipelines, enabling early detection of regressions.
  • Participate in root cause analysis of performance‑related incidents, identify systemic issues, and drive cross‑team remediation efforts.
  • Contribute to the evolution of observability standards (SLIs, SLOs, latency/error budgets) that measure and guide service health.
Who You Are
  • 5+ years of experience in software engineering, performance engineering, or site reliability engineering, with a focus on backend systems and scalability.
  • Proficient in one or more programming languages such as Go or Java, with an emphasis on building performant services.
  • Strong understanding of distributed systems, concurrency, resource contention, and efficient system design.
  • Hands‑on experience analyzing and improving application and system performance across compute, storage, database, and networking layers.
  • Familiarity with load testing and performance benchmarking tools (e.g., Locust, JMeter, Gatling, or custom frameworks).
  • Experience working with cloud infrastructure (AWS, GCP) and container orchestration (Kubernetes).
  • Proficient with observability tools and telemetry systems (e.g., Prometheus, Chronosphere, Grafana, Datadog, ELK).
  • Excellent problem‑solving and analytical skills, with a data‑driven approach to diagnosing complex system behaviors.
  • Strong collaboration and communication skills; comfortable partnering across engineering teams to drive reliability improvements.
Preferred Qualifications
  • Experience with service mesh technologies (Istio, Envoy) and cloud‑native networking performance optimization.
  • Exposure to capacity planning, cost optimization, and long‑term resource forecasting in cloud environments.
  • Familiarity with incident response processes, post‑incident reviews, and reliability improvement practices.
  • Experience contributing to internal platforms, developer tooling, or performance automation frameworks.
METHODOLOGY
  • Agile management - Scrum
  • Issue tracking tool - Jira
  • Knowledge repository - Confluence
  • Code reviews - GitHub Enterprise
  • Version control system - Git
EQUAL OPPORTUNITY

We are an equal opportunity employer and value diversity at our company. We do not discriminate on…

Position Requirements
10+ Years work experience