Senior Site Reliability Engineer (Production Excellence)

Job in Redwood City, San Mateo County, California, 94061, USA
Listing for: Poshmark
Full Time position
Listed on 2026-02-16
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD yearly
Position: Senior Site Reliability Engineer (Production Excellence)

About Poshmark

Poshmark is a leading fashion resale marketplace powered by a vibrant, highly engaged community of buyers and sellers and real-time social experiences. Designed to make online selling fun, more social and easier than ever, Poshmark empowers its sellers to turn their closet into a thriving business and share their style with the world. Since its founding in 2011, Poshmark has grown its community to over 130 million users and generated over $10 billion in GMV, helping sellers realize billions in earnings, delighting buyers with deals and one-of-a-kind items, and building a more sustainable future for fashion.

For more information, please visit , and for company news, visit

Senior Site Reliability Engineer (Production Excellence)

We are looking for a Senior Site Reliability Engineer to serve as the guardian of our complex, web-scale ecosystem. You won't just be "managing" systems; you will be the architect of their health, ensuring they are monitored, automated, and designed to scale flawlessly. The ideal candidate is an SRE purist who believes that automation is the antidote to toil and that deep application knowledge is the key to operating large-scale systems.

6-Month Accomplishments

  • Audit & Observe: Deep-dive into the Poshmark tech stack and infrastructure requirements.
  • Automate Toil: Master and improve existing automation tools/frameworks within the Cloud Ops organization.
  • Primary Integration: Transition from secondary on-call support to a primary contributor on small to medium-scale architectural projects.

12+ Month Accomplishments

  • System Ownership: Execute complex communications and infrastructure projects independently.
  • Precision Alerting: Engineer meaningful alerts and high-fidelity dashboards that reduce "alert fatigue" and focus on system health.
  • Architectural Evolution: Identify systemic gaps and lead the implementation of infrastructure improvements to bolster uptime.
  • Incident Leadership: Serve as a core pillar of the on-call rotation, leading incident response and blameless post-mortems.

Responsibilities

  • Serve as the primary point of accountability for the health, performance, and capacity of mission-critical, internet-facing services.
  • Partner with development teams beginning at the design phase to ensure all platforms are built with "operability" and "recoverability" at their core.
  • Improve and exchange tools that automate the deployment and monitoring of custom applications in large-scale UNIX environments.
  • Thrive in a fast-paced environment where you bridge the gap between "moving fast" and "staying up."
  • Participate in a structured 12x7 on-call rotation designed to maintain 24/7 support for production environments.

Desired Skills

  • Battle-Proven Experience: 5–8+ years in a Systems Engineering or Site Reliability role, specifically within a startup or fast-growing environment.
  • Scale Mastery: Proven track record in a UNIX-based, large-scale web operations role.
  • Production Support Mindset: Extensive experience providing 24/7 support for high-traffic production environments.
  • Cloud Architecture: Expert-level experience with AWS, GCP, or Azure.
  • The SRE Toolkit:
    • CI/CD & Config: Jenkins, Ansible, and Terraform.
    • Observability: Hands-on experience with Datadog, New Relic, Graphite, or Nagios.
    • Orchestration: Deep knowledge of Kubernetes and Docker.
    • Code: Strong scripting/coding skills used for infrastructure-as-code and automation.

Technologies we use:

  • Languages/Servers: Ruby, JavaScript, Node.js, Tomcat, Nginx, HAProxy.
  • Data & Messaging: MongoDB, RabbitMQ, Redis, Elasticsearch.
  • Infrastructure: AWS (EC2, RDS, CloudFront, S3), Kubernetes, Docker.

Note:
1) Poshmark is currently unable to provide visa sponsorship for this position.
2) This is a hybrid role based out of Redwood City, CA.

Position Requirements
10+ Years work experience