Sr Site Reliability Engineer - Veza Job, Santa Clara area, California, USA, IT/Tech

Position: Sr Staff Site Reliability Engineer - Veza
Veza is the pioneer in identity security, purpose-built to answer the fundamental question enterprises face: who can and should take what action on what data. Veza's Access Graph platform maps an organization's entire identity ecosystem across users, groups, roles, policies, permissions, and resources, providing deep visibility and control over human, non-human, and agentic identities across SaaS, cloud, on-prem, and custom applications.

With over 30 billion access permissions under management, global enterprises including Blackstone, Expedia, and Wynn Resorts trust Veza to manage privileged access monitoring, non-human identity security, access entitlement management, and next-generation identity governance.

Founded in 2020 and headquartered in Redwood City, California, Veza is now part of the ServiceNow family, with the acquisition closing in March 2026. The combination brings together Veza's AI-native Access Graph with ServiceNow's AI Control Tower and agentic workflows, enabling organizations to enforce end-to-end identity security rooted in the principle of least privilege across applications, data, cloud environments, and AI agents.

For engineers joining Veza today, this means the scale and resources of an enterprise platform company, with the product velocity and mission-driven focus of a security innovator at a pivotal moment in the industry.

**3 days onsite at the Redwood City office**

We are seeking an exceptional Sr Staff Site Reliability Engineer to lead critical infrastructure initiatives and drive innovation across our organization. You'll architect scalable solutions, navigate complex technical challenges independently, and deliver results under tight deadlines in a fast-paced environment. You'll work cross-functionally alongside builders who have helped shape the success of companies such as Google, Okta, AWS, and Snowflake.

We are building the next generation identity security platform for the multi-cloud era - will you join us?

**You will:**

**Strategic Leadership & Technical Execution**

+ Lead enterprise-wide reliability and infrastructure projects across multiple teams with high autonomy

+ Navigate ambiguous problem spaces and deliver innovative solutions under tight deadlines

+ Architect and deploy solutions for Cloud Prem and SaaS customers at scale

+ Drive technical innovation and establish SRE best practices across the organization

+ Respond to critical incidents, lead root cause analysis, and implement long-term resolutions

+ Develop automation solutions to streamline operations and reduce manual workload

+ Participate in on-call rotation and ensure effective incident handoff and documentation

**Cross-Functional Collaboration & Communication**

+ Partner with Engineering, Product, and Customer Success teams to align reliability goals with business objectives

+ Communicate complex technical concepts effectively to technical and non-technical audiences, including executives

+ Influence technical decisions across teams through thought leadership and demonstrated expertise

+ Build consensus and drive adoption of new tools, processes, and architectural patterns

**Customer-Facing Technical Leadership**

+ Provide tier 2/3 technical support to enterprise customers for complex troubleshooting

+ Work directly with customer technical teams to resolve deployment, configuration, and integration challenges

+ Conduct technical onboarding and provide expert guidance on platform architecture and best practices

+ Create customer-facing documentation, troubleshooting guides, and runbooks

+ Lead customer calls and technical discussions as a trusted advisor

**Team Development**

+ Mentor SRE and engineering team members, elevating technical capabilities

+ Foster a culture of reliability, operational excellence, and continuous improvement

**Required Experience**

+ BS degree in Computer Science or related field (or equivalent practical experience)

+ 7+ years in Site Reliability Engineering, DevOps, or Infrastructure Engineering

+ Proven track record leading large-scale, cross-team infrastructure projects from conception to production

+ Demonstrated ability to work autonomously on ambiguous projects with tight deadlines

**Technical Expertise**

+ 5+ years with AWS (VPC, EC2, RDS, EKS, CloudFormation) and cloud automation

+ Expert-level experience with Kubernetes, Helm, Linux, and Terraform

+ Strong experience with the GitOps model, distributed version control, and CI/CD pipelines

+ Proficiency with monitoring tools (Prometheus, Grafana, Datadog)

+ Strong programming/scripting skills (Python, Go, Bash) for automation

+ Deep understanding of distributed systems, microservices, and reliability patterns

+ Experience with Bazel and CueLang is a plus

+ Hands-on experience with at least one major compliance framework (SOC 1/2, ISO 27001, FedRAMP Moderate/High) through an audit cycle

**Leadership & Communication**

+ Exceptional ability to articulate complex technical concepts to diverse audiences

+ Track record of driving…