BXTI, Site Reliability Engineer-VP

Job in Berkeley, Alameda County, California, 94709, USA
Listing for: The Blackstone Group L.P.
Full Time position
Listed on 2026-02-21
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, IT Support, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 USD yearly
Job Description & How to Apply Below
Locations: Berkeley Square House, London
Time type: Full time
Posted on: Posted Today
Job requisition: 41555

Blackstone is the world’s largest alternative asset manager. We seek to create positive economic impact and long-term value for our investors, the companies we invest in, and the communities in which we work. We do this by using extraordinary people and flexible capital to help companies solve problems. Our $1.1 trillion in assets under management include investment vehicles focused on private equity, real estate, public debt and equity, infrastructure, life sciences, growth equity, opportunistic, non-investment grade credit, real assets and secondary funds, all on a global basis.

Follow @blackstone on social media.

Blackstone’s Site Reliability Engineering team is responsible for improving the reliability of systems and services to meet the needs of the business. This is achieved through collaboration with the development and engineering teams to leverage SRE practices and principles. You’ll have the opportunity to identify and solve new problems as they arise, deploy and maintain observability systems and pipelines, mature the operations and support of services and platforms, and pursue emerging opportunities for efficiency and business value.

This position involves the selection, implementation, and maintenance of key observability tooling. It requires ongoing evaluation of the firm’s needs in observability, monitoring, alerting, resilience, and recovery. We work alongside service owners on design, implementation, and management of services for continuous improvement. We achieve the requisite reliability of services using clear definitions and measurable targets. We plan for and practice recovery from disaster scenarios and respond in real time to incidents.

We guide the postmortem process in order to mitigate risks, prevent future disruptions, and improve the on-call experience. We aim to eliminate manual work, improve operational efficiency, and ensure high-quality outputs in all that we do.

Key Responsibilities:

* Providing technical leadership in the understanding and adoption of SRE methodologies across the firm
* Incorporating observability standards into code and deployment pipelines
* Evolving the SRE standards that are adopted across all teams
* Partnering with colleagues in various roles and reporting lines to improve service reliability and operational efficiency
* Assisting developers and engineers directly and through AI assistants
* Implementing instrumentation and providing comprehensive performance insights to service owners
* Ensuring monitoring and alerting that reflects the reliability of services for users and enables effective on-call operations
* Implementing strategic observability tools and working to control overhead in maintenance and cost
* Participating in on-call rotations and responding to system incidents to ensure service availability and minimize operational impact
* Using automation to manage, maintain, and scale SRE systems with minimal human intervention
* Fostering a blameless culture while assisting in postmortem discussions and reporting

Qualifications:

* Ability to write automation scripts, as well as read and troubleshoot code (Python, C#, TypeScript, etc.)
* Effective use of coding assistants and chat models (Anthropic, OpenAI)
* Proficiency with public cloud providers (strong AWS experience required, Azure experience preferred)
* Experience with configuration as code, infrastructure management, and CI/CD tooling (Terraform, Puppet, GitLab CI)
* Hands-on experience with Docker and container schedulers, including AWS ECS and EKS
* Excellent troubleshooting skills for Linux, Windows, and networking
* Experience with observability tools (Grafana, Prometheus, Splunk, etc.)
* Comfortable under pressure with incident management and collaborating during postmortems
* Excellent communication and organizational skills
* Curiosity and drive to improve systems and processes through a sense of shared ownership

The duties and responsibilities described here are not…