
Senior Site Reliability Engineer

Job in New York, New York County, New York, 10261, USA
Listing for: Salesforce, Inc.
Full Time position
Listed on 2026-03-07
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, IT Support, Cybersecurity
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD yearly
Job Description & How to Apply Below
Location: New York

Job Title:
Senior Site Reliability Engineer

About the Role:

The Site Reliability Engineering team is part of the Digital Enterprise Technology Platform Engineering organization, responsible for architecting, scaling, and maintaining the IT monitoring and observability ecosystem. You will ensure Enterprise IT services' reliability by driving proactive telemetry strategies and deep-system visibility.

We're looking for a self-starter who can take ownership of tasks, work under pressure, and balance multiple assignments at once while keeping a positive outlook. You'll lead the evolution of our observability frameworks, contribute ideas, and give feedback on complex monitoring architectures, and you'll provide expertise for IT projects and enhancements across multiple IT organizations.

Responsibilities:
  • Manage, assess, plan, and support core observability platform operations and strategy.
  • Lead process changes and implementations related to the monitoring and logging stack (e.g., Splunk, Grafana, New Relic).
  • Provide escalation support for configuration and platform issues, participating in on-call schedules to resolve major incidents using deep-dive observability data.
  • Collaborate with key stakeholders (Service Managers, Product Managers, Application Architects, Business Support, and Operations) to gather and develop complex monitoring and alerting requirements.
  • Develop AI-driven automation and integrations that deliver predictive monitoring and automated anomaly detection.
  • Work with third-party vendors and partners to address platform-related enhancements and evaluate next-gen observability tooling.
  • Support and manage the introduction of new monitoring tools, and orchestrate migrations to modern OpenTelemetry-based standards.
  • Periodically present reports on Service Level Indicators (SLIs), Service Level Objectives (SLOs), and correlation metrics to the Enterprise Operations team.
  • Work within the Agile Scrum methodology and provide technical mentorship on observability best practices to junior team members.
  • Create standard operating procedures for monitoring-as-code and share them with the team for effective execution.
Minimum Qualifications:
  • Bachelor's degree in Computer Science or related technical field, or equivalent experience in technical leadership
  • 7 - 10 years of experience designing and implementing distributed systems to handle large-scale telemetry and log data
  • 7 - 10 years of experience building and scaling high-volume observability pipelines.
  • Proven mastery of full-stack observability suites (Splunk, ThousandEyes, or similar).
  • Direct experience implementing OpenTelemetry (OTel) standards.
  • Strong background in "Monitoring as Code" using Terraform or similar automation tools.
  • Demonstrable ability in Bash/PowerShell, Python, and JavaScript (Node.js), especially in reading and understanding existing code
  • Understanding of REST-based API design principles and best practices
  • Experience with server administration (Linux and Windows)
  • Knowledge of monitoring tools such as Zabbix, Splunk, Grafana, New Relic, or ThousandEyes
  • Experience with AWS public cloud and VMware vSphere
  • Knowledge of configuration management and orchestration tools like Puppet, Ansible, or Terraform
  • Experience with Docker and containerized applications
  • Strong troubleshooting and debug skills (reading log files, analyzing memory leaks)
  • Strong analytical skills and ability to gather and synthesize data for review
  • Ability to problem-solve in a fast-paced environment and shift gears effectively
  • Subject matter expertise in at least one monitoring and telemetry product
Preferred Qualifications:
  • Experience with AI and machine learning applications in operations
  • Experience with predictive monitoring and auto-healing solutions
  • Master's degree in Computer Science or related field
  • Experience translating technical concepts into visual representations
Position Requirements
10+ Years work experience