Site Reliability Engineer II - Epic
Listed on 2026-04-23
IT/Tech
Systems Engineer, Cloud Computing, IT Support, Cybersecurity
Job Description
As a Site Reliability Engineer II - Epic, you will provide reliability engineering services through observability and performance engineering techniques, using monitoring and performance tools to deliver detailed feedback to product owners and development teams.
You will partner with Product Owners to define service level objectives and develop service level indicators, and collaborate with cross-functional teams to design, build, automate, and maintain scalable infrastructure.
Your responsibilities will include ensuring high availability, monitoring system performance, and aiding support staff with resolving incidents. This role requires a strong background in scripting, cloud platforms, and a passion for optimizing operational efficiency. You will use Site Reliability Engineering practices to deliver a seamless user experience.
Pay Range: $,000, plus yearly bonus. Salary offers are based on a wide range of factors, including relevant skills, training, experience, education, and, where applicable, certifications obtained. Market and organizational factors are also considered. Successful candidates may be eligible to receive an annual performance bonus.
Remote: This position supporting Epic can be remote, within certain criteria, for candidates not located near a hub.
Location: This position is hybrid and will require 3 days on site at one of the following Quest sites: Secaucus, NJ or Schaumburg, IL.
Benefits:
- Day 1 Medical, supplemental health, dental & vision for FT employees who work 30+ hours
- Best-in-class well-being programs
- Annual, no-cost health assessment program
- Blueprint for Wellness
- healthyMINDS mental health program
- Vacation and Health/Flex Time
- 6 Holidays plus 1 MyDay off
- Fin Fit financial coaching and services
- 401(k) pre-tax and/or Roth IRA with company match up to 5% after 12 months of service
- Employee stock purchase plan
- Life and disability insurance, plus buy-up option
- Flexible Spending Accounts
- Annual incentive plans
- Matching gifts program
- Education assistance through MyQuest for Education
- Career advancement opportunities and so much more!
Responsibilities:
- Implement and maintain robust observability solutions to monitor system performance, identify bottlenecks, and ensure optimal operation.
- Utilize tools to gather, analyze, and visualize key performance metrics.
- Proactively identify and address performance bottlenecks through in-depth analysis and optimization strategies.
- Work closely with development teams to implement performance improvements and enhance overall system efficiency.
- Conduct capacity planning exercises based on observed patterns and future growth projections.
- Collaborate with infrastructure and development teams to ensure adequate resources are available to meet system demands.
- Develop and maintain automation scripts for routine tasks, enabling efficient monitoring and response procedures.
- Implement automated processes for scaling and provisioning resources based on observed workload patterns.
- Document system architecture, configurations, and observability best practices to facilitate knowledge transfer and onboarding for team members.
- Keep documentation up-to-date to reflect changes in the system and its monitoring setup.
- Work closely with software engineers to integrate observability tools into the development lifecycle.
- Provide guidance on building observable systems and assist in instrumenting applications for effective monitoring.
- Stay informed about industry best practices and emerging technologies related to observability and performance engineering.
- Drive continuous improvement initiatives to enhance the reliability and performance of systems.
- Collaborate with security teams to implement monitoring and observability measures that align with security requirements and compliance standards.
- Participate in security incident response activities and contribute to ongoing security assessments.
- Conduct training sessions for team members and other stakeholders on observability tools, best practices, and performance engineering concepts.
- Foster a culture of knowledge sharing within the organization.
And other duties as assigned.
Required Work Experience:
- 4 plus years of experience with multiple APM tools and extensive experience with Dynatrace
- 3 plus years of SRE experience
- Experience in software development, infrastructure, or operations roles
- Certifications in relevant technologies (e.g. AWS, DevOps, Kubernetes, Dynatrace, Azure, etc.)
- Working experience building CI/CD pipelines and version control systems
- Working experience with scripting languages (e.g. Python, Bash, Go, etc.)
- Excellent problem-solving and communication skills
- Ability to work collaboratively in a fast-paced, agile environment
- Working experience with NeoLoad, JMeter, or equivalent performance…