
Senior Site Reliability Engineer, Observability

Job in Reston, Fairfax County, Virginia, 22090, USA
Listing for: ScienceLogic
Full Time position
Listed on 2026-01-28
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, SRE/Site Reliability
Job Description & How to Apply Below

Senior Site Reliability Engineer, Observability

Reston, VA or Remote

This position can be remote within the U.S.

Who we are...

ScienceLogic is going through a product transformation, and the Site Reliability team is at the forefront of it. We are responsible for the design, deployment, and maintenance of the cloud infrastructure that runs the company’s revenue-generating, go-forward SaaS product line.

ScienceLogic’s current SaaS product is a single-tenant, highly available, and secure platform that many customers use to achieve their AIOps objectives. Cloud Operations leads the SaaS portfolio from the front by onboarding new customers onto their own dedicated instance of the product and by handling capacity planning, platform maintenance, upgrades, security, and incident-response triage for the SaaS platform.

Overall, we’re passionate about automation and solving complex business and technology challenges. Our team combines SRE, DevOps, Software Development, and Information Security knowledge to help make cloud operations agile and elastic within the boundaries of the security and governance framework. If you are well versed in cloud technologies, have an automation mindset, and are an ardent follower of the SRE discipline, then our team will benefit from your skill set!

What we're looking for...

We’re seeking an experienced Site Reliability Engineer who is passionate about building and owning modern monitoring and observability solutions. You’ll play a key role in designing proactive monitoring strategies, defining SLIs/SLOs, automating detection and remediation, and improving platform reliability across our SaaS environment.

The ideal candidate is a hands-on engineer with strong cloud, automation, and scripting experience, deep familiarity with tools like Prometheus, AWS CloudWatch, and New Relic, and a collaborative mindset. You enjoy solving complex problems, mentoring others, and continuously improving systems before issues impact customers.

What you'll be doing...
  • Be a key contributor on an Agile development team, collaboratively realizing business value through an iterative software development lifecycle
  • Build and execute the monitoring strategy for ScienceLogic SaaS infrastructure
  • Define, deploy, and maintain system and service monitors
  • Be the authority on monitoring technologies such as Prometheus, AWS CloudWatch, Scylla Manager, and New Relic to provide next-generation monitoring solutions for ScienceLogic SaaS
  • Employ advanced monitoring practices and technologies to detect and automatically resolve platform issues before they impact the customer’s experience.
  • Participate in architecture and operations reviews
  • Identify and automate measurement of operational SLAs and SLOs using SLIs
  • Triage incident response, document SOPs and runbooks, and train NOC team members
  • Participate in a shared on‑call manager rotation for escalations during incidents and outages, occasionally during off hours
  • Provide dashboarding and analytics solutions to internal teams based on requirements
Qualities you possess...
  • 8+ years of software development or site reliability engineering or equivalent experience
  • Skilled at problem solving, algorithms, and data structures
  • Building tools and scripting frameworks from scratch
  • Working with cloud automation tools like CloudFormation, Terraform, CDK, aws-cli
  • Configuration automation using Ansible or equivalent tools
  • Exposure to Windows and Linux administration
  • Project management tools like Jira, Trello
  • Prior experience with datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
  • Familiarity with basic networking, security and cloud engineering concepts
  • Team player who is eager to help others to succeed through mentoring and leading by example
  • Highly collaborative with effective written and verbal communication skills
  • 401(k) plan with employer match
  • Flexible Paid Time Off…
Position Requirements
10+ Years work experience