Senior Site Reliability Engineer Job, Boston area, Massachusetts, USA, IT/Tech

Overview

Hybrid (minimum 2 days/week):
Boston, MA Headquarters

We are hiring a Senior Site Reliability Engineer to establish Line Vision's dedicated SRE practice and ensure our grid intelligence platform delivers the exceptional reliability our utility customers depend on. If you want to own observability for critical systems, deployment processes, and incident response protocols that directly affect grid operations, join us at Line Vision.

What will you do?

Establish and maintain Service Level Objectives (SLOs) and observability frameworks for critical services supporting utility grid operations
Implement CI/CD guardrails including canary deployments, automated rollbacks, and pre-production validation to improve deployment reliability
Develop comprehensive incident response procedures with documented runbooks, escalation paths, and blameless post-incident review processes
Partner with platform, engineering, and customer support teams to instrument systems and build reliability capabilities where they deliver maximum impact
Design and implement monitoring dashboards tracking SLA compliance, reliability metrics, and error budgets

Within the first 3 months:

Complete a comprehensive assessment of Line Vision's current infrastructure, identifying critical services that require immediate observability improvements
Establish baseline SLOs for top-priority services and implement initial monitoring dashboards in partnership with platform and support teams
Document current deployment processes and incident response procedures, identifying gaps and quick-win improvements

Within the first 6 months:

Deploy a production-ready observability framework covering all critical customer-facing services, with alerts configured for key reliability signals
Implement CI/CD improvements including automated testing gates, canary deployments, and rollback capabilities for core platform services
Lead 3+ blameless post-incident reviews, establishing templates and processes that become standard practice across engineering

Within the first year:

Achieve measurable improvements in deployment success rates and mean time to recovery (MTTR) through implemented SRE practices
Build strong cross-functional partnerships resulting in proactive reliability improvements identified through error budget monitoring
Establish Line Vision's SRE practice as a recognized capability, with documentation, runbooks, and processes that can scale with company growth

How to succeed in this role
Key Competencies

Critical Thinking: Lead problem-solving efforts around complex reliability challenges, consistently applying critical thinking to identify root causes and prevent future incidents
Taking Ownership: Lead reliability projects with minimal supervision, taking full ownership of SRE practice development and system observability outcomes
Stakeholder Management: Manage relationships across engineering, platform, and support teams, providing clear updates on reliability metrics and using influence to build alignment on SRE priorities
Delivering Innovative Solutions: Lead the implementation of modern SRE practices, inspiring teams to think creatively about reliability challenges in a utility infrastructure context

Essential Skills

AWS Expertise: Strong experience with core AWS services, including EC2, RDS, Lambda, and networking/VPC configuration, in production environments
Observability & Monitoring: Hands-on proficiency with tools such as Datadog, Prometheus, Grafana, or CloudWatch for instrumenting distributed systems
Infrastructure as Code: Experience with Terraform, CloudFormation, or Pulumi for managing and versioning infrastructure
Programming: Python and TypeScript experience for automation, tooling, and system instrumentation
SLO/SLA Frameworks: Demonstrated experience establishing Service Level Objectives and tracking error budgets

What Sets the Best Candidates Apart

Background in energy, utility, or critical infrastructure sectors where reliability directly impacts public services
AWS certifications demonstrating deep platform expertise
Experience with security compliance frameworks (NERC CIP, ISO 27001, SOC 2) relevant to utility operations
Track record of building SRE practices from the ground up in fast-growing technical organizations

Interview Process

Apply Online
Round 1: Phone screen
Round 2: Hiring Manager Interview
Round 3: Panel Interviews
- Panel 1: Technical competency and experience (AWS architecture, observability tooling, SRE practices)
- Panel 2: Teamwork, culture fit, and cross-functional collaboration
Final Round: Leadership Team & Hiring Manager Sign-Off

What does joining Line Vision mean for you?

Impact. Your talent, time, and energy will critically impact our success in accelerating our mission of providing utilities with grid intelligence to enable affordable, reliable power.
Ownership. You will hold broad responsibilities with high autonomy and trust in a communicative, collaborative, and fast-paced environment.
Flexibility. You will be empowered to maintain work-life balance with trust-based PTO…