Senior Site Reliability Engineer
Job in Boston, Suffolk County, Massachusetts, 02298, USA
Listed on 2026-02-09
Listing for: LineVision
Part Time position
Job specializations:
- IT/Tech: Systems Engineer, Cloud Computing, SRE/Site Reliability, IT Support
Job Description & How to Apply Below
Overview
Hybrid (minimum 2 days/week): Boston, MA Headquarters
LineVision is seeking a Senior Site Reliability Engineer to establish its dedicated SRE practice and ensure our grid intelligence platform delivers the exceptional reliability our utility customers depend on. If you want to own the development of critical systems observability, deployment processes, and incident response protocols that directly impact grid operations, join us at LineVision.
What will you do?
- Establish and maintain Service Level Objectives (SLOs) and observability frameworks for critical services supporting utility grid operations
- Implement CI/CD guardrails including canary deployments, automated rollbacks, and pre-production validation to improve deployment reliability
- Develop comprehensive incident response procedures with documented runbooks, escalation paths, and blameless post-incident review processes
- Partner with platform, engineering, and customer support teams to instrument systems and build reliability capabilities where they deliver maximum impact
- Design and implement monitoring dashboards tracking SLA compliance, reliability metrics, and error budgets
- Complete a comprehensive assessment of LineVision's current infrastructure, identifying critical services requiring immediate observability improvements
- Establish baseline SLOs for top-priority services and implement initial monitoring dashboards in partnership with platform and support teams
- Document current deployment processes and incident response procedures, identifying gaps and quick-win improvements
- Deploy a production-ready observability framework covering all critical customer-facing services, with alerts configured for key reliability signals
- Implement CI/CD improvements including automated testing gates, canary deployments, and rollback capabilities for core platform services
- Lead 3+ blameless post-incident reviews, establishing templates and processes that become standard practice across engineering
- Achieve measurable improvements in deployment success rates and mean time to recovery (MTTR) through implemented SRE practices
- Build strong cross-functional partnerships resulting in proactive reliability improvements identified through error budget monitoring
- Establish LineVision's SRE practice as a recognized capability, with documentation, runbooks, and processes that can scale with company growth
Key Competencies
- Critical Thinking: Lead problem-solving efforts around complex reliability challenges, consistently applying critical thinking to identify root causes and prevent future incidents
- Taking Ownership: Lead reliability projects with minimal supervision, taking full ownership of SRE practice development and system observability outcomes
- Stakeholder Management: Manage relationships across engineering, platform, and support teams, providing clear updates on reliability metrics and leveraging influence to align on SRE priorities
- Delivering Innovative Solutions: Lead implementation of modern SRE practices, inspiring teams to think creatively about reliability challenges in a utility infrastructure context
- AWS Expertise: Strong experience with core AWS services including EC2, RDS, Lambda, and networking/VPC configuration for production environments
- Observability & Monitoring: Hands-on proficiency with tools like Datadog, Prometheus, Grafana, or CloudWatch for instrumenting distributed systems
- Infrastructure as Code: Experience with Terraform, CloudFormation, or Pulumi for managing and versioning infrastructure
- Programming: Python and TypeScript experience for automation, tooling, and system instrumentation
- SLO/SLA Frameworks: Demonstrated experience establishing Service Level Objectives and tracking error budgets
- Background in energy, utility, or critical infrastructure sectors where reliability directly impacts public services
- AWS certifications demonstrating deep platform expertise
- Experience with security compliance frameworks (NERC CIP, ISO 27001, SOC 2) relevant to utility operations
- Track record of building SRE practices from the ground up in fast-growing technical organizations
- Apply Online
- Round 1: Phone screen
- Round 2: Hiring Manager Interview
- Round 3: Panel Interviews
  - Panel 1: Technical competency & experience (AWS architecture, observability tooling, SRE practices)
  - Panel 2: Teamwork, culture fit, and cross-functional collaboration
- Final Round: Leadership Team & Hiring Manager Sign-Off
- Impact. Your talent, time, and energy will critically impact our success in accelerating our mission of providing utilities with grid intelligence to enable affordable, reliable power.
- Ownership. You will hold broad responsibilities with high autonomy and trust in a communicative, collaborative, and fast-paced environment.
- Flexibility. You will be empowered to maintain work-life balance with trust-based PTO…
Position Requirements
10+ years of work experience