Site Reliability Engineer Job, Bedford area, Massachusetts, USA (IT/Tech)

Overview

Alexander Technology Group is seeking a Site Reliability Engineer (SRE) to support, monitor, and improve the performance and reliability of complex deployed systems at customer locations. This role sits at the intersection of operations, infrastructure, and software engineering and is ideal for someone who enjoys troubleshooting real-world system issues while building long-term solutions that enhance stability, scalability, and performance.

Position Overview

The Site Reliability Engineer will play a key role in ensuring deployed systems operate reliably and efficiently. This individual will diagnose and resolve issues across hardware, software, networking, and infrastructure layers while developing tools and automation to improve observability, diagnostics, and deployment processes. This is a hands-on role that combines production support with proactive reliability engineering.

Responsibilities

Investigate, troubleshoot, and resolve system issues across hardware, software, networking, and infrastructure layers
Diagnose deployment, configuration, and integration challenges in complex technical environments
Partner with field teams, operations, and engineering groups to resolve customer-impacting issues
Participate in on-call rotation or escalation support as needed
Travel to customer or partner sites as required

Reliability & Infrastructure Improvement

Develop tools, scripts, and automation to enhance reliability, observability, and diagnostics
Improve deployment, configuration, and commissioning workflows to reduce manual effort and errors
Build and maintain monitoring, logging, and alerting systems
Conduct root-cause analyses and implement long-term solutions to prevent recurring issues
Contribute to documentation, runbooks, and operational best practices

What Success Looks Like

Deployed systems are stable, observable, and well-supported
Issues are diagnosed efficiently with clear root-cause identification
Automation reduces operational overhead
Deployment and commissioning processes become faster, more reliable, and repeatable

Collaboration

Work cross-functionally with software, infrastructure, and operations teams to improve system design and supportability
Provide feedback from field deployments to inform product and infrastructure improvements

Required Qualifications

Bachelor’s degree in Computer Science, Computer Engineering, or a related STEM field
Strong troubleshooting experience with Linux systems, networking, and distributed systems
Experience diagnosing deployment and configuration issues in complex software environments
Programming or scripting experience (e.g., Python, Bash)
Proficiency with Git, including branching, code reviews, and multi-environment change management
Experience using issue tracking tools to manage incidents and prioritize work
Ability to work independently and manage multiple priorities
Comfort working in environments involving hardware and software integration

Preferred

2+ years of experience in fast-paced technology integration environments
Experience supporting complex distributed systems or mission-critical infrastructure
Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes)
Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK, Datadog)
Experience improving diagnostics, reliability, or commissioning workflows
Exposure to CI/CD pipelines and infrastructure automation tools

Contract to hire: $55/hr to $65/hr

If you are interested in learning more about this position, please feel free to contact Dustin Moriarty at
