Site Reliability Engineer
Listed on 2026-02-16
IT/Tech
IT Support, Systems Engineer, Cloud Computing, Network Engineer
Overview
Alexander Technology Group is seeking a Site Reliability Engineer (SRE) to support, monitor, and improve the performance and reliability of complex deployed systems at customer locations. This role sits at the intersection of operations, infrastructure, and software engineering and is ideal for someone who enjoys troubleshooting real-world system issues while building long-term solutions that enhance stability, scalability, and performance.
Position Overview
The Site Reliability Engineer will play a key role in ensuring deployed systems operate reliably and efficiently. This individual will diagnose and resolve issues across hardware, software, networking, and infrastructure layers while developing tools and automation to improve observability, diagnostics, and deployment processes. This is a hands-on role that combines production support with proactive reliability engineering.
Responsibilities
- Investigate, troubleshoot, and resolve system issues across hardware, software, networking, and infrastructure layers
- Diagnose deployment, configuration, and integration challenges in complex technical environments
- Partner with field teams, operations, and engineering groups to resolve customer-impacting issues
- Participate in on-call rotation or escalation support as needed
- Travel to customer or partner sites as required
- Develop tools, scripts, and automation to enhance reliability, observability, and diagnostics
- Improve deployment, configuration, and commissioning workflows to reduce manual effort and errors
- Build and maintain monitoring, logging, and alerting systems
- Conduct root-cause analyses and implement long-term solutions to prevent recurring issues
- Contribute to documentation, runbooks, and operational best practices
- Deployed systems are stable, observable, and well-supported
- Issues are diagnosed efficiently with clear root-cause identification
- Automation reduces operational overhead
- Deployment and commissioning processes become faster, more reliable, and repeatable
- Work cross-functionally with software, infrastructure, and operations teams to improve system design and supportability
- Provide feedback from field deployments to inform product and infrastructure improvements
- Bachelor’s degree in Computer Science, Computer Engineering, or a related STEM field
- Strong troubleshooting experience with Linux systems, networking, and distributed systems
- Experience diagnosing deployment and configuration issues in complex software environments
- Programming or scripting experience (e.g., Python, Bash)
- Proficiency with Git, including branching, code reviews, and multi-environment change management
- Experience using issue tracking tools to manage incidents and prioritize work
- Ability to work independently and manage multiple priorities
- Comfort working in environments involving hardware and software integration
- 2+ years of experience in fast-paced technology integration environments
- Experience supporting complex distributed systems or mission-critical infrastructure
- Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes)
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK, Datadog)
- Experience improving diagnostics, reliability, or commissioning workflows
- Exposure to CI/CD pipelines and infrastructure automation tools
Contract to hire: $55/hr - $65/hr
If you are interested in learning more about this position, please feel free to contact Dustin Moriarty at