Site Reliability Engineer
Job in Frederick, Frederick County, Maryland, 21701, USA
Listed on 2026-05-21
Listing for: Axle
Full Time position
Job specializations:
- IT/Tech: Cloud Computing, SRE/Site Reliability
Job Description
Axle is a bioscience and information technology company that offers advancements in translational research, biomedical informatics, and data science applications to research centers and healthcare organizations nationally and abroad. With experts in biomedical science, software engineering, and program management, we focus on developing and applying research tools and techniques to empower decision‑making and accelerate research discoveries. We work with some of the top research organizations and facilities in the country, including multiple institutes at the National Institutes of Health (NIH).
Benefits We Offer
- Paid Time Off and Paid Holidays
- 401K match up to 5%
- Educational Benefits for Career Growth
- Employee Referral Bonus
- Flexible Spending Accounts:
  - Healthcare (FSA)
  - Parking Reimbursement Account (PRK)
  - Dependent Care Assistance Program (DCAP)
  - Transportation Reimbursement Account (TRN)
Responsibilities
- Design and implement enterprise‑grade monitoring and observability frameworks (metrics, logs, traces) across distributed systems using enterprise tools such as Splunk, Grafana, and OpenTelemetry.
- Establish and manage SLIs, SLOs, and error budgets to drive reliability improvements.
- Develop and maintain real‑time asset inventory systems across cloud, on‑prem, and hybrid environments.
- Automate workload onboarding and offboarding processes, ensuring standardization and governance.
- Track system ownership, dependencies, and lifecycle states for operational transparency.
- Build proactive detection mechanisms using AIOps and intelligent alerting to minimize incident impact.
- Design and operate scalable, resilient, and secure infrastructure platforms across cloud and hybrid environments.
- Implement automated compliance tracking and enforcement aligned with organizational and regulatory standards (e.g., NIST, FISMA, FedRAMP).
- Embed ITIL processes (incident, change, problem, configuration management) into SRE workflows.
- Build and maintain automated deployment environments and pipelines that enforce security, compliance, and operational standards.
- Develop “golden paths” and standardized platform templates for consistent workload deployment.
- Automate provisioning, patching, configuration management, and environment lifecycle.
- Leverage AI/ML coding assistants and vibe coding practices to rapidly develop automation scripts, tools, and internal platforms.
- Integrate AI‑driven tooling into DevOps pipelines for code quality, security scanning, and operational insights.
- Lead adoption of AI‑enhanced SRE practices, including intelligent remediation and predictive operations.
- Champion DevOps and SRE practices, including Infrastructure as Code, CI/CD, observability, and reliability engineering.
- Build developer‑friendly platforms (“golden paths”) that simplify deployments, reduce friction, and improve velocity.
- Enable and optimize infrastructure for AI/ML workloads, including data pipelines, storage systems, inference environments, and GPU‑enabled, high‑performance compute workloads.
- Build and manage containerized and orchestrated platforms (Docker, Kubernetes).
- Support cloud migration, modernization, and platform standardization initiatives.
- Ensure systems meet security, compliance, backup, and disaster recovery requirements.
- Evangelize and promote best practices in DevOps, SRE, and platform engineering to developer communities.
- Stay abreast of new technologies in your areas of expertise, including but not limited to AIOps, MLOps, cloud computing and deployment, site reliability engineering, infrastructure automation, security best practices, and data engineering.
Requirements
- Must have a total of 6+ years of experience in DevOps/SRE roles, including work with monitoring and observability tools (Prometheus, Grafana, ELK, or cloud‑native equivalents) for on‑prem and cloud‑hosted workloads.
- Must have 4+ years of hands‑on Linux experience, including Ubuntu, CentOS, and Red Hat operating systems, containers, dependency management, and administration support.
- Must have 4+ years of experience automating Infrastructure‑as‑Code (IaC) deployments to one of the following cloud platforms: Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure.
- Must have 4+ years with CI/CD and automation tools such as Terraform, Ansible, Chef,…