IT Operations Technical Lead, Frederick, MD
Listed on 2026-06-05
IT/Tech
Cloud Computing, Systems Administrator
Axle is a bioscience and information technology company that offers advancements in translational research, biomedical informatics, and data science applications to research centers and healthcare organizations nationally and abroad. With experts in biomedical science, software engineering, and program management, we focus on developing and applying research tools and techniques to empower decision‑making and accelerate research discoveries. We work with some of the top research organizations and facilities in the country, including multiple institutes at the National Institutes of Health (NIH).
Benefits We Offer:
- Paid Time Off and Paid Holidays
- 401K match up to 5%
- Educational Benefits for Career Growth
- Employee Referral Bonus
- Flexible Spending Accounts:
  - Healthcare (FSA)
  - Parking Reimbursement Account (PRK)
  - Dependent Care Assistance Program (DCAP)
  - Transportation Reimbursement Account (TRN)
This role combines hands‑on technical leadership with ITIL‑based operations, automation, and incident management. The ideal candidate brings deep Linux expertise and a focus on reliability, scalability, and modern workloads such as AI/ML.
Responsibilities:
- Lead and manage IT operations aligned with ITIL processes including Incident, Problem, Change, and Release Management.
- Provide hands‑on leadership in managing Linux and Windows environments across cloud and on‑premises infrastructure.
- Own and drive incident response, root‑cause analysis, and service restoration for mission‑critical systems.
- Design, build, and maintain golden images, patching strategies, and system hardening standards.
- Lead patch management and vulnerability remediation programs, ensuring compliance and system integrity.
- Develop and implement automation solutions using modern approaches, including Vibe Coding (AI‑assisted development), to accelerate operational efficiency and reduce toil.
- Support and optimize infrastructure for AI/ML workloads, including provisioning, scaling, and performance tuning.
- Manage and maintain GPU‑enabled environments and instances for high‑performance computing and machine‑learning use cases.
- Oversee and optimize infrastructure monitoring, logging, alerting, and observability frameworks.
- Manage and mentor a team of systems engineers; provide technical guidance and performance oversight.
- Collaborate with architecture, security, and development teams to improve reliability, scalability, and operational efficiency.
- Support hybrid environments, including cloud platforms and on‑premises data centers.
- Ensure proper documentation, runbooks, SOPs, and operational readiness.
- Stay abreast of new technologies in your areas, including but not limited to U.S. Federal Standards, NIST publications, cloud computing & deployment, site reliability engineering, security standards, and compliance best practices.
Qualifications:
- Must have 5+ years of experience leading an operations team, with hands‑on experience driving operational process improvements and technological advancements.
- Proven experience implementing and operating within ITIL frameworks.
- Must have 10+ years of hands‑on Unix/Linux experience, specifically with CentOS / Red Hat systems administration support for large‑scale distributed environments.
- Hands‑on experience with incident management, patching, system hardening, and production support.
- Experience building and maintaining golden images and standardized environments.
- Strong scripting/automation skills (e.g., Python, Bash, PowerShell, or similar).
- Experience with configuration management and automation tools (Ansible, Terraform, Puppet, Chef, or similar).
- Strong understanding of networking fundamentals (DNS, TCP/IP, firewalls, load balancing).
- Experience with monitoring and logging tools (e.g., Nagios, Splunk, ELK, Prometheus, Grafana).
- Must have cloud build‑out or migration experience with at least one of the following providers: Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure.
- Must have 2+ years of experience with CI/CD and automation tools such as Terraform, Ansible, Chef, Puppet, Jenkins, or GitHub.
- Experience supporting AI/ML workloads or data‑intensive platforms.
- Familiarity with GPU‑based compute environments (e.g., NVIDIA GPU instances).
- Must…