AI Systems Administrator Security Clearance
Job in
Cambridge, Middlesex County, Massachusetts, 02138, USA
Listed on 2026-06-11
Listing for:
Draper
Part Time position
Job specializations:
- IT/Tech: Cybersecurity, IT Support
Job Description & How to Apply Below
Overview:
Draper is an independent, nonprofit research and development company headquartered in Cambridge, MA. The 2,000+ employees of Draper tackle important national challenges with a promise of delivering successful and usable solutions. From military defense and space exploration to biomedical engineering, lives often depend on the solutions we provide. Our multidisciplinary teams of engineers and scientists work in a collaborative environment that inspires the cross-fertilization of ideas necessary for true innovation.
Job Description
Summary:
The AI Systems Administrator is instrumental in bringing AI to Draper. The incumbent implements a closed GPT environment at Draper in which several different LLMs are maintained and used throughout the organization. This role works with engineering to ensure that multiple LLMs are accessible through a chat interface, an API, and assistive tools for general-purpose use across the organization. In addition, the incumbent ensures the system health of the Draper GPT server so that additional AI infrastructure requiring large amounts of compute can be used without degrading the performance of other LLM resources. This includes API interfaces with various software platforms across Draper (e.g., engineering, accounting, legal). The role helps Draper implement automation, streamline processes, and support mission-critical AI/ML workloads, so careful resource allocation is critical. It also involves traditional Linux administration duties (installing, configuring, and securing servers, scripting, monitoring), with a strong focus on supporting and managing AI/ML infrastructure (e.g., GPU servers, Kubernetes, data pipelines).
This role supports AI engineers, drawing on systems knowledge to guide them with solutions and recommendations. The role is part of a team of Linux system administrators responsible for the functionality and efficiency of approximately 750 computers running primarily Oracle Linux. Knowledge of additional operating systems, e.g., Ubuntu and RHEL, may be necessary. The incumbent maintains the integrity and security of servers and systems.
The Systems Administrator serves as a front-line interface to end users and other IS teams, and makes recommendations for hardware and software purchases. The role interacts directly with vendors and VARs, both on proactive projects and in response to support issues. Duties may include installing, configuring, and maintaining new hardware and software, troubleshooting, managing permissions, and training other administrators. Requires a solid understanding of UNIX-based operating systems.
This role will be hybrid (3 days/week) in Cambridge, MA and will require an Active Secret Clearance.
Duties/Responsibilities
* Build, operate, and troubleshoot RHEL/Oracle systems supporting GPU workloads (OS lifecycle, patching, performance, reliability).
* Manage the GPU enablement layer: driver/toolkit lifecycle, kernel/driver compatibility, coordinated upgrades and rollback plans, and ongoing health monitoring.
* Implement and maintain observability (metrics, logs, alerting) for system, GPU, and storage performance/health (e.g., Prometheus/Grafana and GPU telemetry such as DCGM/NVML or equivalent).
* Couple the above observability with LLM performance and usage data, and identify and warn users who are over-allocating resources.
* Maintain (i.e., reset or rebuild) LLM servers to ensure high uptime and availability across the organization.
* Work with a team of engineers to enable software upgrades on the server (e.g., new models or additional AI software) while meeting security requirements.
* Partner with storage/network peers to baseline throughput/latency, identify bottlenecks, and tune the platform for predictable performance.
* Automation & scripting: create and maintain automation (Python/Ansible) for platform administration and broader Linux team workflows (provisioning/config enforcement, patch orchestration, reporting, routine maintenance), using Git-based practices.
* Support various Linux and cloud (AWS/Azure) projects.
* Lead projects including large-scale migrations as well as platform redesign and implementation. Utilize resources within the Linux team as well as across the IS department to reach goals.
Skills/Abilities
* Strong production Linux administration experience (RHEL/Oracle preferred): systemd, networking, troubleshooting, performance analysis, patching, package management.
* Strong automation skills: Bash and/or Python, plus Ansible (preferred) or equivalent configuration management; comfortable with CI/Git workflows.
* Experience supporting enterprise platforms (incident response, root-cause analysis, postmortems, runbooks/documentation).
* Security-minded operations in regulated environments; familiarity with CUI handling concepts and control expectations (audit logging, vulnerability remediation, change control).
Education
* Bachelor's degree in Computer Science or a related field.
Experience
* 3…