AI Platform & Cloud Engineer Rockville, MD
Listed on 2026-02-21
IT/Tech
Data Engineer, AI Engineer, Cloud Computing
Axle is a bioscience and information technology company that delivers advancements in translational research, biomedical informatics, and data science applications to research centers and healthcare organizations nationally and internationally. With experts in biomedical science, software engineering, and program management, we focus on developing and applying research tools and techniques to empower decision-making and accelerate research discoveries. We work with some of the top research organizations and facilities in the country, including multiple institutes at the National Institutes of Health (NIH).
Benefits We Offer
- Paid Time Off and Paid Holidays
- 401K match up to 5%
- Educational Benefits for Career Growth
- Employee Referral Bonus
- Flexible Spending Accounts:
  - Healthcare (FSA)
  - Parking Reimbursement Account (PRK)
  - Dependent Care Assistance Program (DCAP)
  - Transportation Reimbursement Account (TRN)
The AI Platform & Cloud Engineer will help sustain the hybrid cloud production environment for the SOM Center’s data ecosystem. This role serves as the technical interface between Data Science and IT, focusing on Platform Engineering: building the internal developer platform (IDP) that utilizes the IT-managed Kubernetes infrastructure and cloud resources to scale workflow orchestration, knowledge graph data pipelines, and distributed model inference.
Responsibilities
- IT Collaboration & K8s Support: Collaborate closely with the dedicated IT team to define compute requirements and orchestrate workloads on the new Kubernetes cluster. The engineer will not manage the cluster directly but will ensure data science applications are correctly containerized and configured to run efficiently on the infrastructure provided by IT.
- Infrastructure Strategy: Define the Infrastructure as Code (IaC) specifications for application-level resources, working with IT to ensure on-premises GPU clusters and public cloud environments (GCP/AWS) are utilized effectively.
- Refactoring & Model Serving: Transform experimental code (Jupyter Notebooks, R scripts) developed by NLP and Omics researchers into robust, containerized software packages. Deploy and optimize model inference servers (e.g., vLLM, Triton Inference Server) to expose AI models as reliable internal APIs.
- Workflow Orchestration: Deploy and maintain the Workflow Orchestration platform (e.g., Apache Airflow, Prefect, or Dagster) to manage dependencies between data ingestion, model inference, and state updates, serving as the central execution controller for distributed processes.
- AI-Assisted Development: Actively utilize AI-assisted coding tools (e.g., GitHub Copilot) to accelerate code generation, documentation, and refactoring, increasing overall productivity.
- Data Foundation: Administer the Data Foundation infrastructure, including supporting Graph Databases (e.g., Neo4j), Vector Databases (e.g., Milvus, pgvector) for RAG implementations, and ETL pipelines to ingest massive public datasets (e.g., Human Cell Atlas) into the Data Lake.
- Cloud Agent Architecture: Architect and deploy managed Cloud AI Agents (e.g., via Vertex AI) to orchestrate complex reasoning workflows, including but not limited to parsing scientific literature, querying omics databases, and validating experimental protocols against Knowledge Graphs.
- Security Implementation: Collaborate with data scientists to implement Workload Identity federation and secrets management (e.g., Vault), ensuring automated workflows securely authenticate against enterprise resources managed by IT.
Qualifications
- Bachelor’s or Master’s degree in Computer Science or Engineering, with experience in Cloud Engineering, MLOps, or SRE.
- Proficiency in Python and Infrastructure as Code concepts, with experience in major cloud platforms (GCP preferred, or AWS).
- AI Productivity: Demonstrated ability to leverage AI-driven coding assistants and LLMs to increase development velocity and code quality.
- Experience utilizing hybrid cloud architectures and configuring workloads for burst computing (Spot Instances, autoscaling groups).
- Experience refactoring research-grade code into production-grade services (Docker/Kubernetes).
- Ex…