AI Platform & Cloud Engineer Rockville, MD
Listed on 2026-02-21
IT/Tech
Data Engineer, AI Engineer, Cloud Computing
Axle is a bioscience and information technology company that delivers advancements in translational research, biomedical informatics, and data science applications to research centers and healthcare organizations nationally and internationally. With experts in biomedical science, software engineering, and program management, we focus on developing and applying research tools and techniques to empower decision-making and accelerate research discoveries. We work with some of the top research organizations and facilities in the country, including multiple institutes at the National Institutes of Health (NIH).
Benefits We Offer
- Paid Time Off and Paid Holidays
- 401K match up to 5%
- Educational Benefits for Career Growth
- Employee Referral Bonus
- Flexible Spending Accounts:
  - Healthcare (FSA)
  - Parking Reimbursement Account (PRK)
  - Dependent Care Assistance Program (DCAP)
  - Transportation Reimbursement Account (TRN)
The AI Platform & Cloud Engineer will help sustain the hybrid cloud production environment for the SOM Center’s data ecosystem. This role serves as the technical interface between Data Science and IT, focusing on Platform Engineering: building the internal developer platform (IDP) that utilizes the IT-managed Kubernetes infrastructure and cloud resources to scale workflow orchestration, knowledge graph data pipelines, and distributed model inference.
Responsibilities
- IT Collaboration & K8s Support: Collaborate closely with the dedicated IT team to define compute requirements and orchestrate workloads on the new Kubernetes cluster. The engineer will not manage the cluster directly but will ensure data science applications are correctly containerized and configured to run efficiently on the infrastructure provided by IT.
- Infrastructure Strategy: Define the Infrastructure as Code (IaC) specifications for application-level resources, working with IT to ensure on-premises GPU clusters and public cloud environments (GCP/AWS) are utilized effectively.
- Refactoring & Model Serving: Transform experimental code (Jupyter Notebooks, R scripts) developed by NLP and Omics researchers into robust, containerized software packages. Deploy and optimize model inference servers (e.g., vLLM, Triton Inference Server) to expose AI models as reliable internal APIs.
- Workflow Orchestration: Deploy and maintain the Workflow Orchestration platform (e.g., Apache Airflow, Prefect, or Dagster) to manage dependencies between data ingestion, model inference, and state updates, serving as the central execution controller for distributed processes.
- AI-Assisted Development: Actively utilize AI-assisted coding tools (e.g., GitHub Copilot) to accelerate code generation, documentation, and refactoring, increasing overall productivity.
- Data Foundation: Administer the Data Foundation infrastructure, including supporting Graph Databases (e.g., Neo4j), Vector Databases (e.g., Milvus, pgvector) for RAG implementations, and ETL pipelines to ingest massive public datasets (e.g., Human Cell Atlas) into the Data Lake.
- Cloud Agent Architecture: Architect and deploy managed Cloud AI Agents (e.g., via Vertex AI) to orchestrate complex reasoning workflows, including but not limited to parsing scientific literature, querying omics databases, and validating experimental protocols against Knowledge Graphs.
- Security Implementation: Collaborate with data scientists to implement Workload Identity federation and secrets management (e.g., Vault), ensuring automated workflows securely authenticate against enterprise resources managed by IT.
Qualifications
- Bachelor’s or Master’s degree in Computer Science or Engineering, with experience in Cloud Engineering, MLOps, or SRE.
- Proficiency in Python and Infrastructure as Code concepts, with experience in major cloud platforms (GCP preferred, or AWS).
- AI Productivity: Demonstrated ability to leverage AI-driven coding assistants and LLMs to increase development velocity and code quality.
- Experience utilizing hybrid cloud architectures and configuring workloads for burst computing (Spot Instances, autoscaling groups).
- Experience refactoring research-grade code into production-grade services (Docker/Kubernetes).
- Ex…