DevOps/Infrastructure Engineer
Listed on 2026-01-07
IT/Tech
Systems Engineer, Cloud Computing, Cybersecurity
Guidehouse is seeking a DevOps/Infrastructure Engineer (cloud developer) to join our Technology / AI and Data team, supporting mission‑critical initiatives for Defense and Security clients. In this role, you will lead the design, deployment, and automation of secure, scalable cloud infrastructure that powers advanced AI‑driven platforms. You will architect solutions leveraging containerized environments, GPU‑accelerated clusters, and high‑throughput pipelines, while implementing robust DevSecOps practices to ensure compliance with stringent federal security and regulatory standards.
Collaborating with engineers, architects, and mission stakeholders, you will deliver innovative cloud capabilities that enable reliable, high‑performance workflows in support of national security objectives.
- Lead the design, deployment, and automation of secure AWS GovCloud infrastructure supporting the FBI adjudication AI platform.
- Develop GPU‑accelerated EKS clusters, secure containerized model‑serving environments, distributed inference gateways, vector databases, and high‑throughput ingestion pipelines.
- Own the platform's DevSecOps toolchain including CI/CD automation, IaC, secure pipelines, logging/monitoring integrations, and identity‑boundary enforcement aligned with federal requirements.
- Ensure full FedRAMP High, RMF, and FBI ATO alignment across infrastructure controls, logging coverage, network segmentation, encryption, monitoring, and boundary configurations.
- Design, deploy, and maintain secure AWS GovCloud architectures supporting LLM inference, retrieval services, vector databases, backend APIs, and large‑scale document processing pipelines.
- Build and manage GPU‑accelerated EKS clusters including autoscaling node groups, GPU scheduling, operators, and optimization for high‑performance inference workloads.
- Architect multi‑AZ high‑availability patterns including health checks, failover mechanisms, and distributed storage strategies.
- Implement VPC designs including private subnets, NAT gateways, VPC endpoints, NACLs, SGs, and traffic inspection layers supporting zero‑trust boundaries.
- Develop CI/CD pipelines automating build, scan, test, deploy, and rollback processes for AI services, APIs, UI applications, and data pipelines.
- Use Terraform/CloudFormation for automated provisioning of networks, clusters, storage, identity boundaries, and monitoring components.
- Embed SAST, SCA, IaC scanning, container scanning, dependency checks, and image attestation into pipelines to enforce supply chain security.
- Automate promotion workflows across dev, staging, and production environments under controlled change‑management policies.
- Implement NIST 800‑53, FedRAMP High, RMF, and CJIS controls across encryption, identity management, logging, monitoring, container hardening, and network segmentation.
- Configure KMS key hierarchies, secrets management, token‑scoped identities, certificate rotation, and workload identity policies.
- Develop logging and monitoring pipelines using CloudTrail, CloudWatch, GuardDuty, Config Rules, and SIEM integrations.
- Support SSP documentation, boundary diagrams, control‑implementation statements, and continuous monitoring filings for the FBI ATO process.
- Deploy and tune GPU compute environments using G‑series or P‑series instances optimized for hosting open‑weight LLMs and retrieval workloads.
- Enable LLM‑serving frameworks (vLLM, TGI, SageMaker, DeepSpeed‑based endpoints) with secure gateways and autoscaling rules.
- Support vector databases (FAISS, pgvector, Elasticsearch), embedding pipelines, retrieval services, and memory‑optimized storage.
- Optimize I/O throughput, caching, and container networking for large‑scale investigative document ingestion.
- Implement observability via metrics, traces, logs, health checks, SLOs/SLIs, and operational dashboards.
- Improve reliability using circuit breakers, retry/backoff logic, blue/green deployments, canary rollouts, and automated…