Lead, AI Platforms & Operations; Remote
Baltimore, Anne Arundel County, Maryland, 21276, USA
Listed on 2026-04-29
IT/Tech
Systems Engineer, Cloud Computing, AI Engineer
Responsibilities
Drives enterprise optimization by introducing and maturing capabilities for the AI Platform & Operations domain, translating business needs into architectural solutions that meet performance, reliability, and security expectations. Owns the technical foundation and operational excellence required to run AI systems in production, ensuring platforms are scalable, secure, cost-effective, and production‑ready so delivery teams can ship AI use cases quickly and safely.
Applies enterprise roadmaps, principles, standards, and practices; develops domain strategies, reference architectures, and reusable patterns (paved roads) for AI/ML and LLM/generative AI workloads across cloud environments.
- Leads preparation of domain architecture viewpoints/models for current, target, and interim states, identifying pain points and opportunities.
- Architects and evolves the enterprise AI platform supporting ML, LLM, and generative AI workloads; defines reference architectures, patterns, and processes for development and deployment.
- Ensures target‑state alignment and integration across architecture domains, including seamless integration with enterprise data platforms, identity, security, and legacy systems.
- Leads design, engineering, and implementation of reusable assets that improve solution quality; collaborates through the implementation phase.
- Owns end‑to‑end AI production lifecycle architecture: CI/CD, packaging, deployment, versioning, rollback, and governed scale mechanisms (e.g., registries/artifacts/evaluation frameworks).
- Represents the domain in peer reviews and briefings (e.g., Architecture Review Board); maintains domain work products in the repository and dispositions stakeholder feedback.
- Enables delivery teams by providing reusable infrastructure, tooling, and automation, serving as technical escalation for complex platform/ops issues.
- Leads collaboration with business and technical stakeholders to maximize architecture impact and address constraints; ensures traceability upstream to business needs and downstream to solution building blocks.
- Ensures platforms meet availability, scalability, performance, and resilience requirements; implements monitoring/alerting and SRE‑style runbooks for AI services.
- Drives continuous improvement to reduce operational friction and technical debt in production AI systems.
- Leads vendor evaluations/selection for domain‑specific tooling; represents the domain in RFIs/RFPs and evaluation/scoring of proposals.
- Evaluates and adopts market‑leading AI tools/services aligned with enterprise standards.
- Leads development/enhancement of architecture domain methods/tools; aligns domain processes with other architecture domains and SDLC disciplines; develops communications/education materials.
- Partners with cybersecurity and Trust & Risk teams to implement secure‑by‑design AI platforms (access control, logging, data protection, release governance) and supports production readiness reviews without duplicating governance ownership.
- Leads domain assessments for complex proposed projects; provides input to project/product/enterprise technology roadmaps; provides regular reporting on progress, issues, and opportunities related to the domain.
Education Level: Bachelor's Degree in Computer Science, Information Technology, or related field; OR, in lieu of a Bachelor's degree, an additional 4 years of relevant work experience beyond the required work experience.
Licenses/Certifications Upon Hire Preferred: Certified System Architect, Cloud Architect, Kubernetes/Platform Engineering, MLOps Certification.
Experience: 10 years in the architecture domain.
Preferred Qualifications: Advanced degree.
Skills And Abilities (KSAs)
- Ability to motivate and influence others so that project…