Role Overview
As the AI Systems Architect, you’ll own the end-to-end design and delivery of production-grade agentic and Generative AI systems. This is a highly hands-on role requiring deep architectural insight, coding proficiency, and an obsession with performance, scalability, and reliability. You’ll architect secure, cost-efficient AI platforms on AWS, guide developers through complex debugging and optimization, and ensure all systems are observable, governed, and production-ready.
Key Responsibilities
Architect Production AI Systems: Design robust architectures for agentic systems (planning, reasoning, tool-calling), GenAI/RAG pipelines, and evaluation workflows. Create detailed design documents including flow/UML/sequence diagrams and AWS deployment topologies.
Optimize for Cost & Performance: Model throughput, latency, concurrency, autoscaling, CPU/GPU sizing, and vector index performance to ensure scalable, efficient deployments.
Lead Debugging & Stability Efforts: Conduct deep-dive debugging, fix critical defects, and resolve production incidents; pair-program with developers to improve code quality and performance.
Standardize Agentic Frameworks: Build reference implementations using Semantic Kernel (preferred), LangGraph, AutoGen, or CrewAI with strong schema validation, grounding, and memory management.
Engineer Retrieval & Search Systems: Architect hybrid retrieval solutions including ingestion, chunking, embeddings, ranking, caching, and freshness management while minimizing hallucination risk.
Productionize on AWS: Deploy and manage systems using Amazon EKS, Bedrock, S3, SQS/SNS, RDS, and ElastiCache. Integrate IAM/Okta, Secrets Manager, and Datadog for observability, enforcing SLIs/SLOs and error budgets.
Implement Observability & Monitoring: Set up distributed tracing, metrics, and logging via OpenTelemetry and Datadog. Standardize dashboards, alerts, and incident response workflows.
Govern Evaluation & Rollouts: Build test and evaluation frameworks—golden sets, A/B experiments, regression suites, and controlled rollouts—to ensure consistent quality across releases.
Embed Security & Safety: Enforce least privilege, PII protection, and policy compliance through threat modeling, sandboxed execution, and prompt-injection defense.
Establish Engineering Standards: Create reusable SDKs, connectors, CI/CD templates, and architecture review checklists to promote consistency across teams.
Cross-Functional Leadership: Collaborate with product, data, and SRE teams for capacity planning, DR strategies, and post-incident RCA reviews. Mentor engineers to strengthen design and reliability practices.
Must-Have Qualifications
7–10 years in software/AI engineering, including 4+ years in GenAI application development and 2+ years architecting agentic AI systems.
Expert in Python 3.11+ (asyncio, typing, packaging, profiling, pytest).
Hands-on experience with Semantic Kernel, LangGraph, AutoGen, or CrewAI.
Proven delivery of GenAI/RAG systems on AWS Bedrock or equivalent vector-based platforms (OpenSearch Serverless, Pinecone, Redis).
Deep understanding of the AWS ecosystem: EKS, Bedrock, S3, SQS/SNS, RDS, ElastiCache, Secrets Manager, IAM/Okta, Kong API Gateway, Datadog.
Expertise in observability and incident management using OpenTelemetry and Datadog.
Strong focus on cost, performance, and security engineering: FinOps mindset, autoscaling, caching, and policy enforcement.
Exceptional communication—clear diagrams, ADRs, and peer review practices.
Nice-to-Have Skills
Multi-agent orchestration (task decomposition, coordinator-worker, graph-based planning).
Expertise with vector databases (OpenSearch, Pinecone, pgvector, Redis).
Experience with AI evaluation, guardrails, and rollout gating.
Familiarity with frontend agent interfaces, secure APIs, and AuthN/Z best practices.
Exposure to policy-as-code, multi-tenant architectures, and feature management (Kong, LaunchDarkly, Flipt).
Experience with CI/CD via GitHub Actions and IaC (Terraform/AWS CloudFormation).