Senior AI Engineer; Full-Stack Job, Seattle area, Washington, USA, Software Development

Position: Senior AI Engineer (Full-Stack)

We are hiring a Senior AI Engineer who builds production-grade AI products end-to-end. You will design and ship AI agents, Retrieval-Augmented Generation (RAG) systems, and fine-tuned small language models, while also owning the full-stack delivery from React/Vue/Angular frontends through Python/Node backends to AWS, GCP and Azure deployments.

Equally important: you are an engineer who has fully adopted AI tooling. You use Claude Code, Cursor, Codex, and other AI coding assistants as a daily multiplier, and you know how to use them well: managing context, controlling token spend, writing CLAUDE.md / AGENTS.md files, using subagents and MCP servers, and applying evaluation-driven workflows so that AI-generated code is shipped responsibly.

What You Will Do

Design, build and deploy AI agents using LangChain, LangGraph, LlamaIndex, CrewAI or equivalent frameworks, including multi-agent orchestration, tool use, memory, and planning loops.
Architect RAG pipelines end-to-end: ingestion, chunking, embedding selection, vector stores (Pinecone / Weaviate / Qdrant / pgvector), hybrid search, re-ranking, query rewriting, and evaluation.
Fine-tune small and open-source language models (Llama, Mistral, Phi, Gemma, Qwen) using LoRA, QLoRA, PEFT, instruction tuning and DPO — and decide when fine-tuning is the right answer versus prompting or RAG.
Build full-stack AI applications:
React/Next.js frontends with streaming UIs (Vercel AI SDK / SSE / WebSockets), FastAPI or Node backends, and well-designed APIs.
Own deployment, scaling and observability on AWS (Bedrock, SageMaker, Lambda, ECS/EKS) and GCP (Vertex AI, Cloud Run, GKE), with Docker, Kubernetes, Terraform and CI/CD.
Implement LLM observability and evals using LangSmith, Langfuse, RAGAS, and DeepEval, and treat evaluation as a first-class engineering artifact, not an afterthought.
Apply AI coding assistants (Claude Code, Cursor, Codex, Windsurf, Copilot) as a daily tool with strong discipline around context management, token efficiency, subagents, hooks, slash commands, and MCP servers.
Address non-functional requirements: latency budgets, cost/token economics, prompt injection defense, PII handling, OWASP LLM Top 10, rate limiting, semantic caching, and graceful degradation.
Collaborate with product, design and business stakeholders to translate ambiguous problems into shippable AI solutions, and mentor mid-level engineers on AI engineering practices.

Must-Have Skills

4+ years of software engineering and at least 2 years of hands-on production work with LLMs (OpenAI, Anthropic Claude, Gemini, or open-source).
Strong RAG experience: chunking strategies, embedding models, vector databases, hybrid search, re-ranking, evaluation, and avoiding common failure modes.
Production experience building AI agents with LangChain and LangGraph (or LlamaIndex, CrewAI, AutoGen, Pydantic AI). Comfortable with tool/function calling, structured outputs, agent memory and multi-agent patterns.
Experience fine-tuning small/open-source models (LoRA, QLoRA, PEFT) and using Hugging Face Transformers, Datasets, Accelerate, and the Hub.
Strong prompt engineering: system design, few-shot, chain-of-thought, prompt caching, structured output schemas, evaluation of prompts as code.

AI-Augmented Development

Daily, production-grade use of Claude Code, Cursor, or Codex. Understands CLAUDE.md / AGENTS.md, project memory files, slash commands, subagents, hooks, MCP servers, and plan-vs-execute workflows.
Deliberate token and context management: knows when to use Haiku vs Sonnet vs Opus (and equivalents on other providers), uses prompt caching, batches work, prunes context aggressively.
Disciplined review of AI-generated code, with tests and evals — never ships unread output.
Backend:
Python (FastAPI / Flask) and/or Node.js (TypeScript). Solid grasp of async patterns, streaming responses (SSE / WebSockets / API).
Frontend:
React, Next.js, TypeScript, Tailwind CSS. Comfortable building streaming chat UIs and agentic interfaces.
Databases:
PostgreSQL, Redis, at least one vector DB. Familiar with schema design, indexing, and query optimization.

Non-Functional Engineering

Latency: streaming, parallel tool calls, model…