Position: Staff AI Engineer
Location: New York City

Employment Type

Full time

Department

Product Design + Engineering Software Engineering

Why this role matters

We're on a mission to automate one of the most costly and expertise-dependent bottlenecks in the built environment: construction plan review. Today, plan review is slow, expensive, and highly manual, requiring licensed experts to navigate thousands of unique jurisdictional construction codes and complex architectural documents. We believe AI can help.

As a key hire in our AI engineering organization, you’ll operate with a founder mindset, help define our long‑term data strategy, and help grow multiple squads behind it. Green Lite is building agentic workflows that compress permitting cycle time across thousands of jurisdictions, spanning permit intake, checklist generation, document QA, code citation, and review ops inside Lite Table. Your work turns messy PDFs/CAD plans and scattered code texts into decisive, auditable agent actions that help customers get permits faster.

Our agent stack leans on:

Bedrock AgentCore to deploy and operate agents securely at scale with session isolation, long-running workloads, built-in tools, memory, identity, gateway, and observability.
LangGraph for graph-based orchestration and error‑tolerant control flow.
Strands Agents for structured reasoning and tool use.

What you’ll do:

Design & ship production agents: Own one or more high‑impact agent workflows (e.g., Permit Intake Triage, Smart Document QA, Compliance Comment Copilot, Code Lookup inside Lite Table). Compose multi‑step graphs (LangGraph) with Strands‑based reasoning/tooling and develop internal/external tools for targeting heterogeneous datasets.
Operationalize on Bedrock AgentCore: Use Runtime for secure, scalable hosting and streaming; Gateway to expose APIs as agent‑ready tools; Memory for persistent context; Identity for least‑privilege access; Observability for traces/metrics; and built‑in Browser/Code‑Interpreter where appropriate.
Evaluation & safety harness: Stand up task‑level and end‑to‑end evals (success rate, cost, latency, human‑handoff rate) and regression suites; borrow ideas from AgentBench/WebArena/SWE‑bench where useful but bias toward domain‑grounded tests for permitting.
Retrieval & knowledge: Partner with Data to wire agents to building‑code knowledge (RAG), evaluate vector store options (including the S3 Vectors preview or Pinecone), and set up durable knowledge interfaces the agents can trust.
UX handoffs: Collaborate with Product/Design to craft agent‑first reviewer UX in Lite Table (great inline citations, diffs, and remediation suggestions).
Quality & reliability: Build fault‑tolerant flows (retries/rollbacks/compensation), observability dashboards, and on‑call runbooks for agent incidents.
Technical leadership: Create internal libraries/templates for agent patterns; mentor engineers; review designs with domain experts (architects/code officials).

You may be a fit if you have:

Agent frameworks: Depth with LangGraph (graph orchestration, state, recovery) and Strands Agents (structured tool reasoning); experience integrating MCP clients/servers.
Production chops: 7–12+ years shipping high‑reliability backend or ML systems (Python/TypeScript), cloud infra (AWS), containers (ECS/EKS), CI/CD, IaC, and secure integration patterns.
Retrieval & data: Hands‑on with RAG, document stores/vector DBs (bonus: knowledge of Pinecone or early exposure to S3 Vectors), schema/versioning for code texts and comments.
Evaluation mindset: Ability to design realistic, auditable testbeds; familiarity with agent benchmarks (and their pitfalls) and how to turn real user flows into acceptance tests.
Safety & governance: Policy-driven guardrails (grounding checks, topic/word filters), auditability, and human‑in‑the‑loop controls.
Domain empathy: Curiosity for building codes, plan sets, and reviewer workflows; comfort working with messy PDFs/CAD and plan‑review UX.
Bedrock AgentCore knowledge (nice to have): Comfort with Runtime (session isolation, 8‑hour sessions, large payloads), Memory, Gateway, Identity, Observability, and built‑in Browser/Code‑Interpreter.
Nice‑to‑haves: CrewAI/Auto…