Principal Engineer
Listed on 2026-05-08
Software Development
Cloud Engineer - Software, Software Engineer
Why This Role Exists
We operate a multi‑tenant automotive SaaS platform serving thousands of dealer groups across the United States. Our backend, event‑driven serverless on AWS (Lambda, EventBridge, DynamoDB, S3, Step Functions), orchestrates everything from dealer onboarding to inventory management to real‑time transaction processing. That platform works. Now we need to make it think.
We are building agentic AI systems: autonomous, tool‑using agents that observe platform state, reason over dealer context, take action through production APIs, and learn from outcomes. These are not chatbots bolted onto a dashboard. They are first‑class platform services, backed by AWS Bedrock and connected to production systems via MCP servers, that make decisions, execute workflows, and close loops without human intervention unless guardrails say otherwise.
This Principal Engineer owns that entire surface. You are not advising on AI strategy from a whiteboard; you are writing agent code, defining tool interfaces, building evaluation harnesses, setting cost and latency budgets, and shipping production AI workflows that touch real dealers and real money.
& Scale
- 5000+ destination dealer tenants, each with isolated databases and per‑tenant configuration
- Billions in annual Gross Merchandise Value (GMV) flowing through platform transactions
- Tens of thousands of API requests per minute across REST, SOAP, and event‑driven integration surfaces
- Data pipelines spanning 6 integration domains with multi‑protocol vendor connectivity
- Ownership and core development of agentic AI systems — designing, building, and operating the AI agent infrastructure (AWS Bedrock, MCP servers) that powers intelligent automation across the platform.
- AI agent lifecycle end to end — from prompt engineering and tool‑use design through guardrails, evaluation, cost optimization, and production observability.
- System design and technical decision‑making for migration waves — from identity/tenant services through core domain extraction and frontend decomposition.
- The dual‑write framework, API Gateway traffic‑splitting, and per‑tenant feature flag rollout that make every migration step reversible.
- Cross‑cutting concerns: observability (OpenTelemetry, CloudWatch), security posture (Auth0 consolidation, IAM), and data architecture (DynamoDB single‑table design, Aurora consolidation).
- Mentoring and force‑multiplying senior ICs: establishing patterns, reviewing designs, and raising the technical bar across 5 engineering teams.
- Consolidating and setting strategy for 30+ integrations, and making future integrations easier to build.
- Cloud Services: High‑availability AWS stack including Lambda, EventBridge, DynamoDB, S3, ECS Fargate, Aurora, API Gateway, CloudWatch, and Secrets Manager.
- Development Languages: Modern Python and Java (Spring Boot) alongside TypeScript/React (Next.js 16) frontends, with legacy domain coverage in PHP/Laravel.
- AI & Agentic Systems: Advanced agentic workflow orchestration utilizing lean AWS Bedrock AgentCore, MCP servers, or LangChain/LangGraph frameworks.
- Data Engineering: Complex data architectures featuring DynamoDB single‑table design, MySQL/Aurora, S3 data lakes, Glue Data Catalog, Athena, and data pipelines.
- Infrastructure & Security: Enterprise‑grade CI/CD and observability via CloudFormation, Auth0 consolidation, OpenTelemetry, and CircleCI.
- Integration Surfaces: Multi‑protocol connectivity spanning REST, SOAP/XML, EventBridge event‑bus patterns, SES processing, and Playwright browser automation.
- Months 1‑3: Immerse in the codebase. Audit the current architecture across all stacks. Publish the first Architecture Decision Record (ADR) for the next migration wave. Establish your design review cadence with the team.
- Months 4‑6: Drive the AI/agentic integration layer: Bedrock‑powered automation in at least one production workflow. Establish the patterns for how the team builds with AI going forward.
- Months 7‑9: Own and deliver the first migration wave end‑to‑end, from design doc through production cutover with dual‑write validation. Stand up the observability baseline (OpenTelemetry…