Software Engineer - Search Platform, Ingestion & Indexing Job Eagan area, Minnesota USA, Software Development

Position: Staff Software Engineer - Search Platform, Ingestion & Indexing

Overview of the Role

Advanced Content Engineering (ACE) is seeking a Staff Software Engineer to serve as the technical anchor for the search platform's ingestion and indexing systems. The platform processes millions of documents across TR's legal, tax, and professional content corpora – parsing, chunking, enriching, embedding, and indexing them into a hybrid search engine that powers both human-facing search interfaces and autonomous AI agents.

Getting this pipeline right, at scale, with zero‑downtime operations and increasingly agentic retrieval patterns, is one of the platform's most consequential engineering challenges.

This role owns the design, implementation, and operational health of the document ingestion pipeline and search index management systems – from the Kafka‑based streaming infrastructure that moves documents through processing stages, to the Vespa application architecture that stores and serves them. Staff Engineers on this team define, build, test, deploy, scale, and operate what they ship – full‑stack ownership is the daily reality.

AI‑assisted development is the team norm, and continuous delivery to production is the expectation. This is a role for someone who sets architectural boundaries rather than only executing within them.

About the Role

In this position, you will focus on:

Ingestion Pipeline Architecture & Engineering

Plan, design, develop, and own the end‑to‑end document ingestion pipeline – a Kafka‑based stream processing architecture that moves documents through parsing, chunking, enrichment (entity extraction, embedding generation, metadata enrichment), and indexing stages, including all fault tolerance, version ordering, and at‑least‑once delivery guarantees.
Architect and implement pluggable, configurable pipeline components (parsers, chunkers, enrichers, indexers) that client teams can assemble into custom topologies via the platform's self‑service APIs, while maintaining reliable, observable, and performant execution.
Own the platform's Protobuf‑based document schema and schema registry integration – establishing schema governance standards, enforcing backward‑compatible evolution, and ensuring reliable serialization across all pipeline stages.
Design and implement dual‑flow ingestion: a high‑throughput batch path for full reindexing and a low‑latency incremental path for real‑time document updates, with strong guarantees around document version ordering and idempotent processing.
Lead the migration of ingestion infrastructure from OpenSearch to Vespa, including design of Vespa document processors, custom Kafka feeders, and application package architecture – resolving complex technical challenges that have little or no precedent within the team.
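
To illustrate what version ordering and idempotent processing mean under at‑least‑once delivery, here is a minimal sketch. It is not the platform's implementation: the VersionedIndex class, its apply method, and the integer version field are hypothetical names chosen for this example, and an in‑memory dictionary stands in for the real index.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedIndex:
    """Toy in-memory stand-in for a search index keyed by document ID (hypothetical)."""
    docs: dict = field(default_factory=dict)  # doc_id -> (version, body)

    def apply(self, doc_id: str, version: int, body: str) -> bool:
        """Apply an update only if it is newer than the stored version.

        Returns True if the write was applied, False if it was skipped
        as a redelivered duplicate or a stale, out-of-order message.
        """
        current = self.docs.get(doc_id)
        if current is not None and version <= current[0]:
            return False
        self.docs[doc_id] = (version, body)
        return True


index = VersionedIndex()
results = [
    index.apply("doc-1", 1, "first draft"),
    index.apply("doc-1", 3, "third revision"),
    index.apply("doc-1", 3, "third revision"),   # redelivered duplicate
    index.apply("doc-1", 2, "second revision"),  # older version arriving late
]
```

Because each write is conditional on the version, replaying the same message or receiving an older one leaves the stored document unchanged, which is the property that makes redelivery safe.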

Custom Model Operationalization

Own the end‑to‑end lifecycle for custom models integrated into the ingestion pipeline – re‑ranking models, embedding models, and enrichment components – including inference serving behind a stable API surface, latency SLO management, hardware and runtime configuration (batching, quantization), and scaling.
Build and operate the model promotion pipeline: the CI/CD workflow that moves a model artifact from the fine‑tuning team through staging to production, including versioning, canary rollouts, and rollback mechanisms – so the platform team can roll out model updates without depending on the research team for production changes.
Define and maintain integration contracts between custom models and downstream pipeline components – covering input/output schemas, compatibility requirements, and the governance process for model updates, so that search pipeline consumers are not broken by upstream changes.
Instrument model serving for production observability: latency distributions, throughput, error rates, and quality signals such as re‑ranking score distributions – enabling the team to detect regressions or model drift without involving the fine‑tuning team.

Search Engine & Index Management

Own the search engine layer end‑to‑end: design and operate Vespa (and OpenSearch during transition) index configurations, ranking profiles, schema definitions, and application package lifecycle…