
Software Engineer — Search Platform, Ingestion & Indexing

Job in Frisco, Collin County, Texas, 75034, USA
Listing for: Thomson Reuters
Full Time position
Listed on 2026-04-23
Job specializations:
  • Engineering
    Data Engineer, Software Engineer, AI Engineer, Systems Engineer
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD yearly
Job Description & How to Apply Below
Position: Staff Software Engineer — Search Platform, Ingestion & Indexing

Overview

Advanced Content Engineering (ACE) is seeking a Staff Software Engineer to serve as the technical anchor for the search platform’s ingestion and indexing systems. The platform processes millions of documents across legal, tax, and professional content corpora — parsing, chunking, enriching, embedding, and indexing them into a hybrid search engine that powers both human-facing search interfaces and autonomous AI agents. This role owns the design, implementation, and operational health of the document ingestion pipeline and search index management systems, from Kafka-based streaming infrastructure to Vespa application architecture.

The expectation is full‑stack ownership and continuous delivery to production, with a focus on architectural leadership.

Responsibilities

Ingestion Pipeline Architecture & Engineering

  • Plan, design, develop, and own the end‑to‑end document ingestion pipeline — a Kafka‑based stream processing architecture that moves documents through parsing, chunking, enrichment (entity extraction, embedding generation, metadata enrichment), and indexing stages, with fault tolerance, version ordering, and at‑least‑once delivery guarantees.
  • Architect and implement pluggable, configurable pipeline components that client teams can assemble into custom topologies via self‑service APIs, maintaining reliable, observable, and performant execution.
  • Own the platform’s Protobuf‑based document schema and schema registry integration, establishing schema governance, enforcing backward compatibility, and ensuring reliable serialization.
  • Design and implement dual‑flow ingestion: a high‑throughput batch path for full reindexing and a low‑latency incremental path for real‑time updates, with strong guarantees around version ordering and idempotency.
  • Lead the migration of ingestion infrastructure from OpenSearch to Vespa, including custom Kafka feeders and application package architecture.

Custom Model Operationalization

  • Own the end‑to‑end lifecycle for custom models integrated into the ingestion pipeline — re‑ranking, embedding, and enrichment components, including inference serving, latency SLO management, batching, and scaling.
  • Build and operate the model promotion pipeline (CI/CD workflow) from fine‑tuning through staging to production, with versioning, canary rollouts, and rollback mechanisms.
  • Define and maintain integration contracts between models and downstream components, covering input/output schemas and governance processes.
  • Instrument model serving for production observability: latency distributions, throughput, error rates, and quality signals like re‑ranking score distributions.

Search Engine & Index Management

  • Own the search engine layer end‑to‑end: design and operate Vespa index configurations, ranking profiles, schema definitions, and application package lifecycle management, and manage the transition from OpenSearch.
  • Build and operate zero‑downtime index management: shadow indexing, blue/green promotion, and rolling reindex workflows.
  • Implement and maintain the Component Registry and Index Registry, ensuring correctness, observability, and safe concurrent modification.
  • Develop full‑reindex and incremental‑update orchestration logic, including change detection, document tracking, Kafka topic management, and DynamoDB‑backed state management.

Agentic Search Infrastructure

  • Design ingestion and indexing infrastructure for agentic retrieval patterns, with explicit latency budgets, chunking and result compression strategies, and index boundary definitions.
  • Build trace‑level observability into the retrieval stack to capture tool usage and call order for deterministic diagnosis.
  • Design session state and cache invalidation patterns for multi‑turn agentic search, reasoning about cache validity windows, session state scope, and stale‑context prevention.

Evaluation & Search Quality

  • Build and own the integration between the ingestion pipeline and the platform’s offline evaluation framework, supporting evaluation, grading, and ranking comparison.
  • Instrument the query and retrieval stack for online analytics: real‑time latency, throughput, query log collection, and support for A/B experiments.
  • Partner with research scientists to evaluate…