
Senior Data Architect

Job in Town of Poland, Jamestown, Chautauqua County, New York, 14701, USA
Listing for: Omilia
Full Time position
Listed on 2026-04-23
Job specializations:
  • IT/Tech
    Data Engineering, Data Scientist, Data Analyst, Machine Learning / ML Engineer
Salary/Wage Range or Industry Benchmark: 80,000 to 100,000 USD per year
Job Description & How to Apply Below
Location: Town of Poland

Accountabilities

  • Own the Training Environment data architecture end-to-end: dataset design and schema for all ML training pipelines, including dialog corpora for LLM training, conversational steps for NLU models, annotated evaluation sets, and whole-call recordings for speech-to-speech model development
  • Define and govern data selection and sampling strategy: establish criteria that determine which production conversations have the highest training value, including diversity-optimized sampling, confidence-based filtering, edge-case prioritization, and deduplication strategies
  • Build and maintain the data catalog and dataset discovery infrastructure: enable ML engineers across LLM, NLU, Speech, and Agentic teams to find, understand, and use training data without friction
  • Define annotation pipeline architecture: establish requirements for data labeling — intent annotation, entity tagging, dialog act classification, task completion scoring, and agentic reasoning evaluation — across internal annotators and external vendors
  • Architect the data flywheel: the closed-loop system where real customer conversations feed back into training data collection, curation, annotation, model retraining, and evaluation
  • Own and maintain data pipelines and infrastructure spanning Snowflake, AWS S3, ETL/ELT pipelines (Airflow), and integration with ML training workflows on AWS SageMaker
Key Responsibilities
  • Work directly with LLM, NLU, and Agentic systems teams to understand training data requirements — what conversational patterns improve zero-shot routing accuracy, what dialog structures train better task planners, what edge cases stress-test agentic reasoning — and translate these into concrete dataset specifications and pipeline configurations
  • Define and maintain the data architecture for Omilia's Training Environment: schema design, data flow patterns from production (OCP) to centralized training infrastructure, storage strategy (Snowflake + S3), cross-pipeline consistency, and clear auditable data lineage, including anonymization requirements as part of the compliance layer
  • Design data quality frameworks that directly improve model outcomes: content-based deduplication, diversity-maximizing sampling, confidence-based filtering using NLU scores and behavioral signals, and dedicated NLU improvement corpus extraction from low-confidence and no-match production data
  • Define annotation requirements for ML model development — intent labeling guidelines, entity tagging schemas, dialog act classification, task completion scoring, and reasoning quality assessment — and design annotation workflows that produce consistent, high-quality labels at scale; evaluate and manage external data annotation vendors
  • Build and maintain the data catalog that enables cross-team dataset discovery: document dataset contents, schemas, lineage, quality metrics, intended use cases, and known limitations; define the taxonomy for organizing training datasets across model types (LLM, S2S, NLU, ASR, TTS, agentic)
  • Architect the closed-loop data flywheel: production conversations → data selection → anonymization → curation → annotation → model training → evaluation → safe redeployment → back to production; define feedback mechanisms that route model failure cases into targeted training data collection
  • Identify gaps in production training data and define requirements for external data acquisition (public datasets, synthetic data generation, vendor-sourced corpora); design data augmentation strategies for underrepresented languages, domains, or conversational patterns
  • Work closely with LLM/NLU/S2S/ASR/TTS/VB Tech Leads and Senior Engineers to align data architecture with model training requirements; collaborate with Platform Engineering, Security & Compliance, and Product Management stakeholders
  • Maintain comprehensive documentation of data architecture, dataset specifications, pipeline configurations, and data catalog; produce data architecture RFCs for significant changes and share best practices with ML teams
Requirements
Technical / Professional Skills
  • 5+ years in data architecture, data engineering, or LLM/ML data infrastructure, with demonstrated ownership of…
Position Requirements
10+ years of work experience