AI Engineer Job, Leawood area, Kansas, USA, IT/Tech

Description

Propio Language Services is a provider of the highest-quality interpretation, translation, and localization services. Our people take pride in every resource we offer, and our users always have access to cutting-edge technology, exceptional support, and collaborative user experiences. We are driven by our passion for innovation, growth, and bridging communication gaps in a diverse world. If you’re passionate about delivering technology-driven solutions and building lasting client relationships while contributing to client growth, Propio could be the ideal place for you.

Job Type

Full-time

Position

AI Data Strategy Engineer / Applied Scientist, LLM Data

Scope

Propio is building AI-powered systems that enhance multilingual communication, improve interpreter workflows, and support next-generation AI applications across text, speech, and multimodal experiences. This role owns the data strategy, curation pipelines, annotation workflows, and evaluation datasets that power our multilingual AI systems. It is a hands‑on technical role for someone who can manage the full AI data lifecycle (acquisition, curation, annotation, quality control, evaluation datasets, and post‑training data) to directly improve model performance.

The ideal candidate can build scalable data pipelines, design high‑quality annotation and QA processes, identify model failure modes, and close performance gaps through targeted data acquisition, curation, and synthetic data generation.

Requirements

Define the end‑to‑end data roadmap for multilingual and multimodal AI systems, including text, speech, translation, interpretation, low‑resource languages, and agentic AI workflows.
Design and build dataset curation pipelines for training, post‑training, and evaluation, including cleaning, deduplication, filtering, PII redaction, quality scoring, sampling, balancing, and versioning.
Create annotation schemas, labeling guidelines, QA rubrics, golden datasets, and reviewer workflows for multilingual, speech, translation, and agentic AI data.
Build evaluation datasets and benchmarks, analyze model failure modes, and translate performance gaps into targeted data improvements.
Support post‑training data workflows such as SFT, instruction tuning, preference data, RLHF/DPO‑style data, reward model data, and synthetic data generation.
Use modern annotation tools and AWS‑based data infrastructure to scale secure, traceable, and compliant AI data workflows.

Qualifications

Bachelor’s degree in Computer Science, Machine Learning, Data Science, Computational Linguistics, Linguistics, Statistics, or a related field, or equivalent practical experience.
4+ years of experience in AI data, ML data operations, NLP data engineering, applied ML, speech/translation data, or LLM data workflows.
Strong hands‑on experience with Python, SQL, and dataset curation pipelines.
Experience with annotation workflows, QA rubrics, evaluation datasets, or human‑in‑the‑loop data processes.
Familiarity with multilingual NLP, speech data, translation data, low‑resource languages, conversational AI, or agentic AI datasets.
Working knowledge of AWS data and ML tools such as S3, Glue, SageMaker, Bedrock, Lambda, Step Functions, EKS/ECS, IAM, or KMS.
Strong communication skills and ability to work with ML engineers, applied scientists, product teams, linguists, data teams, and vendors.

Preferred Qualifications

Master’s or PhD in Computer Science, Machine Learning, NLP, Computational Linguistics, Data Science, Statistics, or a related field.
Experience with LLM post‑training workflows such as SFT, instruction tuning, preference data, RLHF, DPO, reward modeling, or evaluation data generation.
Experience with synthetic data generation, active learning, weak supervision, LLM‑as‑judge workflows, or automated data quality scoring.
Experience with modern annotation and data platforms such as Labelbox, Scale AI, Prodigy, Argilla, Snorkel, Humanloop, or custom internal tooling.
