Sr Data Scientist GenAI Job, Austin area, Texas, USA, IT/Tech

Overview

Sr Data Scientist (NLP / LLM / Generative AI)

Location:

Dallas, TX

Responsibilities

Design, build, fine-tune, and deploy LLMs, transformer-based NLP models, and GenAI solutions for both batch and real-time/streaming contexts.
Own all major components of ML pipelines: data ingestion, cleaning, pre-processing (structured and unstructured), embedding, search and retrieval, prompt engineering, and RAG (Retrieval-Augmented Generation).
Collaborate closely with ML Engineers, MLOps, software engineering, product, compliance, legal, and other teams to move models from prototype to production, ensuring reliability, scalability, monitoring, and maintainability.
Define and implement evaluation frameworks covering accuracy, bias, fairness, hallucination, consistency, and latency; run UAT, stress tests, and drift detection.
Optimize models and pipelines for performance, cost, and efficiency.
Ensure best practices in model development: version control, repeatability, documentation, governance, and ethical AI use.
Mentor more junior data scientists; help build team skills in NLP, GenAI practices, prompt engineering, and fine-tuning.
Identify new use cases; prototype innovations in GenAI/NLP; keep up with the latest research and open-source developments, and decide what to adopt.

Must-Have Qualifications

10+ years of experience in data science / ML, with substantial work in NLP, LLMs, or Generative AI.
Deep hands-on experience in Python, using frameworks such as PyTorch, TensorFlow, and Hugging Face.
Proven track record building transformer-based NLP and LLM models; experience with fine-tuning and prompt engineering.
Solid experience with information retrieval / search: keyword + semantic search, embeddings, vector databases.
Experience working in production / deploying models (batch and streaming), working with MLOps practices.
Strong algorithmic / statistical / mathematical fundamentals. Ability to reason about model behaviour, bias, uncertainty.
Good communicator: able to translate complex technical detail to business / non-technical stakeholders.

Nice to Have

Master's in Computer Science, Computational Linguistics, Statistics, Machine Learning or related field.
Experience with multimodal models (vision + text) or emerging LLMs and agent-based systems.
Experience with open-source LLMs and toolkits; familiarity with LangChain or similar frameworks.
Prior experience in regulated environments (finance, risk, legal, compliance) with strong governance and privacy requirements.

Work remote temporarily due to COVID-19.
