Senior AI Data Engineer / Data Scientist

Job in Town of Poland, Jamestown, Chautauqua County, New York, 14701, USA
Listing for: Billennium
Full Time position
Listed on 2026-06-13
Job specializations:
  • IT/Tech
    AI Engineer (Applied/Software), Data Scientist, Data Engineering, Data Analyst
Salary/Wage Range or Industry Benchmark: 60000 - 80000 USD Yearly
Job Description & How to Apply Below
Position: Senior AI Data Engineer / Data Scientist
Location: Town of Poland

Billennium is a global technology company with over 20 years of experience, committed to innovation and empowering businesses. As an employer, we offer a supportive, growth-focused environment where collaboration and creativity thrive. Join us to shape the future of technology together!

About the Role:

We are looking for a Senior AI Data Engineer / Data Scientist who can turn messy enterprise data into AI-ready, high-quality knowledge assets.

You will lead the cleanup, preparation, and enrichment of unstructured content (SharePoint/document repositories) and structured/semi-structured data (data lakes, databases) so our agents, copilots, and RAG systems are accurate, trustworthy, and scalable.

This is a senior, hands-on role. You will own data quality outcomes end-to-end: discovery → cleanup → enrichment → ingestion → refresh cycles → governance. We value AI-native generalists who can remove bottlenecks by working directly with AI Engineers, Architects, and business stakeholders to decide what data is worth using and how to structure it for retrieval and reasoning.

Our standardized stack includes (and this role actively uses it): ingestion/ETL foundations, Postgres + pgvector as the default RAG store, Redis caching, LLM gateway patterns, Langfuse observability, DeepEval/RAGAS evaluation, and Presidio for PII detection/masking when required.

Must-have requirements:
  • 5+ years in data engineering / applied data science / analytics engineering with ownership of production pipelines.
  • Proven experience working with unstructured enterprise data (documents, PDFs, Office files, wikis, knowledge bases).
  • Solid understanding of data quality engineering: validation, monitoring, lineage, refresh cycles.
  • Strong stakeholder skills: can work with the business to define what data matters and what “good” looks like.
Nice to have:
  • Experience with Postgres + pgvector (or similar vector stores), retrieval optimization, and hybrid search concepts.
  • Familiarity with observability practices for AI pipelines and the use of RAG evaluation metrics (RAGAS-style).
  • Experience with governance tooling and privacy controls for enterprise AI (e.g., PII workflows).
What you will do:
  • Lead “data triage” for AI use cases: identify authoritative sources, duplicates, outdated content, and low-quality documents.
  • Clean, normalize, deduplicate, and standardize enterprise content at scale (documents, PDFs, Word/Excel, wiki pages, etc.).
  • Define what data should be excluded from AI systems (stale, contradictory, low-trust, or sensitive content).
Unstructured ingestion (SharePoint + document repositories):
  • Build robust ingestion pipelines for SharePoint and file repositories: parsing, text extraction, structure recovery, and metadata capture.
  • Implement document normalization strategies (naming, taxonomy, metadata standards, canonical IDs).
  • Design chunking strategies, metadata enrichment, and document structuring optimized for retrieval performance and cost.
  • Improve retrieval quality through practical techniques such as filtered retrieval and post-retrieval optimization where appropriate (e.g., reranking), collaborating with AI Engineers on the retrieval interface.
  • Prepare and maintain “AI-ready knowledge sets” that can be embedded and served via Postgres + pgvector (default).
Data quality, evaluation, and feedback loops (non-negotiable):
  • Define and implement data quality gates (freshness, completeness, relevance, dedupe rate, metadata coverage).
  • Partner with AI Engineers to evaluate retrieval and RAG performance using frameworks like RAGAS (answer correctness, context recall/precision) and to monitor trust metrics over time.
  • Establish human feedback loops where needed (review queues, sampling, targeted audits) to continuously improve data usefulness and user trust.
Governance, privacy, and auditability:
  • Apply privacy and enterprise constraints; where required, implement PII detection/masking using Presidio patterns.
Reuse:
  • Package reusable “data cleanup + RAG readiness” recipes: ingestion templates, metadata schemas, chunking playbooks, dedupe strategies.
  • Build a repeatable data foundation that accelerates future use cases (not a one-off cleanup project).
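
A minimal sketch of what the cleanup and quality-gate duties above might look like in practice, assuming plain Python with no external libraries. The `Doc` type, the function names, and every threshold are hypothetical and not taken from Billennium's stack; exact-match hashing stands in for real deduplication, and fixed-size character windows stand in for a structure-aware chunking strategy.

```python
import hashlib
import re
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Doc:
    """Hypothetical record for one extracted document."""
    doc_id: str
    text: str
    modified: date
    metadata: dict = field(default_factory=dict)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text).strip().lower()


def dedupe(docs: list[Doc]) -> list[Doc]:
    """Keep the most recently modified copy of each exact (normalized) duplicate."""
    newest: dict[str, Doc] = {}
    for doc in docs:
        key = hashlib.sha256(normalize(doc.text).encode("utf-8")).hexdigest()
        if key not in newest or doc.modified > newest[key].modified:
            newest[key] = doc
    return list(newest.values())


def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def quality_report(raw: list[Doc], kept: list[Doc], required_keys: tuple,
                   today: date, max_age_days: int) -> dict:
    """Compute three of the gate metrics named in the listing."""
    n_raw, n_kept = len(raw), len(kept)
    covered = sum(all(k in d.metadata for k in required_keys) for d in kept)
    fresh = sum((today - d.modified).days <= max_age_days for d in kept)
    return {
        "dedupe_rate": (n_raw - n_kept) / n_raw if n_raw else 0.0,
        "metadata_coverage": covered / n_kept if n_kept else 0.0,
        "fresh_share": fresh / n_kept if n_kept else 0.0,
    }


def passes_gates(report: dict, minimums: dict) -> bool:
    """A gate passes when every listed metric meets its minimum."""
    return all(report[name] >= floor for name, floor in minimums.items())
```

A pipeline built along these lines would call `dedupe` before chunking and embedding, then refuse to publish a knowledge set when `passes_gates` returns `False`. Near-duplicate detection, relevance scoring, and PII masking are left out of this sketch.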
Our offer:
  • Comprehensive be…
Position Requirements
10+ Years work experience