Lead Data Scientist Job Jamestown area, Town of Poland New York USA, IT/Tech

Location: Town of Poland

While Xebia is a global tech company, our journey in CEE started with two Polish companies – PGS Software, known for world-class cloud and software solutions, and GetInData, a pioneer in Big Data. Today, we’re a team of 1,000+ experts delivering top-notch work across cloud, data, and software. And we’re just getting started.

What We Do

We work on projects that matter – and that make a difference. From fintech and e-commerce to aviation, logistics, media, and fashion, we help our clients build scalable platforms, data and AI solutions, and cutting-edge applications to shape the future of tech. Our clients include McLaren, Aviva, Deloitte, Spotify, Disney, ING, UPS, Tesco, Truecaller, All Saints, Volotea, Schmitz Cargobull, Allegro, InPost, and many, many more.

We value smart tech, real ownership, and continuous growth. We use modern, open-source stacks, and we’re proud to be trusted partners of Databricks, dbt, Snowflake, Azure, GCP, and AWS. Fun fact: we were the first AWS Premier Partner in Poland!

Beyond Projects

What makes Xebia special? Our community. We support tech communities, organize meetups (Software Talks, Data Tech Talks), and have a culture that actively supports your growth via Guilds, Labs, and personal development budgets – for both tech and soft skills. It’s not just a job. It’s a place to grow.

What sets us apart?

Our mindset. Our vibe. Our people. That’s hard to capture in text, so come visit us and see for yourself.

You will be:

designing and developing statistical models for property price adjustments across time, location, quality, and condition,
building spatial algorithms (adaptive heatmaps, geographic clustering, polygon-based property search) to capture local market dynamics,
implementing comparable property recommendation with feature engineering across different property types,
developing market analysis pipelines with solid diagnostics: trend fitting, outlier detection, goodness-of-fit metrics,
integrating LLM-based classification services for document and property analysis,
exposing model outputs through production API endpoints and working with frontend engineers on data contracts,
debugging models in production: edge cases, numerical issues, data quality problems.

Your profile:

solid statistics background: regression, GAMs, mixed/random effects, link functions, robust estimation, outlier handling,
proficiency in Python and the data science stack: NumPy, Pandas, statsmodels, SciPy, scikit-learn,
experience building and maintaining production APIs with FastAPI and Pydantic,
comfortable working with PostgreSQL and SQLAlchemy,
familiar with containerized environments (Docker, Kubernetes, GCP),
able to turn domain requirements into quantitative solutions and communicate trade-offs,
good command of English (spoken and written),
familiarity with basic statistical concepts (e.g., Bayes’ rule, linear regression, maximum likelihood estimation),
practical experience using AI-powered assistants (e.g., Claude Code, GitHub Copilot, Cursor) to improve productivity, quality, or decision-making in software delivery.

Nice to have:

geospatial data and libraries (GeoPandas, Shapely, H3, GeoAlchemy2),
GAM libraries (PyGAM), JAX, or TensorFlow Probability,
task queues and async workflows (Celery, Redis),
observability tooling (OpenTelemetry),
data validation and property-based testing (Pandera, Hypothesis, Testcontainers),
R integration (rpy2),
LLM integrations (Google Gemini or similar),
frontend awareness (React, TypeScript),
real estate data, valuation methodology, or appraisal workflows.

Working from within the European Union and holding a valid work permit are required.
