Senior Data Scientist Job, Hamilton area, Scotland, UK, Software Development

We need someone who can build high-quality forecasting models for UK energy balancing markets — not a generalist who's touched a bit of everything, but a specialist who genuinely understands time series, knows how to extract signal from massive feature sets, and can produce reliable probabilistic forecasts.

You’ll spend significant time on tasks like: engineering features from raw market data, selecting the most predictive subset from hundreds of thousands of candidates, building gradient boosting models that output well-calibrated prediction intervals, and rigorously validating everything to avoid the subtle leakage problems that plague time series work.

You won’t be responsible for deployment; we have experienced DevOps engineers for that. But you’ll need to hand off models that are well‑documented, reproducible, and actually work in production. If you find satisfaction in the craft of building models that hold up under scrutiny, rather than just hitting a metric on a test set, this role is for you.

Feature Engineering and Selection

Engineer predictive features from energy market data (prices, volumes, grid conditions, weather, calendar effects)
Work with feature sets in the hundreds of thousands — you’ll need systematic approaches, not manual inspection
Apply and evaluate feature selection methods (mRMR, importance‑based selection, recursive elimination) to build parsimonious models
Analyse feature importance and stability across time periods and market conditions
Understand the domain well enough to create features that reflect how the balancing market actually works

Model Development

Build gradient boosting models (XGBoost, LightGBM, CatBoost) for multi‑horizon forecasting
Produce probabilistic forecasts — prediction intervals, quantile regression, or distribution outputs — not just point estimates
Handle class imbalance appropriately when the problem requires classification
Design proper time series cross‑validation schemes that respect temporal ordering
Diagnose and fix target leakage — you should be able to explain why a 'too good' result is suspicious

Validation and Testing

Test pipeline components using synthetic/artificial data where ground truth is known
Validate that preprocessing steps (missing value imputation, outlier handling) don’t introduce leakage
Build confidence that models will generalise, not just interpolate

Experiment Tracking and Reproducibility

Track experiments systematically (MLflow or similar)
Maintain reproducible training pipelines with proper configuration management
Document model decisions, hyperparameter choices, and validation results clearly

Domain Understanding

Invest time learning UK energy balancing markets — BM units, settlement periods, system prices, imbalance dynamics
Translate domain knowledge into model improvements (better features, appropriate loss functions, sensible constraints)
Collaborate with colleagues who understand the data infrastructure and market context

Must Have

Deep time series experience — you understand why random CV splits fail for forecasting, how to handle multiple horizons, and the pitfalls of lookahead bias
Strong feature engineering and selection skills — you’ve worked with high‑dimensional feature sets and know multiple approaches to reduce them systematically
Gradient boosting expertise: XGBoost, LightGBM, or CatBoost are your core tools; you understand their hyperparameters and when each matters
Probabilistic forecasting ability — you can produce calibrated prediction intervals or quantile forecasts, not just point predictions
Rigorous validation mindset — you’re paranoid about leakage, you test your assumptions, and you don’t trust results that seem too good
Python fluency — clean, testable code; comfortable with pandas/Polars, scikit‑learn, and the GBM libraries
SQL competence: you can pull and reshape data from PostgreSQL without friction
Clear communication — you document your work and can explain model behaviour to non‑ML colleagues

Nice to Have

Experience with MLflow, Hydra, Metaflow, or similar tooling for experiment tracking and pipeline management
Polars experience (we’re migrating some workloads from pandas)
Background in energy, utilities,…

