Senior Storage & Data Engineer
Listed on 2026-06-21
IT/Tech
Data Engineering
CSCS is looking for a Data & Storage Engineer, working at the intersection of high-performance storage and research data management.
CSCS (operated by ETH Zurich, with offices in Lugano and Zurich) runs supercomputing infrastructure for researchers across academia and industry. This is a two-year position.
You’ll work across two layers: the storage layer (throughput, integrity, and tiering at multi‑petabyte scale) and the data layer above it, which covers lineage, provenance, discoverability, and access patterns. The goal is to close the gap between raw bytes sitting on a parallel file system and data that researchers can actually trust, find, and reproduce.
Responsibilities
- Bridge ingestion and use. Design the pipelines and metadata that turn ingested data into something findable and consumable: catalogs, schemas, and access layers that match how training jobs and simulations actually read, not just where bytes sit.
- Make data traceable. Build lineage and provenance so any dataset, checkpoint, or result can be traced back to its inputs and transformations. Reproducibility is a first‑class requirement here, not a retro‑fit.
- Tune for the workload. Optimize parallel file systems (Lustre, GPFS) and object storage for the concurrency, small‑file, and large‑checkpoint patterns of distributed GPU training and HPC simulation.
- Operate at scale, safely. Design and run multi‑petabyte storage with the integrity and availability scientific work depends on — erasure coding, redundancy, hot‑to‑archival tiering.
- Automate everything. Deploy and scale storage and data services as code. Hand‑built, one‑off ("snowflake") infrastructure doesn’t survive at this scale.
- Make it observable. Instrument storage health, capacity trends, and pipeline performance so problems surface before users feel the pain.
- Translate. Turn real access patterns from domain scientists and ML engineers into technical requirements — and push back when a request would quietly break something downstream.
Requirements
- A technical degree (CS, engineering) or equivalent experience that demonstrates the same depth.
- Solid storage grounding: file, block, and object storage; performance tuning; redundancy (RAID, erasure coding).
- Python, and comfort automating infrastructure (Ansible, Terraform, or similar).
- A working understanding of how ML and scientific workloads consume data — billions of small files, large checkpoints, sharding — and why naive layouts fall over.
- A point of view on data lineage, provenance, or reproducibility — and ideally tooling you’ve used to enforce it.
Nice to have
- Hands‑on parallel file systems (Lustre, Spectrum Scale/GPFS) or distributed storage (Ceph, VAST).
- Scientific data formats — HDF5, Zarr, Parquet — and opinions on when each earns its place.
- Object storage (S3) interfaced with ML frameworks (PyTorch, TensorFlow).
- Orchestration (Kubernetes, Argo) and data‑movement tooling.
- Data versioning / cataloguing (e.g. DVC, lakeFS, a metadata catalog) and familiarity with FAIR data principles.
- CI/CD and provisioning: GitLab CI, HashiCorp Vault, MAAS.
What we offer
- Hardware and scale you won’t find in enterprise IT, and problems with no vendor playbook.
- Work that directly enables published science and frontier‑scale model training.
- Room to shape how data is managed, not just maintained, in an environment that takes it seriously.
Seriously. Curious? Read more and apply now.