Simulation-to-ML Infrastructure Engineer
Listed on 2026-03-07
IT/Tech
Data Engineer, Systems Engineer
Role Overview
Smack Technologies is building the infrastructure that turns large-scale simulation output into usable training data for reinforcement learning systems. As a Simulation-to-ML Infrastructure Engineer on the Applied Engineering team, you will own the end-to-end pipeline: raw simulation states, data processing, storage, versioning, and delivery into ML training workflows.
This is a greenfield role. Early work focuses on standing up foundational infrastructure that enables iteration. Over time, the system must scale to handle massive simulation-driven data generation and repeated training cycles. The emphasis is on building working systems first, then evolving them as usage and scale increase.
You will operate at the intersection of simulation, data engineering, infrastructure, and ML consumption, working closely with simulation engineers and ML researchers to ensure the system supports real training needs.
What You’ll Do
- Stand up foundational infrastructure to support simulation execution and data collection.
- Design and implement data storage and management practices that scale with growing volume and complexity.
- Build initial data pipelines that ingest simulation outputs and prepare them for reinforcement learning training.
- Implement basic validation, quality checks, and data organization to support early experimentation.
- Establish data versioning and lineage practices to support reproducibility as the system evolves.
- Set up experiment tracking, dataset management, and model artifact storage for training workflows.
- Support training job execution and infrastructure required for iterative RL experimentation.
- Work with simulation and ML teams to define data interfaces and end-to-end flow requirements.
- Build bridges between simulation systems and ML training infrastructure, with future feedback loops in mind.
- Implement containerization, deployment pipelines, and basic observability to support rapid iteration.
- Continuously evaluate scaling bottlenecks and evolve the system as usage patterns emerge.
- Document architectural decisions and patterns to keep the system understandable as it grows.
- Contribute to adjacent infrastructure or tooling work as needed to unblock progress.
Requirements
- Active TS/SCI clearance
- Experience building and operating infrastructure in greenfield environments
- Strong background in distributed systems, data pipelines, or ML infrastructure
- Comfort working with cloud platforms and infrastructure automation
- Solid understanding of Linux systems, networking, and storage
- Experience designing systems that start simple and evolve toward scale
- Strong programming skills in Go, Python, Java, or similar
- Ability to work across simulation, data, and ML boundaries
- Comfort operating with ambiguity and making pragmatic tradeoffs
Technical Focus Areas
- Infrastructure: Docker, Kubernetes, CI/CD
- Data Pipelines: ingestion, validation, versioning, storage
- ML Training Support: experiment tracking, dataset management, artifact storage
- Systems: distributed execution, scalability, reliability
- Languages: Go, Python, or similar
- Deployment Contexts: TS/SCI, IL-7, on-prem environments
Preferred Qualifications
- Prior experience in ML infrastructure or MLOps
- Experience supporting reinforcement learning or large-scale training systems
- Familiarity with ML frameworks such as PyTorch or TensorFlow
- Experience with experiment tracking tools or workflow orchestration systems
- Background in simulation, scientific computing, or HPC environments
- Experience evolving systems from early prototypes to large-scale platforms