Simulation-to-ML Infrastructure Engineer Job, El Segundo area, California, USA, IT/Tech

Role Overview

Smack Technologies is building the infrastructure that turns large-scale simulation output into usable training data for reinforcement learning systems. As a Simulation-to-ML Infrastructure Engineer on the Applied Engineering team, you will own the end-to-end pipeline from raw simulation states through data processing, storage, versioning, and delivery into ML training workflows.

This is a greenfield role. Early work focuses on standing up foundational infrastructure that enables iteration. Over time, the system must scale to handle massive simulation-driven data generation and repeated training cycles. The emphasis is on building working systems first, then evolving them as usage and scale increase.

You will operate at the intersection of simulation, data engineering, infrastructure, and ML consumption, working closely with simulation engineers and ML researchers to ensure the system supports real training needs.

What You’ll Do

Stand up foundational infrastructure to support simulation execution and data collection.
Design and implement data storage and management practices that scale with growing volume and complexity.
Build initial data pipelines that ingest simulation outputs and prepare them for reinforcement learning training.
Implement basic validation, quality checks, and data organization to support early experimentation.
Establish data versioning and lineage practices to support reproducibility as the system evolves.
Set up experiment tracking, dataset management, and model artifact storage for training workflows.
Support training job execution and infrastructure required for iterative RL experimentation.
Work with simulation and ML teams to define data interfaces and end-to-end flow requirements.
Build bridges between simulation systems and ML training infrastructure, with future feedback loops in mind.
Implement containerization, deployment pipelines, and basic observability to support rapid iteration.
Continuously evaluate scaling bottlenecks and evolve the system as usage patterns emerge.
Document architectural decisions and patterns to keep the system understandable as it grows.
Contribute to adjacent infrastructure or tooling work as needed to unblock progress.

Must-Have Qualifications

Active TS/SCI clearance
Experience building and operating infrastructure in greenfield environments
Strong background in distributed systems, data pipelines, or ML infrastructure
Comfort working with cloud platforms and infrastructure automation
Solid understanding of Linux systems, networking, and storage
Experience designing systems that start simple and evolve toward scale
Strong programming skills in Go, Python, Java, or similar
Ability to work across simulation, data, and ML boundaries
Comfort operating with ambiguity and making pragmatic tradeoffs

Core Technologies & Concepts

Infrastructure: Docker, Kubernetes, CI/CD
Data Pipelines: ingestion, validation, versioning, storage
ML Training Support: experiment tracking, dataset management, artifact storage
Systems: distributed execution, scalability, reliability
Languages: Go, Python, or similar
Deployment Contexts: TS/SCI, IL-7, on-prem environments

Nice-to-Have Qualifications

Prior experience in ML infrastructure or MLOps
Experience supporting reinforcement learning or large-scale training systems
Familiarity with ML frameworks such as PyTorch or TensorFlow
Experience with experiment tracking tools or workflow orchestration systems
Background in simulation, scientific computing, or HPC environments
Experience evolving systems from early prototypes to large-scale platforms
