Machine Learning Engineer, Mountain View, CA

Job in Mountain View, Santa Clara County, California, 94039, USA
Listing for: Unity Technologies
Full Time position
Listed on 2026-07-03
Job specializations:
  • Software Development
    Machine Learning/ ML Engineer, AI Engineer (Applied/Software)
Salary/Wage Range or Industry Benchmark: 218400 - 327600 USD Yearly
Job Description & How to Apply Below
Position: Staff Machine Learning Engineer, Mountain View, CA, USA
## Staff Machine Learning Engineer
Locations: Mountain View, CA, USA
Time type: Full time
Posted on: Posted Yesterday
Job requisition: JOBREQ-2616040
**The opportunity**
We are building the next generation of AI-driven game experiences: generative world models, neural rendering, and multi-modal understanding that turn images, text, and 3D primitives into interactive worlds. As our Staff Machine Learning Engineer, you will be a core technical leader bringing state-of-the-art computer vision and multi-modal models, including transformers, diffusion networks, vision-language models (VLMs), and JEPA-style architectures, from research into robust, production-grade systems.

This is a deeply hands-on, high-impact role. You will help define the modeling and deployment strategy, drive architectural decisions across the ML stack, and mentor a team of senior and mid-level engineers. Your work will directly shape the quality, capability, and performance of AI features experienced by billions of players — across cloud, server, and on-device targets.
**What you'll be doing**
Technical Leadership
* Help set the technical vision and roadmap for computer vision and multi-modal AI models, spanning transformers, diffusion models, vision-language models, and JEPA-style generative architectures.
* Drive design and implementation of models for image and video understanding, generation, segmentation, detection, and dense prediction, as well as multi-modal reasoning over images, text, and 3D inputs.
* Make sound decisions on model architecture, training strategy, data pipelines, and evaluation — balancing quality, capability, latency, and cost across deployment targets.
* Own the path from research prototype to production: training, fine-tuning, distillation, export, and serving, with deployment spanning cloud GPUs through to efficient on-device inference where the product requires it.

Architecture & Research Translation
* Collaborate directly with research scientists to translate novel CV and multi-modal model architectures into deployable, well-engineered implementations.
* Design scalable systems for multi-modal inference that process diverse inputs (images, video, text, primitives, and metadata) and produce rich outputs, from semantic predictions to pixel-level generation.
* Track and rapidly adopt breakthroughs across the field: vision-language pretraining and alignment, efficient diffusion (e.g., consistency models, flow matching), efficient attention (e.g., Flash Attention, linear-attention variants), and tokenization/representation learning for vision.
* Where latency or device constraints demand it, apply compression, quantization, pruning, and knowledge distillation, and work with appropriate runtimes (e.g., TensorRT, ONNX Runtime, CoreML, TFLite) to meet performance budgets.

Team & Cross-Functional Leadership
* Lead and mentor a team of ML engineers; define engineering best practices, code review standards, and rigorous benchmarking and evaluation methodology.
* Partner with research, platform engineers, product managers, and runtime teams to align ML capabilities with product roadmaps and target-platform constraints.
* Champion a culture of measurement: define KPIs for model quality, accuracy, latency, memory, and cost, and ensure the team tracks them rigorously.
**What we're looking for**
* 6+ years in ML engineering, with significant depth in computer vision and/or multi-modal modeling.
* Proven production experience with transformer-based and diffusion-based vision models (e.g., ViT, CLIP/SigLIP-style encoders, Stable Diffusion, DETR/SAM-style architectures).
* Strong command of the full model lifecycle: data curation, training and fine-tuning, evaluation, and serving at scale.
* Familiarity with efficient attention, diffusion samplers, multi-modal fusion, and vision-language alignment techniques.
* Strong Python and modern deep-learning tooling (PyTorch); solid software engineering fundamentals.
* Track record of technical leadership: setting direction, influencing cross-functional partners, and growing engineers.
**You might also have**
* Experience with world-model, video-generation, or neural rendering pipelines…