Member of Technical Staff – AI Research Engineer; Image/Video Foundation Models
Listed on 2026-02-13
Location: Zürich
Overview
Gen Peach AI is a product-driven research lab building vertical multimodal foundation models for hyper-realistic human generation in image and video – designed for emotionally resonant, human-centered AI experiences. Our goal is to create tools that supercharge human creativity rather than replace it.
We train models from scratch: proprietary datasets at massive scale, novel architectures and training recipes, large GPU clusters, and tight product integration so research ships to users quickly.
We are a deeply technical team of around 10 people. We're advised by Directors from Google DeepMind and backed by leading AI-focused funds and angels from OpenAI, Meta AI, Microsoft AI, Project Prometheus, and Fal. Collectively, our team, advisors, and angels have contributed to models including Meta's Imagine/Movie Gen and foundation-model work behind OpenAI's Sora, plus Google's Veo and Gemini.
About Gen Peach AI
You'll join the research team working across image/video generation and multimodal understanding. You'll work closely with other Research Engineers and Scientists, as well as the founders, and help turn research into scalable training runs, strong evaluations, and production-ready systems.
Role
We're hiring an AI Research Engineer to help build and scale Gen Peach's foundation models end-to-end – from implementing new model ideas and training recipes, to owning the parts of the training stack that determine quality and speed, to pushing models through production constraints.
This is a hands-on, high-ownership role. You’ll write research-grade code that becomes production-critical.
Responsibilities
- Implement and iterate on image/video generative model ideas (architecture, losses, conditioning, sampling, distillation, post-training)
- Own training performance end-to-end (distributed training, throughput, memory, stability, debugging scaling failure modes)
- Build the experimentation loop (evals, ablations, reproducibility tooling, reporting, decision hygiene)
- Build and improve VLMs for image/video captioning (data recipes, training strategies, model variants, evaluation)
- Run high-iteration research: read papers when useful, implement ideas, validate empirically
- Create captioning pipelines that improve generation training and product quality
- Partner with inference/product to ship under real constraints (latency, cost, reliability, rollout safety)
- Build demos and prototypes to showcase capabilities and accelerate iteration
Minimum Qualifications
- Strong Python and PyTorch skills (4+ years of experience)
- Experience implementing and training deep learning models (generative models, VLMs, LLMs, vision/video, or adjacent)
- Solid understanding of training dynamics, optimization, and practical debugging
- Ability to drive projects end-to-end with minimal supervision
Preferred Qualifications
- Hands-on experience with diffusion/flow-based image or video generation, or large-scale generative modeling in adjacent domains
- Experience with distributed training at scale (multi-node) and performance tuning (throughput/memory)
- Experience building evaluation frameworks (offline metrics + human eval + regression tracking)
- Strong intuition for data quality and dataset/labeling tradeoffs for training and captioning
- Publications are a plus, but shipped impact and strong technical evidence matter more
Why join us
- Build frontier image/video models and the VLM captioning systems that power them
- Join a lean, senior team that holds a high engineering + research bar
- Direct product impact: your training runs become real user-facing capabilities
- Benchmark against the best in the world and compete on model quality through what we ship
- You own outcomes end-to-end and are trusted with real responsibility
- Direct, low-ego communication and fast feedback loops
- Bias toward impact: measure → iterate → ship
- Research discipline: clear ablations, reproducibility, and crisp decision-making
- Location: Zurich (Switzerland) or Warsaw (Poland), onsite or hybrid. If you're elsewhere, we're open to remote (team/timezone fit considered).
- Compensation: competitive salary + meaningful equity (level-dependent)
- Interview process: quick screen → 2x technical rounds (practical + systems) → team fit/values
- Visa sponsorship (where applicable); we’ll make a strong effort to relocate you to Switzerland or Poland if desired
- Remote-friendly: work fully remote, hybrid, or on-site from our hubs
- Regular offsites and in-person events to collaborate and connect
- Flexible PTO