Member of Technical Staff – AI Research Engineer; Image/Video Foundation Models

Job in Zürich 8058, Kanton Zürich, Switzerland
Listing for: GenPeach AI
Full Time position
Listed on 2026-02-13
Job specializations:
  • IT/Tech
    AI Engineer, Data Scientist, Machine Learning/ML Engineer, Systems Engineer
Salary/Wage Range or Industry Benchmark: CHF 30,000 – 80,000 per year
Job Description & How to Apply Below
Position: Member of Technical Staff – AI Research Engineer (Image/Video Foundation Models)
Location: Zürich

Overview

Gen Peach AI is a product-driven research lab building vertical multimodal foundation models for hyper-realistic human generation in image and video – designed for emotionally resonant, human-centered AI experiences. Our goal is to create tools that supercharge human creativity rather than replace it.

We train models from scratch: proprietary datasets at massive scale, novel architectures and training recipes, large GPU clusters, and tight product integration so research ships to users quickly.

We are a deeply technical team of around 10 people. We’re advised by Directors from Google DeepMind and backed by leading AI-focused funds and angels from OpenAI, Meta AI, Microsoft AI, Project Prometheus, and Fal. Collectively, our team, advisors, and angels have contributed to models including Meta’s Imagine/Movie Gen and foundation-model work behind OpenAI’s Sora, plus Google’s Veo and Gemini.

About Gen Peach AI

You’ll join the research team working across image/video generation and multimodal understanding. You’ll collaborate closely with other Research Engineers, Scientists, and the Founders to turn research into scalable training runs, strong evaluations, and production-ready systems.

Role

We’re hiring an AI Research Engineer to help build and scale Gen Peach’s foundation models end-to-end – from implementing new model ideas and training recipes, to owning the parts of the training stack that determine quality and speed, to pushing models through production constraints.

This is a hands-on, high-ownership role. You’ll write research-grade code that becomes production-critical.

Responsibilities
  • Implement and iterate on image/video generative model ideas (architecture, losses, conditioning, sampling, distillation, post-training)
  • Own training performance end-to-end (distributed training, throughput, memory, stability, debugging scaling failure modes)
  • Build the experimentation loop (evals, ablations, reproducibility tooling, reporting, decision hygiene)
  • Build and improve VLMs for image/video captioning (data recipes, training strategies, model variants, evaluation)
  • Run high-iteration research: read papers when useful, implement ideas, validate empirically
  • Create captioning pipelines that improve generation training and product quality
  • Partner with inference/product to ship under real constraints (latency, cost, reliability, rollout safety)
  • Build demos and prototypes to showcase capabilities and accelerate iteration
Qualifications

Minimum Qualifications

  • Strong Python and PyTorch skills (4+ years of experience)
  • Experience implementing and training deep learning models (generative models, VLMs, LLMs, vision/video, or adjacent)
  • Solid understanding of training dynamics, optimization, and practical debugging
  • Ability to drive projects end-to-end with minimal supervision

Preferred Qualifications

  • Hands-on experience with diffusion/flow-based image or video generation, or large-scale generative modeling in adjacent domains
  • Experience with distributed training at scale (multi-node) and performance tuning (throughput/memory)
  • Experience building evaluation frameworks (offline metrics + human eval + regression tracking)
  • Strong intuition for data quality and dataset/labeling tradeoffs for training and captioning
  • Publications are a plus, but shipped impact and strong technical evidence matter more
What makes this role unique
  • Build frontier image/video models and the VLM captioning systems that power them
  • Join a lean, senior team that holds a high engineering + research bar
  • Direct product impact: your training runs become real user-facing capabilities
  • Benchmark against the best in the world and compete on model quality through what we ship
How we work
  • You own outcomes end-to-end and are trusted with real responsibility
  • Direct, low-ego communication and fast feedback loops
  • Bias toward impact: measure → iterate → ship
  • Research discipline: clear ablations, reproducibility, and crisp decision-making
Logistics
  • Location: Zurich (Switzerland) or Warsaw (Poland) – onsite or hybrid. If you’re elsewhere, we’re open to remote (team/timezone fit considered).
  • Compensation: competitive salary + meaningful equity (level-dependent)
  • Interview process: quick screen → 2x technical rounds (practical + systems) → team fit/values
What we offer
  • Visa sponsorship (where applicable); we’ll make a strong effort to relocate you to Switzerland or Poland if desired
  • Remote-friendly: work fully remote, hybrid, or on-site from our hubs
  • Regular offsites and in-person events to collaborate and connect
  • Flexible PTO