Machine Learning Engineer
Listed on 2026-02-09
Engineering
AI Engineer
Overview
JOB TITLE
Senior Applied Machine Learning Engineer (audio / music generation)
ABOUT THE ROLE
We’re building an AI-powered music system focused on commercial-ready audio generation. Our initial priority is getting the music generation quality right – structure, musicality, consistency, and production readiness.
We are looking for a Senior Applied ML Engineer to own the end-to-end audio generation pipeline for our MVP. This role is hands-on and pragmatic: you’ll fine-tune open-source music models, integrate inference pipelines, and work closely with audio and backend engineers to deliver usable results quickly and efficiently. This role starts as a contract engagement (details below), with a path to a full-time position for the right fit.
ROLE DETAIL
- Terms: Fixed-term (5 months) | Potential full-time conversion
- Compensation: $30,000 (full 5-month term)
- Location: Hybrid/On-site (Monrovia, CA)
WHAT YOU’LL WORK ON
- Fine-tuning open-source music generation models.
- Implementing conditioning controls (beats per minute, key, mood, section, density).
- Training and deploying parameter-efficient fine-tunes (LoRA / adapters).
- Building reference-conditioned generation.
- Supporting long-form generation via chunking and continuation.
- Integrating with backend inference pipelines and APIs.
- Collaborating with audio DSP engineers to ensure outputs are production ready.
REQUIRED QUALIFICATIONS
- Strong experience with Python and PyTorch.
- Hands-on experience with audio or speech generation models.
- Familiarity with diffusion or autoregressive generative models.
- Experience using or fine-tuning open-source ML models, including familiarity with Hugging Face (HF) interfaces.
- Understanding of audio representations.
- Experience deploying ML models to production or API environments.
NICE-TO-HAVE SKILLS
- Familiarity with CLAP / audio embeddings or retrieval-assisted generation.
- Experience working with LoRA / PEFT methods.
- Basic understanding of audio production workflows (tempo, key, stems, loudness).
- Experience optimizing inference cost and latency.
ROLE GOALS & OBJECTIVES
- Reliably generate musically coherent, commercial-friendly cues (30 to 120 seconds).
- The model responds correctly to conditioning inputs such as tempo, key, and mood.
- Outputs are stable, repeatable, and usable downstream by post-production tools.
- The system is modular and ready to be integrated with downstream models.