Senior Data Scientist - Audio and Speech; Multimodal AI Job, Sharon area, Massachusetts USA, IT/Tech

Position: Senior Data Scientist - Audio and Speech (Multimodal AI)

Are you ready to push the boundaries of Audio Intelligence?

We're looking for a Senior Data Scientist with deep expertise in Audio AI, Speech Processing, and Generative Modeling to design and develop advanced on‑prem multimodal systems capable of understanding, generating, and analyzing complex audio streams in noisy, real‑world environments.

You'll join a world‑class Defense Tech AI team building speech‑driven solutions that enable intelligent communication, operational insight, and next‑generation human‑machine interaction.

What You'll Do

Fine‑tune and evaluate Speech‑to‑Text (STT) models optimized for noisy, low‑latency, and mission‑critical environments

Develop speaker identification and diarization, along with sentiment and emotion analysis to detect tone, stress levels, and affective patterns

Design and optimize multimodal pipelines combining audio, text, and visual inputs for enhanced semantic understanding and cross‑modal reasoning

Contribute to Generative AI innovations - noise reduction, voice conversion, speech enhancement, and conversation insights

Collaborate closely with ML engineers and research peers to deploy, scale, and optimize Audio AI models on on‑prem and edge hardware

Work with domain experts to adapt models for real‑time speech understanding, decision support, and behavioral insights

Your Expertise

Solid background in Machine Learning, Deep Learning, and Audio Signal Processing

5+ years of hands‑on experience developing and deploying speech or audio‑based AI models

3+ years focused on STT / ASR, TTS, speaker recognition, or sentiment analysis

Deep familiarity with architectures such as Conformer, Whisper, RNN‑Transducer, FastSpeech / Tacotron, speaker embedding networks, and self‑supervised speech representations

Experience handling noisy, real‑time audio, latency optimization, and edge‑device constraints

Understanding of semantic embeddings, multimodal search, and RAG architectures

Strong data‑driven mindset and ability to conduct research on novel Audio AI approaches

Comfortable working with Agile workflows, MLOps, and DevOps principles

Publication record, Kaggle or other challenge participation, or equivalent - an advantage

Why Join Us

Work with leading researchers and engineers on next‑generation Speech and Audio Intelligence

Make a direct impact on speech understanding, generation, and sentiment analytics in real‑world applications

Collaborate on cutting‑edge multimodal AI systems integrating vision, audio, and language

Be part of a forward‑thinking team that values creativity, research excellence, and continuous learning

Shape the future of Audio and Speech AI - from concept to deployment

Only suitable applications will be considered

#Netanya