Senior Machine Learning Engineer – Model Training and Customization Job, Boston area, Massachusetts, USA, Software Development

Overview

Come be a part of Red Hat's charge to democratize AI with open source! Red Hat's Global Engineering Team is looking for a Senior Machine Learning Engineer to join our newly formed AI Engineering organization. This role will be located within the AI Innovation team, which conducts customer- and science-driven research to drive innovation for Red Hat's customers. The team focuses on a pattern of "research → open-source software → product" as the way we operate our engineering work.

This role will be focused on building the core logic and enhancements for our model fine-tuning and post-training libraries.

The role involves working directly with research scientists and open source AI communities to build and improve implementations of novel training methods, ranging from SFT, continual learning, and offline preference tuning to online reinforcement learning methods like GRPO and RLHF. You will develop working relationships across multiple teams, contributing to both upstream open source projects and our internal Training Hub. The ideal candidate will be highly collaborative with a passion for complex ML projects in an open organization where contributions are valued and expected from all levels.

This is a fast-moving area of opportunity for Red Hat, so productive and effective communication with team members, stakeholders, and Red Hat leadership is critical. Success means delivering robust, scalable training libraries that bridge cutting-edge research with production needs. This position reports to the Manager of AI Innovation and may require travel to our Boston, MA office a few times per quarter.

Successful applicants must reside in a state where Red Hat is registered to do business.

Job Responsibilities (What You’ll Do)

Develop core libraries for various model post-training methods and innovations.
Work directly on upstream, open source projects and engage with community needs and contributions.
Contribute to core post-training algorithm research and engineering, introducing new methods both to community efforts and our own Training Hub.
Understand and adapt novel architectures and techniques to work with various post-training algorithms, across distributed training frameworks.
Optimize, enhance, and improve robustness and usability of both existing and in-flight projects, working closely with researchers to validate prototype logic.
Maintain and expand the library's feature set, and address core algorithm bugs and blockers.
Work closely with software engineers on interface and testing designs.
Participate in code reviews and collaborate on best practices within the engineering team.
Document system designs, processes, and model performance for transparency and future reference.
Report on project status, challenges, and results to stakeholders.

Qualifications

Required Skills (What You’ll Bring)

Bachelor's degree in computer science or equivalent.
3+ years of experience in Python development.
Significant background in AI/ML projects or coursework (neural networks, deep learning, language models, reinforcement learning).
Experience in research engineering, machine learning engineering, or applied ML roles.
Strong experience with common model architecture development and adapter frameworks (e.g., PyTorch, Transformers, PEFT).
Familiarity with distributed training frameworks (e.g., FSDP, DeepSpeed) and inference runtimes (e.g., vLLM).
Experience with open source projects and collaborative development workflows.
Background in software development or engineering, including building robust, consumable libraries and implementations.
Experience with unit testing, integration testing, and performance testing.
Strong self-motivation and organizational skills.
Excellent written and verbal communication skills.
Positive attitude and willingness to share ideas openly.

Bonus Qualifications

Master's or PhD in Machine Learning (ML) / Natural Language Processing (NLP).
Experience with MLOps and deployment systems (e.g., Kubeflow, MLflow, Kubernetes, CI/CD pipelines).
Experience writing functional, end-to-end, or coverage tests in Python.
Experience with GitHub Actions, GitHub automation, or CI/CD practices.
Exp…