HPC Systems Engineer
Listed on 2026-02-14
Engineering
Systems Engineer
The selected candidate will be a member of the team responsible for the engineering and management of server infrastructure in a large-scale, multi-datacenter environment including CPU and GPU compute resources, job scheduling, and related infrastructure systems. This includes developing solutions that enable our internal customers to be successful and working with the team to ensure the environment runs optimally.
Qualifications
- Bachelor's degree in a related field or equivalent work experience
- 5+ years of Linux administration experience
- 2+ years of server automation experience
- Strong server and infrastructure automation knowledge
- Strong architectural and engineering experience with enterprise Linux environments
- Strong scripting experience with Python and Bash, and willingness to learn other languages
- Performs at an extremely high level of technical competence and maturity
- Excellent problem-solving and troubleshooting skills
- Excellent communication skills and ability to use desktop tools to accelerate communications (Grafana, MkDocs, wikis, IM, MS Teams, etc.)
- Seeks out improvements to processes and business offerings
- Ability to create detailed technical documentation
- Ability to handle multiple complex projects at one time
- Experience using build and configuration automation (Red Hat Satellite, Ansible Automation Platform, etc.)
- Exposure to HPC interconnect technologies like InfiniBand or MPI
- Familiarity with batch workload managers
- Experience using SRE practices and Jira/Agile
- Experience managing Red Hat Enterprise Linux
You may not check every box, or your experience may look a little different from what we've outlined, but if you think you can bring value to Ford Motor Company, we encourage you to apply! As an established global company, we offer the benefit of choice. You can choose what your Ford future will look like: will your story span the globe, or keep you close to home?
Will your career be a deep dive into what you love, or a series of new teams and new skills? Will you be a leader, a change maker, a technical expert, a culture builder…or all of the above? No matter what you choose, we offer a work life that works for you, including:
- Immediate medical, dental, and prescription drug coverage
- Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up child care and more
- Vehicle discount program for employees and family members, and management leases
- Tuition assistance
- Established and active employee resource groups
- Paid time off for individual and team community service
- A generous schedule of paid holidays, including the week between Christmas and New Year’s Day
- Paid time off and the option to purchase additional vacation time
For a detailed look at our benefits, see: Benefit Summary
This role is remote unless you are within 50 miles of Dearborn, MI, in which case you will be required on-site four times a week.
*Visa sponsorship is NOT provided for this specific role*
*Relocation assistance is NOT provided for this specific role*
Candidates for positions with Ford Motor Company must be legally authorized to work in the United States. Verification of employment eligibility will be required at the time of hire. We are an Equal Opportunity Employer committed to a culturally diverse workforce. All qualified applicants will receive consideration for employment without regard to race, religion, color, age, sex, national origin, sexual orientation, gender identity, disability status or protected veteran status.
In the United States, if you need a reasonable accommodation for the online application process due to a disability, please call
SG7-SG8
Responsibilities
- Actively design, implement, and support HPC compute resources and related infrastructure
- Assist application teams with optimizing workflows for the environment
- Develop and support automation and scripts (Python, Perl, Bash, Ansible) and the servers related to that automation (Satellite, Ansible Automation Platform)
- Monitor and troubleshoot system failures (occasional on-call)