HPC Sr. Scientific Software Engineer; IT@JH Research Computing

Job in Baltimore, Anne Arundel County, Maryland, 21276, USA
Listing for: AAAI Press
Full Time position
Listed on 2025-12-02
Job specializations:
  • IT/Tech
    Systems Engineer, AI Engineer, Cloud Computing, IT Support
Job Description & How to Apply Below
Position: HPC Sr. Scientific Software Engineer (IT@JH Research Computing)

IT@JH Research Computing is seeking an HPC Sr. Scientific Software Engineer who will design, build, and support Johns Hopkins University’s high-performance computing and AI research infrastructure. This role integrates elements of both systems and software engineering, ensuring scalable, secure, and reproducible environments for scientific and data-intensive research. The Engineer develops and automates system and application workflows across CPU/GPU clusters, parallel storage, and hybrid cloud platforms.

Responsibilities include configuring and optimizing large-scale Linux environments, implementing job scheduling and orchestration frameworks, containerizing applications, and supporting researchers in optimizing performance and reproducibility. Work combines project-based engineering with operational support, requiring both independent problem-solving and close collaboration with the Research Computing team and faculty stakeholders.

Specific Duties & Responsibilities
Software Deployment and Design
  • Develop and refine deployment strategies for scientific software on HPC and AI systems.
  • Design computational workflows, select optimal software configurations, and use tools like Ansible for automation.
  • Assist teams in implementing, tuning, and optimizing AI models and gateway applications (e.g., XDMoD, ColdFront, Open OnDemand, CryoSPARC Live, SBGrid, AI agents).
Performance Optimization
  • Analyze and optimize the performance of AI models and HPC applications, focusing on GPU-enabled computing.
  • Implement parallel processing, distributed computing, and resource management techniques for efficient job execution.
Integration and Optimization
  • Develop, debug, and maintain software tools, libraries, and frameworks supporting HPC and AI workloads.
  • Collaborate with the system team and software vendors (e.g., NVIDIA, Intel, MATLAB) to optimize systems for maximum performance.
  • Utilize CUDA, DNN, TensorRT, and Intel compilers to enhance system performance.
HPC Scientific Software Support
  • Manage and support scientific software deployment across HPC, cloud-based, and colocation facilities.
  • Oversee installation, configuration, and maintenance of HPC packages with tools like CMake, Make, EasyBuild, Spack, and Lua module files.
Collaboration and Mentorship
  • Work closely with cross-functional teams, including researchers, data scientists, and software developers, to address complex HPC/AI challenges.
  • Mentor junior engineers and foster a culture of continuous learning.
Technical Support, Training Workshops, and Troubleshooting
  • Resolve complex technical issues and perform root cause analysis for HPC/AI software challenges.
  • Implement effective solutions to prevent recurrence and improve system reliability.
  • Provide training workshops for researchers and students, focusing on troubleshooting, optimizing workflows, and effectively using HPC systems.
Learning and Development
  • Stay current with advances in HPC and AI technologies and methodologies.
  • Incorporate new research findings into existing systems to improve performance and capabilities.
Container Orchestration
  • Develop and manage container orchestration strategies to ensure scalability, reliability, and security of applications.
  • Oversee the container lifecycle from creation and deployment to scaling and removal.
Documentation and Compliance
  • Create comprehensive documentation for system designs, performance metrics, and project status.
  • Ensure compliance with security and regulatory standards for all HPC and AI systems.
In Addition to the Duties Described Above
  • Design, deploy, and maintain large-scale Linux HPC clusters with CPU/GPU resources, high-speed networks, and distributed storage.
  • Develop and maintain automation frameworks for provisioning, monitoring, and software lifecycle management.
  • Implement and optimize job scheduling, container orchestration, and workflow automation tools to support diverse research workloads.
  • Collaborate with faculty and research teams to parallelize, containerize, and scale computational workflows for multi-GPU and distributed environments.
  • Benchmark and tune application performance across architectures, documenting findings and sharing best practices.
  • Integrat…