Software Engineer – Infinia L4
Listed on 2025-12-10
IT/Tech
AI Engineer, Systems Engineer
Overview
This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.
“DDN’s A3I solutions are transforming the landscape of AI infrastructure.” – IDC
“The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments.” – Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA
DDN is the global leader in AI and multi-cloud data management. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence.
Our success is driven by our unwavering commitment to innovation, customer‑centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management.
Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage.
Job Description
As a Staff Software Engineer – Infinia L4, you’ll be an escalation point for the most complex and critical issues affecting enterprise and hyperscale environments. This hands‑on role is ideal for a deep technical expert who thrives under pressure and has a passion for solving distributed system challenges. This role is part of the Infinia Core engineering team.
You’ll collaborate with Engineering, Product Management, and Field teams to drive root cause resolutions, define architectural best practices, and continuously improve product resiliency. Leveraging AI tools and automation, you’ll reduce time‑to‑resolution, streamline diagnostics, and elevate the support experience for strategic customers.
Key Responsibilities
Technical Expertise & Escalation Leadership
- Own critical customer case escalations end‑to‑end, including deep root cause analysis and mitigation strategies.
- Act as one of the technical escalation points for Infinia incidents — especially in production‑impacting scenarios.
- Lead war rooms, live incident bridges, and cross‑functional response efforts with other engineering, QA, and Field teams.
- Utilize AI‑powered debugging, log analysis, and system pattern recognition tools to accelerate resolution.
- Become a subject‑matter expert on Infinia internals: metadata handling, storage fabric interfaces, performance tuning, AI integration, etc.
- Reproduce complex customer issues and propose product improvements or workarounds.
- Author and maintain detailed runbooks, performance tuning guides, and RCA documentation.
- Feed real‑world support insights back into the development cycle to improve reliability and diagnostics.
- Partner with Field CTOs, Solutions Architects, and Sales Engineers to ensure customer success.
- Translate technical issues into executive‑ready summaries and business impact statements.
- Participate in post‑mortems and executive briefings for strategic accounts.
- Drive adoption of observability, automation, and self‑healing support mechanisms using AI/ML tools.
- Deliver training to customers, support, and field engineering.
Qualifications
- 8+ years in enterprise storage, distributed systems, or cloud infrastructure support/engineering.
- Deep understanding of file and object storage interfaces (S3, POSIX, NFS), storage performance, and Linux kernel internals.
- Proven debugging skills at system/protocol/app levels (e.g., strace, tcpdump, perf).
- Hands‑on experience with troubleshooting on Linux.
- Exposure to RDMA, NVMe‑oF, or high‑performance…