Principal Performance Engineer

Job in Wayne, Delaware County, Pennsylvania, 19087, USA
Listing for: CloudDevs
Full Time position
Listed on 2026-02-06
Job specializations:
  • Manufacturing / Production
    Systems Engineer
Job Description & How to Apply Below

Cornelis Networks delivers the world’s highest performance scale-out networking solutions for AI and HPC datacenters. Our differentiated architecture seamlessly integrates hardware, software, and system-level technologies to maximize the efficiency of GPU, CPU, and accelerator-based compute clusters at any scale. Our solutions drive breakthroughs in AI & HPC workloads, empowering our customers to push the boundaries of innovation. Backed by top-tier venture capital and strategic investors, we are committed to innovation, performance, and scalability – solving the world’s most demanding computational challenges with our next-generation networking solutions.

We are a fast-growing, forward-thinking team of architects, engineers, and business professionals with a proven track record of building successful products and companies. As a global organization, our team spans multiple U.S. states and six countries, and we continue to expand with exceptional talent in onsite, hybrid, and fully remote roles.

We’re seeking a Principal Performance Engineer to drive end-to-end performance for next-generation networking silicon and systems (adapters, switches, software). You will help set the performance strategy, lead investigations across layers (switch/silicon ↔ drivers ↔ AI/HPC workloads), and enable large-scale customer deployments across multiple verticals (cloud, autonomous, aerospace/defense, manufacturing, life sciences, climate). You’ll partner directly with architecture, firmware, software, and lighthouse customers to raise the performance ceiling.

This is a high-impact, highly visible individual-contributor role with technical leadership scope (mentoring, cross-functional influence).

Key Responsibilities:
  • Own pre- and post-launch performance: plan, execute, and sustain performance validation, debugging, and optimization for adapters, switches, and fabric software, first in the lab, then at scale in production.
  • Lead performance for post-silicon bring-up validation of networking ASICs and end-products (adapters, switches, etc.), driving optimization and characterization against networking metrics and application performance.
  • Deliver white-glove customer support at scale: reproduce field issues, co-debug in shared/onsite labs, land mitigations and durable fixes, and publish per-customer tuning guides; opportunity to grow into customer performance support lead while remaining an IC.
  • Pathfind and optimize forward-looking workloads: drive research and enablement for AI inference (QPS, P99/P99.9, cost/throughput), distributed AI training (NCCL/RCCL collectives), and traditional HPC (manufacturing, life sciences, climate).
  • Multi-fabric research & enablement: evaluate and tune Cornelis/Omni-Path, Ethernet/RoCEv2, and InfiniBand across topologies (Clos/fat-tree/dragonfly), routing (ECMP/adaptive), and congestion control (credit, PFC/ECN/DCQCN).
  • Design credible experiments: synthesize representative traffic, replay workload traces, and run on-cluster A/B tests with statistically sound comparisons (P50/P90/P99).
Required Qualifications:
  • 10+ years in performance engineering, post-silicon/perf validation, or systems performance for high-speed networking or HPC/AI products.
  • Post-silicon expertise: hands‑on bring‑up and performance validation of networking ASICs/systems (adapters, switches), including crafting validation plans, establishing pass/fail criteria, correlating pre‑silicon models to silicon, and driving fixes from first silicon through production.
  • Demonstrated depth in networking hardware (switch/silicon) and software debug for performance tuning and issue resolution across production‑scale deployments.
  • Hands‑on multi‑fabric experience: Cornelis/Omni‑Path, Ethernet/RoCEv2, and/or InfiniBand; strong grasp of PCIe/GPU‑Direct, queueing/QoS, and congestion control (credit, PFC, ECN, DCQCN).
  • AI/HPC workload fluency: NCCL/RCCL collectives, UCX/libfabric/MPI; ability to optimize end‑to‑end training and inference (throughput, QPS, tail latency, efficiency) on real clusters.
  • Experimentation & analysis: workload modeling, on‑cluster A/B tests, tail‑latency analysis (P50/P90/P99); ability to separate congestion from compute/IO…