Hardware Engineer
Job in
Dallas, Dallas County, Texas, 75215, USA
Listed on 2026-06-26
Listing for:
CDW
Full Time position
Job specializations:
- IT/Tech
IT Infrastructure, SRE/Site Reliability, Unix/Linux, Systems Engineer
Job Description & How to Apply Below
Operations-focused Hardware Systems Engineer supporting a large-scale bare-metal server environment (~17,000 servers) with a heavy emphasis on CPU and GPU compute availability. This role is centered on reliability, automation, and operational excellence: digging into systems and pipelines when things break and improving them so they break less often. (This is not a hands-on data center role.)
- Administer and support large-scale bare-metal server infrastructure, primarily HPE and Dell platforms
- Perform server break/fix troubleshooting including hardware faults, firmware/BIOS/BMC issues, POST failures, degraded components, and system instability
- Manage server lifecycle operations: onboarding, provisioning, firmware updates, BIOS/BMC configuration, and hardware refresh kits
- Own incident response and break/fix workflows while maintaining 98%+ compute availability SLAs
- Work cross-functionally with Data Center and Networking teams during hardware incidents, including ticket creation, repair coordination, and log collection
- Interface directly with HPE and Dell vendors: gathering diagnostics, sending logs, driving RMAs, and tracking issues through resolution
- Support and troubleshoot CI/CD and automation pipelines used for server provisioning, configuration, and lifecycle management
- Dig into automation code and workflows (Ansible, scripts, pipelines) when jobs fail to understand root cause and unblock deployments
- Identify recurring operational issues and contribute to process improvements, runbooks, and reliability enhancements
- Help manage and reduce the operations backlog, prioritizing fixes, cleanup, and automation improvements
- Hands-on experience supporting HPE and Dell servers in production, including break/fix and hardware incident troubleshooting
- Experience with HPE iLO, Dell iDRAC, and related BMC environments
- Strong understanding of server hardware components (CPU, GPU, memory, disks, NICs, power) and common failure modes
- Experience troubleshooting automation and CI/CD pipelines that manage infrastructure (not just running them, but fixing them when they fail)
- Operational mindset with experience owning incidents, SLAs, backlog items, and process improvements
- Automation experience with Ansible, Bash, Jenkins, or similar tooling
- Exposure to GPU-dense, HPC, or high-performance compute environments
- Experience improving runbooks, reducing toil, and scaling operations through automation