DevOps Systems Engineer
Listed on 2025-12-27
IT/Tech
Systems Engineer, Cloud Computing, Systems Administrator, IT Support
At TensorWave, we’re leading the charge in AI compute, building a versatile cloud platform that’s driving the next generation of AI innovation. We’re focused on creating a foundation that empowers cutting‑edge advancements in intelligent computing, pushing the boundaries of what’s possible in the AI landscape.
About the Role
We are seeking a highly skilled DevOps & Infrastructure Management Engineer to join our growing infrastructure team. This role is ideal for someone who thrives in hardware‑centric environments, enjoys hands‑on datacenter and system administration work, and can build reliable automation around large‑scale infrastructure. You will be responsible for managing enterprise hardware, monitoring systems, network operations, infrastructure automation, and supporting our compute clusters across multiple data centers.
This role touches every layer of modern infrastructure, from bare metal provisioning, to OS and Kubernetes management, to monitoring and troubleshooting hardware. If you are detail‑oriented, resourceful, and comfortable working with both low‑level hardware systems and higher‑level DevOps tooling, we’d love to talk.
Key Responsibilities
Hardware & Infrastructure Management
Manage and maintain enterprise‑grade server hardware and infrastructure components.
Utilize out‑of‑band management systems (iLO, iDRAC, IPMI, Redfish, etc.) for remote operations.
Use automated hardware management tools (BMC/Redfish‑based) to streamline provisioning and maintenance.
Perform hardware diagnostics and troubleshooting (CPU, memory, disks, PSUs, NICs, etc.).
Handle vendor interactions, including RMAs, part replacements, and inventory tracking.
Oversee datacenter hardware operations, including racking, cabling, PDU installation, and physical layout.
Use Data Center Infrastructure Management (DCIM) tools for inventory, capacity planning, and environmental tracking.
Manage power delivery and consumption across racks and nodes.
Configure and monitor managed PDU systems for power cycling, monitoring, and alerts.
Collaborate with colocation providers on connectivity, power, security, and maintenance tasks.
Build and maintain infrastructure monitoring and alerting using tools such as Prometheus/Grafana, SNMP, Nagios, CheckMK, or similar platforms.
Implement automated alerting for hardware health, network status, power issues, and service‑level metrics.
Create dashboards to give internal teams visibility into system performance and reliability.
Manage and configure firewalls, routing, and network segmentation.
Configure and troubleshoot VPN technologies (IPsec, OpenVPN, WireGuard).
Oversee subnetting, IP address allocation, and network architecture planning.
Configure managed switches, VLANs, port settings, and trunking.
Manage NAT, port forwarding, and related gateway/edge network configurations.
Install, configure, and manage Linux servers (Ubuntu/Debian preferred).
Perform system‑level troubleshooting (boot issues, login problems, service failures).
Manage networking configuration (static IPs, DHCP).
Configure and maintain file systems: partitioning, MD RAID, ext4/XFS, LVM, resizing/growing volumes.
Implement secure access using public key authentication and proper SSH hardening.
Manage certificates for internal systems, including issuance, revocation, HTTPS installation, and rotation.
Handle basic BIOS configuration relevant to bare metal provisioning or system bring‑up.
Deploy and manage hardware provisioning tools such as MAAS, Foreman, or similar systems.
Configure and troubleshoot network boot mechanisms (PXE, UEFI Boot, HTTP Boot).
Automate provisioning pipelines to rapidly bring new nodes online.
Work with Kubernetes clusters at a foundational level (cluster access, basic resource troubleshooting).
Deploy workloads using Helm charts and maintain cluster application lifecycle.
Assist with cluster scaling, node replacements, and security hardening.
Write shell scripts (bash) for automation of system tasks, monitoring, or provisioning.
Use CLI tooling such…