Reliability & Monitoring Engineer
Listed on 2026-03-12
IT/Tech
IT Support, Cybersecurity
Position Overview / Role Purpose
The Reliability & Monitoring Engineer is responsible for fleet-level monitoring, incident analysis, and reliability insights for Nextracker-supported utility-scale solar tracker systems. This role provides real‑time system visibility, post‑event analysis, and actionable intelligence that support rapid recovery and long‑term asset reliability, particularly following severe weather and other high‑impact events.
Operating within a portfolio‑based support model, the Reliability & Monitoring Engineer translates monitoring data into clear technical insights that improve system uptime, inform customer communication, and strengthen long‑term asset performance. This is a desk‑based role within the Nextpower organization, focused on proactive monitoring, analytical investigation, and continuous operational improvement, working closely with the U.S. Technical Services organization and the Manager, Remote Monitoring & Asset Resilience (U.S.).
Key Objectives
Deliver High-Quality Fleet Monitoring
Continuously monitor utility‑scale tracker fleets to detect abnormal system behavior, communication failures, and offline assets across customer portfolios.
Lead Incident Analysis & Root Cause Investigation
Perform structured incident analysis and Root Cause Analysis (RCA) for alarms, outages, and post‑weather events, producing clear, technically sound findings.
Support Technical Services & Customer Communication
Provide monitoring‑based insights and documentation that enhance Technical Services’ ability to resolve issues quickly and communicate effectively with customers.
Drive Reliability Insights & Operational Improvement
Identify recurring issues and systemic risks, and contribute to the refinement of monitoring thresholds, alert logic, and operational playbooks that improve asset resilience.
Core Responsibilities
Fleet Monitoring & Operational Awareness
- Monitor utility‑scale solar tracker fleets using web‑based monitoring platforms, including NX Navigator, to maintain real‑time awareness of system status.
- Identify abnormal system states, communication failures, and offline assets across assigned customer portfolios.
- Support remote operational actions during high‑wind and severe weather events, including coordination of tracker stow and recovery activities under the direction of the Manager, Remote Monitoring & Asset Resilience.
- Maintain clear situational awareness across active customer sites, including key alarms, stow states, communication health, and emerging risk signals.
- Log and track monitoring observations, ensuring key events are captured in internal systems and aligned with established RMC workflows and SOPs.
- Perform structured Root Cause Analysis (RCA) for system alarms, outages, and post‑weather events using operational data, logs, SCADA‑like signals, and environmental inputs.
- Correlate tracker behavior, monitoring signals, and weather data to determine probable failure mechanisms and reliability risks.
- Produce clear, technically sound incident summaries and RCA documentation for customers, Technical Services, and internal stakeholders.
- Support warranty‑aligned documentation and evidence collection, ensuring events are captured in a way that supports potential warranty claims and risk assessments.
- Participate in post‑event reviews, providing data‑driven input on incident timelines, system behavior, and key contributing factors.
- Provide monitoring‑based technical analysis to support customer issues managed by the Technical Services team and other customer‑facing functions.
- Translate complex system behavior into clear, actionable insights that enable Technical Services to prioritize and execute field or remote actions.
- Ensure that incident records, timelines, and findings meet internal service expectations and quality standards for accuracy, completeness, and clarity.
- Support preparation of materials for customer calls, reports, and follow‑ups by supplying data extracts, charts, and concise technical summaries derived from monitoring platforms.
- Identify recurring issues,…