Principal Consultant: Red Hat Advanced Cluster Management (RHACM)
Listed on 2026-06-27
IT/Tech
Systems Engineer, Cloud Computing: Infrastructure & Operations
ACM Architecture Validation & Strategy
Design Authority:
Review and finalize ACM architecture, ensuring it supports diverse deployment topologies and critical Disaster Recovery (DR) requirements (Active/Passive configurations).
Infrastructure Synergy:
Optimize the co-location of infrastructure management and ArgoCD to ensure a seamless "single pane of glass" for the platform.
Performance Engineering:
Define storage and performance specs required to support high-throughput multi-cluster observability and alerting frameworks.
Data-Driven Insights:
Lead stakeholder sessions to define and build custom Grafana dashboards that provide actionable data on capacity, network traffic, and workload scaling.
Alerting Framework:
Design and implement a performant alerting framework that filters noise and provides SRE teams with discrete, actionable notifications.
Right-Sizing Initiatives:
Use Multi-cluster Observability (MCO) and autoscalers (HPA/VPA) to identify over-requested resources and automate application density optimization.
Configuration Drift Mitigation:
Transition Day-2 operations to ArgoCD, ensuring all cluster configurations (RBAC, network policies, operator installs) are managed as code and automatically reverted if manual drift occurs.
Policy-as-Code:
Establish a GitOps-based governance process. Create and roll out ACM Policy Sets to monitor cluster health and security compliance across the entire fleet.
Automation Integration:
Integrate ACM Policies and Day-2 configurations into existing Ansible automation pipelines for full lifecycle orchestration.