Systems Operations Manager - Data Platforms - Teradata & Hadoop
Job in Irving, Dallas County, Texas, 75061, USA
Listed on 2026-06-05
Listing for: Wells Fargo
Full Time position
Job specializations:
- IT/Tech
IT Support, Cloud Computing, Systems Engineer, SRE/Site Reliability
Job Description & How to Apply Below
Wells Fargo is back in the office collaborating for fabulous outcomes.
This role is in the office three days a week.
No visa sponsorship or visa transfers.
About this role:
Wells Fargo is seeking a Systems Operations Manager to lead the end-to-end support and operations of enterprise Teradata and Hadoop data platforms powering large-scale analytics and business decisioning.
This role is accountable for platform stability, reliability, and operational excellence across a complex, multi-tenant ecosystem supporting 100+ tenants. The manager will lead a 24x7 operations team, apply Site Reliability Engineering (SRE) principles, and drive automation-led transformation to ensure predictable, resilient service delivery at scale.
This is a hands-on leadership role requiring strong execution discipline, ownership, and the ability to operate in a high-risk, regulated environment, ensuring SLA adherence, compliance, and business continuity outcomes.
In this role, you will:
Operational Leadership & Platform Ownership
* Lead end-to-end platform operations for Teradata and Hadoop environments, ensuring availability, performance, and resilience
* Provide clear ownership and accountability for production services, operational outcomes, and service stability
* Drive incident, problem, and change management, including major incident command and recovery leadership
* Lead 24x7 global support operations, including on-call governance and escalation management
Operational Excellence & Service Performance
* Own and drive SLA/OLA adherence, uptime, and service health metrics
* Lead capacity management, performance tuning, and proactive issue prevention initiatives
* Establish and enforce operational standards, runbooks, and service management practices
* Drive root cause analysis (RCA) and long-term remediation of systemic issues
* Drive adoption of automation, observability, and AIOps practices to reduce manual toil and improve MTTR
Governance, Risk & Compliance
* Ensure alignment with enterprise risk, compliance, and change management frameworks
* Drive patching, vulnerability remediation, and platform security posture
* Maintain audit readiness, documentation quality, and control adherence
* Identify, escalate, and mitigate operational and platform risks
Multi-Tenant Platform Operations
* Manage operations across shared, multi-tenant platforms, ensuring workload isolation and stability
* Oversee resource allocation, scheduler configuration, and workload prioritization
* Execute in high-risk production environments where changes impact multiple tenants simultaneously
Site Reliability Engineering (SRE) & Automation
* Apply SRE principles to improve reliability, availability, and scalability of data platforms
* Drive automation-first operations to eliminate manual toil and standardize service delivery
* Implement and enhance observability, monitoring, and self-service capabilities
* Partner with engineering teams to improve platform reliability, operability, and service maturity
Stakeholder Engagement & Execution Alignment
* Partner with Engineering, CIO-aligned teams, Cybersecurity, and LOB stakeholders
* Provide clear, executive-ready communication on platform health, risks, and priorities
* Drive cross-functional accountability and execution discipline across teams
People Leadership & Talent Development
* Lead, coach, and develop a team of Systems Operations engineers and analysts
* Build a culture of ownership, accountability, and operational excellence
* Manage resource allocation, workforce planning, and vendor/partner support
* Develop team capabilities in SRE practices, automation, and platform operations maturity
Resiliency & Business Continuity
* Ensure resiliency posture across Teradata and Hadoop platforms, including:
  * Disaster recovery (DR) readiness and execution
  * RTO/RPO alignment and validation
  * Continuous improvement of recovery capabilities
* Lead BCP execution and failover coordination for critical platforms
Required Qualifications:
* 5+ years of Systems Engineering and Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
* 2+ years of Leadership experience
* Hands-on experience with:
  * Teradata and Hadoop platforms
  * Distributed systems and data platform operations
  * Incident, problem, and change management processes
Desired Qualifications:
* Experience supporting enterprise-scale Teradata and Hadoop platforms
* Demonstrated leadership in 24x7 production support and SRE environments
* Strong experience in:
  * Automation, AIOps, and operational transformation
  * DevSecOps and CI/CD practices
  * Observability, monitoring, and platform telemetry
* Familiarity with Kubernetes, containerization, and cloud-native architectures
* Strong understanding of:
  * Multi-tenant data platforms and…