Sr. Infrastructure Site Reliability Engineer Job, Southlake area, Texas, USA, IT/Tech

Your opportunity

At Schwab, you’re empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us “challenge the status quo” and transform the finance industry together.

Schwab Technology Services enables the future of how clients manage their money by providing innovative and reliable technology products and services as part of our ongoing commitment to democratize access to investing and financial planning.

A Manager for Advisor Services Technology (AST) Infrastructure Operations SRE will lead the strategy, execution, and operational excellence of the application infrastructure ecosystem supporting AST platforms. This role is accountable for ensuring high availability, scalability, reliability, and performance through disciplined operational practices, life‑cycle management, and modern SRE principles. It requires oversight of all routine and strategic infrastructure initiatives, including operating system upgrades, patching, EOL remediation, infrastructure changes, middleware and database activities, cloud technologies and readiness, tooling modernization, and automation. You will drive holistic capacity management, ensuring that compute, storage, network, and application‑tier resources are designed and maintained to meet current and future business demand.

You will partner closely with architecture and application engineering teams to ensure infrastructure and platform components align with solution designs and support the long‑term technical roadmap. The role also governs the organization's observability platforms – defining the telemetry strategy, metrics, SLOs, and alerting posture necessary to maintain operational health and reduce toil. You will lead ongoing improvements in automation, resilience engineering, disaster recovery readiness, and operational maturity, creating repeatable, well‑engineered processes that support rapid change with minimal risk.

This role requires a deep understanding of enterprise infrastructure and security principles, excellent analytical skills, and the ability to communicate effectively with technical and non‑technical stakeholders.

What you’re good at

Strategic thinker who is passionate about application infrastructure reliability and efficiency.
Strong stakeholder engagement – able to work with application teams, I&O, and senior leadership. Drive consensus, negotiate priorities, and resolve conflicts.
Effective decision‑maker driving solutions and leadership updates during high‑pressure incidents.
Leads with integrity and sound judgment, showing the courage to uphold what’s right in all situations.
Maintains a high standard of change management quality by enforcing rigor, reducing operational risk, and ensuring predictable, safe deployments.
Apply a Site Reliability Engineering mindset to solve problems through automation and instrumentation.
Identify opportunities to build innovative tools and solve unique operations problems on large enterprise and mission‑critical applications.
Drive continuous improvement via automation across infrastructure provisioning, configuration management, compliance, system health, and operational activities.
Monitor the current state of infrastructure to identify deficiencies caused by aging technologies used by the application or by misalignment with business requirements.
Analyze the business‑IT environment (run, grow and transform the business) to detect critical deficiencies, and recommend solutions for improvement.
Govern change management practice, ensuring minimal service impact of infrastructure changes and activities.
Lead capacity planning across compute, storage and application tiers to ensure scalability and optimization.
Implement proactive monitoring and forecasting to prevent performance degradation across all supported platforms (on‑prem and cloud technologies).
Partner with architecture teams to improve system resiliency, fail‑over design, and scalability patterns.
Establish standards for tooling around runbooks, incident response, and environment configuration.
Lead complex incident triage and root‑cause analysis, and drive action plans to eliminate recurrences.
Coordinate DR exercises,…