Principal Architect, Site Reliability Engineering

Job in Austin, Travis County, Texas, 78716, USA
Listing for: Charles Schwab
Full Time position
Listed on 2026-02-16
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: 80000 - 100000 USD Yearly
Job Description & How to Apply Below

Your Opportunity

At Schwab, you are empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us “challenge the status quo” and transform the finance industry together.

We believe in the importance of in-office collaboration and fully intend for the selected candidate for this role to work on site in the specified location(s).

As a Principal Architect, Site Reliability Engineering for Schwab's Technology Solutions organization, you will be responsible for building a purposeful, proactive, and sustainable approach to reliability on a foundation of SRE principles. You will partner with multiple support teams, architects, developers, and other stakeholders to develop common tools and guidance and drive adoption of key reliability engineering practices in support of large-scale and mission-critical services.

Through your deep SRE knowledge and history of implementation, you will have open, candid conversations with senior leaders and engineers and play a pivotal role in establishing a foundational SRE practice at Schwab.

As a member of the Retirement Technology (RT) Leadership Team, you will lead and mentor the SAVE (Service Availability and Engineering) organization. This scope of responsibility will include all Software Availability and SRE components for RBS, SRT and RPS, covering 150+ active applications in total. You will be immersed in a collaborative, innovative, and technically challenging environment.

The platforms you will support are essential to the success of the holistic Schwab Retirement lines of business.

Position Responsibilities
  • Evangelize SRE mindset and practices across the Schwab Technology Solutions organization.
  • Partner with support, development, and business stakeholders to develop, measure, and leverage service level objectives.
  • Design and develop solutions to eliminate toil and manual effort from day-to-day support responsibilities.
  • Identify and implement improvements to logging, metrics, and tracing telemetry and triaging capabilities across a diverse technology stack.
  • Lead complex triage and postmortem activities for critical issues and drive prioritization/resolution of remediation items.
  • Perform chaos engineering experiments to improve application resilience to known and unknown failures.
  • Document reliability guidance and best practices. Advocate for and drive adoption of said practices.
  • Foster a culture of learning through coaching, mentoring, and knowledge sharing around reliability practices, processes, and tools.
  • Develop tools, frameworks, and instrumentation to validate and increase release success for applications.

What you have

Required Qualifications
  • At least 5 years in an SRE role, including at least 3 years in an architect or technical leadership position.
  • At least 3 years of experience designing and implementing highly scalable and fault-tolerant systems.
  • In-depth knowledge of resilience patterns (e.g., circuit breakers, timeouts, retries) and how to design and implement them.
  • In-depth knowledge of CI/CD processes and tools to ensure software is delivered safely using known deployment strategies (e.g., blue/green, canary deployments, feature toggles).
  • Authored technical postmortems (at least weekly) with root cause analyses and documented action items that resulted in measurable resiliency improvements.
  • Contributed to the SLO strategy for at least 5 teams, ensuring alignment with business and client objectives.
  • Three or more years of hands-on experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk), with a proven track record of setting up dashboards and alerts.
  • Experience using the latest AI solutions to reduce repetitive operational toil.
  • Led or participated in cross-functional SRE-focused initiatives that included key stakeholders from both technical and business units.
  • Participated in resilience or chaos engineering exercises, with documentation showing a reduction in unplanned downtime.
  • Presented findings or led training sessions to share SRE practices, enhancing team performance or adoption rates for reliability engineering methods.
  • Mentore…