Resiliency/SRE Principal Engineer
Job in
Madison, Dane County, Wisconsin, 53774, USA
Listed on 2026-02-10
Listing for:
Hispanic Alliance for Career Enhancement
Full Time position
Job specializations:
- IT/Tech
Systems Engineer, Cloud Computing
Job Description & How to Apply Below
Position Compensation Range
Pay Rate Type: Salary
Compensation may vary based on the job level and your geographic work location. Relocation support is offered for eligible candidates.
Primary Accountabilities
- Evaluate existing system architectures and identify critical failure points.
- Define and document comprehensive resiliency engineering principles and practices tailored to our environment.
- Identify and implement tools for monitoring, alerting, chaos engineering, and automated recovery.
- Analyze current ITSM workflows (e.g., incident, change, problem) and identify high-impact automation opportunities.
- Architect and develop automation solutions leveraging existing ITSM tools and integrating with other systems.
- Deploy and integrate automated workflows, ensuring seamless data flow and reporting.
- Establish metrics to track the effectiveness of implemented automation.
- Partner with Enterprise DevOps, Integration Platform DevOps, and other DevOps teams to review existing CI/CD pipelines and identify bottlenecks and areas for improvement.
- Define a comprehensive automation strategy encompassing build, test (unit, integration, B2B), security scanning, and deployment processes to improve production and system resiliency.
- Evaluate and implement best-in-class automation tools and integrate them into a cohesive pipeline. Enable developers with self-service capabilities for environment provisioning and deployment through automation.
- Define standard operating environments and infrastructure configurations with assigned release windows. Implement an Infrastructure-as-Code (IaC) framework using tools like Terraform or Ansible by partnering with multiple teams, including Cloud Architecture.
- Advise on building automated pipelines for provisioning, configuring, patching, and deploying infrastructure components.
- Implement automated testing and rollback mechanisms for infrastructure changes.
- Leads the design, development, enhancement and maintenance of tools, systems and software solutions.
- Designs, architects, develops, integrates, and tests systems, solutions or products.
- Performs incident triage, including determining scope, urgency, and potential impact.
- Leads technology evaluations and re-engineering activities to support strategy definition and continuous improvement activities.
- Leads the identification, design and implementation of automated solutions to enable development needs.
- Transforms business requirements into technical specifications.
- Accountable for stakeholder engagement/management to understand internal processes and identify potential hard or soft gaps between capabilities and business requirements or expectations.
- Manages relationships with stakeholders to enable cross-functional coordination and a partnership-focused approach that aligns product and system releases and roadmaps with technology policies and standards, and ensures that exceptions or gaps in system capabilities or coverage are managed with a risk-based approach that balances service priorities with business needs.
- Expertise in designing fault-tolerant architectures with automated failover, multi-region redundancy, and graceful degradation strategies enabled by chaos engineering.
- Deep understanding of complex distributed systems, including microservices orchestration, service meshes, and eventual consistency models.
- Demonstrated experience providing customer-driven solutions, support or service.
- Extensive knowledge and understanding of software engineering architectures, system/software designs, and system deployments.
- Demonstrated experience in multiple IT subject areas (e.g., development, testing, configuration, deployment, monitoring, etc.).
- Extensive knowledge and understanding of infrastructure technologies and application development methodologies.
- Demonstrated experience leading System Administration (configuration, installations, patch management, server maintenance, etc.) and Network Management (firewalls, proxies, IP management, routing, DNS).
- Demonstrated experience leading the utilization and support of integration and communication protocols between applications, databases, and technology platforms.
- Demonstrates a strong foundation in building frameworks that scale to enterprise requirements and in providing specifications for APIs that can be consumed as part of fulfillment processes.
- Offer to selected candidate will be made contingent on the results of applicable background checks
- Offer to selected candidate is contingent on signing a non-disclosure agreement for proprietary information, trade secrets, and inventions
- Sponsorship will not be considered for this position unless specified in the posting
In this flex office/home role, you will be expected to work a minimum of 10 days per month from one of the following office locations:
Madison, WI 53783;
Boston, MA 02110
Candidates must reside within a 50-mile radius of the office location (or 35-mile radius for Boston)
Internal candidates…