Software Engineer - Resiliency and Platform Engineering
Job in Scottsdale, Maricopa County, Arizona, 85261, USA
Listed on 2026-01-31
Listing for: Choice Hotels International, Inc.
Full Time position
Job specializations:
- Software Development
- Software Engineer, DevOps, Cloud Engineer - Software
Job Description & How to Apply Below
Scottsdale AZ - Sky Touch Technology
Time type: Full time
Posted on: Posted Yesterday
Job requisition: R20988
**This role is not eligible for sponsorship AND is four days onsite hybrid at our N. Scottsdale office.**
Choice Hotels has an exciting new opportunity as our **Staff Software Engineer, Resiliency & Platform Engineering** in the Sky Touch Technology division. Sky Touch Technology is an independently operated division of Choice Hotels that provides the most widely used cloud-based (SaaS) hotel property management system. As a key member of our Sky Touch Technology division, you will help strengthen the resiliency, safety, and operability of a large-scale, multi-tenant SaaS platform by improving foundational platform capabilities, runtime behavior, and the developer experience used to build and operate our systems.
This role sits at the intersection of **software engineering, platform engineering, and resiliency**. You will focus on building shared capabilities, libraries, frameworks, tooling, guardrails, and standards used by dozens of engineers across the organization. These capabilities make resilient behavior the default for application teams and reduce operational risk through better system design rather than reactive response.
This is **not a traditional Site Reliability Engineering (SRE) role**. In our environment, resiliency and platform engineering are proactive, year-round engineering disciplines focused on preventing failures, improving system behavior under stress, and enabling teams to build and operate services safely. The emphasis is on durable, systemic improvements and developer enablement rather than pager-driven operations or feature delivery.
You will also be expected to apply **AI-assisted tools and techniques pragmatically** to reduce engineering toil, improve diagnostics, and accelerate resiliency and platform outcomes, prioritizing durability, correctness, and adoption over experimentation.
The **#Skys The Limit** when you **#Make It Your Choice**! We encourage you to apply today!
**Your Responsibilities**
* Important note: **This role writes production code, but not for feature delivery**. Coding is focused on platform-level resiliency, developer enablement, observability, and systemic improvements rather than customer-facing feature enhancements.
* Design and implement **platform-level capabilities** including shared libraries, frameworks, tooling, automation, and guardrails that improve application resiliency, runtime safety, and developer experience across the ecosystem, favoring leverage and durability over short-term delivery.
* Strengthen foundational platform and runtime behavior by identifying and eliminating systemic failure modes such as JVM memory leaks, unsafe defaults, brittle error handling, poor failure propagation, and resource exhaustion.
* Improve how software is built and operated at scale by defining and rolling out **developer-facing standards and paved roads** for resiliency, observability, error handling, and operational readiness.
* Define, standardize, and evolve **logging, monitoring, alerting, and observability practices** that improve signal quality, reduce noise, and enable faster diagnosis and recovery.
* Partner closely with Principal Software Engineers, Solution Architects, and Engineering Managers to identify systemic risks and translate them into **well-scoped platform and resiliency initiatives** and technical work.
* Operate across software engineering resiliency, data engineering resiliency, and platform engineering teams to identify cross-cutting risks, design shared solutions, and raise the technical bar, rather than owning individual team backlogs.
* Engage directly in application codebases, particularly during ramp-up, to understand real-world system behavior, identify failure patterns, and validate resiliency improvements. Exit application-level work once learning is complete and systemic improvements are identified.
* Participate in incident postmortems and operational…