Software Engineer - Resiliency and Platform Engineering Job, Scottsdale area, Arizona, USA, Software Development

## Staff Software Engineer - Resiliency and Platform Engineering

locations: Scottsdale AZ - Sky Touch Technology
time type: Full time
posted on: Posted Yesterday
job requisition: R20988

**This role is not eligible for sponsorship AND is four days onsite hybrid at our N. Scottsdale office**

Choice Hotels has an exciting new opportunity as our **Staff Software Engineer, Resiliency & Platform Engineering** in the Sky Touch Technology division. Sky Touch Technology is an independently operated division of Choice Hotels that provides the most widely used cloud-based (SaaS) hotel property management system. As a key member of our Sky Touch Technology division, you will help strengthen the resiliency, safety, and operability of a large-scale, multi-tenant SaaS platform by improving foundational platform capabilities, runtime behavior, and the developer experience used to build and operate our systems.

This role sits at the intersection of **software engineering, platform engineering, and resiliency**. You will focus on building shared capabilities, libraries, frameworks, tooling, guardrails, and standards used by dozens of engineers across the organization. These capabilities make resilient behavior the default for application teams and reduce operational risk through better system design rather than reactive response.

This is **not a traditional Site Reliability Engineering (SRE) role**. In our environment, resiliency and platform engineering are proactive, year-round engineering disciplines focused on preventing failures, improving system behavior under stress, and enabling teams to build and operate services safely. The emphasis is on durable, systemic improvements and developer enablement rather than pager-driven operations or feature delivery.

You will also be expected to apply **AI-assisted tools and techniques pragmatically** to reduce engineering toil, improve diagnostics, and accelerate resiliency and platform outcomes, prioritizing durability, correctness, and adoption over experimentation.

The **#Skys The Limit** when you **#Make It Your Choice**! We encourage you to apply today!

**Your Responsibilities**

* Important note: **This role writes production code, but not for feature delivery**. Coding is focused on platform-level resiliency, developer enablement, observability, and systemic improvements rather than customer-facing feature enhancements.
* Design and implement **platform-level capabilities** including shared libraries, frameworks, tooling, automation, and guardrails that improve application resiliency, runtime safety, and developer experience across the ecosystem, favoring leverage and durability over short-term delivery.
* Strengthen foundational platform and runtime behavior by identifying and eliminating systemic failure modes such as JVM memory leaks, unsafe defaults, brittle error handling, poor failure propagation, and resource exhaustion.
* Improve how software is built and operated at scale by defining and rolling out **developer-facing standards and paved roads** for resiliency, observability, error handling, and operational readiness.
* Define, standardize, and evolve **logging, monitoring, alerting, and observability practices** that improve signal quality, reduce noise, and enable faster diagnosis and recovery.
* Partner closely with Principal Software Engineers, Solution Architects, and Engineering Managers to identify systemic risks and translate them into **well-scoped platform and resiliency initiatives** and technical work.
* Operate across software engineering resiliency, data engineering resiliency, and platform engineering teams to identify cross-cutting risks, design shared solutions, and raise the technical bar, rather than owning individual team backlogs.
* Engage directly in application codebases, particularly during ramp-up, to understand real-world system behavior, identify failure patterns, and validate resiliency improvements. Exit application-level work once learning is complete and systemic improvements are identified.
* Participate in incident postmortems and operational…

