Principal SRE; AI Enablement Platform
Job in Town of Poland, Jamestown, Chautauqua County, New York, 14701, USA
Listed on 2026-06-02
Listing for: ABC Financial
Full Time position
Job specializations:
- IT/Tech: Systems Engineer, SRE/Site Reliability, Cloud Computing
Job Description & How to Apply Below
Location: Town of Poland
Join ABC Fitness, the leading technology provider for the fitness industry!
What You’ll Do
- Architect and evolve core platform capabilities for reliability, including execution environments, CI/CD systems, and validation pipelines that support high-throughput, machine-assisted change.
- Design and implement fast, isolated execution environments where generated work can be built, tested, and safely discarded at scale.
- Transform CI/CD into a validation system by embedding automated verification (tests, integration harnesses, canarying, rollback signals) into promotion decisions.
- Build production-like validation environments that allow realistic system behavior testing without impacting live systems.
- Establish deep observability patterns for autonomous workflows, including tracing what ran, what failed, why, and what it cost across agents, tools, and orchestration layers.
- Define and implement guardrails-as-code, including access controls, policy enforcement, cost protections, and auditability for platform usage.
- Design for reliability from day one, including scalability, fault tolerance, performance optimization, and operational resilience.
- Lead technical design reviews and influence platform and infrastructure decisions across engineering teams.
- Define and document reusable infrastructure patterns, platform standards, and reference implementations that create a consistent paved path for teams.
What This Role Is Not
- Not a ticket queue or generic support role.
- Not incremental-only ops without ownership of architecture and adoption.
- Not "just Kubernetes admin": Kubernetes is one layer in a broader platform problem.
What You’ll Need
- Typically 10+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or Platform Engineering.
- Deep experience designing and operating distributed systems at scale, including cloud platforms (e.g., AWS), Kubernetes, and infrastructure-as-code.
- Strong expertise in reliability engineering practices, including incident management, fault isolation, resiliency design, and system performance tuning.
- Experience building and operating CI/CD systems, test harnesses, and automated validation frameworks.
- Strong understanding of observability systems, including metrics, logging, tracing, and system-level debugging.
- Demonstrated ability to define technical standards and influence multiple teams through architecture, design review, and strong engineering judgment.
- Strong production mindset, with experience designing systems for scalability, availability, and operational efficiency.
- Experience implementing secure, multi-tenant infrastructure with strong isolation, IAM, and secrets management practices.
- Excellent cross-functional collaboration skills.
- Growth mindset and One Team orientation.
Nice to Have
- Experience supporting AI/LLM-powered systems in production, including understanding of latency, cost, and orchestration challenges.
- Experience designing high-throughput, isolated compute systems or sandboxed execution environments.
- Experience building internal developer platforms or platform-as-a-product capabilities.
- Familiarity with governance or regulated environments.
- Experience with advanced validation systems such as canarying, chaos engineering, or automated rollback strategies.
What Success Looks Like
- Faster delivery through platform-enabled validation and automation.
- Automated validation of changes before production, reducing reliance on manual review.
- Platform standards adopted across teams as the default paved path.
- Early detection of reliability issues through strong observability and validation systems.
- Reduced infrastructure complexity so engineers can focus on product and policy.
ABC Fitness is evolving toward an AI-native engineering model where automation, agents, and platform systems handle increasing portions of the software lifecycle. This role builds the foundation that enables scalable, trustworthy, and high-velocity software delivery across the organization.