
Principal SRE (AI Enablement Platform)

Job in Town of Poland, Jamestown, Chautauqua County, New York, 14701, USA
Listing for: ABC Financial
Full Time position
Listed on 2026-06-02
Job specializations:
  • IT/Tech
    Systems Engineer, SRE/Site Reliability, Cloud Computing
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD Yearly
Job Description & How to Apply Below
Position: Principal SRE (AI Enablement Platform)-2
Location: Town of Poland

Join ABC Fitness, the leading technology provider for the fitness industry!

What You’ll Do
  • Architect and evolve core platform capabilities for reliability, including execution environments, CI/CD systems, and validation pipelines that support high-throughput, machine-assisted change.
  • Design and implement fast, isolated execution environments where generated work can be built, tested, and safely discarded at scale.
  • Transform CI/CD into a validation system by embedding automated verification (tests, integration harnesses, canarying, rollback signals) into promotion decisions.
  • Build production-like validation environments that allow realistic system behavior testing without impacting live systems.
  • Establish deep observability patterns for autonomous workflows, including tracing what ran, what failed, why, and what it cost across agents, tools, and orchestration layers.
  • Define and implement guardrails-as-code, including access controls, policy enforcement, cost protections, and auditability for platform usage.
  • Design for reliability from day one, including scalability, fault tolerance, performance optimization, and operational resilience.
  • Lead technical design reviews and influence platform and infrastructure decisions across engineering teams.
  • Define and document reusable infrastructure patterns, platform standards, and reference implementations that create a consistent paved path for teams.
What This Is Not
  • Not a ticket queue or generic support role.
  • Not incremental-only ops without ownership of architecture and adoption.
  • Not "just Kubernetes admin"; Kubernetes is one layer in a broader platform problem.
What You’ll Need
  • Typically 10+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or Platform Engineering.
  • Deep experience designing and operating distributed systems at scale, including cloud platforms (e.g., AWS), Kubernetes, and infrastructure-as-code.
  • Strong expertise in reliability engineering practices, including incident management, fault isolation, resiliency design, and system performance tuning.
  • Experience building and operating CI/CD systems, test harnesses, and automated validation frameworks.
  • Strong understanding of observability systems, including metrics, logging, tracing, and system-level debugging.
  • Demonstrated ability to define technical standards and influence multiple teams through architecture, design review, and strong engineering judgment.
  • Strong production mindset, with experience designing systems for scalability, availability, and operational efficiency.
  • Experience implementing secure, multi-tenant infrastructure with strong isolation, IAM, and secrets management practices.
  • Excellent cross-functional collaboration skills.
  • Growth mindset and One Team orientation.
And It’s Great to Have
  • Experience supporting AI/LLM-powered systems in production, including understanding of latency, cost, and orchestration challenges.
  • Experience designing high-throughput, isolated compute systems or sandboxed execution environments.
  • Experience building internal developer platforms or platform-as-a-product capabilities.
  • Familiarity with governance or regulated environments.
  • Experience with advanced validation systems such as canarying, chaos engineering, or automated rollback strategies.
What Success Looks Like
  • Faster delivery through platform-enabled validation and automation.
  • Automated validation of changes before production, reducing reliance on manual review.
  • Platform standards adopted across teams as the default paved path.
  • Early detection of reliability issues through strong observability and validation systems.
  • Reduced infrastructure complexity so engineers can focus on product and policy.
Why This Matters

ABC Fitness is evolving toward an AI-native engineering model where automation, agents, and platform systems handle increasing portions of the software lifecycle. This role builds the foundation that enables scalable, trustworthy, and high-velocity software delivery across the organization.
