SRE Architect (Automation/AI)

Job in Concord, Cabarrus County, North Carolina, 28027, USA
Listing for: Gravity IT Resources
Contract position
Listed on 2026-02-06
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, AI Engineer, Cybersecurity
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD yearly
Job Description & How to Apply Below
Position: SRE Architect (Automation/AI)

Site Reliability Engineering Architect (Remote EST)
Location: Charlotte, NC (Remote, EST preferred)
Type: 12-month contract (with potential to extend)
Position Overview
The Site Reliability Engineering (SRE) Architect is a senior technical leader responsible for designing and evolving automation-first, AI-augmented reliability platforms for large-scale cloud and hybrid environments. This role defines how systems detect, decide, and act with minimal human intervention, setting the technical direction, standards, and guardrails that reduce toil while improving resilience and delivery velocity. An automation-first mindset is required; observability alone is not sufficient: signals must drive automated or AI-assisted action.

Core Responsibilities
Reliability Architecture & Operational Design

  • Define reference architectures that prioritize automated and AI-assisted fault isolation, graceful degradation, and recovery.
  • Embed reliability, security, and governance into operational workflows and platforms, reducing complexity, human dependency, and operational risk.
  • Establish standards so that every operational signal has a defined automated or AI-assisted response path (not just a dashboard alert).

Automation Platforms & Workflow Engineering

  • Architect event-driven automation spanning detection → decisioning → execution (e.g., health checks → enrichment → safe remediation).
  • Replace ticket-driven/manual runbooks with executable, testable automation and standardized patterns across incident response, change, and platform ops.
  • Ensure automation is resilient, observable, and auditable, with clear approval paths for higher-risk actions.

AI-Driven & Agent-Based Operations

  • Design and own internal AI-driven operational platforms that retrieve context, reason over signals, and invoke controlled actions across services.
  • Define guardrails, approvals, observability, and auditability for AI-initiated actions; integrate AI decisioning directly into workflows.
  • Enable agent coordination and capability discovery for safe execution in production.

Observability, Signal Processing, & Decision Systems

  • Evolve observability from dashboards to decision systems that feed automation.
  • Build signal pipelines correlating metrics, logs, traces, and events to reduce noise and alert fatigue and to trigger context-aware remediation.
  • Leverage existing tools (e.g., Dynatrace (DQL/APIs/AIOps/extensions), Zabbix, PagerDuty, Alertbot) to produce actionable context rather than standalone alerts. (Tools from original environment)

Cloud & Platform Monitoring Enablement

  • Support and extend AWS monitoring (e.g., CloudWatch, ECS) with automation hooks and AI-assisted triage.
  • Align low-code enterprise automation (e.g., Power Platform) with code-first systems, preventing platform sprawl while accelerating safe, governed workflows.

Leadership & Technical Influence

  • Serve as architectural authority for reliability, automation, and AI-driven operations; mentor senior engineers and uplift organizational maturity.
  • Partner with application, middleware, infrastructure, security, and compliance teams to deliver scalable, safety-critical operational platforms.
  • Challenge designs that increase operational risk, toil, or manual dependency; champion automation-first solutions.

Required Qualifications

  • 5+ years in SRE, Platform Engineering, DevOps, or Infrastructure Engineering supporting complex distributed systems.
  • Proven experience designing and operating automation-heavy platforms (event-driven workflows, orchestration, policy/guardrails).
  • Strong programming & automation skills (e.g., Python) and workflow orchestration/event-driven systems experience.
  • Practical experience integrating AI or intelligent decision systems into production operations (3–5 years AI/ML preferred).
  • Deep understanding of failure modes, blast-radius management, and risk-aware automation.

Important: Candidates with observability-only backgrounds, without deep, hands-on automation/workflow engineering, will not be a fit.
Preferred Qualifications

  • Experience designing or implementing agent-based or AI-assisted operational systems; familiarity with modern AI platforms and model integration for ops use cases.
  • Experien…