SRE Architect; Automation/AI Job, Concord area, North Carolina, USA, IT/Tech

Position: SRE Architect (Automation/AI)

Site Reliability Engineering Architect (Remote, EST)
Location: Charlotte, NC (Remote, EST preferred)
Type: 12-month contract (with potential to extend)
Position Overview
The Site Reliability Engineering (SRE) Architect is a senior technical leader responsible for designing and evolving automation-first, AI-augmented reliability platforms for large-scale cloud and hybrid environments. This role defines how systems detect, decide, and act with minimal human intervention. It sets the technical direction, standards, and guardrails that reduce toil while improving resilience and delivery velocity. An automation-first mindset is required: observability alone is not sufficient, and signals must drive automated or AI-assisted action.

Core Responsibilities
Reliability Architecture & Operational Design

Define reference architectures that prioritize automated and AI-assisted fault isolation, graceful degradation, and recovery.
Embed reliability, security, and governance into operational workflows and platforms, reducing complexity, human dependency, and operational risk.
Establish standards so that every operational signal has a defined automated or AI-assisted response path, not just a dashboard alert.

Automation Platforms & Workflow Engineering

Architect event-driven automation that spans detection, decisioning, and execution (e.g., health checks, then enrichment, then safe remediation).
Replace ticket-driven and manual runbooks with executable, testable automation and standardized patterns across incident response, change, and platform operations.
Ensure automation is resilient, observable, and auditable, with clear approval paths for higher-risk actions.

AI Driven & Agent Based Operations

Design and own internal AI-driven operational platforms that retrieve context, reason over signals, and invoke controlled actions across services.
Define guardrails, approvals, observability, and auditability for AI-initiated actions; integrate AI decisioning directly into workflows.
Enable agent coordination and capability discovery for safe execution in production.

Observability, Signal Processing, & Decision Systems

Evolve observability from dashboards to decision systems that feed automation.
Build signal pipelines that correlate metrics, logs, traces, and events to reduce noise and alert fatigue and to trigger context-aware remediation.
Leverage tools already in the environment (e.g., Dynatrace DQL, APIs, AIOps, and extensions; Zabbix; PagerDuty; Alertbot) to produce actionable context rather than standalone alerts.

Cloud & Platform Monitoring Enablement

Support and extend AWS monitoring (e.g., CloudWatch, ECS) with automation hooks and AI-assisted triage.
Align low-code enterprise automation (e.g., Power Platform) with code-first systems, preventing platform sprawl while accelerating safe, governed workflows.

Leadership & Technical Influence

Serve as the architectural authority for reliability, automation, and AI-driven operations; mentor senior engineers and raise organizational maturity.
Partner with application, middleware, infrastructure, security, and compliance teams to deliver scalable, safety-critical operational platforms.
Challenge designs that increase operational risk, toil, or manual dependency; champion automation-first solutions.

Required Qualifications

5+ years in SRE, Platform Engineering, DevOps, or Infrastructure Engineering supporting complex distributed systems.
Proven experience designing and operating automation-heavy platforms (event-driven workflows, orchestration, policy and guardrails).
Strong programming and automation skills (e.g., Python), plus experience with workflow orchestration and event-driven systems.
Practical experience integrating AI or intelligent decision systems into production operations (3–5 years of AI/ML experience preferred).
Deep understanding of failure modes, blast-radius management, and risk-aware automation.

Important: Candidates with observability-only backgrounds, lacking deep, hands-on automation and workflow engineering experience, will not be a fit.
Preferred Qualifications

Experience designing or implementing agent-based or AI-assisted operational systems; familiarity with modern AI platforms and model integration for operations use cases.
Experien…

