Senior Site Reliability Engineer (SRE) - NC, TX Job, Charlotte area, North Carolina, USA, IT/Tech

Position: Senior Site Reliability Engineer (SRE) - NC, TX

Job#: 3036969

Client: Financial Services

Team: TBA

Job Title: Systems Operations Engineer 4 / Senior Site Reliability Engineer (SRE)

Location: Charlotte, NC / Irving, TX – Hybrid (3 days onsite mandatory)

Contract Length: 18 months (potential extension; must be eligible for conversion)

Pay Rate: $60 - $65

Top Requirements

2+ years of hands‑on SRE experience with strong production support and reliability focus
Strong Kubernetes/OpenShift experience and Linux‑based systems expertise
Experience with observability tools (Grafana, Prometheus, Splunk, AppDynamics) and defining SLOs/SLIs

Plusses

Experience with automation/scripting (Python or Java)
Experience with AutoSys and ITSM tools (ServiceNow, Netcool)
Experience with AIOps, AI/ML, or RPA‑driven automation
Experience in financial services or regulated environments
Strong database knowledge (SQL, Oracle, MongoDB, etc.)

Job Summary

In this contingent resource assignment, you may:
Consult on complex initiatives with broad impact and large‑scale planning for Systems Operations Engineering.
Review and analyze complex, multi‑faceted, larger‑scale or longer‑term Systems Operations Engineering challenges that require in‑depth evaluation of multiple factors, including intangibles or unprecedented factors.
Contribute to the resolution of complex and multi‑faceted situations requiring a solid understanding of function, policies, procedures, and compliance requirements that meet deliverables.
Strategically collaborate and consult with client personnel.

Day‑to‑Day Responsibilities

Reliability Engineering & Operations

Design and implement solutions to improve system reliability, availability, and scalability
Support critical applications across domains including payments, regulatory operations, and financial crimes
Transition from reactive incident response to proactive reliability engineering and prevention

Automation & Toil Reduction

Build automation tools to reduce manual operational tasks
Implement self‑healing and AIOps solutions using AI/ML and RPA where applicable
Improve operational efficiency through scripting (Python/Java) and automation frameworks

Observability & Monitoring

Implement and enhance monitoring, alerting, and logging systems
Define and measure SLOs, SLIs, and error budgets across applications
Improve visibility and reduce alert noise using enterprise observability tools

Incident & Problem Management

Lead incident response, root cause analysis, and postmortems
Drive remediation strategies to eliminate recurring production issues
Partner with teams to improve response times and operational readiness

Platform & Infrastructure

Work with Kubernetes/OpenShift environments and cloud platforms (AWS, GCP, Azure, PCF)
Support microservices architectures, APIs, and messaging systems (Kafka, MQ)

Collaboration & Leadership

Champion SRE best practices across teams and mentor peers
Partner with engineering, product, and operations teams to align on system reliability goals
Identify gaps in processes and drive continuous improvement initiatives

EEO Statement

Apex Systems is an equal‑opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law.

Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law.
