Senior Site Reliability Engineer (SRE) - NC, TX
Listed on 2026-06-12
IT/Tech
SRE/Site Reliability, Systems Engineer, Cloud Computing
Job#: 3036969
Client: Financial Services
Team: TBA
Job Title: Systems Operations Engineer 4 / Senior Site Reliability Engineer (SRE)
Location: Charlotte, NC / Irving, TX – Hybrid (3 days onsite mandatory)
Contract Length: 18 months (potential extension; must be eligible for conversion)
Pay Rate: $60 - $65 per hour
Top Requirements:
- 2+ years of hands‑on SRE experience with a strong focus on production support and reliability
- Strong Kubernetes/OpenShift experience and Linux‑based systems expertise
- Experience with observability tools (Grafana, Prometheus, Splunk, AppDynamics) and with defining SLOs/SLIs
- Experience with automation/scripting (Python or Java)
- Experience with AutoSys and ITSM tools (ServiceNow, Netcool)
- Experience with AIOps, AI/ML, or RPA‑driven automation
- Experience in financial services or regulated environments
- Strong database knowledge (SQL, Oracle, MongoDB, etc.)
In this contingent resource assignment, you may:
Consult on complex initiatives with broad impact and on large‑scale planning for Systems Operations Engineering. Review and analyze complex, multi‑faceted, larger‑scale, or longer‑term Systems Operations Engineering challenges that require in‑depth evaluation of multiple factors, including intangible or unprecedented ones. Contribute to the resolution of complex, multi‑faceted situations that require a solid understanding of the function, its policies and procedures, and compliance requirements in order to meet deliverables.
Strategically collaborate and consult with client personnel.
- Design and implement solutions to improve system reliability, availability, and scalability
- Support critical applications across domains including payments, regulatory operations, and financial crimes
- Transition from reactive incident response to proactive reliability engineering and prevention
- Build automation tools to reduce manual operational tasks
- Implement self‑healing and AIOps solutions using AI/ML and RPA where applicable
- Improve operational efficiency through scripting (Python/Java) and automation frameworks
- Implement and enhance monitoring, alerting, and logging systems
- Define and measure SLOs, SLIs, and error budgets across applications
- Improve visibility and reduce alert noise using enterprise observability tools
- Lead incident response, root cause analysis, and postmortems
- Drive remediation strategies to eliminate recurring production issues
- Partner with teams to improve response times and operational readiness
- Work with Kubernetes/OpenShift environments and cloud platforms (AWS, GCP, Azure, PCF)
- Support microservices architectures, APIs, and messaging systems (Kafka, MQ)
- Champion SRE best practices across teams and mentor peers
- Partner with engineering, product, and operations teams to align on system reliability goals
- Identify gaps in processes and drive continuous improvement initiatives
Apex Systems is an equal‑opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law.
Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law.