DC Technician - Sr
Listed on 2026-06-28
IT/Tech
Systems Engineer, Cloud Computing: Infrastructure & Operations, SRE/Site Reliability
Production Support And Site Reliability Engineer
This is a day-shift position. Potential shifts are Sunday-Tuesday plus every other Saturday, 6am-6pm EST, or Wednesday-Friday plus every other Saturday, 6am-6pm EST.
Location:
Richmond, VA or Plano, TX
Contract to Hire Opportunity!
Apex Systems is working alongside one of our clients in the search for a Production Support and Site Reliability Engineer! If you are interested, please apply below!
For this position, Software and Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to manage and support large-scale, massively distributed, fault-tolerant systems hosted in the external cloud environment. SRE engineers ensure that financial services systems, both internally critical and externally visible, have reliability and uptime appropriate to customers’ needs and a fast rate of improvement, while keeping an ever-watchful eye on availability, capacity, and performance in a 24/7 environment.
SRE engineers will be engaged in automation and development work: reducing toil, developing self-service capabilities, automating manual tasks, and developing support tools using common scripting languages (e.g., Python). When incidents do occur, the SRE engineer is responsible for taking ownership and for consulting, engaging, and partnering with Lines of Business to lead the team toward successful resolution, as well as for conducting problem management activities and implementing future prevention and automation strategies.
Basic Qualifications:
- Bachelor's Degree
- At least 3 years of experience designing, analyzing, and troubleshooting large-scale distributed systems (e.g., end-to-end troubleshooting of 3-tier applications).
- At least 2 years of experience with Unix/Linux operating systems.
- At least 2 years of infrastructure experience with networks, load balancers, firewalls, and web application firewalls (WAF).
- At least 2 years of experience using scripting language(s) to debug and optimize code and to automate routine tasks.
- At least 2 years of experience using and supporting external cloud environments (e.g., troubleshooting cloud-hosted microservice failures).
- At least 2 years of experience with enterprise monitoring and observability solutions (Splunk, Datadog, PagerDuty, or New Relic).
- Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.
Preferred Qualifications:
- Master's Degree
- 3+ years of software development
- 3+ years of automation experience
- 3+ years of DevOps and/or SRE experience
Required Skills:
Monitoring tools (Splunk, New Relic, etc.)