Senior Site Reliability Engineer
Job in
Irving, Dallas County, Texas, 75038, USA
Listed on 2026-06-04
Listing for:
Veloc Inc.
Full Time position
Job specializations:
- IT/Tech
SRE/Site Reliability, Cloud Computing, Systems Engineer, IT Support
Job Description & How to Apply Below
You will write and review automation code, contribute to architecture and deployment discussions, and collaborate closely with product engineering teams to ensure operational and reliability decisions are made correctly the first time.
Key Responsibilities

Reliability Engineering & Operations (~40% of role)
- Own day-to-day monitoring, alerting, operational health, and on-call support for mission-critical SaaS platforms and cloud infrastructure.
- Lead major incident response activities including escalation coordination, root cause analysis, and postmortem reviews.
- Design and maintain high-availability, failover, backup, and disaster recovery procedures; validate RTO/RPO targets regularly.
- Investigate and resolve production incidents end-to-end across infrastructure, platform, and application layers.

Automation & Platform Engineering (~30% of role)
- Design, implement, and maintain Infrastructure as Code (IaC), deployment automation, and CI/CD pipeline improvements.
- Develop tooling and automation to reduce operational toil and improve engineering productivity.
- Partner with development teams to improve deployment safety, release reliability, and operational scalability.
- Drive standardization of cloud infrastructure, operational engineering practices, and deployment governance.

Observability & Performance Optimization (~15% of role)
- Build and maintain monitoring, logging, tracing, and alerting capabilities across distributed systems.
- Establish service-level objectives (SLOs), service-level indicators (SLIs), and error budget policies.
- Identify and remediate performance bottlenecks, scaling issues, and infrastructure inefficiencies.
- Analyze operational telemetry and trends to improve reliability and capacity planning.

Security, Compliance & Architecture (~15% of role)
- Implement operational security best practices including RBAC, least privilege access, and infrastructure hardening.
- Ensure compliance with SOC 2, HIPAA, GDPR, and organizational security standards.
- Participate in architecture reviews and operational readiness assessments for new services and platforms.
- Mentor junior engineers on reliability engineering, cloud operations, automation, and incident management best practices.

Required Qualifications
- 7+ years of experience in Site Reliability Engineering, DevOps, Cloud Infrastructure, or Production Operations roles.
- Strong experience operating workloads in cloud environments such as Microsoft Azure, AWS, or Google Cloud.
- Hands-on experience with Kubernetes, Docker, CI/CD pipelines, and Infrastructure as Code tools.
- Strong scripting and automation skills using Python, Bash, PowerShell, Go, or similar languages.
- Experience with observability and monitoring platforms such as Datadog, Grafana, Prometheus, or Splunk.
- Strong understanding of networking, Linux/Windows administration, distributed systems, and cloud-native architectures.
- Experience with incident response, production troubleshooting, and operational governance.
- Strong communication skills and ability to collaborate across engineering and business teams.

Preferred Qualifications
- Experience supporting multi-tenant SaaS environments.
- Experience with Terraform, Bicep, ARM templates, or Ansible.
- Familiarity with GitOps and modern deployment strategies such as canary or blue/green deployments.
- Experience working within regulated or compliance-driven environments.
- Relevant cloud or Kubernetes certifications.

Position Requirements
10+ years of work experience