SRE Engineer Job, Irving area, Texas, USA, IT/Tech

Overview

Prodapt is the largest specialized player in the Connectedness industry. As an AI-first strategic technology partner, Prodapt provides consulting, business reengineering, and managed services to the largest telecom and tech enterprises building the networks and digital experiences of tomorrow. A ServiceNow-invested company, Prodapt has been recognized by Gartner as a Large, Telecom-Native, Regional IT Service Provider.

We are seeking a skilled Site Reliability Engineer (SRE) with 3–5 years of experience to join our reliability and infrastructure team. The ideal candidate will have a strong background in systems engineering, cloud platforms, and automation, with a passion for building resilient, scalable, and observable systems. This role involves both hands-on engineering and collaboration with cross-functional teams to improve reliability and developer productivity.

Key Responsibilities

System Reliability & Operations
- Ensure high availability and performance of production systems.
- Participate in on-call rotations and incident response, driving root cause analysis and postmortems.
- Implement monitoring, alerting, and observability solutions to proactively detect issues.
Infrastructure & Automation
- Design, build, and maintain CI/CD pipelines for seamless deployments.
- Automate infrastructure provisioning and scaling using Infrastructure-as-Code (Terraform, Ansible, etc.).
- Manage containerized workloads with Docker and Kubernetes.
Performance & Scalability
- Conduct capacity planning, load testing, and performance tuning.
- Optimize system reliability through fault-tolerant design and distributed systems best practices.
- Collaborate with developers to improve application performance and resilience.
Security & Compliance
- Implement security best practices in infrastructure and operations.
- Ensure compliance with organizational and regulatory standards.
- Contribute to disaster recovery and business continuity planning.
Collaboration & Continuous Improvement
- Work closely with development teams to embed reliability into the software lifecycle.
- Document processes, runbooks, and operational standards.
- Contribute to a culture of continuous learning and improvement.

Required Skills & Qualifications

3–5 years of experience in SRE, DevOps, or systems engineering roles.
Strong knowledge of Linux/Unix systems and shell scripting.
Hands-on experience with cloud platforms (AWS, Azure, GCP).
Proficiency with container orchestration (Kubernetes) and CI/CD pipelines.
Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK, Datadog).
Solid understanding of networking fundamentals (TCP/IP, DNS, HTTP).
Strong problem-solving and troubleshooting skills.

Education & Certifications

Bachelor’s degree in Computer Science, Information Technology, or related field (or equivalent practical experience).
Certifications in cloud (AWS Certified Solutions Architect, GCP Professional Cloud Engineer, etc.) or Kubernetes (CKA/CKAD) are a plus.
