Site Reliability Engineer (SRE) with Strong Middleware Expertise
Listed on 2026-01-05
IT/Tech
Cloud Computing, Systems Engineer
Site Reliability Engineer (SRE) with strong Middleware expertise
Location: Plano, TX (5 days onsite, 24x7 rotational coverage)
Shift: Rotational (Shift 1: 8 AM - 5 PM; Shift 2: 4 PM - 1 AM; Shift 3: 12 AM - 9 AM), including weekends based on the roster
Duration: Long Term
Job Description:
- Design, operate, and continuously improve highly available, secure, and scalable enterprise platforms
- Apply SRE principles including automation, observability, SLIs/SLOs, error budgets, and incident reduction
- Partner with application, infrastructure, security, and DevOps teams to ensure platform reliability
- Drive automation, standardization, and operational excellence
Key Responsibilities:
- Define, implement, and track SLIs, SLOs, and error budgets for middleware and platform services
- Drive MTTR reduction, availability improvements, and operational resilience
- Lead incident response, root cause analysis (RCA), and post-incident reviews
- Implement proactive monitoring and alerting to reduce noise and prevent outages
Middleware Platform Engineering:
- Administer and support enterprise middleware platforms, including:
  - Oracle WebLogic
  - Apache, NGINX
  - Java application servers and JVM-based services
- Perform patching, upgrades, configuration tuning, and capacity planning
- Manage certificates, keystores, trust stores, and TLS configurations
- Ensure platform security, compliance, and performance standards
Observability & Monitoring:
- Design and maintain end-to-end observability using tools such as:
  - Dynatrace
  - ELK / Kibana
  - Splunk (or equivalent)
- Build executive and operational dashboards for real-time health visibility
- Reduce alert fatigue through smart alerting, thresholds, and suppression
- Monitor JVM metrics, GC behavior, thread utilization, and API performance
- Develop automation and self-healing solutions using:
  - Shell scripting
  - Python
  - Ansible
  - Terraform or similar tools
- Automate routine operational tasks (restarts, validations, health checks)
- Enable CI/CD-friendly middleware deployments and configuration management
- Standardize environments across Dev, QA, and Prod
Cloud, Containers & Modern Platforms:
- Support middleware workloads on public or hybrid cloud environments (AWS, Azure, GCP)
- Integrate platform reliability into containerized and microservices architectures
- Collaborate with DevOps teams on deployment pipelines and release strategies
- Act as a reliability advisor to application and development teams
- Partner with Unix/Linux, Database, Network, and Security teams
- Provide mentoring, documentation, and best-practice guidance
- Participate in on-call rotations and production support leadership
Required Skills & Experience:
Technical Skills:
- 5+ years of experience in Middleware / Platform Operations / SRE
- Strong expertise in WebLogic, Java middleware, Apache, and NGINX
- Hands-on experience with observability platforms (Dynatrace, ELK, Splunk)
- Solid understanding of Linux/Unix systems and networking fundamentals
- Experience with API platforms (Apigee preferred)
- Strong automation and scripting skills (Shell, Python, Ansible, Terraform)
- Experience with Kubernetes/OpenShift and containerized workloads
SRE & Operational Excellence:
- Practical experience implementing SRE principles in production
- Strong troubleshooting skills (thread dumps, heap analysis, GC logs)
- Experience with incident management, RCA, and change management
- Ability to balance reliability with delivery velocity
Nice-to-Have:
- Experience with cloud-native architectures and service meshes
- Knowledge of IAM and security integrations (OAuth, SAML, mTLS)
- Exposure to CI/CD tools (Jenkins, GitHub Actions, GitLab CI)
Seniority level: Mid-Senior level
Employment type: Contract
Job function: Information Technology
Industries: Software Development
Referrals increase your chances of interviewing at Smart IT Frame LLC by 2x