Systems Engineer - Automation Job, Atlanta area, Georgia, USA, IT/Tech

Position: Systems Engineer - Automation 4872

Overview

Position: Site Reliability Engineer (SRE) - Infrastructure

Location: Atlanta, GA

Employment Type: Full-Time

Work Arrangement: Onsite Hybrid

The Site Reliability Engineer (SRE) will ensure the reliability, scalability, and performance of enterprise applications and services across cloud and on-premises environments. This role focuses on automation, monitoring, and incident response to minimize downtime and enhance operational efficiency. The position requires close collaboration with development, quality assurance, and operations teams to deliver secure and resilient systems.

What You Will Do

- Design, build, and maintain secure, compliant infrastructure using Infrastructure as Code tools such as Terraform and Ansible
- Automate provisioning and management of servers, storage, networks, Kubernetes clusters, and related systems across cloud and on-premises environments
- Develop tools and processes for automated deployment, configuration, monitoring, and alerting
- Collaborate with cross-functional teams to implement scalable and reliable cloud and data center solutions
- Participate in incident response, on-call rotations, and post-incident reviews to improve system resilience
- Monitor system performance and availability using service-level agreements (SLAs), objectives (SLOs), and indicators (SLIs); proactively troubleshoot and resolve reliability, performance, or security issues
- Create and maintain disaster recovery and business continuity plans for critical systems
- Continuously analyze and improve infrastructure efficiency, scalability, and performance
- Stay current with emerging technologies and recommend tools or practices to enhance platform capabilities
- Share technical expertise and mentor team members to strengthen internal capabilities

What We Are Looking For

Required Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience
- Proven experience as a Site Reliability Engineer or Systems Engineer
- Strong proficiency in Terraform and Ansible for infrastructure automation
- Hands-on experience with Kubernetes, Docker, or other container and orchestration tools
- Proficiency in scripting languages such as Python or Bash
- In-depth knowledge of Google Cloud Platform (GCP) services including compute, networking, storage, Kubernetes, and security
- Solid understanding of VMware virtualization and enterprise storage systems (e.g., Pure Storage)
- Experience with networking technologies including VLANs, VPNs, and routing protocols
- Strong grasp of IT infrastructure and operations principles, including systems integration and automation best practices
- Excellent communication and collaboration skills
- Ability to manage multiple priorities under pressure with strong problem-solving skills

Preferred Qualifications
- Relevant certifications such as ITIL, PMP, or CISSP
- Experience in regulated or enterprise environments

Core Competencies

- Communication and collaboration across technical and business teams
- Problem-solving and analytical thinking
- Ownership and accountability for system reliability
- Adaptability to emerging technologies and changing business needs
- Leadership and mentorship within technical teams
