Systems Engineer - Cloud Ops

Job in Memphis, Shelby County, Tennessee, 37544, USA
Listing for: AutoZone
Full Time position
Listed on 2026-06-06
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, SRE/Site Reliability, Data Engineer
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD Yearly
Job Description & How to Apply Below

As a Systems Engineer on the Cloud Operations team, you will be responsible for deploying, managing, and optimizing our cloud-based infrastructure on Google Cloud Platform (GCP). You will work with technologies such as Terraform, Kubernetes (GKE), GitOps/ArgoCD, CI/CD pipelines, and observability tools to ensure reliable, secure, and scalable platform operations.

You will also contribute to our AI/ML platform initiatives, supporting infrastructure for LLM-based applications and AI-powered automation tools that enhance developer productivity and operational efficiency.

You will collaborate with development teams, SREs, and platform architects to ensure seamless deployment and delivery of applications while maintaining the highest standards of reliability, security, and performance.

Cloud Infrastructure, Automation & Operations
  • Design, build, and maintain cloud infrastructure using Terraform to automate provisioning, scaling, and lifecycle management of resources on GCP
  • Develop and maintain CI/CD pipelines using GitLab CI to automate build, test, and deployment workflows. Implement and maintain GitOps practices using ArgoCD for declarative, version-controlled application deployment
  • Monitor system performance using observability tools (Dynatrace, Cloud Monitoring, Prometheus/Grafana) and troubleshoot production issues
  • Participate in on-call rotation to provide 24/7 support for critical infrastructure incidents
  • Perform root cause analysis on incidents and implement preventive measures. Document runbooks, architecture decisions, and operational procedures
Kubernetes Platform Management
  • Deploy, configure, and manage containerized applications on Google Kubernetes Engine (GKE), including GKE Autopilot and Standard clusters. Manage cluster lifecycle including upgrades, node pool configurations, and capacity planning
  • Troubleshoot pod failures, CrashLoopBackOff, OOMKilled events, and container resource issues
  • Configure and optimize resource requests/limits, Horizontal Pod Autoscaler (HPA), and Vertical Pod Autoscaler (VPA)
  • Manage Kubernetes networking including Services, Ingress controllers, Network Policies, and DNS configurations. Implement and manage service mesh (Istio) for traffic management, observability, and security
  • Manage secrets and configurations using Kubernetes Secrets, ConfigMaps, and external secret management tools. Implement pod security standards, RBAC policies, and workload identity configurations
AI/ML Platform & Automation
  • Support infrastructure for AI/ML workloads including LLM-based applications and model serving platforms
  • Deploy and manage AI-powered developer tools such as coding assistants (Claude Code, GitHub Copilot) and agentic AI systems. Explore and implement AI-assisted incident response and automated remediation workflows
  • Build and maintain infrastructure for Retrieval-Augmented Generation (RAG) pipelines and vector databases
  • Configure GPU-enabled node pools and optimize resource allocation for AI/ML workloads
  • Implement MCP (Model Context Protocol) servers and AI agent integrations for operational automation
  • Stay current with emerging AI technologies and evaluate their applicability for infrastructure automation
Kubernetes Expertise (Essential)
  • 3+ years hands‑on experience with Kubernetes in production environments
  • Deep understanding of Kubernetes architecture: API server, etcd, scheduler, controller manager, kubelet
  • Experience with GKE (Standard and Autopilot modes), including cluster creation, upgrades, and maintenance
  • Proficiency in troubleshooting workloads: analyzing pod logs, events, describe outputs, and container states
  • Strong understanding of resource management: requests, limits, QoS classes, and resource quotas
  • Experience with Kubernetes networking:
    Services (ClusterIP, NodePort, LoadBalancer), Ingress, Network Policies
  • Knowledge of Kubernetes storage:
    Persistent Volumes, Persistent Volume Claims, Storage Classes, dynamic provisioning
  • Experience with Helm charts for application packaging and deployment
  • Familiarity with Kubernetes security: RBAC, Pod Security Standards, Secrets management, Workload Identity
  • Understanding of Kubernetes observability: metrics-server, kubectl top, container resource monitoring
  • Experience debugging common issues:
    ImagePullBackOff, CrashLoopBackOff, OOMKilled, Evicted pods, pending pods
Cloud & Infrastructure
  • 3+ years of experience with Google Cloud Platform (GCP) services including GKE, Cloud Run, Cloud SQL, Memorystore, Pub/Sub, and Cloud Logging
  • Strong experience with Terraform for infrastructure as code (IaC)
  • Understanding of cloud networking: VPCs, subnets, firewall rules, Cloud NAT, Private Service Connect
CI/CD & GitOps
  • Proficiency with GitLab CI/CD pipelines
  • Experience with ArgoCD or similar GitOps tools
  • Understanding of Helm charts and Kustomize for Kubernetes manifest management
Observability & Troubleshooting
  • Experience with monitoring and APM tools (Dynatrace, Datadog, Prometheus, Grafana)
  • Ability to analyze logs, metrics, and traces to diagnose…