
Site Reliability Engineer

Remote / Online - Candidates ideally in
Orlando, Orange County, Florida, 32885, USA
Listing for: Optomi
Remote/Work from Home position
Listed on 2026-06-20
Job specializations:
  • IT/Tech
    SRE/Site Reliability, Cloud Computing: Infrastructure & Operations, Systems Engineer, IT Infrastructure
Salary/Wage Range or Industry Benchmark: 100000 - 125000 USD Yearly
Job Description & How to Apply Below

Site Reliability Engineer (Hybrid — Orlando, FL | Burbank, CA | Seattle, WA)

Optomi, in partnership with a leading entertainment organization, is seeking a Site Reliability Engineer to join their Generative AI Engineering team. This engineer will play a critical role in shaping the reliability, scalability, and operational excellence of cloud infrastructure supporting enterprise AI and conversational experience platforms. The ideal candidate will bring deep expertise in cloud infrastructure, Kubernetes, Terraform, and DevOps practices while serving as a technical leader across complex multi‑cloud environments.

This is a highly visible role responsible for ensuring platform stability, driving automation initiatives, and supporting mission‑critical AI applications across GCP, AWS, and Azure. This position is open to candidates located in Orlando, FL, Burbank, CA, or Seattle, WA, and follows a hybrid schedule requiring onsite presence four days per week with remote work on Fridays.

Location:

Orlando, FL | Burbank, CA | Seattle, WA (Hybrid — Onsite Monday through Thursday, Remote Friday)

What the right candidate will enjoy:
  • Leading and mentoring a team of Site Reliability Engineers and DevOps specialists
  • Working with cutting‑edge technologies in multi‑cloud environments
  • Playing a pivotal role in the reliability strategy of generative AI platforms
What type of experience does the right candidate have:
  • 7+ years of Site Reliability Engineering, DevOps, Platform Engineering, or Infrastructure Engineering experience
  • Expert‑level experience with Kubernetes administration, operations, cluster scaling, and Helm-based configuration management
  • Advanced experience building and managing Infrastructure‑as‑Code using Terraform
  • Strong experience implementing and supporting CI/CD pipelines using Harness or similar deployment orchestration platforms
  • Hands‑on experience managing cloud infrastructure across GCP, AWS, and Azure, with strong expertise in GCP environments
  • Strong scripting and automation experience using Python, Bash, and YAML
  • Experience supporting backend infrastructure technologies including PostgreSQL, Redis, Kafka, MongoDB, and HashiCorp Vault
  • Deep understanding of observability, monitoring, alerting, logging, and system reliability best practices
  • Strong troubleshooting, root cause analysis, and incident response experience in complex production environments
  • Experience implementing security, identity management, compliance controls, and cloud governance standards
  • Demonstrated leadership experience within Agile/Scrum environments
  • Excellent written communication and cross‑functional collaboration skills
What the responsibilities of the right candidate are:
  • Architect, design, and maintain highly available cloud infrastructure supporting Generative AI and conversational experience platforms
  • Develop and maintain Kubernetes environments, Helm charts, and Terraform modules to support scalable platform operations
  • Design and implement automated deployment processes and CI/CD pipelines utilizing Harness and Infrastructure-as-Code principles
  • Manage and support multi‑cloud infrastructure environments across GCP, AWS, and Azure
  • Ensure platform reliability and availability while maintaining aggressive uptime and service-level objectives
  • Implement observability, monitoring, alerting, logging, and tracing solutions across infrastructure and application environments
  • Support and maintain critical backend services including Kafka, PostgreSQL, Redis, MongoDB, Vault, and other platform dependencies
  • Lead incident response efforts, root cause investigations, and post-incident remediation activities
  • Establish operational processes including capacity planning, patch management, backups, disaster recovery, and infrastructure lifecycle management
  • Collaborate with architects, engineering leadership, and development teams to drive platform improvements and reliability initiatives
  • Evaluate and implement automation opportunities, including AI-driven operational tooling and proactive monitoring solutions
  • Mentor engineers and serve as a technical leader for infrastructure, reliability, and operational best practices
Preferred Qualifications:
  • Previous experience leading large-scale cloud infrastructure or platform engineering initiatives
  • Experience supporting AI, machine learning, or Generative AI platforms
  • Experience with open-source infrastructure and orchestration technologies such as Apache Airflow
  • Exposure to AI-assisted operations, predictive monitoring, or AIOps tooling
  • Experience operating within hybrid cloud environments and enterprise-scale deployment ecosystems