
Site Reliability Engineer / Platform Engineer

Job in Mountain View, Santa Clara County, California, 94039, USA
Listing for: Agile Fuel | World-class Dedicated Engineering Teams
Full Time position
Listed on 2025-12-06
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: USD 150,000 to 200,000 per year

Site Reliability Engineer / Platform Engineer

Our client is a fast-growing AI-driven technology company focused on building intelligent, automated solutions that transform how modern engineering teams work. They are committed to creating a development culture where speed, reliability, and data-driven decision-making are at the core. Their product leverages advanced analytics and AI to help organizations improve productivity, enhance visibility, and deliver software more efficiently.

They are seeking a hybrid Site Reliability Engineer / Platform Engineer with strong DevOps expertise and solid Python engineering skills. This person will design, build, and operate the next generation of their cloud infrastructure and internal developer platforms. The ideal candidate is passionate about automation, observability, reliability, and scalable system design. You will drive improvements across cloud architecture, CI/CD workflows, development tooling, and operational excellence, enabling the engineering organization to ship faster and more reliably.

If you thrive in a fast-moving, AI-native environment and enjoy building intelligent, highly automated platforms, this role is an excellent fit.

Responsibilities
  • Design, build, and maintain highly reliable, scalable Azure infrastructure using Container Apps, ACR, managed databases, serverless components, and other PaaS services;
  • Own and enhance CI/CD pipelines, deployment workflows, platform automation, and the full observability stack;
  • Develop Python-based tooling and infrastructure to support a scalable, reliable AI-driven platform;
  • Architect and maintain secure, fault-tolerant integrations with external systems (GitHub, Jira, Azure, Redis, Sentry, etc.);
  • Build and operate monitoring, logging, alerting, and SLO/SLA frameworks to ensure reliability and performance;
  • Partner with backend and data engineering teams to design a scalable infrastructure foundation for high-growth AI products;
  • Continuously optimize cost efficiency, reliability, and deployment velocity;
  • Scale AI infrastructure and support the transition to an AI-native engineering organization;
  • Drive an AI-native culture by leveraging LLM-powered workflows and automation for speed and efficiency.
Requirements
  • 5+ years in DevOps, SRE, Platform Engineering, or similar roles;
  • Expert-level understanding of cloud infrastructure, ideally Azure, including container services, serverless patterns, networking, and identity;
  • Strong Python software engineering ability — building platform tools, automation frameworks, or backend services;
  • Hands‑on experience with containerization, Docker, and cloud‑native operational patterns;
  • Strong understanding of external system integrations, how to design around them, and how to build abstractions that stay reliable when those systems fail;
  • Experience designing and operating production‑grade pipelines, monitoring, alerting, and observability tools;
  • Practical understanding of resilience engineering: retries, backoff, idempotency, state management, and failure modes;
  • A bias toward automation: if something can be automated, you automate it;
  • A startup mindset: ownership, speed, pragmatic decision‑making, and willingness to wear multiple hats;
  • Interest in and excitement about AI-native development workflows using tools like ChatGPT, GitHub Copilot, and automated pipeline orchestration;
  • Upper-Intermediate English level.
Bonus points for
  • Experience with Bicep, Terraform or other IaC tools;
  • Background supporting Python/Django or data pipelines;
  • Familiarity with Celery, distributed queues, or event‑driven systems;
  • Experience working in SOC2‑compliant or enterprise‑grade environments;
  • Experience building internal developer platforms (IDPs) or self‑service infrastructure.
Benefits
  • People‑oriented management without bureaucracy;
  • Flexible schedule (about 3 hours of overlap with ET);
  • 15 working days of annual paid vacation;
  • Paid sick leave;
  • Friendly and engaging professional team;
  • Opportunities for self‑realization, career, and professional growth.