Senior DevOps Engineer

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Gridware Technologies Inc.
Full Time position
Listed on 2026-06-04
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: 190,000 - 215,000 USD yearly
Job Description & How to Apply Below

About Gridware

Gridware is a San Francisco-based technology company dedicated to protecting and enhancing the electrical grid. We pioneered a groundbreaking new class of grid management called active grid response (AGR), focused on monitoring the electrical, physical, and environmental aspects of the grid that affect reliability and safety. Gridware’s advanced Active Grid Response platform uses high‑precision sensors to detect potential issues early, enabling proactive maintenance and fault mitigation.

This comprehensive approach helps improve safety, reduce outages, and ensure the grid operates efficiently. The company is backed by climate‑tech and Silicon Valley investors. For more information, visit www.gridware.io.

Role Description

We’re scaling the deployment of critical infrastructure monitoring devices to detect real‑world fault events that lead to wildfires. The platform you’ll build and operate ingests millions of events per day from devices in the field, powers customer‑facing dashboards and alerting, and supports the data science work that turns raw signals into grid intelligence.

You will own AWS infrastructure, Kubernetes (EKS), CI/CD, and observability end‑to‑end, partnering with our Cloud Security team to keep the platform safe and compliant, and with backend, firmware, and data teams to keep them shipping fast. As an early member of the DevOps team, you’ll have a direct hand in shaping how Gridware builds, deploys, and runs production systems for years to come.

Responsibilities
  • Design, build, and maintain scalable, secure, and highly available infrastructure on AWS (EKS, EC2, RDS / Aurora Postgres, MSK, S3, VPC, IAM).
  • Manage and optimize Kubernetes clusters (EKS) across multiple environments, and deploy applications using Argo CD with GitOps best practices.
  • Implement and maintain CI/CD pipelines using GitHub Actions, including reusable workflows, build/push/scan flows for ECR, and frontend deployment pipelines.
  • Operate and tune Kafka‑based event streaming on Amazon MSK for high‑throughput, low‑latency device data pipelines.
  • Define and manage Infrastructure as Code with Terraform and Terragrunt, with reusable modules, sensible environment separation, and review‑friendly plans.
  • Manage identity and access across platforms with Auth0 / Entra ID integrations, IAM roles for service accounts (IRSA), and short‑lived credentials.
  • Build and maintain observability with Grafana, Loki, Prometheus / Mimir, and related tooling so on‑call engineers can quickly find and fix issues.
  • Monitor and optimize infrastructure cost across environments, partnering with engineering teams on right‑sizing, capacity planning, and waste reduction.
  • Partner with our Cloud Security team to enforce security standards, integrate with SIEM tooling, and respond to vulnerabilities and incidents.
  • Debug complex production issues across infrastructure, deployment, and networking layers, and turn the lessons learned into automation and runbooks.
Required Skills
  • 5+ years in DevOps, SRE, or Platform Engineering with production experience operating AWS infrastructure.
  • Deep hands‑on experience administering Kubernetes (EKS or equivalent) and deploying via GitOps (Argo CD or Flux).
  • Proficiency with Infrastructure as Code using Terraform; comfort with Terragrunt or a similar wrapper.
  • Hands‑on experience designing and maintaining CI/CD pipelines, preferably with GitHub Actions and reusable workflows.
  • Production experience operating distributed systems such as Kafka (MSK).
  • Strong understanding of networking, DNS, TLS, and security best practices, including IdP‑driven access control (Auth0, Entra ID, or similar).
  • Solid experience with monitoring and logging stacks such as Grafana, Loki, Prometheus, Mimir, or equivalents.
  • Ability to debug complex production issues across infrastructure, deployment, and networking layers.
  • Comfortable working in Linux environments with strong scripting skills (Python or Bash preferred for automation).
  • Knowledge of version control workflows, automated testing, and release management.
Bonus Skills
  • Experience operating Apollo Router / GraphQL federation gateways in production.
  • Experience operating Argo Workflows or similar…
Position Requirements
10+ Years work experience