
Sr DevOps Engineer

Job in Santa Clara, Santa Clara County, California, 95053, USA
Listing for: TechDigital Group
Full Time position
Listed on 2026-06-18
Job specializations:
  • IT/Tech
    SRE/Site Reliability, Cloud Computing: Infrastructure & Operations, AWS, IT Infrastructure
Salary/Wage Range or Industry Benchmark: 125000 - 150000 USD Yearly
Job Description & How to Apply Below

Job Summary

We are seeking a highly capable Senior DevOps Engineer / Platform Engineer to build, operationalize, and scale the infrastructure and deployment foundation for a strategic site-builder / network automation platform. This role will focus on creating reliable CI/CD pipelines, production‑grade Kubernetes deployment patterns, managed database services, observability, environment reproducibility, secrets management, and Infrastructure as Code across development, testing, staging, and production environments. This engineer will play a critical role in moving the platform from an early‑stage, partially manual operating model to a repeatable, supportable, and production‑ready DevOps model. The environment includes Kubernetes‑hosted services, AWS managed services, workflow orchestration with Temporal, integration with Nautobot, Argo‑based promotion flows, and the supporting tooling required for debugging, snapshotting, local development, and production support.

This is a hands‑on engineering role for someone who can design the right platform patterns, implement them directly, and establish a durable operating model between development and DevOps teams.

Key Responsibilities

Platform Deployment & CI/CD
  • Design, implement, and maintain CI/CD pipelines for testing, staging, and production environments.
  • Build and maintain deployment workflows that support safe and seamless promotion across environments.
  • Improve and maintain Argo‑based deployment workflows to enable controlled release progression from test to staging to production.
  • Establish baseline deployment mechanisms for the site‑builder application and related services.
  • Standardize Kubernetes application packaging and deployment patterns, with a strong preference toward Helm‑based lifecycle management for complex services and third‑party components.
  • Migrate existing deployments to Helm charts where appropriate.
Kubernetes & Runtime Platform Engineering
  • Support the deployment and ongoing operation of services running in Kubernetes.
  • Improve runtime reliability, resiliency, and troubleshooting for distributed services operating inside shared Kubernetes clusters.
  • Investigate and harden service‑to‑service connectivity patterns, especially for workflow components such as workers connecting to the Temporal engine.
  • Partner with development teams to define production‑grade runtime requirements, resource sizing, restart policies, and platform support boundaries.
Infrastructure as Code & Cloud Services
  • Design and implement fully declarative Infrastructure as Code for managed cloud services, especially in AWS.
  • Provision and maintain managed data services such as RDS/PostgreSQL and MongoDB‑compatible document databases across all environments.
  • Eliminate manual infrastructure setup where possible and replace it with reproducible, version‑controlled deployment patterns.
  • Prepare the platform for future scale across multiple environments and regions through repeatable IaC and GitOps‑aligned practices.
Data Services, Snapshots & Developer Enablement
  • Set up and maintain RDS, MongoDB, Redis/cache services, and related dependencies for all environments.
  • Build tooling and operational processes for:
    • production and staging database snapshots,
    • restoring snapshots into development environments,
    • enabling local debugging and development from realistic data states.
  • Support creation of local and development environments, including Minikube‑based environment‑as‑code approaches that mirror production behavior as closely as practical.
  • Improve platform reproducibility so engineers can quickly stand up close‑to‑production development environments.
Workflow Orchestration & Temporal Support
  • Lead the setup, deployment, and operational support of Temporal for workflow orchestration.
  • Support production operations for Temporal, including troubleshooting performance issues, restarts, scaling concerns, and resource shortages.
  • Establish maintainable deployment patterns for Temporal using supported packaging and lifecycle management approaches.
  • Partner with engineering teams to ensure workflow platform reliability and upgradeability over time.
Observability, Reliability & Incident Readiness
  • Design and…