Software Engineer - Platform Foundations
Listed on 2026-01-20
Software Development
Cloud Engineer - Software, DevOps, Software Engineer, Backend Developer
Harness is led by technologist and entrepreneur Jyoti Bansal, founder of AppDynamics (acquired by Cisco for $3.7B). The company has raised ~$240M in Series E venture funding, is valued at $5.5B, and is backed by top investors including Goldman Sachs, Menlo Ventures, IVP, Google Ventures, J.P. Morgan, Capital One Ventures, Citi Ventures, ServiceNow, Splunk Ventures and more. Harness is building the industry’s leading AI-powered software delivery platform, enabling teams worldwide to build, test, and deliver software faster, safer, and more reliably.
Writing code is only 30–40% of the engineering lifecycle — the rest involves testing, deployments, security, compliance, and optimization. Harness brings AI and automation to this outer loop, turning complex, time‑consuming workflows into streamlined processes at massive global scale.
The platform includes industry-leading products in CI/CD, Feature Flags, Cloud Cost Management, Service Reliability, Chaos Engineering, Software Engineering Insights, Internal Developer Experience, and API discovery, observability, governance, and runtime protection. Over the past year, Harness powered 128M deployments and 81M builds, protected 1.2T API calls, and optimized $1.9B in cloud spend, helping customers like United Airlines and Choice Hotels accelerate releases by up to 75% and achieve 10x DevOps efficiency.
With employees in over 25 countries, Harness is shaping the future of AI‑driven software delivery — and we’re looking for exceptional talent to help us move even faster.
We are looking for a Staff Software Engineer with deep expertise in distributed systems, microservices architecture, and cloud‑native backend engineering. This role requires strong operational excellence, ensuring system reliability, scalability, and observability while driving best practices in incident management, performance tuning, and automation.
Key Responsibilities
System Architecture & Scalability
- Design & Build: Architect and develop scalable, fault‑tolerant backend systems that handle millions of requests per second.
- Microservices Development: Implement microservices using Go, Java, or Python, ensuring high availability and resilience.
- Cloud & Kubernetes: Deploy and manage applications on AWS, GCP, or Azure with Kubernetes (EKS, GKE, AKS).
- Event‑Driven Architectures: Work with Kafka, Pulsar, RabbitMQ for distributed messaging and streaming workloads.
- Reliability & Resilience: Implement best practices for graceful degradation, retries, circuit breakers, and auto‑scaling.
- Incident Response & On‑Call Management: Define SLAs/SLIs/SLOs and set up robust alerting and escalation processes for incident handling.
- Postmortems & RCA (Root Cause Analysis): Lead post‑incident analysis, drive corrective actions, and improve system reliability.
- Observability & Monitoring: Define and implement logging, monitoring, and distributed tracing using Prometheus, OpenTelemetry, Grafana, and Datadog.
- Performance Tuning: Diagnose and optimize latency, throughput, and memory utilization for large‑scale distributed systems.
- Multithreading & Concurrency: Design and implement highly concurrent, multithreaded backend services for parallel processing.
- Database & Storage Optimization: Improve performance of SQL (PostgreSQL, MySQL) and NoSQL (Cassandra, DynamoDB, Redis, MongoDB) solutions.
- Security & Compliance: Implement API security, authentication, and authorization, and ensure compliance with SOC2, ISO 27001, and PCI DSS.
- Mentorship & Code Reviews: Guide engineers in best practices for platform engineering, microservices, and distributed systems.
- Cross‑Team Collaboration: Work with cloud engineering, security, and product engineering teams to align platform capabilities with business needs.
- 10‑14 years of experience in backend platform engineering, distributed systems, and microservices.
- Strong programming expertise in Go, Java, or Python, with a focus on multithreading and concurrency.
- Expertise in Kubernetes, service meshes (Istio, Linkerd), and cloud infrastructure.
- Deep understanding of gRPC, REST APIs, GraphQL,…