Cloud Native Engineer
Listed on 2026-02-16
Software Development
Cloud Engineer - Software, AI Engineer
Zettabyte delivers high-performance AI computing infrastructure to enterprises and entrepreneurs globally. We specialize in offering NVIDIA GPUs—such as H100, A100, and RTX series—through a proprietary software platform called Zsuite, which enables intelligent scheduling, resource optimization, and efficient management of distributed AI workloads. Drawing expertise from hyperscale cloud computing and renowned academic institutions, we address gaps in AI orchestration and reliability, helping customers efficiently train and deploy AI models.
The Zsuite platform provides robust orchestration and management capabilities for AI workloads, aiming to democratize access to advanced computing power. Zettabyte collaborates with leading technology companies and is a UN Global Marketplace member, supporting innovation with ethical standards and a global perspective. Our infrastructure supports on-demand cloud GPU instances, performance management, and tailored AI data center deployments, contributing toward advancing the AI ecosystem, notably in Asia.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
We're looking for a Cloud Native Engineer to build and optimize the microservices architecture powering our AI cloud platform. You'll design resilient, scalable systems using cutting‑edge cloud native technologies, ensuring our platform can handle massive AI workloads with reliability and performance.
As part of our cloud native engineering team, you'll work on Kubernetes‑based infrastructure, Go microservices, and container orchestration systems that serve as the backbone of our AI computing platform. You'll architect the systems that make AI compute seamless for thousands of developers and enterprises.
This is a unique opportunity for someone who's excited to work with the latest cloud native technologies, solve complex distributed systems challenges, and build infrastructure at scale in the rapidly growing AI space.
What You'll Do
- Design and develop microservices using Go that power our AI cloud platform's core functionality
- Build and maintain Kubernetes‑based infrastructure for container orchestration and workload management
- Implement and optimize cloud native solutions for scalability, reliability, and performance
- Contribute to code reviews, technical documentation, and knowledge sharing within the engineering team
- Explore and integrate emerging cloud native technologies like Volcano, Prometheus, and service mesh solutions
- Design distributed systems architecture for high‑availability AI workload processing
- Collaborate with DevOps and SRE teams to ensure production reliability and monitoring
- Leverage AI‑assisted coding tools (GitHub Copilot, ChatGPT, Cursor IDE, etc.) to boost productivity and code quality
Qualifications
- 7+ years of software engineering experience with a focus on distributed systems and cloud native technologies
- Strong proficiency in Go with deep understanding of concurrency patterns and standard libraries
- Familiarity with Python or other backend languages for polyglot development environments
- Hands‑on Kubernetes experience including development, deployment, and maintenance of production clusters
- Solid understanding of microservices architecture design patterns and implementation best practices
- Experience with containerization technologies (Docker, containerd) and container runtime optimization
- Problem‑solving mindset with ability to independently design and implement complex system components
- Strong collaboration and communication skills for working in cross‑functional teams
- Experience using AI‑assisted coding tools and willingness to integrate them into development workflow
- Familiarity with cloud native ecosystem tools such as Prometheus, Grafana, Volcano, or service mesh technologies
- Open source contributions to cloud native projects (Kubernetes, CNCF ecosystem)
- Experience with large‑scale Kubernetes cluster operations and troubleshooting in production environments
- Knowledge of microservices architecture patterns including circuit breakers, service discovery, and distributed tracing
We provide a competitive salary and equity based on your experience and skill set.
This is a hybrid role: 3 days in the office, 2 days WFH.
Applicants must be authorized to work in the United States without need for visa sponsorship.
You must be located in Palo Alto and be available to work in the office 3 days per week, per company policy.
Skills
- Programming Languages & AI‑Assisted Coding
- Containerization & Orchestration