DevOps Engineer
Listed on 2026-06-03
IT/Tech
SRE/Site Reliability, Cloud Computing
We’re a growth‑stage technology company helping brands optimize content performance and apply AI to modern marketing. Our culture is fast‑paced, entrepreneurial, and highly adaptable—we move quickly, test ideas, and evolve with our customers.
Through the Knotch platform, brands can measure the impact of their content, identify what drives results, and continuously improve performance across channels. By combining performance data, strategic insights, and AI‑driven capabilities, we help marketing teams make smarter decisions and get more impact from every piece of content they produce.
About the Role
As we evolve into an AI‑native platform powered by agentic systems and large‑scale data pipelines, the reliability, scalability, and observability of our infrastructure become mission‑critical. We’re not just deploying services; we’re operating complex, production‑grade AI systems that enterprise clients depend on every day.
As a DevOps Engineer, you’ll take part in building and scaling the foundation that powers everything we ship, from our core platform to our AI agents. You’ll work across infrastructure, CI/CD, observability, and security to ensure our systems are fast, resilient, and cost‑efficient.
This role goes beyond “keeping the lights on”. You’ll help define how Knotch operates as an AI‑first company, shaping infrastructure strategy, enabling developer velocity, and ensuring our systems scale alongside rapid product innovation. If you want to own infrastructure in a high‑impact environment, work closely with engineering teams across the stack, and directly influence how production systems are built and operated, this is that role.
If you’ve made it this far, please include this code with your application: DEVOPS-ENG-2026.
- Design, build, and maintain scalable, secure, and highly available infrastructure across pre‑production and production environments.
- Develop and manage CI/CD pipelines to enable fast, reliable, and repeatable deployments across multiple environments.
- Own infrastructure as code (IaC) practices using tools like Terraform to ensure consistency and reproducibility.
- Manage environment lifecycle (development, staging, production), including promotion workflows and configuration management.
- Partner closely with Engineering, Data, and AI teams to support system performance, reliability, and scalability.
- Implement and maintain monitoring, logging, and alerting systems to ensure high visibility into system health and performance.
- Optimize infrastructure for cost, performance, and reliability, especially for compute‑ and data‑intensive AI workloads.
- Support Kubernetes‑based deployments and container orchestration for distributed systems.
- Contribute to security best practices across infrastructure, including IAM, networking, and application‑level protections.
- Create dashboards and reporting systems to provide visibility into system performance, uptime, and operational metrics.
- Document architecture, operational processes, and infrastructure decisions to support knowledge sharing and onboarding.
- Act as a DevOps/SRE partner across teams, helping troubleshoot issues and improve system reliability.
You have at least 5 years of experience in DevOps, Site Reliability Engineering, or Infrastructure Engineering roles within SaaS, PaaS, or cloud‑native environments.
Must Haves
- Prior experience in a growth‑stage and/or startup environment, scaling from $10M to $20M+ ARR with a lean team.
- Strong experience with Google Cloud Platform (GCP), including IAM, networking, and data services.
- Hands‑on experience with Infrastructure as Code tools such as Terraform.
- Experience building and maintaining CI/CD pipelines (GitHub Actions, ArgoCD, or similar).
- Solid experience with Kubernetes, Docker, and containerized environments.
- Familiarity with deployment tools such as Helm.
- Experience with monitoring and observability tools like Prometheus and Grafana.
- Strong understanding of system reliability, scalability, and performance optimization.
- Ability to work across multiple systems and priorities in a dynamic environment.
- Strong documentation and communication skills, with attention to…