DevOps Engineer; Mid-to-Senior Level
Listed on 2026-01-01
IT/Tech
Cloud Computing, Data Engineer
Location: New York
About the Role
We’re looking for a talented DevOps Engineer to join our remote team and help scale the sophisticated infrastructure behind Resonance ONE. As a DevOps Engineer at Resonance, you will play a critical role in designing, building, and maintaining a complex full-stack platform that underpins everything from digital design tools to e-commerce and manufacturing automation. Our stack spans a wide range of modern technologies – from machine learning services (OpenAI and other ML models) to a robust cloud backend (AWS infrastructure, AWS Lambda), data and analytics systems (Hasura GraphQL engine, Snowflake data warehouse, Looker BI), event streaming (Kafka), and orchestration tools (Kubernetes with Argo Workflows, plus integrations with tools like Airtable) – all working in concert to realize our mission.
In this role, you will ensure these diverse components work together securely and in harmony. You’ll have the opportunity to shape and implement scalable DevOps practices and systems from the ground up in a forward-thinking, AI-driven organization. You will collaborate closely with software engineers, data scientists, and product teams to continuously improve our development pipeline, deployment processes, and infrastructure automation.
This is a unique chance to tackle challenging problems in an architecture that pushes the boundaries of technology – all while enabling fashion brands to innovate without waste.
- Architect and Maintain Cloud Infrastructure:
Build, maintain, and scale our AWS cloud infrastructure using infrastructure-as-code and modern CI/CD pipelines (e.g. Argo Workflows). Ensure reliable, automated deployments of our applications and machine learning services across development, staging, and production environments.
- Container Orchestration:
Manage our Kubernetes clusters and containerized microservices, optimizing for high availability, security, and efficient resource usage. Continuously improve our cluster deployment, scaling strategies, and rollback processes to support a rapidly growing platform.
- CI/CD & Automation:
Design and implement continuous integration and delivery pipelines that empower our development team to ship code and ML model updates quickly and safely. Automate routine operations and workflows, reducing manual work through scripts, AWS Lambda functions, and other automation tools.
- Monitoring & Reliability:
Implement robust monitoring, logging, and alerting (using tools like Prometheus, CloudWatch, etc.) to proactively track system performance and reliability. Quickly troubleshoot and resolve infrastructure issues or bottlenecks across the stack to maintain high uptime and responsive services.
- Data & Pipeline Integration:
Work closely with our data engineering team to support a seamless flow of data through the platform. Maintain and optimize our event streaming and pipeline architecture (Kafka) and its integration with downstream systems like our Snowflake data warehouse and Looker analytics, ensuring data is delivered accurately and on time.
- AI/ML Infrastructure:
Collaborate with machine learning engineers to deploy and scale AI/ML models in production. Support the integration of OpenAI and other ML models into our applications, implementing the infrastructure (compute, storage, containers) needed for model training, inference, and monitoring model performance in a live environment.
- Tool Integration & Support:
Integrate and manage internal and third-party tools that extend our platform’s functionality – for example, maintaining our Hasura GraphQL engine that interfaces with databases, or automating workflows involving external services like Airtable. Ensure these tools are properly deployed, updated, and aligned with our security and compliance standards.
- DevOps Best Practices & Culture:
Champion DevOps best practices across the engineering organization. This includes improving our release processes (e.g. implementing GitOps workflows), optimizing build/test pipelines, and mentoring developers on using infrastructure tools. You will continually evaluate new technologies and processes to enhance deployment speed, reliability, and…