Platform Operations Engineer / SRE (Colombia/Mexico)
Listed on 2026-02-12
IT/Tech
Systems Engineer, Cloud Computing, Network Engineer
Great software doesn’t happen on its own. It takes great people. That just happens to be our Forte. With 25 years of matching top engineering talent with preeminent and innovative brands, we seek individuals who are inquisitive, resourceful, and dedicated to their craft, driven to help companies build exceptional software. If this sounds like you, read on.
Position Summary:
We are looking for a Platform Operations Engineer to support and strengthen the operational foundation of our Backend organization while helping build the next-generation infrastructure for our platform. This role sits at the intersection of Engineering, SRE, DBA, and NOC, acting as the first line of support for platform reliability and performance, and contributing hands‑on to the construction and evolution of our backend infrastructure.
You will debug complex issues, improve performance, deepen observability, and help architect and implement a more resilient, scalable platform as we modernize our systems.
This is a high‑impact engineering role for someone comfortable working across boundaries, who understands real‑world system behavior, and who is motivated to improve both what exists today and what comes next.
Why This Role Matters:
Our platform is entering a new phase of scale and modernization. This role is critical to bridging day‑to‑day operational realities with the future state of our infrastructure. You will shape and support the systems that power the entire platform, ensuring that what we build is not only functional but also resilient, observable, and ready to meet the demands of a rapidly growing business.
What You’ll Do:
Operational Ownership
- Serve as the operational proxy for the Backend team in cross‑functional discussions with SRE, DBA, Infrastructure, and NOC teams.
- Participate in PRAC, CAB, and other operational forums, representing backend engineering needs and platform realities.
- Act as first responder for backend‑impacting issues, partnering with SRE and DBA teams to drive quick resolution.
Infrastructure Development
- Contribute directly to the design and buildout of new infrastructure and platform components.
- Partner with SRE and Infrastructure teams to define infrastructure requirements, deployment patterns, scaling strategies, and reliability expectations.
- Develop tooling, automation, and configuration patterns that support the new platform.
- Assist in migrating services, workloads, or configurations as part of platform modernization.
Platform Stability & Performance
- Debug production issues across logs, metrics, traces, configs, and data flows.
- Run API performance and load tests using JMeter and other load-testing tools, identify bottlenecks, and propose fixes.
- Build and refine monitors, alerts, dashboards, and investigative workflows.
- Implement tools and automation to reduce manual operational effort and improve visibility.
- Extend observability across new and existing services.
- Drive efficiency in deployments, debugging, and incident response.
Cross‑Functional Collaboration
- Work closely with backend developers to understand new features and how to instrument, test, and monitor them.
- Partner with DBAs on database behavior, schema changes, and query performance.
- Collaborate with SRE on reliability goals, capacity planning, and readiness criteria for new infrastructure components.
- Proactively identify weaknesses in system design, monitoring, deployment, or configuration, and implement improvements.
- Use time between incidents to strengthen resilience, optimize performance, and reduce future operational load.
Technology
- Java, Spring Boot / Spring Framework
- MySQL, Redis
- Kafka or RabbitMQ
- Git / GitHub
- JMeter, BlazeMeter, Postman
- Jira for task management
What You Bring:
- Experience working with large‑scale distributed backend systems and cloud‑native environments.
- Strong AWS knowledge and exposure to infrastructure patterns such as autoscaling, distributed caching, service meshes, load balancing, and cloud deployments.
- Ability to read and interpret Java or similar backend code during debugging.
- Solid foundation in relational databases and performance best practices.
- Experience building internal tools, automation, or operational workflows.
- Ability to…