About Shelf
There is no AI Strategy without a Data Strategy. Getting GenAI to work is mission-critical for most companies, but 90% of AI projects haven't deployed. Why? Poor data quality—it’s the #1 obstacle companies face getting GenAI into production.
Shelf unlocks AI readiness. We provide the core infrastructure that enables GenAI to be deployed, and we help companies deliver more accurate GenAI answers by eliminating bad data in documents and files before they go into an LLM and create bad answers.
We’re partnered with Microsoft, Salesforce, Snowflake, Databricks, OpenAI and other leaders bringing GenAI to the enterprise. Our mission is to empower humanity with better answers everywhere.
About the role
The Platform Engineering team works across the stack to give product teams paved, secure, cost-efficient paths to build, ship, and run software with minimal cognitive load. We own the "how," so product teams can focus on the "what." You will join the team responsible for running the core infrastructure that supports Shelf products. This role is primarily based in our European offices in Wroclaw, Poland and Lviv, Ukraine.
We will prioritize candidates who are already in Wroclaw or are open to relocating there, as we believe in the value of in-person collaboration to foster strong relationships and seamless communication within our team.
In certain specific situations, we will also consider remote candidates based in one of the countries listed in this job posting. In any case, we ask all new hires to visit our office for the first week of their onboarding (accommodation and travel covered), and after that for at least 2 days per month or one week every 2 months.
You will develop reusable components, improve system performance, and create scalable abstractions that accelerate product development across the organization.
You will maintain high standards for reliability and security in your work and in the systems used by other teams.
This is a high-ownership, hands-on engineering role. You will manage everything from Terraform/OpenTofu modules and CI/CD pipelines to SSO permissions and observability tools, with a mandate to build infrastructure that works and keeps working.
You will work with AWS, Datadog, OpenTofu, Snowflake, GitHub, Azure, various LLMs, and many other tools and services.
In this role, you will
- Write and maintain infrastructure as code in OpenTofu, making modules more reusable and robust so that more engineers can ship infrastructure safely on their own.
- Write clear runbooks and playbooks that explain how things work and what to do when they break. You present your work in a clean, structured way, prefer writing a good doc once to enable self-serve, and treat every question as a signal to either document the answer or automate it so it does not need to be asked again.
- Care deeply about the health of our infrastructure by keeping databases, LLMs, and third-party self-hosted services on current, supported versions, standardizing them across environments, and actively hunting down and removing outdated components instead of tolerating an aging tech stack.
- Participate in on-call rotations and incident response, and write clear postmortems with concrete action items. You enjoy turning every incident into an opportunity to improve, define and refine SLOs and error budgets, and then follow through on the work that prevents repeats, tightens detection, speeds up response, and makes recovery cleaner.
- Treat CI/CD pipelines as a critical product. Own and improve hundreds of pipelines by making them faster, more reliable, easier to roll back, and more standardized so they reduce manual toil and mental overhead for developers.
- Become a Datadog and observability expert, tuning logging, metrics, tracing, dashboards, and alerts to squeeze out as much useful signal as possible. Build simple defaults, automation, and clear docs so developers can self-serve, contribute to observability, and rely on a solid platform rather than on ad hoc help from you.
- Make thoughtful build vs buy decisions and work directly with vendors and cloud support (AWS, Azure, GCP, and others) to solve infrastructure problems, plan upgrades, and find…