Data Engineer - Warsaw
Listed on 2026-01-12
IT/Tech
About the Role
We are looking for a Data Engineer to design and maintain scalable data solutions that power advanced analytics and AI-driven insights. This role combines expertise in big data engineering and web scraping, enabling you to work on high-impact projects involving large, complex, and unstructured datasets.
You will architect data pipelines, enforce governance standards, and build tools for extracting and processing data from diverse sources, including websites and external vendors. If you thrive in solving complex data challenges and want to work with cutting-edge technologies, this is the role for you.
What You’ll Do
- Architect, develop, and maintain high-throughput data pipelines in Databricks and AWS (Glue, EMR, Fargate, Step Functions).
- Ingest, normalize, and enrich large volumes of structured and unstructured data, including internal data, market data, vendor feeds, and alternative sources.
- Collaborate with AI Engineers, ML scientists, and software teams to translate requirements into scalable data architectures, schemas, and APIs.
- Optimize pipeline performance and cost using distributed processing techniques (Spark, Delta, Arrow) and AWS best practices (spot fleets, autoscaling).
- Enforce data governance, privacy, and lineage standards, cataloguing assets in Unity Catalog and managing PII/PCI classification.
- Build automated validation, testing, and monitoring frameworks to ensure data quality and freshness for both offline and online workloads.
- Support onboarding and integration of new external data vendors, ensuring compliance and rapid time-to-value.
- Continuously evaluate emerging GenAI tooling (vector stores, LLMOps platforms, synthetic-data generators) and drive proof-of-concepts.
Web Scraping Focus
- Own the creation of tools and workflows for web crawling and scraping using compliance-approved technologies.
- Test and validate scraped data for accuracy, quality, and compliance.
- Identify and resolve issues with scrapes and scale processes as needed.
What You Bring
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in data engineering or a related role.
- Strong experience with Python and SQL.
- Experience with Spark or Scala and distributed data processing.
- Proficiency in building scalable, distributed data pipelines in a cloud environment.
- Familiarity with Linux/UNIX, HTTP, HTML, JavaScript, and networking concepts.
- Knowledge of web scraping tools and libraries (e.g., Requests, Beautiful Soup, Scrapy, Pandas, Selenium, Spark).
- Working knowledge of version control systems and open-source practices.
- Solid understanding of data architecture principles, data modeling, and data warehousing.
- Excellent analytical and problem-solving skills.
- Strong communication skills in English (written and spoken).
- Commitment to the highest ethical standards.
Nice to Have
- Experience extracting text from PDFs, images, and applications.
- Familiarity with system monitoring/administration tools.
- Knowledge of graph databases.
- Prior experience analysing big data sets.
Vantage Point Global is fully committed to being an Equal Opportunities, inclusive employer. We are passionate about attracting diverse talent, and welcome applications regardless of ethnicity, culture, age, gender, nationality, religion, disability, or sexual orientation.
Things you need to know:
• To apply, you’ll need to provide us with a CV and answer a few initial questions.
• Please note that if you have not heard back from us within three weeks of the date of application, we will not be progressing your application.