Data Engineer

Job in Toronto, Ontario, C6A, Canada
Listing for: Drive Capital
Full Time position
Listed on 2026-06-04
Job specializations:
  • IT/Tech
    Data Engineer, Machine Learning/ML Engineer, AI Engineer, Big Data
Salary/Wage Range or Industry Benchmark: 60000 - 80000 CAD Yearly
Job Description & How to Apply Below

About Us

The rules of discovery have changed. The search bar is becoming a conversation, and brands need a new playbook to win. That's where Yolando comes in. We are the command center for the AI era, helping marketers move from simple visibility to true velocity. Backed by $12M in funding (including from Drive Capital and MaRS Discovery District), our tight-knit team of 15 is building the engine that defines how brands get found, cited, and recommended by AI.

We aren't just building a roadmap; we're building the standard for Generative Engine Optimization.

Role Overview

We are seeking a skilled Data Engineer to build the backbone of our AI platforms, Yolando and Birdseye Post. You will design and maintain sophisticated ETL pipelines using Databricks and Spark, ensuring the reliable flow of data that powers our insights and ML models. You will implement Bronze-Silver-Gold medallion architectures and build event-driven flows to process streaming data for real-time analytics.

In this role, you will prepare datasets for LLM fine-tuning and drive the integration of third‑party sources to enable data‑driven decision‑making at scale.

Key Responsibilities
  • Build and Optimize Data Pipelines: Design, build, and maintain ETL pipelines using Databricks and Spark for processing customer data, campaign analytics, and AI model inputs. Implement Bronze-Silver-Gold medallion architectures for reliable data transformation.

  • Enable Real‑Time Data Processing: Build event‑driven data flows using GCP Pub/Sub and Protocol Buffers. Process streaming data for real‑time analytics, attribution tracking, and AI system inputs.

  • Power AI and ML Systems: Prepare and manage datasets for LLM fine‑tuning, embedding generation, and recommendation systems. Build pipelines that feed vector databases (pgvector) with processed embeddings for semantic search.

  • Integrate Third‑Party Data Sources: Build reliable ingestion pipelines for platforms like Klaviyo, Shopify, and marketing APIs. Handle incremental loads, schema evolution, and data quality validation.

  • Drive Analytics and Attribution: Implement attribution models, customer lifetime value (CLV) calculations, and campaign performance analytics. Build data models that power dashboards and enable data‑driven decision‑making.

  • Ensure Data Quality and Reliability: Implement data validation, monitoring, and alerting for pipeline health. Build idempotent, retry‑safe pipelines that handle failures gracefully.

What We're Looking For
  • 4+ years of data engineering experience.

  • Strong proficiency in Python and SQL for data transformation.

  • Production experience with Spark (PySpark) and distributed data processing.

  • Experience with cloud data platforms (Databricks, BigQuery, Snowflake, or similar).

  • Solid understanding of data modeling patterns (dimensional modeling, medallion architecture).

  • Experience with event streaming systems (Pub/Sub, Kafka, or similar).

  • Familiarity with GCP or other major cloud platforms.

  • Track record of building reliable, scalable pipelines in production.

Bonus if you have:
  • Experience with Databricks Asset Bundles or similar deployment frameworks.

  • Background in ML data pipelines: feature engineering, embedding generation, model serving data.

  • Familiarity with Protocol Buffers or other schema evolution tools.

  • Experience with vector databases and embedding workflows.

  • Background in marketing data: attribution, customer analytics, campaign tracking.

  • Experience with e‑commerce data sources (Shopify, Klaviyo, marketing platforms).

Our Stack
  • Data Processing: Databricks, Apache Spark, PySpark, dbt

  • Event Streaming: GCP Pub/Sub, Protocol Buffers

  • Storage: BigQuery, AlloyDB (PostgreSQL), Cloud Storage

  • ML/AI Data: pgvector, embedding pipelines, LLM training data

  • Infrastructure: GCP, Terraform, Kubernetes, GitHub Actions

  • Languages: Python 3.11, SQL

Why Join Us?
  • Join an innovative, fast‑growing startup building cutting‑edge AI marketing solutions.

  • Make a meaningful impact by shaping the platform's user experience, design identity, and overall success.

  • Dynamic environment with opportunities for real ownership, learning, and growth.

  • Competitive salary and support for professional development.

This is a hybrid role, with 4 days per week in our downtown Toronto office.
