Lead Data Architect

Job in Elizabeth, Union County, New Jersey, 07215, USA
Listing for: Karsun Solutions, LLC
Full Time position
Listed on 2026-06-18
Job specializations:
  • IT/Tech
    Data Engineering, AI Engineer (Applied/Software), Machine Learning/ ML Engineer
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD yearly

Summary

The Lead Data Architect will design, build, and operate enterprise data platforms that power GenAI and AI/ML use cases. This highly technical, hands‑on role is responsible for data platform architecture, end‑to‑end data engineering, ML/LLM pipeline design, production model onboarding, and delivery of scalable Databricks‑centric solutions across cloud environments.

What You’ll Be Doing
  • Architect and implement enterprise data platforms (batch + streaming) optimized for ML, LLMs, and GenAI workloads.
  • Lead design and hands‑on implementation of Databricks workspaces, Unity Catalog, Delta Lake design patterns, cluster policies, and performance tuning.
  • Build and own end‑to‑end data pipelines (ingest, transform, feature engineering, serving) using PySpark, Databricks Jobs, Spark SQL, Delta Lake, and orchestration tools.
  • Design and operationalize model training, fine‑tuning (LLM), evaluation, deployment, and monitoring pipelines (MLOps/RAG/CAG) integrating Databricks MLflow, CI/CD, and infra‑as‑code.
  • Implement vectorless and vectorization/embedding pipelines, vector store integrations, and retrieval layers for RAG (FAISS, Pinecone, Weaviate, Milvus).
  • Define data schemas, governance, lineage, access controls, and data product APIs; implement Unity Catalog or equivalent for centralized governance.
  • Drive cost/performance optimization for storage, compute (spot/preemptible), and query patterns.
  • Collaborate with engineers, data scientists, product owners, and security to translate business needs into production GenAI solutions.
  • Mentor and lead engineering teams; conduct architecture reviews, code reviews, and run technical deep dives.
  • Implement observability for data and ML pipelines (metrics, logging, data quality tests, alerting).
  • Create reproducible experiment tracking, model registry, and rollout strategies (canary, shadow testing, rollback).
  • Stay current on GenAI/LLM architectures and evaluate/introduce new tooling and frameworks.
Required Qualifications
  • BA or BS degree in CS, Computer Engineering, Information Technology or a related field.
  • 8+ years hands‑on experience in data engineering/platform architecture; 3+ years in an architect or lead role.
  • Candidate must hold an active AWS Certified Machine Learning – Specialty certification.
  • Proven, hands‑on Databricks experience (designing workspaces, Delta Lake, performance tuning, productionizing Spark jobs).
  • Deep Spark + PySpark expertise and experience with Databricks Runtime.
  • Strong experience building ML/LLM pipelines and operationalizing models (training, fine‑tuning, serving).
  • Practical experience with vector embeddings, semantic search, and RAG architectures.
  • Solid expertise in Python, common ML libraries (PyTorch, TensorFlow, Hugging Face Transformers), and MLflow.
  • Cloud platform experience (AWS strongly preferred).
  • Experience with containerization and orchestration, and with open‑source libraries for structured and unstructured data processing and for model serving/inference.
  • Strong SQL skills; experience with distributed query/warehouse systems and Parquet/Avro/Delta formats.
  • CI/CD and infra‑as‑code experience (Terraform, GitOps, Jenkins/GitHub Actions/GitLab CI).
  • Data governance, security, and IAM experience; experience implementing row/column level access controls and data lineage.
  • Demonstrated ability to design for scalability, reliability, and cost efficiency.
Preferred Qualifications
  • Prior experience with Databricks Unity Catalog, Photon, and Databricks SQL.
  • Experience integrating Databricks with vector databases (Pinecone, Neo4j) and retrieval frameworks (LangChain, LlamaIndex).
  • Familiarity with AWS Bedrock or other managed LLM services.
  • Experience with real‑time streaming (Kafka, Kinesis) and stream processing with Structured Streaming on Databricks.
  • Certifications: Databricks Certified Professional.
  • Experience with data quality and profiling tools (Great Expectations, Soda).
  • Experience with large‑scale ETL frameworks and tools (Airflow, Prefect).
Commitment to Non‑Discrimination

All qualified applicants will receive consideration for employment without regard to disability, status as a protected veteran or any other status protected by applicable federal, state, local,…
