Lead Data Architect
Job in Oklahoma City, Oklahoma County, Oklahoma, 73116, USA
Listed on 2026-06-18
Listing for: Karsun Solutions, LLC
Full Time position
Job specializations:
- IT/Tech
- Data Engineering, AI Engineer (Applied/Software), Machine Learning/ML Engineer
Job Description & How to Apply Below
Summary
The Lead Data Architect will design, build, and operate enterprise data platforms that power GenAI and AI/ML use cases. This highly technical, hands‑on role is responsible for data platform architecture, end‑to‑end data engineering, ML/LLM pipeline design, production model onboarding, and delivery of scalable Databricks‑centric solutions across cloud environments.
What You’ll Be Doing
- Architect and implement enterprise data platforms (batch + streaming) optimized for ML, LLMs, and GenAI workloads.
- Lead design and hands‑on implementation of Databricks workspaces, Unity Catalog, Delta Lake design patterns, cluster policies, and performance tuning.
- Build and own end‑to‑end data pipelines (ingest, transform, feature engineering, serving) using PySpark, Databricks Jobs, Spark SQL, Delta Lake, and orchestration tools.
- Design and operationalize model training, fine‑tuning (LLM), evaluation, deployment, and monitoring pipelines (MLOps/RAG/CAG) integrating Databricks MLflow, CI/CD, and infra‑as‑code.
- Implement vectorless and vectorization/embedding pipelines, vector store integrations, and retrieval layers for RAG (FAISS, Pinecone, Weaviate, Milvus).
- Define data schemas, governance, lineage, access controls, and data product APIs; implement Unity Catalog or equivalent for centralized governance.
- Drive cost/performance optimization for storage, compute (spot/preemptible), and query patterns.
- Collaborate with engineers, data scientists, product owners, and security to translate business needs into production GenAI solutions.
- Mentor and lead engineering teams; conduct architecture and code reviews and run technical deep dives.
- Implement observability for data and ML pipelines (metrics, logging, data quality tests, alerting).
- Create reproducible experiment tracking, model registry, and rollout strategies (canary, shadow testing, rollback).
- Stay current on GenAI/LLM architectures and evaluate/introduce new tooling and frameworks.
Qualifications
- BA or BS degree in CS, Computer Engineering, Information Technology or a related field.
- 8+ years hands‑on experience in data engineering/platform architecture; 3+ years in an architect or lead role.
- Candidate must hold an active AWS Certified Machine Learning – Specialty certification.
- Proven, hands‑on Databricks experience (designing workspaces, Delta Lake, performance tuning, productionizing Spark jobs).
- Deep Spark + PySpark expertise and experience with Databricks Runtime.
- Strong experience building ML/LLM pipelines and operationalizing models (training, fine‑tuning, serving).
- Practical experience with vector embeddings, semantic search, and RAG architectures.
- Solid expertise in Python, common ML libraries (PyTorch, TensorFlow, Hugging Face Transformers), and MLflow.
- Cloud platform experience (AWS strongly preferred).
- Experience with containerization and orchestration, and with open‑source libraries for structured and unstructured data processing and model serving/inference.
- Strong SQL skills; experience with distributed query/warehouse systems and Parquet/Avro/Delta formats.
- CI/CD and infra‑as‑code experience (Terraform, GitOps, Jenkins/GitHub Actions/GitLab CI).
- Data governance, security, and IAM experience; experience implementing row/column level access controls and data lineage.
- Demonstrated ability to design for scalability, reliability, and cost efficiency.
Preferred Qualifications
- Prior experience with Databricks Unity Catalog, Photon, and Databricks SQL.
- Experience integrating Databricks with vector databases (Pinecone, Neo4j) and retrieval frameworks (LangChain, LlamaIndex).
- Familiarity with AWS Bedrock or other managed LLM services.
- Experience with real‑time streaming (Kafka, Kinesis) and stream processing with Databricks Structured Streaming.
- Certifications: Databricks Certified Professional.
- Experience with data quality and profiling tools (Great Expectations, Soda).
- Experience with large‑scale ETL frameworks and tools (Airflow, Prefect).
All qualified applicants will receive consideration for employment without regard to disability, status as a protected veteran or any other status protected by applicable federal, state, local,…