Data Scientist Indianapolis - IN - Indiana
Listed on 2026-06-12
IT/Tech
Data Analyst, Data Science Manager, Data Scientist, Data Engineering
We are seeking a Databricks Data Scientist with strong experience in Databricks Lakehouse, advanced analytics, and Genie (AI/BI) to design, build, and deploy scalable data science and AI solutions. This role will focus on transforming enterprise data into actionable insights using machine learning, natural language analytics, and self-service BI powered by Databricks Genie.
You will work closely with medical, commercial, and R&D teams across the pharma and life sciences industry to build intelligent solutions that drive scientific and business impact, from drug discovery to commercial analytics to patient outcomes.
Python, PySpark, Databricks, MLflow, Spark ML, Delta Lake
Genie (AI/BI), Unity Catalog, SQL, NLP, GenAI, HIPAA, GxP
Key Responsibilities
Data Science & Machine Learning
Design, develop, and deploy machine learning models using Databricks (MLflow, Spark ML, Python) for pharma and life sciences use cases
Implement end-to-end ML pipelines covering data ingestion, feature engineering, model training, deployment, and monitoring
Build predictive models for patient identification, HCP segmentation, market access analytics, pharmacovigilance, and safety signal detection
Apply NLP and generative AI techniques (LLMs, RAG pipelines) to extract insights from medical literature, clinical notes, and regulatory documents
Conduct A/B testing, model validation, and statistical analysis to evaluate model performance and business impact
Collaborate with data engineers to ensure reliable, high-quality, production-ready datasets in the Lakehouse
Databricks & Lakehouse Architecture
Leverage Databricks Lakehouse (Delta Lake, Unity Catalog) for scalable, governed, and high-performance analytics
Design and optimize Spark jobs for performance and cost efficiency across large-scale pharma datasets
Apply best practices for data governance, data lineage, and security within Unity Catalog
Build and maintain a Bronze/Silver/Gold medallion architecture for clinical, claims, and commercial data
Implement Delta Live Tables (DLT) pipelines with data quality checks for real-time and batch processing
Configure and manage Databricks Workflows, Repos, and cluster policies for production ML workloads
Genie (AI/BI) & Natural Language Analytics
Configure and enable Databricks Genie for self-service analytics across business and scientific teams
Design semantic layers and curated Gold datasets optimized for natural language queries via Genie
Define certified questions, trusted assets, and business glossary terms to improve Genie response quality
Partner with business stakeholders to translate complex pharma questions into Genie-enabled insights
Monitor and iterate on Genie Spaces based on user feedback, query accuracy, and adoption metrics
Enable non-technical users across Medical Affairs, Commercial, and R&D to self-serve data insights
Real-World & Clinical Data Analysis
Analyze real-world data (RWD), electronic health records (EHR), claims data, and clinical trial datasets to generate actionable insights
Build scalable data pipelines for pharma-specific sources including IQVIA, Symphony Health, Komodo, and specialty pharmacy data
Apply survival analysis, mixed models, and Bayesian methods for epidemiology and health economics (HEOR) studies
Ensure all models and data processes comply with HIPAA, GxP, and 21 CFR Part 11 regulations
Business Enablement & Stakeholder Collaboration
Work closely with product owners, analysts, and business leaders to identify and prioritize high-value data science use cases
Communicate complex analytical results and model outputs in a clear, business-friendly manner to non-technical audiences
Produce analytical documentation: model cards, design specs, performance reports, and executive summaries
Lead sprint ceremonies as analytics owner: architecture reviews, estimation sessions, and release planning
Required Qualifications
Experience:
4+ years of professional experience in data science or advanced analytics, preferably in pharma, biotech, or life sciences
Education:
Bachelor's or Master's degree in Data Science, Computer Science, Statistics, Engineering, or a related field
Databricks:
Hands-on experience with Databricks and Apache Spark for…