Data Scientist, Synthetic Data | Pune area, Maharashtra, India | IT/Tech

About Position:

We are seeking an experienced and highly skilled Data Scientist specialising in Generative AI and Synthetic Data Generation to design, develop, and deploy advanced data-driven solutions. This role focuses on building and optimising generative models capable of producing high-quality synthetic data that closely mimics real-world datasets across domains such as text, images, and structured data. The ideal candidate will play a critical role in leveraging machine learning and statistical techniques to enhance model performance, scalability, and reliability.

You will collaborate closely with AI researchers, engineers, and domain experts to drive innovation in generative AI systems, particularly in data-constrained or regulated environments.

Role:
Data Scientist, Synthetic Data

Location:

All Persistent Locations

Experience:

8 to 12 years
Job Type: Full-time employment

What You'll Do:

Synthetic Data Generation (Must-Have):
Design and develop machine learning models for synthetic data generation using techniques such as GANs, VAEs, diffusion models, and other deep generative approaches. Ensure generated data maintains statistical fidelity, diversity, and privacy compliance.
Data Collection & Preprocessing:
Identify, acquire, and curate relevant datasets. Perform data cleansing, transformation, and structuring to ensure high-quality inputs for training generative models.
Model Development & Training:
Build end-to-end data pipelines and workflows for training generative AI models using state-of-the-art architectures, including GANs, Variational Autoencoders, Normalizing Flows, and Diffusion Models.
Model Optimisation:
Perform hyperparameter tuning and experiment with architectures to improve model accuracy, stability, and output quality.
Data Augmentation:
Implement advanced data augmentation strategies to increase dataset size and diversity and to improve model generalisation.
Performance Evaluation:
Define and track evaluation metrics that allow objective assessment of generative model performance and synthetic data quality, and use them to drive improvements.
Bias Detection & Mitigation:
Analyse datasets and model outputs to identify biases and implement techniques to ensure fairness, robustness, and ethical AI practices.
Transfer Learning & Adaptation:
Apply transfer learning approaches to fine-tune pre-trained models for new domains and specific use cases.
Collaborative Development:
Work cross-functionally with AI researchers, software engineers, product teams, and domain SMEs to integrate generative AI and synthetic data solutions into production systems.
Documentation & Knowledge Sharing:
Maintain clear, comprehensive documentation of methodologies, experiments, and findings to ensure reproducibility and support knowledge sharing.

Expertise You'll Bring:

Master’s or Ph.D. in Computer Science, Data Science, Machine Learning, or a related discipline with a focus on AI or generative modelling.
Mandatory experience in synthetic data generation using machine learning or deep learning models.
Strong proficiency in Python and leading AI frameworks such as TensorFlow or PyTorch.
In-depth understanding of generative modelling techniques including GANs, VAEs, Diffusion Models, and Normalizing Flows.
10+ years of experience in data science, including large-scale data processing, feature engineering, and model training.
Strong foundation in statistics, probability, and quantitative analysis.
Preferred Skills:
Experience working with complex datasets such as biomedical, clinical, genomic, imaging, or omics data.
Familiarity with Natural Language Processing (NLP) and Computer Vision techniques in generative AI contexts.
Hands-on experience with privacy-preserving techniques (e.g., differential privacy, synthetic data validation).
Knowledge of Responsible AI principles, including fairness, transparency, explainability, and data privacy.
Experience working in regulated domains such as healthcare, life sciences, or finance.
Core Competencies:
Strong problem-solving and analytical thinking capabilities.
Ability to work independently as well as in cross-functional teams.
Excellent communication and stakeholder management skills, with the ability to explain complex concepts…