Location: Bengaluru
Job Title: Data Engineer
Role Overview
We are seeking a detail-oriented Data Engineer with strong expertise in Python, Natural Language Processing (NLP), and LLM-assisted analytics to support large-scale contract text analysis. This role focuses on structured extraction, pattern discovery, and analytics-driven insights from redlined legal documents.
This is a data and analytics role — not a legal drafting or negotiation function.
Essential Functions
Redline Pattern Extraction & Variant Discovery
Analyze historical tracked changes and document markups at scale to identify recurring edit patterns, counterparty tendencies, and structured clause-variation clusters within legal text.
Group similar edits into normalized variant sets without determining legal fallback positions or acceptability.
Clause Change Clustering & Exception Mapping
Classify edits by frequency, type (add / delete / narrow / broaden), and document location.
Surface anomalies and outliers for review by legal SMEs (no interpretation required).
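As an illustration of the kind of edit classification described above, here is a minimal Python sketch using the standard-library difflib module. The function names classify_edits and edit_type_frequencies are hypothetical, and the sketch only covers the mechanical add / delete / replace labels from a word-level diff. Deciding whether an edit narrows or broadens a clause would need additional semantic tagging and SME review, which this sketch does not attempt.

```python
from collections import Counter
from difflib import SequenceMatcher


def classify_edits(previous: str, revised: str) -> list[tuple[str, str, str]]:
    """Return (edit_type, old_text, new_text) for each changed word span."""
    old_words = previous.split()
    new_words = revised.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    # Map difflib opcode tags onto the edit-type labels used in this sketch.
    labels = {"insert": "add", "delete": "delete", "replace": "replace"}
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged text is not an edit
        edits.append(
            (labels[tag], " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
        )
    return edits


def edit_type_frequencies(pairs) -> Counter:
    """Count edit types across many (previous, revised) clause pairs."""
    counts = Counter()
    for previous, revised in pairs:
        counts.update(label for label, _, _ in classify_edits(previous, revised))
    return counts
```

Running edit_type_frequencies over a batch of clause pairs gives a simple frequency table by edit type; rare combinations in that table are candidates to surface as outliers for SME review.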
LLM-Assisted Summarization & Tagging
Leverage AI tools and LLM frameworks to accelerate classification of edit intent and thematic categorization, with structured QC oversight.
Generate structured outputs to support SME review (no drafting or position-setting responsibilities).
Insight Packaging for SMEs & Playbook Authors
Produce clean variant summaries, clause drift reports, and exception trend snapshots.
Deliver contract-type-specific insight packs (e.g., across 200–500 agreements) highlighting recurring edits, variant clusters, and trend deviations.
Present structured findings to legal stakeholders for final determination.
Scalable Text Extraction & Data Normalization
Develop and operate Python-based scripts and approved AI services for large-scale text extraction, normalization, and clustering.
Extract and compare previous vs. revised language across batches of Word documents.
Treat redlines as structured text elements rather than legal judgments.
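To illustrate what treating redlines as structured text elements can look like in practice, here is a minimal Python sketch that reads tracked changes directly from a Word file. It relies on the fact that a .docx file is a ZIP archive whose main body lives in word/document.xml, where tracked insertions appear as w:ins elements and tracked deletions as w:del elements. The function names and the output dictionary shape are assumptions made for this example, and a production pipeline would also need to handle headers, footnotes, comments, and moved text.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace used by .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def tracked_changes_from_xml(document_xml):
    """Return tracked insertions and deletions, in document order."""
    root = ET.fromstring(document_xml)
    changes = []
    for element in root.iter():
        if element.tag == W + "ins":
            # Inserted text is stored in ordinary w:t elements.
            text = "".join(t.text or "" for t in element.iter(W + "t"))
            kind = "add"
        elif element.tag == W + "del":
            # Deleted text is stored in w:delText elements.
            text = "".join(t.text or "" for t in element.iter(W + "delText"))
            kind = "delete"
        else:
            continue
        if text:  # skip markers that carry no text (e.g. paragraph marks)
            changes.append(
                {"type": kind, "author": element.get(W + "author"), "text": text}
            )
    return changes


def tracked_changes_from_docx(path):
    """Read word/document.xml from a .docx file and extract its tracked changes."""
    with zipfile.ZipFile(path) as archive:
        return tracked_changes_from_xml(archive.read("word/document.xml"))
```

Each returned record is plain data (type, author, text), so batches of documents can be normalized into a table and clustered without any legal interpretation of the edits.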
Secondary Analytics (As Bandwidth Allows)
Support adjacent analytics initiatives such as help-desk pattern identification, matter trend clustering, and metadata normalization.
Strictly analytical scope — no contract drafting, negotiation, or rule design.
Cross-Time-Zone Collaboration
Provide clear asynchronous updates, maintain backlog transparency, and communicate structured pattern insights to US-based stakeholders.
Required Skills & Technical Expertise
Strong proficiency in Python (text parsing, data manipulation, automation).
Experience with NLP techniques (tokenization, clustering, semantic similarity, embeddings).
Exposure to LLM-assisted workflows for classification and summarization.
Experience handling large volumes of unstructured or semi-structured documents.
Familiarity with Word document parsing (e.g., tracked changes, redlines) preferred.
Strong data structuring, normalization, and analytical reasoning skills.
Education & Experience
Bachelor’s degree (B.Tech / B.E / B.Sc / B.Com / BA / LL.B).
4–7 years of experience in data or document analytics.
Experience working with structured/unstructured text datasets required.
Exposure to legal document formats preferred (not mandatory).
Postgraduate qualification in Analytics, Data Science, or Legal Operations is a plus.
Experience working with global teams across time zones preferred.
Position Requirements
5+ years of work experience