Data Quality · Governance · Data Science

Manisha Takale

Turning data chaos into governed, trusted assets

12+ years at the intersection of software quality engineering and data science. I bring the rigour of enterprise QA thinking to modern data governance — frameworks that hold up in production, not just on paper.

12+
Years Experience
5
Global Clients
50%
Test Time Reduced
10m+
Data Processed
DAMA-DMBOK · Privacy-by-Design · Collibra · Federated Learning · Azure · LLM Engineering · EngD · JADS · EU AI Act
01

Core Expertise

01 / 03

Data Quality Engineering

Designing validation frameworks, acceptance criteria, and DQ rules that scale across complex, multi-source environments. Rooted in 7 years of production quality engineering for global telecom systems.

DQ Framework Design · Data Profiling · Root Cause Analysis · SQL · Python
02 / 03

Data Governance & Strategy

Building governance frameworks from scratch — data ownership models, metadata standards, data contracts, and privacy-compliant architectures aligned with DAMA-DMBOK and GDPR.

DAMA-DMBOK · Data Modeling · GDPR · Collibra · Data Contracts
03 / 03

Applied Data Science

End-to-end ML engineering with a specialisation in LLM applications, federated learning, and synthetic data generation for privacy-sensitive domains.

Machine Learning · LLM · GenAI · Federated Learning · Synthetic Data
02

Featured Projects

Governance Strategy · Healthcare

Philips Healthcare

Case Dispatch Automation & Data Governance · 2024 · JADS × Philips

"How do you automate case dispatch in a hospital without patient data ever leaving its source system?"

Conducted a comprehensive data quality audit of the Philips IMS database, identifying critical gaps, lineage issues, and privacy constraints. Designed a Federated Learning architecture embedding governance controls at the structural level — patient data stays within hospital boundaries by design. Delivered a GDPR-compliant AI automation roadmap accepted by Philips stakeholders.

Privacy-by-Design Architecture · DQ Audit Delivered · Stakeholder Accepted Roadmap
Federated Learning · Python · Azure · GDPR Framework
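
The idea that patient data stays inside hospital boundaries can be illustrated with a small federated averaging sketch. This is not the Philips design: the linear model, the two sites, their records, and the function names `local_update` and `federated_average` are all hypothetical, and federated averaging is assumed here as the aggregation method. Only model weights and record counts cross the site boundary.

```python
def local_update(weights, records, lr=0.1):
    """One pass of gradient descent on a linear model, using only local records."""
    w = list(weights)
    for x, y in records:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    # Only the weights and the record count are returned, never the records.
    return w, len(records)


def federated_average(updates):
    """Average site weights, weighted by how many records each site trained on."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


# Hypothetical local datasets; each list lives at its own site.
site_a = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0)]
site_b = [([1.0, 1.0], 3.0)]

global_w = [0.0, 0.0]
for _ in range(100):
    updates = [local_update(global_w, site) for site in (site_a, site_b)]
    global_w = federated_average(updates)
```

In this toy setup the coordinator only ever sees `updates`, so the privacy property comes from the structure of the exchange rather than from a policy check.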
Synthetic Data Quality · Maritime

MARIT-D

Synthetic AIS Data for Surveillance Testing · 2024 · JADS · Team Lead

"How do you validate anomaly detection algorithms when you can't use real surveillance data?"

Led the simulation track of a maritime intelligence project. Built a hybrid synthetic AIS data pipeline combining LLM-based trajectory generation (DeepSeek-R1) with rule-based logic. Established scenario-based DQ validation protocols ensuring statistical fidelity — enabling rigorous, privacy-safe algorithm validation.

LLM + Rule-Based Hybrid · DQ Validation Protocols · Team Lead
DeepSeek-R1 · Ollama · Python · Pandas · Synthetic Data Design
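
One kind of rule-based check that could sit behind such a pipeline is a physical plausibility test on generated tracks. The sketch below is an illustration, not the MARIT-D protocol: the `(seconds, lat, lon)` track format, the 30-knot ceiling, and the function names are assumptions. It flags any position report whose implied speed from the previous report is implausible.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles


def haversine_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in nautical miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def implausible_legs(track, max_knots=30.0):
    """Return indices of reports that imply a speed above max_knots
    or a non-increasing timestamp relative to the previous report."""
    flagged = []
    for i in range(1, len(track)):
        t0, lat0, lon0 = track[i - 1]
        t1, lat1, lon1 = track[i]
        hours = (t1 - t0) / 3600.0
        if hours <= 0:
            flagged.append(i)
            continue
        if haversine_nm(lat0, lon0, lat1, lon1) / hours > max_knots:
            flagged.append(i)
    return flagged


# Hypothetical synthetic track: (seconds since start, latitude, longitude).
track = [
    (0, 52.00, 4.00),
    (600, 52.03, 4.00),   # short hop, ordinary vessel speed
    (1200, 53.00, 4.00),  # nearly one degree of latitude in ten minutes
]
flagged = implausible_legs(track)
```

A check like this covers plausibility only; statistical fidelity against reference distributions would need a separate comparison.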
Data Governance · Active Research

Maritime Data Platform

Cyber-Physical Governance Research · 2024–Present · JADS (Early Stage)

"How do you govern sensitive sensor data shared across organisations with competing jurisdictions?"

Early-stage research applying DAMA-DMBOK principles to design data ownership models, quality rules, and cross-organisational data sharing agreements for a multi-stakeholder maritime surveillance context.

DAMA-DMBOK Applied · Data Modeling · Active Research
Data Modeling · DAMA-DMBOK · Jira · Python
Data Quality · Enterprise Scale

Amdocs Global Telecom

Vodafone · AT&T · Sprint · Optus · WL-COM · 2011–2018 · Team Lead

"How do you maintain data integrity across 5 global telecom clients with zero downtime tolerance?"

7-year foundation in enterprise-scale data quality engineering. Designed comprehensive validation frameworks (quality rules, acceptance criteria, and defect classification) for multi-client, multi-geography environments. Led root cause analysis and defect lifecycle management: the exact workflow of a Data Quality Manager.

50% Reduction in Test Time · 7 Days → 24 Hours Deployment · 5 Global Clients
SQL · UNIX · Agile · Defect Lifecycle Management
03

Open Source Framework

WORKING PROJECT · PYTHON · OPEN SOURCE

Enterprise Data Quality & AI Governance Control Framework

A production-ready reference implementation showing how regulated institutions operationalise data ownership, SLA enforcement, incident management, bias monitoring, and drift detection — built on German credit risk data.

DAMA-DMBOK · EU AI Act · DORA · Financial Services · Model Governance
System Architecture
Data Ingestion · DQ Rule Engine · Incident Store · Ownership & SLA Router
ML Layer
Training Gate · Bias Monitor · Drift (PSI) · Model Card
01 · DATA GOVERNANCE

Ownership & Metadata Registry

Domain-based accountability model with YAML-as-policy artifacts synced to DuckDB.

  • Ownership registry (YAML)
  • DuckDB metadata sync
  • Domain accountability model
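
A minimal sketch of what a domain-based ownership check might look like. The registry layout, field names (`owner`, `steward`, `sla_hours`), and `load_registry` are hypothetical, and the policy is shown as the plain dict a YAML parser would return; YAML parsing and the DuckDB sync are left out to keep the example dependency-free.

```python
from dataclasses import dataclass

# Hypothetical policy content, as a parsed YAML document would appear in Python.
REGISTRY = {
    "domains": {
        "credit_risk": {
            "owner": "risk-data-team",
            "datasets": {
                "german_credit": {"steward": "a.steward", "sla_hours": 24},
            },
        },
        "marketing": {
            "owner": "",
            "datasets": {"campaigns": {"steward": "b.steward"}},
        },
    }
}


@dataclass(frozen=True)
class OwnershipRecord:
    domain: str
    dataset: str
    owner: str
    steward: str
    sla_hours: int


def load_registry(raw):
    """Flatten the nested policy into records and collect policy violations."""
    records, problems = [], []
    for domain, spec in raw["domains"].items():
        owner = spec.get("owner", "")
        if not owner:
            problems.append(f"{domain}: missing domain owner")
        for name, ds in spec.get("datasets", {}).items():
            if "sla_hours" not in ds:
                problems.append(f"{domain}.{name}: missing sla_hours")
                continue
            if not owner:
                continue  # a dataset without an accountable owner is not registered
            records.append(
                OwnershipRecord(domain, name, owner, ds.get("steward", ""), ds["sla_hours"])
            )
    return records, problems


records, problems = load_registry(REGISTRY)
```

Treating the policy file as the source of truth means a dataset without an owner or SLA surfaces as a problem before any data is processed.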
02 · DATA QUALITY

Rule Engine & Incidents

Completeness and validity rules with severity-based incident generation and SLA breach detection.

  • Completeness & validity rules
  • Severity-based routing
  • SLA breach detection
  • Evidence tracking
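
The rule-to-incident flow described above can be sketched roughly as follows. The rule names, severity levels, SLA hours, and class names are illustrative assumptions, not the framework's actual definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List

# Hypothetical resolution SLA per severity, in hours.
SLA_HOURS = {"critical": 4, "major": 24, "minor": 72}


@dataclass
class Rule:
    name: str
    column: str
    check: Callable[[object], bool]  # returns True when the value passes
    severity: str


@dataclass
class Incident:
    rule: str
    severity: str
    failed_rows: List[int]  # evidence: indices of the offending rows
    opened_at: datetime

    def sla_breached(self, now):
        return now - self.opened_at > timedelta(hours=SLA_HOURS[self.severity])


def run_rules(rows, rules, now):
    """Apply each rule to every row; open one incident per failing rule."""
    incidents = []
    for rule in rules:
        failed = [i for i, row in enumerate(rows) if not rule.check(row.get(rule.column))]
        if failed:
            incidents.append(Incident(rule.name, rule.severity, failed, now))
    return incidents


RULES = [
    # Completeness: the value must be present.
    Rule("age_not_null", "age", lambda v: v is not None, "critical"),
    # Validity: when present, the value must fall in a plausible range.
    Rule("age_in_range", "age", lambda v: v is None or 18 <= v <= 100, "major"),
]

rows = [{"age": 35}, {"age": None}, {"age": 140}]
opened = datetime(2024, 1, 1, 9, 0)
incidents = run_rules(rows, RULES, opened)
breached = [i.rule for i in incidents if i.sla_breached(opened + timedelta(hours=5))]
```

Keeping the failing row indices on the incident gives each breach its own evidence trail, which is what makes later root cause analysis possible.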
03 · ML QUALITY

Bias, Drift & Model Registry

Label integrity gate, demographic bias monitoring, PSI drift detection, auto-generated model cards.

  • Label integrity gate
  • Bias monitoring (age / demographic)
  • Feature & prediction drift
  • Model card generation
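
As an illustration of the two monitoring signals named above, here is a small standard-library sketch of the Population Stability Index and a selection-rate gap between groups. The equal-width binning, the epsilon floor, and the function names are assumptions; the framework's own implementation may differ.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` against the `expected` baseline,
    using equal-width bins derived from the baseline range."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def selection_rate_gap(groups, predictions):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = {}
    for g in set(groups):
        preds = [p for gg, p in zip(groups, predictions) if gg == g]
        rates[g] = sum(preds) / len(preds)
    return max(rates.values()) - min(rates.values())


baseline = list(range(100))
shifted = [v + 50 for v in baseline]
drift_score = psi(baseline, shifted)

gap = selection_rate_gap(["young", "young", "older", "older"], [1, 0, 1, 1])
```

A commonly quoted rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift, though any threshold should be calibrated per feature.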
04 · DATASET

German Credit Risk Data

Built on the UCI German Credit dataset — real-world financial services data relevant to banking and insurance governance.

  • 1,000 credit applicant records
  • 20 features incl. demographics
  • Binary risk classification
  • Financial services context
DAMA-DMBOK Alignment
DOMAIN | IMPLEMENTATION
Data Governance | Ownership registry with domain accountability model
Data Quality | Rule engine + severity-based incident management
Metadata Mgmt | YAML policy artifacts synced to DuckDB
Risk Management | SLA tracking + automated breach monitoring
EU AI Act & DORA Alignment
ARTICLE / RULE | IMPLEMENTATION
Art 9 – Risk Mgmt | MLQC training gate + escalation routing
Art 10 – Data Gov | Label validation + bias monitoring
Art 12 – Records | Stored artifacts + model registry
Art 15 – Accuracy & Robustness | PSI-based drift detection pipeline
DORA – Resilience | Operational SLA + incident management
OPEN SOURCE · GITHUB
enterprise-dq-governance-framework
Python · DuckDB · DAMA-DMBOK · EU AI Act · German Credit Data
STACK: Python · DuckDB · Pandas · Scikit-learn · YAML · PSI Drift · Model Cards · SLA Engine
04

Governance Thinking

1

Quality Before Governance

Governance frameworks fail when the underlying data quality is ignored. My approach always starts with a DQ audit — profiling, lineage mapping, gap analysis — before designing any governance structure. You can't govern what you don't understand.

2

Governance by Architecture

The strongest governance isn't enforced by policy; it's embedded in the architecture. My Philips Federated Learning design is a direct example: privacy compliance wasn't a rule added on top. Because patient data never leaves the hospital boundary, the design itself makes a violation structurally impossible.

3

Defect Thinking Applied to Data

7 years of defect lifecycle management in telecom QA maps directly to data quality: classify the issue, trace the root cause, define acceptance criteria, automate the prevention. The vocabulary differs; the thinking is identical.

4

Synthetic Data as Quality Proxy

In privacy-sensitive domains you can't always test against real data. MARIT-D introduced me to synthetic data as a quality validation tool — generating statistically faithful test data to establish DQ acceptance criteria without exposing sensitive information.

"Good governance is invisible to the people it protects — and unmistakable to the engineers who built it right."

— Manisha Takale · On Privacy-by-Design
05

Leadership & Impact

Cross-functional Teams

Led 5–6 teams (~30 participants in total) as Overall Coordinator of JADS' Data Challenge Week, managing delivery, inter-team governance, and stakeholder communication simultaneously.

Client Stakeholder Management

7 years coordinating across Vodafone, AT&T, Sprint, Optus, and WL-COM — aligning data quality standards with client expectations across multiple time zones and business cultures.

End-to-End Data Ownership

As sole Data Science Lead at Yopla, owned the entire data lifecycle (ingestion, quality, modeling, deployment, and reporting), building an accountability-first data culture from scratch.

71%
Deployment cycle reduction at Amdocs
25%
Data access efficiency gain at Yopla
~30
Researchers led at JADS Data Challenge Week
5
Global enterprise clients across 4 countries
06

Career Journey

2024 – PRESENT
EngD Researcher & Data Governance Practitioner
Jheronimus Academy of Data Science (JADS), Netherlands
Active research in data governance, synthetic data quality, and privacy-preserving AI. Industry projects with Philips Healthcare and maritime organisations.
2022 – 2024
Data Science Lead
Yopla, Eindhoven (Remote)
Full end-to-end data ownership at a sustainability startup — built all data infrastructure, quality controls, pipelines, and analytics from scratch.
2021 – 2022
Data Scientist
Aestivalis, Amsterdam (Remote)
Real-time financial data validation and predictive modelling using LSTM-RNN architectures on live market data.
2019 – 2020
MSc — Data Science and Society
Tilburg University
Transition from quality engineering to data science — bridging technical foundations with statistical and machine learning methodology.
2011 – 2018
Advanced Test Engineer & Team Lead
Amdocs India Pvt. Ltd., Pune
7 years building the quality engineering foundation that now underpins all my governance thinking. Enterprise-scale validation for Vodafone, AT&T, Sprint, Optus, and WL-COM.
07

A Talk I Love

08

Let's Connect

OPEN TO NEW OPPORTUNITIES

"Looking for organisations where data quality is a strategic priority, not an afterthought."

Based in the Netherlands. Available for Data Quality Lead, Data Governance Manager, and Senior Data Scientist roles across NL, DE, and AT. Open to hybrid and remote arrangements.