April 10, 2025
DPDP Act Implementation: A Technical Checklist for AI Systems
The Digital Personal Data Protection Act 2023 moved from legislation to enforcement this year. For teams running AI systems in India, this isn’t just a legal checkbox - it requires actual technical changes to how you collect, process, and store data.
We’ve spent the last several months helping enterprises audit their AI systems for DPDP compliance. This post shares what we’ve learned - the specific technical requirements and the implementation patterns that work.
What DPDP Actually Requires for AI Systems
Let’s cut through the legal language. For AI systems specifically, DPDP creates these technical obligations:
1. Purpose Limitation
You can only use personal data for the purpose you collected it for. This sounds simple until you consider:
- Your customer support chatbot logs conversations for quality improvement
- Your data science team wants to use those logs to fine-tune a model
- That fine-tuned model gets deployed for a different use case
Each of these steps potentially violates purpose limitation if not handled correctly.
Technical requirement: Data lineage tracking that captures the original collection purpose and validates downstream usage.
from datetime import datetime, timezone
from typing import Any

class DataAsset:
    """Personal data bundled with its collection purpose and consent record."""

    def __init__(self, data: Any, purpose: str, consent_id: str):
        self.data = data
        self.original_purpose = purpose
        self.consent_id = consent_id
        self.allowed_purposes = [purpose]
        self.usage_log = []

    def use_for(self, new_purpose: str) -> bool:
        # Block any purpose that was not declared at collection time
        if new_purpose not in self.allowed_purposes:
            self.log_violation_attempt(new_purpose)
            return False
        self.usage_log.append({
            'purpose': new_purpose,
            'timestamp': datetime.now(timezone.utc),
            'actor': get_current_context()  # caller identity from your request context
        })
        return True

    def log_violation_attempt(self, purpose: str) -> None:
        # Keep blocked attempts in the same log so audits can see them
        self.usage_log.append({
            'purpose': purpose,
            'timestamp': datetime.now(timezone.utc),
            'blocked': True
        })
2. Data Minimization
Only collect and retain data necessary for your stated purpose. For AI systems, this conflicts with the instinct to “collect everything, figure out what’s useful later.”
Technical requirement: Schema enforcement at collection points that rejects unnecessary fields.
Before DPDP:
{
  "user_id": "12345",
  "name": "Priya Sharma",
  "email": "[email protected]",
  "phone": "+91-9876543210",
  "address": "123 MG Road, Bangalore",
  "ip_address": "203.0.113.42",
  "device_fingerprint": "abc123...",
  "query": "What's my account balance?",
  "session_history": [...last 50 interactions...]
}
After DPDP (for a balance inquiry):
{
  "user_id": "12345",
  "query": "What's my account balance?",
  "session_context": [...last 3 relevant interactions...]
}
The second schema answers the same query while dropping all six directly identifying fields (name, email, phone, address, IP, device fingerprint) and cutting the session history from the last 50 interactions to the 3 most relevant.
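One way to enforce the minimized schema at the collection point is strict payload validation. Here is a minimal sketch using Pydantic, assuming a balance-inquiry endpoint (the model name and fields are illustrative):

from pydantic import BaseModel, ConfigDict, ValidationError

class BalanceInquiryPayload(BaseModel):
    # extra="forbid" makes Pydantic reject any field not declared below,
    # so over-collection fails loudly instead of silently accumulating PII
    model_config = ConfigDict(extra="forbid")

    user_id: str
    query: str
    session_context: list[str]

# Accepted: exactly the fields the stated purpose needs
BalanceInquiryPayload(user_id="12345", query="What's my account balance?",
                      session_context=[])

# Rejected: "email" is not part of the balance-inquiry schema
try:
    BalanceInquiryPayload(user_id="12345", query="What's my account balance?",
                          session_context=[], email="priya@example.com")
except ValidationError as exc:
    print(exc)  # reports the extra field as a validation error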
3. Consent Management
DPDP requires explicit, informed consent that users can withdraw at any time. For AI systems, this means:
- Users must know their data will be processed by AI
- Users must be able to opt out of AI processing specifically
- Withdrawal of consent must stop processing within a reasonable time
Technical requirement: A consent management layer that AI systems check before processing.
sequenceDiagram
    participant User
    participant GW as API Gateway
    participant CS as Consent Service
    participant AI as AI System
    participant FB as Fallback System
    User->>GW: Request (with user_id)
    GW->>CS: Check AI consent status
    CS-->>GW: {ai_allowed: true/false}
    alt AI Consent Given
        GW->>AI: Process request
        AI-->>User: AI-generated response
    else AI Consent Withdrawn
        GW->>FB: Process request
        FB-->>User: Non-AI response
    end
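Behind that check sits a small consent service. A minimal sketch, assuming in-memory storage (a production version would persist records and publish withdrawal events so downstream systems stop processing promptly):

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ConsentStatus:
    consent_id: str | None
    is_valid: bool

class ConsentService:
    def __init__(self):
        # user_id -> purpose -> consent record; a real system uses a database
        self._records: dict[str, dict[str, dict]] = {}

    def grant(self, user_id: str, purpose: str, consent_id: str) -> None:
        self._records.setdefault(user_id, {})[purpose] = {
            'consent_id': consent_id,
            'granted_at': datetime.now(timezone.utc),
            'withdrawn': False,
        }

    def withdraw(self, user_id: str, purpose: str) -> None:
        # Withdrawal must take effect before the next processing request
        record = self._records.get(user_id, {}).get(purpose)
        if record:
            record['withdrawn'] = True

    def get_status(self, user_id: str, purpose: str) -> ConsentStatus:
        record = self._records.get(user_id, {}).get(purpose)
        if record is None or record['withdrawn']:
            return ConsentStatus(consent_id=None, is_valid=False)
        return ConsentStatus(consent_id=record['consent_id'], is_valid=True)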
4. Right to Erasure
Users can request deletion of their personal data. For AI systems, this is complicated:
- What about data that was used to train a model?
- What about embeddings derived from user content?
- What about cached responses that contain user information?
Technical requirement: Data inventory that tracks where personal data flows, including derived data.
We’ve seen enterprises with user data in:
- Primary databases
- Analytics warehouses
- Model training datasets
- Vector databases (embeddings)
- LLM conversation logs
- CDN caches
- Third-party analytics tools
- Backup systems
A deletion request needs to propagate to all of these.
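In practice that means maintaining a registry of every store that can hold a user's data, derived data included, and fanning each erasure request out across it. A sketch, with the store interface as an assumption:

from typing import Protocol

class PersonalDataStore(Protocol):
    name: str

    def delete_user_data(self, user_id: str) -> int:
        """Delete all records for user_id, return how many were removed."""
        ...

def handle_erasure_request(user_id: str,
                           stores: list[PersonalDataStore]) -> dict[str, int]:
    # Fan the deletion out to every registered store: primary DB, warehouse,
    # vector store, conversation logs, caches, backups, third-party tools.
    report = {}
    for store in stores:
        report[store.name] = store.delete_user_data(user_id)
    return report  # retain this report as evidence the request was honored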
5. Data Localization
For “significant data fiduciaries” (a category the government designates, typically larger or higher-risk processors), certain data must stay in India. For AI systems built on overseas cloud APIs, this is a direct problem:
- Sending user queries to GPT-4 in the US? That’s a data transfer.
- Using Claude via Anthropic’s US endpoints? Same issue.
- Even queries you treat as anonymized can still transfer personal data if the query text itself contains names, addresses, or ID numbers.
Technical requirement: Either self-host models in India, use India-region endpoints, or implement robust anonymization before any cross-border API call.
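Whichever option you choose, make the residency decision explicit in code rather than a convention. A rough sketch of a routing guard, with hypothetical endpoint names:

INDIA_ENDPOINTS = {"in-model": "https://ai.example.in/v1"}      # hypothetical India-region endpoint
EXTERNAL_ENDPOINTS = {"us-model": "https://ai.example.com/v1"}  # hypothetical cross-border endpoint

def select_endpoint(model: str, payload_is_anonymized: bool) -> str:
    if model in INDIA_ENDPOINTS:
        return INDIA_ENDPOINTS[model]
    if model in EXTERNAL_ENDPOINTS and payload_is_anonymized:
        # Cross-border calls are allowed only after the anonymization
        # pipeline (see Pattern 3 below) has stripped the PII
        return EXTERNAL_ENDPOINTS[model]
    raise PermissionError(f"No compliant route for model {model!r}")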
This is exactly why we built Sankalp, our sovereign AI gateway - to provide enterprise AI capabilities without data leaving Indian infrastructure.
Implementation Patterns That Work
Pattern 1: Consent-Aware Request Pipeline
Build consent checking into your request pipeline, not as an afterthought:
class DPDPCompliantPipeline:
    def __init__(self, consent_service, ai_service, fallback_service):
        self.consent = consent_service
        self.ai = ai_service
        self.fallback = fallback_service

    async def process(self, request: UserRequest) -> Response:
        # Check consent before any processing
        consent_status = await self.consent.get_status(
            user_id=request.user_id,
            purpose='ai_assistance'
        )
        if not consent_status.is_valid:
            return await self.fallback.process(request)

        # Minimize data before AI processing (strips fields the
        # declared purpose does not need; implementation not shown)
        minimized_request = self.minimize_for_purpose(
            request,
            purpose='ai_assistance'
        )

        # Process with audit trail
        response = await self.ai.process(
            minimized_request,
            audit_context={
                'consent_id': consent_status.consent_id,
                'purpose': 'ai_assistance',
                'data_fields_used': minimized_request.field_names()
            }
        )
        return response
Pattern 2: Tiered Data Retention
Different data requires different retention periods. Implement automated cleanup:
flowchart TD
    A[Data Ingestion] --> B{Data Classification}
    B -->|Session Data| C[7-day retention]
    B -->|Transaction Data| D[7-year retention]
    B -->|Conversation Logs| E[90-day retention]
    B -->|Training Data| F[Purpose-specific]
    C --> G[Automated Deletion Job]
    E --> G
    F --> H{Purpose Still Valid?}
    H -->|No| G
    H -->|Yes| I[Retain with Re-consent Check]
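The automated deletion job itself can be a scheduled task that compares each record's age against the policy for its class. A minimal sketch mirroring the diagram (the record shape and storage access are assumptions):

from datetime import datetime, timedelta, timezone

# Retention policy per data class, mirroring the diagram above
RETENTION_DAYS = {
    'session': 7,
    'conversation_log': 90,
    'transaction': 365 * 7,
}

def purge_expired(records, now=None):
    """Yield IDs of records whose age exceeds the policy for their class."""
    now = now or datetime.now(timezone.utc)
    for record in records:
        limit = timedelta(days=RETENTION_DAYS[record['data_class']])
        if now - record['created_at'] > limit:
            yield record['id']  # hand these to the actual deletion job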
Pattern 3: Anonymization Pipeline for External AI
If you must use external AI services, implement proper anonymization:
from uuid import uuid4

class AnonymizationPipeline:
    def __init__(self):
        # Detects Indian PII patterns (names, Aadhaar, phone, address, ...);
        # implementation not shown
        self.pii_detector = PIIDetector()

    def anonymize(self, text: str) -> tuple[str, dict]:
        """
        Replace PII with tokens, return mapping for re-identification
        """
        entities = self.pii_detector.detect(text)
        anonymized = text
        mapping = {}
        for entity in entities:
            token = f"[{entity.type}_{uuid4().hex[:8]}]"
            anonymized = anonymized.replace(entity.value, token)
            mapping[token] = entity.value
        return anonymized, mapping

    def deanonymize(self, text: str, mapping: dict) -> str:
        """
        Restore original values from tokens
        """
        result = text
        for token, original in mapping.items():
            result = result.replace(token, original)
        return result
Example transformation:
Original: "Rahul Verma's Aadhaar number is 1234-5678-9012
and he lives at 45 Nehru Street, Chennai"
Anonymized: "[NAME_a3b2c1d4]'s Aadhaar number is [AADHAAR_e5f6g7h8]
and he lives at [ADDRESS_i9j0k1l2]"
The anonymized version can be sent to external AI. The response is de-anonymized locally.
Pattern 4: Audit Trail Architecture
DPDP requires you to demonstrate compliance. Build comprehensive audit trails:
from dataclasses import dataclass
from datetime import datetime

MAX_RETENTION = 365  # days; set this from your retention policy

@dataclass
class AIProcessingAudit:
    timestamp: datetime
    request_id: str
    user_id: str
    consent_id: str | None
    purpose: str
    # Data minimization evidence
    fields_available: list[str]
    fields_used: list[str]
    fields_excluded: list[str]
    # Processing details
    model_used: str
    model_location: str  # 'india' or external
    anonymization_applied: bool
    # Outcome
    processing_result: str
    data_retained: bool
    retention_period_days: int

    def to_compliance_report(self) -> dict:
        return {
            # max(..., 1) avoids division by zero when no fields were available
            'minimization_ratio': len(self.fields_used) / max(len(self.fields_available), 1),
            'data_sovereignty': self.model_location == 'india',
            'consent_valid': self.consent_id is not None,
            'retention_compliant': self.retention_period_days <= MAX_RETENTION
        }
Common Compliance Gaps We’ve Found
In our DPDP audits, these issues appear repeatedly:
Gap 1: Vector Database Blind Spot
Teams remember to handle personal data in traditional databases but forget about vector stores. If you’re using RAG with user documents, those embeddings are derived from personal data and subject to DPDP.
Fix: Implement user-ID tagging on vector embeddings and deletion propagation to your vector store.
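Most vector databases support per-vector metadata and delete-by-filter, which is enough to implement both halves of the fix. The client calls below are illustrative placeholders, not a specific product's API; map them onto your store's upsert and filtered-delete operations:

def index_user_document(client, collection: str, doc_id: str,
                        user_id: str, embedding: list[float]) -> None:
    client.upsert(
        collection=collection,
        id=doc_id,
        vector=embedding,
        metadata={'user_id': user_id},  # tag every embedding with its data principal
    )

def erase_user_embeddings(client, collection: str, user_id: str) -> None:
    # Erasure requests reach here via the same fan-out as the primary DB
    client.delete(collection=collection, filter={'user_id': user_id})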
Gap 2: Log File Leakage
AI system logs often contain user queries verbatim. These logs might be:
- Shipped to external logging services (data transfer)
- Retained indefinitely (retention violation)
- Accessible to debugging tools without consent checks
Fix: Scrub PII from logs before storage, or implement log-level consent checking.
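In Python, one place to hook scrubbing in is a logging.Filter that rewrites every record before it reaches any handler. A sketch with simplified regexes standing in for a real PII detector:

import logging
import re

# Simplified stand-ins; a real deployment would reuse the PII detector above
AADHAAR_RE = re.compile(r'\b\d{4}-?\d{4}-?\d{4}\b')
PHONE_RE = re.compile(r'\+91[-\s]?\d{10}')

class PIIScrubbingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        message = AADHAAR_RE.sub('[AADHAAR]', message)
        message = PHONE_RE.sub('[PHONE]', message)
        record.msg, record.args = message, None  # freeze the scrubbed text
        return True  # keep the record, now scrubbed

logger = logging.getLogger('ai_system')
logger.addFilter(PIIScrubbingFilter())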
Gap 3: Model Training Without Consent Trail
If you fine-tune models on user data, you need to maintain a consent trail. We’ve seen teams use conversation logs for training without checking whether users consented to that specific purpose.
Fix: Tag training data with consent IDs and filter during dataset creation.
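Concretely, the dataset builder should join every candidate record against the consent store for the training purpose specifically, not the original assistance purpose. A sketch, assuming the consent service sketched earlier and a simple record shape:

def build_training_dataset(conversation_logs, consent_service):
    """Keep only records whose user consented to the training purpose."""
    dataset = []
    for record in conversation_logs:
        status = consent_service.get_status(
            user_id=record['user_id'],
            purpose='model_training',  # the specific purpose, not 'ai_assistance'
        )
        if status.is_valid:
            # Carry the consent ID with the example so the trail survives
            dataset.append({**record, 'consent_id': status.consent_id})
    return dataset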
Gap 4: Third-Party SDK Data Collection
Many AI SDKs collect telemetry by default. This data might include:
- User queries (for “quality improvement”)
- Session identifiers
- Device information
Fix: Audit all third-party dependencies for data collection. Disable telemetry or ensure it’s covered by consent.
The Compliance Architecture
Here’s what a DPDP-compliant AI architecture looks like:
flowchart TB
    subgraph "User Interaction Layer"
        A[User Request] --> B[API Gateway]
        B --> C{Consent Check}
    end
    subgraph "Consent Management"
        C --> D[Consent Service]
        D --> E[(Consent Database)]
    end
    subgraph "Data Processing Layer"
        C -->|Consent Valid| F[Data Minimizer]
        F --> G{Data Location Check}
        G -->|India Data| H[India AI Processing]
        G -->|Needs Anonymization| I[Anonymization Pipeline]
        I --> J[External AI Service]
        J --> K[De-anonymization]
    end
    subgraph "Audit & Compliance"
        H --> L[Audit Logger]
        K --> L
        L --> M[(Audit Database)]
        M --> N[Compliance Reports]
    end
    subgraph "Data Lifecycle"
        O[Retention Manager] --> P{Check Retention Rules}
        P --> Q[Automated Deletion]
        Q --> R[(Primary DB)]
        Q --> S[(Vector Store)]
        Q --> T[(Log Storage)]
    end
Getting Help
DPDP compliance isn’t a one-time project - it’s an ongoing operational requirement. Your AI systems need to be compliant by design, not retrofitted.
At Rotavision, we offer:
- DPDP compliance audits for existing AI systems
- Sankalp - sovereign AI gateway with built-in compliance controls
- Vishwas - AI trust platform with consent and audit trail management
- Implementation services to build compliant AI architectures
The penalties for DPDP non-compliance can reach Rs. 250 crore. More importantly, compliance builds the trust that makes AI adoption sustainable.
Contact us if you need help assessing your AI systems against DPDP requirements.