The Digital Personal Data Protection Act 2023 moved from legislation to enforcement this year. For teams running AI systems in India, this isn’t just a legal checkbox - it requires actual technical changes to how you collect, process, and store data.

We’ve spent the last several months helping enterprises audit their AI systems for DPDP compliance. This post shares what we’ve learned - the specific technical requirements and the implementation patterns that work.

What DPDP Actually Requires for AI Systems

Let’s cut through the legal language. For AI systems specifically, DPDP creates these technical obligations:

1. Purpose Limitation

You can only use personal data for the purpose you collected it for. This sounds simple until you consider:

  • Your customer support chatbot logs conversations for quality improvement
  • Your data science team wants to use those logs to fine-tune a model
  • That fine-tuned model gets deployed for a different use case

Each of these steps potentially violates purpose limitation if not handled correctly.

Technical requirement: Data lineage tracking that captures the original collection purpose and validates downstream usage.

from datetime import datetime


def get_current_context() -> str:
    # Placeholder: resolve the calling service or user from your request context
    return "unknown"


class DataAsset:
    def __init__(self, data, purpose: str, consent_id: str):
        self.data = data
        self.original_purpose = purpose
        self.consent_id = consent_id
        self.allowed_purposes = [purpose]
        self.usage_log = []

    def use_for(self, new_purpose: str) -> bool:
        # Any purpose outside the original consent is blocked and recorded
        if new_purpose not in self.allowed_purposes:
            self.log_violation_attempt(new_purpose)
            return False
        self.usage_log.append({
            'purpose': new_purpose,
            'timestamp': datetime.now(),
            'actor': get_current_context()
        })
        return True

    def log_violation_attempt(self, new_purpose: str) -> None:
        # Record the blocked attempt so audits show enforcement, not just policy
        self.usage_log.append({
            'purpose': new_purpose,
            'timestamp': datetime.now(),
            'actor': get_current_context(),
            'blocked': True
        })
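
A minimal usage sketch (the purpose strings and consent ID are illustrative): a transcript collected for quality review can be reused for that purpose, but an attempt to use it for fine-tuning is blocked and logged.

transcript = DataAsset(
    data="chat transcript ...",
    purpose="support_quality_review",
    consent_id="consent-7841",
)

transcript.use_for("support_quality_review")  # True - matches the collection purpose
transcript.use_for("model_fine_tuning")       # False - blocked and recorded in usage_log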

2. Data Minimization

Only collect and retain data necessary for your stated purpose. For AI systems, this conflicts with the instinct to “collect everything, figure out what’s useful later.”

Technical requirement: Schema enforcement at collection points that rejects unnecessary fields.

Before DPDP:

{
  "user_id": "12345",
  "name": "Priya Sharma",
  "email": "[email protected]",
  "phone": "+91-9876543210",
  "address": "123 MG Road, Bangalore",
  "ip_address": "203.0.113.42",
  "device_fingerprint": "abc123...",
  "query": "What's my account balance?",
  "session_history": [...last 50 interactions...]
}

After DPDP (for a balance inquiry):

{
  "user_id": "12345",
  "query": "What's my account balance?",
  "session_context": [...last 3 relevant interactions...]
}

The second schema answers the same query with a fraction of the personal data: three fields instead of nine, and no direct identifiers beyond the user ID.
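
One way to enforce this at the collection point is a per-purpose field allowlist that rejects any payload carrying extra fields. A minimal sketch - the purpose name and allowlist mirror the example above and are illustrative:

ALLOWED_FIELDS = {
    "balance_inquiry": {"user_id", "query", "session_context"},
}

def enforce_collection_schema(payload: dict, purpose: str) -> dict:
    """Reject any field not on the allowlist for this collection purpose."""
    allowed = ALLOWED_FIELDS[purpose]
    extra = set(payload) - allowed
    if extra:
        raise ValueError(f"Fields not permitted for '{purpose}': {sorted(extra)}")
    return payload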

3. Consent Management

DPDP requires explicit, informed consent that users can withdraw at any time. For AI systems, this means:

  • Users must know their data will be processed by AI
  • Users must be able to opt out of AI processing specifically
  • Withdrawal of consent must stop processing within a reasonable time

Technical requirement: A consent management layer that AI systems check before processing.

sequenceDiagram
    participant User
    participant API Gateway
    participant Consent Service
    participant AI System
    participant Fallback System

    User->>API Gateway: Request (with user_id)
    API Gateway->>Consent Service: Check AI consent status
    Consent Service-->>API Gateway: {ai_allowed: true/false}

    alt AI Consent Given
        API Gateway->>AI System: Process request
        AI System-->>User: AI-generated response
    else AI Consent Withdrawn
        API Gateway->>Fallback System: Process request
        Fallback System-->>User: Non-AI response
    end

4. Right to Erasure

Users can request deletion of their personal data. For AI systems, this is complicated:

  • What about data that was used to train a model?
  • What about embeddings derived from user content?
  • What about cached responses that contain user information?

Technical requirement: Data inventory that tracks where personal data flows, including derived data.

We’ve seen enterprises with user data in:

  • Primary databases
  • Analytics warehouses
  • Model training datasets
  • Vector databases (embeddings)
  • LLM conversation logs
  • CDN caches
  • Third-party analytics tools
  • Backup systems

A deletion request needs to propagate to all of these.
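
A workable approach is a single deletion orchestrator that fans the erasure request out to every registered store, including derived data. A minimal sketch - the store interface and names are illustrative, not a specific product's API:

from typing import Protocol

class PersonalDataStore(Protocol):
    name: str
    def delete_user_data(self, user_id: str) -> int: ...  # returns records removed

def process_erasure_request(user_id: str, stores: list[PersonalDataStore]) -> dict:
    """Fan the deletion out to every registered store and keep evidence for audits."""
    report = {}
    for store in stores:
        report[store.name] = store.delete_user_data(user_id)
    return report  # e.g. {"primary_db": 12, "vector_store": 40, "conversation_logs": 7}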

5. Data Localization

For “significant data fiduciaries” (large companies), the Act adds heightened obligations, and cross-border transfers can be restricted by government notification and by sector-specific rules that already require certain data to stay in India. For AI systems using cloud APIs, this is a direct problem:

  • Sending user queries to GPT-4 in the US? That’s a data transfer.
  • Using Claude via Anthropic’s US endpoints? Same issue.
  • Even sending anonymized queries might transfer personal data if the query itself contains personal information.

Technical requirement: Either self-host models in India, use India-region endpoints, or implement robust anonymization before any cross-border API call.

This is exactly why we built Sankalp, our sovereign AI gateway - to provide enterprise AI capabilities without data leaving Indian infrastructure.

Implementation Patterns That Work

Pattern 1: Consent Checks in the Request Pipeline

Build consent checking into your request pipeline, not as an afterthought:

class DPDPCompliantPipeline:
    """Consent check -> data minimization -> AI processing, with an audit trail.

    The consent, AI, and fallback services are injected; their interfaces are
    application-specific.
    """

    def __init__(self, consent_service, ai_service, fallback_service):
        self.consent = consent_service
        self.ai = ai_service
        self.fallback = fallback_service

    async def process(self, request: UserRequest) -> Response:
        # Check consent before any processing
        consent_status = await self.consent.get_status(
            user_id=request.user_id,
            purpose='ai_assistance'
        )

        # No valid consent: route to the non-AI fallback rather than failing silently
        if not consent_status.is_valid:
            return await self.fallback.process(request)

        # Minimize data before AI processing: strip every field not on the
        # allowlist for this purpose (see Data Minimization above)
        minimized_request = self.minimize_for_purpose(
            request,
            purpose='ai_assistance'
        )

        # Process with audit trail
        response = await self.ai.process(
            minimized_request,
            audit_context={
                'consent_id': consent_status.consent_id,
                'purpose': 'ai_assistance',
                'data_fields_used': minimized_request.field_names()
            }
        )

        return response

Pattern 2: Tiered Data Retention

Different data requires different retention periods. Implement automated cleanup:

flowchart TD
    A[Data Ingestion] --> B{Data Classification}
    B -->|Session Data| C[7-day retention]
    B -->|Transaction Data| D[7-year retention]
    B -->|Conversation Logs| E[90-day retention]
    B -->|Training Data| F[Purpose-specific]

    C --> G[Automated Deletion Job]
    E --> G
    F --> H{Purpose Still Valid?}
    H -->|No| G
    H -->|Yes| I[Retain with Re-consent Check]
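
The flowchart maps to a simple scheduled job: classify each record, compare its age against the retention period for its class, and delete what has expired. A minimal sketch with illustrative retention periods matching the diagram:

from datetime import datetime, timedelta

RETENTION_PERIODS = {
    "session_data": timedelta(days=7),
    "conversation_logs": timedelta(days=90),
    "transaction_data": timedelta(days=7 * 365),
}

def purge_expired_records(records, now=None):
    """Split records into kept vs. deleted based on their class's retention period."""
    now = now or datetime.now()
    kept, deleted = [], []
    for record in records:  # each record: {"class": ..., "created_at": datetime, ...}
        limit = RETENTION_PERIODS.get(record["class"])
        if limit and now - record["created_at"] > limit:
            deleted.append(record)
        else:
            kept.append(record)
    return kept, deleted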

Pattern 3: Anonymization Pipeline for External AI

If you must use external AI services, implement proper anonymization:

from uuid import uuid4


class AnonymizationPipeline:
    def __init__(self):
        # PIIDetector is a placeholder for whatever detector you use for Indian
        # PII patterns (names, Aadhaar, PAN, phone numbers, addresses)
        self.pii_detector = PIIDetector()

    def anonymize(self, text: str) -> tuple[str, dict]:
        """
        Replace PII with tokens, return mapping for re-identification
        """
        entities = self.pii_detector.detect(text)

        anonymized = text
        mapping = {}

        # Replace longer values first so an entity that is a substring of
        # another doesn't clobber it
        for entity in sorted(entities, key=lambda e: len(e.value), reverse=True):
            token = f"[{entity.type}_{uuid4().hex[:8]}]"
            anonymized = anonymized.replace(entity.value, token)
            mapping[token] = entity.value

        return anonymized, mapping

    def deanonymize(self, text: str, mapping: dict) -> str:
        """
        Restore original values from tokens
        """
        result = text
        for token, original in mapping.items():
            result = result.replace(token, original)
        return result

Example transformation:

Original: "Rahul Verma's Aadhaar number is 1234-5678-9012
           and he lives at 45 Nehru Street, Chennai"

Anonymized: "[NAME_a3b2c1d4]'s Aadhaar number is [AADHAAR_e5f6g7h8]
             and he lives at [ADDRESS_i9j0k1l2]"

The anonymized version can be sent to external AI. The response is de-anonymized locally.

Pattern 4: Audit Trail Architecture

DPDP requires you to demonstrate compliance. Build comprehensive audit trails:

from dataclasses import dataclass
from datetime import datetime

# Maximum retention (days) permitted by your retention policy for this data class
MAX_RETENTION = 365


@dataclass
class AIProcessingAudit:
    timestamp: datetime
    request_id: str
    user_id: str
    consent_id: str
    purpose: str

    # Data minimization evidence
    fields_available: list[str]
    fields_used: list[str]
    fields_excluded: list[str]

    # Processing details
    model_used: str
    model_location: str  # 'india' or external
    anonymization_applied: bool

    # Outcome
    processing_result: str
    data_retained: bool
    retention_period_days: int

    def to_compliance_report(self) -> dict:
        return {
            'minimization_ratio': len(self.fields_used) / len(self.fields_available),
            'data_sovereignty': self.model_location == 'india',
            'consent_valid': self.consent_id is not None,
            'retention_compliant': self.retention_period_days <= MAX_RETENTION
        }

Common Compliance Gaps We’ve Found

In our DPDP audits, these issues appear repeatedly:

Gap 1: Vector Database Blindspot

Teams remember to handle personal data in traditional databases but forget about vector stores. If you’re using RAG with user documents, those embeddings are derived from personal data and subject to DPDP.

Fix: Implement user-ID tagging on vector embeddings and deletion propagation to your vector store.
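
A minimal sketch of that fix - the vector store client and method names are illustrative, not a specific product's API: store the owning user_id as metadata on every embedding at ingestion time, then delete by metadata filter when an erasure request arrives.

def index_document(vector_store, doc_id: str, embedding: list[float], user_id: str) -> None:
    # Tag every embedding with the owning user so it can be found later
    vector_store.upsert(
        id=doc_id,
        vector=embedding,
        metadata={"user_id": user_id},
    )

def erase_user_embeddings(vector_store, user_id: str) -> None:
    # Deletion propagation: remove everything derived from this user's data
    vector_store.delete(filter={"user_id": user_id})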

Gap 2: Log File Leakage

AI system logs often contain user queries verbatim. These logs might be:

  • Shipped to external logging services (data transfer)
  • Retained indefinitely (retention violation)
  • Accessible to debugging tools without consent checks

Fix: Scrub PII from logs before storage, or implement log-level consent checking.
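
For the log-scrubbing approach, a standard-library logging filter can redact PII before anything is written or shipped. The patterns below are illustrative; in practice you would reuse the same detector as the anonymization pipeline.

import logging
import re

# Illustrative patterns - swap in your PII detector
PII_PATTERNS = [
    (re.compile(r"\b\d{4}-\d{4}-\d{4}\b"), "[AADHAAR]"),
    (re.compile(r"\+91[-\s]?\d{10}\b"), "[PHONE]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
]

class PIIScrubbingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        for pattern, replacement in PII_PATTERNS:
            message = pattern.sub(replacement, message)
        record.msg, record.args = message, ()  # store only the scrubbed message
        return True

logging.getLogger("ai_system").addFilter(PIIScrubbingFilter())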

Gap 3: Training Data Without a Consent Trail

If you fine-tune models on user data, you need to maintain a consent trail. We’ve seen teams use conversation logs for training without checking whether users consented to that specific purpose.

Fix: Tag training data with consent IDs and filter during dataset creation.
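
A minimal sketch of that filter - the record fields and consent service interface are illustrative: every candidate example carries the consent ID it was collected under, and only examples whose consent covers the training purpose make it into the dataset.

def build_training_dataset(candidates, consent_service):
    """Keep only examples whose consent covers the model-training purpose."""
    dataset = []
    for example in candidates:  # each example: {"consent_id": ..., "text": ...}
        if consent_service.is_valid(example["consent_id"], purpose="model_training"):
            dataset.append(example)
    return dataset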

Gap 4: Third-Party SDK Data Collection

Many AI SDKs collect telemetry by default. This data might include:

  • User queries (for “quality improvement”)
  • Session identifiers
  • Device information

Fix: Audit all third-party dependencies for data collection. Disable telemetry or ensure it’s covered by consent.

The Compliance Architecture

Here’s what a DPDP-compliant AI architecture looks like:

flowchart TB
    subgraph "User Interaction Layer"
        A[User Request] --> B[API Gateway]
        B --> C{Consent Check}
    end

    subgraph "Consent Management"
        C --> D[Consent Service]
        D --> E[(Consent Database)]
    end

    subgraph "Data Processing Layer"
        C -->|Consent Valid| F[Data Minimizer]
        F --> G{Data Location Check}
        G -->|India Data| H[India AI Processing]
        G -->|Needs Anonymization| I[Anonymization Pipeline]
        I --> J[External AI Service]
        J --> K[De-anonymization]
    end

    subgraph "Audit & Compliance"
        H --> L[Audit Logger]
        K --> L
        L --> M[(Audit Database)]
        M --> N[Compliance Reports]
    end

    subgraph "Data Lifecycle"
        O[Retention Manager] --> P{Check Retention Rules}
        P --> Q[Automated Deletion]
        Q --> R[(Primary DB)]
        Q --> S[(Vector Store)]
        Q --> T[(Log Storage)]
    end

Getting Help

DPDP compliance isn’t a one-time project - it’s an ongoing operational requirement. Your AI systems need to be compliant by design, not retrofitted.

At Rotavision, we offer:

  • DPDP compliance audits for existing AI systems
  • Sankalp - sovereign AI gateway with built-in compliance controls
  • Vishwas - AI trust platform with consent and audit trail management
  • Implementation services to build compliant AI architectures

The penalties for DPDP non-compliance can reach Rs. 250 crore. More importantly, compliance builds the trust that makes AI adoption sustainable.

Contact us if you need help assessing your AI systems against DPDP requirements.