Here’s a real customer service interaction from an Indian bank:

User: Mera account mein kuch problem hai, last week se transactions nahi ho rahi (There's some problem in my account, transactions haven't been going through since last week)

Bot: I’m sorry, I didn’t understand that. Could you please rephrase in Hindi or English?

User: Bhai Hindi mein hi toh bol raha hoon (Brother, I AM speaking in Hindi)

Bot: I’m sorry, I didn’t understand that. Could you please rephrase in Hindi or English?

The user IS speaking Hindi. But they’re also using English words (account, problem, transactions). And the bot, despite supporting both Hindi and English, can’t handle the combination.

This is the code-mixing problem. And it’s breaking AI systems across India.

How Indians Actually Communicate

Code-mixing isn’t a mistake or laziness. It’s how multilingual people naturally communicate.

A study of urban Indian communication found:

  • 67% of conversations between educated Indians involve code-mixing
  • The average speaker switches languages 3-4 times per sentence
  • Technical topics (banking, healthcare, technology) have higher mixing rates

Common patterns:

Hinglish (Hindi + English):

  • “Yeh feature bahut useful hai” (This feature is very useful)
  • “Please mujhe tomorrow call karna” (Please call me tomorrow)
  • “Loan ka interest rate kya hai?” (What’s the loan interest rate?)

Tanglish (Tamil + English):

  • “Account-la problem irukku” (There’s a problem in the account)
  • “Password reset pannu” (Reset the password)

Benglish (Bengali + English):

  • “Form-ta fill up korte hobe” (The form needs to be filled up)
  • “Transaction failed hoye geche” (The transaction has failed)

Why Standard Approaches Fail

Approach 1: Language Detection + Routing

The naive approach: detect the language, route to the appropriate model.

def handle_query(text):
    language = detect_language(text)  # assumes the whole query is in one language
    if language == 'hindi':
        return hindi_model.process(text)
    elif language == 'english':
        return english_model.process(text)
    else:
        return "Please use Hindi or English"

This fails because code-mixed text doesn’t trigger either detector reliably. “Mera balance check karo” might be detected as:

  • Hindi (by word count)
  • English (by character patterns)
  • Unknown (by confidence threshold)

And even if you detect it, which model do you route to? The Hindi model doesn’t understand “balance” and “check” in this context. The English model doesn’t understand “mera” and “karo.”
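
You can reproduce the ambiguity with a toy detector. A minimal sketch, assuming small illustrative word lists (not a real lexicon) - a word-count vote ties exactly on this query:

# Toy detector; HINDI_WORDS and ENGLISH_WORDS are illustrative assumptions
HINDI_WORDS = {'mera', 'karo', 'hai', 'mein', 'kya'}
ENGLISH_WORDS = {'balance', 'check', 'account', 'transaction'}

def word_count_vote(text: str) -> str:
    words = text.lower().split()
    hindi = sum(w in HINDI_WORDS for w in words)
    english = sum(w in ENGLISH_WORDS for w in words)
    if hindi > english:
        return 'hindi'
    if english > hindi:
        return 'english'
    return 'unknown'

print(word_count_vote('Mera balance check karo'))  # 'unknown' - two votes each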

Approach 2: Translation First

Translate everything to English, then process with your English model.

def handle_query(text):
    english_text = translate_to_english(text)
    return english_model.process(english_text)

Problems:

  • Translation adds latency (200-500ms typically)
  • Translation errors compound with model errors
  • Code-mixed text translates poorly (“Mera balance check karo” might become “Do my balance check” - grammatically weird)
  • You lose the ability to respond in the user’s natural register

Approach 3: Multilingual Models

Use a single multilingual model like mBERT, XLM-R, or a multilingual LLM.

This is better but still struggles because:

  • Training data for code-mixed text is scarce
  • Models learn languages separately, not the transitions between them
  • Romanized input (Hindi written in Latin characters) often fails outright

We evaluated several multilingual models on code-mixed banking queries:

Model            Pure Hindi   Pure English   Code-Mixed
mBERT            78%          91%            54%
XLM-R Large      82%          93%            61%
GPT-4            89%          96%            72%
Our fine-tuned   91%          94%            87%
The performance drop on code-mixed text is consistent across models. Generic multilingual training doesn’t solve this.

Understanding Code-Mixing Patterns

Code-mixing isn’t random. There are predictable patterns we can exploit:

Pattern 1: Matrix Language + Embedded Language

In most code-mixed speech, one language provides the grammatical structure (matrix language) while the other contributes words or phrases (embedded language).

"Mera account mein kuch problem hai"
 |      |      |    |     |      |
 HI    EN     HI   HI    EN     HI

Matrix: Hindi (provides grammar)
Embedded: English (provides domain terms)

Understanding this helps with parsing. The verb conjugation (“hai”) follows Hindi grammar even though “problem” is English.
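
It also suggests a cheap heuristic for identifying the matrix language from word-level tags - a sketch only; real systems also weight function words and verb morphology:

from collections import Counter

def matrix_language(word_tags: list[str]) -> str:
    # The majority word-level tag approximates the matrix language,
    # since the matrix language supplies function words and inflection
    return Counter(word_tags).most_common(1)[0][0]

print(matrix_language(['HI', 'EN', 'HI', 'HI', 'EN', 'HI']))  # 'HI'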

Pattern 2: Domain-Specific Borrowing

Technical domains consistently borrow specific terms:

Domain       Commonly Borrowed Terms
Banking      account, balance, transaction, transfer, loan, EMI
Healthcare   doctor, appointment, test, report, medicine
Technology   phone, recharge, data, internet, password
Legal        court, case, advocate, document, petition

These aren’t translated because they’re standard usage. Saying “khata” instead of “account” would sound formal and unnatural.

Pattern 3: Script Variation

The same code-mixed sentence might appear in:

  • Devanagari: मेरा balance check करो
  • Roman: Mera balance check karo
  • Mixed: मेरा balance चेक करो

Your system needs to handle all three.
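
Script detection itself is tractable because Devanagari occupies a fixed Unicode block (U+0900 to U+097F). A minimal sketch that splits input into script spans for separate handling:

import re

DEVANAGARI = re.compile(r'[\u0900-\u097F]+')

def script_spans(text: str) -> list[tuple[str, str]]:
    # Split text into (script, span) pairs; each span can then be
    # transliterated or tokenized with script-appropriate tools
    spans, last = [], 0
    for match in DEVANAGARI.finditer(text):
        if match.start() > last:
            spans.append(('latin', text[last:match.start()]))
        spans.append(('devanagari', match.group()))
        last = match.end()
    if last < len(text):
        spans.append(('latin', text[last:]))
    return spans

print(script_spans('मेरा balance चेक करो'))
# [('devanagari', 'मेरा'), ('latin', ' balance '), ('devanagari', 'चेक'),
#  ('latin', ' '), ('devanagari', 'करो')]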

What Actually Works

Solution 1: Code-Mix Aware Tokenization

Standard tokenizers treat code-mixed text poorly. We need tokenization that understands mixing patterns:

class CodeMixTokenizer:
    def __init__(self):
        self.hindi_tokenizer = HindiTokenizer()
        self.english_tokenizer = EnglishTokenizer()
        self.switch_detector = LanguageSwitchDetector()

    def tokenize(self, text: str) -> list[Token]:
        # First, detect language spans
        spans = self.switch_detector.detect_spans(text)

        tokens = []
        for span in spans:
            if span.language == 'hindi':
                span_tokens = self.hindi_tokenizer.tokenize(span.text)
            else:
                span_tokens = self.english_tokenizer.tokenize(span.text)

            # Mark language boundaries
            if tokens and tokens[-1].language != span.language:
                tokens.append(Token('[SWITCH]', 'boundary'))

            tokens.extend(span_tokens)

        return tokens

The [SWITCH] tokens help the model learn mixing patterns explicitly.
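
The helper classes above are assumed rather than real libraries. Here's a toy stand-in for LanguageSwitchDetector, word-list based purely for illustration - production detectors use character n-gram or subword classifiers:

from dataclasses import dataclass

@dataclass
class Span:
    language: str
    text: str

HINDI_WORDS = {'mera', 'mein', 'kuch', 'hai', 'karo'}

class LanguageSwitchDetector:
    def detect_spans(self, text: str) -> list[Span]:
        # Tag each word by lookup, then merge adjacent same-language words
        spans = []
        for word in text.split():
            lang = 'hindi' if word.lower() in HINDI_WORDS else 'english'
            if spans and spans[-1].language == lang:
                spans[-1].text += ' ' + word
            else:
                spans.append(Span(lang, word))
        return spans

detector = LanguageSwitchDetector()
print(detector.detect_spans('Mera account mein kuch problem hai'))
# [Span(language='hindi', text='Mera'), Span(language='english', text='account'),
#  Span(language='hindi', text='mein kuch'), Span(language='english', text='problem'),
#  Span(language='hindi', text='hai')]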

Solution 2: Fine-Tuning on Real Code-Mixed Data

Generic multilingual pretraining isn’t enough. You need fine-tuning on actual code-mixed data from your domain.

Data collection strategies:

  1. Chat logs: Your existing customer interactions (with PII removed)
  2. Social media: Twitter/X data from Indian users discussing relevant topics
  3. Synthetic generation: Generate code-mixed variants of your existing data

For synthetic generation:

def generate_codemixed_variants(english_text: str, domain: str) -> list[str]:
    """
    Generate realistic code-mixed variants of an English sentence
    """
    variants = []

    # Strategy 1: Replace domain terms with English, rest in Hindi
    hindi_with_terms = translate_preserving_terms(english_text, domain)
    variants.append(hindi_with_terms)

    # Strategy 2: Romanize the Hindi portions
    romanized = romanize_hindi(hindi_with_terms)
    variants.append(romanized)

    # Strategy 3: Add discourse markers in Hindi
    with_markers = add_hindi_discourse_markers(english_text)
    variants.append(with_markers)

    return variants

Example outputs for “What is my account balance?”:

  1. “Mera account balance kya hai?”
  2. “Account mein kitna paisa hai?”
  3. “Bhai account ka balance bata do”

Solution 3: Intent Normalization Layer

Before classification, normalize code-mixed input to a canonical form:

flowchart LR
    A[Raw Input] --> B[Script Normalizer]
    B --> C[Language Span Detector]
    C --> D[Domain Term Identifier]
    D --> E[Intent Normalizer]
    E --> F[Normalized Intent]

    subgraph "Example"
        G["मेरा balance चेक करो"] --> H["[HI]mera [EN]balance [EN]check [HI]karo"]
        H --> I["INTENT: check_balance, ENTITY: self"]
    end

The normalized form is language-agnostic, making downstream processing simpler.
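
To make the last two stages concrete, here's a toy sketch - the term table and intent rules are illustrative assumptions, not a real schema:

DOMAIN_TERMS = {'balance': 'BALANCE', 'check': 'CHECK', 'transfer': 'TRANSFER'}
INTENT_RULES = {
    frozenset({'CHECK', 'BALANCE'}): 'check_balance',
    frozenset({'TRANSFER'}): 'transfer_funds',
}

def normalize_intent(tagged_words: list[tuple[str, str]]) -> dict:
    # tagged_words: (word, language_tag) pairs from the span detector,
    # after script normalization - so Devanagari 'चेक' arrives as 'check'
    concepts = {DOMAIN_TERMS[w.lower()] for w, _ in tagged_words
                if w.lower() in DOMAIN_TERMS}
    for required, intent in INTENT_RULES.items():
        if required <= concepts:
            return {'intent': intent, 'entity': 'self'}
    return {'intent': 'unknown'}

print(normalize_intent([('mera', 'HI'), ('balance', 'EN'),
                        ('check', 'EN'), ('karo', 'HI')]))
# {'intent': 'check_balance', 'entity': 'self'}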

Solution 4: Response Generation in Matching Register

If a user speaks Hinglish, respond in Hinglish. Matching the user’s register improves perceived quality and understanding.

def generate_response(intent: Intent, user_register: Register) -> str:
    base_response = get_response_template(intent)

    if user_register == Register.HINGLISH:
        return hinglify(base_response, formality=user_register.formality)
    elif user_register == Register.FORMAL_HINDI:
        return translate_formal_hindi(base_response)
    else:
        return base_response  # English

def hinglify(english_text: str, formality: str) -> str:
    """
    Convert English response to Hinglish matching user's style
    """
    # Keep domain terms in English
    # Convert common phrases to Hindi
    # Match formality level (aap vs tum vs tu)
    ...

Example:

  • User: “Balance check karo bhai”
  • Response: “Aapka balance ₹12,543 hai. Aur kuch help chahiye?”

Not: “Your current account balance is ₹12,543. Is there anything else I can help you with?”
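
The user_register input above has to come from somewhere. One heuristic sketch - the marker list and threshold are assumptions; a trained classifier would do better:

HINGLISH_MARKERS = {'karo', 'hai', 'bhai', 'mera', 'nahi', 'kya', 'chahiye'}

def detect_register(text: str) -> str:
    # Share of romanized-Hindi marker words is a rough register signal
    words = [w.lower() for w in text.split()]
    ratio = sum(w in HINGLISH_MARKERS for w in words) / max(len(words), 1)
    return 'HINGLISH' if ratio >= 0.3 else 'ENGLISH'

print(detect_register('Balance check karo bhai'))  # 'HINGLISH' (ratio 0.5)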

Evaluation Framework

Standard NLU metrics don’t capture code-mixing performance. We’ve developed a more comprehensive evaluation:

class CodeMixEvaluator:
    def evaluate(self, model, test_set) -> dict:
        results = {
            'pure_hindi': [],
            'pure_english': [],
            'code_mixed_light': [],    # <20% mixing
            'code_mixed_moderate': [],  # 20-50% mixing
            'code_mixed_heavy': [],     # >50% mixing
            'romanized': [],
        }

        for example in test_set:
            category = self.categorize(example)
            prediction = model.predict(example.text)
            correct = prediction == example.label
            results[category].append(correct)

        return {
            # Guard against categories with no test examples
            category: sum(scores) / len(scores) if scores else None
            for category, scores in results.items()
        }

This reveals where your model actually fails. Most models show acceptable pure-language performance but collapse on heavy code-mixing.
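
The categorize step needs a mixing measure. One option (an assumption here; the Code-Mixing Index is a common alternative) is switch-point density - the fraction of adjacent word pairs where the language changes:

def mixing_ratio(word_tags: list[str]) -> float:
    # Fraction of adjacent word pairs where the language switches
    if len(word_tags) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(word_tags, word_tags[1:]))
    return switches / (len(word_tags) - 1)

def categorize_by_mixing(word_tags: list[str]) -> str:
    ratio = mixing_ratio(word_tags)
    if ratio == 0:
        return 'pure_hindi' if 'HI' in word_tags else 'pure_english'
    if ratio < 0.2:
        return 'code_mixed_light'
    if ratio <= 0.5:
        return 'code_mixed_moderate'
    return 'code_mixed_heavy'

# 'Mera balance check karo' -> HI EN EN HI -> 2 switches / 3 pairs ≈ 0.67
print(categorize_by_mixing(['HI', 'EN', 'EN', 'HI']))  # 'code_mixed_heavy'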

The Business Case

Why does this matter beyond linguistic curiosity?

Customer service: Banks report 30-40% of chatbot interactions involve code-mixing. If your bot fails on these, you’re frustrating a third of your customers.

Voice assistants: Voice is even more code-mixed than text. Users don’t consciously switch languages - they speak naturally.

Internal tools: Employee-facing AI tools need to understand how employees actually communicate, not how training data looks.

Inclusion: Forcing users into “pure” Hindi or English excludes those who are more comfortable mixing. This often correlates with geography and education - exactly the populations you might want to serve better.

What We’re Building

At Rotavision, we’ve invested heavily in code-mixing capabilities across our products:

Vishwas includes evaluation metrics specifically for code-mixed text - so you can measure whether your AI systems work for real Indian users.

Dastavez handles documents that mix scripts and languages - common in Indian bureaucratic paperwork where forms have English headers but Hindi content.

Our language evaluation frameworks test models specifically on code-mixed competence, not just pure-language benchmarks that look good in demos but fail in production.

Getting Started

If you’re building AI for Indian users:

  1. Audit your current performance: What percentage of your failed interactions involve code-mixing? You might be surprised. (A quick audit sketch follows this list.)

  2. Collect real data: Don’t rely on synthetic or pure-language training data. Your users’ actual language patterns are the ground truth.

  3. Test appropriately: Add code-mixed test cases to your evaluation suite. Test across scripts (Devanagari AND Romanized).

  4. Match user register: Responding in formal English to Hinglish input feels robotic and distant.

  5. Measure separately: Track code-mixed performance as a distinct metric, not averaged into overall accuracy.
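
For step 1, a quick audit sketch - it reuses the switch-point mixing measure from the evaluation section and assumes a word-level language tagger is available:

def mixing_ratio(word_tags: list[str]) -> float:
    if len(word_tags) < 2:
        return 0.0
    return sum(a != b for a, b in zip(word_tags, word_tags[1:])) / (len(word_tags) - 1)

def audit_code_mixing(failed_messages: list[str], tag_words) -> float:
    # tag_words: callable mapping text -> word-level language tags (assumed)
    # Returns the share of failed interactions involving any code-mixing
    mixed = sum(1 for text in failed_messages
                if mixing_ratio(tag_words(text)) > 0)
    return mixed / max(len(failed_messages), 1)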

The future of AI in India requires systems that understand how Indians actually communicate - not how linguistics textbooks describe language boundaries.

Let’s talk if you’re building AI systems for Indian users and want to get code-mixing right.