How long does it take to get started with RUNO?

Most firms are fully operational within 3-5 business days. We handle data migration, module configuration, and team training as part of onboarding.

Which African jurisdictions does RUNO support?

RUNO currently covers Nigeria, Ghana, Kenya, South Africa, the United Kingdom, and the United States, with ongoing expansion into Tanzania, Rwanda, and the OHADA region.

Do you offer a free trial?

Yes. We offer a 14-day free trial with full access to all modules. No credit card required.

Privacy & Compliance

GDPR-Compliant Document Redaction: A Comprehensive Guide for Legal Professionals

GDPR's data minimisation principle fundamentally changes how legal teams handle documents containing personal data. When documents must be shared—in litigation, regulatory requests, or commercial transactions—proper redaction ensures compliance while preserving document utility. This guide provides practical techniques for GDPR-compliant redaction across document types.

RUNO Editorial

January 31, 2026 · 21 min read · 557 views

A law firm preparing discovery production reviewed 50,000 documents manually for personal data. They redacted names, addresses, and obvious identifiers. After production, they believed they had satisfied their GDPR obligations. Six weeks later, opposing counsel notified them that 127 documents contained unredacted national insurance numbers in embedded metadata. The subsequent ICO investigation resulted in a formal reprimand, required remediation, and reputational damage that far exceeded what proper redaction would have cost.

GDPR-compliant redaction isn't just about black boxes over visible text. It requires systematic identification of personal data wherever it exists—in text, in metadata, in embedded objects, in image layers—and applying appropriate techniques that truly remove the data, not just obscure it visually.

This comprehensive guide provides the practical knowledge legal teams need to implement proper GDPR-compliant redaction.

Document redaction process showing data protection measures — Proper redaction requires more than visual obscuring of personal data

Understanding What Must Be Redacted

Personal Data Under GDPR

Article 4(1) of GDPR defines personal data as "any information relating to an identified or identifiable natural person." This deliberately broad definition encompasses far more than obvious identifiers.

Direct Identifiers - Data that identifies individuals on its own:

Full names, first names, surnames
National identification numbers (NI numbers, passport numbers, driving licence numbers)
Postal and email addresses
Phone numbers (mobile, landline, fax)
Financial account numbers (bank accounts, credit cards)
Government-issued identifiers (NHS numbers, tax references)

Indirect Identifiers - Data that identifies when combined with other information:

Job titles and positions (especially in small organisations)
Employee and membership numbers
IP addresses and device identifiers
Location data and travel patterns
Unique characteristics or descriptions
Vehicle registration numbers

Special Category Data - Requiring heightened protection under Article 9:

Racial or ethnic origin
Political opinions
Religious or philosophical beliefs
Trade union membership
Genetic and biometric data
Health data
Sex life and sexual orientation

The Redaction Decision Framework

Not all personal data requires redaction in every context. The analysis should consider:

Is processing lawful? If a lawful basis exists for including the personal data in the disclosed document, redaction may not be required. Article 6(1)(c) (legal obligation) or Article 6(1)(f) (legitimate interests) may apply to litigation disclosures.

Is the data necessary? Data minimisation requires processing only what's necessary for the purpose. Even if lawful to include, unnecessary personal data should be redacted.

Who will receive the document? Internal circulation may justify different treatment than disclosure to adverse parties or public production. Context matters.

What agreements exist? Data processing agreements, protective orders, or confidentiality undertakings may permit or require certain approaches to personal data in documents.

Legal professionals reviewing documents for personal data — Systematic assessment determines what personal data requires redaction

Redaction Techniques by Data Type

Text Redaction

Black Box Redaction

The traditional approach—covering text with opaque boxes—seems straightforward but contains a critical trap:

The Critical Requirement: The underlying text must be removed, not just covered. Many PDF tools create visual overlays that can be removed, revealing the "redacted" text beneath.

Verification Protocol:

After applying redaction, attempt to copy text from the redacted area
Use PDF inspection tools to examine the document structure
Search the document for terms you believe were redacted
If any text is accessible, redaction is incomplete
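The copy-and-search checks above can be partly automated. The following is a minimal sketch, assuming the text has already been extracted from the redacted file with whatever PDF tool the team uses; the function name `find_redaction_leaks` is hypothetical and is not part of any particular product.

```python
import re

def find_redaction_leaks(extracted_text, redacted_terms):
    """Return the redacted terms that still appear in text extracted from a document."""
    # Collapse whitespace so a term split across a line break is still found.
    normalised = re.sub(r"\s+", " ", extracted_text).casefold()
    leaks = []
    for term in redacted_terms:
        needle = re.sub(r"\s+", " ", term).casefold()
        if needle and needle in normalised:
            leaks.append(term)
    return leaks

# Text as it might come back from a copy-paste or text-extraction pass.
extracted = "[NAME REDACTED] wrote to John\nSmith about the claim."
print(find_redaction_leaks(extracted, ["John Smith", "AB123456C"]))  # ['John Smith']
```

An empty result only shows that the listed terms are absent from the extracted text layer. It says nothing about metadata or about text that exists only as pixels in an image.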

Character Replacement

Replace personal data with descriptive placeholders:

Original: "John Smith, NI number AB123456C, contacted the department on 15 March 2024."

Redacted: "[NAME REDACTED], NI number [NI NUMBER REDACTED], contacted the department on 15 March 2024."

Advantages:

Preserves document readability and context
Makes clear what type of data was redacted
Enables verification of redaction completeness

Best Practice: Use consistent placeholders across all documents (e.g., always "[NAME REDACTED]", not sometimes "NAME WITHHELD" or "[REDACTED NAME]").
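One way to keep placeholders consistent is to define them once and apply them programmatically. This is an illustrative sketch only: the `PLACEHOLDERS` labels and the `replace_with_placeholders` helper are assumptions made for the example, and it operates on plain text rather than on a PDF.

```python
import re

# Hypothetical placeholder scheme: one fixed label per data type.
PLACEHOLDERS = {
    "name": "[NAME REDACTED]",
    "ni_number": "[NI NUMBER REDACTED]",
}

def replace_with_placeholders(text, findings):
    """Replace each known value with the placeholder for its data type.

    `findings` maps a literal value found in the text to its data type.
    """
    # Handle longer values first so "John Smith" is replaced before "John".
    for value in sorted(findings, key=len, reverse=True):
        label = PLACEHOLDERS[findings[value]]
        text = re.sub(re.escape(value), lambda _m: label, text, flags=re.IGNORECASE)
    return text

original = "John Smith, NI number AB123456C, contacted the department on 15 March 2024."
findings = {"John Smith": "name", "AB123456C": "ni_number"}
print(replace_with_placeholders(original, findings))
```

Because every label comes from a single table, a reviewer cannot accidentally introduce a second wording for the same data type.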

Pseudonymisation

Replace identifiers with consistent pseudonyms throughout documents:

"John Smith" becomes "Individual A" in all documents where he appears.

Advantages:

Preserves relationships and narrative flow
Enables analysis of patterns and communications
Documents remain coherent and usable

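Critical Requirement: Maintain the mapping table separately and securely. If the mapping is compromised, pseudonymisation provides no protection. Pseudonymised data also remains personal data under GDPR for anyone who can re-identify the individuals, so treat it as a safeguard rather than a substitute for redaction.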
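A consistent pseudonym scheme can be sketched as a small lookup that hands out labels in order of first appearance. The `Pseudonymiser` class below is hypothetical; in practice its `mapping` would be persisted in a separate, access-controlled store rather than held in memory.

```python
import string

class Pseudonymiser:
    """Assign stable labels ("Individual A", "Individual B", ...) to names."""

    def __init__(self):
        # Real name -> pseudonym. This is the mapping table to keep separate and secure.
        self.mapping = {}

    def pseudonym_for(self, name):
        # Normalise spacing and case so variants of a name share one pseudonym.
        key = " ".join(name.split()).casefold()
        if key not in self.mapping:
            self.mapping[key] = f"Individual {self._label(len(self.mapping))}"
        return self.mapping[key]

    @staticmethod
    def _label(index):
        # 0 -> A, 25 -> Z, 26 -> AA, and so on.
        letters = string.ascii_uppercase
        label = ""
        index += 1
        while index > 0:
            index, remainder = divmod(index - 1, 26)
            label = letters[remainder] + label
        return label

p = Pseudonymiser()
print(p.pseudonym_for("John Smith"))   # Individual A
print(p.pseudonym_for("Jane Doe"))     # Individual B
print(p.pseudonym_for("john  smith"))  # Individual A
```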

Metadata Redaction

Documents contain extensive metadata that frequently includes personal data invisible to casual review:

Metadata Category	Personal Data Risk	Remediation Approach
Author fields	Creator names, login IDs	Clear or replace with generic values
Last modified by	Editor names, user accounts	Clear or replace
File paths	May contain usernames in paths	Remove or sanitise
Revision history	Names of all editors, deleted content	Remove revision history entirely
Comments/annotations	Commenter names, potentially sensitive content	Remove all comments
Email headers	Sender/recipient addresses, routing information	Targeted redaction of personal addresses

Redaction Workflow:

Use metadata removal tools before visual redaction
Flatten documents to remove layers (tracked changes, comments)
Consider conversion to clean formats (e.g., printed to PDF/A)
Verify with metadata inspection tools after processing
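The remediation approaches in the table can be expressed as simple rules over extracted metadata. The sketch below assumes the metadata has already been read into a dictionary by a separate tool; the field names (`author`, `last_modified_by`, `file_path` and so on) are placeholders, since real names vary by file format.

```python
import re

# Hypothetical field names; real ones depend on the file format and extraction tool.
FIELDS_TO_CLEAR = {"author", "last_modified_by", "comments", "revision_history"}

def sanitise_path(path):
    """Replace the username segment of common Windows and Unix home paths."""
    path = re.sub(r"(?i)([A-Z]:\\Users\\)[^\\]+", r"\g<1>user", path)
    return re.sub(r"^(/(?:home|Users)/)[^/]+", r"\g<1>user", path)

def sanitise_metadata(metadata):
    """Return a copy of the metadata with personal fields cleared or sanitised."""
    cleaned = {}
    for field, value in metadata.items():
        if field in FIELDS_TO_CLEAR:
            cleaned[field] = ""
        elif field == "file_path":
            cleaned[field] = sanitise_path(value)
        else:
            cleaned[field] = value
    return cleaned

meta = {
    "author": "J. Smith",
    "file_path": r"C:\Users\jsmith\Documents\claim.docx",
    "title": "Claim summary",
}
print(sanitise_metadata(meta))
```

Writing the cleaned values back into the file is format-specific and is deliberately left out of this sketch.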

Image Redaction

Images within documents—photographs, scanned pages, embedded graphics—require special handling:

Text in Images: OCR-derived text can be redacted from searchable layers, but text visible in the image itself requires image editing to obscure.

Facial Images: May require:

Complete obscuring (black box over face)
Blurring or pixelation
Consideration of whether context makes individuals identifiable even with face obscured

EXIF Data: Photograph metadata often contains:

GPS coordinates of where photo was taken
Camera serial numbers (potentially identifying)
Timestamps and date information
Device identifiers

Digital image metadata and EXIF data being reviewed — Image files contain hidden metadata that may include personal information

Bulk Redaction: Scaling the Process

Pattern-Based Redaction

For large document sets, pattern matching enables systematic redaction of structured personal data:

Regular Expression Patterns:

Data Type	UK Pattern	Notes
Email addresses	[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}	Catches most formats
UK phone numbers	(\+44|0)\s?[1-9]\d{2,4}\s?\d{3,4}\s?\d{3,4}	Common spacing patterns; misses some groupings (e.g. 020 7946 0958)
NI numbers	[A-CEGHJ-PR-TW-Z]{2}\d{6}[A-D]	Standard format, uppercase, no spaces
UK postcodes	[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}	Standard formats, uppercase; may also match invalid codes
Credit cards	\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}	16-digit numbers with optional separators
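As a rough illustration of how such patterns might be applied to plain text, the sketch below uses three of the expressions from the table. The word boundaries (`\b`), the placeholder wording and the `redact_patterns` name are additions made for this example.

```python
import re

# Patterns from the table above, with word boundaries added as an assumption
# to avoid matching inside longer tokens.
PATTERNS = {
    "EMAIL": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    "NI NUMBER": re.compile(r"\b[A-CEGHJ-PR-TW-Z]{2}\d{6}[A-D]\b"),
    "POSTCODE": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b"),
}

def redact_patterns(text):
    """Replace every pattern match with a typed placeholder and count the hits."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, hits = pattern.subn(f"[{label} REDACTED]", text)
        counts[label] = hits
    return text, counts

sample = "Contact j.smith@example.com, NI AB123456C, at SW1A 1AA."
redacted, counts = redact_patterns(sample)
print(redacted)
print(counts)
```

The per-type counts give reviewers a quick sanity check: a production of thousands of HR documents with zero NI number hits would be worth a second look.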

Limitations of Pattern Matching:

Patterns miss variations and errors in source data
Over-inclusive patterns generate false positives requiring manual review
Cannot identify personal data that doesn't match patterns (names, descriptions)
International formats may differ significantly
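False positives from an over-inclusive pattern can sometimes be reduced with a secondary check. For payment cards, one common option is the Luhn checksum, sketched below alongside the 16-digit pattern from the table. The function names are illustrative, and a failed checksum should send a match to manual review rather than clear it automatically.

```python
import re

CARD_PATTERN = re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b")

def passes_luhn(number):
    """Return True if the digits in `number` satisfy the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Double every second digit, counting from the right.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return bool(digits) and total % 10 == 0

def likely_card_numbers(text):
    """Return pattern matches that also pass the Luhn check."""
    return [m.group() for m in CARD_PATTERN.finditer(text) if passes_luhn(m.group())]

text = "Ref 1234 5678 9012 3456, card 4111 1111 1111 1111."
print(likely_card_numbers(text))  # ['4111 1111 1111 1111']
```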

Named Entity Recognition (NER)

AI-based identification addresses pattern matching limitations:

Capabilities:

Name identification across formats and cultural origins (accuracy depends on the model and its training data)
Address recognition beyond postcode patterns
Contextual identification (job titles that effectively identify individuals)
Relationship inference (identifying data through context)

Advantages Over Patterns:

Catches variations that patterns miss
Identifies personal data based on meaning, not just format
Can learn from reviewer corrections, improving over time where the system supports retraining
Handles multiple languages in multilingual document sets

The Hybrid Approach

Effective bulk redaction combines techniques:

Pattern Matching: Automatically redact known identifier formats (NI numbers, account numbers, postcodes)
Named Entity Recognition: Identify names, addresses, organisations, and other contextual personal data
Dictionary Matching: Organisation-specific terms (project code names, internal identifiers)
Human Review: Verify AI suggestions, address edge cases, make judgment calls
Quality Control: Statistical sampling of processed documents to verify accuracy
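The hybrid stages can be sketched as a small pipeline that gathers findings from each technique and decides which ones need a reviewer. Everything here is an assumption made for illustration: the `Finding` structure, the routing rule (pattern hits applied automatically, all other findings reviewed), and the `ner` argument, which stands in for whichever entity recognition model is in use.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    start: int
    end: int
    label: str
    source: str  # "pattern", "dictionary" or "ner"

NI_PATTERN = re.compile(r"\b[A-CEGHJ-PR-TW-Z]{2}\d{6}[A-D]\b")

def pattern_findings(text):
    return [Finding(m.start(), m.end(), "NI NUMBER", "pattern")
            for m in NI_PATTERN.finditer(text)]

def dictionary_findings(text, terms):
    found = []
    for term in terms:
        for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            found.append(Finding(m.start(), m.end(), "INTERNAL TERM", "dictionary"))
    return found

def collect_findings(text, terms, ner=None):
    """Gather findings from every technique, ordered by position in the text."""
    findings = pattern_findings(text) + dictionary_findings(text, terms)
    if ner is not None:
        findings += ner(text)  # any callable that returns a list of Finding objects
    return sorted(findings, key=lambda f: (f.start, -f.end))

def split_for_review(findings):
    """Pattern hits are applied automatically; everything else goes to a reviewer."""
    auto = [f for f in findings if f.source == "pattern"]
    review = [f for f in findings if f.source != "pattern"]
    return auto, review

text = "Project Falcon: AB123456C spoke to the claims team."
auto, review = split_for_review(collect_findings(text, ["Project Falcon"]))
print(auto)
print(review)
```

Keeping the source of each finding makes the later quality-control step easier, because error rates can be reported per technique.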

AI-powered document processing and redaction system — Modern redaction platforms combine pattern matching with AI-powered entity recognition

Verification and Quality Control

Redaction Verification Protocol

Text Layer Verification:

Select all text and copy—does any supposedly redacted text appear?
Search for patterns that should have been redacted (phone formats, email formats)
Search for specific terms from the redaction log

Metadata Verification:

Run metadata extraction tools on processed documents
Check document properties panels
Verify no hidden content, revision history, or comments remain

Visual Verification:

Review documents for visible personal data
Check images and embedded objects
Verify consistent redaction appearance across documents

Statistical Quality Control

For large document sets, sample-based QC provides efficient verification:

Sample Selection:

Random selection across document types
Stratification by custodian, date range, document type
Statistical sample sizes for desired confidence levels
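A sample size for a given confidence level can be estimated with the standard formula for a proportion, adjusted for a finite population. The sketch below is one common approach, not a prescribed method; the defaults (95% confidence, 2% margin of error, a conservative 50% assumed error rate) are illustrative and should be set by the review team.

```python
import math
from statistics import NormalDist

def qc_sample_size(population, confidence=0.95, margin=0.02, expected_error_rate=0.5):
    """Estimate how many documents to sample from a population of the given size."""
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = expected_error_rate
    # Sample size for an unlimited population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction.
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))

print(qc_sample_size(50_000))
print(qc_sample_size(50_000, margin=0.05))
```

When sampling is stratified by custodian or document type, the same calculation can be run per stratum, at the cost of a larger total sample.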

Review Protocol:

Full manual review of every sample document
Check for missed personal data and redaction failures
Document error types and rates

Error Response:

If sample reveals errors, investigate root cause
Determine if errors are systematic or isolated
Re-process affected document populations
Expand sample or implement full re-review if error rates unacceptable

Common Redaction Failures

Failure 1: Visual-Only Redaction

The Problem: Black boxes that can be removed or text that remains searchable beneath visual obscuring.

The Consequence: Personal data remains fully accessible to anyone who knows to look, defeating the entire purpose of redaction.

The Prevention: Use true redaction tools that remove underlying text; verify by attempting extraction after processing.

Failure 2: Metadata Neglect

The Problem: Visible text properly redacted but metadata containing personal data untouched.

The Consequence: Personal data is disclosed through a channel that no reviewer examined. This is precisely what happened to the firm in our opening example, where 127 documents carried NI numbers in embedded metadata.

The Prevention: Include metadata processing in standard workflow; verify with inspection tools.

Failure 3: Context Revelation

The Problem: Redacting direct identifiers while leaving context that identifies individuals.

Example: "The CEO of [COMPANY], [REDACTED], stated in his interview..." when only one person holds that position.

The Prevention: Consider whether surrounding context permits identification; redact contextual information if necessary.

Failure 4: Inconsistent Application

The Problem: Same name redacted in some documents but not others in the same production.

The Consequence: The unredacted instances defeat all the redacted ones, and the inconsistency may raise additional questions.

The Prevention: Maintain comprehensive redaction lists applied across entire document sets; implement cross-document verification.
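Cross-document verification can be as simple as checking every entry on the redaction list against the extracted text of every document in the production. A minimal sketch, assuming the documents are available as a mapping from document ID to extracted text; the function name and the IDs are hypothetical.

```python
def unredacted_occurrences(documents, redaction_list):
    """Map each listed term to the IDs of documents where it still appears."""
    report = {}
    for term in redaction_list:
        needle = term.casefold()
        hits = sorted(doc_id for doc_id, text in documents.items()
                      if needle in text.casefold())
        if hits:
            report[term] = hits
    return report

production = {
    "DOC-001": "[NAME REDACTED] attended the meeting.",
    "DOC-002": "Minutes were taken by john smith.",
}
print(unredacted_occurrences(production, ["John Smith"]))  # {'John Smith': ['DOC-002']}
```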

Quality control checklist for document redaction — Systematic QC catches redaction failures before documents leave your control

RUNO's Review & Redaction Tools

RUNO's Review & Redaction module addresses the full spectrum of GDPR-compliant redaction requirements:

AI-Powered Identification: The platform combines pattern matching with named entity recognition to identify personal data automatically. UK-specific patterns for NI numbers, NHS numbers, postcodes, and other identifiers ensure comprehensive coverage for UK documents.

True Redaction Processing: Redaction tools remove underlying text—not just overlay it—with automated verification that removed content is no longer accessible through any extraction method.

Metadata Processing: Integrated metadata handling strips or sanitises document metadata as part of the standard workflow, ensuring hidden personal data doesn't survive the redaction process.

Batch Processing: Redaction rules can be applied consistently across entire document populations, eliminating the inconsistent-application failures that plague manual processes.

Audit Trail: Complete records document what was redacted, when, and by whom—supporting GDPR accountability requirements and providing defensibility documentation for any regulatory inquiry.

Conclusion: The Cost of Getting It Wrong

GDPR-compliant redaction requires systematic processes that address text, metadata, images, and context. The 127 documents with unredacted NI numbers weren't a failure of intent—the firm wanted to comply. They were a failure of process—relying on visual review that couldn't catch what eyes couldn't see.

The investment in proper redaction processes and tools is modest compared to the consequences of failure: regulatory enforcement, reputational damage, professional liability exposure, and the fundamental breach of trust that occurs when personal data is disclosed inappropriately.

When personal data must be shared, proper redaction protects both data subjects and the organisations handling their information. The tools and techniques exist. The question is whether your organisation will implement them before or after an incident forces the issue.

Explore RUNO's Review & Redaction Suite or request a demonstration to see AI-powered redaction in action.
