Sachith Dassanayake Software Engineering PII classification & data retention — Cheat Sheet — Practical Guide (Feb 18, 2026)

PII classification & data retention — Cheat Sheet

Level: Intermediate software engineers & data professionals

Date: 18 February 2026

Prerequisites

Before you start implementing PII (Personally Identifiable Information) classification and data retention policies, ensure you have:

  • Basic understanding of GDPR, CCPA, and other relevant privacy regulations.
  • Familiarity with your data storage and processing environments (cloud providers, databases, data warehouses).
  • Access to your organisation’s Data Classification Framework (if available) or regulatory standards defining PII categories.
  • Awareness of your application’s data model and data flow, including how PII is collected, stored, transmitted, and processed.

Hands-on steps

1. Define PII Categories & Classification Levels

Classifying data correctly is the foundation of compliance and minimal data retention. Common categories include:

  • Direct identifiers: Name, Social Security Number, passport numbers.
  • Indirect identifiers: Date of birth (DOB), address, IP address (depending on context).
  • Sensitive PII: Racial or ethnic origin, biometrics, health data.

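Assign classification levels such as Public, Internal, Confidential, Restricted. This four-tier scheme is a common organisational convention; NIST SP 800-122 itself rates PII by confidentiality impact level (low, moderate, high), so map your tiers to whichever framework your organisation follows.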


Example JSON snippet for a PII classification schema (JSON has no comment syntax, so keep notes like this outside the file):
{
  "piiClassificationLevels": {
    "public": [],
    "internal": ["employeeId"],
    "confidential": ["email", "phoneNumber", "address"],
    "restricted": ["socialSecurityNumber", "passportNumber", "biometricData"]
  }
}
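One way to put a schema like this to work is to invert it into a field-to-level lookup. The following is a minimal sketch, not a standard implementation: the `LEVEL_ORDER` ranking, the function names, and the choice to treat unknown fields as `restricted` are all assumptions made for illustration.

```python
import json

# Hypothetical schema, mirroring the JSON example above.
SCHEMA_JSON = """
{
  "piiClassificationLevels": {
    "public": [],
    "internal": ["employeeId"],
    "confidential": ["email", "phoneNumber", "address"],
    "restricted": ["socialSecurityNumber", "passportNumber", "biometricData"]
  }
}
"""

# Assumed ordering from least to most sensitive.
LEVEL_ORDER = ["public", "internal", "confidential", "restricted"]


def build_field_index(schema_json):
    """Invert {level: [fields]} into {field: level}."""
    levels = json.loads(schema_json)["piiClassificationLevels"]
    return {field: level for level, fields in levels.items() for field in fields}


def classify_record(record, field_index, default="restricted"):
    """Return the highest sensitivity level among a record's field names.

    Unknown fields fall back to `default` (an assumed fail-closed choice).
    An empty record is treated as "public".
    """
    levels = [field_index.get(name, default) for name in record]
    return max(levels, key=LEVEL_ORDER.index, default="public")


FIELD_INDEX = build_field_index(SCHEMA_JSON)
```

Failing closed on unknown fields means a newly added column is handled as restricted until someone classifies it, which trades some convenience for safety.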

2. Implement Automated Detection & Tagging

Use a combination of static and dynamic techniques:

  • Static data classification: Annotate your database schemas and API payloads with data types and sensitivity metadata.
  • Automated scanning tools: Utilise cloud provider tools (e.g., Amazon Macie, Microsoft Purview (formerly Azure Purview), Google Cloud Sensitive Data Protection (formerly Cloud DLP)) or open-source libraries for pattern matching and NLP-based entity recognition.
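Static classification can be as simple as keeping sensitivity metadata next to the field definitions. The sketch below uses the standard-library `dataclasses` metadata mechanism; the `UserProfile` model, the `sensitivity` metadata key, and the helper names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field, fields


def pii(level):
    """Attach an assumed 'sensitivity' metadata key to a dataclass field."""
    return field(metadata={"sensitivity": level})


@dataclass
class UserProfile:
    # Hypothetical model; field names and levels are illustrative only.
    user_id: str = pii("internal")
    email: str = pii("confidential")
    passport_number: str = pii("restricted")
    display_name: str = pii("public")


def sensitivity_map(model):
    """Return {field_name: sensitivity} for a dataclass type."""
    return {f.name: f.metadata.get("sensitivity", "unclassified") for f in fields(model)}


def fields_at_level(model, level):
    """List the field names tagged with the given level."""
    return [name for name, lvl in sensitivity_map(model).items() if lvl == level]
```

Because the tags travel with the model, a serializer or logger can consult `sensitivity_map` before emitting a field, and a code review sees classification changes in the same diff as schema changes.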

Amazon Macie, for example, discovers and classifies sensitive data in S3 buckets and raises findings on it; check regional availability for your account. For simple pattern detection in your codebase or ETL, regex is sometimes enough, but it is error-prone for complex cases.


import re

# Simple regex example to find emails and SSNs in text
def find_pii(text):
    # \b is a word boundary, \. a literal dot, \d a digit
    email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
    ssn_pattern = r'\b\d{3}-\d{2}-\d{4}\b'  # US SSN format, adapt as needed

    emails = re.findall(email_pattern, text)
    ssns = re.findall(ssn_pattern, text)
    return {'emails': emails, 'ssns': ssns}

3. Define and Enforce Retention Policies

Keep data no longer than its purpose and the applicable law require. Examples include:

  • Delete consumer PII after account closure plus grace period (e.g., 30–90 days).
  • Retain transactional data for tax or audit requirements (often 5–7 years).
  • Ensure pseudonymised or anonymised data sets are used for analytics to minimise PII exposure.
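For the analytics case in the last bullet, one common pseudonymisation technique is to replace an identifier with a keyed hash, so records can still be joined without exposing the raw value. This is a minimal sketch using HMAC-SHA256 from the standard library; the function name, the normalisation step, and the key handling are assumptions, and in practice the key would come from a secrets manager rather than source code.

```python
import hashlib
import hmac


def pseudonymise(value, secret_key):
    """Return a stable keyed token for `value` using HMAC-SHA256.

    The same value and key always yield the same token, so joins still
    work; without the key the token cannot be recomputed from a guess.
    """
    # Assumed normalisation so "A@B.com " and "a@b.com" map to one token.
    normalised = value.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()
```

Pseudonymised data is still personal data as long as the key exists, so it remains subject to the retention policy; destroying the key is what moves it towards anonymisation.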

Implement these policies in your data lifecycle tools:

  • Database-level retention with an expiry column and a scheduled purge job (e.g., the pg_cron extension for PostgreSQL or the MySQL event scheduler); neither database has a built-in per-row TTL.
  • Cloud storage lifecycle rules (e.g., S3 Lifecycle expiration rules).
  • Data warehouse expiry configurations (e.g., BigQuery partition expiration).

-- Example: PostgreSQL table with expiry date and periodic purge job
CREATE TABLE user_pii (
    user_id UUID PRIMARY KEY,
    pii_data JSONB,
    retention_expiry DATE
);

-- Delete expired data (run daily via cron or scheduled job)
DELETE FROM user_pii WHERE retention_expiry < CURRENT_DATE;
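The `retention_expiry` value has to come from somewhere, and computing it from a policy table keeps retention periods configurable rather than hard-coded. The sketch below is one way to do that; the category names, the day counts, and the function names are hypothetical, and a real policy table would be loaded from configuration agreed with your legal team.

```python
from datetime import date, timedelta

# Hypothetical policy table: data category -> retention in days after the
# triggering event (account closure, transaction date, ...).
RETENTION_DAYS = {
    "consumer_profile": 90,         # e.g. account closure plus grace period
    "transaction_record": 7 * 365,  # e.g. tax or audit requirement
}


def retention_expiry(category, event_date, policy=RETENTION_DAYS):
    """Return the date after which a record in `category` should be purged."""
    if category not in policy:
        # Fail loudly rather than silently keeping unclassified data forever.
        raise KeyError(f"no retention policy for category {category!r}")
    return event_date + timedelta(days=policy[category])


def is_expired(expiry, today):
    """Mirror the SQL predicate: retention_expiry < CURRENT_DATE."""
    return expiry < today
```

Writing the computed date into the row at insert time means a later policy change needs a backfill; computing expiry at purge time from the category avoids that, at the cost of a slightly more complex purge query.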

4. Secure Deletion & Audit Trails

Beyond deleting records, securely disposing of backups and logs that may contain PII is essential.

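  • Use cryptographic erasure where possible: encrypt data with a per-user or per-dataset key, then destroy the key so that remaining copies, including those in backups, become unreadable.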
  • Employ versioning and retention policies on backups.
  • Maintain an audit trail of data access and deletion using immutable logs (e.g., write-once storage or blockchain-backed logs).
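A lightweight way to make an audit trail tamper-evident, short of write-once storage, is to chain entries by hash so that changing any earlier entry invalidates everything after it. The following is a minimal in-memory sketch; the entry layout, the `GENESIS_HASH` value, and the function names are assumptions, and a production log would also need durable storage and an external anchor for the latest hash.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed starting value for the chain


def _entry_hash(prev_hash, payload):
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + body).hexdigest()


def append_entry(log, payload):
    """Append an audit event, e.g. {"action": "delete", "record": "..."}."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _entry_hash(prev_hash, payload),
    })


def verify_chain(log):
    """Return True if no entry was altered, reordered, or cut from the middle.

    Truncation of the tail is not detectable without storing the latest
    hash somewhere outside the log.
    """
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Audit payloads should reference records by opaque identifier rather than contain the PII itself, otherwise the immutable log becomes another place where deleted data survives.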

Common pitfalls

  • Over- or under-classification: Treating all data as PII inflates costs and risks; ignoring indirect identifiers risks compliance breaches.
  • Hard-coding retention times: Regulations and business needs evolve — implement configurable policies.
  • Lack of holistic lifecycle view: Data replicated across environments or cached may remain beyond intended retention.
  • Ignoring international differences: PII definitions and retention limits vary by jurisdiction; consult legal experts.
  • Insufficient validation of data deletion: Without verification, data may persist undetected in shadow copies or logs.

Validation

Validation means confirming that classification and retention requirements are enforced correctly.

  • Unit and integration tests for classification functions and regexes; review false positives/negatives.
  • Automated periodic scanning of stored data to detect residual PII beyond retention periods.
  • Periodic audits by internal or external compliance teams with reports on data inventory and retention.
  • Use of synthetic data and fuzzing to ensure edge cases are classified properly.
  • Validation of data deletion via audit logs and cryptographic proof where feasible.
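The first bullet can be sketched as a small table-driven check that runs the detection patterns against labelled samples, including near-misses. The patterns below are equivalent to the email and SSN regexes from step 2, written with explicit escapes; the sample texts and the `evaluate` helper are hypothetical.

```python
import re

EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical labelled samples: (text, expected emails, expected SSNs).
CASES = [
    ("Contact jane.doe@example.com today", ["jane.doe@example.com"], []),
    ("SSN 123-45-6789 on file", [], ["123-45-6789"]),
    ("Order 1234-56-78901 shipped", [], []),  # longer digit runs, not an SSN
    ("Version 1.2.3 released", [], []),       # dots and digits, no PII
]


def evaluate(cases):
    """Return the cases where detection differs from the expected labels."""
    failures = []
    for text, want_emails, want_ssns in cases:
        got = (EMAIL_RE.findall(text), SSN_RE.findall(text))
        if got != (want_emails, want_ssns):
            failures.append((text, got))
    return failures
```

An empty result from `evaluate(CASES)` means the patterns agree with every label; growing the case list with real false positives and false negatives found in audits turns it into a regression suite.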

Checklist / TL;DR

  • ✓ Identify PII categories relevant to your domain and regulation scope.
  • ✓ Implement layered classification: schema metadata, automated discovery, manual review.
  • ✓ Define clear, configurable retention policies aligned with legal and business needs.
  • ✓ Enforce retention via database TTL, lifecycle policies, or scheduled jobs.
  • ✓ Securely erase PII from backups and application logs to avoid data remanence, while keeping the audit trail of access and deletion intact.
  • ✓ Validate detection and deletion with automated tests and audits.
  • ✓ Consider international legal differences and adjust policies accordingly.
  • ✓ Document your entire PII classification and retention process for transparency.
