Redacting PII with NER models and regex patterns
Redacting PII usually takes more than one technique, because sensitive data appears in many forms across different data sources. Two key techniques are Named Entity Recognition (NER) models and regular expression (regex) patterns. Regex patterns suit PII with a predictable format, while NER models can catch PII that has no fixed format. Combining the two helps identify a broader range of PII types than either method covers alone.
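As a rough illustration of the regex side, the sketch below replaces matches with a bracketed label. The three patterns, the `PII_PATTERNS` name, and the `redact_with_regex` helper are assumptions made for this example; they cover only simple US-style formats and would need tuning for real data.

```python
import re

# Illustrative patterns only; real-world PII formats vary far more than this.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact_with_regex(text: str) -> str:
    """Replace every pattern match with a bracketed label such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Contact jane.doe@example.com or 555-123-4567; SSN 123-45-6789."
print(redact_with_regex(sample))
# Contact [EMAIL] or [PHONE]; SSN [SSN].
```

Replacing a match with its label, rather than deleting it, keeps the redacted text readable and shows what kind of data was removed.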
NER models
NER is a Natural Language Processing (NLP) technique that identifies and classifies named entities in text, such as the names of people, locations, and organizations. NER models are particularly useful for redacting PII because they can identify entities that do not follow a common pattern or structure. A person's name, for example, has no fixed format that a regex could match reliably, but an NER model can recognize it from the surrounding context. This makes it possible to detect and redact less predictable PII types that are difficult to identify with regex patterns alone.
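NER libraries generally report each entity as a character span with a label, and redaction then amounts to replacing those spans. The sketch below shows only that replacement step. `fake_ner` is a hypothetical stand-in that looks up two hard-coded names so the example runs without a model; it is not a real NER model, and `redact_spans` assumes the spans do not overlap.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity label)


def fake_ner(text: str) -> List[Span]:
    """Hypothetical stand-in for an NER model: finds two hard-coded names."""
    spans = []
    for name, label in (("Maria Lopez", "PERSON"), ("Lisbon", "LOCATION")):
        start = text.find(name)
        if start != -1:
            spans.append((start, start + len(name), label))
    return spans


def redact_spans(text: str, spans: List[Span]) -> str:
    """Replace each span with [LABEL], right to left so earlier offsets stay valid."""
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


sample = "Maria Lopez moved to Lisbon last year."
print(redact_spans(sample, fake_ner(sample)))
# [PERSON] moved to [LOCATION] last year.
```

In practice, `fake_ner` would be swapped for a call to a trained model that returns the same kind of (start, end, label) spans.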
Machine learning models, such as those based on BERT (Bidirectional...