Ensuring privacy and observing ethical considerations
Language data, especially data internal to an enterprise, may contain sensitive information. Medical and financial data are the most obvious examples: an application that deals with these topics is very likely to handle sensitive information about health or finances. Information becomes even more sensitive when it can be associated with a specific person. Such information is called personally identifiable information (PII), which the United States Department of Labor defines as follows:
“Any representation of information that permits the identity of an individual to whom the information applies to be reasonably inferred by either direct or indirect means” (https://www.dol.gov/general/ppii).
Handling PII is a broad and complex issue, and a full treatment of it is beyond the scope of this book. However, it’s worth discussing a few important points specific to NLP applications that should be considered...