Categorization analysis of unstructured messages
Imagine that you are troubleshooting a problem by looking at a particular log file. You see a line in the log that looks like the following:
   18/05/2020 15:16:00 DB Not Updated [Master] Table
Unless you have intimate knowledge of the inner workings of the application that created this log, you may not know whether the message is important. Having the database be "Not Updated" sounds like a negative situation. However, if you knew that the application writes this message routinely, day in and day out, several hundred times per hour, you would naturally conclude that the message is benign and can probably be ignored, because the application clearly works fine every day despite this message appearing in the log file.
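The reasoning above judges a message by how often it occurs rather than by what its text says. A minimal sketch of that idea follows. It assumes the log uses the `dd/mm/yyyy HH:MM:SS` timestamp shown in the example line, treats each distinct message text as its own message type (no grouping of similar messages), and uses a hypothetical fixed threshold, `min_per_hour`, to decide what counts as "routine". The function names and the sample log lines are illustrative only.

```python
from collections import Counter
from datetime import datetime

# Matches the example timestamp "18/05/2020 15:16:00" (assumed log format).
TIMESTAMP_FORMAT = "%d/%m/%Y %H:%M:%S"


def split_line(line):
    """Split a log line into (hour bucket, message text)."""
    date_part, time_part, message = line.split(maxsplit=2)
    ts = datetime.strptime(f"{date_part} {time_part}", TIMESTAMP_FORMAT)
    return ts.replace(minute=0, second=0), message.strip()


def routine_messages(lines, min_per_hour=100):
    """Return message texts seen at least min_per_hour times in every hour of the log.

    min_per_hour is a hypothetical threshold chosen for illustration.
    """
    counts = Counter()
    hours = set()
    for line in lines:
        hour, message = split_line(line)
        hours.add(hour)
        counts[(message, hour)] += 1
    messages = {message for message, _ in counts}
    # Counter returns 0 for a (message, hour) pair that never occurred.
    return {
        m for m in messages
        if all(counts[(m, h)] >= min_per_hour for h in hours)
    }


# Hypothetical sample: the "Not Updated" message 300 times in each of two hours,
# plus a single message that appears only once.
log = []
for hour in (15, 16):
    for i in range(300):
        minute, second = divmod(i, 60)
        log.append(
            f"18/05/2020 {hour}:{minute:02d}:{second:02d} DB Not Updated [Master] Table"
        )
log.append("18/05/2020 16:30:00 Backup job failed")

print(routine_messages(log))  # {'DB Not Updated [Master] Table'}
```

Here the frequently repeated "Not Updated" message is classified as routine, while the one-off message is not, even though nothing about either message's wording was inspected.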
The problem, obviously, is one of human interpretation. Inspecting the text of the message and reading a negative phrase ("Not Updated") potentially biases a person...