Exercise – extracting CTI information from X data
Social media can be a fascinating and useful source of information for discovering new vulnerabilities and investigating trends in the threat landscape, because experts and organizations publicly release data there. X is a platform for quickly publishing short text posts and is often used to share news about new vulnerabilities.
However, the volume of data from X can pose a challenge when we try to retrieve information from the text and find what’s important for our analysis. Tools that help us automate this analysis and keep it accurate on potentially noisy data are always welcome. Intelligent data processing is exactly what AI methods excel at, and given that the data are textual, we can apply methods from NLP, a discipline that intersects with AI.
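To make the idea of automated extraction from noisy text concrete, the following minimal sketch pulls CVE identifiers out of a handful of posts with a regular expression. It is only an illustration and not the method used in this exercise: the function name extract_cve_ids and the sample posts are hypothetical, invented for this example, and a regular expression is a much simpler technique than the NLP methods discussed in this chapter.

```python
import re

# CVE identifiers have the form CVE-<4-digit year>-<4 or more digits>.
# Posts are informal, so we match case-insensitively.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def extract_cve_ids(posts):
    """Return a sorted list of the unique CVE IDs mentioned in the given post texts."""
    found = set()
    for text in posts:
        for match in CVE_PATTERN.findall(text):
            # Normalize to upper case so "cve-..." and "CVE-..." count as one ID.
            found.add(match.upper())
    return sorted(found)


# Hypothetical sample posts, written for illustration only.
sample_posts = [
    "Patch now: CVE-2021-44228 is being exploited in the wild",
    "Great coffee this morning!",
    "New write-up on cve-2021-44228 and CVE-2014-0160",
]

print(extract_cve_ids(sample_posts))
# → ['CVE-2014-0160', 'CVE-2021-44228']
```

Even this simple filter discards the unrelated post and merges duplicate mentions, which hints at why automated processing becomes necessary as the number of posts grows.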
In the next subsection, we describe how to preprocess the data before applying NLP methods.