Modeling topics discussed in earnings calls
In Chapter 3, Alternative Data for Finance – Categories and Use Cases, we learned how to scrape earnings call data from the SeekingAlpha site. In this section, we will illustrate topic modeling using this source. We use a sample of roughly 700 earnings call transcripts from 2018 and 2019. This is a fairly small dataset; a practical application would need a larger one.
The directory earnings_calls contains several files with the code examples used in this section. Refer to the notebook lda_earnings_calls for details on loading, exploring, and preprocessing the data, as well as training and evaluating individual models, and to the run_experiments.py file for the experiments described next.
Data preprocessing
The transcripts consist of individual statements by company representatives and an operator, and a Q&A session with analysts. We will treat each of these statements as a separate document, ignoring operator...
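As a rough illustration of the statement-level split, the following minimal sketch turns one transcript into a list of documents and drops the operator's statements. The Statement type, its speaker and text fields, the statements_to_documents helper, and the toy transcript are all hypothetical and only stand in for whatever structure the scraped data actually has; the notebook's own preprocessing may differ.

```python
from typing import Iterable, List, NamedTuple


class Statement(NamedTuple):
    """Hypothetical container for one statement in a transcript."""
    speaker: str  # assumed field: role or name of the speaker
    text: str     # assumed field: what the speaker said


def statements_to_documents(
    transcript: Iterable[Statement],
    skip_speakers: Iterable[str] = ("operator",),
) -> List[str]:
    """Return one document per statement, dropping skipped speakers.

    Speaker names are compared case-insensitively, and statements
    that are empty after stripping whitespace are discarded.
    """
    skip = {name.lower() for name in skip_speakers}
    return [
        statement.text.strip()
        for statement in transcript
        if statement.speaker.strip().lower() not in skip
        and statement.text.strip()
    ]


# Made-up toy transcript, for illustration only.
transcript = [
    Statement("Operator", "Good day, and welcome to the earnings call."),
    Statement("CEO", "Revenue grew in the third quarter."),
    Statement("Analyst", "Can you comment on margins?"),
    Statement("Operator", "We will now take the next question."),
]

documents = statements_to_documents(transcript)
print(len(documents), "documents")
```

Each resulting string can then be passed on to tokenization and vectorization as its own document, so that topics are estimated at the level of individual statements rather than entire calls.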