The efficient market hypothesis postulates that, at any given time, stock prices reflect all available information about a stock, and that the market therefore cannot be consistently outperformed through superior strategy or, more generally, better information. However, it can be argued that current practice in investment banking, where machine learning and statistics are built into algorithmic trading systems, contradicts this hypothesis. Still, these algorithms can fail, as seen in the 2010 flash crash, or when systemic risks are underestimated, as Roger Lowenstein discusses in his book When Genius Failed: The Rise and Fall of Long-Term Capital Management.
In this recipe, we'll build a simple stock prediction pipeline in scikit-learn and produce probability estimates using several different methods. We'll then evaluate and compare these approaches.
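To make the idea of evaluating probability estimates concrete, here's a minimal sketch of two common scoring rules, the Brier score and the log loss, written in plain Python. This is an illustration only: the choice of these two metrics, the function names, and the toy numbers are all assumptions, and the recipe itself may rely on other measures. scikit-learn ships its own versions in sklearn.metrics (brier_score_loss and log_loss), which we would normally prefer in practice.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood of the outcomes under the predictions."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so that log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical data: 1 = the price went up the next day, 0 = it did not.
outcomes = [1, 0, 1, 1, 0]
confident = [0.9, 0.1, 0.8, 0.7, 0.2]  # made-up estimates from one method
hedging = [0.5, 0.5, 0.5, 0.5, 0.5]    # a baseline that always predicts 50%

print("Brier:", brier_score(outcomes, confident), brier_score(outcomes, hedging))
print("Log loss:", log_loss(outcomes, confident), log_loss(outcomes, hedging))
```

For both metrics, lower is better. A method whose probabilities track the outcomes should score below the constant 50% baseline, which gives us a simple sanity check when comparing approaches.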
Getting ready
We'll retrieve historical stock prices using the yfinance library.
Here's how we install it...