Suppose you just trained a nifty classifier showing some predictive power over the markets, should you start trading? Much like with the other machine learning projects we've done to date, you will need to test on an independent test set. In the past, we've often cordoned off our data into the following three sets:
- The training set
- The development set, aka the validation set
- The test set
We can do something similar to our current work, but the financial markets give us an added resource—ongoing data streams!
We can use the same data source we used for our earlier pulls and continue to pull more data; essentially, we have an ever-expanding, unseen dataset! Of course, some of this depends on the frequency of the data that we use—if we operate on daily data, it will take a while to accomplish this. Operating on hourly or per-minute data...