Preparing stock data for model performance
We are almost ready to build a model that predicts the performance of Apple's stock. The remaining task is to prepare the data in a way that gives the model the best chance of predicting well.
Getting ready
We will perform transformations and visualizations on the dataframe in this section. This requires importing the following library and scaler class in Python:
numpy
MinMaxScaler
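As a rough illustration of the idea behind min-max scaling, the following pure-Python sketch rescales a list of values to the range 0 to 1. The function name min_max_scale and the sample prices are hypothetical, chosen only for illustration; the recipe relies on MinMaxScaler rather than this hand-written version.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; return zeros to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical adjusted closing prices, for illustration only.
sample_prices = [100.0, 150.0, 125.0, 200.0]
scaled_prices = min_max_scale(sample_prices)
print(scaled_prices)
```

Putting every feature on the same 0 to 1 scale keeps columns with large raw values, such as trading volume, from dominating columns with small ones.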
How to do it...
This section walks through the steps for preparing the stock market data for our model.
- Execute the following script to group the rows by year and count the Adj Close values for each year:
df.groupBy(['year']).agg({'Adj Close':'count'})\
    .withColumnRenamed('count(Adj Close)', 'Row Count')\
    .orderBy(["year"],ascending=False)\
    .show()
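To make the shape of that output concrete, here is a minimal pure-Python sketch of the same aggregation: count the non-null Adj Close values per year, then list the years newest first. The sample rows and the helper name row_count_by_year are hypothetical and only stand in for the Spark dataframe.

```python
from collections import Counter


def row_count_by_year(rows):
    """Return (year, row count) pairs sorted by year, newest first."""
    # Like a SQL-style count on a column, skip rows whose value is missing.
    counts = Counter(row["year"] for row in rows if row["Adj Close"] is not None)
    return sorted(counts.items(), key=lambda pair: pair[0], reverse=True)


# Hypothetical rows, for illustration only.
rows = [
    {"year": 2016, "Adj Close": 115.8},
    {"year": 2016, "Adj Close": 116.2},
    {"year": 2017, "Adj Close": 170.1},
    {"year": 2015, "Adj Close": None},
    {"year": 2015, "Adj Close": 105.3},
]
year_counts = row_count_by_year(rows)
print(year_counts)
```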
- Execute the following script to create two new dataframes for training and testing purposes:
trainDF = df[df.year < 2017]
testDF = df[df.year > 2016]
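Splitting on the year keeps the test period strictly after the training period, so the model is not evaluated on dates it was trained on. As a sketch of that logic on plain Python data (the split_by_year helper and the sample rows are hypothetical), the two filters divide the rows with no overlap and no gaps, assuming year is an integer:

```python
def split_by_year(rows, first_test_year=2017):
    """Split rows into a training set (before first_test_year) and a test set."""
    # Mirrors the two filters above: year < 2017 for training, year > 2016 for testing.
    train = [r for r in rows if r["year"] < first_test_year]
    test = [r for r in rows if r["year"] > first_test_year - 1]
    return train, test


# Hypothetical rows, for illustration only.
sample_rows = [{"year": y} for y in (2014, 2015, 2016, 2016, 2017, 2018)]
train_rows, test_rows = split_by_year(sample_rows)
print(len(train_rows), len(test_rows))
```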
- Convert the two new dataframes to pandas dataframes to get row and column counts...