Exploring and preparing data
In the first step of the ML process, we explore and prepare the data in Snowflake using Snowpark so that it is ready for training the ML models. We will work with the Bike Sharing dataset from Kaggle, which contains two years of hourly bike rental records. The objective is to forecast the number of bikes rented each hour over a given timeframe, using only the information available before the rental period. In other words, the model will use historical data to predict future bike rental patterns with Snowpark. More information about the dataset is available in the accompanying GitHub repository (https://github.com/PacktPublishing/The-Ultimate-Guide-To-Snowpark).
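The constraint above, predicting each hour using only information available before the rental period, implies a strictly time-ordered train/test split rather than a random one. The following minimal sketch illustrates that idea with synthetic stand-in data and the standard library (the real records live in a Snowflake table; the timestamps, counts, and cutoff here are illustrative assumptions, not values from the dataset):

```python
from datetime import datetime, timedelta

# Synthetic stand-in for the hourly Bike Sharing records: 48 hours of
# (timestamp, rental_count) pairs. Values are made up for illustration.
records = [
    (datetime(2011, 1, 1) + timedelta(hours=h), 10 + (h % 24))
    for h in range(48)
]

# Time-ordered split: everything before the cutoff is history the model
# may learn from; everything at or after the cutoff is the forecast
# horizon. A random split would leak future information into training.
cutoff = datetime(2011, 1, 2)
train = [(ts, cnt) for ts, cnt in records if ts < cutoff]
test = [(ts, cnt) for ts, cnt in records if ts >= cutoff]

print(len(train), len(test))  # 24 hours of history, 24 hours to forecast
```

The same cutoff logic carries over to a Snowpark DataFrame via a filter on the timestamp column, keeping the evaluation honest about what the model could have known at prediction time.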
Data exploration lets us examine the data to uncover details that might otherwise stay hidden, and it forms the foundation of the entire analysis. We will start by loading the dataset into a Snowpark DataFrame:
df_table...