Comparing the best hitter in baseball across years
In the Finding the baseball players best at… recipe back in Chapter 5, Algorithms and How to Apply Them, we worked with a dataset that had already aggregated the performance of players from the years 2020-2023. However, comparing players based on their performance across multiple years is rather difficult. Even from one year to the next, statistics that look elite in one season may be considered just “very good” in another. The reasons for this variation can be debated, but they likely come down to some combination of strategy, equipment, weather, and pure statistical chance.
For this recipe, we are going to work with a more granular dataset that goes down to the game level. We will aggregate that data up to a yearly summary and then use the summary to calculate a common baseball statistic known as the batting average.
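As a rough illustration of that aggregation step, here is a minimal sketch in plain Python. The records, the field names (player, year, at_bats, hits), and the yearly_batting_average helper are all hypothetical and are not taken from the recipe's dataset. The sketch also assumes the usual definition of batting average as hits divided by at-bats.

```python
from collections import defaultdict

# Hypothetical game-level records: one row per player per game.
# The field names are assumptions made for this illustration only.
games = [
    {"player": "A", "year": 2020, "at_bats": 4, "hits": 2},
    {"player": "A", "year": 2020, "at_bats": 4, "hits": 1},
    {"player": "A", "year": 2021, "at_bats": 5, "hits": 1},
    {"player": "B", "year": 2020, "at_bats": 3, "hits": 0},
    {"player": "B", "year": 2021, "at_bats": 0, "hits": 0},
]


def yearly_batting_average(rows):
    """Sum at-bats and hits per (player, year), then divide hits by at-bats."""
    totals = defaultdict(lambda: [0, 0])  # (player, year) -> [at_bats, hits]
    for row in rows:
        key = (row["player"], row["year"])
        totals[key][0] += row["at_bats"]
        totals[key][1] += row["hits"]
    # A player with no at-bats in a year has no defined average, so use None.
    return {
        key: (hits / at_bats if at_bats else None)
        for key, (at_bats, hits) in totals.items()
    }


averages = yearly_batting_average(games)
for (player, year), avg in sorted(averages.items()):
    print(player, year, avg)
```

Note that the sketch sums hits and at-bats for the whole year before dividing. Averaging the per-game averages instead would give a different answer whenever the number of at-bats varies from game to game.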
For those unfamiliar, a batting average is calculated...