Once you've gotten to know your data, you may need to fix problems in it. This section walks through some methods you can use to address the issues you may encounter in your dataset.
Processing your dataset
Fixing rare and outlier values
Detecting rare and outlier values can help you determine whether you have bad or irrelevant data. To pick up where we left off in the section on detecting rare and outlier values, execute the following query:
USE lahmansbaseballdb;
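-- Count the rows for each distinct hits value; the rarest values sort first.
-- GROUP BY already returns one row per value, so DISTINCT is not needed.
SELECT h AS hits, COUNT(h) AS count
FROM batting
GROUP BY hits
ORDER BY count;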
The preceding query lists each distinct value in the h (hits) column along with the number of rows that contain that value, sorted from the least frequent value to the most frequent, as shown in the following two screenshots...
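If the full list is too long to scan comfortably, one option is to keep only the values that occur rarely. The following is a minimal sketch, not part of the original walkthrough: it reuses the same batting table and h column, and the cutoff of 10 rows is an arbitrary, illustrative threshold that you would adjust for your own data.

```sql
USE lahmansbaseballdb;
-- Sketch: show only hits values that appear in fewer than 10 rows.
-- The threshold 10 is an assumption chosen for illustration.
SELECT h AS hits, COUNT(h) AS count
FROM batting
GROUP BY h
HAVING COUNT(h) < 10
ORDER BY hits DESC;
```

Sorting by hits in descending order puts the largest rare values at the top, which is usually where outliers are easiest to spot.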