At this point, we understand a couple of ways to represent and manipulate our data (matrices and vectors), and we know how to gain an understanding of our data and quantify how it looks (statistics). However, when we are developing machine learning applications, we sometimes also want to know how likely it is that a prediction is correct, or how significant certain results are given a history of results. Probability can help us answer these "how likely" and "how significant" questions.
Generally, probability has to do with the likelihood of events or observations. For example, if we are going to flip a coin to make a decision, how likely is it that we would see heads (50% for a fair coin), how likely is it that we would see tails (also 50%), or even how likely is it that the coin is fair in the first place? This might seem like a trivial example, but many similar questions come up when...