Data Science for Decision Makers

You're reading from Data Science for Decision Makers: Enhance your leadership skills with data science and AI expertise

Product type Paperback
Published in Jul 2024
Publisher Packt
ISBN-13 9781837637294
Length 270 pages
Edition 1st Edition
Author (1):
Jon Howells

Probability distributions

Probability distributions are mathematical functions that describe the likelihood of different outcomes in a random event or process. They help us understand the behavior of random variables and make predictions about future events. There are two main types of probability distributions: discrete distributions and continuous distributions.

Discrete probability distributions

Discrete probability distributions are used when the possible outcomes of a random event can be counted, whether the set of outcomes is finite or countably infinite. Let’s look at some common examples of discrete probability distributions.

Bernoulli distribution

This is the simplest discrete probability distribution. It models a single trial with only two possible outcomes: success (usually denoted as 1) or failure (usually denoted as 0). For example, a single flip of a fair coin follows a Bernoulli distribution with a probability of success (heads) of 0.5.

Binomial distribution

This distribution models the number of successes in a fixed number of independent trials, where each trial has the same probability of success. For example, if you flip a fair coin ten times, the number of heads you observe follows a binomial distribution with parameters of n = 10 (number of trials) and p = 0.5 (probability of success).
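
As a minimal sketch of the coin-flip example, the binomial probability of exactly k successes in n trials can be computed with the standard formula C(n, k) × p^k × (1 − p)^(n − k). The function name `binomial_pmf` is illustrative and does not come from the book; setting n = 1 recovers the Bernoulli case.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Ten flips of a fair coin: chance of seeing exactly five heads.
p_five_heads = binomial_pmf(5, n=10, p=0.5)

# With a single trial (n = 1), the binomial reduces to the Bernoulli case.
p_single_head = binomial_pmf(1, n=1, p=0.5)

# The probabilities of all possible head counts (0 to 10) add up to 1.
total = sum(binomial_pmf(k, n=10, p=0.5) for k in range(11))

print(f"P(exactly 5 heads in 10 flips) = {p_five_heads:.4f}")
print(f"P(heads on a single flip)      = {p_single_head:.1f}")
print(f"Sum over all outcomes          = {total:.1f}")
```

Even for a fair coin, exactly five heads in ten flips happens only about a quarter of the time, which is a useful reminder that the most likely outcome is not necessarily a likely one.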

Negative binomial distribution

This distribution models the number of failures before a specified number of successes occurs in independent trials with the same probability of success. For instance, if you’re playing a game where you need to win three times before the game ends, the number of losses before the third win follows a negative binomial distribution.

Geometric distribution

This is a special case of the negative binomial distribution where the number of successes is fixed at 1. It models the number of failures before the first success in independent trials with the same probability of success. An example would be the number of rolls of a die that do not show a 6 before the first 6 appears.
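
The relationship between the negative binomial and geometric distributions can be sketched in a few lines. The function name `negative_binomial_pmf` is illustrative, and the win probability of 0.5 for the game example is an assumption, since the text does not specify one.

```python
from math import comb


def negative_binomial_pmf(failures: int, successes: int, p: float) -> float:
    """Probability of seeing `failures` failures before the
    `successes`-th success, with success probability p per trial."""
    return (
        comb(failures + successes - 1, failures)
        * p**successes
        * (1 - p) ** failures
    )


# Game example: probability of exactly two losses before the third win,
# assuming (hypothetically) a 50% chance of winning each round.
p_two_losses = negative_binomial_pmf(failures=2, successes=3, p=0.5)

# Geometric case (successes = 1): rolling a fair die until a 6 appears.
p_six_first_roll = negative_binomial_pmf(failures=0, successes=1, p=1 / 6)
p_six_third_roll = negative_binomial_pmf(failures=2, successes=1, p=1 / 6)

print(f"P(2 losses before 3rd win)  = {p_two_losses:.4f}")
print(f"P(6 on the first roll)      = {p_six_first_roll:.4f}")
print(f"P(first 6 on the 3rd roll)  = {p_six_third_roll:.4f}")
```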

Poisson distribution

This distribution models the number of events occurring in a fixed interval of time or space, given the average rate of occurrence and assuming the events occur independently of one another. It is often used to model event counts, such as the number of earthquakes in a year or the number of customers arriving at a store in an hour.
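
To make the store example concrete, here is a small sketch using the standard Poisson formula e^(−λ) × λ^k / k!. The average rate of four customers per hour is a made-up figure for illustration, and `poisson_pmf` is an illustrative name.

```python
from math import exp, factorial


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events when `rate` events are expected."""
    return exp(-rate) * rate**k / factorial(k)


# Hypothetical store that averages four customer arrivals per hour.
AVERAGE_ARRIVALS_PER_HOUR = 4.0

# Chance of a completely empty hour.
p_no_customers = poisson_pmf(0, AVERAGE_ARRIVALS_PER_HOUR)

# Chance of a quiet hour with at most two customers.
p_at_most_two = sum(poisson_pmf(k, AVERAGE_ARRIVALS_PER_HOUR) for k in range(3))

print(f"P(no customers in an hour)     = {p_no_customers:.4f}")
print(f"P(at most 2 customers in hour) = {p_at_most_two:.4f}")
```

A calculation like this can inform staffing decisions: if quiet hours are rarer than expected under the model, the assumed average rate may need revisiting.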

Continuous probability distributions

Continuous probability distributions are used when the possible outcomes of a random event are continuous, such as measurements or time. Let’s look at some common examples of continuous probability distributions.

Normal distribution

Also known as the Gaussian distribution, this is the best-known continuous probability distribution. It models continuous variables whose distribution is approximately symmetric and bell-shaped, such as heights, weights, or IQ scores. Many natural phenomena follow a normal distribution, at least approximately.

Standard normal distribution

This is a special case of the normal distribution with a mean of zero and a standard deviation of one. It is often used to standardize variables and compare values across different normal distributions.
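
Standardizing is easiest to see with numbers. The sketch below converts scores from two differently scaled tests into z-scores so they can be compared directly; the test names, means, and standard deviations are invented for illustration. It uses `NormalDist` from Python's standard `statistics` module for the standard normal curve.

```python
from statistics import NormalDist


def z_score(value: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that `value` lies from the mean."""
    return (value - mean) / std_dev


# Hypothetical scores on two tests with different scales.
z_test_a = z_score(85, mean=70, std_dev=10)     # test A: scored 85
z_test_b = z_score(620, mean=500, std_dev=100)  # test B: scored 620

# The standard normal distribution has mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normally distributed population scoring below each result.
percentile_a = standard_normal.cdf(z_test_a)
percentile_b = standard_normal.cdf(z_test_b)

print(f"Test A: z = {z_test_a:.2f}, percentile = {percentile_a:.1%}")
print(f"Test B: z = {z_test_b:.2f}, percentile = {percentile_b:.1%}")
```

Although 620 is the larger raw number, the score of 85 sits further above its own test's average once both are placed on the same standardized scale.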

Student’s t-distribution

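This distribution is similar to the normal distribution but has heavier tails. It is used when the population standard deviation is unknown and has to be estimated from the sample, which matters most when the sample size is small (typically fewer than 30). As the sample size grows, the t-distribution approaches the normal distribution. It is often used in hypothesis testing and constructing confidence intervals.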
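
The phrase "heavier tails" can be illustrated with a small simulation. The sketch below relies on a standard textbook construction that is not spelled out in this chapter: a t-distributed value is a standard normal value divided by the square root of a chi-squared value over its degrees of freedom. The choice of 5 degrees of freedom, the cutoff of 2, and the sample size are arbitrary.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
N_DRAWS = 20_000
DEGREES_OF_FREEDOM = 5


def t_sample(df: int) -> float:
    """Draw one t-distributed value from standard normal draws."""
    z = rng.gauss(0, 1)
    chi_squared = sum(rng.gauss(0, 1) ** 2 for _ in range(df))
    return z / (chi_squared / df) ** 0.5


# Fraction of draws that land more than 2 units away from zero.
normal_tail = sum(abs(rng.gauss(0, 1)) > 2 for _ in range(N_DRAWS)) / N_DRAWS
t_tail = sum(abs(t_sample(DEGREES_OF_FREEDOM)) > 2 for _ in range(N_DRAWS)) / N_DRAWS

print(f"Share of normal draws beyond +/-2: {normal_tail:.3f}")
print(f"Share of t draws beyond +/-2:      {t_tail:.3f}")
```

Extreme values show up noticeably more often under the t-distribution, which is why confidence intervals based on small samples are wider than their normal-based counterparts.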

Gamma distribution

This distribution models continuous variables that are positive and right-skewed. It is often used to model waiting times, such as the time until a machine fails or the time until a customer arrives.

Exponential distribution

This is a special case of the gamma distribution where the shape parameter is equal to 1. It models the time between events occurring at a constant rate, such as the time between customer arrivals or the time between radioactive particle decays.
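
As a rough sketch of the customer-arrival example, the probability that the next event occurs within time t is 1 − e^(−rate × t). The rate of four arrivals per hour is an assumed figure, and `exponential_cdf` is an illustrative name.

```python
from math import exp


def exponential_cdf(t: float, rate: float) -> float:
    """Probability that the waiting time is at most t."""
    return 1 - exp(-rate * t)


# Hypothetical store with an average of four arrivals per hour.
RATE_PER_HOUR = 4.0

# The average wait between arrivals is the reciprocal of the rate.
mean_wait_hours = 1 / RATE_PER_HOUR

# Chance that the next customer arrives within 15 minutes (0.25 hours).
p_within_15_min = exponential_cdf(0.25, RATE_PER_HOUR)

# Chance that nobody arrives during those 15 minutes.
p_no_arrival_15_min = 1 - p_within_15_min

print(f"Average wait: {mean_wait_hours * 60:.0f} minutes")
print(f"P(next arrival within 15 min) = {p_within_15_min:.3f}")
print(f"P(no arrival in 15 min)       = {p_no_arrival_15_min:.3f}")
```

Note that the average wait is 15 minutes, yet the chance of waiting longer than that is still over a third, a consequence of the distribution's long right tail.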

Chi-squared distribution

This distribution describes the sum of the squares of independent standard normal variables, so it takes only non-negative values. It is often used in hypothesis testing and to construct confidence intervals for a population variance. It is also used in the chi-squared tests for independence and goodness of fit.

F-distribution

This distribution takes only non-negative values. It is often used to test the equality of two variances or the significance of a regression model. It is the distribution of the ratio of two independent chi-squared variables, each divided by its degrees of freedom.
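
How the chi-squared and F-distributions relate can be sketched with a simulation built from standard normal draws. This uses the standard definitions (a chi-squared value is a sum of squared standard normal values; an F value is a ratio of two chi-squared values, each scaled by its degrees of freedom). The degrees of freedom (5 and 20), the sample size, and the function names are arbitrary choices for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable
N_DRAWS = 20_000


def chi_squared_sample(df: int) -> float:
    """Sum of `df` squared standard normal draws."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(df))


def f_sample(df1: int, df2: int) -> float:
    """Ratio of two chi-squared draws, each divided by its degrees of freedom."""
    return (chi_squared_sample(df1) / df1) / (chi_squared_sample(df2) / df2)


chi2_draws = [chi_squared_sample(5) for _ in range(N_DRAWS)]
f_draws = [f_sample(5, 20) for _ in range(N_DRAWS)]

# Theory: a chi-squared variable has mean equal to its degrees of freedom,
# and an F variable has mean df2 / (df2 - 2) when df2 > 2.
chi2_mean = mean(chi2_draws)
f_mean = mean(f_draws)

print(f"Simulated chi-squared mean: {chi2_mean:.2f} (theory: 5)")
print(f"Simulated F mean:           {f_mean:.2f} (theory: {20 / 18:.2f})")
print(f"Smallest chi-squared draw:  {min(chi2_draws):.3f}")
```

Because both distributions are built from squared quantities, every simulated value is positive, which matches their use for variances and variance ratios.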

Probability distributions allow us to understand and quantify the probabilities of different outcomes in a random event or process. By understanding the different types of probability distributions and their applications, data science leaders can better model and analyze their data, make informed decisions, and improve their predictions. Knowing which distribution to use in a given situation is crucial for accurate data analysis and decision-making.
