Hands-On Mathematics for Deep Learning

Build a solid mathematical foundation for training efficient deep neural networks

Product type Paperback
Published in Jun 2020
Publisher Packt
ISBN-13 9781838647292
Length 364 pages
Edition 1st Edition
Author Jay Dawani

Single variable calculus

At its core, calculus is nothing more than the study of relationships and change. Having a keen grasp of calculus will help you better understand how deep learning algorithms work and how to make them work better for you as a practitioner.

Let's move on to understanding what makes calculus such a powerful tool. We start with single variable calculus, which is about functions that take in a single input and produce a single output.

Derivatives

To start with, let's imagine a straight line with the following equation:

$$y = mx + b$$

In the equation, the following aspects apply:

  • y is a function of x, often written simply as f(x) (which is the notation we will be predominantly using in the remainder of the book). In the preceding equation, the output value y is dependent on the input value x.
  • The m value is the gradient, which tells us how steep the straight line is, or what its rate of change is (that is, how much a change in the x value affects the y value).
  • The sign of m tells us whether the line is moving upward (m > 0) or downward (m < 0).
  • The b value tells us by how much the line is above or below the origin at x = 0.
  • The m and b values in a straight line are constant throughout.

Now that you know what the equation of a straight line looks like, you're probably wondering how to find it for an arbitrary straight line.

We start by picking two points, (x1, y1) and (x2, y2), that lie on the line, and plug their values into the formula $m = \frac{y_2 - y_1}{x_2 - x_1}$. After having found the value for m, we find the value of b by plugging m and one (x, y) point on the line into the line equation and solving for b.
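The two steps above can be sketched in a few lines of Python. The function name `line_through` and the sample points are assumptions made for illustration; they do not come from the text.

```python
def line_through(p1, p2):
    """Return (m, b) for the line y = m*x + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: the gradient is undefined")
    m = (y2 - y1) / (x2 - x1)  # change in y divided by change in x
    b = y1 - m * x1            # solve y1 = m*x1 + b for b
    return m, b


m, b = line_through((1, 5), (3, 9))
print(m, b)  # 2.0 3.0
```

Either of the two points can be used to solve for b, since both lie on the same line.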

Well, that was very simple and straightforward. However, there are far more complex equations out there that aren't as straightforward—those that relate to curves (nonlinear functions), as illustrated in the following image:

Imagine a picture of a couple of hills or camel humps. If you trace the surface of them, you will have a curve, and as you may have no doubt noticed, they go up and then down and then back up, and the process repeats itself.

From the preceding image of the curve, you can easily tell that the gradient is not constant, as it was in the previous example with the straight line. We could sketch straight lines along the curve and calculate their slopes to understand how the curve moves. However, there is a simpler method than this tedious one.

At the very core of calculus are two concepts, as follows:

  • Differentiation helps us understand how much a function output changes with respect to changing input.
  • Integration helps us understand the accumulated effect of these changes between two chosen points.

We will begin by taking an in-depth look at differentiation. The primary equation for finding the derivative of a function is shown here:

$$\frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

I know there are a few new symbols here and it looks complicated, but it's really very simple. What this equation is doing is finding the derivative of the function f with respect to x, the variable in the denominator. This isn't too different from the earlier equation we saw (which we used to calculate the gradient of a straight line). We subtract two values, f(x+h) and f(x), and divide the result by h, the difference between the two inputs. But what does $\lim_{h \to 0}$ have to do with this? It tells us that we want the two points on the curve to be as close to each other as possible, so that the straight line through them matches the slope of the curve at a single point. This allows us to better visualize and understand the effect of the change, as can be seen in the following screenshot:
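To make the role of the limit concrete, here is a minimal numerical sketch. The function f(x) = x³ and the point x = 2 are assumed examples; the exact derivative there is 3 · 2² = 12.

```python
def difference_quotient(f, x, h):
    """Slope of the straight line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def f(x):
    return x ** 3  # assumed example function; its exact derivative is 3x^2


# As h shrinks, the slope of the secant line approaches the derivative at x = 2.
for h in (1.0, 0.1, 0.01, 0.001):
    print(h, difference_quotient(f, 2.0, h))
```

The printed slopes move toward 12 as h gets smaller, which is what taking the limit as h goes to 0 formalizes.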

See the following example:
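As an illustration (with f(x) = x² as an assumed example function), the definition of the derivative can be applied step by step:

```latex
\begin{aligned}
\frac{df}{dx} &= \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} \\
              &= \lim_{h \to 0} \frac{2xh + h^2}{h} \\
              &= \lim_{h \to 0} \,(2x + h) \\
              &= 2x
\end{aligned}
```

The h in the denominator cancels before the limit is taken, so no division by zero occurs.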

Now that we understand what a derivative is and how to find it for any function, let's move on to some important rules of differentiation.

Sum rule

The sum rule states that the derivative of the sum of two functions is the same as the sum of the individual derivatives of the two functions, as can be seen in the following equation:

$$\frac{d}{dx}\left[f(x) + g(x)\right] = \frac{d}{dx}f(x) + \frac{d}{dx}g(x)$$

Let's suppose we have and .

From this, we can see that the following equation, , is the same as this one:.

Power rule

The power rule helps to find the derivative of a function where the variable is raised to a power. Simply put, you multiply the constant in front of the variable by the power, and reduce the power by 1. Let's see what an example of this looks like, using the power rule $\frac{d}{dx}\, a x^n = n a x^{n-1}$, as follows:
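For instance, taking a = 5 and n = 3 (values assumed here purely for illustration), the rule gives the following:

```latex
\frac{d}{dx}\, 5x^{3} = 3 \cdot 5\, x^{3-1} = 15x^{2}
```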

Note that not every function has a derivative at every point.

There are certain functions, such as $\frac{1}{x}$ or $e^x$, that are not as straightforward as the ones we saw earlier. The function $\frac{1}{x}$ is not differentiable at x = 0 because its value is undefined there. This is known as a discontinuity.

The function $e^x$ is not as straightforward either; however, e (known as Euler's number) has a very interesting property whereby the function is equal to its derivative, that is, $\frac{d}{dx} e^x = e^x$.

Trigonometric functions

In high school or university, you likely studied trigonometry and encountered the sine, cosine, and tangent functions. More important to us are the sine and cosine functions, which you will encounter often and which we will look at here. These functions can be seen in the following image:

Here, sine is $\sin(x)$ and cosine is $\cos(x)$.

The sine and cosine functions are related, and the derivative will show us how.

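If $f(x) = \sin(x)$, then $f'(x) = \cos(x)$. However, if $f(x) = \cos(x)$, then $f'(x) = -\sin(x)$.

The derivatives create a loop, where each arrow stands for taking the derivative, which we can see as follows:

$$\sin(x) \rightarrow \cos(x) \rightarrow -\sin(x) \rightarrow -\cos(x) \rightarrow \sin(x)$$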
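The loop of sine and cosine derivatives can be checked numerically. The sketch below uses a central-difference approximation, which is an assumed choice of numerical method, and an arbitrary test point x = 0.7.

```python
import math


def derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Each function in the list should differentiate to the next one, wrapping around.
cycle = [
    ("sin(x)", math.sin),
    ("cos(x)", math.cos),
    ("-sin(x)", lambda x: -math.sin(x)),
    ("-cos(x)", lambda x: -math.cos(x)),
]

x = 0.7
max_error = 0.0
for k, (name, func) in enumerate(cycle):
    next_name, next_func = cycle[(k + 1) % len(cycle)]
    error = abs(derivative(func, x) - next_func(x))
    max_error = max(max_error, error)
    print(f"d/dx {name} is close to {next_name}: error = {error:.2e}")
```

After four differentiations we are back at sin(x), which is why the list wraps around.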

First and second derivatives

Now that we know how to find the derivative of a function, it is important to know that we can take the derivative more than once.

The first derivative, as we know, gives us the gradient (slope of the tangent line) of a function at any given point (x) on the curve—in other words, whether the curve's altitude (that is, y or f(x)) is increasing or decreasing. A positive slope tells us f(x) is increasing as x increases and a negative slope tells us f(x) is decreasing as x increases, and a slope of 0 tells us nothing about the curve's direction, other than that it is likely at a turning point (local minimum or local maximum). This can be written as follows:

  • If $f'(t) > 0$, then f(x) is increasing at x = t.
  • If $f'(t) < 0$, then f(x) is decreasing at x = t.
  • If $f'(t) = 0$, then x = t is a critical point of f(x).

For example, let . The derivative of this function is shown here:

At x = 0, the derivative is 9, which tells us the function is increasing at this point. But at x = 1, the derivative is -3, telling us that the function is decreasing at this point.

The second derivative is the derivative of the derivative of the function. We write this as $f''(x)$ or $\frac{d^2 f}{dx^2}$. As before, where the first derivative told us whether the function was increasing or decreasing, the second derivative gives us the same information about the first derivative: whether it is increasing or decreasing.

If the second derivative is positive, then as x increases, the first derivative is increasing; and if the second derivative is negative, then as x increases, the first derivative is decreasing.

To help us visualize this, when the second derivative is positive, the curve is concave up (parabola open upward) at a point, whereas when it is negative, the curve is concave down (parabola open downward). And as before, when the second derivative is equal to zero, we learn nothing new. This point could be a local maximum, a local minimum, or an inflection point. This is written as follows:

  • If $f''(t) > 0$, then f(x) is concave up at x = t.
  • If $f''(t) < 0$, then f(x) is concave down at x = t.
  • If $f''(t) = 0$, then at x = t we obtain no new information about f(x).

For example, let's take the second derivative of the same function we used, as follows:

At x = 0, the second derivative is -24, which tells us the function is concave down here. But at x = 2, it is equal to 24, telling us the function is concave up.

Earlier, we learned that at a critical point the first derivative alone does not tell us which way the curve is heading, but we can use the second derivative to find out whether the point is a local maximum or a local minimum. These rules can be written as follows:

  • If $f'(t) = 0$ and $f''(t) > 0$, then f(x) has a local minimum at x = t.
  • If $f'(t) = 0$ and $f''(t) < 0$, then f(x) has a local maximum at x = t.
  • If $f'(t) = 0$ and $f''(t) = 0$, then at x = t we learn nothing new about f(x).
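The rules above can be sketched in code. The cubic used here, f(x) = 4x³ - 12x² + 9x, is an assumption: it is one function whose derivatives match the values quoted earlier (f'(0) = 9, f'(1) = -3, f''(0) = -24, f''(2) = 24), and it is not necessarily the function used in the original example.

```python
import math


# Assumed example function and its hand-computed derivatives.
def f(x):
    return 4 * x**3 - 12 * x**2 + 9 * x


def f_prime(x):
    return 12 * x**2 - 24 * x + 9


def f_double_prime(x):
    return 24 * x - 24


def classify(t):
    """Apply the second derivative test at a critical point t."""
    curvature = f_double_prime(t)
    if curvature > 0:
        return "local minimum"
    if curvature < 0:
        return "local maximum"
    return "inconclusive"


# Critical points solve f'(x) = 12x^2 - 24x + 9 = 0 (quadratic formula).
a, b, c = 12, -24, 9
root = math.sqrt(b * b - 4 * a * c)
critical_points = sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])

for t in critical_points:
    print(t, classify(t))
```

Under this assumption the critical points are x = 0.5, where the curve is concave down (a local maximum), and x = 1.5, where it is concave up (a local minimum).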

Product rule

The product rule gives us a straightforward method to find the derivative of the product of two functions. Let's take two arbitrary functions, f(x) and g(x), and multiply them. So, we have $f(x)g(x)$. The derivative is $\frac{d}{dx}\left[f(x)g(x)\right] = f(x)g'(x) + g(x)f'(x)$.

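Let's explore this in more detail to understand how this works. Have a look at the following equation:

$$\frac{d}{dx}\left[f(x)g(x)\right] = \lim_{h \to 0} \frac{f(x+h)g(x+h) - f(x)g(x)}{h}$$

By adding and subtracting $f(x+h)g(x)$ in the numerator, we can rewrite the derivative, as follows:

$$\lim_{h \to 0} \left[ f(x+h)\,\frac{g(x+h) - g(x)}{h} + g(x)\,\frac{f(x+h) - f(x)}{h} \right]$$

This can be further simplified as $f(x)g'(x) + g(x)f'(x)$, which is the same as before.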
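A quick numerical check of the product rule is sketched below. The functions f(x) = x³ and g(x) = sin(x), the test point, and the central-difference helper are all assumptions chosen for illustration.

```python
import math


def derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Assumed example functions with their known derivatives.
def f(x):
    return x ** 3


def f_prime(x):
    return 3 * x ** 2


g, g_prime = math.sin, math.cos


def product(x):
    return f(x) * g(x)


x = 0.9
numeric = derivative(product, x)                 # differentiate f*g directly
by_rule = f(x) * g_prime(x) + g(x) * f_prime(x)  # f g' + g f'
print(numeric, by_rule)
```

The two printed values agree to within the accuracy of the finite-difference approximation.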

Quotient rule

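The quotient rule allows us to find the derivative of a function that is being divided by another function. This can be derived from the product rule. As before, we take two functions f(x) and g(x), but now, we will divide them. So, we have $\frac{f(x)}{g(x)}$. The derivative is $\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{g(x)f'(x) - f(x)g'(x)}{g(x)^2}$, wherever $g(x) \neq 0$.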

Suppose we have and . Then, we have the following:

By finding the derivatives of f(x) and g(x) and plugging them into the preceding equation, we get the following:

If we expand it, we find the derivative.
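As a worked illustration with assumed functions f(x) = x² and g(x) = x + 1 (not necessarily those of the example above), the quotient rule gives the following:

```latex
\begin{aligned}
\frac{d}{dx}\left[\frac{x^2}{x+1}\right]
  &= \frac{(x+1)\cdot 2x - x^2 \cdot 1}{(x+1)^2} \\
  &= \frac{2x^2 + 2x - x^2}{(x+1)^2} \\
  &= \frac{x^2 + 2x}{(x+1)^2}
\end{aligned}
```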

Chain rule

The chain rule applies to functions that take in another function as input. Let's consider $(f \circ g)(x)$, which is often written as $f(g(x))$ and read as f of g of x. This means that the output of g(x) will become the input to the function f.

The derivative of this will be written as follows:

$$\frac{d}{dx} f(g(x)) = f'(g(x))\, g'(x)$$

This is the same as $\frac{df}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}$.

For example, suppose we have and . We then differentiate the two functions and get and .

By plugging this into the preceding formula, we get .
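The chain rule is the rule that lets derivatives flow backward through nested functions, so it is worth checking numerically. In this sketch the outer function f(u) = u² and the inner function g(x) = sin(x) are assumed examples, as is the central-difference helper.

```python
import math


def derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Assumed example: outer f(u) = u^2, inner g(x) = sin(x).
def g(x):
    return math.sin(x)


def g_prime(x):
    return math.cos(x)


def f(u):
    return u ** 2


def f_prime(u):
    return 2 * u


def composite(x):
    return f(g(x))  # f of g of x


x = 0.4
by_rule = f_prime(g(x)) * g_prime(x)  # derivative of outer at g(x), times derivative of inner
numeric = derivative(composite, x)
print(by_rule, numeric)
```

Note that the outer derivative is evaluated at g(x), not at x; this is the step most often gotten wrong when applying the rule by hand.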

Antiderivative

We now know what derivatives are and how to find them, but now, suppose we know the rate of change (f) of a population (F), and we want to find what the population will be at some point in time. What we have to do is find a function F whose derivative is f. This is known as the antiderivative, and we define it formally as follows: a function F is called an antiderivative of f on an interval $I$ if $F'(x) = f(x)$ for all $x \in I$.

Suppose we have a function , then (where c is some constant), from which we can confirm that .

The following table shows some important functions and their antiderivatives that we will encounter often:

Function

Antiderivative
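As an illustrative selection (not necessarily the same entries the author lists), some standard function and antiderivative pairs are shown here, with c denoting an arbitrary constant:

```latex
\begin{array}{ll}
\text{Function} & \text{Antiderivative} \\ \hline
x^n \ (n \neq -1) & \dfrac{x^{n+1}}{n+1} + c \\[6pt]
\dfrac{1}{x} & \ln|x| + c \\[6pt]
e^x & e^x + c \\
\cos x & \sin x + c \\
\sin x & -\cos x + c
\end{array}
```

Each row can be verified by differentiating the right-hand column and recovering the left-hand one.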

Let's suppose we have the following function:

We want to find its antiderivative. I know this probably looks like a difficult equation, but by using the preceding table, we can make this very easy for ourselves. Let's see how.

First, we rewrite the function so it becomes the following:

And so, the antiderivative is as follows:

To make things easier, we rewrite this, as follows:

.

And there you have it.

You may now be wondering whether or not we can find what the value of c is, and if so, how. Let's go through another example, and see how.

Suppose we have a function that is the second derivative, and we want to find the antiderivative of the antiderivative—that is, the original function. We have the following:

and and

Then, the first antiderivative is as follows:

And so, the second antiderivative is as follows:

Here, we want to find the values of c and d. We can do this simply by plugging in the preceding values and solving for the unknowns, as follows:

; therefore,

We can also do this:

; therefore,

Thus, our function looks like this:
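To illustrate the same procedure end to end, here is a worked sketch with assumed values (f''(x) = 6x, f'(0) = 2, and f(0) = 1, chosen for simplicity and not taken from the example above):

```latex
\begin{aligned}
f''(x) &= 6x \\
f'(x)  &= 3x^2 + c, & f'(0) = 2 &\;\Rightarrow\; c = 2 \\
f(x)   &= x^3 + 2x + d, & f(0) = 1 &\;\Rightarrow\; d = 1 \\
f(x)   &= x^3 + 2x + 1
\end{aligned}
```

Each known value pins down one constant, so two conditions are needed to recover a function from its second derivative.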

Integrals

So far, we have studied derivatives, which give us a method for extracting information about the rate of change of a function. But as you may have realized, integration is the reverse of the earlier problems.

In integration, we find the area underneath a curve. For example, if we have a car and our function gives us its velocity, the area under the curve will give us the distance it has traveled between two points.

Let's suppose we have the curve $y = f(x)$, and the area under the curve between x = a (the lower limit) and x = b (the upper limit), that is, over the interval [a, b], is S. Then, we have the following:

$$S = \int_a^b f(x)\,dx$$

The diagrammatic representation of the curve is as follows:

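This can also be written as follows:

$$\int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{i=1}^{n} f(x_i^*)\,\Delta x$$

In the preceding function, the following applies: $\Delta x = \frac{b - a}{n}$, and $x_i^*$ is in the subinterval $[x_{i-1}, x_i]$.

The function looks like this:

The sum gives us an approximation of the area under the curve that can be made as accurate as we like: for any ε > 0 (ε is assumed to be a small value), there is an N such that, for every n > N, the following formula applies:

$$\left| \int_a^b f(x)\,dx - \sum_{i=1}^{n} f(x_i^*)\,\Delta x \right| < \varepsilon$$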
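The idea of approximating the area with a sum of thin rectangles can be sketched as follows. The midpoint of each subinterval is used as the sample point, which is one assumed choice among many, and f(x) = x² on [0, 1] is an assumed example whose exact area is 1/3.

```python
def riemann_sum(f, a, b, n):
    """Midpoint Riemann sum of f over [a, b] with n equal subintervals."""
    dx = (b - a) / n
    # Sample f at the midpoint of each subinterval and add up the rectangle areas.
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def f(x):
    return x ** 2  # assumed example; the exact area on [0, 1] is 1/3


for n in (4, 16, 64, 256):
    approx = riemann_sum(f, 0.0, 1.0, n)
    print(n, approx, abs(approx - 1 / 3))
```

The error in the last column shrinks as n grows, which is the behavior the ε statement describes.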

Now, let's suppose our function lies both above and below the x axis, thus taking on positive and negative values, like so:

As we can see from the preceding screenshot, the portions above the x axis (A1) contribute a positive area, and the portions below the x axis (A2) contribute a negative area. Therefore, the following formula applies:

$$\int_a^b f(x)\,dx = A_1 - A_2$$

Working with sums is an important part of evaluating integrals, and understanding this requires some new rules for sums. Look at the following examples:

Now, let's explore some of the important properties of integrals, which will help us as we go deeper into the chapter. Look at the following examples:

  • $\int_a^b f(x)\,dx = 0$, when $a = b$
  • $\int_a^b c\,f(x)\,dx = c \int_a^b f(x)\,dx$, where c is a constant

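Now, suppose we have the function $y = f(x)$, which looks like this:

Then, for a point $c$ in $[a, b]$, we get the following property:

$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx$$

This property applies to functions that are continuous on both intervals, and the two intervals must be adjacent, that is, they must share the endpoint c.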

The fundamental theorem of calculus

The fundamental theorem of calculus is the most important theorem in calculus and is named very appropriately since it establishes a relationship between differential calculus and integral calculus. Let's see how.

Suppose that f(x) is continuous on [a, b] and differentiable on (a, b), and that F(x) is an antiderivative of f(x). Then, we have the following:

$$\int_a^b f(x)\,dx = F(b) - F(a)$$

Let's rewrite the preceding equation a bit so it becomes this equation:

$$\int_a^x f(t)\,dt = F(x) - F(a)$$

All we have done here is replace x with t and b with x. And we know that F(x) - F(a) is also a function. From this, we can derive the following property:

$$\frac{d}{dx}\left[F(x) - F(a)\right] = F'(x) = f(x)$$

We can derive the preceding property since F(a) is a constant and thus has the derivative zero.

By shifting our point of view a bit, we get the following function:

$$g(x) = \int_a^x f(t)\,dt$$

Therefore, we get $g'(x) = f(x)$.

In summary, if we integrate our function f and then differentiate it, we end up with the original function f.
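This summary can be checked numerically: integrate a function up to x, differentiate the result, and compare with the original function. In the sketch below, cos is an assumed example for f, the lower limit is a = 0, and both the midpoint rule and the central difference are assumed choices of numerical method.

```python
import math


def integral(f, a, b, n=2000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def g(x):
    """Area function: the integral of cos(t) from 0 to x."""
    return integral(math.cos, 0.0, x)


def derivative(func, x, h=1e-4):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


x = 1.2
print(derivative(g, x), math.cos(x))      # differentiating the integral recovers f(x)
print(g(x), math.sin(x) - math.sin(0.0))  # the integral equals F(x) - F(a) with F = sin
```

Both lines print pairs of nearly equal numbers, one for each direction of the theorem.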

Substitution rule

Obviously, being able to find the antiderivative of a function is important, but the anti-differentiation formulas do not tell us how to evaluate every type of integral—for example, what to do when we have functions such as the following one:

This isn't as straightforward as the examples we saw earlier. In this case, we need to introduce a new variable to help us out and make the problem more manageable.

Let's make our new variable u, and , and the differential of u is then . This changes the problem into the following:

This is clearly a lot simpler. The antiderivative of this becomes the following:

And by plugging in the original value , we get the following:

And there we have it.

This method is very useful, and works when we have problems that can be written in the following form:

$$\int f(g(x))\, g'(x)\,dx$$

If $u = g(x)$, then the following applies:

$$\int f(g(x))\, g'(x)\,dx = \int f(u)\,du$$

That equation might look somewhat familiar to you. And it should. It is the chain rule from differentiation.
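The substitution can be sanity-checked on a definite integral, where the limits change from [a, b] in x to [g(a), g(b)] in u. The integrand below, 2x·sqrt(1 + x²) with u = 1 + x², is an assumed example, and the midpoint rule is an assumed numerical method.

```python
import math


def integral(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


# Assumed example of the form f(g(x)) * g'(x):
# g(x) = 1 + x^2, g'(x) = 2x, f(u) = sqrt(u).
def integrand_x(x):
    return 2 * x * math.sqrt(1 + x ** 2)


def integrand_u(u):
    return math.sqrt(u)


a, b = 0.0, 1.0
in_x = integral(integrand_x, a, b)                    # integrate over x in [a, b]
in_u = integral(integrand_u, 1 + a ** 2, 1 + b ** 2)  # integrate over u in [g(a), g(b)]
exact = (2 / 3) * (2 ** 1.5 - 1)                      # from the antiderivative (2/3) u^(3/2)
print(in_x, in_u, exact)
```

All three numbers agree closely, showing that the simpler integral in u carries the same value as the original integral in x.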

Areas between curves

We know that integration gives us the ability to find the area underneath a curve between two points. But now, suppose we want to find the area that lies between two graphs, as in the following screenshot:

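Our region S, as we can see, lies between the curves f(x) and g(x), with $f(x) \geq g(x)$, in between the two vertical lines x = a and x = b. Therefore, we can take an approximation of the area A of S to be the following, where $\Delta x = \frac{b - a}{n}$ and $x_i^*$ is a sample point in the i-th subinterval:

$$A \approx \sum_{i=1}^{n} \left[f(x_i^*) - g(x_i^*)\right] \Delta x$$

We can rewrite this as an integral, in the following form:

$$A = \int_a^b \left[f(x) - g(x)\right] dx$$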

To visualize this better and create an intuition of what is happening, we have the following image:
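A minimal sketch of the approximation follows, using thin rectangles of height f - g. The curves f(x) = x and g(x) = x² on [0, 1] are assumed examples; the exact area between them is 1/2 - 1/3 = 1/6.

```python
def area_between(f, g, a, b, n=1000):
    """Approximate the area between f (upper) and g (lower) over [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x_mid = a + (i + 0.5) * dx            # sample point in the i-th subinterval
        total += (f(x_mid) - g(x_mid)) * dx   # thin rectangle of height f - g
    return total


def upper(x):
    return x       # assumed upper curve f(x)


def lower(x):
    return x ** 2  # assumed lower curve g(x)


area = area_between(upper, lower, 0.0, 1.0)
print(area, 1 / 6)
```

If the curves cross inside the interval, the interval should be split at the crossing point so that the upper curve is always passed first.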

Integration by parts

By now, we know that for every rule in differentiation, there is a corresponding rule in integration since they have an inverse relationship.

In the earlier section on differentiation, we encountered the product rule. In integration, the corresponding rule is known as integration by parts.

As a recap, the product rule states that if f and g are differentiable, then the following applies:

$$\frac{d}{dx}\left[f(x)g(x)\right] = f(x)g'(x) + g(x)f'(x)$$

And so, in integration, this becomes the following:

$$\int \left[f(x)g'(x) + g(x)f'(x)\right] dx = f(x)g(x)$$

We can rewrite this, as follows:

$$\int f(x)g'(x)\,dx = f(x)g(x) - \int g(x)f'(x)\,dx$$

We can combine this formula with the fundamental theorem of calculus and obtain the following equation:

$$\int_a^b f(x)g'(x)\,dx = f(x)g(x)\Big|_a^b - \int_a^b g(x)f'(x)\,dx$$

We can use this to evaluate the integral over the interval [a, b].

Note: The term $f(x)g(x)\Big|_a^b$ merely states that we plug in the value b in place of x and evaluate it, and then subtract from it the evaluation at a.

We can also use the preceding substitution method for integration by parts to make our lives easier when calculating the integral. We make $u = f(x)$ and $v = g(x)$; then, the differentials are $du = f'(x)\,dx$ and $dv = g'(x)\,dx$. And so, the formula becomes this:

$$\int u\,dv = uv - \int v\,du$$
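As a sketch of how the formula is used, consider the integral of x·eˣ over [0, 1], an assumed example with u = x and dv = eˣ dx, so that du = dx and v = eˣ. The midpoint rule used for the numerical comparison is also an assumption.

```python
import math


def integral(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


a, b = 0.0, 1.0

# Left-hand side: integrate u dv = x * e^x directly.
direct = integral(lambda x: x * math.exp(x), a, b)

# Right-hand side: the boundary term u*v evaluated from a to b,
# minus the integral of v du = e^x dx.
boundary = b * math.exp(b) - a * math.exp(a)
remaining = integral(math.exp, a, b)
by_parts = boundary - remaining

print(direct, by_parts)
```

Both values are close to 1, the exact answer obtained by hand: e - (e - 1) = 1. The method pays off whenever the remaining integral of v du is easier than the original one.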
