Introduction to linear algebra
NLP and ML are two domains that have reaped substantial benefits from mathematical disciplines, particularly linear algebra and probability theory. These foundational tools aid in evaluating the correlation between variables and are at the heart of numerous NLP and ML models. This segment presents a detailed primer on linear algebra and probability theory, along with their practical usage in NLP and ML.
Let’s start by first understanding scalars, vectors, and matrices:
- Scalars: A scalar is a single numerical value that, in most ML applications, comes from the real domain. Examples of scalars in NLP include the frequency of a word in a text corpus.
- Vectors: A vector is a collection of numerical elements. Each of these elements can be termed an entry, component, or dimension, and the count of these components defines the vector’s dimensionality. Within NLP, a vector could hold components related to elements such as word frequency, sentiment ranking, and more. For instance, a text document’s three-dimensional vector representation might be expressed as a real-number array, such as [word frequency, sentiment ranking, complexity].
- Matrices: A matrix can be perceived as a rectangular collection of numerical elements composed of rows and columns. To retrieve an element from the matrix, one needs to denote its row and column indices. In the field of NLP, a data matrix might include rows that align with distinct text documents and columns that align with different text attributes, such as word frequency, sentiment, and so on. The dimensions of such a matrix are represented by the notation n × d, where n is the number of rows (i.e., text documents), and d is the number of columns (i.e., attributes).
Let’s move on to the basic operations for scalars, vectors, and matrices next.
The most basic operations, addition and subtraction, can be carried out on vectors with the same dimensions and are applied component-wise. For example, if we have two vectors, a = [4,1] and b = [2,4], then a + b = [6,5].
Let’s visualize this as follows:
Figure 2.1 – Adding two vectors (a = [4,1] and b = [2,4]) means that a + b = [6,5]
It is possible to scale a vector by multiplying it by a scalar. This operation is performed by multiplying each component of the vector by the scalar value. For example, let’s consider an n-dimensional vector, $x = [x_1, x_2, \ldots, x_n]$. The process of scaling this vector by a factor of $a$ can be represented mathematically as follows:
$$a \cdot x = [a x_1, a x_2, \ldots, a x_n]$$
This operation results in a new vector that has the same dimensionality as the original vector but with each component multiplied by the scalar value $a$.
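Here is a minimal NumPy sketch of both operations, reusing the vectors from the addition example above plus an arbitrary three-dimensional vector:
```python
import numpy as np

# Vector addition is performed component-wise.
a = np.array([4, 1])
b = np.array([2, 4])
print(a + b)            # [6 5], matching Figure 2.1

# Scalar multiplication scales every component by the same factor.
x = np.array([1.0, 2.0, 3.0])
print(3.0 * x)          # [3. 6. 9.]
```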
There are two types of multiplication between vectors: the dot product ($x \cdot y$) and the cross product ($x \times y$). The dot product is the one we use most often in ML algorithms.
The dot product is a mathematical operation that can be applied to two vectors, $x$ and $y$. It has many practical applications, one of which is to help determine their similarity. It is defined as the sum of the products of the corresponding elements of the two vectors. The dot product of x and y is represented by the symbol $x \cdot y$ and is defined as follows:
$$x \cdot y = \sum_{i=1}^{n} x_i y_i$$
where n represents the dimensionality of the vectors. The dot product is a scalar quantity and can be used to measure the angle between two vectors, as well as the projection of one vector onto another. It also serves a vital function in numerous ML algorithms, including linear regression and neural networks.
The dot product is commutative, meaning that the order of the vectors does not affect the result: $x \cdot y = y \cdot x$. Furthermore, the dot product is distributive over vector addition, implying the following:
$$x \cdot (y + z) = x \cdot y + x \cdot z$$
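A short NumPy sketch that verifies these properties numerically on arbitrary example vectors:
```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
z = np.array([7.0, 8.0, 9.0])

# Dot product: sum of the products of corresponding elements.
print(np.dot(x, y))                    # 32.0 = 1*4 + 2*5 + 3*6
# Commutativity: x . y == y . x
print(np.dot(x, y) == np.dot(y, x))    # True
# Distributivity over vector addition.
print(np.isclose(np.dot(x, y + z), np.dot(x, y) + np.dot(x, z)))  # True
```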
The dot product of a vector with itself yields its squared norm, where the norm is also known as the Euclidean norm. The norm, symbolized by $\lVert x \rVert$, signifies the length of the vector and is computed as
$$\lVert x \rVert = \sqrt{x \cdot x} = \sqrt{\sum_{i=1}^{n} x_i^2}$$
The normalization of vectors can be achieved by dividing them by their norm, also known as the Euclidean norm or the length of the vector. This results in a vector with a unit length, denoted by $x'$. The normalization process can be shown as
$$x' = \frac{x}{\lVert x \rVert}$$
where $x$ is the original vector and $\lVert x \rVert$ represents its norm. It should be noted that normalizing a vector retains its direction while setting its length to 1, allowing the meaningful comparison of vectors regardless of their original magnitudes.
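A minimal NumPy sketch of the normalization step, using an arbitrary example vector:
```python
import numpy as np

x = np.array([3.0, 4.0])
norm_x = np.linalg.norm(x)       # Euclidean norm: sqrt(3**2 + 4**2) = 5.0
x_prime = x / norm_x             # the normalized (unit-length) vector x'
print(x_prime)                   # [0.6 0.8]
print(np.linalg.norm(x_prime))   # 1.0: direction kept, length set to 1
```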
The cosine similarity between two vectors $x$ and $y$ is mathematically represented as the dot product of the two vectors after they have been normalized to unit length. This can be written as follows:
$$\cos(\theta) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$$
where $\lVert x \rVert$ and $\lVert y \rVert$ are the norms of the vectors x and y, respectively. This computed cosine similarity between x and y is equivalent to the cosine of the angle between the two vectors, denoted as θ.
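Here is a minimal sketch of cosine similarity in NumPy; the cosine_similarity helper and the document vectors are hypothetical examples:
```python
import numpy as np

def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their norms."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Hypothetical document vectors: [word frequency, sentiment ranking, complexity]
doc1 = np.array([2.0, 0.5, 1.0])
doc2 = np.array([4.0, 1.0, 2.0])
print(cosine_similarity(doc1, doc2))  # 1.0: the vectors point in the same direction
```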
Vectors with a dot product of 0 are deemed orthogonal, implying that when both vectors are nonzero, the angle between them is 90 degrees. By this definition, the zero vector is orthogonal to any vector. A group of vectors is considered orthonormal if each pair of them is orthogonal and each vector possesses a norm of 1. Such orthonormal sets prove to be valuable in numerous mathematical contexts. For instance, they come into play when transforming between different orthogonal co-ordinate systems, where the new co-ordinates of a point are computed in relation to the modified direction set. This approach, known as co-ordinate transformation in the field of analytical geometry, finds widespread application in the realm of linear algebra.
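A quick NumPy sketch of an orthogonality check, using the standard basis vectors as an example:
```python
import numpy as np

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

print(np.dot(e1, e2))    # 0.0: the vectors are orthogonal
# Each vector also has norm 1, so {e1, e2} is an orthonormal set.
print(np.linalg.norm(e1), np.linalg.norm(e2))   # 1.0 1.0
```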
Basic operations on matrices and vectors
The transpose of a matrix is obtained by interchanging its rows and columns. This means that the element originally at the (i, j)th position in the matrix now occupies the (j, i)th position in its transpose. As a result, a matrix that was originally of size n × m becomes an m × n matrix when transposed. The notation used to represent the transpose of matrix X is $X^T$. Here’s an illustrative example of a matrix transposition operation:
$$X = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \qquad X^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$$
Crucially, the transpose of $X^T$ reverts to the original matrix, that is, $(X^T)^T = X$. Moreover, it is clear that row vectors can be transposed into column vectors and vice versa. Additionally, the following holds true for both matrices and vectors:
$$(X + Y)^T = X^T + Y^T, \qquad (XY)^T = Y^T X^T$$
It’s also noteworthy that the dot product remains commutative when expressed in matrix notation; for column vectors x and y, the following holds: $x \cdot y = x^T y = y^T x$.
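The following NumPy sketch illustrates these transpose properties on arbitrary example matrices and vectors:
```python
import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6]])          # a 2 x 3 matrix
print(X.T)                         # its 3 x 2 transpose
print(np.array_equal(X.T.T, X))    # True: transposing twice recovers X

x = np.array([[1.0], [2.0], [3.0]])   # a column vector as a 3 x 1 matrix
y = np.array([[4.0], [5.0], [6.0]])
print(x.T @ y)   # [[32.]]: x^T y equals the ordinary dot product
print(y.T @ x)   # [[32.]]: and y^T x gives the same scalar
```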
Matrix definitions
In this section, we’ll cover the different types of matrix definitions (each is illustrated in a short code sketch after the list):
- Symmetric matrix: A symmetric matrix is a type of square matrix where the transpose of the matrix is equal to the original matrix. In mathematical terms, if a matrix X is symmetric, then $X^T = X$. For example,
$$X = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}$$
is symmetric.
- Rectangular diagonal matrix: This is a matrix that is m × n in dimensions, with nonzero values appearing only on the main diagonal.
- Upper (or lower) triangular matrix: A matrix is called an upper (lower) triangular matrix if all the entries below (above) its main diagonal are 0.
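Here is a minimal NumPy sketch of these matrix types (the example entries are arbitrary):
```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

S = A + A.T                      # a simple way to build a symmetric matrix
print(np.array_equal(S, S.T))    # True: S satisfies S^T = S

D = np.diag([1, 2, 3])           # diagonal matrix with the given diagonal
U = np.triu(A)                   # upper triangular: zeros below the diagonal
L = np.tril(A)                   # lower triangular: zeros above the diagonal
print(U)
```
Next, we are going to describe matrix operations.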
Determinants
The determinant of a square matrix provides a notion of how the matrix affects the volume of a d-dimensional object when it multiplies the object’s co-ordinate vectors. The determinant, symbolized as det(A), represents the (signed) volume of the parallelepiped formed by the row or column vectors of the matrix. This interpretation holds consistently, as the volume determined by the row and column vectors is mathematically identical. When a diagonalizable matrix A is applied to a group of co-ordinate vectors, the ensuing distortion is termed anisotropic scaling, and the determinant helps establish the scale factors of this transformation. In particular, the sign of the determinant mirrors the impact of the transformation on the orientation of the basis system.
The determinant is calculated as follows:
- For a 1×1 matrix A, its determinant is equivalent to the single scalar present within it.
- For larger matrices, the determinant can be calculated by fixing a column, j, and then expanding along the elements within that column. As another option, it’s possible to fix a row, i, and expand along that particular row. Regardless of whether you opt to fix a row or column, the end result, which is the determinant of the matrix, will remain consistent.
With j as a fixed value ranging from 1 to d,
$$\det(A) = \sum_{i=1}^{d} (-1)^{i+j} a_{ij} \det(A_{ij})$$
Or, with the fixed i,
$$\det(A) = \sum_{j=1}^{d} (-1)^{i+j} a_{ij} \det(A_{ij})$$
where $A_{ij}$ denotes the (d−1) × (d−1) submatrix obtained by deleting row i and column j of A.
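As an illustrative sketch, this expansion can be implemented recursively in NumPy; det_cofactor is a hypothetical helper that fixes the first row and expands along it:
```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row (i fixed)."""
    d = A.shape[0]
    if d == 1:
        return A[0, 0]   # base case: a 1 x 1 matrix is its single entry
    total = 0.0
    for j in range(d):
        # A_ij: the submatrix with row 0 and column j removed.
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(det_cofactor(A))      # -2.0
print(np.linalg.det(A))     # -2.0 (up to floating-point error)
```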
Based on the preceding equations, we can see that some cases can be easily calculated:
- Diagonal matrix: For a diagonal matrix, the determinant is the product of its diagonal elements.
- Triangular matrix: In the context of a triangular matrix, the determinant is likewise found by multiplying all its diagonal elements.
- Zero row or column: If all components of any row or column of a matrix are 0, its determinant is also 0.
For a 2 × 2 matrix,
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$
its determinant can be computed as $\det(A) = ad - bc$. If we consider a 3 × 3 matrix,
$$A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$$
the determinant is calculated as follows:
$$\det(A) = a(ei - fh) - b(di - fg) + c(dh - eg)$$
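A minimal NumPy sketch that checks the 3 × 3 formula and the triangular-matrix rule on arbitrary example matrices:
```python
import numpy as np

# 3 x 3 case, matching the formula above.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])
a, b, c, d, e, f, g, h, i = A.ravel()
print(a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g))   # -3.0
print(np.linalg.det(A))                                # -3.0 (up to rounding)

# Triangular case: the determinant is the product of the diagonal entries.
T = np.triu(A)
print(np.linalg.det(T), A[0, 0] * A[1, 1] * A[2, 2])   # both approx. 50.0
```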
Let’s now move on to eigenvalues and vectors.