Preface
Crafting machines that learn from data to make intelligent decisions is becoming the dominant paradigm in many areas of technology. Acquiring the skill set needed for this task can boost your career. Machine Learning Techniques for Text aims to help you in this endeavor, focusing specifically on text data and human language. The book will show you how to analyze text data, get started with machine learning, and work effectively with the Python libraries often used for these tasks, such as pandas, NumPy, matplotlib, seaborn, and scikit-learn. You will also have the opportunity to work with state-of-the-art deep learning frameworks such as TensorFlow, Keras, and PyTorch.
There is a plethora of resources for mastering machine learning for text. Many of them present complex theoretical concepts, often expressed in demanding mathematical language. Others focus disproportionately on Python code and treat the theoretical foundations behind the design choices only superficially. This book steers a middle path to strike the right balance between theory and practice. A good metaphor for the book’s approach is the relationship between an experienced craftsperson and their trainee. Based on the problem, the craftsperson picks a tool from the toolbox, explains its utility, and puts it into action. This approach will help you to identify at least one practical use for each method or technique presented.
In each chapter, we focus on one specific case study using real-world datasets. For that reason, the book is solution-oriented, and it’s accompanied by Python code in the form of Jupyter notebooks to help you obtain hands-on experience. This case study approach will allow you to engage more actively in learning rather than passively absorb information. In each case, the problem statement is set from the beginning, so the challenge is clear to everyone. Even if the discussion temporarily digresses from the principal aim, for instance, to present some fundamental concept, you can easily reorient yourself to the problem under study. A recurring pattern in the chapters is that we first try to gain some intuition about the data and then implement and contrast various solutions.
By the end of this book, you’ll be able to understand and apply various techniques with Python for text preprocessing, text representation, dimensionality reduction, machine learning, language modeling, visualization, and evaluation. This diverse skill set will allow you to work on similar problems seamlessly.