You're reading from Privacy-Preserving Machine Learning A use-case-driven approach to building and protecting ML pipelines from privacy and security threats

Product type Paperback

Published in May 2024

Publisher Packt

ISBN-13 9781800564671

Length 402 pages

Edition 1st Edition

Languages

Python

Tools

Federated Learning

Concepts

Machine Learning

Author (1):

Srinivasa Rao Aravilli

View More author details

Table of Contents (17) Chapters

Preface

1. Part 1: Introduction to Data Privacy and Machine Learning FREE CHAPTER

2. Chapter 1: Introduction to Data Privacy, Privacy Breaches, and Threat Modeling

3. Chapter 2: Machine Learning Phases and Privacy Threats/Attacks in Each Phase

4. Part 2: Use Cases of Privacy-Preserving Machine Learning and a Deep Dive into Differential Privacy

5. Chapter 3: Overview of Privacy-Preserving Data Analysis and an Introduction to Differential Privacy

6. Chapter 4: Overview of Differential Privacy Algorithms and Applications of Differential Privacy

7. Chapter 5: Developing Applications with Differential Privacy Using Open Source Frameworks

8. Part 3: Hands-On Federated Learning

9. Chapter 6: Federated Learning and Implementing FL Using Open Source Frameworks

10. Chapter 7: Federated Learning Benchmarks, Start-Ups, and the Next Opportunity

11. Part 4: Homomorphic Encryption, SMC, Confidential Computing, and LLMs

12. Chapter 8: Homomorphic Encryption and Secure Multiparty Computation

13. Chapter 9: Confidential Computing – What, Why, and the Current State

14. Chapter 10: Preserving Privacy in Large Language Models

15. Index

Why subscribe?

16. Other Books You May Enjoy

What this book covers

Chapter 1, Introduction to Data Privacy, Privacy Breaches, and Threat Modeling, serves as an introduction to various aspects of data privacy. We begin by exploring the concept of data privacy and distinguishing between sensitive data and personal sensitive data. Additionally, we delve into the realm of data privacy regulations, highlighting their significance in safeguarding individuals’ information. The chapter also introduces the concept of privacy by design, emphasizing its importance in ensuring privacy throughout the data life cycle. Furthermore, we examine the real-world implications of privacy breaches by discussing notable cases and the resulting fines imposed on major enterprise companies. These examples shed light on the consequences of failing to protect sensitive data adequately. The chapter then delves into privacy threat modeling using the LINDDUN framework. We explain the concepts of linkability and identifiability threats, providing illustrative examples to enhance understanding. By covering these topics comprehensively, this chapter sets the foundation for a deeper exploration of privacy-preserving techniques and methodologies discussed throughout the book. It equips you with the necessary knowledge to understand the importance of data privacy, the risks associated with privacy breaches, and the strategies employed to mitigate these risks while enabling data analysis and utilization.

Chapter 2, Machine Learning Phases and Privacy Threats/Attacks in Each Phase, provides an overview of various types of machine learning, including supervised, unsupervised, and reinforcement learning, along with an exploration of the machine learning phases and pipeline. It provides various formats to persist ML models and challenges in model persistence as well. Additionally, it highlights the importance of privacy considerations at each phase of the machine learning process. We delve into the privacy needs associated with different phases of machine learning, namely, training data privacy, input data privacy, model privacy, and inference/output data privacy. The chapter proceeds by examining privacy attacks specific to each phase. We focus on the threats posed to training data, model persistence, and inference processes. We delve into model inversion attacks, model inference attacks, and training data extraction attacks, providing detailed examples to illustrate how these attacks work using open source frameworks.

Chapter 3, Overview of Privacy-Preserving Data Analysis and Introduction to Differential Privacy, serves as an introduction to privacy-preserving data analysis, privacy-enhanced technologies, and the concept of differential privacy. These topics are explored to provide a foundation for understanding and implementing privacy-preserving measures in data analysis and machine learning. This chapter also covers reconstruction attacks on SQL, explores practical use cases, and discovers how to prevent such attacks using the Open Diffix framework, which provides robust privacy protection. Specifically, the concept of differential privacy is explored in detail including privacy loss, privacy budgets, differential privacy mechanisms, and local/global differential privacy.

Chapter 4, Differential Privacy Algorithms and Limitations of Differential Privacy, deep dives into various algorithms (Laplace, Gaussian, count, sum, mean, variance, standard deviation, and thresholding algorithms) used in differential privacy and the limitations of differential privacy.

Chapter 5, Developing Applications with Differential Privacy Using Open Source Frameworks, provides a deep dive into differential privacy (DP) using open source frameworks and the implementation of a fraud detection use case using ML and DL open source frameworks. This chapter also provides an overview of real-world applications that make use of DP.

Chapter 6, Need for Federated Learning and Implementing Federated Learning Using Open Source Frameworks, covers the importance of federated learning (FL) and addresses the privacy concerns associated with sending data to a central server for model training. It explores the concepts of independent and identically distributed (IID) and non-IID datasets, along with the different categories of non-IID data. Understanding these data characteristics is crucial for effectively implementing FL. Furthermore, it discusses FL techniques (FedAvg, FedYogi, FedSGD, etc.) and introduces available open source frameworks that support FL implementations. It also implements a use case in the financial domain with FL using the Flower open source framework.

Chapter 7, Federated Learning Benchmarks, Start-Ups, and Next Opportunities, focuses on FL datasets and benchmarks, and a comparison of both. It delves into the available FL benchmarks and provides insights into how they can be used for evaluating FL algorithms and techniques. Additionally, it discusses the process of selecting the most appropriate FL benchmarks for your specific project, considering factors such as data characteristics and evaluation criteria. It also explores the state-of-the-art research in FL, highlighting the latest advancements, methodologies, and challenges in this field, and sheds light on start-ups that are actively working on FL and their specific focus areas.

Chapter 8, Homomorphic Encryption and Secure Multiparty Computation, explores various privacy-enhancing techniques, including encryption, anonymization, and de-identification. It discusses the principles and limitations of these techniques, understanding their effectiveness in safeguarding sensitive data while preserving data utility. The concept of homomorphic encryption (HE) and its mathematical foundations are covered along with an exploration of how HE can be applied in machine learning scenarios, allowing computations to be performed directly on encrypted data without compromising privacy. Furthermore, we discuss secure multiparty computation (SMC) and its use cases and present a use case implementation using the private set interaction (PSI) SMC technique. At the end of the chapter, we provide a high-level overview of zero-knowledge proof (ZKP), a cryptographic protocol that enables one party to prove knowledge of certain information without revealing the information itself.

Chapter 9, Confidential Computing – What, Why, and Current State, delves into privacy and security attacks that target data stored in memory. We discuss the vulnerabilities and potential risks associated with such attacks, highlighting the importance of protecting data throughout its life cycle. We introduce the concept of confidential computation, focusing on trusted execution environments (TEEs), and explore the concept of attestation of source code and how it aids in mitigating insider threat attacks. By verifying the integrity and authenticity of source code, organizations can establish trust and ensure that malicious actors cannot compromise the security of their systems. Additionally, we compare the support for secure enclaves in major cloud service providers such as AWS, Azure, GCP, and Anjuna. We assess the capabilities, features, and security measures offered by these providers, enabling you to make informed decisions when choosing a platform for deploying applications that require secure enclaves.

Chapter 10, Privacy Preserving in Large Language Models, introduces generative AI and the fundamentals of LLMs, the privacy vulnerabilities associated with them, and the technologies and approaches to preserve privacy while using these models. This chapter covers developing LLM applications using open source LLMs and protecting them from privacy attacks (prompt injection attacks, membership inference attacks, etc.) and ends with state-of-the-art privacy research on LLMs.