Preface
Forensics is an important topic for law enforcement, civil litigators, corporate investigators, academics, and other professionals who deal with complex digital investigations. Digital forensics has played a major role in some of the largest criminal and civil investigations of the past two decades, most notably the Enron investigation in the early 2000s. Whether in a criminal case, civil litigation, or an internal investigation initiated by an organization, digital forensics is the way data becomes evidence, sometimes the most important evidence, and that evidence is how many types of modern investigations are solved.
The increased usage of Big Data solutions, such as Hadoop, has required new approaches to conducting forensics. The number of organizations that have implemented Big Data solutions has surged in the past decade, and as these solutions spread across a wide range of organizations, forensic investigators need to understand how to work with them. These systems house critical data that can shed light on an organization's operations and strategies, which are key areas of interest in many types of investigations. Hadoop has been the most popular Big Data solution, and its distributed architecture, in-memory data storage, and voluminous data storage capabilities present new challenges to forensic investigators.
A new area within the field, called Big Data forensics, focuses on investigating Big Data systems. These systems are unique in their scale, in how they store data, and in the practical limitations that can prevent an investigator from using traditional forensic methods. The field of digital forensics has expanded from primarily dealing with desktop computers and servers to include mobile devices, tablets, and large-scale data systems. Forensic investigators have kept pace with these technological changes by adopting new techniques, software, and hardware to collect, preserve, and analyze digital evidence. Big Data solutions likewise require different approaches to analyzing the collected data.
This book describes and explores in detail the processes, tools, and techniques for performing a forensic investigation of Hadoop. Many of the concepts it covers can be applied to other Big Data systems, not just Hadoop. The book covers the processes for identifying and collecting forensic evidence, as well as the processes for analyzing the data as part of an investigation and presenting the findings. Practical examples use LightHadoop and Amazon Web Services to build test Hadoop environments and perform forensics against them. By the end of the book, you will be able to work with the Hadoop command line and forensic software packages and will understand the forensic process.