Physical versus remote collections
Hadoop data collection can either be performed directly on the Hadoop cluster or via remote access. Physical collections are any form of data acquisition in which the investigator is physically interacting with the cluster, typically by pulling the cluster's hard drives and imaging them. Alternatively, collections can also be performed remotely. In such cases, the investigator accesses the cluster through a network connection and acquires the data through a terminal over the network connection.
Hadoop can be run in many different designs and configurations. The Hadoop cluster can be run on physical devices with Hadoop being installed on the host operating system. Hadoop clusters can also be set up using a series of virtual machines. With the increased use of cloud computing, Hadoop also can be run as a Platform as a Service (PaaS) with the actual servers running Hadoop being masked by the abstraction of the cloud service. Additionally, Hadoop can be designed...