This section covers benchmarking strategies for mixed workloads on clusters, such as profiling MapReduce job history and profiling other production jobs.
Mixed workloads
Rumen
Apache Rumen is a tool that parses MapReduce job history logs and produces condensed, easily readable output. Other benchmarking tools, such as the YARN Scheduler Load Simulator and Gridmix, use this output as their input. Rumen has the following two parts:
- TraceBuilder: Converts Hadoop job history logs into JSON, an easily parsable format. The following command runs TraceBuilder (Ref: Hadoop 3 documentation):
hadoop rumentrace [options] <jobtrace-output> <topology-output> <inputs>
<jobtrace-output> - Location...
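As a rough illustration of the command syntax above, the following Python sketch assembles a rumentrace invocation as an argument list. The helper name build_rumentrace_command and all paths are hypothetical and chosen only for this example. It also assumes that <inputs> may hold one or more history locations. It does not run Hadoop; it only shows the order in which the arguments appear.

```python
import shlex


def build_rumentrace_command(jobtrace_output, topology_output, inputs, options=()):
    """Return the argument list for a TraceBuilder run.

    Argument order follows the documented syntax:
    hadoop rumentrace [options] <jobtrace-output> <topology-output> <inputs>
    """
    if not inputs:
        # At least one job history location is needed to build a trace.
        raise ValueError("at least one input location is required")
    return ["hadoop", "rumentrace", *options,
            jobtrace_output, topology_output, *inputs]


# Hypothetical locations, for illustration only.
command = build_rumentrace_command(
    "file:///tmp/job-trace.json",
    "file:///tmp/rumen-topology.json",
    ["hdfs:///tmp/job-history"],
)

# Print a shell-safe version of the command for review before running it.
print(shlex.join(command))
```

On a machine with a configured Hadoop client, the list could be passed to subprocess.run(command, check=True). Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.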