2.4. SPARQL and MapReduce
The features expected from modern RDF triple stores are reminiscent of the Big Data trend, in which solutions that implement specialized data stores from scratch are rare because of the enormous development effort they require. Instead, many RDF triple stores rely on existing infrastructures based on MapReduce [DEA 04] and on clusters of distributed data and computation nodes to achieve efficient parallel processing over massively distributed data sets (see section 2.4.2.1). However, these cluster infrastructures are not designed as fully-fledged data management systems [STO 10], and integrating an efficient query processor on top of them is a challenging task. In particular, the data storage and communication costs generated by the evaluation of joins over distributed data, including the costs of data preprocessing and indexing, need to be addressed carefully. This section mainly reflects the work published in [NAA 17, NAA 16].
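To make the communication cost of a distributed join more concrete, the following minimal sketch simulates a generic reduce-side (repartition) join in a single Python process. It is an illustration only, not the method of [NAA 17, NAA 16] or of any particular system: the toy triples, the two-pattern query (?x knows ?y . ?y livesIn ?z) and the function names map_phase, shuffle, reduce_phase and run_join are all assumptions introduced here. The number of key-value pairs emitted by the map phase stands in for the data that a real cluster would transfer over the network.

```python
from collections import defaultdict

# Hypothetical toy data set of (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "livesIn", "paris"),
    ("carol", "livesIn", "rome"),
    ("dave", "livesIn", "oslo"),
]


def map_phase(triples):
    """Emit (join_key, tagged_value) pairs for the assumed query
    ?x knows ?y . ?y livesIn ?z, which joins on the variable ?y."""
    for s, p, o in triples:
        if p == "knows":
            yield o, ("L", s)  # ?y is the object of the first pattern
        elif p == "livesIn":
            yield s, ("R", o)  # ?y is the subject of the second pattern


def shuffle(pairs):
    """Group the emitted values by join key, as the framework would do
    when routing each key to one reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the left and right values of each key into (?x, ?y, ?z)."""
    for y, values in groups.items():
        left = [v for tag, v in values if tag == "L"]
        right = [v for tag, v in values if tag == "R"]
        for x in left:
            for z in right:
                yield (x, y, z)


def run_join(triples):
    """Return the sorted join results and the number of shuffled pairs."""
    pairs = list(map_phase(triples))
    results = sorted(reduce_phase(shuffle(pairs)))
    return results, len(pairs)


if __name__ == "__main__":
    results, shuffled = run_join(TRIPLES)
    print(results)
    print("pairs shuffled:", shuffled)
```

In this sketch every matching triple is shuffled, including the triple about dave, which finds no join partner and is discarded by the reducer. Transfers of this kind, which produce no results, are one example of the overhead that data preprocessing and indexing aim to reduce.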
2.4.1. MapReduce-based SPARQL processing
Given...