Importing CSV data into Neo4j with Cypher
The comma-separated values (CSV) file format is the most widely used format for sharing data among data scientists. According to the dataset of Kaggle datasets (https://www.kaggle.com/datasets/morriswongch/kaggle-datasets), CSV accounts for more than 57% of all datasets in that repository, while JSON files account for less than 10%. It is popular for the following reasons:
- Its resemblance to the tabular storage format of relational databases
- Its closeness to the machine learning world of vectors and matrices
- Its readability – you usually just have to read the column names to understand what the data is about (of course, a more detailed description is required to understand how the data was collected, the units of physical quantities, and so on), and there are no hidden fields (compared to JSON, where a key may exist only from the 1,000th record onward, which is hard to know without a proper description or advanced data...