In this example, we copy a number of large files drawn from the open source GeoNames project representing world city data and postal codes. The data provided by the project, authored by Marc Wick, is licensed under a Creative Commons Attributions v4.0 license. In the book repository, there is a script, /path/to/repo/sample_data/geonames_downloads.sh, that imports and unzips two large GeoNames project sample data files. Here is that download script:
#!/bin/bash
export STATS_URL="https://download.geonames.org/export/dump/allCountries.zip"
export ZIP_URL="https://download.geonames.org/export/zip/allCountries.zip"
wget -O /tmp/stats.zip $STATS_URL
wget -O /tmp/postcodes.zip $ZIP_URL
unzip /tmp/stats.zip -d /tmp
mv /tmp/allCountries.txt /tmp/allCountriesStats.txt
unzip /tmp/postcodes.zip -d /tmp
mv /tmp/allCountries.txt /tmp/allCountriesPostcodes.txt
rm /tmp/*.zip
ls -l /tmp/all*
Here is a list of the files to be copied into GridFS:
As you can see, most...