Choosing the right tool for the job
Your choice of data processing tools will depend heavily on the kind of data processing tasks you need to accomplish. If you have raw data that you need to transform in bulk, as in an ETL/ELT task, Data Fusion is a good place to start your assessment. If you want to perform relatively simple transformations using SQL syntax, start with BigQuery. If you want to visualize and transform data via an easy-to-use GUI, go with Dataprep. If you prefer to stick with open source tools, consider pandas or Spark. We discussed how pandas is a good starting point for people who are beginning to learn about data exploration and preprocessing, and how it is also more than an educational tool: pandas works well for initial data exploration and for data processing at a moderate scale. However, for large-scale data processing projects, Spark’s highly parallelized functionality...