Summary
In this chapter, we explored common data sources and implemented a web scraping example. Next, we introduced the basic concepts of data scrubbing, such as statistical methods and text parsing. Then we learned how to parse the most widely used text formats with Python. Finally, we presented an introduction to OpenRefine, which is an excellent tool for data cleansing and data formatting. Working with data is not just a matter of code or clicks; we also need to play with the data and follow our intuition to get it into great shape. We need to get involved in the knowledge domain of our data to find inconsistencies. A global view of the data helps us discover what we need to know about it.
In the next chapter, we will explore our data through some visualization techniques, and we will present a quick introduction to D3.js.