Getting started with OpenRefine
OpenRefine (formerly known as Google Refine) is a tool for data cleansing, data exploration, and data transformation. It is an open source web application that runs locally on your computer, so you do not have to upload sensitive information to an external server.
To start working with OpenRefine, just run the application and open a browser at http://127.0.0.1:3333/.
Refer to Appendix, Setting Up the Infrastructure.
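If the browser shows nothing at that address, it can help to confirm that the application is actually listening before troubleshooting further. The following is a minimal sketch using only the Python standard library; the helper names openrefine_url and is_listening are hypothetical (they are not part of OpenRefine), and the host and port are taken from the URL given above.

```python
import socket


def openrefine_url(host="127.0.0.1", port=3333):
    """Build the local address where OpenRefine is expected to be served."""
    return f"http://{host}:{port}/"


def is_listening(host="127.0.0.1", port=3333, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    This only checks that a process is listening on the port; it does not
    verify that the process is OpenRefine.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    if is_listening():
        print("A server is listening; open", openrefine_url())
    else:
        print("Nothing is listening at", openrefine_url())
```

A result of False usually means the application has not been started yet, or that it was started on a different port.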
First, we need to upload our data and click on Create Project. The following screenshot shows our dataset; in this case, we will use the monthly sales of an alcoholic beverages company. The dataset is an MS Excel (.xlsx) worksheet with 160 rows.
We can download the original MS Excel file and the OpenRefine project from the author's GitHub repository available at the following URL:
https://github.com/hmcuesta/PDA_Book/tree/master/Chapter2
Text facet
Text facet is a very useful tool...