Using regular expressions
For research, you may need to download data from open-access websites or from databases that require authentication. These sources provide data in various formats, and most of it is likely to be well organized. For example, many economic and financial databases provide data in CSV format, a widely supported text format for representing tabular data. A typical CSV file looks like this:
id,name,score
1,A,20
2,B,30
3,C,25
In R, it is convenient to call read.csv() to import a CSV file as a data frame with the right header and data types because the format is a natural representation of a data frame.
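As a minimal sketch of that import, the sample above can be parsed without creating a file by passing it through the text argument of read.csv(). The variable names csv_text and scores are illustrative and not taken from the original text.

```r
# Sample CSV content from the text, held in a string so the example is self-contained.
csv_text <- "id,name,score
1,A,20
2,B,30
3,C,25"

# read.csv() accepts a `text` argument in place of a file path.
# stringsAsFactors = FALSE keeps `name` as character on older R versions too.
scores <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect the column names and the types that were inferred.
str(scores)
```

The first line is taken as the header, and the column types are inferred from the values: id and score are read as integers, and name as character. With a real file, the path would be passed as the first argument instead of text.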
However, not all data files are well organized, and dealing with poorly organized data is painstaking. Built-in functions such as read.table() and read.csv() work in many situations, but they may not help at all with such format-less data.
For example, if you need to analyze raw data (messages.txt) organized in a CSV-like format as shown...