Splitting a string on lines, words, or arbitrary tokens
Useful data is often interspersed between delimiters, such as commas or spaces, making string splitting vital for most data analysis tasks.
Getting ready
Create an input.txt file similar to the following one:
$ cat input.txt
first line
second line
words are split by space
comma,separated,values
or any delimiter you want
Install the split package using Cabal as follows:
$ cabal install split
How to do it...
The only function we need from the split package is splitOn, which is imported as follows:

import Data.List.Split (splitOn)
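To make the behavior of splitOn concrete without installing anything, here is a minimal sketch written with only base. The name splitOnSketch is hypothetical, and this is not the split package's implementation. It aims to mimic splitOn for a non-empty delimiter, including the fact that splitOn keeps empty fields, for example between two adjacent commas.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical helper for illustration; NOT the split package's code.
-- Splits a string on every occurrence of a non-empty delimiter and keeps
-- empty fields, e.g. between two adjacent delimiters.
splitOnSketch :: String -> String -> [String]
splitOnSketch [] _ = error "splitOnSketch: delimiter must be non-empty"
splitOnSketch delim input = go input
  where
    -- Emit the field before the next delimiter, then continue after it.
    go s = case breakOn s of
      (field, Nothing)   -> [field]
      (field, Just rest) -> field : go rest

    -- Returns the text before the first delimiter and, if a delimiter
    -- was found, the remainder of the string after that delimiter.
    breakOn [] = ([], Nothing)
    breakOn s@(c:cs)
      | delim `isPrefixOf` s = ([], Just (drop (length delim) s))
      | otherwise            = let (field, rest) = breakOn cs in (c : field, rest)

main :: IO ()
main = do
  print (splitOnSketch "," "comma,separated,values") -- prints ["comma","separated","values"]
  print (splitOnSketch "," "a,,b")                   -- prints ["a","","b"]
  print (splitOnSketch ", " "x, y, z")               -- prints ["x","y","z"]
```

Keeping empty fields matters for delimited data: a missing value in a row such as a,,b still occupies a column instead of silently shifting later values left.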
First, we read the file and split its contents into lines, as shown in the following code snippet:
main = do
  input <- readFile "input.txt"
  let ls = lines input
  print ls
The lines are printed in a list as follows:
["first line","second line","words are split by space","comma,separated,values","or any delimiter you want"]
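A few details of lines are worth knowing when the input comes from real files. The following sketch uses only base; linesCRLF is a hypothetical helper, not part of any library, added to show one way to cope with files saved with Windows (CRLF) line endings, where lines leaves a trailing '\r' on each entry.

```haskell
import Data.List (dropWhileEnd)

-- Hypothetical helper: like lines, but also drops trailing '\r'
-- characters so files saved with Windows (CRLF) endings split cleanly.
linesCRLF :: String -> [String]
linesCRLF = map (dropWhileEnd (== '\r')) . lines

main :: IO ()
main = do
  -- lines splits on '\n'; a final trailing newline adds no empty entry
  print (lines "first line\nsecond line\n") -- prints ["first line","second line"]
  -- a blank line in the middle becomes an empty string
  print (lines "a\n\nb")                    -- prints ["a","","b"]
  -- lines keeps '\r' from CRLF endings ...
  print (lines "a\r\nb\r\n")                -- prints ["a\r","b\r"]
  -- ... which linesCRLF strips
  print (linesCRLF "a\r\nb\r\n")            -- prints ["a","b"]
```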
Next, we split the third line (index 2 of ls) on whitespace by adding the following lines to the end of the do block in main:
  let ws = words $ ls !! 2
  print ws
The words are printed in a list...
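Note that words splits on any run of whitespace (spaces, tabs, newlines), not only on single spaces, and never produces empty strings. By contrast, splitOn " " would keep an empty field between two consecutive spaces. The self-contained sketch below hardcodes the first three lines of input.txt in ls, an assumption made so it runs without the file.

```haskell
main :: IO ()
main = do
  -- first three lines of input.txt, hardcoded for illustration
  let ls = ["first line", "second line", "words are split by space"]
  -- words splits on runs of any whitespace
  print (words (ls !! 2))                -- prints ["words","are","split","by","space"]
  -- leading, trailing, and repeated whitespace never yield empty strings
  print (words "  words\tare   split \n") -- prints ["words","are","split"]
  -- unwords rejoins with single spaces, which normalizes the spacing
  putStrLn (unwords (words "  words\tare   split \n")) -- prints words are split
```

Also keep in mind that ls !! 2 throws an exception if the file has fewer than three lines, so real code may want to check the length of ls first.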