Manipulating string data
Character-typed (string) data is common in real-life datasets, for example, names and addresses. Analyzing string data requires properly cleaning the raw characters and converting the information embedded in a blob of text into a quantifiable numeric summary. For example, we may want to find all student names that match a specific pattern.
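As a rough illustration of the student-name example, the sketch below filters a list of names by a pattern and reduces the result to a count. The text does not name a language at this point, so Python and its standard re module are assumed here; the names and the pattern (names starting with "An") are invented for the example.

```python
import re

# Hypothetical student names, invented for illustration only.
students = ["Anna Lee", "Andrew Kim", "Brian Andrews", "andy Park", "Maria Anders"]

# Assumed pattern: the full name starts with "An" (case-sensitive).
pattern = re.compile(r"^An")

# Keep only the names that match the pattern.
matches = [name for name in students if pattern.search(name)]

# A simple numeric summary of the text data: how many names matched.
match_count = len(matches)

print(matches)
print(match_count)
```

The same idea carries over to other patterns: the regular expression decides which strings qualify, and the count (or any other aggregate) is the numeric summary.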
This section will cover different ways to define patterns via regular expressions to detect, split, and extract string data. Let’s start with the basics of strings.
Creating strings
A string is a character-typed variable represented by a sequence of characters (including punctuation) wrapped in a pair of double quotes (""). A pair of single quotes ('') can also be used to denote a string, although double quotes are generally recommended unless the string itself contains double quotes.
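The quoting rule can be sketched as follows. This is a minimal example that assumes Python, since the text does not name a language here; the variable names and sample text are hypothetical.

```python
# Double and single quotes delimit the same kind of string.
double_quoted = "data science"
single_quoted = 'data science'

# When the text itself contains double quotes, single-quote delimiters
# avoid the need for escaping.
with_quotes = 'She said "hello" to the class.'

# Equivalent form that keeps double-quote delimiters and escapes the inner quotes.
escaped = "She said \"hello\" to the class."

print(double_quoted == single_quoted)
print(with_quotes == escaped)
print(with_quotes)
```

Both comparisons evaluate to true in this sketch: the choice of delimiter affects only how the string is written, not its contents.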
There are multiple ways to create a string. The following exercise introduces...