Cleaning strings
When you first start working with a dataset, its strings are often messy: inconsistent spacing, unexpected letter casing, spelling mistakes, and so on. In that state, the data may not be ready for analysis or for the transformations you need. Knowing how to clean strings helps you get past this roadblock.
In this recipe, we’ll cover ways to clean strings using methods such as .str.strip_chars(), .str.replace(), and .str.to_titlecase().
Getting ready
We’re using a manually created dataset for this recipe. Run the following to create a DataFrame (this assumes Polars has already been imported as pl):
df = pl.DataFrame( { 'text': [ ' I aM a HUmAn... ', 'it is NOT easy! ', ...