Avoid mutating data
Although pandas allows you to mutate data, the cost of doing so varies by data type. In some cases it can be prohibitively expensive, so you are best served by keeping the mutations you perform to a minimum.
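To see how the cost of a mutation can depend on the data type, a minimal sketch like the following times the same single-element assignment on two small Series. The helper name time_mutation and the choice of int64 and pd.StringDtype() as the dtypes to compare are assumptions made for illustration. Actual timings depend on your pandas version, storage backend, and hardware, so no expected numbers are shown.

```python
import timeit

import pandas as pd


def time_mutation(ser, new_value, number=1000):
    """Time repeatedly overwriting the second element of an existing Series.

    Hypothetical helper for illustration; it measures only the assignment,
    not the construction of the Series.
    """

    def mutate():
        ser.iloc[1] = new_value

    return timeit.timeit(mutate, number=number)


# Build each Series once so that only the mutation itself is timed.
int_ser = pd.Series([1, 2, 3], dtype="int64")
str_ser = pd.Series(["foo", "bar", "baz"], dtype=pd.StringDtype())

timings = {
    "int64": time_mutation(int_ser, 42),
    "string": time_mutation(str_ser, "BAR"),
}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s for 1000 assignments")
```

Running a comparison like this against the dtypes you actually use is a quick way to judge whether a mutation is cheap enough to keep or should be moved ahead of the load.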
How to do it
When a mutation is unavoidable, try to perform it before loading the data into a pandas structure. We can illustrate the performance difference by comparing the time it takes to mutate a record after loading it into a pd.Series:
import timeit
import pandas as pd

def mutate_after():
    data = ["foo", "bar", "baz"]
    ser = pd.Series(data, dtype=pd.StringDtype())
    ser.iloc[1] = "BAR"  # mutate after the Series has been built

timeit.timeit(mutate_after, number=1000)
0.041951814011554234
Compare that with the time it takes when the mutation is performed beforehand:
def mutate_before():
    data = ["foo", "bar", "baz"]
    data[1] = "BAR"
    ser = pd.Series(data, dtype=pd.StringDtype())
timeit.timeit...