Apply performance
The .apply method on a Series and DataFrame is one of the slowest operations in pandas. In this recipe, we will measure its speed and see if we can debug what is going on.
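Before timing anything in Jupyter, it can help to see why the method tends to be slow: when given a plain Python function, a Series hands it one value at a time. The following is a minimal sketch, not the recipe's own code. The `countries` Series is a hypothetical stand-in for the survey's Q3 column, and the `calls` counter is added only for illustration. It also uses the standard library's `timeit` module, which plays roughly the same role as the `%%timeit` cell magic outside of a notebook.

```python
import timeit

import pandas as pd

# Hypothetical stand-in for the survey's Q3 column (the real df is not built here).
countries = pd.Series(
    ['United States of America', 'India', 'China', 'Brazil', 'Japan'] * 2_000
)

KEEP = {'United States of America', 'India', 'China'}
calls = 0  # illustration only: counts how often pandas calls our function


def limit_countries(val):
    """Keep three countries as-is and lump every other value together."""
    global calls
    calls += 1
    return val if val in KEEP else 'Another'


# One pass of .apply over the Series.
result = countries.apply(limit_countries)
calls_for_one_apply = calls

# Time repeated runs; the counter adds a little overhead to each call.
timings = timeit.repeat(
    lambda: countries.apply(limit_countries), number=10, repeat=3
)
best_ms = min(timings) / 10 * 1_000

print(f'{calls_for_one_apply} Python calls for {len(countries)} rows')
print(f'best of 3: {best_ms:.2f} ms per .apply call')
```

The call count matches the number of rows, so the cost grows with every value that has to pass through the Python function. The timing itself will vary by machine and pandas version.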
How to do it…
- Let's time how long one use of the .apply method takes using the %%timeit cell magic in Jupyter. This is the code from the tweak_kag function that limits the cardinality of the country column (Q3):
    >>> %%timeit
    >>> def limit_countries(val):
    ...     if val in {'United States of America', 'India', 'China'}:
    ...         return val
    ...     return 'Another'
    >>> q3 = df.Q3.apply(limit_countries).rename('Country')
    6.42 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
- Let's look at using the .replace method instead of .apply and see if that improves performance:
    >>> %%timeit
    >>> other_values = df...