Use vectorized functions instead of loops
Python as a language is celebrated for its looping prowess. Whether you are working with a list or a dictionary, looping over an object in Python is a relatively easy task to perform, and can allow you to write really clean, concise code.
Even though pandas is a Python library, those same looping constructs are ironically an impediment to writing idiomatic, performant code. In contrast to looping, pandas offers vectorized computations, i.e, computations that work with all of the elements contained within a pd.Series
but which do not require you to explicitly loop.
How to do it
Let’s start with a simple pd.Series
constructed from a range:
ser = pd.Series(range(100_000), dtype=pd.Int64Dtype())
We could use the built-in pd.Series.sum
method to easily calculate the summation:
ser.sum()
4999950000
Looping over the pd.Series
and accumulating your own result will yield the same number:
result = 0
for...