Chaining with .pipe
Developers writing pandas code generally follow one of two major styles. The first makes liberal use of variables throughout a program, whether that means creating new variables like:
df = pd.DataFrame(...)
df1 = do_something(df)
df2 = do_another_thing(df1)
df3 = do_yet_another_thing(df2)
or simply reassigning to the same variable repeatedly:
df = pd.DataFrame(...)
df = do_something(df)
df = do_another_thing(df)
df = do_yet_another_thing(df)
The alternative approach is to express your code as a pipeline, where each step accepts and returns a pd.DataFrame:
(
    pd.DataFrame(...)
    .pipe(do_something)
    .pipe(do_another_thing)
    .pipe(do_yet_another_thing)
)
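To make the pipeline form concrete, here is a minimal runnable sketch. The function names (drop_missing_scores, add_scaled_score), the column names, and the sample data are hypothetical and chosen only for illustration; they stand in for the do_something placeholders above. The starting frame is bound to a name here only so that it can be inspected after the pipeline runs.

```python
import pandas as pd


def drop_missing_scores(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical step: return a new frame without rows missing a score."""
    return df.dropna(subset=["score"])


def add_scaled_score(df: pd.DataFrame, factor: float = 1.0) -> pd.DataFrame:
    """Hypothetical step: return a new frame with an extra 'scaled' column."""
    # assign returns a new object, so the frame passed in is left as it was
    return df.assign(scaled=df["score"] * factor)


# Illustrative data only
raw = pd.DataFrame({"name": ["a", "b", "c"], "score": [1.0, None, 3.0]})

result = (
    raw
    .pipe(drop_missing_scores)
    # extra keyword arguments to .pipe are forwarded to the function
    .pipe(add_scaled_score, factor=10)
)
```

Each step takes a pd.DataFrame as its first argument and returns a new one, which is what allows the calls to be chained. In this sketch the starting frame keeps its original rows and columns, because neither step modifies its input in place.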
With the variable-based approach, you must create multiple variables in your program, or change the state of a pd.DataFrame at every reassignment. The pipeline approach, by contrast, does not create any new variables, nor does it change the state of your...