pandas DataFrame API (Koalas)
Data scientists and data engineers who use Python are usually very familiar with pandas DataFrames for manipulating data. pandas is a Python library for data manipulation and analysis, but it lacks the capability to work with big data, so it is only suitable for small datasets. When we need to work with more data, the most common option is PySpark, as demonstrated in the previous section, but its syntax is very different from that of pandas.
Koalas is a library that eases the transition from pandas to working with big data in Azure Databricks. Koalas has a syntax that is very similar to the pandas API but with the functionality of PySpark.
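As a minimal sketch of this similarity, the function below is written once against the DataFrame API and run on a pandas DataFrame. Where Koalas is installed (assumed here to be importable as databricks.koalas, for example on an Azure Databricks cluster), the same function is also applied to a Koalas DataFrame. The column names, the sample values, and the function name average_price_by_category are hypothetical and serve only as illustration.

```python
import pandas as pd

try:
    # Assumption: Koalas is only importable where it is installed alongside
    # Spark, for example on an Azure Databricks cluster.
    import databricks.koalas as ks
except ImportError:
    ks = None  # run the pandas part only


def average_price_by_category(df):
    """Mean price per category, using only calls shared by pandas and Koalas."""
    return df.groupby("category")["price"].mean().sort_index()


# Hypothetical sample data.
data = {"category": ["a", "b", "a", "b"], "price": [10.0, 20.0, 30.0, 40.0]}

# pandas: runs in memory on a single machine.
pandas_result = average_price_by_category(pd.DataFrame(data))
print(pandas_result)

# Koalas: same function, but the DataFrame is backed by Spark.
if ks is not None:
    koalas_result = average_price_by_category(ks.DataFrame(data))
    # to_pandas() collects the distributed result to the driver for printing.
    print(koalas_result.to_pandas())
```

Only the constructor changes between the two calls (pd.DataFrame versus ks.DataFrame); the aggregation logic itself is shared, which is the point of the pandas-like syntax.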
Not all pandas methods have been implemented, and there are many small differences and subtleties that might not be obvious but must be considered. We cannot understand Koalas without understanding PySpark.
Koalas, functionality is built...