Loading a file
As explained in the previous chapter, to load a dataframe, we have the .load accessor in the Optimus instance, which lets us load from different sources, such as the local filesystem, databases, and alternative filesystems (S3, HDFS, and more).
The most useful and readable function from this accessor is op.load.file, which infers details such as the file's encoding during loading, so that we can skip extra configuration when loading a new dataset. The following is an example:
from optimus import Optimus
op = Optimus("pandas")
df = op.load.file("my_file.json")
In the preceding code, we are simply loading a JSON file. Internally, Optimus detects the file's format and creates the dataframe. Besides JSON, it can load formats such as CSV, XML, Excel, Parquet, Avro, ORC, and HDF5. If required, you can also call the specific method for a given file type, as shown here: