Data ingestion
The first part of the data engineering process is data ingestion: all the different data must be brought into a usable format in Snowflake before it can be analyzed. In the previous chapter, we learned how Snowpark accesses data through a DataFrame. A DataFrame can read from Snowflake tables, views, and other objects, such as streams, when we run a query against them. Snowpark supports structured data in formats such as Excel and CSV, and semi-structured data such as JSON, XML, Parquet, Avro, and ORC. Specialized formats, such as HL7 and DICOM, and unstructured data, such as images and media, can also be ingested and handled in Snowpark. Snowpark enables secure and programmatic access to files in Snowflake stages.
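To make programmatic stage access concrete, here is a minimal sketch of a helper that picks a reader by file format. The helper name `read_staged_file`, the stage name `@my_stage`, and the file names are hypothetical and not from this chapter; the sketch assumes `session` is an existing `snowflake.snowpark.Session` and relies only on the `DataFrameReader` methods `options`, `csv`, `json`, `parquet`, `avro`, `orc`, and `xml`.

```python
from typing import Any

# Maps a user-facing format name to the DataFrameReader method that loads it.
_READER_METHODS = {
    "csv": "csv",
    "json": "json",
    "parquet": "parquet",
    "avro": "avro",
    "orc": "orc",
    "xml": "xml",
}


def read_staged_file(session: Any, stage_path: str, file_format: str, **options: Any):
    """Return a DataFrame over a staged file (hypothetical helper).

    `session` is assumed to be a snowflake.snowpark.Session; any keyword
    arguments are passed to the reader as file format options.
    """
    method_name = _READER_METHODS.get(file_format.lower())
    if method_name is None:
        raise ValueError(f"Unsupported file format: {file_format!r}")

    reader = session.read  # a DataFrameReader bound to the session
    if options:
        reader = reader.options(options)
    # Dispatch to reader.csv(...), reader.json(...), and so on.
    return getattr(reader, method_name)(stage_path)


# Usage, assuming an open session and a stage named @my_stage (both hypothetical):
#   orders = read_staged_file(session, "@my_stage/orders.json", "json")
#   orders = read_staged_file(session, "@my_stage/orders.parquet", "parquet")
```

Only the format argument and the path change between the two calls, so code that consumes the resulting DataFrame does not need to know which file type it came from. Depending on the Snowpark version, a CSV file may additionally need an explicit schema set on the reader or a schema inference option; that detail is left out of the sketch.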
The flexibility of Snowpark Python lets you adapt to changing data requirements with little effort. Suppose you start with a CSV file as your data source; you can switch to a JSON or Parquet format at a later stage. With Snowpark, you don’...