Facilitating data sharing with Delta
JDBC/ODBC connections or HTTP connections via REST APIs work well for sharing modest amounts of data but can become a bottleneck for larger datasets. Consider the scenario of sharing curated data with external vendors or partners. Some firms, such as S&P, Bloomberg, FactSet, Nasdaq, and SafeGraph, build their business model around data sharing. They aim to be the source of truth for financial datasets that other financial institutions want to consume for downstream analysis and to augment their own datasets. Wouldn't it be nice not to have to copy the data multiple times?
It is best to access cloud storage directly to avoid unnecessary platform-related bottlenecks. That is what Delta Sharing attempts to do: provide an open standard for securely and seamlessly sharing large volumes of data in Parquet/Delta format with a wide variety of consumers, along with an easy way to govern and audit that sharing. Consumers can be from pandas...