Playing with the TPC-DS Dataset
In this chapter, we will get acquainted with the TPC-DS dataset. Lakehouse platforms, including Databricks, use TPC-DS benchmark results to demonstrate their capabilities, so it is worth understanding what the benchmark involves. We will learn about the TPC-DS dataset and the TPC-DS benchmark, and then use the dataset to validate some of the concepts we learned about in the previous chapters.
This chapter is intended for advanced users who wish to build a larger dataset to test Databricks SQL features. If you already have access to such a dataset, or you do not want to test with bigger datasets, you can skip this chapter.
In this chapter, we will cover the following topics:
- Understanding the TPC-DS dataset
- Generating TPC-DS data
- Running automated benchmarks
- Experimenting with TPC-DS in Databricks SQL