Summary
This chapter covered a lot of ground across some pretty diverse topics. First, we talked about public sources of data, including public databases, open sources, APIs, and web services, as well as the pros and cons of each. Next, we looked at the different ways to collect your own data, including web scraping, surveying (especially the different types of survey questions and survey bias), and observations. We then covered the difference between ETL and ELT, as well as between a full load and a delta load, and why those differences are important. After that, we briefly covered OLTP and OLAP and how they are used to collect and process transactional data. Finally, we wrapped up the chapter by covering ways to optimize query structures, such as filtering, subsets, indexing, sorting, parameterization, temporary tables, subqueries, and execution plans. Whew! There sure are a lot of ways to collect data. In the next chapter, we will go over what to do with it once you have it!