Understanding databases
In addition to using plenty of memory, large amounts of data can also take a long time to process. In some cases, it may make sense to process a large dataset in a single pass, reading from an input file and writing the result to an output file. Data wrangling, however, is often an iterative process that goes back and forth between analyzing the data and modifying it. Iterating this way on a large dataset with a Python script can be slow and awkward, because each pass has to read and process the data again. This is where database management systems can be helpful.
A database is simply an organized collection of data. Unlike plain files, databases are typically structured so that each document (another word for a data entry) is indexed, which makes it faster to retrieve specific documents or groups of documents. A database management system is software for interfacing with a database to do the following:
- Retrieve data from a database
- Modify data in a database
- Write data to a database
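The three operations listed above can be sketched in a few lines of Python. The text does not name a particular database management system at this point, so this sketch assumes SQLite, accessed through Python's built-in sqlite3 module, purely as a stand-in. The table name, column names, index name, and values are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for "a database management system".
# ":memory:" creates a temporary database that is never written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and columns, for illustration only.
conn.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, city TEXT, temp REAL)"
)
# An index on "city" lets the system locate matching entries
# without scanning every row.
conn.execute("CREATE INDEX idx_city ON readings (city)")

# Write data to the database.
rows = [("Oslo", 3.5), ("Lima", 19.0), ("Oslo", 4.1)]
conn.executemany("INSERT INTO readings (city, temp) VALUES (?, ?)", rows)

# Modify data in the database.
conn.execute("UPDATE readings SET temp = temp + 1.0 WHERE city = ?", ("Lima",))

# Retrieve data from the database.
oslo = conn.execute(
    "SELECT temp FROM readings WHERE city = ? ORDER BY id", ("Oslo",)
).fetchall()
lima = conn.execute(
    "SELECT temp FROM readings WHERE city = ?", ("Lima",)
).fetchone()[0]

conn.close()
```

Each step here is a short command sent to the database rather than a loop over a file, which is what makes it practical to alternate between inspecting and changing the data. Other database management systems express the same three operations with their own syntax.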
Database management systems define a language for analyzing and modifying data that does not...