Luckily for us, the team behind StackOverflow provides most of the data behind the StackExchange universe, to which StackOverflow belongs, under a CC BY-SA license. At the time of writing this book, the latest data dump can be found at https://archive.org/download/stackexchange. It contains data dumps of all the Q&A sites of the StackExchange family. For StackOverflow, you will find multiple files, of which we only need stackoverflow.com-Posts.7z, which is 11.3 GB.
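Because the file is so large, it is worth streaming it to disk rather than reading it into memory in one go. The following is only a minimal sketch of how that could look with the standard library. The names copy_stream and download are hypothetical helpers, and POSTS_URL is an assumption built from the archive location and file name mentioned above, so check it against the actual listing first:

```python
import urllib.request

# Assumed URL: the archive listing from the text plus the file name.
# Verify it against the listing before relying on it.
POSTS_URL = "https://archive.org/download/stackexchange/stackoverflow.com-Posts.7z"


def copy_stream(src, dst, chunk_size=1024 * 1024):
    """Copy the binary stream src to dst in chunks; return the bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read signals the end of the stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def download(url, target_path):
    """Stream url to target_path, holding only one chunk in memory at a time."""
    with urllib.request.urlopen(url) as response, open(target_path, "wb") as out:
        return copy_stream(response, out)


# Hypothetical usage (this starts a download of several gigabytes):
# download(POSTS_URL, "stackoverflow.com-Posts.7z")
```

The downloaded archive still has to be extracted with a 7-Zip-compatible tool, which this sketch does not cover.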
After downloading and extracting it, we have around 59 GB of data in XML format, containing all questions and answers as individual row tags within the root tag posts:
<?xml version="1.0" encoding="utf-8"?>
<posts>
...
<row Id="4572748" PostTypeId="2" ParentId="4568987" CreationDate="2011-01-01T00:01:03.387" Score...