Importing SPSS, Stata, and SAS data
We will use pyreadstat to read data from three popular statistical packages into pandas. The key advantage of pyreadstat is that it allows data analysts to import data from these packages without losing metadata, such as variable and value labels.
By the time we receive SPSS, Stata, and SAS data files, the data issues typical of CSV files, Excel files, and SQL databases have often already been resolved. We do not typically see the invalid column names, changes in data types, and unclear missing values that we can get with CSV or Excel files, nor do we usually see the detachment of data from business logic, such as the meaning of data codes, that we often get with SQL data. When a person or organization shares a data file from one of these packages with us, they have often added variable labels, as well as value labels for categorical data. For example, a hypothetical data column called presentsat has the variable label overall satisfaction with presentation...