How to subset the data file using IN and IF
In the previous part, the in qualifier was used; it restricts a command to a chosen range of observations. Several variations are possible, for example:
- list in 14/19
- list in 90/l
- list in 30/l
The three commands in the preceding example do the following, where l (lowercase L) stands for the last observation:
- The first command lists observations from 14 to 19
- The second command lists observations from 90 till the last observation
- The third command lists observations from 30 till the last observation
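The in qualifier can be tried on a throwaway dataset. The sketch below assumes a hypothetical dataset of 100 observations with a single variable named id, created only for illustration; neither the dataset nor the variable comes from the text.

```stata
* Hypothetical data: 100 observations numbered 1 to 100
clear
set obs 100
generate id = _n

* Observations 14 through 19
list in 14/19

* Observation 90 through the last observation
list in 90/l

* Observation 30 through the last observation
list in 30/l
```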
The if qualifier is the other way of subsetting data; it restricts a command to the observations for which a logical expression evaluates to true rather than false. The following is an example that selects the observations for the year 2010, where the variable name is yr:
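A minimal sketch of such a command follows. Only the variable name yr is taken from the text; the small dataset, the country names, and the values are hypothetical and serve only to make the example runnable.

```stata
* Hypothetical data; only the variable name yr comes from the text
clear
input str10 country yr popscon
"Alpha" 2009 100
"Alpha" 2010 110
"Beta"  2009 200
"Beta"  2010 210
end

* List only the observations for which yr equals 2010
* (== tests equality; a single = is used for assignment)
list if yr == 2010

* if and in can be combined: among observations 1 to 3, keep those from 2010
list if yr == 2010 in 1/3
```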
To examine the raw data, the browse command opens the Data Browser window. In big datasets, however, showing every variable at once makes it hard to find the ones of interest. In that case, pass browse the list of variables you want to examine. This is done through the following command:
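As a sketch, assuming a hypothetical dataset with the variables country, yr, and popscon, the variable list is written directly after the command name. The browse command needs Stata's graphical interface, so it is meant for interactive use.

```stata
* Hypothetical data, created only for illustration
clear
input str10 country yr popscon
"Alpha" 2009 100
"Alpha" 2010 110
"Beta"  2009 200
"Beta"  2010 210
end

* Open the Data Browser showing only the two named variables
browse country yr

* A variable list can be combined with an if qualifier
browse country popscon if yr == 2010
```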
It is important to note that the edit command, unlike browse, allows the dataset to be changed manually.
The assert command makes Stata check a condition against every observation. This matters because with bigger data (or big data, as it is called in today's world), checking individual values through the browse or edit commands becomes difficult. In this case, the assert command is helpful: it verifies whether a statement about the data is right or wrong. For example, in the case of the population of the country (popscon), it can confirm that the values are positive:
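A minimal sketch of this check follows, assuming a small hypothetical dataset in which only the variable name popscon comes from the text. Because Stata treats missing values as larger than any number, the second check, which also rules out missing values, is often the safer form.

```stata
* Hypothetical data; only the variable name popscon comes from the text
clear
input str10 country yr popscon
"Alpha" 2009 100
"Alpha" 2010 110
"Beta"  2009 200
"Beta"  2010 210
end

* Check that every value of popscon is positive
assert popscon > 0

* Missing values count as larger than any number, so the check above
* would not catch them; this version rules them out as well
assert popscon > 0 & !missing(popscon)

* A failed assertion stops a do-file with an error; capture traps the
* error so that the return code can be inspected instead
capture assert popscon > 150
display _rc
```

In the last step, a nonzero return code means that the assertion was false for at least one observation.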
If the condition is true for every observation, then assert does not give any output. However, if it is false for any observation, then an error message will appear.
The describe command reports fundamental information about the dataset and its variables, such as the size of the dataset, the number of observations and variables, and the storage types and display formats of the variables. It can be abbreviated as d. It can also be applied to a file that has not been read into Stata, through describe using. An example is given as follows:
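The sketch below shows describe applied to the data in memory and, through describe using, to a file on disk that is not loaded. The dataset is hypothetical, and a temporary file stands in for a real data file.

```stata
* Hypothetical data, created only for illustration
clear
input str10 country yr popscon
"Alpha" 2009 100
"Alpha" 2010 110
"Beta"  2009 200
"Beta"  2010 210
end

* Describe the dataset currently in memory
describe

* The same command, abbreviated and restricted to two variables
d yr popscon

* Describe a file on disk without reading it into memory
tempfile ondisk
save `ondisk'
clear
describe using `ondisk'
```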
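The codebook command gives detailed information on the variables in the dataset. Without a list of variables, it reports on all of them; with a list, it is restricted to the named variables. An example of this is codebook country.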
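Both forms can be sketched as follows, assuming a hypothetical dataset that contains a string variable named country.

```stata
* Hypothetical data, created only for illustration
clear
input str10 country yr popscon
"Alpha" 2009 100
"Alpha" 2010 110
"Beta"  2009 200
"Beta"  2010 210
end

* Report on every variable in the dataset
codebook

* Report on a single variable
codebook country
```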
The summarize command delivers summary statistics: the number of observations, the mean, the standard deviation, the minimum, and the maximum. The following table shows this output:
As we can see in the preceding table, string variables such as Cntry and Countrycode do not contain numbers; this is why no summary statistics are available for them. Yr is a numeric variable; therefore, its summary statistics are shown. For more details, the detail option of summarize can be used.
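The commands behind such a table can be sketched as follows. The variable names Cntry, Countrycode, and Yr are taken from the text, while the values are hypothetical.

```stata
* Variable names from the text; the values are hypothetical
clear
input str10 Cntry str3 Countrycode Yr
"Alpha" "ALP" 2009
"Alpha" "ALP" 2010
"Beta"  "BET" 2009
"Beta"  "BET" 2010
end

* Summary statistics for all variables; the string variables
* show zero observations and no statistics
summarize

* Additional statistics (percentiles, variance, skewness, kurtosis)
* for a numeric variable
summarize Yr, detail
```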
The wide range of graphics capabilities is one of Stata's strengths. One can easily get help by typing the help command in Stata (for example, help graph). A histogram graph can be created through the following command:
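As a sketch, the histogram command takes the name of one numeric variable. The example assumes the auto dataset that ships with Stata, which is not the dataset used in the text.

```stata
* Assumption: Stata's shipped example dataset, not the data from the text
sysuse auto, clear

* Histogram of miles per gallon, scaled as a density by default
histogram mpg

* The same histogram with frequencies on the vertical axis
histogram mpg, frequency
```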
For a scatter plot, use the following command:
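A minimal sketch, again assuming Stata's shipped auto dataset rather than the data from the text: the first variable goes on the vertical axis and the second on the horizontal axis.

```stata
* Assumption: Stata's shipped example dataset, not the data from the text
sysuse auto, clear

* Scatter plot of mpg (vertical axis) against weight (horizontal axis)
scatter mpg weight

* Equivalent long form
twoway scatter mpg weight
```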
Even though Stata's advanced graphs have their benefits, they can be slow to draw. In certain cases, when the goal is only to visualize the data rather than to produce figures for papers or presentations, it is better to use version 7 graphics. This can be seen as follows:
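One way to do this is the legacy graph7 command, which keeps the old Stata 7 syntax. The sketch below assumes that graph7 is available in the installed version of Stata and uses the shipped auto dataset, not the data from the text.

```stata
* Assumption: Stata's shipped example dataset, not the data from the text
sysuse auto, clear

* Old-style graphics: two variables produce a scatter plot,
* with the first variable on the vertical axis
graph7 mpg weight
```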
Saving the dataset is done with the save command, which is used as follows:
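A minimal sketch, in which the file name mydata.dta is hypothetical and the shipped auto dataset stands in for the data being saved:

```stata
* Assumption: Stata's shipped example dataset stands in for the real data
sysuse auto, clear

* Save the data in memory to the current working directory;
* this stops with an error if mydata.dta (hypothetical name) already exists
save "mydata.dta"
```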
If a file with the same name already exists, then the replace option is needed. It overwrites the previous version with the data being saved. If the old version is to be kept for some reason, then save the data under a different name. One thing that should be kept in mind is that the content of the original file is changed if a revised dataset is saved over it. To discard changes that have not been saved and start again from the file on disk, just reopen it.
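These cases can be sketched as follows. The file names mydata.dta and mydata_v2.dta are hypothetical, and the shipped auto dataset stands in for the real data.

```stata
* Assumption: Stata's shipped example dataset stands in for the real data
sysuse auto, clear

* Overwrite mydata.dta (hypothetical name) if it already exists
save "mydata.dta", replace

* Revise the data, then keep the old file by saving under a new name
drop if foreign == 1
save "mydata_v2.dta", replace

* Discard whatever is in memory and reopen the original file
use "mydata.dta", clear
```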
There are two ways to keep a copy of the data while experimenting with changes. One option is to save the current data, revise it, and later, if you don't want to keep the changes, reopen the saved version. The other option is to use the preserve and restore commands: preserve takes an image of the data, and the data will come back after you type restore.
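The second option can be sketched as follows, assuming a small hypothetical dataset. The assert lines only confirm that the number of observations changes and then comes back.

```stata
* Hypothetical data, created only for illustration
clear
input str10 country yr popscon
"Alpha" 2009 100
"Alpha" 2010 110
"Beta"  2009 200
"Beta"  2010 210
end

* Take a snapshot of the four observations
preserve

* Make a destructive change: keep only the observations from 2010
keep if yr == 2010
assert _N == 2

* Bring back the snapshot; all four observations return
restore
assert _N == 4
```

When preserve is used inside a do-file, Stata restores the data automatically when the do-file ends, unless restore has already been issued.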