Annual data for the period from 1960 to 1980 are taken from the Economic Report of the President. The data are as follows:
Year  Money  Inflation  Unemployment  Party
1960    0.7        1.6           5.5      1
1961    3.2        0.9           6.7      0
1962    1.8        1.8           5.5      0
1963    3.7        1.5           5.7      0
1964    4.6        1.5           5.2      0
1965    4.7        2.2           4.5      0
1966    2.5        3.2           3.8      0
1967    6.6        3.0           3.8      0
1968    7.7        4.4           3.6      0
1969    3.2        5.1           3.5      1
1970    5.3        5.4           4.9      1
1971    6.5        5.0           5.9      1
1972    9.3        4.2           5.6      1
1973    5.5        5.7           4.9      1
1974    4.4        8.7           5.6      1
1975    5.0        9.3           8.5      1
1976    6.6        5.2           7.7      1
1977    8.1        5.8           7.1      0
1978    8.3        7.3           6.1      0
1979    7.2        8.5           5.8      0
1980    6.4        9.0           7.1      0
Money is the money supply growth rate (percent increase in M1
over each year).
Inflation is the percent increase in the implicit
GNP price deflator.
Unemployment is measured as a percent of
the civilian labor force.
Party is the party holding the presidency
(one for Republicans, zero for Democrats).
ENTER is used to input small amounts of data from the keyboard,
READ to enter data stored in text files, and
LOAD to enter previously saved data from SST or other programs.
The ENTER command is used to enter new data or change existing data from the keyboard in interactive mode. You tell SST the variables you wish to create or alter and a range of observations, and then SST prompts you for data values. The syntax for the ENTER command is:
enter to[variable list] obs[observation list]
SST will prompt you for data values on the variables
specified in the
TO subop in the range specified by the OBS
subop. When finished with data entry, type `q' or `quit'.
SST will supply the variable name, with the observation number in parentheses, followed by the current value of that variable in brackets. ("MD" indicates missing data if no value currently exists for the particular observation.) You can either change the value by typing a new value, followed by a carriage return, or leave the value unchanged by typing a carriage return. To enter a missing value, type either `MD', `md' or a period `.'.
Multiple values can be entered on one line, separated by blanks or commas. After the carriage return is pressed, the program then prompts you for the next data value. If you have not supplied a data value for all variables for the particular observation, it will remind you which variable comes next. If all data have been entered for a particular observation, it then prompts you for the next observation.
For example, to enter the data listed at the start of this chapter, type:
enter to[year money inflat unemp party] obs[1-21]
SST responds with the prompt:
year(1) [ MD ]:
You could then type `1960' followed by a carriage return. The remainder of the session might continue as follows (carriage returns are entered after each list of data values):
money(1) [ MD ]: 0.7
inflat(1) [ MD ]: 1.6
unemp(1) [ MD ]: 5.5
party(1) [ MD ]: 1
To speed things up, you may want to type more than one value after each prompt. For example:
year(2) [ MD ]: 1961 3.2 0.9
unemp(2) [ MD ]: 6.7 0
Thus, the value of
money for observation 2 is 3.2, the value of
inflat for observation 2 is 0.9, and so forth.
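The way several values typed after one prompt are handed out to successive variables can be illustrated with a short Python sketch. This is not SST code; the helper name `fill_observation` is hypothetical and the sketch only mimics the assignment order described above.

```python
# Illustrative sketch only: mimics how values typed at the ENTER prompts
# are assigned to variables in order. Not part of SST.
VARIABLES = ["year", "money", "inflat", "unemp", "party"]


def fill_observation(lines, variables=VARIABLES):
    """Assign blank- or comma-separated values to the variables in order."""
    values = []
    for line in lines:
        values.extend(line.replace(",", " ").split())
    if len(values) != len(variables):
        raise ValueError(
            "expected %d values, got %d" % (len(variables), len(values))
        )
    return {name: float(v) for name, v in zip(variables, values)}


# Two input lines for observation 2, as in the session above.
obs2 = fill_observation(["1961 3.2 0.9", "6.7 0"])
print(obs2)
```

Here the first line supplies year, money and inflat, so the next prompt is for unemp, and the second line supplies unemp and party.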
For small amounts of data,
ENTER works well, but you probably
would not want to enter large amounts of data this way. To stop
entering data at any point during the
ENTER command, just type
`q' or `quit':
year(3) [ MD ]: quit
and SST will be ready to accept new commands.
SST allows you to designate some values as missing with the ENTER
command. When prompted for a value, type either `MD' or a period (`.'),
and SST will mark that data value as missing.
The ENTER command lets you input data from the keyboard in response to prompts. In some situations, however, it may be faster to create a text file with your data and input it using the
READ command. For instance, you may have already typed your data into a file or have been given your data in this format. Text files on the IBM-PC (and most other computers) represent characters using standard ASCII codes that can be displayed by using the DOS type command. Some spreadsheet and statistical programs (including SST) store data in a more compact binary format that cannot be displayed using the type command. If your data is in this form, check the
LOAD command for details.
A data file can be created using a text editor or word processor (such as WordStar). If a word processor is used, be sure to use the "non-document" mode so that your word processor does not insert "invisible" control characters into the text file. SST normally ignores control characters. Although data need only be separated by a comma, space, or carriage return, the data set will be easier for you to read if the data is separated into fixed columns. You may want to set up tabs to input data into a fixed column format.
For our example, the "observations" correspond to years. For the data to be organized by observation, the input file would look like:
1960 0.7 1.6 5.5 1
1961 3.2 0.9 6.7 0
and so on. On each line of the input file, there are five data values
corresponding to the year, money supply growth rate, inflation rate,
unemployment rate, and party holding the presidency for the particular
observation (year) in question. The data for a single observation can
occupy more than one line of the input file, but in general you might
think of a data file organized by observation as being a rectangular
array with variables defining columns and observations defining rows.
Unless you state otherwise, the
READ command expects data to
be stored by observation.
Data organized by variable for the above example might look like:
1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980
0.7 3.2 1.8 3.7 4.6 4.7 2.5 6.6 7.7 3.2 5.3
and so on. For data organized by variable, all data is input for one variable before data is entered on the next variable. If the data for each variable could fit on one line of the input file, then data organized by variable could be viewed as a rectangular array with variables defining rows and observations defining columns.
In the example we are using, the distinction between variables and
observations is probably pretty clear. In other cases, however, the
distinction will depend on how one wants to use the data. For example, each
year a number of organizations prepare forecasts of GNP, inflation and
other macroeconomic indicators. Suppose one wanted to analyze this data. In
this case, it is not obvious what the variables should be. One possibility
is to make the variable the forecast of a particular organization. Thus one
has variables like
WHARTON, with observations defined by which macroeconomic indicator
is being forecast. Another possibility is to have variables corresponding
to different macroeconomic indicators, with each observation corresponding to
the organization that produced the forecast. Which way of thinking about
the data in terms of variables and observations is appropriate will depend on how one
wants to use the data.
Note that the computer simply reads left to right, by row. Thus there is no difference to the computer between the following two data sets:
4.2 3.5 4.0 3.3
0.5 1.0 1.5 1.2

4.2 3.5
4.0 3.3
0.5 1.0
1.5 1.2
Of course if one is creating the data set, it may be simpler to read if each column and row correspond to a different variable or observation.
To summarize, decide which are variables and which are observations for your purpose. If the data set exists, then see whether the computer reading by row will first see different observations for the first variable (data by variable) or the values of different variables for the first observation (data by observation). If you are creating the data set, the method of organization is a matter of convenience.
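The difference between the two layouts can be made concrete with a small Python sketch. It is an illustration of the idea only, not of how SST is implemented; the function names are hypothetical. The file is treated as one left-to-right stream of numbers, and the only thing that changes is how the stream is cut up.

```python
# Illustrative sketch only: one stream of numbers, two ways of grouping it.
def read_values(text):
    """Read every number in the text, left to right, ignoring line breaks."""
    return [float(tok) for tok in text.replace(",", " ").split()]


def by_observation(values, names):
    """Consecutive values belong to consecutive variables."""
    k = len(names)
    return {name: values[i::k] for i, name in enumerate(names)}


def by_variable(values, names):
    """All values of the first variable come first, then the second, etc."""
    n = len(values) // len(names)
    return {name: values[i * n:(i + 1) * n] for i, name in enumerate(names)}


names = ["year", "money"]
obs_text = "1960 0.7\n1961 3.2\n1962 1.8"   # data by observation
var_text = "1960 1961 1962\n0.7 3.2 1.8"    # data by variable

a = by_observation(read_values(obs_text), names)
b = by_variable(read_values(var_text), names)
print(a == b)
```

Both calls recover the same two variables, which is why the choice of layout is a matter of convenience as long as the reader is told which one is in use.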
To use the READ command you must supply two pieces of information: the name of the file which contains the data and the names of the variables in the file. If the data in our example were organized by observation, the SST command to read the data from the file mydata would be:
read to[year money inflat unemp party] file[mydata]
Unless instructed otherwise, the
READ command expects data to
be organized by observation. If the data in the file were
organized by variable, the appropriate command would instead be:
read to[year money inflat unemp party] obs[1-21] byvar
The BYVAR subop tells SST that the data is organized by
variable instead of by observation.
A few problems can arise when using the READ command. First, the data file may contain illegal characters. Data files used in the
READ command should only contain valid numbers. Valid numbers can be in integer, decimal, or exponential format. For example:
1 1.0 +1.0e0
are all examples of valid numbers (each with the same meaning). On the other hand, a file containing:
1 abc 2.0
would cause SST to issue an error message and abort the READ command.
Second, the number of values in the data file may not correspond to
your instructions in the
READ command. The number of data values
in a file should be a multiple of the number of variables specified in
the TO subop. If data is read by variable, SST has no way to
determine the number of observations in the file other than to divide
the total number of data values in the file by the number of variables.
If the total does not divide evenly, it assumes that you have made a mistake
and issues an error. If data is read by observation, it will issue a warning;
generally this means something is amiss and you should examine your
data file.
Third, reading data normally
requires two passes through the data file: one to determine
how many observations are in the data file, and another to process the
data. If you know how many observations are in a file, you can speed up
the READ command by giving it this information in the NOBS subop:
read to[year money inflat unemp party] nobs
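The two checks discussed above (every token must be a valid number, and the count of values must divide evenly by the number of variables) can be sketched in Python. This is an assumption-laden illustration, not SST's actual logic: the function names are hypothetical, Python's `float` accepts a few spellings that SST may not, and the sketch treats an uneven count as an error in every case.

```python
# Illustrative sketch only: the kinds of checks a reader of numeric text
# files has to make. Function names are hypothetical, not SST's.
def parse_numbers(text):
    """Convert every token to a float, failing on the first invalid one."""
    values = []
    for token in text.replace(",", " ").split():
        try:
            values.append(float(token))
        except ValueError:
            raise ValueError("not a valid number: %r" % token) from None
    return values


def count_observations(n_values, n_variables):
    """Infer the number of observations from the total count of values."""
    n_obs, leftover = divmod(n_values, n_variables)
    if leftover:
        raise ValueError(
            "%d values is not a multiple of %d variables"
            % (n_values, n_variables)
        )
    return n_obs


same = parse_numbers("1 1.0 +1.0e0")   # three spellings of the same number
n_obs = count_observations(105, 5)     # 21 years of data on 5 variables
print(same, n_obs)
```

Knowing the number of observations in advance removes the need for the counting step, which is the saving the text describes.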
In reading large datasets, you may run out of memory. Reading data
by variable is somewhat more efficient than reading data by observation.
The former only requires that the data on a single variable fit into
memory at once, while the latter requires that the entire data set fit
into memory at once. If you run out of memory, try entering data in
smaller batches and saving them using the SAVE command. The saved files can then be read with the
LOAD command, which
can handle very large data sets efficiently.
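SST is flexible about how data values are delimited. It will read:
1 2 3 , 4 5
the same as it will read those five values separated by single spaces or by commas.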
For your own sanity, we suggest that you use a consistent system for data entry, but don't worry about SST -- it's very tolerant.
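A tolerant reader of this kind can be sketched in a few lines of Python. The sketch assumes that any run of blanks, commas, or line breaks counts as a single separator; whether SST treats repeated commas exactly this way is not stated in the text, and `split_values` is a hypothetical name.

```python
# Illustrative sketch only: treat any run of commas and whitespace as one
# separator between data values.
import re


def split_values(text):
    """Split text into tokens on runs of commas, blanks, or line breaks."""
    return [tok for tok in re.split(r"[,\s]+", text.strip()) if tok]


messy = split_values("1 2 3 , 4 5")
tidy = split_values("1,2,3,4,5")
print(messy == tidy)
```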
Users accustomed to mainframe computing often prefer to store their
data in a fixed column format without spaces, commas, or other delimiters
between data values. This format saves space, though it is somewhat
difficult to examine. SST allows users to specify a FORTRAN style format
statement for data in this form using the FMT subop.
In fixed format the data are required to appear in specified positions
within the file. A summary of FORTRAN format statements appears in an
appendix so we will only provide a few simple examples here. The letter
F in a FORTRAN format statement tells SST that you will be inputting a
floating point number. The letter
F is followed by an integer indicating
how many columns the number will occupy. Thus
F3 tells SST that you
will be inputting a floating point number occupying three columns in
the data file. For example, the following data file:
123456
could be read using a format that specifies F3 twice.
The first number (123) occupies three columns and the second number
(456) also occupies three columns. Instead of writing
F3 twice, you could specify repeats of the same
specification by preceding the letter
F with an integer indicating the
number of times the specification is to be repeated. Thus
2F3 is equivalent to the specification above. FORTRAN format statements are quite flexible, though perhaps a bit complicated for new users.
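The idea of reading numbers from fixed column positions can be illustrated with a short Python sketch. It handles only a list of field widths; it is not a FORTRAN format parser, it ignores decimal-place and skip specifications, and the name `read_fixed` is hypothetical.

```python
# Illustrative sketch only: cut a line into fields of known widths.
# A repeat count such as 2F3 corresponds to repeating the width twice.
def read_fixed(line, widths):
    """Return the numbers found in consecutive fixed-width fields."""
    values = []
    position = 0
    for width in widths:
        field = line[position:position + width]
        values.append(float(field))
        position += width
    return values


explicit = read_fixed("123456", [3, 3])   # like writing F3 twice
repeated = read_fixed("123456", [3] * 2)  # like a repeat count of 2
print(explicit, repeated)
```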
SST will now provide you a listing of all variables entered, the number of non-missing observations on each variable, the date created, and the variable's label, if any (see below, for details of how to label a variable). For example:
Listing of variables in memory:
year      21  Thu Jan 09 14:41:06 1986
money     21  Thu Jan 09 14:41:06 1986  change in M1 from year earlier
inflat    21  Thu Jan 09 14:41:06 1986  change in GNP implicit price deflator
unemploy  21  Thu Jan 09 14:41:06 1986  civilian unemployment rate
party     21  Thu Jan 09 14:41:06 1986  republican president dummy
Are all the variables entered that you thought
should be entered? Does each variable have the number of observations
that you expected? If you have just input a variable,
the date and time on the variable should be very recent.
Even if the information supplied by the
LIST command is what you
expected, you will still want to check if the data values are correct.
There are several ways to do this. If you don't have too much data, you
can examine it using the PRINT command:
print var[year money]
and SST will print the values of the variables
that you have input. For large datasets,
you will probably want to restrict the observations printed by
specifying a limited observation range:
print var[year money] obs[1-10]
The OBS subop restricts which values will be printed out on
the screen. The above example would only print the data for observations
one through ten. Alternatively, the observation range can be restricted using the IF subop:
print var[year money] if[year > 1975]
which would print out data for years after 1975.
Another way to check the data that you have input is to compute some
descriptive statistics on the data. If the data are discrete (i.e., take
only a few distinct values), the
FREQ command will show you
which values the variable takes and the percentage of observations
falling into each category. For example, applying FREQ to party
would compute a frequency distribution for that variable.
For variables that take a large number of distinct values (any of
the other variables in our data set), the
COVA command will
produce a few useful descriptive statistics on the variable:
cova var[year money inflat unemp]
The COVA command automatically produces the mean, standard
deviation, minimum, and maximum of each variable specified in the
VAR subop. Usually if there has been some error in data entry,
one or more of these statistics will tip you off.
Further details of these commands
can be found in Chapter 4 of the User's Guide.
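The same kind of sanity check can be reproduced outside SST. The Python sketch below computes a frequency count for party and summary statistics for money from the table at the start of the chapter. It uses the sample standard deviation (dividing by n - 1); which convention SST's COVA command uses is not stated here, so that choice is an assumption.

```python
# Illustrative sketch only: frequency counts and summary statistics for
# checking entered data. Values are copied from the table in this chapter.
from collections import Counter
from statistics import mean, stdev

money = [0.7, 3.2, 1.8, 3.7, 4.6, 4.7, 2.5, 6.6, 7.7, 3.2, 5.3,
         6.5, 9.3, 5.5, 4.4, 5.0, 6.6, 8.1, 8.3, 7.2, 6.4]
party = [1] + [0] * 8 + [1] * 8 + [0] * 4   # 1960, 1961-68, 1969-76, 1977-80

freq = Counter(party)
summary = {
    "n": len(money),
    "mean": mean(money),
    "sd": stdev(money),      # sample standard deviation (assumption)
    "min": min(money),
    "max": max(money),
}
print(freq)
print(summary)
```

A mistyped value such as 93 in place of 9.3 would show up immediately in the maximum and would also pull the mean well away from its expected level.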
A descriptive label can be attached to a variable using the LABEL command. For example, type:
label var[money] lab[change in M1 from year earlier]
In the LAB subop, you type whatever description you want
attached to the variable. The variable label ordinarily should not
exceed thirty characters. The label will be printed when you issue the
LIST command and at other points when you access the variable.
A variable such as party, which takes only a few values (in our case, Republican and Democratic), can also have labels assigned to specific values. We have coded party equal to one when Republicans hold the White House and zero when the Democrats hold the White House:
label var[party] val[1 Repub 0 Democrat]
In the VAL subop, you first list a value of the variable named in
the VAR subop followed by its label and continue until you
have finished labelling the values. Value labels are restricted to
a maximum of eight characters and must not contain spaces or
commas. Multiple variables whose categories have the same values
can be labelled simultaneously by specifying more than one variable
name in the
VAR subop. In principle, there is no limit to the
number of value labels that can be assigned, but few people have
enough patience to type more than ten labels.
If you give the LABEL command with the
VAR subop, but omit both the LAB and
VAL subops, SST will remove all labelling information from the variables specified. Since labelling information requires relatively little storage and is an invaluable reminder when you return to a data set that you have not worked with for a while, we recommend that you keep as much labelling information as possible.
Once data are in memory, they can be saved to a file with the SAVE command. The command to save all data currently in SST into a file named myfile is:
save file[myfile]
SST automatically adds the extension `.sav' to the filename you
specify in the
FILE subop. (If for some reason you wanted another
extension, you would have to specify the full filename and extension in
the FILE subop.) You may not want to save all variables
in memory. In this case SST allows you to list which variables you
want saved in the VAR subop:
save file[myfile] var[year money inflat]
Alternatively, you might only want to save some subset of the
data. To save only the first ten observations, add an observation
range using the OBS subop:
save file[myfile] obs[1-10]
The observation range can also be restricted using the IF
subop. To save only the post-1975 data, type:
save file[myfile] if[year > 1975]
To see what a system file contains without loading it, use the LIST command with the FILE subop.
SST only reads the "header" off the system file, so issuing this command does not cause the data to be actually entered into SST. It tells you what is in the file, but does not waste time reading through the entire file.
The LOAD command is used to load a data set previously saved during an SST session. Once you have gone to the trouble of saving data in the form of an SST system file, reloading it is easy. Just type:
load file[myfile]
and SST loads the data and labelling information. It's fast and simple. If
no filename extension is specified in the
FILE subop, SST assumes
the extension `.sav'. To load only selected variables stored in the file
myfile.sav, include the variables that you want in the TO subop:
load file[myfile] to[year money]
To add variables from a second data set to those already in memory, issue another LOAD command, and the additional variables will be loaded into memory. (Caution: If some of the variables in the second data set have the same names as variables in the first, the old values will be overwritten.)
On other occasions, you may have two or more samples of data on the
same variables. For example, you may have several household
expenditure surveys conducted in different years. The variables in
each data set are the same (or at least overlapping) and you want
to combine the various samples. To do this, just add the APPEND
subop to the LOAD command:
load file[yourdata] append
The variables in the file
yourdata will be appended to whatever
variables are currently in memory. The starting observation for the new
data is determined by the maximum observation number of the data currently
in memory.
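The appending rule can be pictured with a small Python sketch. It is an illustration of the idea only, not SST's implementation: the function name is hypothetical, `None` stands in for missing data (MD), and padding variables that appear in only one sample with missing values is an assumption about how non-overlapping variables are handled.

```python
# Illustrative sketch only: append a second sample below the data in memory.
# New observations start after the longest variable currently in memory.
def append_sample(memory, new):
    """Return a new dict of columns with `new` stacked below `memory`."""
    start = max((len(col) for col in memory.values()), default=0)
    n_new = max((len(col) for col in new.values()), default=0)
    combined = {}
    for name in sorted(set(memory) | set(new)):
        old = list(memory.get(name, []))
        old += [None] * (start - len(old))           # pad up to the start
        combined[name] = old + list(new.get(name, [None] * n_new))
    return combined


in_memory = {"year": [1960, 1961], "money": [0.7, 3.2]}
second = {"year": [1962], "unemp": [5.5]}
result = append_sample(in_memory, second)
print(result)
```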
Files created by dBASE II can also be read with the LOAD command. Give the
LOAD command and add the DB2 subop:
load file[filename] db2
SST assumes an extension of `.dbf' to the filename specified in the
FILE subop, unless told otherwise. (dBASE II uses this extension
by default when it produces a file in its standard format.)
Another common format for files produced by spreadsheet programs is DIF (Data Interchange Format), used by VisiCalc and other programs. To load a DIF file, enter:
load file[filename] dif
If no extension is specified, `.dif' is assumed when the DIF
subop is present. With DIF files, column labels are used for variable
names. If no column labels are present, names are assigned by SST.
Variables can be removed from memory with the DEL command. For example:
del var[x y]
After they have been deleted, the variables x and
y are lost
unless you previously saved them in a file. Use the DEL
command with caution!
A wholesale delete of all variables (and everything else) from
memory can be accomplished using the CLEAR command.
CLEAR also resets the range, so you should issue a new RANGE
statement after the
CLEAR command. The primary use of the CLEAR
command is to restart an SST session without having to
reload the program into memory. Remember, however, that CLEAR
removes everything from memory.
It does not affect files that have been written to disk, but it is your
responsibility to SAVE any data that you will need in the future.
The SORT command sorts the variables specified in the
VAR subop according to values of the variables specified in the
BY subop. If more than one variable is specified in the
BY subop, the sort is lexicographic; that is, the data is first sorted according to the first variable, the second variable is used only to break ties in the first variable, and so on. Variables are sorted in ascending order: low values are put ahead of high values. Missing values are treated as large values, so missing values in the variables specified in the
BY subop tend to end up at the bottom of the data file.
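A lexicographic, ascending sort with missing values placed last can be sketched in Python as follows. This illustrates the ordering rule only; `sort_rows` is a hypothetical name, `None` stands in for missing data, and the sketch makes no attempt to reproduce how SST orders observations that tie on every BY variable.

```python
# Illustrative sketch only: ascending lexicographic sort on several keys,
# with missing values (None) treated as larger than any number.
def sort_rows(rows, by):
    """Sort whole observations so that each row stays intact."""
    def key(row):
        parts = []
        for name in by:
            value = row[name]
            missing = value is None
            parts.append((missing, 0.0 if missing else value))
        return tuple(parts)
    return sorted(rows, key=key)


rows = [
    {"year": 1961, "money": 3.2},
    {"year": 1970, "money": None},   # missing value sorts to the bottom
    {"year": 1960, "money": 0.7},
    {"year": 1969, "money": 3.2},    # ties with 1961 on money
]
ordered = sort_rows(rows, by=["money", "year"])
print([r["year"] for r in ordered])
```

Because whole rows are moved, each year stays attached to its own money value, which is the sense in which observations are kept intact.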
The VAR subop is optional. If it is omitted, SST assumes that you
want all of your data sorted so that observations are kept intact.
The SORT command writes over the variables specified in the
VAR subop; it is wise to save your data using the SAVE
command prior to using SORT.
One use of the
SORT command is to arrange your data in a way that
permits visual (as opposed to statistical) analysis. For example,
suppose you wanted to examine the relationship between growth in the money
supply and inflation.
You could sort the data by money supply growth, and then look at the
associated inflation rates, as below:
sort by[money]
print var[year money inflat unemp]
OBS  VARIABLES
     year  money  inflat  unemp
 1:  1960    0.7     1.6    5.5
 2:  1962    1.8     1.8    5.5
 3:  1966    2.5     3.2    3.8
 4:  1969    3.2     5.1    3.5
 5:  1961    3.2     0.9    6.7
 6:  1963    3.7     1.5    5.7
 7:  1974    4.4     8.7    5.6
 8:  1964    4.6     1.5    5.2
 9:  1965    4.7     2.2    4.5
10:  1975    5.0     9.3    8.5
11:  1970    5.3     5.4    4.9
12:  1973    5.5     5.7    4.9
13:  1980    6.4     9.0    7.1
14:  1971    6.5     5.0    5.9
15:  1967    6.6     3.0    3.8
16:  1976    6.6     5.2    7.7
17:  1979    7.2     8.5    5.8
18:  1968    7.7     4.4    3.6
19:  1977    8.1     5.8    7.1
20:  1978    8.3     7.3    6.1
21:  1972    9.3     4.2    5.6
It appears that years with high money supply growth rates accompany years with high inflation rates. This relationship could then be investigated further using the statistical procedures described in later chapters.
The same sorting could be accomplished with:
sort by[money] var[year money inflat unemp party]
since SST assumes that you want all variables sorted if the VAR
subop is omitted. If we had specified only a subset of variables in the
VAR subop, then the data on different observations would be
mixed together, because only the listed variables would be reordered.
If you add the DIF or
DB2 subop to the
SAVE command, SST will write either a DIF file or a dBASE II file. For DIF files, variable names will be used for column names. For dBASE II files, variable names will be used for field names.
Data can be written to a text file using the WRITE command. Unless a FORTRAN format is specified using the
FMT subop, the data will be output with a space separating each data value. The default output format is by observation. For example:
write var[year money] file[myfile.out]
would create a file
myfile.out with contents:
1960 0.7
1961 3.2
and so on.