Next: Class Data Archives Up: Econometrics Laboratory Data Sources Previous: Notes About Data

Data in the EML Archive

Running the command "cd /archive; ls -al" gives a listing of the directories in /archive, the data archive for the Econometrics Laboratory. It looks like this:

total 70
drwxr-xr-x  24 katagiri root         512 Jan 30 13:15 ./
drwxr-sr-x  38 root     staff       1024 Mar  9 16:53 ../
-rw-r--r--   1 root     other        130 Mar  9 22:00 -->--.econ.c4t1d0s2.archive
drwxr-xr-x   6 katagiri staff        512 Apr 11  1996 1983npts/
drwxr-xr-x   6 katagiri staff        512 Apr 15  1996 1990npts/
drwxr-xr-x   2 katagiri staff        512 Oct  3  1995 bhhall/
drwxr-xr-x   3 katagiri staff       2048 Jul 24  1995 ccdb94/
drwxrwsr-x   2 katagiri staff        512 Jul 25  1995 ces92/
dr-xr-xr-x   7 root     root         512 Feb 23  1996 ces94/
drwxr-xr-x   2 katagiri staff        512 Jan  5 12:28 cps94/
drwxr-xr-x   2 katagiri staff        512 Oct  3  1995 hrs/
drwxr-xr-x   7 katagiri staff        512 May 16  1997 hrs_w2/
drwxr-xr-x   4 katagiri staff        512 Jan 29 20:36 ipums.old/
drwxr-sr-x   4 katagiri staff        512 Jan 29 21:50 ipums.umn/
drwx------   2 root     root        8192 Jun 18  1997 lost+found/
drwxr-xr-x   2 katagiri staff        512 Oct 18  1995 m+a/
drwxr-xr-x  10 katagiri staff       1024 Jul 24  1995 nese/
drwxr-xr-x   6 katagiri staff        512 Jul 12  1996 psid_all/
drwxr-xr-x   5 katagiri staff        512 Jul 11  1995 psid_w22/
drwxr-xr-x   2 katagiri staff        512 Aug  3  1995 regio/
drwxr-xr-x   3 katagiri staff       2560 Oct 10 08:41 sa94/
drwxr-xr-x   3 katagiri staff        512 Aug 23  1996 sa95/
dr-xr-xr-x   3 katagiri staff       1024 Oct 10 14:33 sa96/
drwxr-xr-x   2 katagiri staff        512 Jul 11  1995 sipp/
drwxr-sr-x   2 op       operator     512 Jun 20  1996 tape/

1983NPTS
The 1983 Nationwide Personal Transportation Survey (NPTS). See the 1990NPTS entry below for a description of the survey and of the data in this archive.

BHHALL
Bronwyn Hall's datasets that support her research publications on R&D, patents, and so on.

CCDB94
County and City Data Book, 1994, a supplement to the Statistical Abstract of the U.S. The tables are in Lotus format (*.wk1) and correspond to the tables in the printed version.

CES92
Consumer Expenditure Survey, 1992. Quarterly data on household income and expenditures from the Bureau of Labor Statistics. For more information, point a web browser at http://www.bls.gov/.

HRS
The Health and Retirement Survey is a nationally representative longitudinal data set developed in the early 1990s to examine retirement and the aging of society. For more information, point a web browser at http://www.umich.edu/.

M+A
Data on corporate mergers and acquisitions in an ASCII text file.

NESE
The National Economic, Social, and Environmental Data Bank is a product of the U.S. Department of Commerce's Economics and Statistics Administration. The databank contains:

-The Economic and Budget Outlook: Fiscal Years 1993-1997
-U.S. Bureau of the Census Annual Survey of Manufactures: Fiscal Years 1993-1997
-U.S. Bureau of the Census Statistical Abstract
-Decennial Census Summary, 1990
-Historical Tables from the Budget of the U.S. Government
-Bureau of Economic Analysis Current Business Statistics
-Business Cycle Indicators
-Pollution Abatement Expenditures
-Gross State Product
-Input-Output Accounts of the U.S. Economy
-National Income and Product Accounts
-U.S. Regional Economic Projections to 2040
-Fixed Tangible Wealth of the U.S.
-Capital Punishment 1990
-Crime and the Nation's Households, 1990
-Drugs and Jail Inmates, 1989
-Felony Sentences in State Courts
-Female Victims of Violent Crime
-Jail Inmates 1990
-Prisoners in 1990
-Profile of Jail Inmates
-Probation and Parole 1990
-Weather Conditions at Meteorological Stations in the U.S., Statistical Information
-School Crime
-Women in Prison
-Capital Stocks Data Base
-Digest of Education Statistics, 1991
-Annual Energy Review
-Economic Report of the President
-New England Economic Indicators
-Health, United States, 1990
-U.S. Industrial Outlook, 1992
-Airline and Airport On-Time and Departure Data
-Composite Quotations for U.S. Government Securities
-Toxics in the Community: National and Local Perspectives

PSID_W22
Panel Study of Income Dynamics, Wave 22 (1989/90). See /archive/psid_all below. For more information from the ICPSR, point a web browser at http://www.isr.umich.edu/src/psid.

REGIO
Regio contains socioeconomic information on the regions of the European Community. The regions are classified according to a system called NUTS (Nomenclature of Territorial Units for Statistics), which has three nested levels:

(a) level 1: 71 European Community regions (RCE);
(b) level 2: 183 basic administrative units (UAB);
(c) level 3: 1,044 subdivisions of the above (SUAB).

Regio is subdivided into seven statistical domains: demography, economic accounts, unemployment, labour force sample survey, industry, agriculture, and transport. Not all data are available at the most detailed level (level 3), but any data that are available at level 3 are also available at levels 1 and 2.
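To see how the three NUTS levels nest, the Python sketch below sums level-3 values into their level-2 and level-1 parents. It assumes, hypothetically, that each region code extends its parent's code by one character, as in the present-day NUTS codes (DE1, DE11, DE111); check the Regio documentation for the codes actually used in this release. Summing is only meaningful for additive quantities such as population or employment counts, not for rates such as unemployment percentages.

```python
from collections import defaultdict
from typing import Dict


def aggregate_up(level3: Dict[str, float]) -> Dict[int, Dict[str, float]]:
    """Sum level-3 values into level-2 and level-1 totals.

    Assumption (hypothetical): a region's parent code is its own code with
    the last character dropped, e.g. 'DE111' -> 'DE11' -> 'DE1'.
    Only valid for additive quantities (counts), not for rates or ratios.
    """
    level2: Dict[str, float] = defaultdict(float)
    level1: Dict[str, float] = defaultdict(float)
    for code, value in level3.items():
        level2[code[:-1]] += value  # parent region at level 2
        level1[code[:-2]] += value  # grandparent region at level 1
    return {1: dict(level1), 2: dict(level2), 3: dict(level3)}


if __name__ == "__main__":
    # Illustrative codes and values only, not Regio data.
    totals = aggregate_up({"DE111": 10.0, "DE112": 5.0, "DE121": 7.0})
    print(totals[2])  # {'DE11': 15.0, 'DE12': 7.0}
    print(totals[1])  # {'DE1': 22.0}
```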

Depending on the subject, data are available from 1970, 1975 or 1983. The frequency is always annual with the exception of certain monthly unemployment statistics.

SA94
Statistical Abstract of the U.S., 1994. The files are in Lotus format (*.wk1) and correspond to the tables in the printed version.

SA95
Statistical Abstract of the U.S., 1995. The files are in Lotus format (*.wk1) and correspond to the tables in the printed version.

SIPP
The Survey of Income and Program Participation (SIPP) is maintained by the Census Bureau. The data are expected to provide a better understanding of the level of, and changes in, the well-being of the population, and of how economic situations relate to the demographic and social characteristics of individuals. The SIPP data should be especially useful for studying federal transfer programs, estimating program cost and effectiveness, and assessing the effect of proposed changes in program regulations and benefit levels. More detailed information is available from the Census Bureau's web site, http://www.census.gov/.

1990NPTS
The Nationwide Personal Transportation Survey (NPTS) compiles national data on the nature and characteristics of personal travel. The NPTS data in this archive were released by the Bureau of Transportation Statistics (BTS) to make NPTS statistics accessible to a wider audience.

The NPTS contains information in several files on households, people, vehicles, and travel reflecting a wide range of variables. The household data include information such as household size, family income, availability and proximity of public transportation, and family life cycle. Information on people includes such variables as the reference person's age, income category, and the primary mode of transportation used by individuals to commute to work. The NPTS also contains data on vehicles, including information on make and model, model year, annual mileage, and whether the vehicle was purchased new or used. Sample variables from the travel files include purpose of trip, primary mode of transportation, trip mileage, number of household members on trip, and a wide variety of related statistics.

The archive holds data from the 1983 and 1990 surveys in ASCII and SAS formats: the 1983 data are in /archive/1983npts and the 1990 data are in /archive/1990npts. Be sure to read the readme.txt file in each directory first.
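Because household, person, vehicle, and travel records sit in separate files, a common first step is to attach household characteristics to each trip, a many-to-one merge. The Python sketch below shows the idea on records already read into dictionaries. The linking field name `houseid` and the other field names are hypothetical placeholders; the actual identifiers and record layouts are described in the readme.txt files.

```python
from typing import Dict, List


def attach_household(trips: List[dict], households: List[dict],
                     key: str = "houseid") -> List[dict]:
    """Return one merged record per trip, carrying its household's fields.

    'houseid' is a hypothetical name for the household identifier; use the
    field name documented in readme.txt. Trip fields win on name clashes.
    """
    by_id: Dict[object, dict] = {}
    for hh in households:
        if hh[key] in by_id:
            raise ValueError(f"duplicate household id {hh[key]!r}")
        by_id[hh[key]] = hh
    merged = []
    for trip in trips:
        hh = by_id.get(trip[key])
        if hh is None:
            raise KeyError(f"trip refers to unknown household {trip[key]!r}")
        merged.append({**hh, **trip})  # household fields first, then trip fields
    return merged


if __name__ == "__main__":
    # Made-up records for illustration only.
    households = [{"houseid": 1, "hhsize": 3}, {"houseid": 2, "hhsize": 1}]
    trips = [{"houseid": 1, "miles": 4.5}, {"houseid": 1, "miles": 12.0},
             {"houseid": 2, "miles": 0.8}]
    for row in attach_household(trips, households):
        print(row)
```

Building the lookup table once keeps the merge to a single pass over the trips, and the explicit errors catch trips whose household is missing rather than silently dropping them.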

CES94
1994 Consumer Expenditure Survey. Data is available by quarter on household income and expenditures. More information is available from the Bureau of Labor Statistics home page, http://www.bls.gov/.

IPUMS.UMN (currently mirrored site) and IPUMS.OLD (frozen for 1995)
Integrated Public Use Microdata Series. IPUMS consists of twenty-three samples of the United States population drawn from the decennial censuses of 1850, 1880, 1900 through 1920, and 1940 through 1990. (Multiple samples were created from the more recent censuses.) Like the Current Population Survey, the data are hierarchical: the information is disaggregated to observations of individual households and the members therein. The IPUMS innovation over the existing microdata samples is that all record layouts, documentation, and coding schemes have been harmonized into one coherent format. Although not every question was asked in every census, data are available on characteristics such as fertility, nuptiality, life-course transitions, immigration, internal migration, labor force participation, occupational structure, household composition, education, ethnicity, and many more. For these reasons the IPUMS is among the richest sources of information on American social change available.

The IPUMS is a household survey. The data are organized by household: each household observation, containing the variables common to all members of the household, is followed by an individual observation for each member of the household, and then comes the next household. Examples of household variables are geographic location, population of the geographic location, urban or rural location, rent, and the number of people in the household. Examples of individual variables are race, gender, marital status, birthplace, year of immigration, languages spoken, occupation, income by source, and veteran status.
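The household-then-persons layout can be read in a single pass that groups each household record with the person records that follow it. This Python sketch assumes, for illustration only, that the first character of every record is a record-type flag, `H` for a household and `P` for a person; check the IPUMS record layouts for the actual record-type column and codes before relying on it.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def group_households(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (household_record, person_records) for each household in turn.

    Hypothetical layout: column 1 holds the record type, 'H' for a
    household record and 'P' for a person record.
    """
    household: Optional[str] = None
    persons: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        rectype = line[0]
        if rectype == "H":
            if household is not None:
                yield household, persons  # previous household is complete
            household, persons = line, []
        elif rectype == "P":
            if household is None:
                raise ValueError("person record before any household record")
            persons.append(line)
        else:
            raise ValueError(f"unknown record type {rectype!r}")
    if household is not None:
        yield household, persons  # last household in the file


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = ["H001", "P001A", "P001B", "H002", "P002A"]
    for household, persons in group_households(sample):
        print(household, len(persons))  # H001 2, then H002 1
```

Because the function accepts any iterable of lines, it can consume an open file object directly and holds only one household in memory at a time.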

The IPUMS data files are very large. The individual census-year files range in size from 30 MB to 150 MB compressed, and are about ten times larger (roughly 300 MB to 1.5 GB) uncompressed. For this reason the data can be used only with software that reads Unix-compressed ASCII data directly, such as SAS; there is not enough disk space to uncompress the files for use in other packages, such as Stata. Because of the size of the data, novice users should probably seek help before they begin.
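For a program of your own rather than SAS, one option is to stream the decompressed text through a pipe so that no uncompressed copy is ever written to disk. The Python sketch below runs `gzip -dc`, which in GNU gzip also decompresses files made by Unix `compress` (.Z); substituting `zcat` or `uncompress -c` through the `command` parameter is an assumption about what the local system provides.

```python
import subprocess
from typing import Iterator, Sequence


def stream_decompressed_lines(path: str,
                              command: Sequence[str] = ("gzip", "-dc")
                              ) -> Iterator[str]:
    """Yield the lines of a compressed text file one at a time.

    The decompressor writes to a pipe, so only a small buffer is held in
    memory and no uncompressed copy is written to disk.
    """
    proc = subprocess.Popen([*command, path], stdout=subprocess.PIPE,
                            encoding="ascii", errors="replace")
    assert proc.stdout is not None
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        proc.stdout.close()  # lets the decompressor exit if we stop early
        returncode = proc.wait()
    # Reached only when the whole file was read, not on early exit.
    if returncode != 0:
        raise RuntimeError(f"{command[0]} exited with status {returncode}")
```

For example, `sum(1 for _ in stream_decompressed_lines(path))` counts the records in a compressed file, and the generator can be passed straight to a record parser that reads one line at a time.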

Detailed information about the IPUMS data is available in the accompanying documentation and through "help ipums" on the EML system. The help text includes sample SAS extraction routines.

NLSY
National Longitudinal Survey of Youth (NLSY), youth and child data. The data files are DOS binaries and are therefore difficult to use. We hope to obtain the updated 1994 CD-ROM and make these data available to our users on a PC.

PSID_ALL
Panel Study of Income Dynamics.

The ICPSR has made data available for waves 1-25 (1968-1992), plus some early-release files for 1992-95. These files have been downloaded to /archive/psid_all. This comprehensive collection of PSID data comprises 214 files containing 415.4 megabytes of data, documentation, and SAS data definition statements for the main data files, plus six supplemental files for the 1968-92 period.

The files in this archive are zipped; for assistance in unzipping them, see "man unzip". Because this is such a popular data source, I am providing some rather detailed information on the files here.

Main Data Files: /archive/psid_all/main
-Single-year family files for each of 25 years (68-92)
-Cross-year individual file for 25 years (68-92)

File           Bytes          Contents
68_92ind.zip   11265181       1968-1992 cross year individual data
68fam.zip      1313053        1968 family data
69fam.zip      1719067        1969 family data
70fam.zip      1542849        1970 family data
71fam.zip      1475938        1971 family data
72fam.zip      1657720        1972 family data
73fam.zip      881813         1973 family data
74fam.zip      1007744        1974 family data
75fam.zip      1199054        1975 family data
76fam.zip      1797369        1976 family data
77fam.zip      1353982        1977 family data
78fam.zip      1464292        1978 family data
79fam.zip      1567227        1979 family data
80fam.zip      1690627        1980 family data
81fam.zip      1789570        1981 family data
82fam.zip      1557858        1982 family data
83fam.zip      1880701        1983 family data
84fam.zip      2416749        1984 family data
85fam.zip      3119110        1985 family data
86fam.zip      2714813        1986 family data
87fam.zip      2495672        1987 family data
88fam.zip      2892548        1988 family data
89fam.zip      2823865        1989 family data
90fam.zip      3323509        1990 family data
91fam.zip      4905179        1991 family data
92fam.zip      5829410        1992 family data

Supplemental Data Files: /archive/psid_all/supplemental
-1988 Time and Money Transfers File
-1990 Health--Self-Administered Questionnaire
-1990 Health--Telephone Health Questionnaire
-1991 Parent Health Supplement
-1985-92 Childbirth and Adoption History File
-1985-92 Marriage History File

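File           Bytes    Contents
85_92cah.zip   647707   1985-92 Childbirth and Adoption History File
85_92mh.zip    340602   1985-92 Marriage History File
88tmt.zip      186966   1988 Time and Money Transfers File
90saq.zip      159295   1990 Health--Self-Administered Questionnaire
90thq.zip      292607   1990 Health--Telephone Health Questionnaire
91phs.zip      608335   1991 Parent Health Supplement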

Documentation Files: /archive/psid_all/documentation
Documentation for the supplemental data files is packaged with the data files themselves.
This directory contains documentation for the main data files only.

File           Bytes    Contents
68-78doc.zip   745418   1968-1978 family documentation
68-91doc.zip   246921   1968-1991 individual documentation
68-92doc.zip   282581   1968-1992 individual documentation
79doctxt.zip   92655    1979 family documentation
80doctxt.zip   99465    1980 "
81doctxt.zip   130477   1981 "
82doctxt.zip   100816   1982 "  
83doctxt.zip   133652   1983 "
84doctxt.zip   193815   1984 "
85doctxt.zip   251639   1985 "
86doctxt.zip   185487   1986 "
87doctxt.zip   175425   1987 "
88doctxt.zip   219867   1988 "
89doctxt.zip   243198   1989 "
90doctxt.zip   253992   1990 "
91doctxt.zip   233689   1991 "
92doctxt.zip   247691   1992 "
92index.zip
doc.html       8892     describes available documentation (similar to
                        what one gets in help psid)
newslttr.pdf   18279    .pdf files require Adobe's Acrobat Reader to view
q91.pdf        1621065  "
q92.pdf        2101989  "
q93.pdf        2607677  "
q93note.pdf    425828   "
q94.pdf        2559820  "
q95.pdf        2520116  "
q96.pdf        2343015  "
ques93a.pdf    52042    "

[Note that .pdf and .html files can be viewed in Netscape.]

Early-Release Files: /archive/psid_all/early-release

File           Bytes          Contents
er68-95i.zip   13432901       1968-1995 cross year individual data
er92f.zip      2281521        1992 family data
er93f.zip      3469367        1993 family data
er94f.zip      4248421        1994 family data
er95f.zip      2223476        1995 family data

Note that ICPSR has made all the PSID files available on the web at http://www.isr.umich.edu/src/psid. A menu interface there lets you extract variables without using a statistical program. This method can be slow when the network is congested, but it is very easy.
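If you would rather read the zipped PSID files listed above from a program than unzip them first, Python's standard `zipfile` module can list an archive's members and stream one of them as text without extracting it to disk. This is a minimal sketch; the names of the members inside each PSID archive are not given on this page, so list them before opening one.

```python
import io
import zipfile
from typing import Dict, Iterator


def list_members(zip_path: str) -> Dict[str, int]:
    """Map each member name in the archive to its uncompressed size in bytes."""
    with zipfile.ZipFile(zip_path) as zf:
        return {info.filename: info.file_size for info in zf.infolist()}


def read_member_lines(zip_path: str, member: str) -> Iterator[str]:
    """Yield the lines of one archive member without extracting it."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as raw:
            # Universal-newline decoding turns DOS-style "\r\n" into "\n".
            with io.TextIOWrapper(raw, encoding="ascii",
                                  errors="replace") as text:
                for line in text:
                    yield line.rstrip("\n")
```

For example, `list_members("/archive/psid_all/main/68fam.zip")` shows what the 1968 family archive contains, and `read_member_lines` can then stream the member you need.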






katagiri@econ.Berkeley.EDU