********************************************************************
* This file merges the Census of Manufactures data files
* for 1982 and 1992, creates consistent
* definitions of variables over time, fixes some of the problems
* with the geocodes (state and county codes), and adds data on
* worker education from the 1980 and 1990 Censuses of Population.
*
* The Census of Manufactures data file for 1992 provided by
* the Census Bureau is called Gs92.
* The Census of Manufactures data file for 1982 provided by
* the Census Bureau is called Gs82.
* The data on education from the Census of Population are in
* a file called medu_lrd.
********************************************************************;

options compress=yes;

libname in 'D:/cm/';
libname un 'D:/enrico/new/data';
libname on 'D:/enrico/imported/data/';

****************************
1992 data
****************************;

data tmp;
  set in.Gs92;
  yr = 92;
  * transform 3 character variables
  * into numeric variables (because these
  * 3 variables are numeric in 1982);
  lfo_tmp = lfo*1;
  ind_tmp = ind*1;
  tn_tmp  = tn*1;
  drop lfo ind tn;
run;

data tmp92;
  set tmp;
  lfo = lfo_tmp;
  ind = ind_tmp;
  tn  = tn_tmp;
  drop lfo_tmp ind_tmp tn_tmp;
run;

*****************************
1982 data
*****************************;

data tmp;
  set in.Gs82;
  yr = 82;
  drop fst st;
run;

proc sort;
  by orgppn;
run;

**********************************
Correct a problem with the STATE code for Hawaii.
(the state code is corrected for 941 plants)
**********************************;

data statefix;
  set in.statefix;
  if yr ^= 82 then delete;
  orgppn = ppn;
  * transform fst into a numeric variable
  * (because fst is numeric in 1992);
  fst_tmp = fst*1;
  keep orgppn fst_tmp;
run;

proc sort;
  by orgppn;
run;

data tmp82;
  merge tmp statefix;
  by orgppn;
  fst = fst_tmp;
  drop fst_tmp;
run;

**************************************
1982 + 1992 data
**************************************;

data merged1;
  set tmp82 tmp92;

  *******************************************************
  The COUNTY definition is made consistent across years.
  In particular, the *1980* definition is adopted in all years.
  The transformation is based on becker.sas. However, it
  contains far fewer changes than becker.sas does, because
  I am interested only in the years 1977 to 1992. Most changes in
  becker.sas take place in 1967 and 1972
  (see becker.sas for an explanation of the changes).
  *******************************************************;

  /* Becker.sas starts here */
  oldcou = cou;
  cou = .;

  * ALASKA, HAWAII;
  * Becker suggests dropping both states, but for my purposes
  * I see no reason to do so. (The definition of Anchorage
  * does not change, and there are no changes in Hawaii either.);

  * GEORGIA;
  /*** Georgia doesn't follow the "usual" pattern.                      ***/
  /*** There is also the issue of MUSCOGEE COUNTY and COLUMBUS CITY.   ***/
  /*** Muscogee (106) becomes Columbus City (510) in 1972, which in    ***/
  /*** turn becomes Muscogee (215) again in 1982. I convert to the     ***/
  /*** latter. Note that there is no "splitting" or "rejoining"        ***/
  /*** involved. Muscogee disappears from the LRD in 1972, and         ***/
  /*** Columbus City (i.e., the code 510) disappears completely in     ***/
  /*** 1982.                                                           ***/
  if fst=13 & oldcou=510 & (yr=72 | yr=77) then cou=215;

  * VIRGINIA;
  /*** Virginia poses the most problems, not only with the 1967-72     ***/
  /*** conversion (lots of "independent cities"!), but also in the     ***/
  /*** later years as new counties form from old ones, and so forth.   ***/
  /*** Note: some counties (and county codes) appear in certain        ***/
  /*** manuals but NEVER in the LRD (e.g., Elizabeth City (055),       ***/
  /*** Norfolk County (129), Princess Anne County (154), South Norfolk ***/
  /*** City (785), Warwick County (187), and Warwick City (815)).      ***/
  /*** I'll pretend they never existed.                                ***/
  * No changes relevant to my period of interest;

  * ARIZONA;
  /*** The issue is that LA PAZ COUNTY (012) splits from YUMA COUNTY   ***/
  /*** (027) in 1987. I will recode La Paz as Yuma and perform the     ***/
  /*** merge elsewhere. The rest of the state is converted "as usual". ***/
  if fst=4 & oldcou=012 & (yr=87 | yr=92) then cou=027;

  * NEW MEXICO;
  /*** New Mexico doesn't follow the "usual" pattern. (Tim Dunne's     ***/
  /*** program doesn't catch this.)                                    ***/
  /*** There is also the issue of CIBOLA COUNTY (006) splitting from   ***/
  /*** VALENCIA COUNTY (061) in 1982. I will recode Cibola as Valencia ***/
  /*** and merge the two elsewhere.                                    ***/
  if fst=35 & oldcou=006 & (yr=82 | yr=87 | yr=92) then cou=061;

  * SOUTH DAKOTA;
  /*** South Dakota doesn't follow the "usual" pattern. (Tim Dunne's   ***/
  /*** program doesn't catch this.)                                    ***/
  /*** There is also the issue of WASHABAUGH COUNTY (065->131) merging ***/
  /*** into JACKSON COUNTY (071) in 1977 or 1982. Since no             ***/
  /*** manufacturing ever took place in Washabaugh, I won't bother     ***/
  /*** converting it to 071.                                           ***/
  /*** Note: ARMSTRONG COUNTY (post-72: 001) appears in some manuals,  ***/
  /*** but NEVER in the LRD. I will pretend it never existed.          ***/
  * No changes;

  * CALIFORNIA;
  /*** Irregularities in ALPINE COUNTY (002->003) appear in the LRD.   ***/
  /*** It appears to have 14, 409, 141, 176, 0, 0, and 0 manufacturing ***/
  /*** establishments in the census years 1963, 1967, 1972, 1977,      ***/
  /*** 1982, 1987, and 1992, respectively. County & City Data Books    ***/
  /*** indicate that this county has historically had little or no    ***/
  /*** manufacturing. Any plants found in this county should be        ***/
  /*** suspect, and any data for this county derived from the LRD      ***/
  /*** shouldn't be used.
  (Of course, this is just my opinion.) I drop this county.            ***/
  if fst=6 & oldcou=3 & (yr=77 | yr=82 | yr=87 | yr=92) then delete;

  * ALL OTHER STATES AND COUNTIES IN THE UNITED STATES;
  if cou=. & (yr=77 | yr=82 | yr=87 | yr=92) then cou = oldcou;
  * DROP is a declaration that applies only to the output data set,
  * so OLDCOU is still available to the statements below;
  drop oldcou;
  /* End of Becker.sas */

  * The following are non-trivial changes from the equivalency
  * files that are not included in becker.sas
  * (trivial changes are those with afact > 0.98 or afact < 0.03).
  * AFACT is the share of the COUNTY80 population (POP) allocated to COUNTY.
  *
  *  OBS  COUNTY80  COUNTY    POP  AFACT
  *    9     51015   51015  54677  0.898
  *   10     51015   51790   2244  0.037
  *   11     51015   51820   3987  0.065
  *   12     51081   51081   8853  0.925
  *   13     51081   51595    719  0.075
  *   14     51095   51095  34859  0.966
  *   15     51095   51830   1225  0.034
  *   16     51143   51143  55655  0.839
  *   17     51143   51590  10718  0.161
  *   18     51165   51165  57482  0.871
  *   19     51165   51660   8512  0.129
  *   20     51175   51175  17550  0.965
  *   21     51175   51620    640  0.035
  *   22     51177   51177  57403  0.957
  *   23     51177   51630   2550  0.043;
  if fst=51 & oldcou=790 & yr=92 then cou=015;
  if fst=51 & oldcou=820 & yr=92 then cou=015;
  if fst=51 & oldcou=595 & yr=92 then cou=081;
  if fst=51 & oldcou=830 & yr=92 then cou=095;
  if fst=51 & oldcou=590 & yr=92 then cou=143;
  if fst=51 & oldcou=660 & yr=92 then cou=165;
  if fst=51 & oldcou=620 & yr=92 then cou=175;
  if fst=51 & oldcou=630 & yr=92 then cou=177;

  * The variable COUNTY is now
  * (1) consistent over time and
  * (2) built with the correct state code;
  county = fst*1000 + cou;
  drop cou;

  *****************************************************************
  SMSA is the *1980* SMSA.
  For 1982, the original SMSA definition is kept.
  For 1992, I modify the original SMSA definition to be consistent
  with the 1980 definition. (To do so, I replicate the code that
  Michael Greenstone wrote for the Census of Population.)
  The variable SMSA80 is the correct variable to use.
  The variable SMSAOLD is the original assignment
  (so SMSA80 = SMSAOLD for 1982).
  *****************************************************************;
  smsaold = smsa;
  if yr=82 then smsa80 = smsa;

  * BEGIN 1990-definition block (yr=92);
  if yr=92 then do;

    ********************************************************************
    I delete all the MSAs/PMSAs that are new in the 1990 sample and
    were not part of another SMSA in 1980. (I also delete Dayton here
    because it was combined with Springfield, OH, and there is no good
    way to separate them or to get a definition of either one that
    resembles its 1980 form.)
    ********************************************************************;
    if smsa=2030 then smsa=.; /* Decatur, AL new */
    if smsa=1580 then smsa=.; /* Cheyenne, WY new */
    if smsa=2180 then smsa=.; /* Dothan, AL new */
    if smsa=2710 then smsa=.; /* Fort Pierce, FL new */
    if smsa=3350 then smsa=.; /* Houma-Thibodaux, LA new */
    if smsa=3580 then smsa=.; /* Jackson, IN new */
    if smsa=3610 then smsa=.; /* Jamestown-Dunkirk, NY new */
    if smsa=4940 then smsa=.; /* Merced, CA new */
    if smsa=5020 then smsa=.; /* Middletown, CT new */
    if smsa=5345 then smsa=.; /* Naples, FL */
    if smsa=6660 then smsa=.; /* Rapid City, SD new */
    if smsa=7490 then smsa=.; /* Santa Fe, NM new */
    if smsa=9360 then smsa=.; /* Yuma, AZ new */
    if smsa=2000 then smsa=.; /* Dayton */

    ********************************************************************
    Delete counties etc. that were added to SMSAs in 1990.
    ********************************************************************;
    if smsa=6960 and county=26085 then smsa=.;
    if smsa=6960 and county=26105 then smsa=.;
    if smsa=6960 and county=26107 then smsa=.;
    if smsa=6960 and county=26111 then smsa=.;
    if smsa=6960 and county=26123 then smsa=.;
    if smsa=6960 and county=26133 then smsa=.;
    if smsa=3240 and county=42075 then smsa=.;
    if smsa=5190 and county=34029 then smsa=.;
    if smsa=1520 and county=37097 then smsa=.;
    if smsa=1520 and county=37109 then smsa=.;
    if smsa=1520 and county=45027 then smsa=.;
    if smsa=1520 and county=45043 then smsa=.;
    if smsa=1520 and county=45089 then smsa=.;
    if smsa=1930 and county=09001 then smsa=.;
    if smsa=1930 and county=09005 then smsa=.;
    if smsa=5640 and county=34037 then smsa=.;
    if smsa=5640 and county=34041 then smsa=.;
    *if smsa=5480 and county=09009 then smsa=.;
    *ignore because there are too many PUMAs in this county;
    if smsa=5015 and county=34035 then smsa=.;
    if smsa=7560 and county=42113 then smsa=.;
    if smsa=7560 and county=42015 then smsa=.;
    if smsa=7560 and county=42115 then smsa=.;
    if smsa=7560 and county=42117 then smsa=.;
    if smsa=7560 and county=42131 then smsa=.;
    if smsa=7560 and county=42037 then smsa=.;
    if smsa=7560 and county=42093 then smsa=.;
    if smsa=7560 and county=42097 then smsa=.;

    ***********************************************************************
    In the following lines of code I define a new variable smsa80 that
    redefines all 1990 SMSAs so that they correspond to the 1980 SMSA
    definitions.
    ***********************************************************************;

    * correct for changes in entire SMSAs between 1980 and 1990;
    if smsa=5015 then smsa80=5460;
    else if smsa=5190 then smsa80=4410;
    else if smsa=5950 then smsa80=5660;
    else if smsa=620  then smsa80=1600;
    else if smsa=3690 then smsa80=1600;
    else if smsa=3965 then smsa80=1600;
    else if smsa=5775 then smsa80=7360;
    else if smsa=2800 then smsa80=1920;
    else if smsa=1145 then smsa80=3360;
    else if smsa=7090 then smsa80=1120;
    else if smsa=845  then smsa80=6280;
    else if smsa=1125 then smsa80=2080;
    else if smsa=5700 then smsa80=1280;
    else if smsa=8725 then smsa80=6440;
    else if smsa=6060 then smsa80=6480;
    else if smsa=7560 then smsa80=5745;

    * pull out the 1980 SMSAs when one 1990 SMSA combines two 1980 SMSAs;
    else if smsa=6960 and county=26017 then smsa80=800;
    else if smsa=6760 and county=51053 then smsa80=6140;
    else if smsa=6760 and county=51081 then smsa80=6140;
    else if smsa=6760 and county=51149 then smsa80=6140;
    else if smsa=6760 and county=51181 then smsa80=6140;
    else if smsa=6760 and county=51183 then smsa80=6140;
    else if smsa=6760 and county=51570 then smsa80=6140;
    else if smsa=6760 and county=51595 then smsa80=6140;
    else if smsa=6760 and county=51670 then smsa80=6140;
    else if smsa=6760 and county=51730 then smsa80=6140;
    else if smsa=1840 and county=39089 then smsa80=5645;
    else if smsa=5720 and county=51001 then smsa80=5680;
    else if smsa=5720 and county=51095 then smsa80=5680;
    else if smsa=5720 and county=51131 then smsa80=5680;
    else if smsa=5720 and county=51199 then smsa80=5680;
    else if smsa=5720 and county=51735 then smsa80=5680;
    else if smsa=5720 and county=51830 then smsa80=5680;
    else if smsa=5720 and county=51650 then smsa80=5680;
    else if smsa=5720 and county=51700 then smsa80=5680;
    else if smsa=5720 and county=37051 then smsa80=5680;
    else if smsa=5720 and county=37017 then smsa80=5680;
    else if smsa=5720 and county=37093 then smsa80=5680;
    else if smsa=5720 and county=37155 then smsa80=5680;
    else if smsa=5720 and county=37165 then smsa80=5680;
    else if smsa=1520 and county=37005 then smsa80=6885;
    else if smsa=1520 and county=37009 then smsa80=6885;
    else if smsa=1520 and county=37011 then smsa80=6885;
    else if smsa=1520 and county=37121 then smsa80=6885;
    else if smsa=1520 and county=37189 then smsa80=6885;
    else if smsa=1520 and county=37193 then smsa80=6885;
    else if smsa=1520 and county=37199 then smsa80=6885;
    else if smsa=1520 and county=37025 then smsa80=7140;
    else if smsa=1520 and county=37159 then smsa80=7140;
    else if smsa=1520 and county=45091 then smsa80=6885;
    else if smsa=1520 and county=45015 then smsa80=7140;
    else if smsa=1520 and county=45019 then smsa80=7140;
    else if smsa=1520 and county=45035 then smsa80=7140;

    * move counties from their 1990 SMSA back to their 1980 SMSA of origin;
    else if smsa=875 and county=34003 then smsa80=5600;

    * add counties that were deleted in the 1990 data;
    else if fst=1 and smsa=. and county=1083 then smsa80=3440;
    else if fst=1 and smsa=. and county=1089 then smsa80=3440;

    else smsa80=smsa;

    * I did not catch this change originally, so I make it here.
    * I am merely recreating the Paterson-Clifton-Passaic, NJ SMSA
    * that was aggregated in the 1990 sample;
    if smsa80=875 then smsa80=6040;

  * END 1990-definition block;
  end;

  drop smsa;

  * sic is IND without its last two digits (e.g. IND 2011 gives sic 20),
  * the industry level at which medu_lrd is merged;
  sic = int(ind/100);
run;

proc sort;
  by smsa80 sic;
run;

* Attach the worker-education data by SMSA and industry.
* Cells of medu_lrd with no matching plant have missing ppn and are dropped;
data in.merged1;
  merge merged1 on.medu_lrd;
  by smsa80 sic;
  if ppn=. then delete;
run;

proc means;
run;

proc sort;
  by ppn;
run;
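
/* ------------------------------------------------------------------
   Optional diagnostic, added as an illustration and not part of the
   original processing. A minimal sketch that counts, separately for
   1982 and 1992, how many plants found no matching smsa80-by-sic cell
   in on.medu_lrd, i.e. how many plants are left with missing
   education variables by the merge above.
   Assumptions: work.merged1 is still sorted by smsa80 sic (the PROC
   SORT before the merge sorted it in place), and on.medu_lrd has at
   most one row per smsa80 sic. With duplicate keys the merge becomes
   many-to-many and these counts are no longer plant counts.
   The names inplant, inedu, last, nplant82, nomatch82, nplant92 and
   nomatch92 are hypothetical and used only in this step.
   ------------------------------------------------------------------ */
data _null_;
  merge merged1     (in=inplant keep=smsa80 sic yr)
        on.medu_lrd (in=inedu   keep=smsa80 sic)
        end=last;
  by smsa80 sic;

  /* sum statements: each counter starts at 0 and is retained */
  if inplant and yr=82 then do;
    nplant82 + 1;
    if not inedu then nomatch82 + 1;
  end;
  else if inplant and yr=92 then do;
    nplant92 + 1;
    if not inedu then nomatch92 + 1;
  end;

  /* write the totals to the log once the last observation is read */
  if last then put 'NOTE: plants without education data: '
                   nomatch82= nplant82= nomatch92= nplant92=;
run;

/* Usage note: the step writes no data set, only one line to the log.
   A large nomatch92 relative to nomatch82 would point at SMSA codes
   that the 1990-to-1980 recoding above left without an education
   cell. */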