Structure of a SAS Program


Ron Briggs-UTD 

Introduction to SAS 

a programming environment and language for data manipulation and analysis. 
 

 Files referenced here are available in the Green Lab at:

P:\briggs\poec5317 

Single best book on SAS:  Lora D. Delwiche and Susan Slaughter “The Little SAS Book: A Primer,” SAS Institute, 2nd edition, 1999 

 


First Example Program 
Sample SAS program: sasex0.sas 

*my first SAS program;

data grocery;

    input  product   $    var1    var2;

    label   var1='Number on Hand'

                var2='Number on Order';

    lines;

cheese   25   32

hotdogs   26   14

mustard   13    32

;;;;;;;;

run;

proc print label; run;

proc means; run; 
 

The SAS System                                                                                    1 

                                   Number     Number

OBS    PRODUCT    on Hand    on Order 

1        cheese                    25         32

2        hotdogs                  26         14

3        mustard                  13         32 
 

The SAS System                                  19:13 Tuesday, July 16, 1996   2 

Variable  Label             N          Mean       Std Dev       Minimum Maximum

-------------------------------------------------------------------------------------------------------

VAR1      Number on Hand   3    21.3333333     7.2341781    13.0000000 26.000000

VAR2      Number on Order  3    26.0000000    10.3923048    14.0000000 32.000000

------------------------------------------------------------------------------------------------------- 
 
 

(Shown above: the program file, followed by its output file, or 'printout'.)


SAS Log from First Sample Program: sasex0.log 

                                                                            (with minor editing)

1          *my first SAS program;

3          data grocery;

4            input product $ var1 var2;

5            label var1='Number on Hand'

6                  var2='Number on Order';

7          lines;

NOTE: The data set WORK.GROCERY has 3 observations and 3 variables.

NOTE: DATA statement used:

      real time           0.148 seconds

      cpu time            0.070 seconds

11         ;;;;;;;;

12         run;

13         proc print label;  run;

NOTE: The PROCEDURE PRINT printed page 1.

14         proc means;  run;

NOTE: The PROCEDURE MEANS printed page 2.


Second Sample Program 
Here is the data. How do we get it into SAS?

Alabama            1AL ESC  63  3894  3444  3267

Alaska             2AK PAC  94   402   304   228

Arizona            4AZ MTN  86  2718  1776  1306

Arkansas           5AR WSC  71  2286  1924  1787

California         6CA PAC  93 23668 19971 15735

Colorado           8CO MTN  84  2890  2211  1758

Connecticut        9CT NE   16  3108  3033  2536

Delaware          10DE SA   51   594   547   445

D of Columbia     11DC SA   53   638   757   764

Florida           12FL SA   59  9746  6797  4959

Georgia           13GA SA   58  5463  4587  3941

.

.


Texas             48TX WSC  74 14229 11199  9582

Utah              49UT MTN  87  1461  1059   890

Vermont           50VT NE   13   511   444   389

Virginia          51VA SA   54  5347  4652  3970

Washington        53WA PAC  91  4132  3415  2859

West Virginia     54WV SA   55  1950  1745  1861

Wisconsin         55WI ENC  35  4706  4418  3952

Wyoming           56WY MTN  83   470   332   330 

(51 lines of data) 

Data Documentation (state.dd): 
 

RECORD FORMAT SHEET:

FILE:    stpop.dat  (PC)    stpop.data (UNIX)

Rec  COLS     Variable                         Type
1    1-17     Name of State                    Alpha
1    18-20    Federal Info. Processing code    num
1    21-23    State postal code                Alpha
1    24-27    Census Bureau Division           Alpha
1    28-30    Census Bureau State code         num
1    31-36    Population in 1980               1000
1    37-42    Population in 1970               1000
1    43-48    Population in 1960               1000
 

(The 51-line listing above is the data file, state.dat.)


Second Sample Program (contd.) 
Example SAS Program  for Reading Data File (sasex1.sas) 

*USE DATA STEP TO READ IN RAW DATA FROM AN O/S FILE AND CREATE A SAS DATA SET;

DATA    USPOP;   *names the sasdata set to create;

    *read in data;

     INFILE    'p:\briggs\poec5317\state.dat';

     INPUT   STATE  $  1-17   FIPS  18-20  ST  $  21-23
             DIV  $   24-27  CB  28-30  POP80  31-36  POP70 37-42 POP60 43-48;

    *create some new variables using assignment statements;

     PPOP6070=(POP70-POP60)/POP60*100;

     PPOP7080=(POP80-POP70)/POP70*100;

    *label all variables;

     LABEL   STATE='Name of State'

        FIPS='Federal Info. Processing Code'

        ST='State Postal Code'

        DIV='Census Bureau Division'

        CB='Census Bureau State Code'

        POP80='POPULATION in 1980'

        POP70='POPULATION in 1970'

        POP60='POPULATION in 1960'

        PPOP6070='% POP.  Change 1960-1970'

        PPOP7080='% POP. Change 1970-1980';

RUN; 

*USE  PROC STEPS TO ANALYSE THIS SAS DATA SET;

*produce a simple print of the data;

PROC   PRINT   DATA=USPOP;

     VARIABLES   ST  POP80  PPOP6070  PPOP7080 ;

     TITLE1 'US Demographic Change by State' ;

     TITLE2 '1960-1980';

RUN;

*sort and print the data by geographic region;

PROC  SORT   DATA=USPOP;     BY CB;

PROC PRINT   DATA=USPOP(OBS=51)   NOOBS   LABEL;

     VARIABLES    ST   POP80   PPOP6070   PPOP7080;

     FORMAT    PPOP6070    PPOP7080   5.1;

     FOOTNOTE   'STATES Are Ordered According to Census Bureau Division';

RUN;

FOOTNOTE;

*regression predicting growth in  seventies from growth in sixties;

PROC REG  DATA=USPOP;

     MODEL  PPOP7080=PPOP6070  /STB;

     TITLE 'Population Growth in the Seventies Predicted from Sixties';

RUN;

*produce descriptive statistics;

PROC MEANS SUM MIN MAX;  

     TITLE 'Population Statistics';

RUN; 
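
The two percentage-change assignments in the DATA step can be checked by hand for a single state. The following is a minimal sketch, not part of the original program: it uses the Alabama populations from the data listing and writes the results to the log, where they should agree with the values printed for AL in the output (5.4178 and 13.0662).

```sas
data _null_;                                   /* _null_: compute without creating a data set */
    pop60 = 3267; pop70 = 3444; pop80 = 3894;  /* Alabama, taken from the data listing */
    ppop6070 = (pop70 - pop60) / pop60 * 100;  /* % change 1960-1970 */
    ppop7080 = (pop80 - pop70) / pop70 * 100;  /* % change 1970-1980 */
    put ppop6070= 8.4 ppop7080= 8.4;           /* write both values to the SAS log */
run;
```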

 


Log from Second Sample Program: SASEX1.LOG 

1    *USE DATA STEP TO READ IN RAW DATA FROM AN O/S FILE AND CREATE A SAS DATA SET;

2    options nocenter;

3    DATA USPOP;   *names the sasdata set to create;

4

5        *read in data;

6         INFILE 'p:\briggs\poec5317\state.dat';

7         INPUT STATE $ 1-17 FIPS 18-20  ST $ 21-23

8           DIV $ 24-27 CB 28-30 POP80 31-36 POP70 37-42 POP60 43-48;

9

10       *create some new variables using assign statements;

11        PPOP6070=(POP70-POP60)/POP60*100;

12        PPOP7080=(POP80-POP70)/POP70*100;

13

14      *label all variables;

15        LABEL   STATE='NAME OF STATE'

16           FIPS='FEDERAL INFO. PROCESSING CODE'

17           ST='STATE POSTAL CODE'

18           DIV='CENSUS BUREAU DIVISION'

19           CB='CENSUS BUREAU STATE CODE'

20           POP80='POPULATION IN 1980'

21           POP70='POPULATION IN 1970'

22           POP60='POPULATION IN 1960'

23           PPOP6070='% POP.  CHANGE 1960-1970'

24           PPOP7080='% POP. CHANGE 1970-1980';

25   RUN;

NOTE: The infile 'p:\briggs\poec5317\state.dat';

      FILENAME='p:\briggs\poec5317\state.dat';

       RECFM=V,LRECL=256

NOTE: 51 records were read from the infile

      The minimum record length was 80.

      The maximum record length was 80.

NOTE: The data set WORK.USPOP has 51 observations and 10 variables. 
 

28   *USE  PROC STEPS TO ANALYSE THIS SAS DATA SET;

30   *produce a simple print of the data;

31   PROC PRINT DATA=USPOP;

32        VARIABLES ST POP80 PPOP6070 PPOP7080 ;

33        TITLE1 'US DEMOGRAPHIC CHANGE BY STATE' ;

34        TITLE2 '1960-1980';

35   RUN;

NOTE: The PROCEDURE PRINT used 0.98 seconds

37   *sort and print the data by geographic region;

38   PROC SORT DATA=USPOP;     BY CB;

NOTE: The data set WORK.USPOP has 51 observations and 10 variables.

NOTE: The PROCEDURE SORT used 0.44 seconds.

39   PROC PRINT DATA=USPOP(OBS=51) NOOBS LABEL;

40        VARIABLES ST POP80 PPOP6070 PPOP7080;

41        FORMAT    PPOP6070 PPOP7080 5.1;

42        FOOTNOTE 'STATES ARE ORDERED ACCORDING TO CENSUS BUREAU DIVISION';

43   RUN;

NOTE: The PROCEDURE PRINT used 0.27 seconds.

44   FOOTNOTE;

47

48 *regression to predict growth in seventies from sixties;

49    PROC REG DATA=USPOP;

50        MODEL  PPOP7080=PPOP6070   /STB;

51       TITLE 'Population Growth in the Seventies Predicted from Sixties';

52    RUN;

NOTE: 51 observations read.

NOTE: 51 observations used in computations.

53   *produce descriptive statistics;

NOTE: The PROCEDURE REG used 0.48 seconds.

54   PROC MEANS SUM MIN MAX;

55        TITLE 'Population Statistics';

56   RUN;

NOTE: The PROCEDURE MEANS used 0.55 seconds.


Example of Output from sasex1 

US Demographic Change by State

1960-1980

OBS    ST    POP80    PPOP6070    PPOP7080

  1    AL     3894      5.4178     13.0662

  2    AK      402     33.3333     32.2368

  3    AZ     2718     35.9877     53.0405

  .

  .

51    WY      470      0.6061     41.5663 

US Demographic Change by State

1960-1980

State                    % POP.       % POP.

Postal    POPULATION     Change       Change

Code       in 1980     1960-1970    1970-1980 

  ME          1125          2.5         13.2

  NH           921         21.6         24.8

  VT           511         14.1         15.1

   .

   .

  CA         23668         26.9         18.5

  AK           402         33.3         32.2

  HI           965         21.6         25.3

STATES are Ordered According to Census Bureau Division 

Population Growth in the Seventies Predicted from Sixties

Model: MODEL1

Dependent Variable: PPOP7080   % POP.  Change 1970-1980 

Analysis of Variance 

                         Sum of         Mean

Source          DF      Squares       Square      F Value       Prob>F 

Model            1   4600.88079   4600.88079       36.285       0.0001

Error           49   6213.20649    126.80013

C Total         50  10814.08729 

    Root MSE      11.26056     R-square       0.4255

    Dep Mean      15.78436     Adj R-sq       0.4137

    C.V.          71.33995 

Parameter Estimates 

              Parameter  Standard   T for H0:                  Standardized

Variable  DF  Estimate     Error   Parameter=0    Prob > |T|      Estimate 

INTERCEP   1  5.627559  2.30854548   2.438        0.0185        0.00000000

PPOP6070   1  0.756642  0.12561153   6.024        0.0001        0.65226722 

              Variable

Variable  DF     Label 

INTERCEP   1  Intercept

PPOP6070   1  % POP.  Change 1960-1970


The Rules of SAS Programming 

  • SAS programs are constructed of:
    • DATA steps
      • which create SAS data sets
    • PROC steps
      • which analyze SAS data sets
  • SAS statements
    • begin anywhere on a line
    • end with a semicolon (;)
    • extend to as many lines as needed
    • can be stacked many to a line
    • are not case sensitive
 
  • SAS names 
    (for variables, data sets, etc.)
    • 8 characters or fewer
    • start with an alphabetic character
    • no special characters except  _
  • SAS variables: 2 types
    • character  
      (max. of 200 characters long)
    • numeric 
      (default 8 bytes long)

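Some of these rules for names and variables are less stringent in newer versions of SAS: from Version 7 onward, variable and data set names may be up to 32 characters long, and character variables up to 32,767 characters.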
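
The statement rules can be seen in a short sketch. The data set name (demo), its variables, and its values are hypothetical and chosen only for illustration.

```sas
* statements may start anywhere, span several lines, or share a line;
DATA demo;
        INPUT id
              score;                     /* one statement spread over two lines */
    double = score * 2; LABEL double = 'Twice the Score';   /* two statements on one line */
    LINES;
1 10
2 20
;
RUN;

proc print data=DEMO label; run;         /* keywords and names are not case sensitive */
```

Each statement ends at its semicolon, so the layout is free; consistent indentation is a matter of readability only.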


SAS Jobs 

Example job:

Data State;
input pop80 pop90;
growth=pop90-pop80;

Proc MEANS;
variables pop80 pop90;

Flow of a SAS job:

raw data
  -> read by SAS DATA step
  -> creates SAS DATA SET (State)
  -> analyzed by SAS PROC step
  -> generates report/output etc.


Raw Data & SAS Data Sets 

  • data may exist in one of two forms:
    • as a raw data file, managed by the operating system (e.g. Windows or UNIX)
    • as a SAS data set, managed by SAS
  • SAS data sets are matrices (tables):
    • variables occupy the columns 
      • each variable is identified by a name (e.g. Population)
    • observations occupy the lines or rows (e.g. Texas)
      • there is no limit to the number of observations
      • SAS always processes just one observation at a time

An entry (matrix element) in this table is called a data value.

Once  data are in a SAS data set, only the meaning of the variable names needs to be remembered--all physical details of the raw data file can be forgotten. 

A SAS data set has two parts:

  • descriptors (reported by proc contents;)
    • data set name
    • variable attributes: name, type, length, format, informat, label
    • history: source statements used to create the data set
  • data (reported by proc print;)
    • variables in columns: Name, Gender, Age, Height, Weight
    • observations in rows (obs1, obs2, obs3 ... obs19), for example:

Jane  F 11 57.3 83
John  M 12 59.0 99.5
.
.
.
Alice F 13 56.5 84.0

SAS Procedures can only process SAS data sets.

A data step is used to read raw data into a SAS data set.
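
As a minimal sketch of the two parts of a SAS data set, the grocery data from the first example can be read in and then inspected twice: once for its descriptors and once for its data values.

```sas
data grocery;
    input product $ var1 var2;
    label var1='Number on Hand'
          var2='Number on Order';
    lines;
cheese 25 32
hotdogs 26 14
mustard 13 32
;
run;

proc contents data=grocery; run;   /* descriptor portion: names, types, lengths, labels */
proc print data=grocery; run;      /* data portion: the observations themselves */
```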


Example Structure of a SAS Program 

*my program;                        /* comment statement--not part of PROC or DATA step */
OPTIONS nocenter linesize=80;       /* establishes options--not part of PROC or DATA step */

DATA sasdata1;                      /* start data step & name SAS data set being created */
INFILE  'c:\rawdata';               /* location of raw data file */
INPUT var1 var2;                    /* identify variables within raw data set */
….                                  /* other data step statements */
RUN;                                /* end first data step */

DATA sasdata2;                      /* start second data step */
…..
RUN;                                /* end second data step */

DATA sasdata;                       /* start third data step */
MERGE sasdata1 sasdata2;            /* combine first and second data sets into 'master' set */
RUN;                                /* end third data step */

PROC first;                         /* begin first PROC step to analyze data */
….
RUN;                                /* end first PROC step */

PROC second;                        /* begin second PROC step for more analysis */
RUN;                                /* end second PROC step */


How SAS data sets are built: one observation at a time 

From another SAS data set

  • the first observation is selected from the source data set
  • it is processed through all statements in the data step ( up to the run;)
    • it is output to the data set(s) being created
  • the second observation from the source is selected, etc.
 

From a raw data file

  • the first n logical records (based on the INPUT statement) are selected from the source
  • they are processed through all statements in the data step
    • they are output to the data set(s) being created, usually as one observation
  • the next n logical records from the source are selected, etc.
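
The one-observation-at-a-time cycle can be made visible in the log. This is a sketch under stated assumptions: the data set name uspop_demo is hypothetical, and the three data lines are taken from the state listing. The automatic variable _N_ counts passes through the data step.

```sas
data uspop_demo;
    input st $ pop70 pop80;
    change = pop80 - pop70;
    put 'Processing observation ' _n_ st= change=;   /* written to the log once per observation */
    lines;
AL 3444 3894
AK 304 402
AZ 1776 2718
;
run;
```

Each input line passes through every statement before the next line is read, so the log shows one message per observation, in order.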
 

How SAS data sets are named: temporary v. permanent 
 

  • Temporary (duration of program execution only)

    --single level name: datasetname  e.g. USPOP

  • Permanent (saved to disk)

    --double level name: libraryname.datasetname e.g. POPDATA.USPOP

Using Permanent SAS data sets 

Once a SAS data set has been saved permanently (by using a two level name)  it can be recalled for use in PROCS,  or for further modification 

*recall and use in a PROC;

LIBNAME   POPDATA  'c:\sasdata\usstate\';

PROC   MEANS  DATA=POPDATA.USPOP;  RUN; 

*recall and further modify (transform) the data set temporarily;

LIBNAME   POPDATA  'c:\sasdata\usstate\';

DATA   USPOP;

SET  POPDATA.USPOP;

*...other lines of code...;

RUN; 

*recall, modify and save new version permanently;

LIBNAME   POPDATA  'c:\sasdata\usstate\';

DATA POPDATA.USPOPV2;

SET POPDATA.USPOP;

*...other lines of code...;

RUN;  

The new version (here called USPOPV2) could instead reuse the same name (USPOP), overwriting the original. This is dangerous, but may be necessary if disk space is short.

If the POPDATA library has been defined in SAS Explorer, the LIBNAME statement may be omitted.  

For SAS V7, the folder 'c:\sasdata\usstate\' contains a file called USPOP.SAS7BDAT


Third Example SAS Program:  sasex2A.sas (1 of 3) 

* read in some data from an O/S file and create a temporary SAS data set called USPOP;

*illustrates use of:

   --fixed field format

   --multiple lines of input per observation; 
 

DATA USPOP;

INFILE   'p:\appsdata\briggs\poec5317\state2.dat';

INPUT        STATE $ 1-17 FIPS 18-20  ST $ 21-23 DIV $ 24-27 CB 28-30 POP80 31-36 POP70 37-42 POP60 43-48

  #2  UNEMP80 3.1   AGE2040 5.1    @10 INC80 6.0;

                       /* describes the data in the file*/

RUN; 

*it is  good practice to print some of the data read and check it;

PROC PRINT DATA=USPOP(OBS=5);  /* prints first five observations.*/

TITLE 'Check of Data Read In'; 
 

* read  additional data into a second SAS data set called POP85;

*illustrates use of:

     --free-field format

     --identification of O/S file on the INFILE statement

     --FILENAME statement to assign an 'alias' to a file;

/*equates 'nickname' (in1)  with real  file name*/

FILENAME in1 'p:\appsdata\briggs\poec5317\stpop85.dat';

DATA POP85;

         INFILE  in1; /*uses nickname to identify the  data file */

         INPUT POP85 POP82 FIPS;

         DROP POP82;

RUN; 

/************************************************

With free field format , variables don’t necessarily  occupy the same column position  (field) on each line. 

This can be convenient but also dangerous, since a single error (e.g. a missing space or data value) can result in ALL subsequent data being read incorrectly.

*************************************************/      
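
One way to limit this danger is the MISSOVER option of the INFILE statement, which stops SAS from moving to the next line when a line runs out of values. The sketch below is illustrative only: the data set name pop85_check is hypothetical, the first two data lines come from stpop85.dat, and the third has been deliberately shortened to simulate a missing value.

```sas
data pop85_check;
    infile datalines missover;   /* do not jump to the next line when a value is absent */
    input pop85 pop82 fips;
    datalines;
1165 1139 23
998 948 33
535 520
;
run;

proc print data=pop85_check; run;   /* FIPS is missing in the third observation */
```

With MISSOVER the damage stays within the faulty line; the remaining lines are still read into the correct variables.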


Example SAS Program Set:  sasex2A.sas(2 of 3) 

*We now combine together our two data sets into a single ‘research’  data set; 

PROC SORT DATA=POP85;BY FIPS;

DATA USPOP;   

  MERGE POP85 USPOP;

  BY FIPS;    

RUN; 
 

/********************************************                 

MERGE allows us to combine the observations in one data set with those in another. The data sets are placed 'side-by-side.' Observations are matched based upon common values of a BY variable. Note that:

--BY variable must exist with same name in both data sets.

--Both data sets must be sorted by the BY variable.

*********************************************/

/********************************************

If no BY variable is available, the two data sets will be placed 'side-by-side' with observations matched only by sequential order. Dangerous! If they are not in exactly the same order, or if an observation is missing from one data set, there will be a mismatch from that point onwards and no error or warning message will be issued.

*********************************************/
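
When a BY variable is available, the IN= data set option offers a way to detect observations that appear in only one of the data sets. The following is a minimal sketch with small stand-in data sets; the values are illustrative, the name merged is hypothetical, and FIPS 4 is deliberately absent from POP85.

```sas
data pop85;                        /* illustrative subset; FIPS 4 deliberately left out */
    input pop85 fips;
    lines;
4021 1
522 2
;
run;

data uspop;
    input st $ fips pop80;
    lines;
AL 1 3894
AK 2 402
AZ 4 2718
;
run;

proc sort data=pop85; by fips; run;
proc sort data=uspop; by fips; run;

data merged;
    merge pop85(in=in85) uspop(in=inus);   /* in85 and inus flag each contributing data set */
    by fips;
    if not (in85 and inus) then
        put 'Unmatched FIPS code: ' fips= in85= inus=;   /* report mismatches in the log */
run;
```

The IN= variables are temporary and are not written to the output data set, so the check costs nothing in the saved file.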


Example SAS Program Set:  sasex2A.sas (3 of 3) 

*we now save our final research data set permanently on disk;

*be sure that the subdirectory (folder) specified in the LIBNAME statement exists  on your file system before running program; 

LIBNAME   POPDATA  'c:\sasdata\usstate\';

*specifies the subdirectory to use to save the SAS data sets;

DATA POPDATA.USPOP;    *note two level name;

     SET USPOP;    *identifies source SAS data set;

     KEEP STATE FIPS ST DIV POP85 POP80 POP70 POP60; * selects variables for inclusion;

     LABEL STATE='Name of State'

                  FIPS='Federal Information Processing code'

                     ST='State postal code'

                   DIV='Census Bureau Division'

               POP80='Population in 1980'

               POP70='Population in 1970'

               POP60='Population in 1960';

RUN; 
 
 

/*****************************************

SAS data sets with a 'two level' name (POPDATA.USPOP) are saved permanently after the job runs.

The 'first level'  (POPDATA in this example) is a SAS alias (often referred to as a SAS library) which identifies, via a LIBNAME command, the sub-directory in which the SAS data set will be stored.

The SAS data set is written to that subdirectory with the 'second level' name as the filename,  and an extension of  ".sd2" in SAS/PC  (or '.ssd01' in UNIX)

(that is, in the example, 'uspop.sd2' or 'uspop.ssd01').

‘Single level’ SAS data sets are internally given a first level name by SAS of  WORK. (e.g. WORK.USPOP).  They exist only for the duration of the job. 

A  SET statement is used to identify the source SAS data set.

We are ‘setting’ WORK.USPOP into the data step to be saved as

POPDATA.USPOP,  if you like.

***********************************************/

/**********************************************

The above two steps could have been accomplished together:

LIBNAME   POPDATA  'c:\sasdata\usstate\';

DATA POPDATA.USPOP;   

  MERGE POP85  USPOP;

  BY   FIPS;

*other lines of code beginning with KEEP...;

RUN;

**************************************************/


STPOP2.DAT and STPOP2.DD 

Alabama            1AL ESC  63  3894  3444  3267   410  271 -233  178

75  294  16347

Alaska             2AK PAC  94   402   304   228    60   63   16   36

97  374  28395

Arizona            4AZ MTN  86  2718  1776  1306   243  231  228  712

62  305  19017

Arkansas           5AR WSC  71  2286  1924  1787   208  131  -71  232

69  279  14641

California         6CA PAC  93 23668 19971 15735  2123 1620 2113 2078

65  335  21537

Colorado           8CO MTN  84  2890  2211  1758   238  233  213  446

50  358  21279

Connecticut        9CT NE   16  3108  3033  2536   282  128  214   52

47  306  23149

.

.

.

Wisconsin         55WI ENC  35  4706  4418  3952   461  274    4   14

66  299  20915

Wyoming           56WY MTN  83   470   332   330    42   42  -39   96

41  350  22430 

RECORD FORMAT SHEET:

File: state2.dat (PC)  or  state2.data  (UNIX)

Rec  COLS     Variable                         Type/Format
1    1-17     Name of State                    Alpha
1    18-20    Federal Info. Processing code    num
1    21-23    State postal code                Alpha
1    24-27    Census Bureau Division           Alpha
1    28-30    Census Bureau State code         num
1    31-36    Population in 1980               1000
1    37-42    Population in 1970               1000
1    43-48    Population in 1960               1000
1    50-54    Natural increase 1960-1970       1000
1    55-59    Natural increase 1970-1980       1000
1    60-64    Net migration 1960-1970          1000
1    65-69    Net migration 1970-1980          1000
2    1-3      Percent Unemployed, 1980         .1
2    4-8      Percent Aged 20 thru 39          .1
2    9-15     Median Family Income, 1980       num

Check of Data Read In       11:19 Monday, July 15, 1996  36

  OBS   STATE        FIPS   ST   DIV   CB   POP80   POP70   POP60   UNEMP80   AGE2040   INC80

    1   Alabama        1    AL   ESC   63    3894    3444    3267     7.5       29.4    16347

    2   Alaska         2    AK   PAC   94     402     304     228     9.7       37.4    28395

    3   Arizona        4    AZ   MTN   86    2718    1776    1306     6.2       30.5    19017

    4   Arkansas       5    AR   WSC   71    2286    1924    1787     6.9       27.9    14641

    5   California     6    CA   PAC   93   23668   19971   15735     6.5       33.5    21537 
 
 
 
 

(Shown above: raw data, data documentation, and a printout of the SAS data set.)


STPOP85.DAT and STPOP85.dd 
(data and data documentation files) 

1165 1139 23  1156

998 948 33    977

535 520 50     530

5823 5746 25    5798

967 953 44      962

3175 3128 9      3154

17762 17575 36   17735

7568 7430 34     7515

11864 11883 42    11901

10745 10777 39    10752

5500 5484 18      5498

11538 11481 17    11511

9085 9118 26      9075

4775 4747 55      4766

4190 4133 27      4162

.

.

.

26358 24786 6  25622

522 446 2       500

1051 998 15     1039 

Documentation for file:  stpop85.dat 

Observations:

States of the US 

Variables:

Population, 7/1/1985

Population, 7/1/1982

FIPS code

Population, 7/1/84 

Free field format--variables separated by spaces 



sasex2A.log (1 of 2) 

1    *read in some data from an O/S file

2    & create a temporary SAS data set called USPOP;

3

4    DATA USPOP;

5       INFILE 'p:\appsdata\briggs\poec5317\state2.dat';

6       INPUT STATE $ 1-17 FIPS 18-20  ST $ 21-23 DIV $ 24-27 CB 28-30 POP80 31-36 POP70   37-42

POP60 43-48

7         #2  UNEMP80 3.1   AGE2040 5.1    @10 INC80 6.0;

8                         /* describes the data in the file*/

9    RUN; 

NOTE: The infile 'f:\public\poec5317\state2.dat' is:

      FILENAME=f:\public\poec5317\state2.dat,

      RECFM=V,LRECL=256

NOTE: 102 records were read from the infile 'p:\appsdata\briggs\poec5317\state2.dat'.

      The minimum record length was 80.

      The maximum record length was 80.

NOTE: The data set WORK.USPOP has 51 observations and 11 variables.

NOTE: The DATA statement used 2.08 seconds.

10

11   *it is  good practice to print some of the data read and check it;

12   PROC PRINT DATA=USPOP(OBS=5);

                       /* prints first five observations.*/

13   TITLE 'Check of Data Read In'; 
 
 

15   *read  additional data into a second SAS data set called POP85;

16

17   FILENAME     in1     'p:\appsdata\briggs\poec5317\stpop85.dat';

                             *assigns 'nickname' (in1) to file;

18   DATA POP85;

19       INFILE     in1;

20       INPUT   POP85   POP82   FIPS;

21       DROP POP82;

22   RUN; 

NOTE: The infile IN1 is:

      FILENAME=p:\appsdata\briggs\poec5317\stpop85.dat,

      RECFM=V,LRECL=256

NOTE: 51 records were read from the infile IN1.

      The minimum record length was 17.

      The maximum record length was 30.

NOTE: The data set WORK.POP85 has 51 observations and 2 variables.

NOTE: The DATA statement used 0.77 seconds.

23

24


sasex2A.log (2 of 2) 

*Now combine together our two data sets into a single ‘research’  data set;

25

26   *First, add the new data in the POP85 data set to the original USPOP;

27

28   PROC SORT DATA=POP85;   BY FIPS; 

NOTE: The data set WORK.POP85 has 51 observations and 2 variables.

NOTE: The PROCEDURE SORT used 0.5 seconds. 
 

29   DATA USPOP;

30        MERGE POP85 USPOP;

31             BY FIPS;

32   RUN; 

NOTE: The data set WORK.USPOP has 51 observations and 12 variables.

NOTE: The DATA statement used 0.7 seconds. 
 

34   *now save the sas data set permanently;

35

36   LIBNAME   POPDATA  'c:\sasdata\usstate\';

NOTE: Libref POPDATA was successfully assigned as follows:

      Engine:        V611

      Physical Name: C:\SASDATA\USSTATE

37   *specifies the subdirectory to use to save the SAS data sets;

38   DATA POPDATA.USPOP;    *note two level name;

39        SET USPOP;    *identifies source SAS data set;

40        KEEP STATE FIPS ST DIV POP85  POP80 POP70 POP60; * selects variables for

inclusion;

41        LABEL        STATE='Name of State'

42                      FIPS='Federal Information Processing code'

43                        ST='State postal code'

44                       DIV='Census Bureau Division'

45                     POP80='Population in 1980'

46                     POP70='Population in 1970'

47                     POP60='Population in 1960';

48   RUN; 

NOTE: The data set POPDATA.USPOP has 51 observations and 8 variables.

NOTE: The DATA statement used 0.77 seconds. 
 

 


SAS Action Overview: 
Sources of  Info. (‘data’) for  a SAS Data Set 

The source of information for creating a SAS data set may be:

  • data typed into the program file  
    (‘in-stream’ data)
    • use LINES & INPUT
  • an Operating System (OS) raw data (flat) file on disk
    • use INFILE & INPUT statements
    • optionally FILENAME also
  • a previous SAS data set created in the program
    • use SET or MERGE
  • a SAS data set saved on disk
    • use two-level name with SET or MERGE
    • LIBNAME to identify subdirectory     
 

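In-stream data:

Data     newsas;
input   var1    var2;
lines;
raw data here

Raw data file on disk:

Data    newsas;
infile    'c:\rawdata';
input   var1    var2;

Previous SAS data set created in the program (more on SET and MERGE in the next section):

Data   newsas;
set   oldsas;

Data    newsas;
merge  oldsasA   oldsasB;

SAS data set saved on disk:

LIBNAME   saslib   'c:\sasdata\';
Data   newsas;
set   saslib.oldsas;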


SAS Action Overview: 
Transforming SAS data sets
 

New SAS data set(s) can be created from existing SAS data set(s) by:

  • transferring and reshaping using SET
    • changing variables with KEEP, DROP and assignment statements (# obs unchanged)
    • changing observations with IF, DELETE, OUTPUT (# obs reduced)
  • merging with MERGE or UPDATE  
    ('side by side') (# obs unchanged)
    • with BY statement (matched)
    • without BY statement (unmatched)
  • concatenating 2 or more with SET  
    ('one after the other') (# obs increased)
  • sorting observations with PROC SORT  
    (# obs unchanged)
  • integrating/combining observations with PROC SUMMARY (# obs reduced)

Note: more than one data set can be created within one data step

      data sets can also be created by PROC steps. 
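
Several of these transformations can be sketched together. The data set names (states, big, small, both, divpop) are hypothetical; the four data lines are taken from the state listing.

```sas
data states;
    input st $ div $ pop80;
    lines;
TX WSC 14229
AR WSC 2286
CA PAC 23668
AK PAC 402
;
run;

data big;                          /* changing observations: subsetting IF keeps fewer rows */
    set states;
    if pop80 > 2000;
run;

data small;
    set states;
    if pop80 <= 2000;
run;

data both;                         /* concatenating: one data set after the other */
    set big small;
run;

proc summary data=states nway;     /* combining observations: one row per division */
    class div;
    var pop80;
    output out=divpop sum=pop80;   /* a PROC step creating a SAS data set */
run;

proc print data=divpop; run;
```

The data set divpop also carries the automatic variables _TYPE_ and _FREQ_, the latter giving the number of states combined into each division.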

Illustrations:

Reshaping

Data version2;
  SET version1;

Merge (e.g. 1980 and 1990 data for the same states, placed side by side)

Data   p8090;
MERGE  p80   p90;

Concatenate (e.g. the counties of TX followed by the counties of LA)

Data Two;
SET  TX   LA;

Summary (e.g. states such as TX, LA, OK and NM combined into regions such as NE, MW and S)


SAS Action Overview: 
Outputting Information 

To an O/S data file:

‘Flat files’ can be created from SAS data sets by using:

    • FILE ('reverse' of INFILE) to identify the file
      • optionally  FILENAME also
    • PUT ('reverse' of INPUT) to create the format of the file
 

For printing:

  • Many SAS procedures produce 'printout':
    that is, they write information to the SAS Output window (or the file .lst in UNIX) in a format suitable for printing
  • Use standard print commands to get a paper copy (e.g. File /Print pull down menu on PC)
  • PROC PRINT is no different from other PROCS as regards printing
    • better name might be  
      PROC DATASHOW
 

Data    _null_;

set     oldsas;

file    'c:\rawdata';

if _n_=1 then put 'var1,var2,var3';

put  var1 ',' var2 ',' var3;

*creates file for export to spreadsheet;
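
As an alternative sketch, the DSD option on the FILE statement asks SAS to write the delimiters itself, so the commas need not be typed between variables in the PUT statement. The input data set below and the output path 'c:\rawdata.csv' are assumptions made for illustration.

```sas
data oldsas;                       /* hypothetical input data set */
    input var1 var2 var3;
    lines;
1 2 3
4 5 6
;
run;

data _null_;                       /* _null_: write a file without creating a data set */
    set oldsas;
    file 'c:\rawdata.csv' dsd;     /* DSD: comma-delimited output; the path is an assumption */
    if _n_ = 1 then put 'var1,var2,var3';   /* header line, written once */
    put var1 var2 var3;
run;
```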


Steps in debugging a program 

  ASSUME THAT THE PROGRAM HAS NOT WORKED CORRECTLY!

  • correct all syntax problems (ERROR messages)
  • establish reason for all WARNING messages
  • check that every data set has the correct  number of obs. and vars
  • check that values on all variables are reasonable
    • check against hard copy/data read in
    • e.g. no families with 500 children
  • run test data  and hand-check calculations for at least three observations
    • first, last, one-in-between
    • be sure sample data includes missing values!
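
A few standard procedures cover most of these checks. This is a minimal sketch: the data set name test and the third data line (state ZZ, with a missing value) are hypothetical test data, while the first two lines come from the state listing.

```sas
data test;                                    /* test data including a missing value */
    input st $ pop70 pop80;
    growth = (pop80 - pop70) / pop70 * 100;
    lines;
AL 3444 3894
AK 304 402
ZZ . 500
;
run;

proc contents data=test; run;                 /* number of observations and variables */
proc means data=test n nmiss min max; run;    /* value ranges and missing-value counts */
proc print data=test(obs=5); run;             /* inspect the first few observations */
```

The log should also be read after each run: here it notes that missing values were generated for growth, which is the expected result for the ZZ line.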