
Introduction to Array Processing

Arrays can simplify many programming tasks, especially when working with longitudinal data.

  • Use of arrays can simplify the coding of SAS programs.
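  • An array is a group of variables that can be referred to by a single name.
  • Arrays are only used to manipulate data within one observation at a time.
  • Arrays are defined only for the duration of the DATA step; the array name is not stored in the data set.
  • Arrays can be character or numeric, but all variables in one array must be of the same type.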
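As a minimal sketch of these points, the DATA step below groups three numeric variables under one array name and recodes a missing-value code of 999 to a SAS missing value. The data set WORK.SURVEY, the variables ID and Q1-Q3, the array name ITEMS, and the 999 code are all invented for illustration. Each pass through the loop touches only the current observation, and ITEMS itself does not appear in the output data set.

```sas
/* Hypothetical input: three survey items where 999 means "no answer" */
data work.survey;
   input id q1 q2 q3;
   datalines;
1 4 999 2
2 999 3 5
;
run;

data work.recoded;
   set work.survey;
   /* ITEMS is only another name for Q1, Q2 and Q3 during this step */
   array items{3} q1 q2 q3;
   do i = 1 to 3;
      if items{i} = 999 then items{i} = .;
   end;
   drop i;   /* keep the loop index out of the output data set */
run;
```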

Syntax of the array statement:

ARRAY array-name {subscript} <$> <length> <array-elements> <(initial values)>;

The array name can be any valid SAS name. However, it cannot be:
  • The name of the SAS data set.
  • The name of a variable used in the same DATA step.
It is also not advisable to use the name of a SAS function as an array name, because references to that name are then treated as array references rather than function calls.

    The subscript follows the name and is enclosed in parentheses ( ), square brackets [ ], or curly braces { }. It specifies the number of elements (variables) in the array.

  • The subscript can be a number. TEST{3} is an array containing 3 variables.

  • The subscript can be a range of numbers. A colon in the subscript separates the lower and upper bounds. TEST{93:95} is an array containing 3 variables, referenced as TEST{93}, TEST{94}, and TEST{95}.

  • If the number of variables is unknown or can change, an asterisk '*' can be used as the subscript. SAS then counts the variables in the list, and the DIM function can be used to obtain that count when processing the array.

  • The $ is optional. It indicates that the array is composed of character variables. If the $ is not present, new variables are assumed to be numeric.
  • The length is optional. It can be used to give a common length to character variables in the array that have not already been assigned a length.
  • The array elements are the variables that compose the array. They can be existing variables or new variables that are being created. Some examples:

    ARRAY CTYNAME{3} $ QH6B QH10B QH12B;



    ARRAY TEST{3} EXAM1 EXAM2 EXAM3;

    ARRAY SCORES{94:98} EXAM94 EXAM95 EXAM96 EXAM97 EXAM98;

    ARRAY SCORES{94:98} EXAM94-EXAM98;

    ARRAY ALLTEST{*} EXAM1-EXAM3 EXAM94-EXAM98;
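The sketch below checks how these subscript forms behave, using the DIM, LBOUND and HBOUND functions, which return the number of elements and the lower and upper bounds of an array. The exam scores and the result variable names (N_TEST, LO_SCORE, and so on) are made up for illustration.

```sas
data work.bounds;
   /* Made-up exam scores */
   exam1 = 80;  exam2 = 85;  exam3 = 90;
   exam94 = 70; exam95 = 75; exam96 = 78; exam97 = 88; exam98 = 92;

   array test{3} exam1-exam3;                   /* subscripts 1 to 3   */
   array scores{94:98} exam94-exam98;           /* subscripts 94 to 98 */
   array alltest{*} exam1-exam3 exam94-exam98;  /* SAS counts 8 names  */

   n_test   = dim(test);       /* 3  */
   lo_score = lbound(scores);  /* 94 */
   hi_score = hbound(scores);  /* 98 */
   n_all    = dim(alltest);    /* 8  */
   score96  = scores{96};      /* same variable as EXAM96, so 78 */
run;
```

A variable can belong to more than one array, which is why EXAM1-EXAM3 can appear in both TEST and ALLTEST.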

    Array elements can be initialized by putting the values in parentheses following the variable list.

    ARRAY LOG_INC{3} INCOME1 INCOME2 INCOME3 (0 0 0);
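One side effect worth knowing: when an initial-value list is given, the array elements are retained from one iteration of the DATA step to the next instead of being reset to missing. The sketch below uses that to count how often each visit type 1-3 occurs. The data set WORK.VISITS, the variable TYPE, and the names TALLY and T1-T3 are hypothetical.

```sas
/* Made-up input: one visit type (1, 2 or 3) per observation */
data work.visits;
   input type @@;
   datalines;
1 3 1 2 1
;
run;

data work.tally;
   set work.visits end=last;
   /* T1-T3 start at 0 and, because of the initial values, are retained */
   array tally{3} t1-t3 (0 0 0);
   tally{type} = tally{type} + 1;
   if last then output;   /* write only the final counts */
   keep t1-t3;
run;
```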

    Using arrays

    An array reference can be used anywhere that you write a SAS expression. This includes:

    • Assignment statements
    • IF
    • INPUT
    • PUT
    • SELECT
    • DO UNTIL (expression) and DO WHILE (expression)
    • SUM statement
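A short sketch of array references in several of these statements (IF, SELECT, and DO WHILE). The status codes, their meanings, and all variable names are hypothetical.

```sas
data work.flags;
   input s1-s4;                        /* hypothetical monthly status codes */
   array status{4} s1-s4;
   array desc{4} $ 8 desc1-desc4;      /* new character variables, length 8 */

   n_unknown = 0;
   do m = 1 to 4;
      /* array reference in an IF statement */
      if status{m} = 9 then n_unknown = n_unknown + 1;
      /* array reference in a SELECT statement */
      select (status{m});
         when (1) desc{m} = 'ACTIVE';
         when (2) desc{m} = 'PAUSED';
         otherwise desc{m} = 'OTHER';
      end;
   end;

   /* array reference in DO WHILE: position of the first non-active month */
   m = 1;
   do while (status{m} = 1);
      m = m + 1;
      if m > 4 then leave;   /* stop before STATUS{5}, which does not exist */
   end;
   first_inactive = m;       /* 5 means every month was active */
   drop m;
   datalines;
1 1 2 9
;
run;
```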


    The most common use of arrays is in iterative DO statements of the form:

    DO indexvariable = start TO stop [BY increment];

    SAS statements

    END;



    Examples: Suppose we have a data set in which every record contains information for a family for the years 1970-1995. Among the variables are head of household income, spouse income, average hours worked per week for both head and spouse, and number of weeks worked in the year. To calculate an average hourly wage for the head and spouse, the following code would work:

    HHHRWG70 = HHINC70 / (HHWKS70 * HHHRS70);

    SPHRWG70 = SPINC70 / (SPWKS70 * SPHRS70);

    HHHRWG71 = HHINC71 / (HHWKS71 * HHHRS71);

    ...

    SPHRWG95 = SPINC95 / (SPWKS95 * SPHRS95);

    However, the same task can be done with much less code by using arrays:

    ARRAY HHHRWG{70:95} HHHRWG70-HHHRWG95;

    ARRAY HHINC{70:95} HHINC70-HHINC95;

    ARRAY HHWKS{70:95} HHWKS70-HHWKS95;

    ARRAY HHHRS{70:95} HHHRS70-HHHRS95;



    ARRAY SPHRWG{70:95} SPHRWG70-SPHRWG95;

    ARRAY SPINC{70:95} SPINC70-SPINC95;

    ARRAY SPWKS{70:95} SPWKS70-SPWKS95;

    ARRAY SPHRS{70:95} SPHRS70-SPHRS95;

    DO I = 70 to 95;

    HHHRWG{I} = HHINC{I} / (HHWKS{I} * HHHRS{I});

    SPHRWG{I} = SPINC{I} / (SPWKS{I} * SPHRS{I});

    END;
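Because the example above depends on a data set that is not shown, here is a self-contained sketch that shortens the range to 1970-1972 and invents one family's values. Only the variable naming pattern comes from the example; the data set names WORK.FAMILY and WORK.WAGES and all numbers are made up. DROP I keeps the loop index out of the output data set.

```sas
data work.family;
   /* One made-up family, years 70-72 only */
   hhinc70 = 20800; hhwks70 = 52; hhhrs70 = 40;
   spinc70 = 10000; spwks70 = 50; sphrs70 = 20;
   hhinc71 = 26208; hhwks71 = 52; hhhrs71 = 42;
   spinc71 = 9000;  spwks71 = 45; sphrs71 = 20;
   hhinc72 = 23400; hhwks72 = 50; hhhrs72 = 39;
   spinc72 = 13200; spwks72 = 50; sphrs72 = 24;
run;

data work.wages;
   set work.family;
   array hhhrwg{70:72} hhhrwg70-hhhrwg72;   /* new variables */
   array hhinc{70:72}  hhinc70-hhinc72;
   array hhwks{70:72}  hhwks70-hhwks72;
   array hhhrs{70:72}  hhhrs70-hhhrs72;
   array sphrwg{70:72} sphrwg70-sphrwg72;   /* new variables */
   array spinc{70:72}  spinc70-spinc72;
   array spwks{70:72}  spwks70-spwks72;
   array sphrs{70:72}  sphrs70-sphrs72;

   do i = 70 to 72;
      hhhrwg{i} = hhinc{i} / (hhwks{i} * hhhrs{i});
      sphrwg{i} = spinc{i} / (spwks{i} * sphrs{i});
   end;
   drop i;
run;
```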


    The BY clause of the DO statement can be used to process only particular years. For example, to do the preceding calculation only for presidential election years, the code could be altered as follows:


    DO I = 72 to 92 by 4;

    HHHRWG{I} = HHINC{I} / (HHWKS{I} * HHHRS{I});

    SPHRWG{I} = SPINC{I} / (SPWKS{I} * SPHRS{I});

    END;
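To see exactly which subscripts a BY clause visits, the loop can collect its index values. The sketch below (data set and variable names are hypothetical) also shows a detail that is easy to miss: after the loop ends normally, the index holds the first value past the stop value, 96 here, not 92.

```sas
data work.by_demo;
   length visited $ 40;
   do i = 72 to 92 by 4;
      visited = catx(' ', visited, i);   /* builds "72 76 80 84 88 92" */
   end;
   last_i = i;   /* 96: the value that ended the loop */
   put visited= last_i=;
run;
```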


    An array can be processed in descending order:


    DO I = 95 to 70 by -1;

    ...



    END;
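A descending loop is handy for longitudinal data when you want the most recent non-missing value. The sketch below scans a short, made-up income series backwards and stops at the first non-missing year it finds; the names INC70-INC74, LAST_YEAR and LAST_INC are hypothetical.

```sas
data work.latest;
   array inc{70:74} inc70-inc74;
   inc70 = 18000; inc71 = 19500; inc72 = 21000;   /* INC73, INC74 stay missing */

   last_year = .;
   do i = 74 to 70 by -1 until (last_year ne .);
      if inc{i} ne . then last_year = i;
   end;
   /* guard against a series that is entirely missing */
   if last_year ne . then last_inc = inc{last_year};
   drop i;
run;
```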


    When defining arrays, several groups of variables can be combined in one array, as long as the array consists only of character variables or only of numeric variables.


    ARRAY TOTINC{*} HHINC70-HHINC95 SPINC70-SPINC95;


    If the '*' subscript is used, the DIM function is especially useful. DIM takes an array name as its argument and returns the number of elements in the array. The DO statement can then be written as:


    DO I = 1 TO DIM(TOTINC);
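A compact sketch of a DIM-driven loop over an array declared with '*', using three made-up years per person instead of 26. With a '*' subscript the elements are numbered from 1, whatever the variable names are. The variable N_POSITIVE is hypothetical.

```sas
data work.dim_demo;
   hhinc70 = 100; hhinc71 = 0;  hhinc72 = 250;   /* made-up values */
   spinc70 = 0;   spinc71 = 80; spinc72 = 90;
   array totinc{*} hhinc70-hhinc72 spinc70-spinc72;   /* 6 elements */

   n_positive = 0;
   do i = 1 to dim(totinc);
      if totinc{i} > 0 then n_positive = n_positive + 1;
   end;
   drop i;
run;
```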


    Array elements can be used in assignment statements or functions just like any other variable.

    HHBEGINC = HHINC{70};

    TOTALINC = SUM(OF TOTINC{*});


    They can also be used in IF statements:

    IF HHINC{I} EQ . THEN HHINC{I} = 0;
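Putting these statements together, the sketch below uses an element in an assignment, recodes missing incomes to 0 with IF, and totals the whole array with SUM(OF ...). The values are made up. The SUM function already ignores missing arguments, so the recode matters mainly for later calculations on individual elements, where a missing value would otherwise propagate.

```sas
data work.sum_demo;
   hhinc70 = 100; hhinc71 = .; hhinc72 = 250;   /* made-up values */
   array hhinc{70:72} hhinc70-hhinc72;

   hhbeginc = hhinc{70};                    /* element in an assignment */
   do i = 70 to 72;
      if hhinc{i} eq . then hhinc{i} = 0;   /* element in an IF */
   end;
   totalinc = sum(of hhinc{*});             /* whole array in a function */
   drop i;
run;
```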


    Potential Errors and Problems


    When defining an array, if the number of elements in the subscript does not match the number of variables listed, an error occurs and the DATA step stops. This does not happen if the '*' subscript is used. When more variables are listed than the subscript allows, the error message reads as follows:

    ERROR: Too many variables defined for the dimension(s) specified for the array XYZ.


    If, during processing in a DO loop, one element of an array is compared with or used alongside another element of the same array, take care that the subscript does not fall outside the bounds of the array. For example, if the DO statement is 'DO K=1 TO 10;', there is an array X{10}, and a statement in the DO loop is of the form:


    IF X{k} lt X{k+1} THEN expression;

    then on the last pass (K=10) the statement refers to X{11}, which does not exist. The DATA step stops, and an error is written to the log:

            ERROR: Array subscript out of range at line 37 column 21
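One way to avoid this error is to stop the loop one element early, so that K+1 never exceeds the upper bound. The sketch below counts how often a made-up series of ten values rises from one element to the next; the variable N_RISES is hypothetical.

```sas
data work.rises;
   array x{10} x1-x10 (3 5 4 6 7 7 8 2 9 10);   /* made-up values */
   n_rises = 0;
   /* Stop at DIM(X) - 1 so that X{K+1} is at most X{10} */
   do k = 1 to dim(x) - 1;
      if x{k} lt x{k+1} then n_rises = n_rises + 1;
   end;
   drop k;
run;
```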

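    Care should be taken when using the '-' and '--' range operators. The single dash (B70-B80) refers to every variable in the numbered sequence and creates any that do not already exist; the double dash (A70--A80) refers to every variable positioned between the two named variables, whatever its name. If the variables are not numbered consecutively, extra variables can be created or unintended variables can be included. In the following DATA step, the Axx and Bxx variables are numbered in increments of 5, and the A's and B's alternate. Using the range operators produces unexpected results, as the output shows: array A picks up the B variables that lie between A70 and A80, and array B creates eight new variables (B71-B74 and B76-B79) with missing values.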

    data test; 
    A70 = 5;
    B70=10;
    A75 = 4;
    B75=20;
    A80 = 6;
    B80=50;

    array a{*} a70--a80;
    array b{*} b70-b80;

    size_a = dim(a);
    size_b = dim(b);
    run;

    proc print;
    run;
    OBS  A70  B70  A75  B75  A80  B80  B71  B72  B73  B74  B76  B77  B78  B79  SIZE_A  SIZE_B
      1    5   10    4   20    6   50    .    .    .    .    .    .    .    .       5      11
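When the numbered variables are not consecutive, listing them explicitly avoids both problems. This sketch rebuilds the example above with explicit lists (the data set name WORK.TEST2 is hypothetical); each array then has exactly the three intended elements, and no extra B variables are created.

```sas
data work.test2;
   a70 = 5; b70 = 10;
   a75 = 4; b75 = 20;
   a80 = 6; b80 = 50;

   /* Explicit lists: only the variables named are included */
   array a{*} a70 a75 a80;
   array b{*} b70 b75 b80;

   size_a = dim(a);   /* 3 */
   size_b = dim(b);   /* 3 */
run;

proc print data=work.test2;
run;
```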

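    If, after using arrays in a DATA step, additional variables with missing values are found in the data set, check the ARRAY statements for single-dash range lists, which create every variable in the numbered sequence that does not already exist.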


    Copyright ©2009, The Pennsylvania State University | Privacy and Legal Statements
    Last modified Aug 13, 2008