
SAS-Callable Sudaan

SUDAAN is a single program consisting of a family of procedures used to analyze data from complex sample surveys and other observational and experimental studies involving repeated measures and cluster-correlated data. PRI supplies a version of SUDAAN that runs as an add-on to SAS.

SUDAAN

WHAT IS SUDAAN, AND WHEN SHOULD I USE IT?

SUDAAN is a single program consisting of a family of procedures used to analyze data from complex sample surveys and other observational and experimental studies involving repeated measures and cluster-correlated data. A complex sample may be multistage, stratified, unequally weighted, or clustered. SUDAAN's strength lies in its ability to compute standard errors of ratio estimates, means, totals, regression coefficients, and other statistics in accordance with the sample design. Many, if not most, data sets require attention to correlation and weighting. Unfortunately, few statistical software packages let the user specify how data are correlated and weighted. While some offer limited capability, SUDAAN remains the only broadly applicable software for analysis of correlated and weighted data.

SUDAAN is unique among software packages because it enables you to use survey and other types of clustered data to obtain estimates using the proper design parameters, and to compute appropriate standard errors of these estimates. Thus, SUDAAN can greatly increase the accuracy and validity of your results. Most commonly used statistical packages do not account for complex sample designs when computing variance estimates and test statistics. Standard computer programs, such as SAS and SPSS, presume a simple random sample, yet in a stratified design the variances within the strata are more homogeneous than those between the strata. Ignoring this can lead to biased significance tests. SUDAAN permits adjustment for single as well as multistage stratification and clustering.

For example, when large surveys are conducted there is often clustering of the population into areas. Random sampling first selects the clusters. Then there is random sampling within the clusters. The elements within a cluster are usually more correlated than those between the clusters. As the cluster size and intracluster correlation increase, cluster variances increase more than one would find in a simple random sample. In short, these effects lead to loss of precision and reduction of effective sample size. Unless these variance differences are taken into account, the significance tests performed by standard statistical packages will have a tendency to yield false positive results.
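The loss of precision described above is often summarized with Kish's textbook approximation for the design effect of a clustered sample, deff = 1 + (m - 1) * rho, where m is the cluster size and rho is the intracluster correlation. The Python sketch below is only an illustration of that approximation; the function names and the survey figures are hypothetical, and it is not how SUDAAN computes its own design effects.

```python
def kish_design_effect(cluster_size, icc):
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n, cluster_size, icc):
    """Size of a simple random sample with about the same precision as n clustered cases."""
    return n / kish_design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical survey: 2,000 respondents in clusters of 20, rho = 0.05.
    deff = kish_design_effect(20, 0.05)
    n_eff = effective_sample_size(2000, 20, 0.05)
    print(f"design effect: {deff:.2f}")
    print(f"effective n:   {n_eff:.0f}")
```

Even a small intracluster correlation roughly halves the effective sample size in this hypothetical case, which is why tests that assume simple random sampling tend toward false positives.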

The SAS-callable version of SUDAAN is installed as an add-on to SAS on the Windows 95/NT or Sun/Solaris computer system. Thus, you execute SUDAAN procedures within your SAS programs just like any SAS procedures. SUDAAN can use any data file that your SAS system can use.

SUDAAN is specifically designed for analysis of cluster-correlated data from studies involving recurrent events, longitudinal data, repeated measures, multivariate outcomes, multistage sample designs, stratified designs, unequally weighted data, and without-replacement samples. SUDAAN fits marginal or population-averaged models using generalized estimating equations (GEE). Robust variance estimates are computed which fully account for intracluster correlation, unequal weighting, stratification, and without-replacement sampling. For example, individual responses might represent:

  • longitudinal responses or recurrent events observed on an individual
  • multiple sites (e.g. teeth or eyes) studied per patient
  • group or community randomized trials
  • observations on related family members
  • observations on littermates in toxicology experiments
  • multiple subjects within a cluster such as a physician clinic or school classroom

SUDAAN is the only software package to offer all three popular robust variance estimation methods:

  1. Taylor series linearization (GEE for regression models)
  2. Jackknife
  3. Balanced repeated replication (BRR)

Three broad classes of designs may be specified:

  1. With-replacement sampling of the first-stage sample (equal or unequal probabilities of selection)
  2. Equal probability with- or without-replacement sampling at all stages.
  3. Unequal probability without-replacement sampling at the first stage and:
    1. Equal probability without-replacement sampling at subsequent stages or
    2. With-replacement sampling of subsequent stages (equal or unequal probabilities of selection).

All three options allow stratification. The design options may be combined in one study if different sampling methods were used for parts of the population.

GETTING STARTED

SUDAAN was designed to focus solely on statistical computations leading to parameter estimates and their appropriate standard errors, starting with input data that were already in a standard form for SUDAAN. The standard form is:

Sorted Input Records:
For all sample designs except simple random sampling (DESIGN=SRS) and balanced repeated replication (DESIGN=BRR), your input data must be sorted by the variables you list on your NEST statement. SUDAAN does not include any kind of sort procedure or facility. You must sort your input data file before running SUDAAN.
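In a SAS job you would normally meet this requirement with PROC SORT and a BY statement listing the NEST variables. Purely to illustrate what "sorted by the NEST variables" means, the Python sketch below checks and restores that ordering on a list of records; the variable names stratum and psu and the data are hypothetical.

```python
def is_sorted_by(records, nest_vars):
    """Return True if records are in non-decreasing order of the NEST variables."""
    keys = [tuple(rec[v] for v in nest_vars) for rec in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))


def sort_by(records, nest_vars):
    """Return a new list sorted by the NEST variables, leftmost variable first."""
    return sorted(records, key=lambda rec: tuple(rec[v] for v in nest_vars))


if __name__ == "__main__":
    # Hypothetical records with a stratum and a PSU identifier.
    data = [
        {"stratum": 2, "psu": 1, "y": 5.0},
        {"stratum": 1, "psu": 2, "y": 3.0},
        {"stratum": 1, "psu": 1, "y": 4.0},
    ]
    nest = ["stratum", "psu"]
    print("sorted before:", is_sorted_by(data, nest))
    print("sorted after: ", is_sorted_by(sort_by(data, nest), nest))
```

The leftmost NEST variable is the major sort key, so all records for a stratum are contiguous and, within a stratum, all records for a PSU are contiguous.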

Numeric Input Variables:
All input data variables that you name on any of your SUDAAN statements must be numeric variables. Presently, SUDAAN cannot process character variables. Note that it is not sufficient that a character variable have only numerals as values. The variable must actually be of numeric type.

Record Structure:
Each sample observation must be represented by a single record, and each variable must appear on every record. (For observations for which a variable is not relevant, a variable can be blank or have a missing value on that record.) A given variable must occupy the same record field on every observation's record. If your data do not meet these structural requirements, you must reformat your records before running SUDAAN.

Types of SUDAAN statements are:

  • PROC statements, which tell SUDAAN to run a particular kind of procedure, such as RATIO or DESCRIPT
  • sample design statements, which tell SUDAAN how to compute standard errors
  • computational statements, which tell SUDAAN what to compute
  • output statements, which tell SUDAAN how to display results in printed tables, and how to save the results for further processing.

Using SAS-callable SUDAAN: Guidelines

SAS-callable SUDAAN will automatically look for any SAS formats named in your SAS data set. You can also use the RFORMAT statement to associate a SAS format with a SAS variable. SUDAAN will recognize the following RFORMAT statement:

  > rformat variable(s) format_name;

SUDAAN will look for a format file in the format library under the specified format_name. If it finds that format_name, it uses its text for the labels of the one or more listed variables in PRINT tables and OUTPUT file documentation. SUDAAN can handle only this simple RFORMAT statement. A single RFORMAT statement can list one or several variables, but can name only a single format. For multiple formats, you must include multiple RFORMAT statements in your SUDAAN procedure statements, one for each format.

When using SAS-callable SUDAAN (instead of the stand-alone version), note these differences in syntax:

Use this . . .     Instead of this . . .
RLOGIST            LOGISTIC
RPRINTAB           PRINTAB
RTITLE             TITLE
RFOOTNOTE          FOOTNOTE
RFORMAT            FORMAT

About SAS-callable SUDAAN:

  • SUDAAN statements are echoed in the SAS log file.
  • SUDAAN output is sent to the SAS listing file.
  • All SUDAAN procedures except HYTEST and PRINTAB can accept either SAS or SUDAAN data sets as input.
    • SAS-callable SUDAAN will read SAS input data sets (Version 6, 7 & 8) and SUDAAN data sets. However, you must run SUDAAN in SAS 8.
    • For SAS input, use the DATA=libname.memname syntax on the PROC statement, or SUDAAN will open a default data set if there is one. Since SAS is the default type, you do not have to use the FILETYPE=SAS option on the PROC statement, although it is permitted for clarity.

The SUDAAN OUTPUT statement can create either SAS or SUDAAN output data sets. Use the FILETYPE option on the OUTPUT statement to indicate the type of file you wish to create.

Example: Sample Job Revised to Run SUDAAN under SAS:

> libname sastmp '/sastmp';
*
*
The usual SAS code and procedures, including PROC FORMAT
for value labels
*
*
> proc crosstab data=sastmp.example design=wor deft2;
> nest stratum psu;
> totcnt tpgms _minus1_;
> weight mywgt;
> subgroup patyr patmon modality;
> levels 7 7 4;
> tables (patyr patmon) * modality;
> setenv pagesize=60 decwidth=5;
> print nsum wsum sewgt colper secol deffcol;
> output colper secol / filename=sastmp.table filetype=SAS
tablecell=default replace;
> rtitle "crosstab example";
> rformat patyr patmon.;
> rformat patmon patmon.;
> run;

SAS-callable SUDAAN examples on Popnet:

A listing of SAS files is available, each of which contains an example for a specific SUDAAN procedure (e.g. CROSSTAB, RLOGIST). They are located on Popnet at /home/web/pri/help/cacpri/sudaan (the file examples.doc in this directory provides a brief explanation of each file).

The #INCLUDE Facility:

Use this facility to insert the contents of another text file anywhere into your program file.

The syntax is:

	> #include "file_name" ;

When you execute your program file, all text in any file named by an #include directive will appear in the SUDAAN output stream following the #include reference.

  • The quotation marks around the file's name are required.
  • SUDAAN ignores any text following a #include reference that is on the same line as the reference.

Understanding Output: General Information

PRINT Statement Output: SUDAAN's PRINT statement produces tables suitable for inclusion in reports and presentations. Other SUDAAN capabilities you can use to tailor the PRINT tables to your needs include the

  • optional TITLE and FOOTNOTE statements
  • optional statistic format control
  • optional facility for replacing default statistic labels with text labels of your choice
  • optional SETENV statement for controlling many PRINT parameters.

When you do not include any PRINT statements, SUDAAN generates a default PRINT statement, which sends the statement's default tables to your default output stream. The PRINT statement also allows you to write its tables to a separate file. To do this, specify FILENAME=filename (where you supply the filename) on the PRINT statement. Everything produced by that PRINT statement then goes to that file.

With SUDAAN, you have the option to save the estimates and statistics that SUDAAN produces in alternative formats. This feature allows you to read the estimates and statistics back into SUDAAN or into other programs for further analysis, graphing, or printing.

You can save the results EITHER in a text (ASCII) format OR in a special SUDAAN binary format.

With SAS-callable SUDAAN you can also save results in a SAS dataset.

3. SAMPLE DESIGN STATEMENTS AND OPTIONS

Overview

You must use the sample design statements in SUDAAN to obtain the appropriate point and standard error estimates for statistics estimated under a particular sample design. Every SUDAAN procedure uses the same set of design statements, plus an optional DESIGN parameter on the PROC (procedure) statement, to describe the input data file. Different design statements are required for each design type.

The 3 basic types of sample design statements accommodated by SUDAAN are:

  1. With-replacement sampling of the first-stage sample (equal or unequal probabilities of selection)
  2. Equal probability with- or without-replacement sampling at all stages.
  3. Unequal probability without-replacement sampling at the first stage and:
    1. Equal probability without-replacement sampling at subsequent stages or
    2. With-replacement sampling of subsequent stages (equal or unequal probabilities of selection).

You can also specify simple random sampling from an infinite population. Under simple random sampling, variances and test statistics are computed according to standard statistical assumptions, ignoring the sample design (weighting, clustering, stratification). Therefore, when you designate your data as a simple random sample, statistics computed with SUDAAN are either the same or are asymptotically equivalent to statistics computed using SAS, SPSS, or any other statistical software package.


Overview of Sample Design Statements

                     WITH            WITHOUT         REPLICATION
                     REPLACEMENT     REPLACEMENT     METHOD

PROC ... DESIGN =    WR              WOR             JACKKNIFE
                                     UNEQWOR         BRR
                     STRWR           STRWOR
                     SRS

   NEST variable(s) / PSULEV= STRLEV= FRL=[MISSUNIT] [NOSORTCT];
** REPWGT variables / ADJFAY=value;
** IDVAR variable(s);
   WEIGHT variable(s);
*  TOTCNT variable(s);
*  SAMCNT variable(s);
*  JOINTPROB variable(s);

*  Without-replacement designs only
** BRR design only

Additional TOTCNT statement variables:          _ZERO_, _MINUS1_
Special values for TOTCNT statement variables:  0, -1


Sample Design Options

For every SUDAAN procedure, you can specify the input data sample design on the PROC statement, using the DESIGN parameter. (If you omit the DESIGN parameter, SUDAAN assumes a WR design.) SUDAAN offers two general methods for variance estimation: Taylor linearization and replication methods (BRR and Jackknife). A general review of these methods is given by Wolter (1985). There are 6 Taylor linearization design options and 2 replication methods. Currently, the two replication methods are not available in PROC SURVIVAL or MULTILOG. The choice of design is an important issue in correlated data analysis, since each of the eight SUDAAN options leads to a particular variance estimation procedure. The sample design option also determines the set of sample design statements that are necessary for computing variances.
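To give a feel for what a with-replacement (WR) linearization variance looks like, the Python sketch below applies the textbook between-PSU estimator for a weighted total: within each stratum h with n_h sampled PSUs, it sums n_h / (n_h - 1) times the squared deviations of the weighted PSU totals from their stratum mean. This is a generic illustration under stated assumptions, not SUDAAN's implementation; the record layout, function name, and data are hypothetical.

```python
from collections import defaultdict


def wr_total_and_variance(records):
    """Weighted total and its with-replacement (between-PSU) variance estimate.

    Each record is a (stratum, psu, weight, y) tuple. Strata with a single
    PSU contribute nothing to the variance in this sketch.
    """
    # Accumulate the weighted total of y for every PSU within every stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, weight, y in records:
        psu_totals[stratum][psu] += weight * y

    total = 0.0
    variance = 0.0
    for totals_by_psu in psu_totals.values():
        t = list(totals_by_psu.values())
        n_h = len(t)
        total += sum(t)
        if n_h > 1:
            mean_t = sum(t) / n_h
            variance += n_h / (n_h - 1) * sum((x - mean_t) ** 2 for x in t)
    return total, variance


if __name__ == "__main__":
    # Hypothetical data: two strata, two PSUs each.
    sample = [
        (1, 1, 10, 1), (1, 1, 10, 3),
        (1, 2, 10, 2), (1, 2, 10, 6),
        (2, 1, 5, 4), (2, 2, 5, 8),
    ]
    est, var = wr_total_and_variance(sample)
    print(f"total = {est}, variance = {var}, standard error = {var ** 0.5:.2f}")
```

Only variation among PSUs within strata enters the estimate, which is why WR designs need just the WEIGHT and NEST statements and no population counts.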

1. Taylor Linearization Methods

[Figure: Choosing the Appropriate Design Option Based on Taylor Linearization]

2. Replication Methods

A. DESIGN=JACKKNIFE implies:

  • sampling with replacement at the first stage (or with small sampling fractions, say less than 10% in every first stage stratum). The sampling fraction in a first stage stratum is the number of PSUs (primary sampling units) selected into the sample divided by the population number of PSUs in the stratum.
  • sampling with or without replacement at subsequent stages
  • sampling with equal or unequal probabilities of selection at both the first and subsequent stages.

Note: This design option is often used in non-survey applications.

B. DESIGN=BRR (Balanced Repeated Replication)

  • implies that the sample design is specified by the series of replicate weights listed on the REPWGT statement. BRR replicate weights are usually developed under the same assumptions as those listed for WR and JACKKNIFE, but it is possible to develop special weights that account for without-replacement sampling. SUDAAN assumes that the replicate weights have already been developed and are available on the input data file.
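Once an estimate has been recomputed with each set of replicate weights, the standard BRR variance is the mean squared deviation of the replicate estimates from the full-sample estimate; under Fay's method that quantity is divided by (1 - k)^2. The Python sketch below illustrates this textbook formula only. The function name and the fay_k argument are hypothetical, and how k relates to SUDAAN's ADJFAY= value is not covered here.

```python
def brr_variance(full_estimate, replicate_estimates, fay_k=0.0):
    """Textbook BRR variance from replicate estimates.

    fay_k = 0 gives ordinary BRR; 0 < fay_k < 1 gives Fay's method.
    """
    if not replicate_estimates:
        raise ValueError("at least one replicate estimate is required")
    if not 0.0 <= fay_k < 1.0:
        raise ValueError("fay_k must be in [0, 1)")
    r = len(replicate_estimates)
    # Sum of squared deviations of each replicate estimate from the full-sample estimate.
    ss = sum((est - full_estimate) ** 2 for est in replicate_estimates)
    return ss / (r * (1.0 - fay_k) ** 2)


if __name__ == "__main__":
    # Hypothetical full-sample estimate and four replicate estimates.
    theta = 10.0
    replicates = [9.0, 11.0, 10.0, 12.0]
    print("BRR variance:", brr_variance(theta, replicates))
    print("Fay (k=0.5): ", brr_variance(theta, replicates, fay_k=0.5))
```

Because the design is carried entirely by the replicate weights, no NEST, TOTCNT, or SAMCNT information is needed for this calculation.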

Sample Design Statements

SUDAAN's seven design statements serve these purposes:

WEIGHT Identifies the variable whose values are the analysis weights to be used in computing estimates.
NEST Lists in order the variable(s) whose values identify the design stages (by which the data file is sorted).
REPWGT Lists in order the variables whose values are the BRR replicate weights.
IDVAR Lists in order the variable(s) whose values are used to synchronize DATA and REPDATA input files. (Optional for BRR only.)
TOTCNT Lists in order the variable(s) whose values are the population counts at each sampling stage.
SAMCNT Lists in order the variable(s) whose values are the sample counts at each sampling stage (optional).
JOINTPROB Lists in order the variable(s) whose values are the single and joint inclusion probabilities for each primary sampling unit (PSU) and each pair of PSUs in each first-stage stratum.

For a given input data file and specified design type, the identical design statements are included in each set of procedure statements. For DESIGN=SRS, no design statements are included.

The following table notes which statements are required, optional, or not allowed for each sample design.


Sample Design Statement Requirements for Each Sample Design


Sample design                               DESIGN=    WEIGHT    NEST  TOTCNT  SAMCNT    JOINTPROB  REPWGT
With Replacement                            WR         yes       yes   no      no        no         no
Without Replacement                         WOR        yes       yes   yes     optional  no         no
Unequal Probabilities Without Replacement   UNEQWOR    yes       yes   yes     optional  yes        no
Stratified With Replacement                 STRWR      yes       yes   no      no        no         no
Stratified Without Replacement              STRWOR     yes       yes   yes     optional  no         no
Simple Random Sampling                      SRS        no        no    no      no        no         no
Jackknife                                   JACKKNIFE  yes       yes   no      no        no         no
Balanced Repeated Replication               BRR        optional  no*   no      no        no         yes

Statements marked "yes" are required. Statements marked "no" are not allowed and cause a fatal error if included.

*Except when R=EXCHANGEABLE option is used in REGRESS and MULTILOG


EXAMPLE: DESIGN=WOR

  • PSUs, identified by distinct values of the variable SCHOOL, are selected within strata REGION without-replacement.
  • Persons are selected without-replacement in each SCHOOL.
  • The variables POPSCHL and SAMSCHL contain the total number and sample number of PSUs (schools) in each stratum (region).
  • The variables POPSTUD and SAMSTUD contain the total number and sample number of students within each PSU (school).

In order to compute variance estimates for this example, specify the choice DESIGN=WOR on your PROC statement. The corresponding sample design statements you would include in your program file are:

> nest region school;
> totcnt popschl popstud;
> samcnt samschl samstud;

How to Use Special Keywords that can Substitute for Variables:

SUDAAN provides three keywords that you can use on any SUDAAN statement in place of a variable name:

_ONE_ The equivalent of a variable that has a value of 1 (one) for every observation.
_ZERO_ The equivalent of a variable that has a value of 0 (zero) for every observation.
_MINUS1_ The equivalent of a variable that has a value of -1 (minus one) for every observation.

These keywords are particularly useful for sample design statements:

  • Use _ONE_ on the NEST statement to treat the entire population as a single stratum.
  • Use _ZERO_ as the second or subsequent variable name on the TOTCNT statement to denote a stratification variable.
  • Use _MINUS1_ as a second or subsequent TOTCNT variable name to indicate with-replacement sampling (but not if your design is STRWOR).

EXAMPLE:

Suppose that a sample of students was selected, stratified by classroom and gender. This is a single-stage non-clustered design, and we use DESIGN=STRWOR if any of the strata contain large sampling fractions. This requires us to use the TOTCNT statement. In the following set of statements:

> nest class_id sex;
> totcnt _zero_ popstud;

the keyword _ZERO_ indicates that sex is a stratification variable and does not contribute to variance. The POPSTUD variable (population number of students within each class-by-sex cell) causes SUDAAN to compute variance components corresponding to students chosen within class and sex. SUDAAN recognizes that CLASS_ID and SEX are stratification variables, not sampling stages.

Design Effects

The design effect (DEFF) is defined as the ratio of the properly computed actual variance of an estimated parameter to the variance based on a simple random sample of the same size. This option identifies variance inflation due to stratification, clustering, unequal weighting, and oversampling.

Design Effect: DEFF1
  Measures variance inflation due to: Stratification (or Blocking), Clustering, Unequal Weighting, and Oversampling.
  Assumes that total sample size is fixed.
  Default? No. This is the design effect computed by previous releases of SUDAAN. Request on PROC statement.

Design Effect: DEFF2
  Measures variance inflation due to: Stratification (or Blocking), Clustering, Unequal Weighting.
  Assumes that subgroup sample sizes are fixed.
  Default? No. Request on PROC statement.

Design Effect: DEFF3
  Measures variance inflation due to: Stratification (or Blocking), Clustering.
  Assumes that subgroup sample sizes are fixed.
  Default? No. Request on PROC statement.

Design Effect: DEFF4
  Measures variance inflation due to: Stratification (or Blocking), Clustering, Unequal Weighting.
  Model-based SRS variance (this is the variance computed by standard software packages when no weights are involved). Good for experimental designs.
  Default? Yes.

FEATURES AND FUNCTIONS

SUDAAN procedure statements can be grouped into these categories:

- Procedure statement         PROC (PROCEDURE NAME)
- Sample design statements    WEIGHT, NEST, TOTCNT, SAMCNT, JOINTPROB, REPWGT, IDVAR
- Computational statements    SUBGROUP, LEVELS, RECODE, SUBPOPN
- Output statements           TITLE and FOOTNOTE, SETENV, PRINT, OUTPUT, FORMAT

Procedure Statements

Each procedure has an associated PROC statement. For instance, PROC CROSSTAB calls up the CROSSTAB procedure. You'll find the appropriate syntax for various procedures in the individual procedure chapters (along with design requirements) in the SUDAAN user's guide and in the following sections of this handout.

Computation Statements

A. SUBGROUP and LEVELS Statements

The SUBGROUP statement lists the categorical variables, and the LEVELS statement gives the number of categories in each of those variables.

  • The only valid values for a subgroup variable with m categories are the integers:

    1, 2, 3, . . . . m

    All other values (zero, negative, or greater than m) are treated as missing.

  • The maximum value m for each SUBGROUP variable is given on the LEVELS statement.

  • The values on the LEVELS statement must correspond one-to-one, in order, to the variables listed on the SUBGROUP statement.
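The rule above can be pictured with a small Python sketch that keeps a value only if it is one of the integers 1 through m and otherwise returns None to stand for "missing". The function name is hypothetical, and treating non-integer values as missing is this sketch's reading of "the only valid values are the integers 1, 2, 3, ..., m".

```python
def subgroup_level(value, m):
    """Return value as an int if it is one of 1..m, otherwise None (missing)."""
    if value is None or isinstance(value, bool):
        return None
    if not isinstance(value, (int, float)):
        return None
    # The range check comes first so that NaN and infinity never reach int().
    if 1 <= value <= m and value == int(value):
        return int(value)
    return None


if __name__ == "__main__":
    # Hypothetical AGE_GP values checked against LEVELS 6.
    for raw in (1, 6, 0, 7, -2, 2.5):
        print(raw, "->", subgroup_level(raw, 6))
```

A practical consequence is that a variable coded 0/1 must be recoded to 1/2 before it appears on a SUBGROUP statement, or every zero will be dropped as missing.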

B. RECODE Statement

This statement causes SUDAAN to replace the values of each named variable "on the fly," but not on the input file. Such replacement takes place before any other SUDAAN statements are implemented. Given n cut points a1 < a2 < ... < an, each recoded variable has n+1 distinct values determined as follows:


Interval                Coded Value
x < a1                  0
a1 <= x < a2            1
a2 <= x < a3            2
. . .
a(n-1) <= x < an        n-1
an <= x                 n

EXAMPLES:

1. The statements:

> recode zerone = (0 1);

> subgroup zerone;
> levels 2;

will recode a zero-one variable to be a 1-2 variable and then declare
it as a categorical variable with two levels in SUDAAN.


2. The statement:

> recode x = (4.5);

will recode a continuous variable X to be a 0-1 variable. All values
of X less than 4.5 will be coded as 0;
all values of X greater than or equal to 4.5 will be coded as 1.
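The interval rule behind both examples amounts to counting how many cut points are less than or equal to the value. A minimal Python sketch of that rule, using the standard library's bisect_right, is shown below; the function name recode is only illustrative and the code is not part of SUDAAN.

```python
from bisect import bisect_right


def recode(value, cut_points):
    """Map value to 0..n using ascending cut points a1 < a2 < ... < an.

    Values below a1 map to 0, values in [a1, a2) map to 1, and so on;
    values at or above an map to n.
    """
    cuts = list(cut_points)
    if cuts != sorted(cuts):
        raise ValueError("cut points must be in ascending order")
    return bisect_right(cuts, value)


if __name__ == "__main__":
    # Example 1 above: cut points (0 1) turn a 0/1 variable into a 1/2 variable.
    print([recode(v, [0, 1]) for v in (0, 1)])
    # Example 2 above: cut point (4.5) turns a continuous variable into 0/1.
    print([recode(v, [4.5]) for v in (4.4, 4.5, 9.0)])
```

Note that a value exactly equal to a cut point falls in the higher category, matching "greater than or equal to 4.5 will be coded as 1".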


C. SUBPOPN Statement


Use this statement to identify a subpopulation of the overall population represented by the input data set. SUDAAN uses only the records for which the expression is true. Note that variables used in the SUBPOPN statement may be on the SUBGROUP statement, but this is not required. Note also that if the RECODE statement is present for any of the variables in the SUBPOPN statement, then SUDAAN uses the recoded values to determine the value of the expression for the SUBPOPN statement.

EXAMPLE:


By including the following statements in your SUDAAN program, you can limit the analysis to records for which the value of RACE is 2 (Blacks) and the value of the SEX variable is 2 (Females), and the value of the AGE variable is either less than 18 or over 65.

> subgroup race sex;
> levels 2 2;
> subpopn race=2 & sex=2 & (age < 18 | age > 65)
/ name = "Black Females not in Labor Force";


Warning: Expressions such as

    18 <= age <= 65

are NOT appropriate on the SUBPOPN statement and may lead to unexpected results. To indicate all values of AGE between 18 and 65 use the expression

    (18 <= age) & (age <= 65)


D. TITLE and FOOTNOTE Statements


You can supply one or more titles to display at the top of every table produced by SUDAAN. These may be in addition to or in place of any titles associated with your input data set. You can also supply one or more footnotes to display at the bottom of every table produced by SUDAAN.

Each title must be in double quotation marks.


With SAS-callable SUDAAN, use RTITLE and RFOOTNOTE.


Statement Description
Each PRINT statement produces a set of formatted and labeled tables, which can go by default to a printer or terminal or can be directed to a file. Multiple PRINT statements can send their tables to multiple files, or all to the same file.
Each OUTPUT statement produces a single output data set of type ASCII, SAS, or SUDAAN. These files can be used for later processing in SUDAAN or other software.
FORMAT statements, which are identical for all SUDAAN procedures, can specify SAS formats to associate with one or more variables from a SAS input data set.
The SETENV statement, which is identical for all SUDAAN procedures, is used to alter the default environment parameters. Position it ahead of one or more PRINT or OUTPUT statements. The environment it defines applies to all subsequent PRINT or OUTPUT statements until SUDAAN encounters another SETENV statement.
TITLE and FOOTNOTE statements, which are identical for all SUDAAN procedures, can add text before and after your PRINT statement tables. All TITLE and FOOTNOTE statements will be added to all PRINT statement tables, regardless of whether the PRINT statements come before or after the TITLE and FOOTNOTE statements in your SUDAAN program file.

Output Statements


There are currently six different types of output groups in SUDAAN:

TABLE, RECORD, TABCOV, MODCOV, FLAT, and MATRIX


These group types are described in the following table:


Output Group:   TABLE
Description:    This group is a collection of summary statistics that may be overall, by subgroup (descriptives), by contrast (modeling), or by parameter (modeling).
Examples:       TABLECELL (descriptive), TESTS (modeling), BETAS (modeling)
PRINT Remarks:  All statistics are printed in a single table.

Output Group:   RECORD
Description:    This group is a collection of records or observations with values for one or more variables.
Examples:       Input data sets
PRINT Remarks:  Available only on PROC RECORDS; one row in the table for each record; one column for each variable.

Output Group:   TABCOV
Description:    This group consists of an estimate vector, its variance-covariance matrix, optional SRS-covariance matrix, and denominator degrees of freedom.
Examples:       All covariance output groups in descriptive procedures, including WGTCOV, RHATCOV, MEANCOV
PRINT Remarks:  Each entity is printed separately.

Output Group:   MODCOV
Description:    This group consists of an estimate vector, its variance-covariance matrix, optional SRS-covariance matrix, the denominator degrees of freedom, and an idempotent matrix used to compute tests of hypothesis.
Examples:       The VARIANCE group in each of the modeling procedures
PRINT Remarks:  Each entity is printed separately.

Output Group:   FLAT
Description:    This group is a collection of estimates with a value for each input record.
Examples:       PREDICTED group in PROCs REGRESS and LOGISTIC; the BASEHAZ group in PROC SURVIVAL
PRINT Remarks:  Not available directly; use OUTPUT to ASCII or SUDAAN file, then use PROC RECORDS to print.

Output Group:   MATRIX
Description:    This group consists of a single matrix with one row on each record.
Examples:       RHOS group in MULTILOG

A. SETENV Statement

The SETENV statement is used to specify the output environment parameters for PRINT and OUTPUT statements. The location of the SETENV statement is important. Only PRINT and OUTPUT statements following a SETENV statement are affected by it.



Options and Parameters for Setting the PRINT/OUTPUT Environment

Available BOTH on the SETENV Statement and in the SUDAAN Environment File

Parameter  Default  Minimum  Maximum  Description
PAGEBEG    1        0        1000     Beginning page number
TABBEG     1        0        1000     Beginning table number
LINESIZE   78       40       255      Number of characters per line
                                      (Total line length is LEFTMGN + LINESIZE)
PAGESIZE   50       10       100      Number of lines per page
                                      (Total page length is TOPMGN + PAGESIZE)
INDROWD    3        0        25       Number of columns to indent for distinct dimensions of row level labels
                                      (NCHS style only)
INDROWS    1        0        25       Number of columns to indent for labels of the same row level
                                      (NCHS style only)
MAXIND     5        0        5        Maximum number of row indentions permitted
                                      (NCHS style only)
LINESPCE   0        0        10       Number of spaces between lines in a cell
ROWSPCE    2        0        10       Number of spaces between cells
                                      (BOX style only)
COLSPCE    3        1        10       Number of spaces between columns in a cell
ROWWIDTH   20       3        100      Column width for printing row labels
                                      (BOX style only)
LABWIDTH   30       3        100      Column width for printing row labels
                                      (NCHS style only)
TOPMGN     3        0        30       Number of lines to leave at the top of each page before printing
                                      (Total page length is TOPMGN + PAGESIZE)
LEFTMGN    0        0        10       Number of columns to leave at the left of each page before printing
                                      (Total line length is LEFTMGN + LINESIZE)

Available ONLY on the SETENV Statement

Parameter  Default  Minimum  Maximum  Description
COLWIDTH   15       1        100      Column width for a cell
DECWIDTH   3        0        18       Number of decimal places

Available ONLY in the SUDAAN Environment File

DATA=filename                                Name of default input dataset
FILETYPE=[ASCII | SAS | SPSS | SUDAAN]       File type of dataset

This table illustrates available options and parameters for setting the PRINT/OUTPUT environment.



B. PRINT Statement


Use the PRINT statement to direct printed output to the desired destination. You will use the SETENV statement to specify the output environment parameters for PRINT.


C. OUTPUT Statement

Use the OUTPUT statement to save results for later printing or further processing. Note these requirements for OUTPUT statements: (1) you must list one or more statistics from a single output group, and (2) you must provide the filename.


5. PROCEDURES

A. The RECORDS Procedure

The RECORDS procedure is designed to print records from any ASCII, SAS, SPSS, SUDAAN or SUDAAN Export record file. This is particularly useful when you want to verify that SUDAAN is reading your data properly. You can also use this procedure to convert an input file of one type to another. For instance, you can convert an ASCII data set to a SUDAAN data set. With SAS-callable SUDAAN you can also convert SUDAAN record files into SAS files.


EXAMPLE:

   > proc records data=sastmp.mydata filetype=SAS contents countrec;

The CONTENTS option specifies printing of information about the input data set, while COUNTREC specifies printing of a count of records in the input data set.


B. The CROSSTAB Procedure


The CROSSTAB procedure produces weighted frequency and percentage distributions for one-way (univariate, single-variable) and multi-way (multivariate or multiple-variable) tabulations. CROSSTAB also tests the hypothesis of no association between row and column variables in 2-way and multi-way tables, and estimates odds ratios and relative risks in 2x2 tables. CROSSTAB is primarily for descriptive analyses of categorical variables. To produce descriptive statistics of continuous variables, use the DESCRIPT or RATIO procedures.


EXAMPLE:


   > proc crosstab data=sastmp.mydata filetype=SAS design=wr;
   > nest col_str psu_id;
   > weight wtf;
   > subgroup age_gp sex race;
   > levels 6 2 2;
   > run;

This example is taken from the 1985 Health Interview Survey, and it illustrates the estimation of default statistics for a set of variables: age, sex, and race. The above statements will produce one page for the general SUDAAN output (reflecting the original statements) plus eight pages for the default PRINT output consisting of three one-way frequency distributions for the variables AGE_GP, SEX, and RACE.

To produce a three-way table, use the TABLES option:

   > tables age_gp * sex * race;

SUDAAN would then produce a separate two-way table for each level of AGE_GP.

Use the TEST statement with the appropriate test name(s) to cause CROSSTAB to print test statistics by default. Otherwise these statistics will be printed only if you include specific keywords on the PRINT statement, or if you use the TEST=ALL and/or CMHTEST=ALL option(s).

   > test chisq;

The above statement requests that CHISQ be computed for each two-way table.

C. The RATIO Procedure


The RATIO procedure produces ratio estimates and their standard errors for correlated data. The numerator and denominator variables can be continuous or categorical. RATIO is primarily for the estimation of ratios with two variables.

The following statements must be used with the RATIO procedure: NUMER, DENOM, NUMCAT, and DENCAT. On NUMER and DENOM, list the names of the numerator and denominator variables. The variables can be either continuous or categorical. However, within one call to RATIO, all numerator variables must be of the same type and all denominator variables must be of the same type. You must specify the same number of variables on both NUMER and DENOM statements. In addition, when you specify NUMCAT and/or DENCAT, the number of levels on each of these statements must equal the number of variables on the NUMER and DENOM statements.


Use NUMCAT and DENCAT to indicate that the NUMER and DENOM variables, respectively, are categorical, and to select the level of each variable to be analyzed. The values specified must be positive integers, and RATIO will then analyze only the specific values stated.


EXAMPLE 1:


Suppose the variable HED equals the number of hospital episode days and the variable HEI equals the number of hospital visits. The ratio of days to visits can be computed with:

   > numer hed;
   > denom hei;
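
As a sketch of what this ratio estimate means, the Python snippet below divides the weighted total of the numerator variable by the weighted total of the denominator variable. It is an illustration under assumed data, not SUDAAN code; the function name and the records are hypothetical, and the design-based standard error is omitted.

```python
def ratio_estimate(numer, denom, weights):
    """Ratio of the weighted numerator total to the weighted denominator total."""
    numer_total = sum(w * y for y, w in zip(numer, weights))
    denom_total = sum(w * x for x, w in zip(denom, weights))
    return numer_total / denom_total

# Hypothetical records (illustrative values only):
hed = [4, 0, 10, 6]                 # hospital episode days
hei = [1, 0, 2, 2]                  # hospital visits
wt = [100.0, 200.0, 50.0, 150.0]    # sampling weights
days_per_visit = ratio_estimate(hed, hei, wt)
print(days_per_visit)
```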


EXAMPLE 2:


Suppose the health status variable HLT takes on the values 1-5 for poor through excellent health, respectively. The following statements will compute the ratio of persons with excellent health to persons with poor health:

   > numer hlt;
   > numcat 5;
   > denom hlt;
   > dencat 1;
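
For categorical variables, the ratio compares the weighted count at one level with the weighted count at another. The Python sketch below illustrates that point estimate under assumed data. The function name and the records are hypothetical, and this is not SUDAAN code.

```python
def category_ratio(values, weights, numer_level, denom_level):
    """Weighted count at numer_level divided by weighted count at denom_level."""
    numer_total = sum(w for v, w in zip(values, weights) if v == numer_level)
    denom_total = sum(w for v, w in zip(values, weights) if v == denom_level)
    return numer_total / denom_total

# Hypothetical records: HLT codes 1 (poor) through 5 (excellent)
hlt = [5, 1, 3, 5, 1, 4]
wt = [120.0, 80.0, 60.0, 180.0, 20.0, 40.0]
excellent_to_poor = category_ratio(hlt, wt, numer_level=5, denom_level=1)
print(excellent_to_poor)
```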


Options are also available in RATIO that allow you to define contrasts of ratio estimates (e.g., an overall contrast corresponding to the levels of the variable RACE).


D. The DESCRIPT Procedure


The DESCRIPT procedure produces descriptive statistics for continuous and categorical analysis variables. These statistics include means, totals, geometric means, medians and other quantiles, percentages, and their standard errors for sample surveys and other studies involving correlated data. DESCRIPT also computes standardized means according to the method of direct standardization. The standardizing weights are assumed to be known.

Within one call to DESCRIPT, the analysis variables must be either all continuous or all categorical. Analyzing both types requires separate calls to the DESCRIPT procedure.


EXAMPLE 1:


This example is taken from the 1985 Health Interview Survey and illustrates the default estimation of descriptive statistics:


    > proc descript data=mydata filetype=SAS totals design=wr;
    > nest col_str psu_id;
    > weight wft;
    > var tdv hed;

Let total doctor visits be TDV and total hospital episode days be HED. The above statements request the means and totals of those two variables. To compute estimates by age group (AGE_GP), add the following statements:


    > subgroup age_gp;
    > levels 6;

The table would then contain a column for each of six age categories plus the total.


EXAMPLE 2:


This example uses the CATLEVEL statement to indicate that the VAR variables are categorical, and to select the level of each variable to be analyzed. The values specified must be positive integers.

Suppose the variables PLAN_A through PLAN_E correspond to five types of health insurance plans, with coverage for an individual indicated by a value of 1 for the variable. The statements:

    > var plan_a plan_b plan_c plan_d plan_e;
    > catlevel 5*1;

request that the percentage of individuals covered by each plan be computed.


EXAMPLE 3:


You can also request standardized estimates of means and percentages. For example, suppose the standardizing population has the following distribution by race and sex:


Sex       Race        Proportion of Population
Males     Black       .45
Males     Non-Black   .10
Females   Black       .30
Females   Non-Black   .15

The following statements will produce estimates of the mean doctor visits per person (TDV) for each age group, standardized to the sex-by-race distribution:


    > subgroup age_gp sex race;
    > levels 6 2 2;
    > stdvar sex race;
    > stdwgt .45 .10 .30 .15;
    > tables age_gp;
    > var tdv;
    > run;
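
Direct standardization can be pictured as a weighted average of cell-specific means, using the standardizing proportions as weights. The Python sketch below computes one standardized mean for a single age group. It is an illustration only, not SUDAAN code: the function name and the sample records are hypothetical, and the standardizing proportions are the ones listed above.

```python
def standardized_mean(records, std_weights):
    """Directly standardized mean.

    records: iterable of (cell, sampling_weight, value)
    std_weights: {cell: known population proportion}, summing to 1
    """
    sums = {}
    for cell, w, y in records:
        wy, wsum = sums.get(cell, (0.0, 0.0))
        sums[cell] = (wy + w * y, wsum + w)
    # Weighted mean within each cell, then average with the standard weights.
    return sum(p * sums[cell][0] / sums[cell][1]
               for cell, p in std_weights.items())

std_weights = {
    ("Male", "Black"): 0.45, ("Male", "Non-Black"): 0.10,
    ("Female", "Black"): 0.30, ("Female", "Non-Black"): 0.15,
}
# Hypothetical TDV records for one age group: (cell, weight, doctor visits)
records = [
    (("Male", "Black"), 10.0, 2.0),
    (("Male", "Black"), 30.0, 4.0),
    (("Male", "Non-Black"), 20.0, 5.0),
    (("Female", "Black"), 25.0, 6.0),
    (("Female", "Non-Black"), 15.0, 8.0),
]
std_mean = standardized_mean(records, std_weights)
print(round(std_mean, 3))
```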


E. The REGRESS Procedure


The REGRESS procedure fits linear models to data from sample surveys and other applications involving cluster-correlated and repeated-measures data. REGRESS produces estimates of the model parameters and their variance-covariance matrix. It tests the null hypothesis that each individual regression coefficient in the beta vector equals zero. In addition, it computes tests for overall model significance, model minus intercept, main effects, and interaction effects.


You can also specify tests for linear combinations of the model parameters, or you can output the predicted values, residuals, parameter estimates, and their associated variance-covariance matrix for further hypothesis testing.


For survey applications, SUDAAN solves the weighted normal equations and estimates the variances of the estimated regression coefficients using either the implicit Taylor linearization method or replication methods (BRR and Jackknife). For repeated measures and cluster-correlated data, SUDAAN estimates the regression parameters using GEE methodology with both robust and model-based variance estimation.
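
To illustrate what solving the weighted normal equations involves, the Python sketch below works through the closed-form solution for a model with an intercept and one covariate. It is a simplified illustration with hypothetical data and a hypothetical function name, not SUDAAN's implementation, and it covers the coefficient estimates only (no Taylor linearization or replication variances).

```python
def weighted_simple_regression(x, y, w):
    """Solve the weighted normal equations for y = intercept + slope * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x, with unequal weights:
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
w = [10.0, 20.0, 30.0, 40.0]
intercept, slope = weighted_simple_regression(x, y, w)
print(round(intercept, 6), round(slope, 6))
```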


EXAMPLE:


This example is from the National Health and Nutrition Examination Survey and its epidemiological follow-up 10 years later. In this example, we want to determine if follow-up cancer status is related to body iron stores, after adjusting for age at initial exam and smoking status. The variables are CANCER12 (1=yes, 2=no), AGEXAM (continuous), SMOKE (1=current, 2=former, 3=never, 4=unknown), and IRON (continuous).

    > proc regress data=one filetype=SAS design=wr;
    > nest qstrata psu1;
    > weight wgt;
    > subgroup cancer12 smoke;
    > levels 2 4;
    > reflevel smoke=1;
    > model iron = cancer12 agexam smoke;
    > title "Evaluate effect of cancer status on body iron stores";


The SUBGROUP statement produces additional regression coefficients for each level of the categorical covariates, cancer and smoking status. The reference level for smoking status was changed from the default (4=unknown) to 1 (current smokers), using the REFLEVEL statement.
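
One common way to picture how a categorical covariate enters a regression model is reference-cell (dummy) coding, where each non-reference level gets its own indicator column. The Python sketch below shows that coding for a four-level variable such as SMOKE. It is an assumption-laden illustration with a hypothetical function name; SUDAAN's internal parameterization may differ in detail.

```python
def dummy_code(value, levels, reflevel):
    """Indicator columns for levels 1..levels, omitting the reference level."""
    return [1 if value == level else 0
            for level in range(1, levels + 1)
            if level != reflevel]

# Reference level 1 (current smokers): columns stand for levels 2, 3, 4.
print(dummy_code(2, levels=4, reflevel=1))  # former smoker
print(dummy_code(1, levels=4, reflevel=1))  # reference level: all zeros
# Reference level 4 (the default described above): columns stand for levels 1, 2, 3.
print(dummy_code(4, levels=4, reflevel=4))
```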


F. The LOGISTIC Procedure


The LOGISTIC procedure fits logistic regression models to sample surveys and other clustered data applications using maximum likelihood estimation. LOGISTIC produces estimates of the model parameters and their standard errors, and tests the null hypothesis that individual regression coefficients associated with each variable in the model are equal to zero. LOGISTIC also provides tests for overall model significance, model minus intercept, as well as main effects and interactions.


EXAMPLE:


This example uses PROC LOGISTIC to model the risk of acute drinking as a function of race, sex, age, income, and educational status. The data are from the Behavioral Risk Factor Surveillance System.


    > proc logistic data=one filetype=SAS design=wr;
    > nest ststr psu;
    > weight finalwt;
    > recode rdrink = 2;
    > subgroup educat sex incat nrace agecat5;
    > levels 4 2 4 4 5;
    > reflevel incat=1 educat=1 nrace=1;
    > model rdrink = educat sex incat nrace agecat5;
    > setenv colwidth=11 decwidth=3 colspce=2 linesize=78;
    > title "predicting risk for acute drinking";
    > run;


The RECODE statement was used to convert a 1-2 variable (1=not at risk, 2=at risk) to a 0-1 variable (0=not at risk, 1=at risk). The REFLEVEL statement defines the reference level for income, education, and race to be the first level of each variable.
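
The Python sketch below illustrates the two ideas in this example: mapping a 1-2 coded outcome to 0-1, and turning a linear predictor into a probability with the logistic function. It is not SUDAAN code. The function names are hypothetical and the coefficients are made-up values, not estimates from the survey.

```python
import math

def recode_1_2_to_0_1(value):
    """Map 1 (not at risk) to 0 and 2 (at risk) to 1."""
    return value - 1

def predicted_probability(intercept, coefs, x):
    """P(outcome = 1) under a logistic model with the given coefficients."""
    eta = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-eta))

rdrink = [1, 2, 2, 1]
recoded = [recode_1_2_to_0_1(v) for v in rdrink]
print(recoded)

# Hypothetical coefficients for two indicator covariates:
p = predicted_probability(-2.0, [0.7, 0.3], [1, 0])
print(round(p, 3))
```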


G. The MULTILOG Procedure


The MULTILOG procedure extends the logistic modeling capabilities of SUDAAN to include categorical outcomes with two or more categories which may or may not have a natural ordering. SUDAAN implements the proportional odds model with cumulative logit link for ordinal responses and a generalized multinomial logit model for nominal outcomes. Both models handle continuous as well as discrete explanatory variables. The generalized multinomial logit model produces separate parameter vectors for each of the generalized logit equations of interest; the proportional odds model produces a common slope but separate intercepts for each of the cumulative logit equations of interest.


The dependent variable must be a single variable that appears on the SUBGROUP statement. The number of levels of the dependent variable should be on the LEVELS statement. For nominal responses, use the default GENLOGIT (generalized logit) option on the MODEL statement. For ordinal responses, use the CUMLOGIT (cumulative logit) option on the MODEL statement.


EXAMPLE:


This example analyzes data from a two-period crossover study comparing the suitability of two inhalation devices (A and B) in patients who are currently using a standard inhaler device. The first sequence of patients was randomized to Device A for one week (period 1) followed by Device B for another week (period 2). The second sequence of patients received the treatments in the opposite order. Patients gave their assessment of the clarity of the leaflet instructions accompanying the devices, recorded on an ordinal scale: 1=easy, 2=clear only after re-reading, 3=not very clear, and 4=confusing.

   > proc multilog data=one filetype=SAS design=wr;
   > nest _one_ person;
   > weight _one_;
   > subgroup clarity treat period;
   > levels 4 2 2;
   > model clarity = treat period / cumlogit;
   > title "proportional odds model for inhaler device cross-over study";
   > run;
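
To show how a common slope and separate intercepts translate into category probabilities, the Python sketch below evaluates a proportional odds model for a four-category outcome. It is an illustration only: the function name, intercepts, and slope are hypothetical, and the sign convention (logit of P(Y <= k) equals intercept k plus the linear predictor) is an assumption that may not match SUDAAN's output.

```python
import math

def category_probabilities(intercepts, eta):
    """Category probabilities under a proportional odds model.

    Assumes logit P(Y <= k) = intercepts[k] + eta, with increasing intercepts.
    """
    cumulative = [1.0 / (1.0 + math.exp(-(a + eta))) for a in intercepts]
    cumulative.append(1.0)  # P(Y <= last category) is always 1
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs

intercepts = [-1.0, 0.0, 1.0]  # hypothetical, one per cumulative logit
slope_treat = 0.5              # hypothetical common slope for treatment
probs_ref = category_probabilities(intercepts, eta=0.0)
probs_trt = category_probabilities(intercepts, eta=slope_treat)
print([round(p, 3) for p in probs_ref])
print([round(p, 3) for p in probs_trt])
```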


H. The SURVIVAL Procedure


The SURVIVAL procedure fits the discrete proportional hazards model or Cox's proportional hazards model to sample surveys and other clustered data applications. To implement the proportional hazards model, you must specify two computation statements:


For the Discrete Proportional Hazards Model . . .

  1. On the MODEL statement, specify the number of failure time intervals (INTERVALS=integer) and, optionally, the starting values for the interval-specific baseline hazard rates (LAMBDAS=values), in which case the number of starting values must equal the number of intervals. Values beyond the range given by the INTERVALS parameter are considered censored in the last interval.
  2. Use the EVENT statement to specify an indicator variable in the database that will distinguish between failure and censoring intervals (where 1=failure, 0=censored).
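
One reading of the rule for values beyond the INTERVALS range is sketched below in Python: an observation whose interval exceeds the specified number is moved to the last interval and treated as censored. This is an interpretation for illustration only, with a hypothetical function name, and is not SUDAAN code.

```python
def apply_interval_limit(interval, event, n_intervals):
    """Return (interval, event) after applying the INTERVALS limit.

    event: 1 = failure, 0 = censored.
    """
    if interval > n_intervals:
        return n_intervals, 0  # censored in the last interval
    return interval, event

print(apply_interval_limit(10, 1, n_intervals=8))
print(apply_interval_limit(3, 1, n_intervals=8))
```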

For the Cox Proportional Hazards Model (continuous time) . . .

  1. Specify the continuous form of the MODEL statement and include an EVENT statement. The response variable is the actual time of failure or censoring.
  2. Use the EVENT statement to specify an indicator variable that will distinguish between complete and censored failure times (where 1 = complete times, 0 = censored failure times).

EXAMPLE:


The following MODEL statement fits a discrete proportional hazards model in which the survival time outcome Y is expressed as a function of three continuous variables: age, income, and education.


    > model y = age income educ / intervals=8;

The vector of parameter estimates, b = (b1,b2,b3) can be interpreted as follows:

b1 = relative effect of AGE on the baseline hazard rate

b2 = relative effect of INCOME on the baseline hazard rate

b3 = relative effect of EDUC on the baseline hazard rate
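
Under the usual proportional hazards form, in which the hazard is the baseline hazard multiplied by exp(b1*AGE + b2*INCOME + b3*EDUC), a coefficient can be converted to a hazard ratio by exponentiating it. The Python sketch below assumes that form. The function name and the coefficient value are hypothetical, not estimates from any data.

```python
import math

def hazard_ratio(coefficient, change=1.0):
    """Multiplicative change in the hazard for a given change in a covariate."""
    return math.exp(coefficient * change)

b_age = 0.03  # hypothetical coefficient for AGE
print(round(hazard_ratio(b_age), 4))             # per one-year increase
print(round(hazard_ratio(b_age, change=10), 4))  # per ten-year increase
```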


6. REFERENCE PAGE


Kalton, Graham. 1983. Introduction to Survey Sampling. Sage University Paper series on Quantitative Applications in the Social Sciences, series no. 07-035. Beverly Hills and London: Sage Publications.

Shah, Babubhai V., Beth G. Barnwell, and Gayle S. Bieler. 1997. SUDAAN User's Manual, Release 7.5. Research Triangle Park, NC: Research Triangle Institute.

Wolter, Kirk M. 1985. Introduction to Variance Estimation. New York: Springer-Verlag.


Copyright ©2009, The Pennsylvania State University | Privacy and Legal Statements
Contact the Help Site Administrator | Last modified Aug 13, 2008 | Weblion Partner