Rollouts
Generating multiple observations from one observation
Problem: I've got a data set with multiple measures on smoking and health symptoms. The data has 13 variables per person: a city classification (industrial/rural), followed by age, passive smoking index, and reported symptom measures, each repeated four times per subject. The file looks like this:
But I want it to look like this:
View person/year file
Solution: This program shell not only solves the problem above; understanding it also reveals a lot about how SAS processes data. Take a look at the code and try running the program.
| Line | SAS Statement | Comment |
| 0001 | Data personyr (Keep= id city age smoke symptom); | Name new person/year file and specify variables to keep. |
| 0002 | set person; | read 1 obs from person level file |
| 0003 | array ages (8:11) age8-age11; | declare array to hold 4 ages, passive smoking |
| 0004 | array smokes (8:11) smoke8-smoke11; | and symptoms. Use ages 8 thru 11 as index |
| 0005 | array symptoms (8:11) symp8-symp11; | to the array. |
| 0006 | do i = 8 to 11; | Repeat enclosed statements four times, |
| 0007 | age = ages(i); | incrementing the index i from 8 to 11. |
| 0008 | smoke = smokes(i); | Copy values from array into corresponding variable |
| 0009 | symptom = symptoms(i); | for one year. |
| 0010 | output; | Output for each year of each person. 4 per person |
| 0011 | end; | |
| 0012 | run; | |
The program assumes that you are familiar with arrays and DO loops. What may be new to you is the OUTPUT statement. The DATA step works like a loop, repeating its statements for each input record. If a DATA step contains no OUTPUT statement, SAS assumes that you want one row in your SAS table for each input record and outputs automatically at the end of each iteration.

In this case, we want four rows in the table for each input record, so we have to tell SAS explicitly when to OUTPUT. Line 0010 above tells SAS to write the variables listed in the KEEP= option on line 0001 to the table. Because this statement is inside a loop that executes four times for each person, four rows are generated per person.
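To see the difference between automatic and explicit output in isolation, a minimal sketch along the following lines may help. The two-person `person` data set and all of its values are made up purely for illustration, and the names `implicit` and `explicit` are hypothetical; only the variable layout follows the program above.

```sas
/* Hypothetical input: two people, made-up values, for illustration only */
data person;
   length city $ 10;   /* avoid truncating "industrial" to 8 characters */
   input id city $ age8 smoke8 symp8 age9 smoke9 symp9
         age10 smoke10 symp10 age11 smoke11 symp11;
   datalines;
1 industrial 8 2 0 9 2 1 10 3 1 11 3 0
2 rural 8 0 0 9 0 0 10 1 0 11 1 1
;
run;

/* No OUTPUT statement: SAS writes one row per input record     */
/* automatically at the end of each iteration (2 rows here).    */
data implicit;
   set person;
run;

/* Explicit OUTPUT inside a loop: one row per pass through the  */
/* loop, so four rows per input record (8 rows here).           */
data explicit (keep= id i);
   set person;
   do i = 8 to 11;
      output;
   end;
run;
```

Note that once an OUTPUT statement appears anywhere in a DATA step, the automatic output at the end of each iteration no longer happens, which is why the second step produces exactly four rows per person rather than five.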

