Informats and Formats
- Informats tell SAS how to interpret input data.
- Formats allow you to "recode" SAS variables without changing the original variable.
- You can use predefined informats and formats.
- You can create your own formats.
- Formats can be created "on the fly" or stored in a permanent library.
Informats
Informats allow SAS to read "non-standard" input such as numbers with embedded commas, scientific notation, or dates. The full list of character, numeric, and date informats is in the SAS Language (version 6) manual.
One of the most common uses of an informat is to tell SAS that a number should be read as a date or time value. For example, the informat mmddyy6. tells SAS to read the number 122495 as the date December 24, 1995. SAS then stores this date as a SAS date, which is the number of days since January 1, 1960.
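As a minimal sketch of the point above, the data step below reads the text 122495 with mmddyy6. and writes the result to the log twice: once as the stored number and once with a date format. The variable name sasdate is only illustrative, and the century assigned to a two-digit year depends on the YEARCUTOFF= system option in effect.

```sas
/* Read '122495' with the mmddyy6. informat (sasdate is an illustrative name). */
data _null_;
    sasdate = input('122495', mmddyy6.);      /* stored as days since January 1, 1960 */
    put 'Stored value:  ' sasdate;            /* the raw number of days */
    put 'With a format: ' sasdate worddate20.; /* the same value written as a date */
run;
```

The stored value itself never changes; only the way it is written differs between the two put statements.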
Since decimal points are often not included in input data, the informat w.d is used to tell SAS where to insert the decimal point. Here w is the total width of the field and d is the number of digits to the right of the decimal point. For example, the number 212345 read with the informat 6.2 would be read as 2123.45.
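The w.d rule can be checked with a short data step. This is a sketch with made-up variable names (a and b); it also shows that when the text already contains a decimal point, the d part of the informat is ignored.

```sas
/* Compare implied and explicit decimal points under the 6.2 informat. */
data _null_;
    a = input('212345', 6.2);   /* no decimal point in the text: read as 2123.45 */
    b = input('21.345', 6.2);   /* explicit decimal point is kept: read as 21.345 */
    put a= b=;
run;
```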
The following example code will read 3 fields (sales date, price and quantity) from a "raw file" that looks like this:
    10184 45000 30
    10284 37500 40
    40684 10000 34
    103085 152300 35
    121785 225 7
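    data sales;
        infile temp;                  /* temp is the fileref of the raw file */
        input sdate    mmddyy6.       /* date in month, day, year order */
              price    6.2            /* 6 columns, 2 implied decimal places */
              quantity 5.;
    run;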
A proc print of the SAS data set would look like this:
    SDATE      PRICE    QUANTITY
     8766     450.00          30
     8767     375.00          40
     8862     100.00          34
     9434    1523.00          35
     9482       2.25           7
Remember: the variable sdate is now a SAS date (the number of days since January 1, 1960).
To get the SAS date to print in a more readable style such as December 12, 1985, you associate the variable with a format. Formats tell SAS how to print the variables. SAS has predefined formats such as WORDDATEw. or MMDDYYw. that would give you December 12, 1985 or 12/12/85 respectively. The list of predefined formats is in the SAS Language (version 6) manual. You can also create your own formats by using proc format.
A few things to keep in mind about formats:
- The w is the length of the format.
- A format does NOT change the original value of the variable.
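- A format name always includes a period when it is used, usually at the end (worddate20.) and sometimes before the number of decimal places (6.2).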
- Formats can be numeric, character or date.
- Character formats always begin with $.
- Format names cannot be longer than 8 characters in SAS version 6 (later releases allow longer names).
- Do NOT overlap values in ranges such as 1-10, 10-20, 20-30. Use 1-<10, 10-<20, 20-<30 instead; the < excludes the endpoint it is attached to.
- You must tell SAS to associate a format with a variable.
- A single format can be used for multiple variables.
- The SAS keywords low, high, and other can be used in value ranges.
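The points about ranges and keywords can be sketched as follows. The format name scorefmt, its labels, and the variable score are hypothetical and chosen only for illustration; the example shows non-overlapping ranges built with -< together with the low, high, and other keywords.

```sas
/* A hypothetical numeric format with non-overlapping ranges. */
proc format;
    value scorefmt
        low-<10 = 'under 10'            /* 10 itself is excluded here */
        10-<20  = '10 to under 20'      /* includes 10, excludes 20 */
        20-high = '20 and over'
        other   = 'missing or invalid'; /* anything not covered above */
run;

/* Write a few boundary values to the log with the format applied. */
data _null_;
    do score = 9.5, 10, 20;
        put score= score scorefmt.;
    end;
run;
```

Because each boundary value belongs to exactly one range, there is no ambiguity about which label a value such as 10 or 20 receives.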
In this example, the SAS date sdate will be printed with the month, day and year written out, using the SAS format worddate20.
    proc print data=sales;
        format sdate worddate20.;   /* print the SAS date written out in words */
    run;
    OBS               SDATE      PRICE    QUANTITY
      1     January 1, 1984     450.00          30
      2     January 2, 1984     375.00          40
      3       April 6, 1984     100.00          34
      4    October 30, 1985    1523.00          35
      5   December 17, 1985       2.25           7
Proc Format allows you to create your own formats.
The formats and their values are created in a proc format step. They are later associated with variables in a data or proc step using the format statement. The format statement consists of the word FORMAT, the variable name, and then the name of the format. The format name must end in a period to distinguish it from a variable name.
format age agefmt.;
Remember, the proc format creates the format and the format statement associates the format with the variable(s).
In this example, the formats $sexfmt, yesno and agefmt are created in a single proc format.
    proc format;
        value $sexfmt 'f' = 'female'
                      'm' = 'male';
        value yesno   1 = 'yes'
                      5 = 'no'
                      other = 'bad data';
        value agefmt  low - 2   = 'infant'
                      3 - 5     = 'toddler'
                      6 - 10    = 'grade school'
                      11 - 14   = 'middle school'
                      15 - 18   = 'high school'
                      19 - high = 'adult';
    run;

    proc print data=temp;
        format sex $sexfmt. quest1 quest4 quest7 yesno. age agefmt.;
    run;
The proc format creates 3 different formats. The $sexfmt format is a character format, so its name must begin with a $ sign, and both the values (f and m) and the formatted text must be in quotation marks. The yesno format is numeric, so only the formatted text is in quotation marks. In the proc print, the yesno format is associated with 3 variables (quest1, quest4 and quest7). The "other" keyword assigns the label "bad data" to any value other than 1 or 5. The agefmt format is also numeric and is an example of using ranges of values. Low and High are special SAS keywords.
It is common practice to use a format in a proc freq for variables such as age or income where you want to aggregate within a range. Remember: do not overlap ranges. The following example uses the sales data set created in the first example.
    proc format;
        value qty low-10  = 'low-10'
                  11-20   = '11-20'
                  21-30   = '21-30'
                  31-40   = '31-40'
                  41-50   = '41-50'
                  51-high = '51-high';

    proc freq data=sales;
        tables quantity;
        format quantity qty.;
    run;
                        sales quantity

                                      Cumulative  Cumulative
    QUANTITY   Frequency   Percent     Frequency    Percent
    ------------------------------------------------------
    low-10             1      20.0             1       20.0
    21-30              1      20.0             2       40.0
    31-40              3      60.0             5      100.0
Common date format problem:
When my SAS data set was created, the dates were read in as a number rather than a date. How do I create a SAS date from a number in an existing SAS data set?
    data temp;
        set sasuser.example;
        /* write the number as text, then read that text back as a date */
        newdate = input(put(olddate,8.), mmddyy8.);
    run;
This code takes the number 12171985 and turns it into the SAS date 9482 (Dec. 17, 1985). If you only have a 2-digit year (121785), use put(olddate,6.) with mmddyy6. so that the width of the text matches the informat. If your date is in year, month, day order (851217), use yymmdd6.
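One caveat with this approach: the 8. format pads shorter numbers with leading blanks, so a date whose month is a single digit (stored as 1051985, for instance) is written as text with a blank in front, which may not be read as intended. A minimal sketch, assuming the same olddate and newdate variable names, uses the z8. format instead, which pads with leading zeros:

```sas
/* Convert a numeric mmddyyyy value with a single-digit month to a SAS date. */
data _null_;
    olddate = 1051985;                             /* January 5, 1985 stored as a plain number */
    newdate = input(put(olddate, z8.), mmddyy8.);  /* z8. writes '01051985' before it is read */
    put newdate= newdate= mmddyy10.;               /* raw SAS date, then the same value formatted */
run;
```

The same idea applies to 6-digit values, with z6. paired with mmddyy6. or yymmdd6.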