Introduction to Dates and Times in SAS - The Essential Guide to SAS Dates and Times, Second Edition (2014)

The Essential Guide to SAS Dates and Times, Second Edition (2014)

Chapter 1. Introduction to Dates and Times in SAS

In the years that I've been working with SAS and teaching students how to use it, I find that two things consistently confuse those who are new to SAS. First is the default way that the DATA step works. Its implied DO-until-end-of-data generates many "How do I tell it how much data to read and when to stop?" questions. The second most confusing concept in SAS is that of how dates (and times) work within the software. I've seen many misuses of character strings masquerading as dates and/or times over the years, as well as unexpected results due to a failure to understand this fundamental part of SAS.

However, the way that SAS reads, stores, and displays dates and times is only the tip of the iceberg when it comes to the power and flexibility of SAS in handling this information. There is so much more than just having numbers represent date and time values. We'll start with the basics in the first three chapters, and then progress to some more advanced uses of those date and time values, taking advantage of many of the features available in SAS for that purpose.

1.1 How Does It Work? (January 1, 1960, and Midnight as Zero)

SAS counts dates, times, and datetime values separately. The date counter started at zero on January 1, 1960. Any day before January 1, 1960, is a negative number, and any day after that is a positive number. Every day at midnight, the date counter is increased by one. The time counter is measured in seconds and runs from zero (at midnight) to 86,400 (the number of seconds in a day), when it resets to zero. SAS calculates datetimes as the number of seconds since midnight, January 1, 1960. Why January 1, 1960? The founders of SAS wanted to use the approximate birth date of the IBM 370 system to represent the beginning of the modern computing era, and they chose January 1, 1960, as an easy-to-remember approximation.

In deciding whether to use a date, a time, or a datetime, you should consider how you are going to use it. Datetimes are always date and time combined; therefore, if you will not always have a time available for each date, you should strongly consider using separate date and time variables and then calculate a datetime variable from the two components when needed. Normally, attempting to create a datetime without both a date and a time will cause an error, and the result will be a missing value for the datetime. However, in specific circumstances, it is possible to create a datetime value from a date and a missing time (see Section 4.3.3, "ISO Datetime Informats," for an example). In these cases, the time will be set automatically to midnight (0 seconds of the given date). You may want your datetime value to be missing when there is a date but no time available. In these specific circumstances, it is important to keep track of date and time separately. Many programs that handle dates (such as databases and spreadsheets) maintain their dates and times as a numeric value relative to some fixed point in time, although the date that represents zero is different across each package, and packages may vary in how they keep track of time of day. Ultimately, this makes calculating durations easy, and working with dates and times stored in this fashion becomes a matter of addition, subtraction, multiplication, and division.

1.2 Internal Representation

SAS stores dates as integers, while the datetime and time counters are stored as real numbers to account for fractional seconds. The origin of the algorithm used for SAS date processing comes from a January 14, 1980, Computerworld article by Dr. Bhairav Joshi of SUNY-Geneseo. The earliest date that SAS can handle with this algorithm is January 1, 1582 (essentially the implementation date of the current Gregorian calendar system). The latest date is far enough into the future that at least five digits will be required to display the year.

Dates as stored by SAS don't do us much good in the real world. The statement "I was born on -242" won't mean much to anyone else. However, "I was born on May 4, 1959," can easily be translated into something that most people can understand, or it can be used as is. Fortunately, SAS has a number of built-in facilities to perform automatic translation between the internal numbers stored in SAS and dates and times and their representation as understood by the rest of the world. These built-in tools include formats and informats (introduced in Section 1.3 and covered extensively in Chapters 2, 3, and 4), date and time constants (Section 1.4), and functions (Chapter 5).

1.3 External Representation (Basic FORMAT Concepts)

Formats perform an automatic translation between the actual value and the value to be displayed. Formats display the date, time, and datetime values in a fashion that is much more easily understood. Formats do not change the values themselves; they are just a way to display the values in any output.

When you have dates or times and want to translate them into SAS date and time values, you will use informats. Although you will need a statement, procedure, or a function to actually create the SAS values, informats describe what the data look like so that SAS can translate it correctly for storage. We will discuss formats and informats in detail in Chapters 2, 3, and 4 because there are dozens of them. Three of the most commonly used formats that work with SAS date, time, and datetime values are used in the following section.

1.4 Date and Time as Numeric Constants in SAS

We've talked about internal and external representation of dates and times. How do you put a specific date into a program as a constant? Formats only change the way the values are displayed in output, so you can't use them. Informats need a function or a SAS statement to translate the characters they are given, so you could use them, but then you would always need to use the INPUT() function to create a SAS date in a DATA step or PROC SQL. The INPUT function takes a series of characters that you give it and translates it using the informat that describes what the series of characters look like. That's very inefficient if you just want one specific date.

date = INPUT("04AUG2013",DATE9.);

Look at the program in Example 1.1 to see how date, time, and datetime constants are written into a SAS program. Take note of the quotation marks around the values for date, time, and datetime and the letters that follow each closing quote.

Example 1.1: Date Constants

DATA date_constants;

date = '04aug2013'd; /* This is a date constant */

time = '07:15:00't; /* This is a time constant */

datetime = '07aug1904:21:31:00'dt; /* This is a datetime constant

*/

RUN;

TITLE "Unformatted Constants";

PROC PRINT DATA=date_constants;

VAR date time datetime;

RUN;

TITLE "Formatted Constants";

PROC PRINT DATA=date_constants;

VAR date time datetime;

FORMAT date worddate32. time timeampm9. datetime datetime19.; /*

Format the constants */

RUN;

The quotes are used to create a literal value. You may use a pair of single or double quotes to specify the literal value. Dates have to be written as ddmonyyyy; times as hh:mm:ss (add a decimal point and more digits to represent fractional seconds if necessary); and datetimes as the dateddmonyyyy, followed by a separator (frequently seen as a colon [:]) and then the time (hh:mm:ss). Aside from the correct formatting of these literal values, the most important part of a date/time/datetime constant is the letter that immediately follows the last quote. The letter "D" stands for date, "T" for time, and "DT" for datetime. Upper or lower case is valid. If you put one of these strings in quotes without the letter at the end, you will create a character variable, not a numeric variable with a date, time, or datetime value. The difference might not become apparent until you try to do something with the variable you created that involves a calculation. Don't forget your "D," "T," or "DT"! This example demonstrates how these constants are defined and then automatically converted to their equivalent SAS values.

The first PROC PRINT statement displays the date, time, and datetime values we created with our constants without formats, so we can see the values as they are stored in the data set.

Unformatted Constants

date

time

datetime

19574

26100

-1748226540

The second PROC PRINT shows the effect of associating the variable DATE with the WORDDATE. format, the variable TIME with the TIMEAMPM. format, and the variable DATETIME with the DATETIME. format.

Formatted Constants

date

time

datetime

August 4, 2013

7:15 AM

07AUG1904:21:31:00

Without the formats, you can see that the date constants we used to create the values stored in the data set are displayed as their actual SAS date, time, and datetime values. They don't make much sense to us until a format is associated with the variable.

What happens if you forget to put the "D," "T," or "DT" after your date constant? In Example 1.2, the "D," "T," and "DT" have been removed from the same date, time, and datetime in Example 1.1.

Example 1.2: Incorrect Date Constants

DATA bad_date_constants;

date = '04aug2013'; /* This is NOT a date constant */

time = '07:15:00'; /* This is NOT a time constant */

datetime = '07aug1904:21:31:00'; /* This is NOT a datetime constant

*/

RUN;

TITLE "Unformatted Constants";

PROC PRINT DATA=bad_date_constants;

VAR date time datetime;

RUN;

Now we print out the values without formats. While the problem may not be apparent at first glance, this result does not look like the unformatted SAS date, time, and datetime values in the previous example.

Unformatted constants

date

time

datetime

04aug2013

07:15:00

07aug1904:21:31:00

Now let's try to add one day to the date, and a minute (60 seconds) to both the time and datetime. Here is a partial log of what happens when we try this with the code in Example 1.2.

12 DATA bad_date_constants;

13 date = '04aug2013' + 1;

14 time = '07:15:00' + 60;

15 datetime = '07aug1904:21:31:00' + 60;

16 RUN;

NOTE: Character values have been converted to numeric

values at the places given by: (Line):(Column).

13:8 14:8 15:12

NOTE: Invalid numeric data, '04aug2013' , at line 13 column 8.

NOTE: Invalid numeric data, '07:15:00' , at line 14 column 8.

NOTE: Invalid numeric data, '07aug1904:21:31:00' , at line 15 column

12.

date=. time=. datetime=. _ERROR_=1 _N_=1

The "invalid numeric data" note in the log tells you that you tried to use a character value to do something that requires a numeric value. The boldface last line tells you that you have missing values for all three variables, because you were trying to do math with a character value. Remember that SAS dates, times, and datetimes are always stored as numbers. When you see "invalid numeric data" where you intended to use a date constant, it is highly probable that your date constant is missing its identifying "D," "T," or "DT."

1.5 Length and Numeric Requirements for Date, Time, and Datetime

You can take advantage of the fact that dates are stored as integers to save space when you create variables to store them. Instead of using the default length of 8 for numeric variables, set the length of the numeric variables where you are storing dates to 4. This will safely store dates from January 1, 1582 (the earliest date SAS can handle), to October 23, 7701. A length of 5 is overkill, although that would extend the ending date another 534,773,760 days! A length of 3 will not accurately store dates outside the range of January 1, 1960, and September 13, 1960. If you declare your date variables to be a length of 4, you will be able to store two dates in the space it would take to store one if you were using the SAS default length for numeric variables. This can save you a great deal of storage space in a large data warehouse.

Times may present a bit of a problem, because you may need to store fractional seconds. The rule is simple enough: If you want to store time values with fractional seconds, you must use a length of 8 to store them accurately. Otherwise, the length of 4 is long enough to store every possible time value from midnight to midnight down to the second. In these cases, not using the default length will allow you to store two times in the same amount of space as one.

Datetime values require more space, because a length of 4 will not store a datetime value with accuracy, regardless of whether you want fractional seconds. The number is just too big. As long as you are not storing fractional seconds, a length of 6 will store datetimes that accurately represent values from midnight on January 1, 1582, to 3:04:31 p.m. on April 9, 6315. Changing the range from the default of 8 to 6 for datetime values results in a 25 percent savings in space, which still may be significant depending on how much data you have. Of course, if you are going to maintain decimal places in your datetime values, you must use the default length of 8.

I have just provided the absolute minimum lengths required for accuracy. DO NOT attempt to save additional space by shrinking the variable lengths beyond 4, 6, or 8 as listed. You will lose precision, which could lead to unexpected results. Example 1.3 shows what can happen if you do not use enough bytes to store your date values. This example uses the value 19941, which represents the date of August 6, 2014, and it is in variables of lengths 3, 4, and 5.

Example 1.3: The Effect of LENGTH Statements on Dates

DATA date_length;

LENGTH len3 3 len4 4 len5 5;

len3 = 19941;

len4 = 19941;

len5 = 19941;

FORMAT len3 len4 len5 mmddyy10.;

RUN;

As the table below shows, when you try to store a date in fewer than 4 bytes, you do not get the correct value. Using a length of 4 to store your dates and times (without fractional seconds) is still a significant (50 percent) savings in the amount of storage required. You will create inaccuracies in your data if you try to save more than that. Saving additional space is not worth the risk of inaccurate data.

len3

len4

len5

08/05/2014

08/06/2014

08/06/2014

1.6 General SAS Options for Dates

Two options influence the default date and time stamp that SAS places on pages of output and the SAS log. The DATE/NODATE option causes the start date and time of the SAS job (or session) to appear on each page of the SAS log and SAS output. These values are obtained from the operating system clock and are displayed as 24-hour clock time, followed by the day of the week, month, day, and four-digit year. If you are running SAS interactively, then the date and time are printed only on the output, not the log. By default, the DATE system option is in effect when you start SAS. However, if you do not want this default display, then use the NODATE option. You probably don't want SAS to display its default date stamp if you are going to put your own date and/or time stamp in the title or in a footnote (see Chapter 6).

As mentioned in the previous paragraph, if the DATE option is enabled, SAS prints the date and time that the current SAS session started on each page. If you want a more exact date and time on those pages, you can use the DTRESET system option, which will cause SAS to retrieve the date and time from the operating system clock each time a page is written. That date and time will then be placed on the page instead of the time that the SAS job started. Since the time is displayed in hours and minutes, you will only see it change every minute. The DTRESET option can be useful in interactive applications or SAS programs that may have been running for days or weeks, where knowing when the output was generated is more important than knowing when the SAS session began. Since the DTRESET option affects the default SAS date and time stamp, it works only if the DATE option is enabled. When you use the NODATE option, using DTRESET will have no effect because you aren't using the SAS date and time stamp on your output.