Converting Dates and Times into SAS Date, Time, and Datetime Values - The Essential Guide to SAS Dates and Times, Second Edition (2014)

The Essential Guide to SAS Dates and Times, Second Edition (2014)

Chapter 3. Converting Dates and Times into SAS Date, Time, and Datetime Values

In Chapter 2, I discussed translating SAS dates to the way we express them. How can we do the reverse? After all, if you have a date, time, or date and time that you need to store or manipulate, it won't be represented as a SAS date, time, or datetime value (unless it comes from another SAS data set). The translation from common date and/or time terminology to SAS is almost as easy as going the other way, and it is done in one of two ways: The first, discussed in Section 1.4, uses date, time, and datetime literals. While this works for a small number of these values that are known at the time you write the program, how do you deal with multiple dates, or those that are known only at run time? By using informats to process them.

3.1 Avoiding the Two-Digit Year Trap

First, it is always good practice to use four-digit years in your date values. If this is not possible, SAS will handle dates with two-digit years without any problems, but this situation is where you, the user, should be concerned with the YEARCUTOFF= system option. Anytime that you translate a date or datetime from an everyday representation using two-digit years (for example, 05/16/89) to its SAS value, the YEARCUTOFF= system option will affect the resulting SAS value. The YEARCUTOFF= system option will define the beginning of the 100-year period for those two-digit years. In SAS 9.4, the SAS default value for this option is 1926, defining a range of 1926 through 2025. Therefore, two-digit years in the range from 26 through 99 will be assigned to the years 1926–1999, while year values from 00 through 25 will be set to the years 2000–2025. This rule applies to everything that has to be translated into a SAS date or datetime value and where there are only two digits representing the year. This means that the YEARCUTOFF= option applies to external data being processed with the INPUT statement, any date or datetime literals, as well as any functions (such as INPUT()) where the input string only has two digits representing the year. The example below shows how date and datetime literals are affected by the YEARCUTOFF= system option. It displays the actual SAS date or datetime value represented by the literal along with its formatted value.

Example 3.1: Effects of the YEARCUTOFF= System Option on Date and Datetime Literals

OPTIONS YEARCUTOFF=1920

SAS Date
Literal

Value as
Stored in SAS

Formatted
Value

a1

'23MAR2005'd

16518

03/23/2005

a2

'23MAR1905'd

-20007

03/23/1905

a3

'23MAR05'd

16518

03/23/2005

a4

'19AUG1959:14:45:00'dt

-11610900

19AUG1959:14:45:00

a5

'19AUG2059:14:45:00'dt

3144149100

19AUG2059:14:45:00

a6

'19AUG59:14:45:00'dt

-11610900

19AUG1959:14:45:00

The variables a3 and a6 are date and datetime constants, respectively, with two-digit years as a part of the constant. While the literals with four-digit years are stored as their proper SAS dates, a3 is in the 21st century, but a6 is set in the 20th century. The 100-year range is 1920–2019. Therefore, '05' falls in the range 00–19 and is assigned to the year 2005, and '59' falls in the range 20–99 and is assigned to the year 1959. For the following example, let's use the same date and datetime literals, but change the value of the YEARCUTOFF= option.

OPTIONS YEARCUTOFF=1905

SAS Date
Literal

Value as
Stored in SAS

Formatted
Value

a1

'23MAR2005'd

16518

03/23/2005

a2

'23MAR1905'd

-20007

03/23/1905

a3

'23MAR05'd

-20007

03/23/1905

a4

'19AUG1959:14:45:00'dt

-11610900

19AUG1959:14:45:00

a5

'19AUG2059:14:45:00'dt

3144149100

19AUG2059:14:45:00

a6

'19AUG59:14:45:00'dt

-11610900

19AUG1959:14:45:00

Now, a3 has moved to the 20th century. The literals with four-digit years remain stored as their proper SAS dates and a6 remains in the 20th century, but we have moved the 100-year range to start in 1905. Therefore, the range is now 1905–2004, and '05' falls in the 05–99 range, so it is assigned to 1905.

The same thing will happen when you use an informat to translate a date or datetime string where there are only two digits representing the year.

3.2 Using Informats

Formats take values and display them in a specific fashion, while informats take a series of alphanumeric characters and translate them into a single value. Dates and times are the best example of applying informats, since SAS date values are not normally how the majority of the planet expresses dates. A date such as 11/24/05, or 24-05-2002 contains non-numeric characters, so they would have to be read as character values, which is quite a distance from numeric SAS date values. As with formats, you can create (and store for future use) your own informats if one is not available within SAS to fit your needs. There are fewer SAS-supplied informats than formats , so you should be careful to only use the informats listed in this chapter to translate your date and datetime strings, and not the formats from Chapter 2.

Just as you can use the FORMAT statement to apply a format to a variable, you can use the INFORMAT statement to apply an informat to a variable in a procedure or a DATA step. However, the most common use of informats is with the INPUT statement in a DATA step, as data are being read in, or with the INPUT() function to translate character data that are already in data sets.

You specify an informat by giving the informat name, followed by an optional width specification and a period (.). Informats are like formats in that each informat has a default width that SAS will use if none is specified.

3.3 The INFORMAT Statement

The INFORMAT statement is analogous to the FORMAT statement. You use the INFORMAT statement to associate an informat with a variable in a SAS data set. You can also remove an informat that has been permanently associated with the variable by leaving the informat name blank. You might also use the INFORMAT statement to associate an informat with a variable for the duration of the procedure (which might be useful in certain procedures such as the FSEDIT procedure).

INFORMAT date1 mmddyy10.;

The statement above says that any time a character string is read into the variable date1, it will be translated into a SAS date value using the informat MMDDYY10 (detailed below). If the character string being processed is not in an MMsDDsYY(YY) (s stands for a separator), you will get a missing value stored in date1 and an error message in the SAS log.

Just as with formats, you can remove informats that have been permanently associated with a variable by using the variable name in the INFORMAT statement without an informat, as is shown below.

INFORMAT time3;

This statement will remove any informat that has been permanently associated with the variable time3.

3.3.1 Using Informats with the INPUT Statement

The basic syntax of an INPUT statement with an informat is as follows:

INPUT @1 date1 mmddyy10.;

First, you will usually specify a starting column. The default starting column is 1, but you can specify the starting column with the @ sign, followed by the column number. If you do not, the starting column will be set to the current location of the input pointer. You should also specify a width for the informat to indicate how many characters are to be read. The above INPUT statement will read the first ten characters in a line, starting at the first character in a data line, and SAS will expect it to look like mmsdds(yy)yy, where mm is the month from 01 to 12, s represents a separator character, dd is a day from 01 to 31, and (yy)yy is a two- or four-digit year. The following example demonstrates that the separator character does not have to be the same on every line, and the field does not have to be exactly ten characters long.

Example 3.2: INPUT Statement

1. DATA informats_are_smart;

2. INPUT @1 date1 :MMDDYY10.;

3. unformatted_date = date1;

4. DATALINES;

5. 10/17/2014

6. 05-04-59

7. 3-1-1940

8. RUN;

9. PROC PRINT;

10. FORMAT date worddatx.;

11. RUN;

The Result

Obs

date1

unformatted_date

1

17 October 2014

20013

2

4 May 1959

-242

3

1 March 1940

-7245

As you can see, the characters in each line of the DATALINES statement were converted to a SAS date value. The length of the informat must be long enough to read all of the characters in the date string. In the example, there are a maximum of 10 characters in the date string. Therefore, the width of the informat is specified as 10. The other important thing to note is the use of the colon (:, line 2, in bold) modifier preceding the informat name. It is best practice to use the colon modifier with any informat, not just those relating to dates and times, when you are dealing with varying length fields. The colon modifier ensures that it doesn't matter if the lengths of the character strings representing date in the data are less than the width specified for the informat.

If the characters read do not match that layout (for example, June 26, 2014, when the informat specified is MMDDYY.) or if the informat would yield an impossible value (for example, February 31), SAS will set the value of the variable date1 to missing and set the system variable _ERROR_ to 1. In general, you should know the layout of the characters before selecting an informat. Beginning with version 9 of SAS, if you do not know the layout of the dates ahead of time, the ANYDT family of informats (see Section 3.4.4) are designed to solve this problem.

3.3.2 Informats with the INPUT() Function

The INPUT() function is the parallel to the PUT() function, and it stores a numeric or character value as a numeric or character variable. The type of the result depends on the type of the informat that is used. A character informat (one that begins with a $) will return a character value. All informats used with dates, times, and datetimes are numeric. Therefore, the variable returned is numeric. The syntax is given below.

INPUT(character-value,informat-name);

If you want to define the informat that is to be applied during a SAS job (at run time), you will need to use the INPUTN(character-value,informat-name) function (or INPUTC(), if you want to produce a character variable) instead. informat-name represents a character variable or character constant that contains an informat name, while the INPUT() function needs an actual informat name. Make sure that you have defined the width of the informat so that it is long enough to capture all the characters in the entire character variable. The following examples illustrate the use of the INPUT() and INPUTN() functions:

Example 3.3: INPUT() Function

DATA _NULL_ ;

datestr = "15-NOV-2013";

sasdate = INPUT(a,date11.);

PUT sasdate=;

RUN

The INPUT() function translated the date in the character variable datestr into its equivalent SAS date value, 19677, and stored it in the numeric variable sasdate. The date11. informat accounts for the length of the character variable.

Example 3.4: INPUTN() Function

DATA _NULL_;

datestr = "15-NOV-2013";

inf = "DATE11.";

sasdate = INPUTN(a,inf);

PUT sasdate=;

RUN;

The INPUTN() function used the value of the character value inf (DATE11.) as the informat to use in translating the date in the character variable datestr into its equivalent SAS date value, 19677.

3.3.3 When the Informat Does Not Match the Data Being Read

Informats, like formats, are separated into classes according to the type of data that are being read. In most cases, if you use the wrong informat for the data type, informats will return an error (set the SAS automatic variable _ERROR_ to 1), and the value of the variable being read will be set to missing.

This behavior differs from date, time, and datetime formats in that if you use the wrong type of format to display a value (for example, if you use a date format to display a time value), no error will occur, and at worst, you will get a warning in the SAS log. However, incorrectly specifying a format will most likely cause the display to be incorrect. Example 3.5 shows what happens when you try to use an informat that does not match the character string that you are trying to process.

Example 3.5: Using the Wrong Informat

DATA bad_informat;

INPUT @1 date :datetime18.;

DATALINES;

11-06-1988

8-25-2004

4-24-2005

;;;;

RUN;

The Log

1 DATA bad_informat;

2 INPUT @1 date :datetime18.;

3 DATALINES;

NOTE: Invalid data for date in line 4 1-18.

RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+

4 11-06-1988

date=. _ERROR_=1 _N_=1

NOTE: Invalid data for date in line 5 1-18.

5 8-25-2004

date=. _ERROR_=1 _N_=2

NOTE: Invalid data for date in line 6 1-18.

6 4-24-2005

date=. _ERROR_=1 _N_=3

NOTE: The data set WORK.BAD_INFORMAT has 3 observations and 1 variables.

NOTE: DATA statement used (Total process time):

real time 0.53 seconds

cpu time 0.03 seconds

7 ;;;;

8 RUN;

The Resulting SAS Data set

Obs

date

1

.

2

.

3

.

As you can see in the above example, using the DATETIME. informat to process a series of character strings that do not represent datetimes produces a note in the log. It also sets the automatic variable _ERROR_ to 1 for each record where it encountered a mismatch between the informat specified and the data that it attempted to read. The end result is that the value of the date variable in your output data set is missing because SAS was not able to process the characters using the specified informat. Remember to always check your log and your data set after reading a text file.

3.4 Listing and Discussion of Informats

Each discussion of an informat in this section will provide an explanation of the informat, its width specification, and the text it is designed to process. Each subsection is accompanied by a table that gives examples of the text that is to be processed, along with the informat (and its width specification) and the resulting SAS date, time, or datetime value. SAS continues to develop formats and informats, so it is always a good idea to check the online documentation at support.sas.com.

3.4.1 Date Informats

DATEw.

DATEw. reads dates in the form ddmonyy(yy), where dd represents the day of the month, mon is the three-letter month abbreviation, and yy(yy) is the two- or four-digit year. The default value of w is 7, but you should specify 9 if you are reading four-digit years. dd, mon, and yy(yy) can be separated by blanks or special characters. If you separate them, you must account for the blanks (or special characters) in the width specification. If you have blanks after the month and the day, then you need to have a width of 9 for two-digit years or 11 for four-digit years. If the leading zero for dd is missing, it has no effect on the value. The following table gives examples of how to apply this informat to yield the SAS date value that corresponds to the text shown in each line.

Characters
Read

Informat

Resulting SAS
Date Value

Formatted Date
Using WEEKDATE.

20sep15

DATE7.

20351

Sunday, September 20, 2015

4 feb 2014

DATE11.

19758

Tuesday, February 4, 2014

07-may-1960

DATE11.

127

Saturday, May 7, 1960

DDMMYYw.

DDMMYYw. reads dates of the form ddmmyy(yy), where dd represents the day of the month, mm represents the number of the month, and yy(yy) is the two- or four-digit year. The default value of w is 6, but you should specify 8 if you are reading four-digit years. dd, mm, and yy(yy) can be separated by blanks or special characters. If you separate them, you must account for the separating characters in the width specification. If you have blanks after the month and the day, then you need to have a width of 8 for two-digit years or 10 for four-digit years. SAS will do its best to decipher the string if no separators are used, but some dates cannot be processed—for example, 2112008. Without place-holding zeros or separators, there is no way to know whether the date is 21 January 2008 or 2 November 2008. The following table gives examples of how to apply this informat to yield the SAS date value that corresponds to the text shown in each line.

Characters
Read

Informat

Resulting SAS
Date Value

Formatted Date
Using WEEKDATE.

140390

DDMMYY6.

11030

Wednesday, March 14, 1990

06/09/05

DDMMYY8.

16685

Tuesday, September 6, 2005

22-04-2003

DDMMYY10.

15817

Tuesday, April 22, 2003

JULIANw.

JULIANw. translates a Julian date in the form yy(yy)ddd, with the two- or four-digit year preceding the zero-filled day of the year. It is right-justified. w can be from 5 to 32, and the default is 5. If you specify 5, the year portion of the Julian date is two digits long. If you specify 7 or more, the year portion is four digits long. Zeros must fill the space between the year and day values: For example, the fifth day of the year must be given as "005." Any date preceding the year 1582 on the Gregorian calendar cannot be read as a Julian value. The following table gives examples of how to apply this informat to yield the SAS date value that corresponds to the text shown in each line.

Characters
Read

Informat

Resulting SAS
Date Value

Formatted Date
Using WEEKDATE.

09284

JULIAN5.

18181

Sunday, October 11, 2009

2014005

JULIAN7.

19728

Sunday, January 5, 2014

2012168

JULIAN7.

19160

Saturday, June 16, 2012

MMDDYYw.

MMDDYYw. reads dates of the form mmddyy(yy), where mm represents the number of the month, dd represents the day of the month, and yy(yy) is the two- or four-digit year. The default value of w is 6, but you should specify 8 if you are reading four-digit years. dd, mm, and yy(yy) can be separated by blanks or special characters. If you separate them, you must account for the blanks in the width specification. If you have blanks after the month and the day, then you need to have a width of 8 for two-digit years or 10 for four-digit years. SAS will do its best to decipher the string if no separators are used, but some dates cannot be processed—for example, 1272003. Without place-holding zeros or separators, there is no way to know whether the date is January 27, 2003, or December 7, 2003. The following table gives examples of how to apply this informat to yield the SAS date value that corresponds to the text shown in each line.

Characters
Read

Informat

Resulting SAS
Date Value

Formatted Date
Using WEEKDATE.

041798

MMDDYY6.

13986

Friday, April 17, 1998

1/15/2013

MMDDYY10.

19373

Tuesday, January 15, 2013

08282015

MMDDYY10.

20328

Friday, August 28, 2015

MONYYw.

MONYYw. reads dates of the form monyy(yy), where mon is the three-letter month abbreviation, and yy(yy) is the two- or four-digit year. Using this informat will set the SAS date value that corresponds to the first day of the month. The default value of w is 5, but you should specify 7 if you are reading four-digit years. The following table gives examples of how to apply this informat to yield the SAS date value that corresponds to the text shown in each line.

Characters
Read

Informat

Resulting SAS
Date Value

Formatted Date
Using WEEKDATE.

JAN15

MONYY5.

20089

Thursday, January 1, 2015

dec1920

MONYY7.

-14275

Wednesday, December 1, 1920

aug2020

MONYY7.

22128

Saturday, August 1, 2020

PDJULG4.

PDJULG4. reads a packed Julian date in hexadecimal format for IBM computers. The width specification is always 4, because the Julian date is parsed as follows: the four-digit Gregorian year is written in the first two bytes, and the three-digit integer that represents the day of the year is in the next one-and-a-half bytes. The last half-byte contains all binary 1s, which indicates the value is positive. There is no example given for this informat because packed-decimal Julian dates yield nonprintable characters.

PDJULIw.

PDJULIw. also reads a packed Julian date in hexadecimal format for IBM computers. It differs from the PDJULG. informat in that it expects the two digits of the century in the first byte, followed by two digits of the year in the second byte. The next one-and-a-half-bytes store the three-digit integer that corresponds to the day of the year, while the last half-byte is filled with hexadecimal 1s, representing a positive number. The century and year are calculated by subtracting 1900 from the four-digit Gregorian year. Once again, there is no example, since packed-decimal Julian dates yield nonprintable characters.

YYMMDDw.

YYMMDDw. is used to read dates of the form yy(yy)mmdd, where yy(yy) is the two- or four-digit year, mm represents the number of the month, and dd represents the day of the month. The default value of w is 6, but you should specify 8 if you are reading four-digit years. yy(yy), mm, and dd can be separated by blanks or special characters. If you separate them, you must account for the separating characters in the width specification. If you have blanks after the month and the day, then you need to have a width of 8 for two-digit years or 10 for four-digit years. The following table gives examples of how to apply this informat to yield the SAS date value that corresponds to the text shown in each line.

Characters
Read

Informat

Resulting SAS
Date Value

Formatted Date
Using WEEKDATE.

441205

YYMMDD6.

-5505

Tuesday, December 5, 1944

20150517

YYMMDD8.

20225

Sunday, May 17, 2015

2014-07-06

YYMMDD10.

19910

Sunday, July 6, 2014

YYMMNw.

YYMMNw. reads dates of the form yy(yy)mm, where yy(yy) is the two- or four-digit year, and mm represents the number of the month. The day is automatically set to 1. The default value of w is 4, but you should specify 6 if you are reading four-digit years. The N in the informat name is necessary. Your data must not have any separating characters between the month and the year. This informat will produce a date value that is equal to the first day of the month given. The following table gives examples of how to apply this informat to yield the SAS date value that corresponds to the text shown in each line. Note that line 2 of this table demonstrates what happens if you try to read a date with a two-digit year using the width of 6. In this case, SAS is translating the four digits as the year, so the month value is considered to be missing.

Characters
Read

Informat

Resulting SAS
Date Value

Formatted Date
Using WEEKDATE.

1

9905

YYMMN4.

14365

Saturday, May 1, 1999

2

9905

YYMMN6.

.

3

201403

YYMMN6.

19783

Saturday, March 1, 2014

4

201610

YYMMN6.

20728

Saturday, October 1, 2016

YYQw.

YYQw. reads dates of the form yy(yy)Qq, where yy(yy) is the two- or four-digit year followed by the letter Q and q is a number from 1 to 4, indicating the quarter of the year. The date value produced by this informat will correspond to the first day of the given quarter. Use 6 for w if you are reading four-digit years, or 4 if you are reading two-digit years. The default w is 6. The following table gives examples of how to apply this informat to yield the SAS date value that corresponds to the text shown in each line.

Characters
Read

Informat

Resulting SAS
Date Value

Formatted Date
Using WEEKDATE.

14Q1

YYQ4.

19724

Wednesday, January 1, 2014

2010Q3

YYQ6.

18444

Thursday, July 1, 2010

2015Q2

YYQ6.

20179

Wednesday, April 1, 2015

2013Q4

YYQ6.

19632

Tuesday, October 1, 2013

YYMMDDw.

YYMMDDw. is used to read dates of the form yy(yy)mmdd, where yy(yy) is the two- or four-digit year, mm represents the number of the month, and dd represents the day of the month. The default value of w is 6, but you should specify 8 if you are reading four-digit years. yy(yy), mm, and dd can be separated by blanks or special characters. If you separate them, you must account for the separating characters in the width specification. If you have blanks after the month and the day, then you need to have a width of 8 for two-digit years or 10 for four-digit years. The following table gives examples of how to apply this informat to yield the SAS date value that corresponds to the text shown in each line.

Characters
Read

Informat

Resulting SAS
Date Value

Formatted Date
Using WEEKDATE.

441205

YYMMDD6.

-5505

Tuesday, December 5, 1944

20150517

YYMMDD8.

20225

Sunday, May 17, 2015

2014-07-06

YYMMDD10.

19910

Sunday, July 6, 2014

YYMMNw.

YYMMNw. reads dates of the form yy(yy)mm, where yy(yy) is the two- or four-digit year, and mm represents the number of the month. The day is automatically set to 1. The default value of w is 4, but you should specify 6 if you are reading four-digit years. The N in the informat name is necessary. Your data must not have any separating characters between the month and the year. This informat will produce a date value that is equal to the first day of the month given. The following table gives examples of how to apply this informat to yield the SAS date value that corresponds to the text shown in each line. Note that line 2 of this table demonstrates what happens if you try to read a date with a two-digit year using the width of 6. In this case, SAS is translating the four digits as the year, so the month value is considered to be missing.

Characters
Read

Informat

Resulting SAS
Date Value

Formatted Date
Using WEEKDATE.

1

9905

YYMMN4.

14365

Saturday, May 1, 1999

2

9905

YYMMN6.

.

3

201403

YYMMN6.

19783

Saturday, March 1, 2014

4

201610

YYMMN6.

20728

Saturday, October 1, 2016

YYQw.

YYQw. reads dates of the form yy(yy)Qq, where yy(yy) is the two- or four-digit year followed by the letter Q and q is a number from 1 to 4, indicating the quarter of the year. The date value produced by this informat will correspond to the first day of the given quarter. Use 6 for w if you are reading four-digit years, or 4 if you are reading two-digit years. The default w is 6. The following table gives examples of how to apply this informat to yield the SAS date value that corresponds to the text shown in each line.

Characters
Read

Informat

Resulting SAS
Date Value

Formatted Date
Using WEEKDATE.

14Q1

YYQ4.

19724

Wednesday, January 1, 2014

2010Q3

YYQ6.

18444

Thursday, July 1, 2010

2015Q2

YYQ6.

20179

Wednesday, April 1, 2015

2013Q4

YYQ6.

19632

Tuesday, October 1, 2013

The WEEK Informats

The WEEK informats read dates in the ISO-standard WEEK format. There are three algorithms used to calculate the WEEK value from a given date. The U algorithm calculates weeks based on Sunday being the first day of the week without any other restriction. The V algorithm defines the first week of the year as containing both January 4 and the first Thursday of the year. Therefore, if the first Monday of the year falls on January 2, 3, or 4, the preceding days of the calendar year are considered to be a part of week 53 of the previous calendar year. It is also possible for calendar days at the end of a year to be considered as being in the first week of the next calendar year. Finally, the W algorithm calculates weeks based on Monday being the first day of the week without any other restriction.

The WEEK algorithms calculate dates differently, so it is critical that you use the correct algorithm for the week value that you are converting. This means that you need to know which algorithm was used to create the week value before you try to convert it. If you use a different week algorithm from the one that was used to create the week value, you will get the wrong actual date. Example 3.6 demonstrates this.

Example 3.6: The Difference between the Various WEEK Algorithms (U, V, W)

Week value

Date Formatted
with WEEKDATE
U Algorithm

Date Formatted
with WEEKDATE
V Algorithm

Date Formatted
with WEEKDATE
W Algorithm

1

W12

Sunday, March 23, 2014

Monday, March 17, 2014

Monday, March 24, 2014

2

13W45

Sunday, November 10, 2013

Monday, November 4, 2013

Monday, November 11, 2013

3

15W5303

Tuesday, January 5, 2016

Wednesday, December 30, 2015

Wednesday, January 6, 2016

4

2014W2104

Wednesday, May 28, 2014

Thursday, May 22, 2014

Thursday, May 29, 2014

5

2015-W01-02

Monday, January 5, 2015

Tuesday, December 30, 2014

Tuesday, January 6, 2015

Example 3.6 demonstrates the differences that exist not only in the resulting date between the algorithms, but also in the day of the week, which depends on whether the algorithm uses Sunday or Monday as the first day of the week. Rows 3 and 5 highlight the differences in the way the first and last week of the calendar year are handled by the different algorithms. As you can see, if you use the wrong algorithm, not only can you end up with an unintended date, but the resulting date might not even be in the same year.

WEEKUw.

WEEKUw. is new as of SAS version 9.3. It is used to translate week values into SAS dates using the U algorithm. The U algorithm calculates weeks based on Sunday being the first day of the week, and the week number is displayed as a two-digit number from 0 to 53, with a leading zero if necessary. Week values are dates in the form yyyy-Wnn-dd, where yyyy is the optional year (can be two or four digits). W is the letter "W" for "Week"; nn is the week number ranging from 0 to 53; and dd is the optional day of the week. The dashes are optional separators, but they are the only separators that can be used in ISO 8601 week values and are used between the year, the week number, and the day (if present). The width specification, w, can range from 3 to 200 and tells SAS what to expect from the character string being processed according to the following table.

Width (w)

Pattern Expected

Example

Comment

3–4

Wnn

W08

W is the letter "W," and nn is the week number. If a year is not specified, the year is considered to be the current year. If a day is not given, the day is considered to be the first day of the week.

5–6

yyWnn

14W08

yy is a two-digit year, W is the letter "W," and nn is the week number. Without a day, the day is considered to be the first day of the week.

7–8

yyWnndd

14W0803

yy is a two-digit year, W is the letter "W," nn is the week number, and dd is the day of the week.

9–10

yyyyWnndd

2014W0803

yyyy is a four-digit year, W is the letter "W," nn is the week number, and dd is the day of the week.

11–200

yyyy-Wnn-dd

2014-W08-03

yyyy is a four-digit year, W is the letter "W," nn is the week number, and dd is the day of the week, each separated by dashes.

Specifying any value greater than 11 will have no effect on the date returned, although the implied cursor placement can cause an INPUT statement (not function) to yield unexpected results.

Characters
Read

Informat

Resulting SAS
Date Value

Formatted Date
Using WEEKDATE.

W12

WEEKU3.

19805

Sunday, March 23, 2014

14W40

WEEKU5.

20001

Sunday, October 5, 2014

15W2801

WEEKU7.

20281

Sunday, July 12, 2015

2014W3304

WEEKU9.

19955

Wednesday, August 20, 2014

2015-W06-02

WEEKU11.

20128

Monday, February 9, 2015

WEEKVw.

WEEKVw. is available as of SAS version 9.3. It is used to translate week values into SAS dates using the V algorithm. The V algorithm calculates weeks based on Monday being the first day of the week, and the week number is displayed as a two-digit number from 0 to 53, with a leading zero if necessary. This algorithm defines the first week of the year as containing both January 4 and the first Thursday of the year. Therefore, if the first Monday of the year falls on January 2, 3, or 4, the preceding days of the calendar year are considered to be a part of week 53 of the previous calendar year. Week values are dates in the form yyyy-Wnn-dd, where yyyy is the optional year (can be two or four digits). W is the letter "W" for "Week"; nn is the week number ranging from 0 to 53; and dd is the optional day of the week. The dashes are optional separators, but they are the only separators that can be used in ISO 8601 week values and are used between the year, the week number, and the day (if present). The width specification, w, can range from 3 to 200 and tells SAS what to expect from the character string being processed according to the following table:

Width (w)

Pattern Expected

Example

Comment

3–4

Wnn

W08

W is the letter "W," and nn is the week number. If a year is not specified, the year is considered to be the current year. If a day is not given, the day is considered to be the first day of the week.

5–6

yyWnn

14W08

yy is a two-digit year, W is the letter "W," and nn is the week number. Without a day, the day is considered to be the first day of the week.

7–8

yyWnndd

14W0803

yy is a two-digit year, W is the letter "W," nn is the week number, and dd is the day of the week.

9–10

yyyyWnndd

2014W0803

yyyy is a four-digit year, W is the letter "W," nn is the week number, and dd is the day of the week.

11–200

yyyy-Wnn-dd

2014-W08-03

yyyy is a four-digit year, W is the letter "W," nn is the week number, and dd is the day of the week, each separated by dashes

Specifying any value greater than 11 will have no effect on the date returned, although the implied cursor placement can cause an INPUT statement (not function) to yield unexpected results.

Characters
Read

Informat

Resulting SAS
Date Value

Formatted Date
Using WEEKDATE.

W12

WEEKV3.

19799

Monday, March 17, 2014

14W40

WEEKV5.

19995

Monday, September 29, 2014

15W2801

WEEKV7.

20275

Monday, July 6, 2015

2014W3304

WEEKV9.

19949

Thursday, August 14, 2014

2015-W06-02

WEEKV11.

20122

Tuesday, February 3, 2015

WEEKWw.

WEEKWw. is available as of SAS version 9.3. It is used to translate week values into SAS dates using the W algorithm. The W algorithm calculates weeks based on Monday being the first day of the week without any other restriction. The week number is displayed as a two-digit number from 0 to 53, with a leading zero if necessary. Week values are dates in the form yyyy-Wnn-dd, where yyyy is the optional year (can be two or four digits). W is the letter "W" for "Week"; nn is the week number ranging from 0 to 53; and dd is the optional day of the week. The dashes are optional separators, but they are the only separators that can be used in ISO 8601 week values and are used between the year, the week number, and the day (if present). The width specification, w, can range from 3 to 200 and tells SAS what to expect from the character string being processed according to the following table:

Width (w)

Pattern Expected

Example

Comment

3–4

Wnn

W08

W is the letter "W," and nn is the week number. If a year is not specified, the year is considered to be the current year. If a day is not given, the day is considered to be the first day of the week.

5–6

yyWnn

14W08

yy is a two-digit year, W is the letter "W," and nn is the week number. Without a day, the day is considered to be the first day of the week.

7–8

yyWnndd

14W0803

yy is a two-digit year, W is the letter "W," nn is the week number, and dd is the day of the week.

9–10

yyyyWnndd

2014W0803

yyyy is a four-digit year, W is the letter "W," nn is the week number, and dd is the day of the week.

11–200

yyyy-Wnn-dd

2014-W08-03

yyyy is a four-digit year, W is the letter "W," nn is the week number, and dd is the day of the week, each separated by dashes.

Specifying any value greater than 11 will have no effect on the date returned, although the implied cursor placement can cause an INPUT statement (not function) to yield unexpected results.

Characters
Read

Informat

Resulting SAS
Date Value

Formatted Date
Using WEEKDATE.

W12

WEEKW3.

19806

Monday, March 24, 2014

14W40

WEEKW5.

20002

Monday, October 6, 2014

15W2801

WEEKW7.

20282

Monday, July 13, 2015

2014W3304

WEEKW9.

19956

Thursday, August 21, 2014

2015-W06-02

WEEKW11.

20129

Tuesday, February 10, 2015

3.4.2 Time Informats

HHMMSSw.

HHMMSSw. is available as of SAS version 9.3. It will read time values in the form hh:mm:ss or hhmmss, where hh is hours, mm is minutes, and ss is seconds, with an optional separator. This informat will ignore fractional seconds. The default width for this format is 8, but w can range from 1 to 20. When there are six digits in the string being read, the first two digits will be translated into hours, the second two into minutes, and the last two into seconds.

Characters
Read

Informat

Resulting SAS
Time Value

Formatted Time
Using TIME8.

143500

HHMMSS6.

52500

14:35:00

971406

HHMMSS6.

350046

97:14:06

000339

HHMMSS6.

219

0:03:39

081525

HHMMSS6.

29725

8:15:25

The parsing and interpretation performed by this informat changes if there are fewer than six digits in the string being read and is based on the following rules: If there is an odd number of digits in the string being read, SAS will add a zero to the beginning of the string, and the first two digits will then be translated as hours. SAS will then add zeros to the end of the string until it has six digits and can be translated as hours, minutes, seconds.

Characters
Read

Informat

Resulting SAS
Time Value

Formatted Time
Using TIME8.

What Happened?

1

HHMMSS6.

3600

1:00:00

Because there is an odd number of digits in the string, SAS added a zero to the front of the string, making it "01," and then padded it with zeros out to six digits, making the string that is being read "010000."

11

HHMMSS6.

39600

11:00:00

Here, there are an even number of digits, so SAS does not add a leading zero and reads the first two digits as hours. It pads the remainder of the string with zeros out to six digits, making the string "110000."

112

HHMMSS6.

4320

1:12:00

With the odd number of digits, SAS adds a zero to the beginning of the string, making it "0112," and then pads the string with zeros, so the string being read is "011200."

1127

HHMMSS6.

41220

11:27:00

At four digits, SAS will only pad the string with zeros, making the string being read "112700."

11274

HHMMSS6.

4394

1:13:14

SAS adds a leading zero because of the odd number of digits in the string. The string that is being read is now "011274." This is translated into one hour, twelve minutes, and seventy-four seconds, which is 1:13:14.

112745

HHMMSS6.

41265

11:27:45

String being read is six digits long. Therefore, no zeros are added at the front or the back of the string.

When there are more than six digits in the string that is being read, SAS will parse the string from right to left. The assumption is that since seconds and minutes normally cycle at 60, they will only ever require two digits. It will translate the last four digits of the string as minutes and seconds. All digits to the left of the final four digits will be translated into hours.

Characters
Read

Informat

Resulting
SAS
Time Value

Formatted
Time Using
TIME16.

What Happened?

172154

HHMMSS10.

62514

17:21:54

Parsed as hours, minutes, seconds. Result is as expected.

1721543

HHMMSS10.

620143

172:15:43

Last four digits are "1543," which is translated as 15 minutes, 43 seconds. Preceding digits ("172") translated as hours.

17215430

HHMMSS10.

6198870

1721:54:30

Last four digits are "5430," which is translated as 54 minutes, 30 seconds. Preceding digits ("1721") translated as hours.

172154305

HHMMSS10.

61976585

17215:43:05

1721543058

HHMMSS10.

619756258

172154:30:58

MSEC8.

MSEC8. reads IBM mainframe time values accurate to the nearest millisecond. The width is 8 because the OS TIME macro and STCK system instructions store their time values in 8 bytes.

PDTIME4.

PDTIME4. converts packed-decimal time values contained in SMF and RMF records produced by IBM mainframe systems to SAS time values. The width is shown as 4 because SMF and RMF records are 4 bytes long. While the informat RMFSTAMP8. also reads packed decimal RMF records, RMFSTAMP8. reads both the 4 bytes of time and 4 bytes of date information contained in the RMF record and creates a SAS datetime value from it. PDTIME4. only reads the 4 bytes of time information from an RMF record and creates a SAS time value.

RMFDUR4.

RMFDUR4. converts IBM mainframe RMF duration records into SAS time values. The width is shown as 4 because RMF records are 4-byte-long packed hexadecimal records.

STIMERw.

STIMERw. reads times produced by the STIMER System option in the SAS log. This informat has no default width. It reads times and interprets them based on colons and decimal points. If there is one colon, the first two digits are minutes and the last two are seconds. If there are two colons, the digits preceding the first colon are hours, the next set of two digits is minutes, and the last two are seconds. If there is a decimal point, the value following the decimal point is translated as a decimal fraction of seconds. It can read time values in the following formats, where hhcorresponds to hours, mm corresponds to minutes, ss corresponds to seconds, and ff corresponds to decimal fractions of seconds.

ss

ss.ff

mm:ss

mm:ss.ff

hh:mm:ss

hh:mm:ss.ff

Characters Read

Informat

Resulting SAS Time Value (seconds)

Formatted Time Using TIME11.1.

Comments

33

STIMER2.

33.00

0:00:33.00

When there is no colon in the input string, it is translated as ss.

51.60

STIMER5.

51.60

0:00:51.60

The decimal point causes the translation to be ss.ff.

14:05

STIMER5.

845.00

0:14:05.00

One colon is interpreted as mm:ss.

3:11.03

STIMER7.

191.03

0:03:11.03

One colon, and a decimal point is translated as mm:ss.ff.

1:19:21

STIMER7.

4761.00

1:19:21.00

Two colons are translated as hh:mm:ss.

1:46:17.74

STIMER11.

6377.74

1:46:17.74

Two colons and a decimal point, and the translation is hh:mm:ss.ff.

TIMEw.

TIMEw. will read times in the form hh:mm:ss.ff, where hh indicates the hours, mm is minutes, and ss is the number of seconds. They must be separated by a special character, such as a colon (:), period (.), or hyphen (-). ff indicates decimal fractions of seconds and must be separated from the seconds by a decimal point (.). Both seconds and their decimal fractions are assumed to be zero if they are not present. This informat can read a.m. and p.m. time values. If hh is greater than 24, and/or mm and ss are greater than 60, the time value read will give the correct number of seconds, even if it is greater than 86399.99 (the number of seconds in a day). This informat will parse the time string being read as hours, minutes, seconds from left to right. Therefore, if you are attempting to read only minutes and seconds, you must have leading zeros for the hours, and values for both minutes and seconds. Otherwise, your value will be translated as hours:minutes, not minutes:seconds. w ranges from 5 to 32 with a default of 8.

Characters
Read

Informat

Resulting SAS Time Value

Formatted Time Using TIME11.1.

Comments

1

124:46

TIME6.

449160.0

124:46:00.0

124 hours, 46 minutes

2

14:11:03.3

TIME10.

51063.3

14:11:03.3

3

08-15

TIME5.

29700.0

8:15:00.0

4

00:10

TIME5.

600.0

0:10:00.0

parsed as hours:minutes

5

00:10:42

TIME8.

642.0

0:10:42.0

parsed as hours, minutes, seconds

6

10.42

TIME10.

38520.0

10:42:00.0

parsed as hours:minutes despite the period separator

7

00:10:42.5

TIME10.

642.5

0:10:42.5

hours:minutes:seconds.fractional seconds

TODSTAMP8.

TODSTAMP8. converts an eight-byte time-of-day stamp produced by the OS TIME macro or the STCK instruction on IBM mainframes into a SAS time value. The width is 8 because these calls return eight-byte time-of-day values. Use this informat when you are reading IBM mainframe time-of-day values on other operating systems.

TU4.

TU4. converts IBM mainframe timer units (38,400 timer units per second) to SAS time values. It is used when reading IBM mainframe timer unit values under other operating systems. The width is 4 because the OS TIME macro returns a four-byte word.

3.4.3 Datetime Informats

B8601CIw.d

B8601CIw.d is available as of SAS version 9.3. It reads IBM time values with a century marker of the form cyymmddhhmmss<fff>, where c represents the century digit. The century digit is calculated by subtracting 1900 from the current year, dividing by 100, and dropping the remainder.yy is the two-digit year from 00 to 99, the mm represents the number of the month, and dd represents the day of the month. The time is represented by hhmmss<fff>, where hh indicates the hours, mm is minutes, ss is the number of seconds, and fff indicates thousandths of seconds. wranges from 10 to 26, with a default value of 16, while d ranges from 0 to 6 for the fractional part of seconds.

Characters
Read

Informat

Resulting SAS
Datetime Value

Formatted Datetime
Using DATETIME19.3.

21504231905

B8601CI16.

4901108700.000

23APR15:19:05:00.00

1560928053505

B8601CI16.

3052964105.000

28SEP56:05:35:05.00

2140630102416454

B8601CI19.3

4875416656.454

30JUN14:10:24:16.45

2131216094500

B8601CI16.

4858479900.000

16DEC13:09:45:00.00

B8601DJw.d

B8601DJw.d is available as of SAS version 9.3. It reads datetimes in standard Java date and time notation. yyyymmddhhmmss<ffffff>, where yyyy is the four-digit year, mm represents the number of the month, dd represents the day of the month, and time is represented by hhmmss<ffffff>, where hh indicates the hours, mm is minutes, ss is the number of seconds, and ffffff indicates millionths of seconds. w ranges from 10 to 26, with a default value of 16, while d ranges from 0 to 6 for the fractional part of seconds. The following table gives examples of how to apply this informat to yield the SAS date value that corresponds to the text shown in each line.

Characters
Read

Informat

Resulting SAS
Datetime Value

Formatted Datetime
Using DATETIME19.5.

201607181108

B8601DJ16.

1784459280.0

18JUL16:11:08:00.00000

20141123054509

B8601DJ16.

1732340709.0

23NOV14:05:45:09.00000

201303070814433064

B8601DJ21.4

1678263283.3064

07MAR13:08:14:43.30640

201406241630254

B8601DJ16.1

1719246625.4

24JUN14:16:30:25.40000

DATETIMEw.d

DATETIMEw.d reads SAS datetime values. The datetime value must be in the form ddmonyy(yy), followed by a blank or a special character, and then the time in the format hh:mm:ss.ff. dd represents the day of the month, mon is the three-letter month abbreviation, and yy(yy) is the two- or four-digit year. hh indicates the number of hours, mm is the number of minutes, and ss is the number of seconds. ff indicates fractional parts of seconds. Both seconds and fractional seconds are assumed to be zero if they are not present. w can be from 13 to 40, with a default of 18, while d can range from 0 to 6.

If you use a two-digit year, SAS will apply the YEARCUTOFF= system option in translating the year. This informat can also read a.m. and p.m. time values.

Characters
Read

Informat

Resulting SAS
Datetime Value

Formatted Datetime
Using DATETIME22.1.

22APR2014 5:23 PM

DATETIME18.

1713806580.0

22APR2014:17:23:00.0

22APR2014-17:23

DATETIME16.

1713806580.0

22APR2014:17:23:00.0

22APR2014:05:23:15 PM

DATETIME22.

1713806595.0

22APR2014:17:23:15.0

22APR2014/17:23:15.6

DATETIME21.1

1713806595.6

22APR2014:17:23:15.6

MDYAMPMw.d

MDYAMPMw.d reads datetime values in the form of mm-dd-yy(yy) hh:mm:ss.ff AM|PM, where mm represents the number of the month, dd represents the day of the month, and yy(yy) is the two- or four-digit year, followed by the time where hh represents hours, mm represents minutes,ss represents optional seconds, ff represents optional fractional seconds, and AM|PM indicates a.m. or p.m. The separators are not optional and can be a hyphen (-), or period (.), or slash (/), or colon (:). The default value of w is 19 and ranges from 8 to 40, while d ranges from 0 to 39 for the fractional part of seconds. The following table gives examples of how to apply this informat to yield the SAS date value that corresponds to the text shown in each line. Line 4 shows that this format can translate twenty-four-hour clock values without the AM|PM indicator.

Characters
Read

Informat

Resulting SAS
Datetime Value

Formatted Datetime
Using DATETIME19.2.

1

05-08-2015 9:33 AM

MDYAMPM18.

1746696780.00

08MAY15:09:33:00.00

2

10-26-2014 5:00 PM

MDYAMPM18.

1729962000.00

26OCT14:17:00:00.00

3

04-25-2013 1:00:57.4 PM

MDYAMPM22.3

1682514057.40

25APR13:13:00:57.40

4

08-09-2017 14:15

MDYAMPM17.

1817907300.00

09AUG17:14:15:00.00

RMFSTAMP8.

RMFSTAMP8. converts IBM mainframe RMF date and time records into SAS datetime values. The width is shown as 8 because RMF records are packed-decimal records with 4 bytes of time information followed by 4 bytes of date information. While PDTIME4. also reads RMF records, it only reads the time portion of the RMF record and produces a SAS time value. RMFSTAMP8. reads both the date and time portions of the RMF record and produces a SAS datetime value.

SHRSTAMP8.

SHRSTAMP8. converts IBM mainframe SHR date and time records into SAS datetime values. The width is shown as 8 because SHR records are packed-decimal records with 4 bytes of date information followed by 4 bytes of time information.

SMFSTAMP8.

SMFSTAMP8. converts IBM mainframe SMF records into SAS datetime values. The width is shown as 8 because SMF records are packed-decimal records with 4 bytes of time information followed by 4 bytes of date information. This enables you to read SMF records without regard to operating system.

YMDDTTMw.d

YMDDTTMw.d reads datetime values in the form <yy>yy-mm-dd hh:mm:ss.ff, where special characters such as a hyphen (-), period (.), slash (/), or colon (:) are used to separate the year, month, day, hour, minute, and seconds. Separators are required for each component. The year can be either two or four digits. yy(yy) is the two- or four-digit year, mm represents the number of the month, and dd represents the day of the month. The time follows, where hh represents hours, mm represents minutes, ss represents optional seconds, and ff represents optional fractional seconds, which must be separated from the seconds by a decimal point (.). This informat will translate a.m. and p.m. time values. The default value of w is 19 and can range from 13 to 40, while d ranges from 0 to 39 for the fractional part of seconds.

Characters
Read

Informat

Resulting SAS
Datetime Value

Formatted Datetime
Using DATETIME22.1.

2014-07-18-11-08-05

YMDDTTM19.

1721300885.0

18JUL2014:11:08:05.0

2014/01/11:05:19:45

YMDDTTM19.

1705036785.0

11JAN2014:05:19:45.0

1983.02.18.05:59 PM

YMDDTTM20.

730058340.0

18FEB1983:17:59:00.0

2015-08-23-14:24:16.5

YMDDTTM22.1

1755959056.5

23AUG2015:14:24:16.5

3.4.4 The "ANYDATE" Series of Informats

SAS 9 addressed an issue with the processing of dates and times that has always affected SAS users. Although informats handle the translation of a string of characters into SAS date and time values easily, in order to use them you had to know what the string of characters looked like before you processed them. Given the many ways that dates and times can be represented, it was not uncommon for several records to have incorrect values after processing because of an error in the underlying text strings being translated. For example, if you expect the dates to be in the form "ddMOMyyyy," but halfway through the file the strings were entered as "mmddyyyy," at least half of your resulting data set will have missing date values. The DATE. informat cannot read strings formatted to be read with the MMDDYY. informat, and vice versa. There are now three informats that will intelligently and, for the most part, successfully enable you to avoid this problem.

The ANYDTDTE., ANYDTDTM., and ANYDTTME. informats will translate dates, datetime values, and time values, respectively, into their corresponding SAS values. This translation will be performed without having to know the representation of these date, datetime, and time values in advance. Some limits exist as to the types of representations these informats will be able to translate, and in this era of big data you should also be aware that using these informats will require more CPU time than if you are able to use one of the regular informats to process your data.

The potential for confusion exists with DDMMYY, MMDDYY, and YYMMDD values, especially in the presence of two-digit year values. The SAS option DATESTYLE indicates how such confusions will be resolved. The possible values for the DATESTYLE= system option are shown in the following table.

Value

Explanations

MDY

Sets the default order as month, day, year. "10-03-12" would be translated as October 3, 2012.

YMD

Sets the default order as year, month, day. "10-03-12" would be translated as March 12, 2010.

DMY

Sets the default order as day, month, year. "10-03-12" would be translated as March 10, 2012.

LOCALE
(default)

Sets the default value according to the LOCALE= system option. When the default value for the LOCALE= system option is "English_US," this sets DATESTYLE to MDY. Therefore, by default, "10-03-12" would be translated as October 3, 2012.

Note: YDM, MYD, and DYM have been removed from SAS 9.3 and later versions. If you continue to use them in legacy code, no error will appear in the log, but any input strings that cannot be translated without knowing the DATESTYLE (for example, "10-03-12") will be translated as missing.

What happens when the input string cannot be translated using the DATESTYLE option in effect? The ANYDT series of informats will test the input string to see whether one of the other DATESTYLE options will work. If only one DATESTYLE produces a valid SAS date or datetime value, then SAS will process the input string using that DATESTYLE. However, it is entirely possible that the input string can be translated using more than one of the remaining DATESTYLE options. For example, the input string "270328" cannot be translated using MDY, but both YMD and DMY will produce a valid SAS date. SAS has an internal priority of which DATESTYLE takes precedence when the option in effect will not work, and two or more DATESTYLES will. It is based on the DATESTYLE option in effect. The following table details how the DATESTYLE is chosen if the DATESTYLE option in effect does not yield a valid SAS date value.

DATESYTLE Option in Effect

Input String

DATESTYLEs Possible

DATESTYLE Used

Result

MDY

170514

YMD, DMY

YMD

May 14, 2017

YMD

110845

DMY, MDY

DMY

August 11, 1945

DMY

No ambiguity is possible. DMY will always take precedence.

In short, when the DATESTYLE is MDY, then any ambiguity will be resolved using YMD, and when the DATESTYLE is YMD, any ambiguity is resolved using DMY.

ANYDTDTEw.

ANYDTDTEw. will translate data that can be read with the following informats: DATE, DATETIME, DDMMYY, JULIAN, MDYAMPM, MMDDYY, MMxYY, MONYY, TIME, YMDDTTM, YYMMDD, YYMMn, or YYQ into SAS date values. This informat can also work with the following standard date formats: "February 25, 2012," "08/2014," or "2015-05." w can range from 5 to 32, and the default width is 9. This informat extracts date values from a datetime string. However, if only a time value is given, the date is assumed to be January 1, 1960. The following table uses OPTIONS DATESTYLE=MDY as the default interpretation for two-digit year values.

Characters
Read

Informat

Resulting SAS Date Value

Formatted Date
Using WORDDATE.

Comments

05172014

ANYDTDTE8.

19860

May 17, 2014

20140517

ANYDTDTE8.

19860

May 17, 2014

2014Q2

ANYDTDTE6.

19814

April 1, 2014

First day of the 2nd quarter.

051714

ANYDTDTE6.

19860

May 17, 2014

17052014

ANYDTDTE8.

19860

May 17, 2014

170514

ANYDTDTE6.

20953

May 14, 2017

The date string can be interpreted as YMD or DMY. YMD is used because the DATESTYLE in effect for this example is MDY.

17MAY2014:15:12:06

ANYDTDTE18.

19860

May 17, 2014

15:12:06

ANYDTDTE8.

0

January 1, 1960

Time value, date is considered to be January 1, 1960.

2014137

ANYDTDTE7.

19860

May 17, 2014

MAY2014

ANYDTDTE7.

19844

May 1, 2014

No day is given; day is set to first day of month.

17MAY2014

ANYDTDTE9.

19860

May 17, 2014

May 17, 2014

ANYDTDTE12.

19860

May 17, 2014

14-05

ANYDTDE9.

19844

May 1, 2014

Two-digit year uses YEARCUTOFF= option. No day is given; day is set to first day of month.

2014-05

ANYDTDTE7.

19844

May 1, 2014

No day is given; day is set to first day of month.

05-2014

ANYDTDTE7.

19844

May 1, 2014

No day is given; day is set to first day of month.

05-17-2014 3:12:06 PM

ANYDTDTE21.

19860

May 17, 2014

2014-05-17-15:12:06

ANYDTDTE19.

19860

May 17, 2014

ANYDTDTMw.

ANYDTDTMw. will translate data that can be read with the following informats: DATE, DATETIME, DDMMYY, JULIAN, MDYAMPM, MMDDYY, MMxYY, MONYY, TIME, YMDDTTM, YYMMDD, YYMMn, or YYQ into SAS datetime values. This informat can also work with the following standard date formats: "February 25, 2012," "08/2014," or "2015-05." w can range from 1 to 32, and the default width is 19.

This informat will parse time values from input strings based on colons and periods. If there is one colon (for example, 15:12), the first two digits are translated as hours and the last two as minutes. If there are two colons (for example, 15:12:06), the digits preceding the first colon are translated as hours, the next set of two digits as minutes, and the last two as seconds. If there is a single decimal point and one or more colons (for example, 12:06.5), the value following the decimal point is translated as a decimal fraction of seconds, and the value preceding the decimal point is considered to be seconds. A single colon is therefore interpreted as minutes:seconds.fractional seconds. In order to account for hours, the correct format would be hours:minutes:seconds.fractional seconds.

If there are multiple decimal points (for example, 15.12.06), then they are considered to be delimiters for date values and thus are not translated as time values.

This informat extracts datetime values from a string. If only a time value is given, the date is assumed to be January 1, 1960. If only a date value is given, the time is assumed to be midnight (0:00). The following table uses the same input data as is used in the ANYDTDTE. informat example above.

Characters
Read

Informat

Resulting SAS
Datetime Value

Formatted Datetime
Using DATETIME22.

05172014

ANYDTDTM.

1715904000

17MAY2014:00:00:00

20140517

ANYDTDTM.

1715904000

17MAY2014:00:00:00

2014Q2

ANYDTDTM.

1711929600

01APR2014:00:00:00

051714

ANYDTDTM.

1715904000

17MAY2014:00:00:00

17052014

ANYDTDTM.

1715904000

17MAY2014:00:00:00

170514

ANYDTDTM.

1810339200

14MAY2017:00:00:00

17MAY2014:15:12:06

ANYDTDTM.

1715958726

17MAY2014:15:12:06

15:12:06

ANYDTDTM.

54726

01JAN1960:15:12:06

2014137

ANYDTDTM.

1715904000

17MAY2014:00:00:00

MAY2014

ANYDTDTM.

1714521600

01MAY2014:00:00:00

17MAY2014

ANYDTDTM.

1715904000

17MAY2014:00:00:00

May 17, 2014

ANYDTDTM.

1715904000

17MAY2014:00:00:00

2014-05

ANYDTDTM.

1714521600

01MAY2014:00:00:00

05-2014

ANYDTDTM.

1714521600

01MAY2014:00:00:00

14-05

ANYDTDTM.

1714521600

01MAY2014:00:00:00

05-17-2014 3:12:06 PM

ANYDTDTM.

1715958726

17MAY2014:15:12:06

2014-05-17-15:12:06

ANYDTDTM21.

1715958726

17MAY2014:15:12:06

ANYDTTMEw.

ANYDTTMEw. will translate data that can be read with the following informats: DATE, DATETIME, DDMMYY, JULIAN, MDYAMPM, MMDDYY, MMxYY, MONYY, TIME, YMDDTTM, YYMMDD, YYMMn, or YYQ into SAS time values. This informat can also work with the following standard date formats: "February 25, 2012," "08/2014," or "2015-05." w can range from 1 to 32, and the default width is 8. ANYDTTMEw. can extract time values from a datetime value. However, if only a date value is given, the time is assumed to be 12:00 a.m.

Characters
Read

Informat

Resulting SAS Time Value

Formatted Time Using TIME8.

Comments

17MAY2014:15:12:06

ANYDTTME18.

54726

15:12:06

Extracted time from datetime string.

15:12:06

ANYDTTME.

54726

15:12:06

05-17-2014 3:12:06 PM

ANYDTTME21.

54726

15:12:06

Extracted time from datetime string, processed AM/PM correctly.

2014-05-17-15:12:06

ANYDTTME18.

54726

15:12:06

Extracted time from datetime string.

08.05

ANYDTTME.

.

.

Times are delimited with colons (:).

08:05

ANYDTTME.

29100

8:05:00

08:05:05

ANYDTTME.

29105

8:05:05

17:15

ANYDTTME.

62100

17:15:00

2:45 PM

ANYDTTME.

53100

14:45:00

6 AM

ANYDTTME.

.

.

Need a delimiter to provide context for translation.

6:00 AM

ANYDTTME.

21600

6:00:00

27:36:58

ANYDTTME.

99418

27:36:58

3.4.5 So Why Not Just Use the "ANYDATE" Series of Informats?

It is tempting to automatically use ANYDTDTE., ANYDTTM., or ANYDTTME. to process strings representing dates, datetimes, and times. You do not have to worry about the formatting of the input string, and SAS will make sense out of it. The "ANYDATE" informats do have limitations, and the rules that SAS follows might not be the rules that you need to apply.

First, there is the matter of the additional processing needed. These informats go through a decision tree to determine how to translate every single value encountered; therefore, the amount of additional processing will increase with the number of times each "ANYDATE" informat has to be used. This would have a negligible impact on a small amount of data, but if you need to use them on big data, you might want to consider standardizing the representation of your date, time, and datetime text values beforehand and using the corresponding informat. Second, it is entirely possible that you would want to consider a nonstandard value erroneous and don't want SAS to decide what to do with it without you being able to inspect it first. Third, exceptions can occur, even when the use of an ANYDATE informat is warranted. While it is always a good idea to check all data that you are converting to SAS from another source and especially when you are converting dates, times, and datetime values, it is critical if you are using the ANYDT. informats.

Finally, it is important to note that the ANYDATE series of formats are designed to handle the issue of varying ways of representing dates, but they, like all other date, time, and datetime informats, are not designed to handle dirty data. Missing separators, misspelled words, and missing date components are a few of the major data problems that are encountered when processing dates, times, and datetimes from sources other than SAS. Unless you specifically provide code to fix those problems on the fly or can ensure that your dates are clean before processing them, you will wind up with missing date, time, or datetime values in your data set. SAS programmers have written a great deal of code to handle this specific issue; King and Fleming show one such approach in the 2011 SAS Global Forum proceedings (seehttp://support.sas.com/resources/papers/proceedings11/117-2011.pdf). Example 3.7 is a skeleton of an approach using a user-written function and the ANYDATE. informat.

Example 3.7: A Simple Function to Handle Dates with Missing Separators

LIBNAME control "c:\book\2ndEd\examples";

ODS ESCAPECHAR='~';

OPTIONS CMPLIB=(control.functions);

PROC FCMP OUTLIB=control.functions.dates;

FUNCTION makedate(cdate $)$;

LENGTH dtstr $42

supr $16;

supr=' ';

* assume that date is in MMDDYY form, but check;

* missing dates often have a form of __/__/____;

IF SUBSTR(cdate,7,4) NOT IN (' ','____') THEN DO;

* assume year is good;

yy=INPUT(SUBSTR(cdate,7,4),4.);

m=SUBSTR(cdate,1,2);

IF '01' <= m <= '12' THEN DO;

mm=INPUT(m,2.);

* Month is good chk day;

* Use INTNX to find the maximum number of days in current month;

d=SUBSTR(cdate,4,2);

IF '01' <= d <= PUT(DAY(INTNX('MONTH',MDY(mm,1,yy),0,'e') ),z2.)

THEN dd=INPUT(d,2.);

ELSE DO;

* Day is bad for this month;

* Fix by dividing number of days in month by 2;

dd = CEIL(DAY(INTNX('MONTH',mdy(mm,1,yy),0,'e'))/2);

supr = '~{super Day}';

END;

END;

ELSE DO;

* Month is bad - reset month and day;

mm=6;

dd=31;

supr = '~{super Month}';

END;

* Create the date value;

date=MDY(mm,dd,yy);

dtstr = CATT(PUT(date,MMDDDYY10.),supr);

END;

ELSE DO;

* yr is bad;

dtstr = 'Unknown~{super Y}';

END;

RETURN(dtstr);

ENDSUB;

RUN;

QUIT;

PROC FORMAT;

VALUE $mkdt

OTHER=[makedate()];

RUN;

The code in Example 3.7 created a function to be used as a format for our dates, which will impute a replacement date string when provided with a date string that may or may not have missing date components. Note that this is a character format because it has to treat the incoming date data as character strings. If you try to read date strings with missing date components as a SAS date, you will get a missing value.

Now let's read in some sample data. The data set RPTDATES contains production error tracking data from a legacy system. Unfortunately, some of the date data was corrupted, so there are missing and incorrect days, months, and years.

DATA rptdates;

INPUT recno @4 rptdt $10. errs;

DATALINES;

1 13/15/2013 374

2 09/08/2013 3

3 02/30/2013 32

4 __/15/2013 15

5 04/27/2013 195

6 05/__/2013 17

7 07/08/____ 20

8 06/04/013 5

9 08/32/2013 4

10 11//2013 6

11 01/23/2031 11

;;;;

RUN;

But now that we've created a function that is to be used as a format, let's use it. Below is a data set with some dates and a count of machine faults occurring on that date. As you can see, the dates have not been entered very well, with missing components (rows 3, 5, and 6) and typographical errors (rows 7 and 9). We're going to use the format $MKDT to impute the dates we can, and annotate each date that was imputed in the output string.

Figure 3.1: RPTDATES Data Set with Invalid Dates

image

Now we will use a PROC REPORT to display the table and format our bad data. In line 2, we alias the original date string as a second column in order to show it with and without the format that we created. We leave the original date string in the variable rptdt and in line 4 we apply the format that we created to the values in the aliased column dt.

1. PROC REPORT DATA=rptdates NOWD SPLIT='\';

2. COLUMN rptdt rptdt=dt errs;

3. DEFINE rptdt / DISPLAY "Original\Date from\Error Log";

4. DEFINE dt / 'Imputed\Date' f=$mkdt36.;

5. DEFINE errs / "Errors\Reported";

6. RUN;

The result of the PROC REPORT is below. The "Imputed Date" column is nothing more than the result of displaying the original date string using the $MKDT. format.

Output

Original
Date from
Error Log

Imputed
Date

Errors
Reported

13/15/2013

06/30/2013Month

374

09/08/2013

09/08/2013

3

02/30/2013

02/14/2013Day

32

__/15/2013

06/30/2013Month

15

04/27/2013

04/27/2013

195

05/__/2013

05/16/2013Day

17

07/08/____

UnknownY

20

06/04/013

UnknownY

5

08/32/2013

08/16/2013Day

4

11//2013

11/15/2013Day

6

01/23/2031

UnknownY

11

This capacity to use PROC FCMP to create functions that process character strings and then use the function you create in a custom informat (or format, for that matter) gives you the ability to create custom date informats. With a little creativity on your part, you can use this method to solve some of the more difficult date processing challenges that you may encounter.