Carriage Returns and SAS under Unix

One of the differences between Windows, Mac and Unix computers is the way that these operating systems mark the end of lines in text files. Files created on Windows end each line with a carriage return character followed by a newline character; files in the traditional Mac format have only a carriage return at the end of each line (Mac OS X itself uses Unix-style newlines, but some Mac applications still write the older format), while files created under Unix (or Linux) have only a newline character at the end of each line. If you create a file on Windows (say, by saving an Excel spreadsheet as a comma-separated file, or by editing a data file in Notepad), the extra carriage return characters will confuse SAS, and you may see a message like this:

NOTE: Invalid data for d in line 1 7-8.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+---

1   CHAR  1 2 3 4. 8
    ZONE  32323230
    NUMR  1020304D
a=1 b=2 c=3 d=. _ERROR_=1 _N_=1

The period (.) after the 4 in the line labeled CHAR in the above log extract indicates that SAS saw a "non-printing" character; the 0 and D below the period indicate that the offending character has a hexadecimal value of 0D, which corresponds to a carriage return character. The upshot is that the last variable on each line (except for the very last line) will not be read correctly.
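You can also check a file for carriage returns from the Unix shell before handing it to SAS. The following is a minimal sketch rather than part of the procedure described here: the file name sample_win.txt and its contents are made up for illustration, and it relies only on the standard utilities printf, od, tr and wc.

```shell
# Build a hypothetical two-line sample file with Windows (CR+LF) line endings.
printf '1 2 3 4\r\n5 6 7 8\r\n' > sample_win.txt

# od -c prints the file byte by byte; a carriage return (hex 0D) is shown
# as \r and a newline as \n.
od -c sample_win.txt

# Delete every byte except carriage returns, then count what is left.
cr_count=$(tr -cd '\r' < sample_win.txt | wc -c | tr -d ' ')
echo "carriage returns found: $cr_count"

# Remove the sample file again.
rm -f sample_win.txt
```

A count greater than zero means the file contains carriage returns and is likely to produce the kind of log message shown above.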

Similarly, if you create a comma-separated file from Excel on a Mac, SAS may report that it truncates lines, since it won't recognize the carriage return as an actual newline. Alternatively, SAS may simply see far fewer observations than were actually in your data. You may also see something like this in the log:

NOTE: Invalid data for d in line 1 7-9.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+---

1   CHAR  1 2 3 4.5 6 7 8. 16
    ZONE  3232323032323230
    NUMR  1020304D5060708D
a=1 b=2 c=3 d=. _ERROR_=1 _N_=1

The hexadecimal value of 0D in the middle of the line is an indication that SAS is having trouble reading your file because the line endings came from a Mac.

To solve these kinds of problems, you can use a special type of fileref known as a pipe. With a pipe, the input to your program is the output of some other program, rather than the contents of a file on your computer or the internet. The program we'll use is the Unix utility tr, whose name is short for translate.

Suppose the data file created by Windows is called mydata.txt. We can create the appropriate fileref as follows:

filename mydata pipe "tr -d '\r' < mydata.txt";

The "tr -d '\r' < " part of the command never changes; the only part that changes is the name of the file. Once you've created this fileref, you can use it in the infile statement to make sure that when SAS reads your data, the carriage returns are removed.
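To see what the pipe hands to SAS, you can run the same tr command directly in a Unix shell. This is only a sketch under stated assumptions: sample_win.txt is a hypothetical two-line file created for the demonstration, standing in for your own data file.

```shell
# Create a hypothetical two-line sample with Windows (CR+LF) line endings.
printf '1 2 3 4\r\n5 6 7 8\r\n' > sample_win.txt

# Run the same command that the pipe fileref runs and keep its output.
cleaned=$(tr -d '\r' < sample_win.txt)
printf '%s\n' "$cleaned"

# Count the carriage returns left in the output; none should remain.
remaining=$(printf '%s' "$cleaned" | tr -cd '\r' | wc -c | tr -d ' ')
echo "carriage returns remaining: $remaining"

# Remove the sample file again.
rm -f sample_win.txt
```

Because the original file is only read, never modified, the Windows copy stays intact and SAS sees the cleaned text each time the fileref is used.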

To fix a file created on a Mac, we use a similar pipe:

filename mydata pipe "tr '\r' '\n' < mydata.txt";

As in the Windows example, the "tr '\r' '\n' <" part of the command never changes, and you can use the fileref anywhere that you would ordinarily use a filename.
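The effect of the Mac translation can be checked in a Unix shell by counting lines before and after. The sketch below assumes a made-up sample file (sample_mac.txt is a hypothetical name) in which each line ends with a carriage return only; wc -l counts newline characters, so it reports no lines at all until tr has converted the carriage returns.

```shell
# Create a hypothetical two-line sample with carriage-return-only line endings.
printf '1 2 3 4\r5 6 7 8\r' > sample_mac.txt

# Without translation the file contains no newline characters at all.
lines_before=$(wc -l < sample_mac.txt | tr -d ' ')

# After translating each carriage return to a newline, both lines are seen.
lines_after=$(tr '\r' '\n' < sample_mac.txt | wc -l | tr -d ' ')

echo "lines before: $lines_before, lines after: $lines_after"

# Remove the sample file again.
rm -f sample_mac.txt
```

This mirrors the symptom described earlier, where SAS sees far fewer observations than the data actually contains.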

File translated from TeX by TtH, version 3.67.
On 7 Aug 2008, 11:12.