Carriage Returns and SAS under Unix
One of the differences between Windows, Mac and Unix computers is the way
these operating systems mark the end of each line in a text file. Files
created on Windows end each line with a carriage return character followed by
a newline character; on a Mac, lines traditionally end with only a carriage
return; and files created under Unix (or Linux) end each line with only a
newline character. If you create a file on Windows (say by saving an Excel
spreadsheet as a comma-separated file, or by editing a data file in Notepad),
the extra carriage return characters will confuse SAS when it runs under Unix,
and you may see a message like this:
NOTE: Invalid data for d in line 1 7-8.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+---
1   CHAR   1 2 3 4. 8
    ZONE   32323230
    NUMR   1020304D
a=1 b=2 c=3 d=. _ERROR_=1 _N_=1
The period (.) after the 4 in the line labeled CHAR in the above
log extract indicates that SAS saw a "non-printing" character; the 0 and
D below the period, in the lines labeled ZONE and NUMR, indicate that the
offending character has a hexadecimal value of 0D, which corresponds to a
carriage return character. The upshot is that the last variable on each line
(except perhaps the very last, if that line has no line ending) will not be
read correctly.
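If you would rather check a file for carriage returns before handing it to SAS, a few standard Unix commands will do it. What follows is only a minimal sketch: the file name sample.txt and the scratch directory are made up for illustration, and the sample data simply mimics the log extract above.

```shell
# Work in a scratch directory so no real data files are touched.
workdir=$(mktemp -d)
sample="$workdir/sample.txt"   # hypothetical file name, for illustration only

# Write two lines with Windows-style (carriage return + newline) endings.
printf '1 2 3 4\r\n5 6 7 8\r\n' > "$sample"

# Dump the bytes in hexadecimal: 0d is a carriage return, 0a is a newline.
od -An -tx1 "$sample"

# Keep only the carriage returns and count them; a Unix-style file gives 0.
cr_count=$(tr -cd '\r' < "$sample" | wc -c | tr -d ' ')
echo "carriage returns found: $cr_count"
```

A count greater than zero suggests the file came from Windows or a Mac and needs one of the fixes described below.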
Similarly, if you create a comma-separated file from Excel on a Mac, SAS may
report that it truncates lines, since it won't recognize the carriage return
as the end of a line. Alternatively, SAS may simply see far fewer observations
than are actually in your data. You may also see something like this in the
log:
NOTE: Invalid data for d in line 1 7-9.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+---
1   CHAR   1 2 3 4.5 6 7 8. 16
    ZONE   3232323032323230
    NUMR   1020304D5060708D
a=1 b=2 c=3 d=. _ERROR_=1 _N_=1
The hexadecimal value of 0D in the middle of the line is an indication
that SAS is having trouble reading your file because the line endings came from
a Mac.
To solve these kinds of problems, you can use a special type of fileref known as a
pipe. With a pipe, the input to your program is the output of some other program,
rather than simply the contents of a file on your computer or on the internet.
The program we'll use is the Unix utility tr, whose name is an abbreviation
for translate.
Suppose the data file created by Windows is called mydata.txt. We can
create the appropriate fileref as follows:
filename mydata pipe "tr -d '\r' < mydata.txt";
The "tr -d '\r' < " part of the command never changes; the only
part that changes is the name of the file. Once you've created this fileref, you
can use it in the infile statement to make sure that when SAS reads your
data, the carriage returns are removed.
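To see what the piped command does before relying on it inside SAS, you can run the same tr command by hand. This is a sketch under assumptions: the scratch directory and the two-line sample data are invented for the demonstration, and the file is only named mydata.txt to match the example above.

```shell
# Build a throwaway Windows-style file in a scratch directory.
workdir=$(mktemp -d)
windows_file="$workdir/mydata.txt"   # stand-in for the real data file
printf '1 2 3 4\r\n5 6 7 8\r\n' > "$windows_file"

# The same command the fileref runs: delete every carriage return.
cleaned=$(tr -d '\r' < "$windows_file")

# Confirm that no carriage returns survive in the cleaned text.
remaining_cr=$(printf '%s' "$cleaned" | tr -cd '\r' | wc -c | tr -d ' ')

printf '%s\n' "$cleaned"
echo "carriage returns left: $remaining_cr"
```

Note that tr only writes the cleaned text to its output; the original file is left unchanged, which is why SAS reads the result through the pipe.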
To fix a file created on a Mac, we use a similar pipe:
filename mydata pipe "tr '\r' '\n' < mydata.txt";
As in the Windows example, the "tr '\r' '\n' <" part of the command
never changes, and you can use the fileref anywhere that you would ordinarily
use a filename.
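The effect of the Mac translation can also be checked from the command line by counting lines before and after. Again this is only an illustrative sketch: the scratch directory and sample data are assumptions, not part of any real data set.

```shell
# Build a throwaway Mac-style file: each line ends with a carriage return only.
workdir=$(mktemp -d)
mac_file="$workdir/mydata.txt"   # stand-in for the real data file
printf '1 2 3 4\r5 6 7 8\r' > "$mac_file"

# wc -l counts newline characters, so the untranslated file appears to have
# no complete lines at all.
lines_before=$(wc -l < "$mac_file" | tr -d ' ')

# The same command the fileref runs: turn each carriage return into a newline.
lines_after=$(tr '\r' '\n' < "$mac_file" | wc -l | tr -d ' ')

echo "lines before: $lines_before, lines after: $lines_after"
```

The jump in the line count mirrors what happens in SAS: without the translation the whole file looks like one long line, which is why observations go missing.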
File translated from TeX by TTH, version 3.67, on 7 Aug 2008, 11:12.