1 General Information
Most procs in SAS will have a data= option to specify which
data set you need to work with. Don't forget that data set options can
be specified as part of this argument. To subset data being used by a proc,
the where statement can be used, just as in a data step.
To process data separately by groups, the by statement can be
used; the data set normally has to be sorted by the by variables first
(for example, with proc sort). The label and format statements are sometimes
helpful for making the output look the way you want it to.
Many statements in SAS allow optional arguments; these are generally
specified after a slash (/) in your SAS program.
When you're trying to produce an output data set from a procedure, don't
overlook the possibility of using the Output Delivery System.
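As a rough illustration of these general features, here is a minimal sketch that assumes a hypothetical data set mydata with a grouping variable group and numeric variables x and y. It combines a data set option (keep=), a where statement, a by statement, and label and format statements in one proc print call. It then uses an ods output statement to capture the statistics table printed by proc means (whose ODS table name is Summary) as a new data set, meanstats.

```sas
/* Hypothetical data set MYDATA with variables GROUP, X and Y. */

/* BY processing needs the data sorted by the BY variable(s). */
proc sort data=mydata out=sorted;
   by group;
run;

/* Data set option, WHERE, BY, LABEL and FORMAT in one procedure. */
proc print data=sorted(keep=group x y) label;
   where x > 0;                  /* subset, just as in a data step */
   by group;                     /* a separate section for each GROUP */
   label x = 'Measured value';   /* used as a header because of the LABEL option */
   format y 8.2;                 /* display Y with two decimal places */
run;

/* ODS: save the table PROC MEANS prints as the data set MEANSTATS. */
ods output Summary=meanstats;
proc means data=mydata mean;
   var x y;
run;
```

If you don't know the name of the table a procedure produces, running it after an ods trace on; statement writes the table names to the log.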
2 proc print
The simplest procedure in SAS is proc print, which displays the
values for some or all variables in a SAS data set. If you don't want
all the variables to be printed, the var statement can be used
to provide a list of the variables you want.
By default, proc print prints an observation number at the beginning
of each line; to suppress it use the noobs option. When more than
one panel is required to display all the variables, SAS uses this observation
number to label the multiple panels. If you'd rather use variables of your
own choosing for this, specify the variables you want to appear on each
panel in the id statement.
To use variable labels instead of names as the column headers, specify the
label option.
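A short sketch of these options, assuming a hypothetical data set mydata containing a character variable name and numeric variables x, y and z. The first call prints only x and y and drops the observation number; the second uses name in place of the observation number on every panel and uses labels as column headers.

```sas
/* Hypothetical data set MYDATA with variables NAME, X, Y and Z. */

/* Only X and Y, without the Obs column. */
proc print data=mydata noobs;
   var x y;
run;

/* NAME replaces the Obs column (and is repeated on each panel);
   column headers come from the labels because of the LABEL option. */
proc print data=mydata label;
   id name;
   var x y z;
   label x = 'Height (cm)' y = 'Weight (kg)';
run;
```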
For more control over the way your data set is displayed, you may want to
investigate the tabulate and report procedures.
3 proc plot and proc gplot
The only required statement for these procedures is the plot
statement. Thus to make a scatter plot with x on the x-axis,
and y on the y-axis, the statements
proc plot data=yourdata;
plot y*x;
run;
could be used. If there's a grouping variable in your data set,
a plot statement like "plot y * x = group" will provide separate
points or lines for each level of group. To overlay several plots
on the same graph, use the overlay option, as in
plot y*x z*x/overlay;
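Putting the grouping and overlay forms together, a minimal sketch (assuming hypothetical numeric variables x, y and z and a grouping variable group in mydata) might look like this. With a grouping variable, proc plot uses the first character of each group value as the plotting symbol; giving each overlaid plot its own plotting character in quotes makes the overlaid plots easier to tell apart.

```sas
/* Hypothetical data set MYDATA with numeric X, Y, Z and a grouping variable GROUP. */
proc plot data=mydata;
   plot y*x = group;                     /* plotting symbol taken from the value of GROUP */
   plot y*x = 'y' z*x = 'z' / overlay;   /* both plots on one set of axes, marked y and z */
run;
quit;   /* proc plot stays active after run; quit ends it */
```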
The difference between plot and gplot is that proc plot
produces a low-resolution plot along with all the rest of SAS's output, while
proc gplot opens a separate graphics window and displays a high-resolution plot. To
save such a plot, go to File -> Export As Image and choose an image type
and name for the saved plot. You can also choose Edit -> Edit Current Graph
to change the graph interactively.
To change the appearance of the points or lines in high-resolution plots, use a
symbol statement before the proc gplot call.
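For example, the following sketch assumes hypothetical numeric variables x and y and a two-level variable group in mydata, with the data sorted by x within each group (interpol=join connects points in the order they appear). The symbol1 definition is used for the first level of group and symbol2 for the second.

```sas
/* Hypothetical data set MYDATA with numeric X and Y and a two-level GROUP. */
goptions reset=all;                             /* clear earlier SYMBOL definitions */
symbol1 value=dot    color=blue interpol=none;  /* first level: unconnected dots */
symbol2 value=circle color=red  interpol=join;  /* second level: circles joined by lines */

proc gplot data=mydata;
   plot y*x = group;
run;
quit;
```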
Among the other graphics procedures available in SAS are gchart, gcontour,
gmap, g3d, and g3grid.
4 proc means and proc summary
As their names imply, these procedures produce summary statistics (such as means and standard
deviations). The difference between the two is that proc means produces printed output by
default (unless the noprint option is given) and produces an output data set only on request,
while proc summary prints nothing unless the print option is given, so it must be instructed
either to print or to produce an output data set.
Internally, the two procedures are the same.
To tell these procedures which statistics they should display, you include keywords on
the proc statement. For example, to produce a listing with the mean, standard
error and maximum for variables x, y, and z in data set
mydata, you could use
the following statements:
proc means data=mydata mean stderr max;
var x y z;
run;
Note that once you request statistics this way, only those statistics are displayed; the
default set (n, mean, std, min and max) is no longer printed.
One very useful feature of these procedures is that they can do analyses on
groups of data without the need to sort the data set first, by using the
class statement instead of the by statement. Any variables
on the class statement will be included in output data sets that
are created, along with the automatic variables _TYPE_ and _FREQ_.
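To make the contrast concrete, here is a sketch assuming a hypothetical data set mydata with a grouping variable group and a numeric variable x. Both approaches give the mean of x for each group, but only the by version needs the sort step first.

```sas
/* Hypothetical data set MYDATA with variables GROUP and X. */

/* BY-group processing: the data must be sorted first. */
proc sort data=mydata out=sorted;
   by group;
run;
proc means data=sorted mean;
   by group;
   var x;
run;

/* CLASS: the same grouping, with no sort needed. */
proc means data=mydata mean;
   class group;
   var x;
run;
```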
To produce an output data set with either procedure, you must use an output
statement. Assuming the same data set and variables as the previous example, here
are some sample output statements:
output out=new mean=mx my mz nmiss=nmx nmy nmz; *- produces mean and nmiss for each variable;
output out=new mean=mx my std=sx sy sz; *- only outputs mean for x and y;
output out=new max=; *- outputs maximum using each variable's name;
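A complete step combining a class statement with the first of these output statements might look like the sketch below, which again assumes the hypothetical data set mydata with numeric x, y and z and a grouping variable group. The nway option keeps only the rows for the full combination of class variables; without it, the output data set also contains a row summarizing the whole data set (distinguished by the value of _TYPE_).

```sas
/* Hypothetical data set MYDATA with GROUP, X, Y and Z. */
proc summary data=mydata nway;
   class group;
   var x y z;
   output out=new mean=mx my mz nmiss=nmx nmy nmz;
run;

/* One row per GROUP, with GROUP, _TYPE_, _FREQ_ and the requested statistics. */
proc print data=new;
run;
```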
5 proc univariate
In earlier versions of SAS, there were several statistics available through
proc univariate that were not available
through proc means, but this is no longer the case. The primary reason to use
proc univariate is that it provides a graphical view of a variable's distribution through
the plot option, which produces a stem-and-leaf plot (or a bar chart for larger data sets),
a box plot, and a normal probability plot. In addition, some users find the layout of
proc univariate's output more useful than the very abbreviated output of proc means.
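A minimal call, assuming a hypothetical numeric variable x in mydata, looks like this; the normal option additionally requests tests of normality.

```sas
/* Hypothetical data set MYDATA with a numeric variable X. */
proc univariate data=mydata plot normal;
   var x;   /* descriptive statistics, plots and normality tests for X */
run;
```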
6 proc freq
The main purpose of proc freq is to produce tabulations.
For each variable listed on the tables statement, SAS will produce output
showing how many times each possible value of that variable was found in a data set.
To produce a cross-tabulation (that is, a
table where the rows represent possible values for one variable, the columns represent
possible values for a second variable, and the value in the table represents the number
of times observations with the specified values of the two variables were found in the
data set), separate the two variable names with an asterisk (*); if more than
two variables are supplied this way, SAS will produce a series of two-way tables of the
last two variables, one for each combination of values of the variables listed before them.
To see multiple tabulations in a concise form (with one column for each variable, and
another column showing counts), use the list option of the tables statement.
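The different forms of the tables statement can be sketched as follows, assuming hypothetical categorical variables sex, group and region in mydata.

```sas
/* Hypothetical data set MYDATA with categorical SEX, GROUP and REGION. */
proc freq data=mydata;
   tables sex;                  /* one-way frequency table */
   tables sex*group;            /* two-way table: rows = SEX, columns = GROUP */
   tables region*sex*group;     /* one SEX by GROUP table for each REGION */
   tables sex*group / list;     /* same counts as above, in list form */
run;
```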
Along with the cross-tabulation, a variety of options to the tables statement (chisq, for
example) will generate statistical tests regarding the independence of the variables being
studied. The out= option can be used with the tables statement to create an output
data set. Statistics for the individual cells of the table (such as row, column, and overall
percentages) are also generated by default; a variety of keywords beginning with the letters
no (such as norow, nocol, and nopercent) are available on the
tables statement to suppress these additional values.
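Combining these options, a sketch assuming hypothetical variables sex and group in mydata requests chi-square tests, suppresses the percentages in each cell, and writes the counts to a data set named counts (which holds the table variables plus COUNT and PERCENT).

```sas
/* Hypothetical data set MYDATA with categorical SEX and GROUP. */
proc freq data=mydata;
   tables sex*group / chisq norow nocol nopercent out=counts;
run;
```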
7 proc reg
proc reg is the basic procedure for performing regression analysis in SAS.
On the model statement, you put the name of your dependent variable(s) on
the left-hand side of an equals sign (=), and list the independent variables
on the right-hand side, separated by spaces. A large number of options are available on the
model statement to specify exactly what information about the regression SAS
will compute and display.
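For instance, a sketch assuming a hypothetical response y and predictors x1 and x2 in mydata adds two model statement options after the slash: clb for confidence limits on the parameter estimates and vif for variance inflation factors.

```sas
/* Hypothetical data set MYDATA with response Y and predictors X1, X2. */
proc reg data=mydata;
   model y = x1 x2 / clb vif;
run;
quit;   /* proc reg stays active after run; quit ends it */
```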
The output statement allows you to create output data sets with predicted
and/or residual values, while the outest= and outsscp= options of
the proc reg statement allow outputting parameter estimates or the
sums of squares and crossproducts matrix, respectively.
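A sketch of all three, again assuming hypothetical variables y, x1 and x2 in mydata. The data set preds contains the input variables plus the new variables yhat (predicted values) and resid (residuals); est holds the parameter estimates and sscp the crossproducts matrix.

```sas
/* Hypothetical data set MYDATA with response Y and predictors X1, X2. */
proc reg data=mydata outest=est outsscp=sscp;
   model y = x1 x2;
   output out=preds p=yhat r=resid;   /* predicted values and residuals */
run;
quit;

proc print data=est;   /* one row of parameter estimates for the model */
run;
```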
File translated from TeX by TTH, version 3.67, on 25 Jul 2008, 11:41.