1  General Information

Most procs in SAS will have a data= option to specify which data set you need to work with. Don't forget that data set options can be specified as part of this argument. To subset data being used by a proc, the where statement can be used, just as in a data step. To process data separately by groups, the by statement can be used. The label and format statements are sometimes helpful for making the output look the way you want it to.
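As a rough sketch of how these pieces fit together, the statements below assume a hypothetical data set mydata with variables group, x, and y; the names, label text, and format are illustrative only. The data set is sorted first because the by statement expects its input in by-group order.
proc sort data=mydata out=sorted;
   by group;
run;

proc print data=sorted(keep=group x y) label;   /* keep= is a data set option */
   where x > 0;                                  /* use only observations with positive x */
   by group;                                     /* separate listing for each group */
   label x='Measurement';                        /* shown because of the label option */
   format y 8.2;                                 /* display y with two decimal places */
run;
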
Many statements in SAS allow optional arguments; these are generally specified after a slash (/) in your SAS program.
When you're trying to produce an output data set from a procedure, don't overlook the possibility of using the Output Delivery System.
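As one possible illustration, the sketch below captures the statistics table from proc means as a data set through ODS. The data set names mydata and stats are hypothetical; Summary is the ODS table name proc means uses for its statistics table, and ods trace on can be used to find the table names that any procedure produces.
ods trace on;                  /* write the names of ODS tables to the log */
ods output Summary=stats;      /* save the Summary table as data set stats */
proc means data=mydata mean std;
   var x y;
run;
ods trace off;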

2  proc print

The simplest procedure in SAS is proc print, which displays the values for some or all variables in a SAS data set. If you don't want all the variables to be printed, the var statement can be used to provide a list of the variables you want.
By default, proc print prints an observation number at the beginning of each line; to suppress it use the noobs option. When more than one panel is required to display all the variables, SAS uses this observation number to label the multiple panels. If you'd rather use variables of your own choosing for this, specify the variables you want to appear on each panel in the id statement.
To use variable labels instead of names as the column headers, specify the label option.
For more control over the way your data set is displayed, you may want to investigate the tabulate and report procedures.
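The proc print options described above might be combined as in the following sketch, which assumes a hypothetical data set mydata containing an identifying variable subject along with x, y, and z:
proc print data=mydata label;      /* label: use variable labels as column headers */
   id subject;                     /* subject replaces the observation number */
   var x y z;                      /* print only these variables */
   label x='First measurement';    /* illustrative label text */
run;
To keep the default layout but drop the observation number, "proc print data=mydata noobs;" could be used instead of the id statement.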

3  proc plot and proc gplot

The only required statement for these procedures is the plot statement. Thus to make a scatter plot with x on the x-axis, and y on the y-axis, the statements
proc plot data=yourdata;
   plot y*x;
run;

could be used. If there's a grouping variable in your data set, a plot statement like "plot y * x = group" will provide separate points or lines for each level of group. To overlay several plots on the same graph, use the overlay option, as in
plot y*x  z*x/overlay;

The difference between plot and gplot is that proc plot produces a low-resolution plot along with all the rest of SAS's output, while proc gplot opens a new window, and displays a high-resolution plot. To save such a plot, go to File -> Export As Image and choose an image type and name for the saved plot. You can also choose Edit -> Edit Current Graph to change the graph dynamically.
To change the appearance of the points or lines in high-resolution plots, use a symbol statement before the proc gplot call.
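For instance, a minimal sketch (the data set yourdata, the variable group, and the particular symbol settings are assumptions chosen for illustration) might look like this:
symbol1 value=dot  color=blue interpol=join;   /* first level of group */
symbol2 value=star color=red  interpol=join;   /* second level of group */
proc gplot data=yourdata;
   plot y*x=group;
run;
quit;

Since interpol=join connects the points in the order they appear in the data set, sorting by x beforehand generally gives cleaner lines.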
Among the other graphics programs available in SAS are gchart, gcontour, gmap, g3d, and g3grid.

4  proc means and proc summary

As their names imply, these procedures produce summary statistics (like means, standard deviations, etc.). The difference between the two is that proc means produces printed output by default, and produces an output data set by request only, while proc summary prints nothing unless the print option is given and is normally used to produce an output data set. Internally, the two programs are the same.
To tell these procedures which statistics they should display, you include keywords on the proc statement. For example, to produce a listing with the mean, standard error and maximum for variables x, y, and z in data set mydata, you could use the following statements:
proc means data=mydata mean stderr max;
   var x y z;
run;

Notice that those are the only statistics which will be displayed.
One very useful feature of these procedures is that they can do analyses on groups of data without the need to sort the data set first, by using the class statement instead of the by statement. Any variables on the class statement will be included in output data sets that are created.
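As a sketch of the class statement, the following assumes that mydata also contains a hypothetical grouping variable named group; no prior proc sort is needed:
proc means data=mydata mean stderr max;
   class group;      /* statistics are computed separately for each level of group */
   var x y z;
run;
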
To produce an output data set with either procedure, you must use an output statement. Assuming the same data set and variables as the previous example, here are some sample output statements:
output out=new mean=mx my mz nmiss=nmx nmy nmz; *- produces mean and nmiss for each variable;
output out=new mean=mx my std=sx sy sz;         *- only outputs mean for x and y;
output out=new max=;                            *- outputs maximum using each variable's name;
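
Placed in a complete step, an output statement might be used as in the sketch below. The grouping variable group is hypothetical, and the nway option is included here so that only the rows for individual levels of group are written (without it, an overall row is written as well). The output data set also contains the automatic variables _TYPE_ and _FREQ_.
proc summary data=mydata nway;
   class group;
   var x y z;
   output out=new mean=mx my mz;   /* one row per level of group */
run;

proc print data=new;               /* proc summary prints nothing by itself */
run;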

5  proc univariate

In earlier versions of SAS, there were several statistics available through proc univariate that were not available through proc means, but this is no longer the case. The primary reason to use proc univariate is that it provides a graphical view of a variable's distribution through the plot and boxplot options. In addition, some users find the layout of proc univariate's output to be more useful than the very abbreviated output of proc means.
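A minimal call, assuming the same hypothetical data set mydata and variable x used in the proc means examples, might look like this; the plot option on the proc statement requests the distribution plots in addition to the tables of statistics:
proc univariate data=mydata plot;
   var x;
run;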

6  proc freq

The main purpose of proc freq is to produce tabulations. For each variable listed on the tables statement, SAS will produce output showing how many times each possible value of that variable was found in a data set. To produce a cross-tabulation (that is, a table where the rows represent possible values for one variable, the columns represent possible values for a second variable, and the value in the table represents the number of times observations with the specified values of the two variables were found in the data set), separate the two variable names with an asterisk (*); if more than two variables are supplied this way, SAS will produce a series of two-way tables, one for each combination of values of the leading variables. To see multiple tabulations in a concise form (with one column for each variable, and another column showing counts), use the list option of the tables statement.
Along with the cross-tabulation, a variety of options to the tables statement will generate statistical tests regarding the independence of the variables being studied. The out= option can be used with the tables statement to create an output data set. Statistics for the individual cells of the table are also generated by default; a variety of keywords beginning with the letters no are available on the tables statement to suppress these additional values.
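The following sketch pulls these options together. The data set mydata and the categorical variables a, b, and c are hypothetical, as is the output data set name counts:
proc freq data=mydata;
   tables a;                                   /* one-way frequency table */
   tables a*b / chisq norow nocol nopercent    /* test of independence; cell counts only */
                out=counts;                    /* save the counts as a data set */
   tables a*b*c / list;                        /* concise listing, one row per combination */
run;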

7  proc reg

proc reg is the basic procedure for performing regression analysis in SAS. On the model statement, you put the name of your dependent variable(s) on the left-hand side of an equals sign (=), and list the independent variables on the right-hand side, separated by spaces. A large number of options are available on the model statement to specify exactly what information about the regression SAS will compute and display.
The output statement allows you to create output data sets with predicted and/or residual values, while the outest= and outsscp= clauses of the proc reg statement allow outputting parameter estimates or the sums of squares and crossproducts matrix, respectively.
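As an illustrative sketch, the statements below assume a hypothetical data set mydata with a dependent variable y and independent variables x1 and x2; the output data set names est and preds and the variable names yhat and resid are likewise made up for the example:
proc reg data=mydata outest=est;        /* est receives the parameter estimates */
   model y = x1 x2;
   output out=preds p=yhat r=resid;     /* predicted values and residuals */
run;
quit;
The quit statement is included because proc reg is interactive and otherwise remains active after the run statement.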


