RAFT Documentation

 

Introduction
Download
Examples
Description of the plots
Scatterplot
Importance
Inputs
Prototypes
Votes
Proximities
Interactions
Description of the buttons
Controls
Select (Scatterplot)
Select (Prototypes)
Clear
Reset
Jitter
Output
Clone
Save jpeg
Blend/Lines
Hide/Show nonselected
Lines/Fill
Advanced Features

Introduction

This document illustrates how to use the Java-based Random Forests Tool (RAFT) to visualize results from a random forests analysis. RAFT presents the plots that are useful in such an analysis in an easy-to-use interface. Probably its most valuable single feature is that it lets the user easily select subgroups and focus the visualization on those subgroups.

Download

RAFT can be downloaded from the random forests graphics page. The Fortran code can be downloaded from the random forests software page. The Fortran code can be used with little or no knowledge of Fortran by following the instructions in the random forests manual page. For most applications, it is only necessary to modify a few parameters at the top of the program and to tell the program where the data reside. To produce output for RAFT, set iviz=1 (line 7). Compiling and running the Fortran program will produce files named save-* and *.txt. Copy the files named *.txt into a suitably named folder in the RAFT home directory (usually C:\Program Files\RAFT). Double-click the RAFT icon and enter the folder name when prompted.
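The copy step above can be scripted. The following is a minimal Python sketch, not part of the RAFT distribution: the function name copy_raft_inputs and its arguments are hypothetical, and it assumes only that the *.txt files produced by the Fortran run sit together in one directory.

```python
import shutil
from pathlib import Path


def copy_raft_inputs(run_dir, raft_home, folder_name):
    """Copy every *.txt file from run_dir into raft_home/folder_name.

    Files that do not end in .txt (for example the save-* files) are left
    where they are. Returns the names of the copied files.
    """
    target = Path(raft_home) / folder_name
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for txt_file in sorted(Path(run_dir).glob("*.txt")):
        shutil.copy2(txt_file, target / txt_file.name)
        copied.append(txt_file.name)
    return copied
```

For example, copy_raft_inputs("C:/work/glass_run", "C:/Program Files/RAFT", "myglass") would fill a folder named myglass, which is then the name to type when RAFT prompts for a folder. The run directory and folder name here are illustrative only.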

Examples

We provide two examples of Fortran code for running random forests and using RAFT. The first is the glass code, using the glass data; the second is the microarray code, using the microarray data. For convenience, the results of running these programs are included in the RAFT distribution, so the user can run RAFT on the glass data or microarray data simply by double-clicking the executable and typing the word glass or micro when prompted.

Description of the Plots

This section illustrates RAFT using the glass data. RAFT produces a window with tabs for the different displays:

(Some of these tabs will not appear if the corresponding feature was not requested in the Fortran code.)

Scatterplot

Initially, this is a 3-D plot of the first three multidimensional scaling (MDS) coordinates obtained from the proximity matrix. The scatterplot is colored according to the true class. The figure can be rotated by dragging with the left mouse button. It can be zoomed by holding down the shift key and dragging with the left mouse button. It can be panned by holding down the control key and dragging with the left mouse button. Points can be circled by dragging with the right mouse button. The following buttons are available:

The initial Scatterplot window for the glass data looks like this:

Importance

This is a parallel coordinate plot of variable importance. Initially, each case is represented by a piecewise-linear trace, colored according to the true class of that case. High values suggest that the corresponding variable is important in correctly classifying the case. The colors can be changed using the Controls button on the Scatterplot display.

Negative importance values suggest that the variable can have a detrimental impact on the classification. Typically, negative values for some cases are accompanied by positive values for other cases, suggesting that the classes involved are intermingled and increased accuracy in one class comes at the expense of decreased accuracy for the other. One interpretation is that the variable in question is instrumental in separating those classes.

The following buttons are available:

The initial Importance window for the glass data looks like this:

Inputs

This is a parallel coordinate plot of the input variables. Initially, each case is represented by a piecewise-linear trace, colored according to the true class of that case. The colors can be changed using the Controls button on the Scatterplot display.

The following buttons are available:

The initial Inputs window for the glass data looks like this:

Prototypes

This is a parallel coordinate plot of the prototypes, if requested in the Fortran code. Each prototype is represented by a piecewise-linear trace, colored according to the class of that prototype.

The following buttons are available:

The initial Prototypes window for the glass data looks like this:

Votes

This shows a heatmap of the votes for each case. The cells are colored according to the true class. A darker cell indicates that a higher percentage of trees voted for the corresponding class when the case was out-of-bag. The following buttons are available:

The initial Votes window for the glass data looks like this:

Proximities

This shows a heatmap of the proximity matrix. The cells are colored according to the true class. Darker cells indicate higher proximities. The following buttons are available:

The initial Proximities window for the glass data looks like this:

Interactions

This shows a heatmap of the interaction matrix. Darker cells suggest interactions. The following buttons are available:

The initial Interactions window for the glass data looks like this:

Description of the buttons

Controls

This button pops up a window similar to this one:

To change the image, select a variable in the left-hand panel and click on one of the 5 central buttons. The variable will be assigned to that aspect of the display, replacing any variable that was previously assigned. The window above shows the result of selecting "misclassified" and clicking "Color". Typically, the "Clear all" and "Clear selected" buttons are not used. On the rare occasions when a 2-dimensional plot or an uncolored plot is required, the "Clear all" button can be used to remove all assignments, and the "Clear selected" button can be used to remove assignments that have been selected by clicking in the right-hand panel.

For the glass data, clicking on the "Done" button in the window above gives the following Scatterplot:

Red cases are misclassified, blue are correctly classified. The coloring is pervasive through the Importance plot and the Inputs plot, as shown in the following two images:

Select (Scatterplot)

This button is used to select cases of interest. Cases are marked by drawing one or more curves around them, dragging with the right mouse button for each curve. The code uses a Delaunay tessellation to determine which points are inside the curve. An example display might look like the following:

Once the selected cases have been circled, click the "Select" button, and a window appears:

The check boxes allow the user to select any combination of classes and/or the points inside the curves. The "All" box is a fast way to reset previous selections. In the glass example, clicking the "Done" button results in the following plot:

Please note that if the picture is rotated, the curves rotate in a strange way. We chose to leave the curves visible in case a slight rotation gives a better view. Nonselected points are dark grey in the Scatterplot, and also in the Importance, Inputs, and Proximities plots. For example, the Importance plot is shown below. To remove the grey traces, use the "Hide nonselected" button.
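To illustrate what "inside the curve" means for the selection described above, here is a minimal Python sketch. It deliberately does not use RAFT's Delaunay tessellation; it uses the simpler even-odd ray-casting test instead, applied to points already projected to 2-D screen coordinates. The function names are hypothetical and the curve is assumed to be given as a closed polygon of (x, y) vertices.

```python
def point_in_polygon(point, polygon):
    """Even-odd ray casting: True if point lies inside the closed polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_cases(points, polygon):
    """Return the indices of the points that fall inside the polygon."""
    return [i for i, p in enumerate(points) if point_in_polygon(p, polygon)]
```

A point is inside when a ray drawn from it to the right crosses the curve an odd number of times. Points lying exactly on the curve are not treated specially in this sketch.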

Clear

This button clears the most recently-drawn curve.

Reset

This button resets any mouse-driven rotations, pans, and zooms.

Jitter

This button adds a small amount of random noise to the data, primarily for breaking ties. It is a toggle.
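The idea behind jittering can be sketched in a few lines of Python. This is an illustration only: the noise distribution and its size in RAFT are not documented here, so the uniform noise and the default amount of 1% of the data range are assumptions, and the function name jitter is hypothetical.

```python
import random


def jitter(values, amount=0.01, seed=None):
    """Return a copy of values with small uniform noise added to each entry.

    amount is the half-width of the noise as a fraction of the data range
    (an assumed default, not RAFT's actual setting).
    """
    rng = random.Random(seed)
    # Fall back to a span of 1.0 so constant data still gets some noise.
    span = (max(values) - min(values)) or 1.0
    half_width = amount * span
    return [v + rng.uniform(-half_width, half_width) for v in values]
```

Because the noise is small relative to the range of the data, tied values are separated on the display without changing the overall shape of the plot.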

Output

This button allows the user to save the selected cases. It pops up a results table like the one given below, which can be saved by clicking "file" and choosing a location. The results are saved in a text file.

Clone

This button clones the current window, selecting points that are circled. Changes made to this window are independent of those made to the first window, so that different views can be compared.

Save jpeg

This button allows the user to save the image to a jpeg file. A dialog box appears, allowing the user to specify the location in which to save the image. Only the plot is saved, not the user interface. The background of the plot is white. Please note that at the present time, these images are not publication-quality images.

Blend/Lines

For the Importance and Inputs plots, clicking on the "Blend" button produces an alpha-blended version of the parallel coordinates plot. For the glass data, the blended Importance plot looks like this:

The opacity of the shading can be increased or decreased by moving the slider to the left or right, respectively:

Hide/Show nonselected

When some cases have been selected by clicking on the "Select" button in the Scatterplot, the Importance and Inputs plots will have grey traces for the nonselected cases:

These can be removed using the "Hide nonselected" button:

Currently, the "Hide nonselected" button works independently for the Importance and Inputs plots.

Lines/Fill

The default prototype plot shows shading going from the lower quartile to the upper quartile:

To eliminate the shading, click the "Lines" button:

Select (Prototypes)

This button is used to select from the available prototypes. Clicking on the button gives a window like this:

The check boxes allow the user to select any combination of prototypes. The "All" box is a fast way to reset previous selections. In the glass example, clicking the "Done" button results in the following plot:

As for the Importance and Inputs plots, the opacity of the shading can be increased or decreased by moving the slider to the left or right, respectively:

 

Advanced Features

Command-line Arguments

There are two command-line arguments:

  • -norescale: removes automatic rescaling of variables for the parallel coordinates plots
  • -dir dirname: "dirname" specifies the folder name
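To show why rescaling matters for a parallel coordinates plot, the following Python sketch puts every variable on a common [0, 1] axis. The exact rescaling RAFT applies is not described in this document, so the min-max rule used here is an assumption, and the function name rescale_columns is hypothetical.

```python
def rescale_columns(rows):
    """Min-max rescale each column of a list of rows to the range [0, 1]."""
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        width = hi - lo
        # A constant column has no spread; map it to the middle of the axis.
        scaled_columns.append(
            [0.5 if width == 0 else (v - lo) / width for v in col]
        )
    return [list(row) for row in zip(*scaled_columns)]
```

Without some rescaling of this kind, a variable measured in the hundreds would dominate the vertical axis and flatten the traces of variables measured in single digits. Turning rescaling off with -norescale is useful when the variables already share a meaningful common scale.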

Variable Names

Input variable names can be supplied, one per line, in a file called "names.txt" in the same folder as the data. Currently, these are used to label columns when data are saved to a file.

Auxiliary Variables

The user can supply auxiliary variables that were not used in the random forests analysis by creating a file called "auxdat.txt" in the same folder as the data. The file must contain the same number of rows as the dataset itself, and may have an arbitrary number of columns. Variable names must be supplied, one per line, in a file named "auxnames.txt". The auxiliary variables are available for plotting using the "Controls" button in the Scatterplot.
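The two auxiliary files can be written with a short script. The Python sketch below is a hedged illustration: the function name write_aux_files is hypothetical, and it assumes that the columns of "auxdat.txt" are separated by whitespace, since the delimiter is not stated here. It checks the one requirement that is stated, namely that the file has as many rows as the dataset.

```python
from pathlib import Path


def write_aux_files(folder, aux_names, aux_rows, n_cases):
    """Write auxdat.txt and auxnames.txt into the RAFT data folder.

    Assumes whitespace-separated columns in auxdat.txt; the delimiter RAFT
    expects is not stated in this document.
    """
    if len(aux_rows) != n_cases:
        raise ValueError(f"expected {n_cases} rows, got {len(aux_rows)}")
    if any(len(row) != len(aux_names) for row in aux_rows):
        raise ValueError("every row needs one value per auxiliary variable")
    folder = Path(folder)
    # One row per case, one column per auxiliary variable.
    data_lines = [" ".join(str(v) for v in row) for row in aux_rows]
    (folder / "auxdat.txt").write_text("\n".join(data_lines) + "\n")
    # One variable name per line, in column order.
    (folder / "auxnames.txt").write_text("\n".join(aux_names) + "\n")
```

Here n_cases is the number of rows in the dataset that was analyzed. Raising an error on a mismatch catches the most likely mistake before RAFT is started.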