SticiGui: Melts in your Browser, not in your Brain

Joint Berkeley-Stanford Statistics Colloquium
at
Stanford University
Department of Statistics

27 October 1998

P.B. Stark
Department of Statistics
University of California, Berkeley

 


Outline

  1. What I'm trying to accomplish with SticiGui

  2. Demonstration of some of the features of SticiGui

  3. Student responses, and caveats for the instructor

  4. Language basics: HTML, Java, JavaScript, Cookies

 

 


Basic Goals

The basic idea is to teach introductory statistics using the world-wide web for all aspects of the course: text, labs, problem sets, practice exams, etc.

I want to use the internet not to deliver traditional material at a distance, but rather to involve the student in interactive exercises and demonstrations that help him or her develop understanding, without requiring the student to learn a statistics package.

I want to have reproducible lectures in a sense similar to how Claerbout and Donoho describe reproducible research: the student can reproduce lecture demonstrations, illustrations, calculations, etc., anywhere, anytime. The software is a click away, and its parameter settings are documented.

I believe that some (maybe even many) concepts in introductory statistics can be taught better using web-based materials than simply using traditional lecture methods and "dead-tree" books.

 

Level of the Materials

The materials represent a compromise between a "statistical literacy" course, such as the excellent book Statistics, by Freedman, Pisani, and Purves (Norton, 1996; hereafter FPP), and a more traditional "statistical methods" course, such as the book Introduction to the Practice of Statistics by Moore and McCabe. The material has somewhat more computational content than FPP, but the computations are motivated by inference problems. Probability, hypothesis testing, randomization, and sampling error are woven into the discussion of experiments and sample surveys.

Technical Design Criteria

SticiGui is implemented in HTML, Java, and JavaScript. That choice was motivated by these design criteria:

  1. Maximize accessibility and portability. Recent browsers allow this material to be accessed from almost anywhere in the world, without adding "plug-ins" to the browser, and without buying any proprietary software. The software runs under every major operating system (UNIX, LINUX, Windows, Mac), because Netscape and/or Microsoft have versions of their browsers that run on those operating systems.

  2. Maximize interactivity and minimize technological barriers. I want the students to be able to explore data and to ask and answer "what-if" questions, without needing to learn how to use a conventional statistical software package. Using Java enabled me to develop point-and-click tools whose use was fairly obvious (no hidden menus, consistent GUI, etc.). Moreover, if a "plug-in" were required, downloading the "plug-in" would present a considerable barrier to some students. Browsers come installed on all new personal computers, so this material is immediately accessible.

  3. Minimize bandwidth and maximize speed. Using Java allows the figures and plots to be generated on the client side. The code and data download to the client, which then computes and draws the figures. This is also by far the most efficient way to get dynamic interaction with the data. Otherwise, every time the user changed a parameter value, the client would need to send a message to the server, and the server would have to compute the new figure and send it over the internet to the client; interactive real-time data exploration would not be possible. A small number of figures in SticiGui are stored as GIF or JPG files; almost all the figures are computed by the client. Sending just the data and the rules (programs) for generating figures from the data substantially reduces the time it takes pages to load.

  4. Make it easy to use the materials in lectures. Because the software is free-standing (it does not need a server for computations), it is easy to display the content in the classroom without an internet connection. That allows the instructor to demonstrate concepts and the use of the materials in class.

Using Java and JavaScript together allowed me to make the content dynamic: many of the examples and exercises in the Text change whenever the page is reloaded, so students can get unlimited practice at certain kinds of problems. Similarly, each student gets a different version of the problem set, but can see the solutions to his/her version after the due date.
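One way to give each student a different but reproducible version of a problem can be sketched in JavaScript. This is only an illustration under stated assumptions, not the code SticiGui uses: the names makeGenerator and makeProblem, the idea of seeding with a numeric student ID, and the small linear congruential generator are all hypothetical. Because the data depend only on the seed, the same version (and its solution) can be regenerated after the due date.

```javascript
// Hypothetical sketch: a small seeded generator (Park-Miller style LCG).
function makeGenerator(seed) {
    var state = seed % 2147483647;
    if (state <= 0) { state += 2147483646; }
    return function () {
        state = (state * 16807) % 2147483647;
        return (state - 1) / 2147483646;   // uniform in [0, 1)
    };
}

// Build a "find the mean" problem whose data depend only on the seed.
// Using a student ID as the seed is an assumption made for illustration.
function makeProblem(studentId) {
    var rand = makeGenerator(studentId);
    var data = [];
    for (var i = 0; i < 5; i++) {
        data.push(Math.floor(rand() * 20) + 1);   // integers from 1 to 20
    }
    var sum = 0;
    for (var j = 0; j < data.length; j++) { sum += data[j]; }
    return { data: data, answer: sum / data.length };
}

var first = makeProblem(12345);
var again = makeProblem(12345);   // same seed, so the same problem
console.log(first.data.join(", "), "mean =", first.answer);
console.log(again.data.join(", "), "mean =", again.answer);
```

Practice problems that should change on every reload could use Math.random() instead of a seeded generator.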

 

 


SticiGui Demo Here

 

 


Student Reactions, and Caveats for the Instructor

 

What Some Students Know Might Surprise You

Statistics 2 is the minimal course that satisfies the university's quantitative distribution requirement. Statistics 21 is a prerequisite course for Business majors.

As such, one would not expect the students to be terribly "computer literate."

Nonetheless,

 

What Some Students Don't Know Might Surprise You

 


Expect a Mixed Reception

Some students consider using a browser "knowing all about computers." They feel that asking them to point and click is a serious imposition, and can't possibly have anything to do with learning the material.

Consider this.

 

What Students Like Most About the Approach

 

What Students Like Least

Here are the results of a survey from Fall, 1997.

 


University Support?

The technology I'm using (html, html forms, frames, Java, JavaScript) is several years old.

Nonetheless, there are very few places on the Berkeley campus where students have the hardware and software they need to access the material.

Students can access the material from the X-Window terminals in the Statistical Computing facility, but,

 


Java is not "Platform-Independent"

Java is not "Platform-Independent"

Java is not "Platform-Independent"

Java is not "Platform-Independent"
Java is not "Platform-Independent"

You need to view your material using several browsers on several platforms to have a reasonable chance that the students get what you think you put there. Small differences in Netscape and Internet Explorer standards for JavaScript confound matters. It is extremely hard to support more than one version of a browser at a time on a variety of platforms.

Java is advertised as "write once, run anywhere." The reality is "write once, debug everywhere."

Enough said.

 


Why It's Not Worth The Trouble

 

Why Bother?

 


 

Samples of Student Comments

Good

 

Mixed

 

Negative

 

 


Language Basics

What is HTML? Don't ask.

Html (hypertext markup language) is the basic language spoken by web browsers. It is similar in some ways to LaTeX, in that one specifies major structural divisions of the document (headings, lists, tables, ...), fonts, paragraph breaks, centering, etc., and leaves the formatting details to the local implementation of the software.

Html currently lacks a universal, portable, convenient way to display mathematics. I haven't found that to be a limitation in teaching introductory courses, where the mere appearance of X causes extreme anxiety, and necessitates a call to 911. I don't intend to say more than that about html.

 

What is Java? What is it good for?

Java is a programming language designed to be portable across a wide variety of platforms, including Unix machines, Macintoshes, PCs, JavaStations (a Sun trademark, I believe), and even (some) cell phones.

The portability is accomplished by separating the language into two parts: programs are compiled into "byte code" that is platform-independent, and platform-dependent "Java Virtual Machines" run the byte code on different kinds of computers. That separation has a performance cost. Originally, Java ran on the order of 5 times slower than C, but the gap is closing (I think it's under a factor of 2 now).

Java is an object-oriented language, which means that aside from certain basic data types (short and long integers, booleans, single and double-precision real numbers, characters), other data types are "objects" that have "properties" and "methods" associated with them. Java is strongly "typed," which prevents programmers from making some kinds of silly mistakes.

Roger Purves quoted someone (I don't remember whom) as saying something to the effect that in a procedural language, you add 1 and 1 by saying x = 1+1, whereas in an object-oriented language, you send a message to "1" to add "1" to itself.  That's not too far from the truth.

For example, in Java, you might define an object of the type "dataset." The properties of a "dataset" might include variable names, and arrays of values of each of the variables. The methods of a "dataset" might include averages of the variables, regressions of subsets of the variables against other subsets of the variables, random sampling from the variables, etc.
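The idea of an object with properties and methods can be sketched briefly. The sketch below is written in JavaScript rather than Java, only to keep it short and in the same language as the other examples here; a Java class would have the same shape. The name Dataset and its methods mean and sample are hypothetical, chosen for illustration.

```javascript
// Hypothetical "dataset" object: properties hold the data, methods compute from it.
function Dataset(names, columns) {
    this.names = names;       // variable names, e.g. ["height", "weight"]
    this.columns = columns;   // one array of values per variable
}

// Method: the average of one variable, looked up by name.
Dataset.prototype.mean = function (name) {
    var values = this.columns[this.names.indexOf(name)];
    var sum = 0;
    for (var i = 0; i < values.length; i++) { sum += values[i]; }
    return sum / values.length;
};

// Method: a random sample of k rows, drawn with replacement.
Dataset.prototype.sample = function (k) {
    var n = this.columns[0].length;
    var rows = [];
    for (var i = 0; i < k; i++) {
        var pick = Math.floor(Math.random() * n);
        var row = [];
        for (var j = 0; j < this.columns.length; j++) {
            row.push(this.columns[j][pick]);
        }
        rows.push(row);
    }
    return rows;
};

var d = new Dataset(["height", "weight"], [[60, 65, 70], [120, 150, 180]]);
console.log(d.mean("height"));   // 65
console.log(d.sample(4));        // four randomly chosen [height, weight] rows
```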

Most Java programs use graphical user interfaces and are "event-driven," meaning that the programs wait for the user to do something like click the mouse, push a "button," select something from a menu, or type something. The Java program would have "methods" to handle each kind of event, including doing whatever calculations are required and updating the display to reflect the results.

One can write "free standing" applications in Java, but one of the principal virtues of Java is that the popular browsers (Netscape and Internet Explorer) incorporate Java virtual machines, so they know how to run certain kinds of Java programs, called "applets." Applets are application programs that can be downloaded over the internet from within a web page (an html page). The browsers allocate some real estate on the screen for the applet to use for its display, and pass the events (mouse clicks, etc.) that occur within the display to the Java program to be "handled."

Information can be passed to an applet in a variety of ways. The values of some parameters can be set within the html document that invokes the applet using "parameter" tags. The applet can provide areas where the user can type in data, select a file, specify a URL, etc. Finally, there are ways that the page can access "methods" of the applet. The examples I will show later do all of these things.

For example, the scatterplot applet can be invoked using "parameter" tags to specify the actual values of the dataset to display, whether to plot the data or the residuals, whether or not to display the regression line, etc. Optionally, "parameter" tags can be used to specify a file from which the applet should read its data. "Parameter" tags can be set so that there is a box in the applet for the user to type the URL of a file from which to read the data. Finally, the html can call a "setVariables" method of the applet to send it a list of data constructed on the fly, for example, using JavaScript (see below).

Here is an example of an applet:

You need Java to see this.

The html "code" in this document that invokes the applet just above is

<p align="center">
<applet code="Bivariate.class" codebase="../Java" archive="PbsGui.zip" width="600" height="400">
<param name="files" value="./Data/p6-1990.dat,./Data/cities.dat">
<param name="sdLineButton" value="false">
<param name="sdButton" value="false">
<param name="addPoints" value="false">
<param name="urlBox" value="true">
You need Java to see this.
</applet></p>

The code="Bivariate.class" tells the browser where the compiled Java byte code for my scatterplot routine is.

The codebase="../Java" tells the browser the (relative) directory on the host machine where the Bivariate.class file is stored.

The file "PbsGui.zip" is my library of Java byte code for various parts of the user interface (axis labeling routines, scrollbars linked to displays, number formatting routines, plots that sense the location of the mouse and report its position in natural units, pop-up listings that allow the user to select rows and pass messages to the plots, etc.). Bivariate.class also relies on various routines for sorting data, handling missing values, computing percentiles, correlation coefficients, SDs, etc.

The 'width="..."' and 'height="..."' tell the browser how much screen real estate (in pixels) to allocate for the applet's display.

The <param name="files" ...  > tells the applet the list of files from which to read data.

The other <param ...> tags pass other information to the applet, for example, suppressing the button that allows the user to display the SD line, suppressing the user's ability to add points to the scatterplot by clicking, and adding an input box into which the user can type another URL from which to read data.

The reason I'm using Java is that it's the only way to do what I want, which is to provide a way for students to do interactive data analysis and visualization and statistical and probabilistic computations from anywhere, without learning a statistical language (point-and-click only!), and without using proprietary "plug-ins," or other complicated procedures. It would be possible to do some of what I want using "server-side includes" or cgi scripts, but the response time would be prohibitively slow, and the bandwidth required would be prohibitively large.

Using Java allows me to download automatically to a student's computer a dataset and a statistics package for displaying and analyzing it. Generating the figures, changing views, calculating probabilities, taking random samples, computing regressions, etc., all happen on the student's machine, not the server.

 

What is JavaScript? What is it good for?

JavaScript is a scripting language, spoken (although in different dialects) by Netscape and Internet Explorer. JavaScript code can be inserted directly into html documents. There is no real "standard" for JavaScript as there is for Java. Recent extensions of JavaScript include support for object-oriented programming, regular expressions, and other niceties.

JavaScript is useful for constructing parts of an html page "on the fly," conditionally on the values of various parameters, etc. For example, JavaScript can append html to an open document. Because you can insert the JavaScript anywhere in the html document, you can build an html page with parts that are different when the "state" at the time the page is visited is different. JavaScript can also be used to change the source files images are read from. Html forms do not themselves require JavaScript, but the two are often used together: event handlers on form elements, such as onchange and onclick, call JavaScript functions.
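Building part of a page from the current "state" can be sketched as a function that returns a string of html. The function name greetingHtml and the idea of a visit count are hypothetical, used only for illustration. In a browser, a script in the body of the page could pass the result to document.write; here it is just printed.

```javascript
// Hypothetical helper: build a fragment of html that depends on the "state"
// (here, how many times the reader has visited before).
function greetingHtml(visits) {
    var html = "<p>";
    if (visits === 0) {
        html += "<strong>Welcome!</strong> This is your first visit.";
    } else {
        html += "Welcome back. You have visited " + visits +
                " time" + (visits === 1 ? "" : "s") + " before.";
    }
    html += "</p>";
    return html;
}

console.log(greetingHtml(0));
console.log(greetingHtml(3));
```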

JavaScript can be used to open new pages, close pages, reload pages, etc. JavaScript can also be used to communicate with applets on a page, as described above.

In the material I will show today, JavaScript is used to construct examples that change each time a page is visited, to pass randomly generated data to applets, to query applets for answers to certain computations, to compute various statistics, to check students' answers to practice problems and problem sets, to control access to problem sets (in conjunction with an applet that reads a due date file from the web), and for certain demonstrations (such as the "Let's Make a Deal" problem). It is also used to generate html used repeatedly in different documents, so that I can control the appearance of many of the pages from a central location.
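Two of these uses, computing a statistic and checking a student's answer, can be sketched together. This is a minimal illustration, not SticiGui's own code: the function names correlation and checkAnswer, the made-up data, and the rule of accepting any answer within a fixed tolerance are all assumptions.

```javascript
// Correlation coefficient of two equal-length lists of numbers.
function correlation(x, y) {
    var n = x.length, mx = 0, my = 0, i;
    for (i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    var sxy = 0, sxx = 0, syy = 0;
    for (i = 0; i < n; i++) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    return sxy / Math.sqrt(sxx * syy);
}

// Hypothetical grading rule: accept a typed response that parses as a
// number and lies within the given tolerance of the true value.
function checkAnswer(response, truth, tolerance) {
    var value = parseFloat(response);
    return !isNaN(value) && Math.abs(value - truth) <= tolerance;
}

var x = [1, 2, 3, 4, 5];
var y = [2, 4, 5, 4, 5];
var r = correlation(x, y);                   // 6 / sqrt(60), about 0.775
console.log(checkAnswer("0.77", r, 0.01));   // true
console.log(checkAnswer("0.5", r, 0.01));    // false
```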

Here is an example of using JavaScript (including an html form) to make a page interactive:

How many people are here today?

 

The form just above and the response are generated with the following JavaScript code:

<form method="POST">
<script language="JavaScript"><!--
function howMany(n) {
    if (n > 10) {
        if (confirm("Gee, Sacha, can you believe that " + n.toString() +
                    " people care what your dad has to say?")) {
            alert("And some of them are still awake!");
        }
    }
    else { alert("How depressing!"); }
}
// -->
</script>
<div align="center"><center><p><strong>How many people are here today? </strong>
<input type="text" size="10" onchange="howMany(value);"> </p>
</center></div></form>

JavaScript and Java have complementary functionality; using both together allows a great deal of flexibility.

Here is an example of using JavaScript to send data to an applet; in this case, just the value of a correlation coefficient to use in generating a random bivariate normal scatterplot:

What would you like r to be?

You need Java to see this.

The html, JavaScript, and Java used to generate this is:

<form method="post">
<div align="center"><center><p><strong>What would you like <em>r</em> to be? </strong><input
type="text" name="anotherInput" size="8" onchange="validateAndSetR(value)">
<input type="button" value="Make it random" name="theButton" onclick="randomR();"></p>
</center></div></form>

<p align="center">
<applet code="Correlation.class" codebase="../Java" archive="PbsGui.zip" width="550" height="400" name="scatterApplet">
<param name="title" value="JavaScript can talk to Java">
<param name="showR" value="false">
You need Java to see this.
</applet>
</p>
<script language="JavaScript"><!--
var theApplet = document.applets.length - 1;
function validateAndSetR(v) {
      var r = parseFloat(v);   // see if v starts with a parsable number
      if (r < -1) {r = -1;}    // r cannot be less than -1
      else if (r > 1) {r = 1;} // or greater than 1
      else if (isNaN(r)) {r = 0;} // if v aint parsable, make it zero
      document.forms[1].elements[0].value = (r.toString()).substring(0,8); // replace displayed value
      document.applets[theApplet].setR(r);  // display a scatterplot with this r
}
function randomR() {
      var r = 2*(Math.random()-0.5);
      validateAndSetR(r.toString());
}
// -->
</script>
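The validation step in validateAndSetR can be tried on its own, without the form or the applet. The sketch below repeats the same rule as a stand-alone function; the name cleanR is hypothetical and is not part of the page above.

```javascript
// The same rule as validateAndSetR, without the form or the applet:
// parse the text, map unparsable input to 0, and clamp to [-1, 1].
function cleanR(v) {
    var r = parseFloat(v);          // NaN if v does not start with a number
    if (isNaN(r)) { return 0; }     // unparsable input becomes 0
    if (r < -1) { return -1; }      // r cannot be less than -1
    if (r > 1) { return 1; }        // or greater than 1
    return r;
}

console.log(cleanR("0.5"));      // 0.5
console.log(cleanR("7"));        // 1
console.log(cleanR("-3.2"));     // -1
console.log(cleanR("abc"));      // 0
console.log(cleanR("0.3xyz"));   // 0.3
```

Note that parseFloat accepts any text that begins with a number, so "0.3xyz" is read as 0.3 rather than rejected.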

 

What are Cookies? What are they good for?

"Cookies" are small text strings associated with a document or a directory on the web. Cookies are stored on the client's hard drive. They can be written and read using JavaScript. They are a way of having persistent "state" as the user navigates among pages: the programmer can test whether a cookie has been set, and what its contents are, so the page can be "personalized" for the user. In the material I will show today, cookies are used to store identifying information needed to submit a problem set, to ensure that a student navigated to a page along the right route, and to store the student's answers to the problem set on his or her machine when the problem set is submitted for grading. That way, if something goes wrong with the network while the student's answers to the problem set are in transit from the client to the server, the student can automatically recall his or her answers and re-submit them, without having to start from scratch. The cookies I use are encrypted to protect students' privacy.

The browsers impose limits on the length of individual cookies, the total number of cookies, and the number of cookies from each web domain.
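Writing and reading a cookie from JavaScript can be sketched as follows. In a browser, the string built by makeCookie would be assigned to document.cookie, and the string passed to readCookie would be the current value of document.cookie. The function names, the cookie names, and the sample answers are hypothetical, and the sketch leaves out the encryption mentioned above.

```javascript
// Build the string a page would assign to document.cookie.
// "now" is passed in so the expiry date is easy to check.
function makeCookie(name, value, daysToLive, now) {
    var expires = new Date(now.getTime() + daysToLive * 24 * 60 * 60 * 1000);
    return name + "=" + encodeURIComponent(value) +
           "; expires=" + expires.toUTCString();
}

// Look up one cookie in a string shaped like document.cookie ("a=1; b=2").
function readCookie(cookieString, name) {
    var parts = cookieString.split("; ");
    for (var i = 0; i < parts.length; i++) {
        var eq = parts[i].indexOf("=");
        if (parts[i].substring(0, eq) === name) {
            return decodeURIComponent(parts[i].substring(eq + 1));
        }
    }
    return null;   // the cookie has not been set
}

// Hypothetical example: save a student's answers for 7 days.
var cookie = makeCookie("answers", "q1=0.5;q2=12", 7, new Date(0));
console.log(cookie);

// Reading them back from a cookie string that holds two cookies.
var jar = "sid=abc123; answers=" + encodeURIComponent("q1=0.5;q2=12");
console.log(readCookie(jar, "answers"));   // q1=0.5;q2=12
console.log(readCookie(jar, "missing"));   // null
```

Encoding the value matters here because the answers themselves contain "=" and ";", which would otherwise be confused with the separators in the cookie string.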


Copyright 1998, P.B. Stark. All rights reserved.
Last modified 28 October 1998.