## Basic Goals

The basic idea is to teach introductory statistics using the world-wide web for all aspects of the course: text, labs, problem sets, practice exams, etc.

I want to use the internet not to deliver traditional material at a distance, but rather to involve the student in interactive exercises and demonstrations that help him or her develop understanding, without requiring the student to learn a statistics package.

I want to have reproducible lectures in a sense similar to how Claerbout and Donoho describe reproducible research: the student can reproduce lecture demonstrations, illustrations, calculations, etc., anywhere, anytime. The software is a click away, and its parameter settings are documented.

I believe that some (maybe even many) concepts in introductory statistics can be taught better using web-based materials than simply using traditional lecture methods and "dead-tree" books.

## Level of the Materials

The materials represent a compromise between a "statistical literacy" course, such as the excellent book Statistics, by Freedman, Pisani, and Purves (Norton, 1996, FPP), and a more traditional "statistical methods" course, such as the book Introduction to the Practice of Statistics by Moore and McCabe. The material has somewhat more computational content than FPP, but the computations are motivated by inference problems. Probability, hypothesis testing, randomization, and sampling error are woven into the discussion of experiments and sample surveys.

## Technical Design Criteria

SticiGui© is implemented in HTML, Java, and JavaScript. That choice was motivated by these design criteria:

1. Maximize accessibility and portability. Recent browsers allow this material to be accessed from almost anywhere in the world, without adding "plug-ins" to the browser, and without buying any proprietary software. The software runs under every major operating system (UNIX, LINUX, Windows, Mac), because Netscape and/or Microsoft have versions of their browsers that run on those operating systems.

2. Maximize interactivity and minimize technological barriers. I want the students to be able to explore data and to ask and answer "what-if" questions, without needing to learn how to use a conventional statistical software package. Using Java enabled me to develop point-and-click tools whose use was fairly obvious (no hidden menus, consistent GUI, etc.). Moreover, if a "plug-in" were required, downloading the "plug-in" would present a considerable barrier to some students. Browsers come installed on all new personal computers, so this material is immediately accessible.

3. Minimize bandwidth and maximize speed. Using Java allows the figures and plots to be generated on the client side. The code and data download to the client, then the client computes and creates the figures. This is also by far the most efficient way to get dynamic interaction with the data. Otherwise, every time the user changed a parameter value, the client would need to send a message to the server, and the server would have to compute the new figure and send it over the internet to the client. Interactive real-time data exploration would not be possible. A small number of figures in SticiGui© are stored as GIF or JPG files; almost all the figures are computed by the client. Sending just the data and the rules (programs) for generating figures from the data substantially reduces the time it takes pages to load.

4. Make it easy to use the materials in lectures. Because the software is free-standing (it does not need a server for computations), it is easy to display the content in the classroom without an internet connection. That allows the instructor to demonstrate concepts and the use of the materials in class.
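To make criterion 3 concrete, here is a minimal sketch of the kind of computation that can run entirely on the client: once the raw data have been downloaded, histogram bin counts are recomputed locally whenever the user changes the number of bins. The function name `histogramCounts` and its arguments are illustrative assumptions; this is not SticiGui's code, which does this work in Java.

```javascript
// Hypothetical sketch: recompute histogram bin counts on the client.
// Only the raw data cross the network; each change of nBins is handled
// locally, with no round trip to the server.
function histogramCounts(data, nBins) {
  if (data.length === 0 || nBins < 1) {
    return [];
  }
  var lo = Math.min.apply(null, data);
  var hi = Math.max.apply(null, data);
  var width = (hi - lo) / nBins;
  var counts = [];
  for (var b = 0; b < nBins; b++) {
    counts.push(0);
  }
  for (var i = 0; i < data.length; i++) {
    // If all values are equal, width is 0: put everything in bin 0.
    var bin = width > 0 ? Math.floor((data[i] - lo) / width) : 0;
    if (bin >= nBins) {
      bin = nBins - 1; // the maximum value belongs to the last bin
    }
    counts[bin]++;
  }
  return counts;
}

var sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9];
var fourBins = histogramCounts(sample, 4); // the user asks for 4 bins
var twoBins = histogramCounts(sample, 2);  // then changes to 2 bins
```

Both calls work from the same downloaded array, which is the point of the design: a change of view costs a few arithmetic operations on the student's machine rather than a request to the server.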

Using Java and JavaScript together allowed me to make the content dynamic: many of the examples and exercises in the Text change whenever the page is reloaded, so students can get unlimited practice at certain kinds of problems. Similarly, each student gets a different version of the problem set, but can see the solutions to his/her version after the due date.
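One way to get individualized problems whose solutions can be regenerated later is to drive the random data from a seed. The sketch below assumes a seed tied to the student (a made-up number here), a simple linear congruential generator, and hypothetical function names `makeRng` and `makeMeanProblem`; the notes do not say how SticiGui actually does this.

```javascript
// Hypothetical sketch of individualized problems driven by a seed.
// A small linear congruential generator stands in for whatever random
// number source is actually used.
function makeRng(seed) {
  var state = seed >>> 0; // force an unsigned 32-bit integer
  return function () {
    state = (1664525 * state + 1013904223) % 4294967296;
    return state / 4294967296; // a number in [0, 1)
  };
}

// Build a "find the mean" problem: n integer scores between 50 and 99.
function makeMeanProblem(seed, n) {
  var rng = makeRng(seed);
  var data = [];
  var sum = 0;
  for (var i = 0; i < n; i++) {
    var x = 50 + Math.floor(50 * rng());
    data.push(x);
    sum += x;
  }
  return { data: data, answer: sum / n };
}

var versionA = makeMeanProblem(1001, 5);      // one student's version
var versionAagain = makeMeanProblem(1001, 5); // same seed, regenerated later
var versionB = makeMeanProblem(1002, 5);      // a version built from another seed
```

Because the same seed reproduces the same data, the solutions shown after the due date can match the version the student actually worked on.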

• Division into Text/Glossary, Problem Sets, Review Materials, Grade Query

• The Java tools for data analysis and visualization, calculations, and demonstrations

• An example of dynamic examples and exercises; a brief tour of some chapters.

• An example of problem sets before and after the due date---individualized problems, individualized solutions.

## Student Reactions, and Caveats for the Instructor

### What Some Students Know Might Surprise You

Statistics 2 is the minimal course that satisfies the university's quantitative distribution requirement. Statistics 21 is a prerequisite course for Business majors.

As such, one would not expect the students to be terribly "computer literate."

Nonetheless,

• over 1/2 of my class regularly accesses the class material and online labs from home, dormitory, or off-campus.

• many have their own accounts with ISPs such as AOL and CompuServe.

### What Some Students Don't Know Might Surprise You

• that you can resize windows, and how.

• that you can move windows, and how.

• that the presence of a scrollbar means there's more to see

• how to use scroll bars

• that something underlined is a link

• how to "select" more than one item from a list in a form.

• how to "select" non-contiguous items from a list in a form.

## Expect a Mixed Reception

Some students consider using a browser "knowing all about computers." They feel that asking them to point and click is a serious imposition, and can't possibly have anything to do with learning the material.

### What Students Like Most About the Approach

• Having lecture notes online, especially if they can print them out before lecture

• Getting their problem set scores instantly

• Being able to "experiment" with the applets in the book

• Online practice exams that are self-scoring

• The ability to do the computer-based problem sets from anywhere.

• The ability to see the solutions to the problem sets from anywhere, after the due date

• The glossary.

### What Students Like Least

• I dim the lights to project the material; they fall asleep.

• Many students complain that it is unfair to offer material on the web because not everyone has a home computer.

• Competition for the few general UCB modem lines limits off-campus access by those who do not subscribe to an ISP or live in a dorm with ethernet.

• Some feel that using a browser is "knowing all about computers," and is too much to expect in a non-technical course.

• The network connections bog down considerably the night before a lab is due.

## University Support?

The technology I'm using (html, html forms, frames, Java, JavaScript) is several years old.

Nonetheless, there are very few places on the Berkeley campus where students have the hardware and software needed to access the material.

Students can access the material from the X-Window terminals in the Statistical Computing facility, but:

• The terminals are monochrome, which requires gymnastics on my part (special color choices and symbol shapes that render visibly differently---all colors are mapped to white or black); even with a great deal of effort on my part, the result is a poor substitute for color.

• Netscape on Unix boxes with monochrome monitors is very buggy: choice menus are invisible, for example

• When 30 students are all downloading 3000-point data sets the day before a lab is due, things bog down.

• The facility is open only 9am-5pm.

## Java is not "Platform-Independent"


You need to view your material using several browsers on several platforms to have a reasonable chance that the students get what you think you put there. Small differences in Netscape and Internet Explorer standards for JavaScript confound matters. It is extremely hard to support more than one version of a browser at a time on a variety of platforms.

Java is advertised as "write once, run anywhere." The reality is "write once, debug everywhere."

Enough said.

### Why It's Not Worth The Trouble

• It's a lot of trouble. I can count on my fingers the nights in which I've gotten more than 4 hours of sleep in the terms I have been developing this material.

• Some fraction of the students will hate it; your teaching evaluations will suffer.

### Why Bother?

• You get more class time for instruction (well, modulo spending 15 minutes of each class explaining to the students how to use the browser, to clear cache, to clear cookies, to use scrollbars, to select multiple items from a list, ... I'm hopeful that these things won't be an issue within a couple of years).

• After it's set up, teaching the class the next time is easy (yeah, right... I've probably spent more time working on this than I would otherwise have spent in a lifetime of lecture preparation).

• It's trendy?

• If you are pessimistic about the economic future of the University, this might ultimately be the most economical way to teach large classes.

• Once you know Perl, HTML, Java, and JavaScript, you can get a job "in the real world" with a real salary!

### Good

• It was great! Very convenient, and easy to use! Thanks for all the extra work it really helps! I especially liked looking at the graphs and questions simultaneously, and not having to write it all out. The graphs are the best part .
• I really liked being able to do my homework directly on the web. Overall it was a great experience....
• It is really helpful that you have the labs on-line. I work about 20 hours a week and barely have time for homework until the weekend. Thank you for being thoughtful. If you need help putting things on-line, please let me know.
• I would just like to let you know that I appriciate how much time you put into our stat class.
• I was really pleased to see that we would be able to send in our homework over the internet to our readers - it made things a whole lot easier for me. The last three assignments that I have sent in however, have not been returned to me in discussion. I only can assume that for one reason or another, they didn't make it through.
• I finished this lab in the amount of time it took me just to try to get into the stat lab for our last lab. Also, netscape is aesthetically superior to the stat server. The only thing I don't like about this format is that I worry I might have accidentally missed a bubble I was supposed to fill in.
• The only thing I really miss ... is that there is a TA there just in case I have no idea what I am doing. Doing it through the web is a lot easier and I am greatful for the time and effort put into it....
• I liked doing my lab at home much better .... I liked being able to fiddle around with the plots though, adding SD lines and residual plots and such. I also liked this type of write-up where we just clicked on the answer we wanted. Overall, I liked doing the lab this way much better.
• I find that having the lab on t/he web is much more convenient; I can work at my own pace, and there is less pressure to finish in a given amount of time than there is at the computer facilities (when sometimes there are students waiting to use the computers)
• Running this program tended to crash all the time so I'd have to sit through a couple restarts and start over each time. But that's not your fault. But w/o the crashes, it was mighty convenient and pretty cool to use. Keep up the good work! (imagine me saying that to a professor, but you know what I mean.) =)
• This method is particularly good through the web because I can do the lab at home. I really didn't have too much difficulty, and Netscape is much easier to use than the Unix system we have in the Statistics Lab ... I was wondering if there was any way I could get to the bulletin board and chatroom if I'm not through the Berkeley server?
• ...even before I read the last section, I was getting a clear idea of the effect of increasing the sample size... accomplishing the point of actually understanding the effect of large sample sizes.
• I liked this lab much more .... i actually feel like i learned better control over the concepts. good job! ...this lab was much easier to use and the extra instructions made it much more helpful in learning the concepts. thanks.

### Mixed

• I found the lab particularly difficult to do at first ...Computer fluency is not mentioned as a requirement for this class, but this lab expects that of us. .... once I got the hang of manipulating the charts, they helped alot. This lab was a good way to make sure we understand the concepts.
• If the computer facility itself in the Statistics department was better, this lab would have been much easier.
• In general it worked great--my only complaint is that the applet took forever to load.
• i think the internet idea is really cool. but i just have hard time setting up the java script to work in my computer. doing lab on internet however is very tideous. the screen was too small. the constant switching back and forth between different windows gets very annoying.
• I am often compelled to leave your lectures early because I am unable to stay awake in them. I truly want to stay alert and hear what you have to say, but it is difficult because I don't feel that I am getting much out of them. I already have your lecture notes from the Web ahead of time, which may be a part of the problem. Plus, you shouldn't dim the lights more than necessary ...
• It took a while to figure out and get comfortable with doing the lab on netscape. But after I did get comfortable , it wasn't too bad.

### Negative

• The program is done poorly. It doesn't let you move around easily. There is no room for mistakes. The concepts are too difficult for a class filled with mostly psych majors.
• putting the lab on the internet is bothersome to me because it's hard staring at a computer screen for long periods of time and also it's a pain to click everywhere just to find out the information that you want to know. getting over here is a pain too.
• AGAIN, THIS WAS VERY INCONVIENIENT AND TIME CONSUMING. I THINK THESE LABS, WHICH MUST BE DONE AT A CAMPUS-BASED COMPUTER SYSTEM ON THE INTERNET ARE BIASED TOWARD THOSE STUDENTS WHO HAVE THE FINANCIAL ACCESS TO HAVE THEIR OWN COMPUTERS AT HOME, WHICH I DO NOT. I ALSO THINK THEY ARE BIASED TOWARD THE TYPICAL CAL STUDEN T WHO LIVES ON CAMPUS. I LIVE VERY FAR AWAY AND IT IS HARD FOR ME TO REARRANGE MY SCHEDULE AND FIND TIME TO MAKE IT TO CAMPUS TO DO THESE LABS. I PERSONALLY THINK THEY ARE FAR ABD BEYOND WHAT IS REQUIRED FOR THIS CLASS.
• i still am not able to get much out of your lectures. i find them confusing, boring, and hardly recognizable compared to the text. no offense professor, i know that you work hard at the aplets, lecture notes, etc., but it just isn't working for me. i find that the text is the best way for me to learn the material and understand it.
• ... I really dislike doing labs on the computer. I would really prefer to be able to do them on paper.
• It was kind of hard to switch back and forth between the place where we type our answers to the place where the graphs are. It just takes longer. I was unable to ask a TA questions because I could not fit in a time in my schedule when I could go to the Stat computer lab. I could only go after hours to the basement of Evans. It is kind of inconvenient. Not everyone is so computer-oriented.
• I much rather enjoyed doing the lab when I was able to write my answeres out rather having to worry about submitting them and having problems with the computer and reading data. ... I thought it was was interesting to have the graphs to work with, but for some reason this was extremely difficult on these computers. All in all, I did not like diong the lab on these computers. I do not have the ability to do this at home and I hate that I have to come here with hundreds of other students at the same time.
• I feel that the use of the Java histograms and such are not especially those to use. I think it would be better that instead of relying so much on your computer you should write more on the board. Your handwriting is not nearly as bad as a majority of professors on this campus. The other little "complaint" I had was the labs on the internet. Maybe it is just my problem but I do not have access to the internet and much less to a Netscape that is 3.0 to run Java. I think that the dependency on the computer (though I thank you for trying to get us updated in the cyberspace-whatever they call it-world) is not conducive to the learning of Statistics. Your notes on the web page are useful but turning in work this way is unfair to those of us that truly do not have all this technology available to us at 3 a.m. when Evans is closed.
• This lab was confusing because each time you logged on your samples were different, so you couldn't look at the lab once and then go back to it, you had to do it right off . We should have been told in advance that each time we log on the lab would change. It was also tough because you could not verify your work because if you tried to go back chances are you would get a different sample of answers. I still think that internet labs are time consuming and frustrating.
• I know we are trying to get into new technology, and the way of the the future, but, I think I missed that generation. I have no problem admitting that third graders are more computer literate than I am, and I know I should get to be more computer friendly, but I have a very difficult time with these labs. I think what you are doing is wonderful, but, it totally stresses me out. I spent all of Friday night (I know, I'm a wild and crazy college kid) in my friends room who has internet access, with another student, struggling throught this lab. The three of us spent 2 hours working on it, and when we got through it once, decided to call it quits for the night and take a break. When one girl tried to do it on Sunday night, she found that no longer was the data the same. Each time she restarted the computer, new information came up. To say the least, this threw us all into a panic . With midterms and the various other previous engagements we all had, we were completely stressed to find out that the lab we had worked on, and spent our Friday night doing, ich we thought we had completed, had to be redone!! I know that you are trying to prevent cheating, but if you could have warned us, it would have saved each of us a great deal of time. This was very discouraging finding out that we could not help each other out and work together to do this lab. Worse yet, I went back and tried to check my answers and each time the computer would draw a brand new sample, I know each sample is different, but this was extremely frustrating that I could not go back and check my answers!!!!!
• Doing the labs this way sucks because we can't see everything and because it is sooooo easy to loose all of your information. there has to be a way to save information or to have a hard copy of it so that all your work is not lost. besides, i don't really trust technology and i always worry after i hit the send button because i'm unsure of where it actually went.

## What is HTML? Don't ask.

Html (hypertext markup language) is the basic language spoken by web browsers. It is similar in some ways to LaTeX, in that one specifies major structural divisions of the document (headings, lists, tables, ...), fonts, paragraph breaks, centering, etc., and leaves the formatting details to the local implementation of the software.

Html currently lacks a universal, portable, convenient way to display mathematics. I haven't found that to be a limitation in teaching introductory courses, where the mere appearance of X causes extreme anxiety, and ß necessitates a call to 911. I don't intend to say more than that about html.

## What is Java? What is it good for?

Java is a programming language designed to be portable across a wide variety of platforms, including Unix machines, Macintoshes, PCs, JavaStations (a Sun trademark, I believe), and even (some) cell phones.

The portability is accomplished by separating the language into two parts: programs are compiled into "byte code" that is platform-independent, and platform-dependent "Java Virtual Machines" run the byte code on different kinds of computers. That separation has a performance cost associated with it. Originally, Java ran on the order of 5 times slower than C, but the gap is closing (I think it's under a factor of 2 now).

Java is an object-oriented language, which means that aside from certain basic data types (short and long integers, booleans, single and double-precision real numbers, characters), other data types are "objects" that have "properties" and "methods" associated with them. Java is strongly "typed," which prevents programmers from making some kinds of silly mistakes.

Roger Purves quoted someone (I don't remember whom) as saying something to the effect that in a procedural language, you add 1 and 1 by saying x = 1+1, whereas in an object-oriented language, you send a message to "1" to add "1" to itself.  That's not too far from the truth.

For example, in Java, you might define an object of the type "dataset." The properties of a "dataset" might include variable names, and arrays of values of each of the variables. The methods of a "dataset" might include averages of the variables, regressions of subsets of the variables against other subsets of the variables, random sampling from the variables, etc.
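A minimal sketch of such a "dataset" object follows. It is written in JavaScript rather than Java to keep it short, and the names (`Dataset`, `column`, `mean`, `sd`, `correlation`) are illustrative assumptions, not the classes used in SticiGui.

```javascript
// Hypothetical "dataset" object: properties hold the variable names and
// values; methods compute summaries from them.
function Dataset(names, columns) {
  this.names = names;     // e.g. ["x", "y"]
  this.columns = columns; // one array of values per variable
}

Dataset.prototype.column = function (name) {
  var i = this.names.indexOf(name);
  if (i < 0) { throw new Error("no variable named " + name); }
  return this.columns[i];
};

Dataset.prototype.mean = function (name) {
  var x = this.column(name);
  var sum = 0;
  for (var i = 0; i < x.length; i++) { sum += x[i]; }
  return sum / x.length;
};

// SD with divisor n (the root-mean-square deviation from the mean).
Dataset.prototype.sd = function (name) {
  var x = this.column(name);
  var m = this.mean(name);
  var ss = 0;
  for (var i = 0; i < x.length; i++) { ss += (x[i] - m) * (x[i] - m); }
  return Math.sqrt(ss / x.length);
};

// Correlation coefficient: the average product of values in standard units.
Dataset.prototype.correlation = function (nameX, nameY) {
  var x = this.column(nameX), y = this.column(nameY);
  var mx = this.mean(nameX), my = this.mean(nameY);
  var sx = this.sd(nameX), sy = this.sd(nameY);
  var sum = 0;
  for (var i = 0; i < x.length; i++) {
    sum += ((x[i] - mx) / sx) * ((y[i] - my) / sy);
  }
  return sum / x.length;
};

var d = new Dataset(["x", "y"], [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]);
var meanX = d.mean("x");
var r = d.correlation("x", "y"); // the points lie exactly on a line
```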

Most Java programs use graphical user interfaces and are "event-driven," meaning that the programs wait for the user to do something like click the mouse, push a "button," select something from a menu, or type something. The Java program would have "methods" to handle each kind of event, including doing whatever calculations are required and updating the display to reflect the results.
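The event-driven pattern can be sketched in a few lines. The example is in JavaScript for brevity, and the event names, the `state` object, and the `dispatch` function are hypothetical; a Java applet would receive its events from the browser's virtual machine rather than from explicit calls like these.

```javascript
// Hypothetical event-driven skeleton: the program does nothing until an
// event arrives, then runs the handler registered for that kind of event.
var state = { showRegressionLine: false, points: [] };

var handlers = {
  buttonPress: function (event) {
    if (event.button === "regressionLine") {
      state.showRegressionLine = !state.showRegressionLine; // toggle the line
    }
  },
  mouseClick: function (event) {
    state.points.push([event.x, event.y]); // add a point where the user clicked
  }
};

function dispatch(event) {
  var handler = handlers[event.type];
  if (handler) {
    handler(event);
    // a real program would recompute and redraw the display here
  }
}

dispatch({ type: "mouseClick", x: 3, y: 4 });
dispatch({ type: "buttonPress", button: "regressionLine" });
dispatch({ type: "keyPress", key: "q" }); // no handler registered: ignored
```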

One can write "free standing" applications in Java, but one of the principal virtues of Java is that the popular browsers (Netscape and Internet Explorer) incorporate Java virtual machines, so they know how to run certain kinds of Java programs, called "applets." Applets are application programs that can be downloaded over the internet from within a web page (an html page). The browsers allocate some real estate on the screen for the applet to use for its display, and pass the events (mouse clicks, etc.) that occur within the display to the Java program to be "handled."

Information can be passed to an applet in a variety of ways. The values of some parameters can be set within the html document that invokes the applet using "parameter" tags. The applet can provide areas where the user can type in data, select a file, specify a URL, etc. Finally, there are ways that the page can access "methods" of the applet. The examples I will show later do all of these things.

For example, the scatterplot applet can be invoked using "parameter" tags to specify the actual values of the dataset to display, whether to plot the data or the residuals, whether or not to display the regression line, etc. Optionally, "parameter" tags can be used to specify a file from which the applet should read its data. "Parameter" tags can be set so that there is a box in the applet for the user to type the URL of a file from which to read the data. Finally, the html can call a "setVariables" method of the applet to send it a list of data constructed on the fly, for example, using JavaScript (see below).

Here is an example of an applet:

You need Java to see this.

The html "code" in this document that invokes the applet just above is

<p align="center">
<applet code="Bivariate.class" codebase="../Java" archive="PbsGui.zip" width="600" height="400">
<param name="files" value="./Data/p6-1990.dat,./Data/cities.dat">
<param name="sdLineButton" value="false">
<param name="sdButton" value="false">
<param name="urlBox" value="true">You need Java to see this.
</applet></p>

The code="Bivariate.class" tells the browser where the compiled Java byte code for my scatterplot routine is.

The codebase="../Java" tells the browser the (relative) directory on the host machine where the Bivariate.class file is stored.

The file "PbsGui.zip" is my library of Java byte code for various parts of the user interface (axis labeling routines, scrollbars linked to displays, number formatting routines, plots that sense the location of the mouse and report its position in natural units, pop-up listings that allow the user to select rows and pass messages to the plots, etc.). Bivariate.class also relies on various routines for sorting data, handling missing values, computing percentiles, correlation coefficients, SDs, etc.

The 'width="..."' and 'height="..."' tell the browser how much screen real estate (in pixels) to allocate for the applet's display.

The <param name="files" ...  > tells the applet the list of files from which to read data.

The other <param ...> tags pass other information to the applet, for example, suppressing the button that allows the user to display the SD line, suppressing the user's ability to add points to the scatterplot by clicking, and adding an input box into which the user can type another URL from which to read data.
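Parameter values always arrive as strings, so the receiving program has to interpret them. The sketch below shows one plausible way to do that, in JavaScript for brevity. The helper names `parseFiles` and `parseFlag`, and the default values chosen for missing parameters, are assumptions; the applet's real parsing code is not shown in these notes.

```javascript
// Hypothetical sketch of how parameter strings might be interpreted.
// `params` stands in for the name/value pairs given in the <param> tags.
function parseFiles(value) {
  if (!value) { return []; }
  var parts = value.split(",");
  var files = [];
  for (var i = 0; i < parts.length; i++) {
    var f = parts[i].replace(/^\s+|\s+$/g, ""); // strip surrounding whitespace
    if (f.length > 0) { files.push(f); }
  }
  return files;
}

// A missing parameter gets the default; otherwise only "true" means true.
function parseFlag(value, defaultValue) {
  if (value === undefined || value === null) { return defaultValue; }
  return String(value).toLowerCase() === "true";
}

var params = {
  files: "./Data/p6-1990.dat,./Data/cities.dat",
  sdLineButton: "false",
  urlBox: "true"
};
var files = parseFiles(params.files);
var showSdLineButton = parseFlag(params.sdLineButton, true); // assumed default: shown
var showUrlBox = parseFlag(params.urlBox, false);            // assumed default: hidden
var showOther = parseFlag(params.notSupplied, true);         // absent: default applies
```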

The reason I'm using Java is that it's the only way to do what I want, which is to provide a way for students to do interactive data analysis and visualization and statistical and probabilistic computations from anywhere, without learning a statistical language (point-and-click only!), and without using proprietary "plug-ins," or other complicated procedures. It would be possible to do some of what I want using "server-side includes" or cgi scripts, but the response time would be prohibitively slow, and the bandwidth required would be prohibitively large.

Using Java allows me to download automatically to a student's computer a dataset and a statistics package for displaying and analyzing it. Generating the figures, changing views, calculating probabilities, taking random samples, computing regressions, etc., all happen on the student's machine, not the server.

## What is JavaScript? What is it good for?

JavaScript is a scripting language, spoken (although in different dialects) by Netscape and Internet Explorer. JavaScript code can be inserted directly into html documents. There is no real "standard" for JavaScript as there is for Java. Recent extensions of JavaScript include support for object-oriented programming, regular expressions, and other niceties.

JavaScript is useful for constructing parts of an html page "on the fly," conditionally on the values of various parameters, etc. For example, JavaScript can append html to an open document. Because you can insert the JavaScript anywhere in the html document, you can build an html page with parts that are different when the "state" at the time the page is visited is different. JavaScript can also be used to change the source files from which images are read. Html form elements can invoke JavaScript through event handlers such as onchange and onclick, so forms are a natural way to collect input for a script.

JavaScript can be used to open new pages, close pages, reload pages, etc. JavaScript can also be used to communicate with applets on a page, as described above.

In the material I will show today, JavaScript is used to construct examples that change each time a page is visited, to pass randomly generated data to applets, to query applets for answers to certain computations, to compute various statistics, to check students' answers to practice problems and problem sets, to control access to problem sets (in conjunction with an applet that reads a due date file from the web), and for certain demonstrations (such as the "Let's Make a Deal" problem). It is also used to generate html used repeatedly in different documents, so that I can control the appearance of many of the pages from a central location.
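As an illustration of answer checking, here is a minimal sketch that accepts a numeric response if it is within a relative tolerance of the correct value. The function names, the tolerance, and the scoring rule are assumptions made for the example, not the rules SticiGui applies.

```javascript
// Hypothetical answer checker: accept a numeric answer within a relative
// tolerance of the correct value, so rounding does not count as an error.
function checkNumericAnswer(response, correct, relTol) {
  var x = parseFloat(response); // NaN if the response does not start with a number
  if (isNaN(x)) { return false; }
  var scale = Math.max(Math.abs(correct), 1e-12); // avoid a zero tolerance
  return Math.abs(x - correct) <= relTol * scale;
}

// Score a list of responses against a key; returns the number correct.
function scoreProblemSet(responses, key, relTol) {
  var score = 0;
  for (var i = 0; i < key.length; i++) {
    if (checkNumericAnswer(responses[i], key[i], relTol)) { score++; }
  }
  return score;
}

var key = [3.5, 0.25, 120];
var responses = ["3.50", "0.3", "abc"]; // one right, one wrong, one unparsable
var score = scoreProblemSet(responses, key, 0.01);
```

Running a check like this in the browser is what makes instant feedback on practice problems possible without contacting the server.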

Here is an example of using JavaScript (including an html form) to make a page interactive:

How many people are here today?

The form just above and the response are generated with the following JavaScript code:

<form method="POST">
<script language="JavaScript"><!--
function howMany(n) {
    if (n > 10) {
        // The end of this message was cut off in the listing; the wording
        // after n.toString() is a reconstruction.
        if (confirm("Gee, Sacha, can you believe that " + n.toString() + " people are here today?")) {
            alert("And some of them are still awake!");
        }
    }
}
// -->
</script>
<div align="center"><center><p><strong>How many people are here today? </strong>
<input type="text" size="10" onchange="howMany(value);"> </p>
</center></div></form>

JavaScript and Java have complementary functionality; using both together allows a great deal of flexibility.

Here is an example of using JavaScript to send data to an applet; in this case, just the value of a correlation coefficient to use in generating a random bivariate normal scatterplot:

What would you like r to be?

You need Java to see this.

The html, JavaScript, and Java used to generate this is:

<form method="post">
<div align="center"><center><p><strong>What would you like <em>r</em> to be? <input type="text" name="anotherInput" size="8" onchange="validateAndSetR(value)">
<input type="button" value="Make it random" name="theButton" onclick="randomR();"></p>
</center></div></form>

<p align="center">
<applet code="Correlation.class" codebase="../Java" archive="PbsGui.zip" width="550" height="400" name="scatterApplet">
<param name="title" value="JavaScript can talk to Java">
<param name="showR" value="false">
You need Java to see this.
</applet>
</p>
<script language="JavaScript"><!--
var theApplet = document.applets.length - 1;
function validateAndSetR(v) {
    var r = parseFloat(v);   // see if v starts with a parsable number
    if (r < -1) {r = -1;}    // r cannot be less than -1
    else if (r > 1) {r = 1;} // or greater than 1
    else if (isNaN(r)) {r = 0;} // if v aint parsable, make it zero
    document.forms[1].elements[0].value = (r.toString()).substring(0,8); // replace displayed value
    document.applets[theApplet].setR(r);  // display a scatterplot with this r
}
function randomR() {
    var r = 2*(Math.random()-0.5);
    validateAndSetR(r.toString());
}
// -->
</script>
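The applet's `setR` method is not shown in these notes. One standard way to generate a random bivariate normal scatterplot with a given correlation r is to draw two independent standard normals x and z and set y = r·x + sqrt(1 − r²)·z. The sketch below does this in JavaScript; the function names are illustrative assumptions, and the applet itself may use a different method.

```javascript
// Hypothetical sketch: n points from a bivariate normal with correlation r.
function standardNormal() {
  // Box-Muller transform; using 1 - Math.random() avoids log(0).
  var u = 1 - Math.random();
  var v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

function bivariateNormal(n, r) {
  var points = [];
  var s = Math.sqrt(1 - r * r);
  for (var i = 0; i < n; i++) {
    var x = standardNormal();
    var z = standardNormal();
    points.push([x, r * x + s * z]); // y has variance 1 and corr(x, y) = r
  }
  return points;
}

// Sample correlation coefficient of a list of [x, y] pairs.
function sampleCorrelation(points) {
  var n = points.length, sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
  for (var i = 0; i < n; i++) {
    var x = points[i][0], y = points[i][1];
    sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y;
  }
  var cov = sxy / n - (sx / n) * (sy / n);
  var vx = sxx / n - (sx / n) * (sx / n);
  var vy = syy / n - (sy / n) * (sy / n);
  return cov / Math.sqrt(vx * vy);
}

var pts = bivariateNormal(20000, 0.6);
var rHat = sampleCorrelation(pts); // close to 0.6, up to sampling error
```

The sample correlation of the generated points will differ from the requested r by chance, and the difference shrinks as the number of points grows.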

## What are Cookies? What are they good for?

"Cookies" are small text strings associated with a document or a directory on the web. Cookies are stored on the client's hard drive. They can be written and read using JavaScript. They are a way of having persistent "state" as the user navigates among pages: the programmer can test whether a cookie has been set, and what its contents are, so the page can be "personalized" for the user. In the material I will show today, cookies are used to store identifying information needed to submit a problem set, to ensure that a student navigated to a page along the right route, and to store the student's answers to the problem set on his or her machine when the problem set is submitted for grading. That way, if something goes wrong with the network while the student's answers to the problem set are in transit from the client to the server, the student can automatically recall his or her answers and re-submit them, without having to start from scratch. The cookies I use are encrypted to protect students' privacy.

The browsers impose limits on the length of individual cookies, the total number of cookies, and the number of cookies from each web domain.
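A minimal sketch of storing answers in a cookie and recovering them, with a length check, might look like the following. The cookie name, the "|" separator, and the helper names are assumptions made for the example. The 4096 figure is the per-cookie limit in Netscape's original cookie specification and should be treated as an assumption for other browsers. Encryption, which the real cookies use, is left out.

```javascript
// Hypothetical sketch: pack a student's answers into a cookie string and
// recover them later. Encryption is omitted.
var MAX_COOKIE_LENGTH = 4096; // assumed per-cookie limit; check your browsers

function makeAnswerCookie(name, answers) {
  var parts = [];
  for (var i = 0; i < answers.length; i++) {
    // Encoding removes ";", "=", "|" and spaces from the stored values.
    parts.push(encodeURIComponent(answers[i]));
  }
  var cookie = name + "=" + parts.join("|");
  if (cookie.length > MAX_COOKIE_LENGTH) {
    throw new Error("cookie too long: " + cookie.length + " characters");
  }
  return cookie;
}

// Find `name` in a cookie string such as document.cookie ("a=1; b=2").
function readAnswerCookie(cookieString, name) {
  var pairs = cookieString.split("; ");
  for (var i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf("=");
    if (eq >= 0 && pairs[i].substring(0, eq) === name) {
      var value = pairs[i].substring(eq + 1);
      if (value === "") { return []; }
      var parts = value.split("|");
      var answers = [];
      for (var j = 0; j < parts.length; j++) {
        answers.push(decodeURIComponent(parts[j]));
      }
      return answers;
    }
  }
  return null; // no such cookie
}

var cookie = makeAnswerCookie("set3", ["0.25", "a;b", "3|4"]);
// In a browser one would now assign: document.cookie = cookie;
var restored = readAnswerCookie("other=1; " + cookie, "set3");
```

Checking the length before writing matters because a cookie that exceeds the browser's limit may be dropped or cut short, and the saved answers would then be lost.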