Language basics: HTML, Java, JavaScript, Cookies.
The basic idea is to teach introductory statistics using the world-wide web for all aspects of the course: text, labs, problem sets, practice exams, etc.
I want to use the internet not to deliver traditional material at a distance, but rather to involve the student in interactive exercises and demonstrations that help him or her develop understanding, without requiring the student to learn a statistics package.
I want to have reproducible lectures in a sense similar to how Claerbout and Donoho describe reproducible research: the student can reproduce lecture demonstrations, illustrations, calculations, etc., anywhere, anytime. The software is a click away, and its parameter settings are documented.
I believe that some (maybe even many) concepts in introductory statistics can be taught better using web-based materials than simply using traditional lecture methods and "dead-tree" books.
The materials represent a compromise between a "statistical literacy" course, such as the excellent book Statistics, by Freedman, Pisani, and Purves (Norton, 1996, FPP), and a more traditional "statistical methods" course, such as the book Introduction to the Practice of Statistics by Moore and McCabe. The material has somewhat more computational content than FPP, but the computations are motivated by inference problems. Probability, hypothesis testing, randomization, and sampling error are woven into the discussion of experiments and sample surveys.
SticiGui© is implemented in HTML, Java, and JavaScript. That choice was motivated by these design criteria:
Maximize accessibility and portability. Recent browsers allow this material to be accessed from almost anywhere in the world, without adding "plug-ins" to the browser, and without buying any proprietary software. The software runs under every major operating system (UNIX, LINUX, Windows, Mac), because Netscape and/or Microsoft have versions of their browsers that run on those operating systems.
Maximize interactivity and minimize technological barriers. I want the students to be able to explore data and to ask and answer "what-if" questions, without needing to learn how to use a conventional statistical software package. Using Java enabled me to develop point-and-click tools whose use was fairly obvious (no hidden menus, consistent GUI, etc.). Moreover, if a "plug-in" were required, downloading the "plug-in" would present a considerable barrier to some students. Browsers come installed on all new personal computers, so this material is immediately accessible.
Minimize bandwidth and maximize speed. Using Java allows the figures and plots to be generated on the client side. The code and data download to the client, then the client computes and creates the figures. This is also by far the most efficient way to get dynamic interaction with the data. Otherwise, every time the user changed a parameter value, the client would need to send a message to the server, and the server would have to compute the new figure and send it over the internet to the client. Interactive real-time data exploration would not be possible. A small number of figures in SticiGui© are stored as GIF or JPG files; almost all the figures are computed by the client. Sending just the data and the rules (programs) for generating figures from the data substantially reduces the time it takes pages to load.
Make it easy to use the materials in lectures. Because the software is free-standing (it does not need a server for computations), it is easy to display the content in the classroom without an internet connection. That allows the instructor to demonstrate concepts and the use of the materials in class.
Using Java and JavaScript together allowed me to make the content dynamic: many of the examples and exercises in the Text change whenever the page is reloaded, so students can get unlimited practice at certain kinds of problems. Similarly, each student gets a different version of the problem set, but can see the solutions to his/her version after the due date.
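One way individualized problems of this kind could be produced is to seed a pseudo-random number generator from something that identifies the student, so that a given student always sees the same version while different students generally see different ones. The following is only a sketch under that assumption: the function names (hashSeed, makeRng, makeProblem, checkAnswer), the student identifier, and the problem itself are hypothetical and are not taken from SticiGui.

```javascript
// Hypothetical sketch: derive a reproducible problem variant from a student ID.

// Turn a string into a 32-bit unsigned integer seed (simple multiplicative hash).
function hashSeed(text) {
  let h = 0;
  for (let i = 0; i < text.length; i++) {
    h = (Math.imul(h, 31) + text.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Small linear congruential generator; returns a function giving values in [0, 1).
function makeRng(seed) {
  let state = seed >>> 0;
  return function () {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Build one version of a "find the average" problem for a given student.
function makeProblem(studentId) {
  const rng = makeRng(hashSeed(studentId));
  const data = [];
  for (let i = 0; i < 5; i++) {
    data.push(1 + Math.floor(rng() * 20)); // integers from 1 to 20
  }
  const answer = data.reduce((sum, x) => sum + x, 0) / data.length;
  return { data: data, answer: answer };
}

// Accept a numerical response if it is within a small tolerance of the key.
function checkAnswer(problem, response, tolerance = 0.01) {
  return Math.abs(Number(response) - problem.answer) <= tolerance;
}
```

Because the generator is seeded rather than driven by the clock, the solution to a student's own version can be regenerated after the due date from the same identifier.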
Division into Text/Glossary, Problem Sets, Review Materials, Grade Query
The Java tools for data analysis and visualization, calculations, and demonstrations
An example of dynamic examples and exercises; a brief tour of some chapters.
An example of problem sets before and after the due date---individualized problems, individualized solutions.
Statistics 2 is the minimal course that satisfies the university's quantitative distribution requirement. Statistics 21 is a prerequisite course for Business majors.
As such, one would not expect the students to be terribly "computer literate."
Nonetheless,
over 1/2 of my class regularly accesses the class material and online labs from home, dormitory, or off-campus.
many have their own accounts with ISPs such as AOL and CompuServe.
that you can resize windows, and how.
that you can move windows, and how.
that the presence of a scrollbar means there's more to see
how to use scroll bars
that something underlined is a link
how to "select" more than one item from a list in a form.
how to "select" non-contiguous items from a list in a form.
Some students consider using a browser "knowing all about computers." They feel that asking them to point and click is a serious imposition, and can't possibly have anything to do with learning the material.
Having lecture notes online, especially if they can print them out before lecture
Getting their problem set scores instantly
Being able to "experiment" with the applets in the book
Online practice exams that are self-scoring
The ability to do the computer-based problem sets from anywhere.
The ability to see the solutions to the problem sets from anywhere, after the due date
The glossary.
I dim the lights to project the material; they fall asleep.
Many students complain that it is unfair to offer material on the web because not everyone has a home computer.
Competition for the few general UCB modem lines limits off-campus access by those who do not subscribe to an ISP or live in a dorm with ethernet.
Some feel that using a browser is "knowing all about computers," and is too much to expect in a non-technical course.
The network connections bog down considerably the night before a lab is due.
Here are the results of a survey from Fall, 1997.
The technology I'm using (html, html forms, frames, Java, JavaScript) is several years old.
Nonetheless, there are very few places on the Berkeley campus where students have access to the hardware and software needed to use the material.
Students can access the material from the X-Window terminals in the Statistical Computing facility, but,
The terminals are monochrome, which requires gymnastics on my part (special color choices and symbol shapes that render visibly differently---all colors are mapped to white or black); even with a great deal of effort on my part, the result is a poor substitute for color.
Netscape on Unix boxes with monochrome monitors is very buggy: choice menus are invisible, for example
When 30 students are all downloading 3000-point data sets the day before a lab is due, things bog down.
The facility is open only 9am-5pm.
You need to view your material using several browsers on several platforms to have a reasonable chance that the students get what you think you put there. Small differences in Netscape and Internet Explorer standards for JavaScript confound matters. It is extremely hard to support more than one version of a browser at a time on a variety of platforms.
Java is advertised as "write once, run anywhere." The reality is "write once, debug everywhere."
Enough said.
It's a lot of trouble. I can count on my fingers the nights in which I've gotten more than 4 hours of sleep in the terms I have been developing this material.
Some fraction of the students will hate it; your teaching evaluations will suffer.
You get more class time for instruction (well, modulo spending 15 minutes of each class explaining to the students how to use the browser, to clear cache, to clear cookies, to use scrollbars, to select multiple items from a list, ... I'm hopeful that these things won't be an issue within a couple of years).
After it's set up, teaching the class the next time is easy (yeah, right... I've probably spent more time working on this than I would otherwise have spent in a lifetime of lecture preparation).
It's trendy?
If you are pessimistic about the economic future of the University, this might ultimately be the most economical way to teach large classes.
Once you know Perl, HTML, Java, and JavaScript, you can get a job "in the real world" with a real salary!
Html (hypertext markup language) is the basic language spoken by web browsers. It is similar in some ways to LaTeX, in that one specifies major structural divisions of the document (headings, lists, tables, ...), fonts, paragraph breaks, centering, etc., and leaves the formatting details to the local implementation of the software.
Html currently lacks a universal, portable, convenient way to display mathematics. I haven't found that to be a limitation in teaching introductory courses, where the mere appearance of X causes extreme anxiety, and β necessitates a call to 911. I don't intend to say more than that about html.
Java is a programming language designed to be portable across a wide variety of platforms, including Unix machines, Macintoshes, PCs, JavaStations (a Sun trademark, I believe), and even (some) cell phones.
The portability is accomplished by separating the language into two parts: programs are compiled into "byte code" that is platform-independent, and platform-dependent "Java Virtual Machines" run the byte code on different kinds of computers. That separation has a performance cost associated with it. Originally, Java ran on the order of 5 times slower than C, but the gap is closing (I think it's under a factor of 2 now).
Java is an object-oriented language, which means that aside from certain basic data types (short and long integers, booleans, single and double-precision real numbers, characters), other data types are "objects" that have "properties" and "methods" associated with them. Java is strongly "typed," which prevents programmers from making some kinds of silly mistakes.
Roger Purves quoted someone (I don't remember whom) as saying something to the effect that in a procedural language, you add 1 and 1 by saying x = 1+1, whereas in an object-oriented language, you send a message to "1" to add "1" to itself. That's not too far from the truth.
For example, in Java, you might define an object of the type "dataset." The properties of a "dataset" might include variable names, and arrays of values of each of the variables. The methods of a "dataset" might include averages of the variables, regressions of subsets of the variables against other subsets of the variables, random sampling from the variables, etc.
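To make the idea of properties and methods concrete, here is a minimal sketch of such a "dataset" object. It is written in JavaScript rather than Java, to match the other code in these notes, and it is not the class used in the applets: the class name Dataset, its methods mean and sample, and the example values are all hypothetical.

```javascript
// Illustrative only: a "dataset" object with properties and methods.
class Dataset {
  // variables: object mapping each variable name to an array of numbers
  constructor(variables) {
    this.names = Object.keys(variables); // property: the variable names
    this.values = variables;             // property: the values of each variable
  }

  // Method: average of one variable.
  mean(name) {
    const x = this.values[name];
    return x.reduce((sum, v) => sum + v, 0) / x.length;
  }

  // Method: simple random sample of size n, without replacement, from one variable.
  // rng is any function returning numbers in [0, 1); it defaults to Math.random.
  sample(name, n, rng = Math.random) {
    const pool = this.values[name].slice(); // copy, so the data are not modified
    const picked = [];
    for (let i = 0; i < n && pool.length > 0; i++) {
      const j = Math.floor(rng() * pool.length);
      picked.push(pool.splice(j, 1)[0]);
    }
    return picked;
  }
}

// Made-up data, for illustration.
const d = new Dataset({ height: [60, 62, 64, 66], weight: [110, 120, 130, 140] });
console.log(d.names);             // the variable names
console.log(d.mean("height"));    // average of the "height" variable
console.log(d.sample("weight", 2)); // two values drawn at random
```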
Most Java programs use graphical user interfaces and are "event-driven," meaning that the programs wait for the user to do something like click the mouse, push a "button," select something from a menu, or type something. The Java program would have "methods" to handle each kind of event, including doing whatever calculations are required and updating the display to reflect the results.
One can write "free standing" applications in Java, but one of the principal virtues of Java is that the popular browsers (Netscape and Internet Explorer) incorporate Java virtual machines, so they know how to run certain kinds of Java programs, called "applets." Applets are application programs that can be downloaded over the internet from within a web page (an html page). The browsers allocate some real estate on the screen for the applet to use for its display, and pass the events (mouse clicks, etc.) that occur within the display to the Java program to be "handled."
Information can be passed to an applet in a variety of ways. The values of some parameters can be set within the html document that invokes the applet using "parameter" tags. The applet can provide areas where the user can type in data, select a file, specify a URL, etc. Finally, there are ways that the page can access "methods" of the applet. The examples I will show later do all of these things.
For example, the scatterplot applet can be invoked using "parameter" tags to specify the actual values of the dataset to display, whether to plot the data or the residuals, whether or not to display the regression line, etc. Optionally, "parameter" tags can be used to specify a file from which the applet should read its data. "Parameter" tags can be set so that there is a box in the applet for the user to type the URL of a file from which to read the data. Finally, the html can call a "setVariables" method of the applet to send it a list of data constructed on the fly, for example, using JavaScript (see below).
Here is an example of an applet:
The html "code" in this document that invokes the applet just above is
The code="Bivariate.class" tells the browser where the compiled Java byte code for my scatterplot routine is.
The codebase=".." tells the browser the (relative) directory on the host machine where the Bivariate.class file is stored.
The file "PbsGui.zip" is my library of Java byte code for various parts of the user interface (axis labeling routines, scrollbars linked to displays, number formatting routines, plots that sense the location of the mouse and report its position in natural units, pop-up listings that allow the user to select rows and pass messages to the plots, etc.). Bivariate.class also relies on various routines for sorting data, handling missing values, computing percentiles, correlation coefficients, SDs, etc.
The 'width="..."' and 'height="..."' tell the browser how much screen real estate (in pixels) to allocate for the applet's display.
The <param name="files" ... > tells the applet the list of files from which to read data.
The other <param ...> tags pass other information to the applet, for example, suppressing the button that allows the user to display the SD line, suppressing the user's ability to add points to the scatterplot by clicking, and adding an input box into which the user can type another URL from which to read data.
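The pieces described above can be assembled programmatically. The following sketch builds applet markup from a description of the applet, which is one way JavaScript could generate the same html for many pages. It is an illustration under stated assumptions, not the code behind these pages: the helper names (escapeAttr, appletHtml), the use of an archive attribute to name PbsGui.zip, the pixel sizes, and the data file name are all assumptions; only Bivariate.class, the codebase "..", PbsGui.zip, and the "files" parameter come from the text.

```javascript
// Hypothetical helper: escape a value for use inside a double-quoted html attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: build <applet> markup, with one <param> tag per parameter.
function appletHtml(spec) {
  const lines = [];
  lines.push(
    '<applet code="' + escapeAttr(spec.code) + '"' +
    ' codebase="' + escapeAttr(spec.codebase) + '"' +
    ' archive="' + escapeAttr(spec.archive) + '"' +
    ' width="' + escapeAttr(spec.width) + '"' +
    ' height="' + escapeAttr(spec.height) + '">'
  );
  for (const name of Object.keys(spec.params || {})) {
    lines.push(
      '  <param name="' + escapeAttr(name) + '"' +
      ' value="' + escapeAttr(spec.params[name]) + '">'
    );
  }
  lines.push("</applet>");
  return lines.join("\n");
}

const html = appletHtml({
  code: "Bivariate.class",
  codebase: "..",
  archive: "PbsGui.zip",                 // assumed way of naming the library
  width: 600,                            // assumed size, in pixels
  height: 400,                           // assumed size, in pixels
  params: { files: "example-data.txt" }  // hypothetical data file name
});
console.log(html);
```

In a page, the resulting string would be written into the document (for example with document.write) at the point where the applet should appear.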
The reason I'm using Java is that it's the only way to do what I want, which is to provide a way for students to do interactive data analysis and visualization and statistical and probabilistic computations from anywhere, without learning a statistical language (point-and-click only!), and without using proprietary "plug-ins," or other complicated procedures. It would be possible to do some of what I want using "server-side includes" or cgi scripts, but the response time would be prohibitively slow, and the bandwidth required would be prohibitively large.
Using Java allows me to download automatically to a student's computer a dataset and a statistics package for displaying and analyzing it. Generating the figures, changing views, calculating probabilities, taking random samples, computing regressions, etc., all happen on the student's machine, not the server.
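As an illustration of how little needs to travel over the network, the calculations behind a scatterplot with a regression line fit in a few lines that run on the student's machine once the data have arrived. This sketch is in JavaScript rather than Java and is not the applets' code; the function names are hypothetical, and the SD is assumed to use divisor n.

```javascript
// Sketch: summary statistics and a regression line computed entirely on the client.

// Average of a list of numbers.
function mean(x) {
  return x.reduce((sum, v) => sum + v, 0) / x.length;
}

// SD with divisor n (an assumed convention): root-mean-square deviation from the mean.
function sd(x) {
  const m = mean(x);
  return Math.sqrt(mean(x.map(v => (v - m) * (v - m))));
}

// Correlation coefficient: average product of the values in standard units.
function correlation(x, y) {
  const mx = mean(x), my = mean(y), sx = sd(x), sy = sd(y);
  return mean(x.map((v, i) => ((v - mx) / sx) * ((y[i] - my) / sy)));
}

// Regression of y on x: slope = r * SD(y) / SD(x);
// the line passes through the point of averages.
function regression(x, y) {
  const slope = correlation(x, y) * sd(y) / sd(x);
  return { slope: slope, intercept: mean(y) - slope * mean(x) };
}
```

Changing a view or a parameter then means recomputing locally, with no round trip to the server.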
JavaScript is a scripting language, spoken (although in different dialects) by Netscape and Internet Explorer. JavaScript code can be inserted directly into html documents. There is no real "standard" for JavaScript as there is for Java. Recent extensions of JavaScript include support for object-oriented programming, regular expressions, and other niceties.
JavaScript is useful for constructing parts of an html page "on the fly," conditionally on the values of various parameters, etc. For example, JavaScript can append html to an open document. Because you can insert the JavaScript anywhere in the html document, you can build an html page with parts that are different when the "state" at the time the page is visited is different. JavaScript can also be used to change the source files images are read from. Html forms work closely with JavaScript: form elements can trigger JavaScript through event handlers such as onchange, so a page that uses forms is a natural place to use JavaScript.
JavaScript can be used to open new pages, close pages, reload pages, etc. JavaScript can also be used to communicate with applets on a page, as described above.
In the material I will show today, JavaScript is used to construct examples that change each time a page is visited, to pass randomly generated data to applets, to query applets for answers to certain computations, to compute various statistics, to check students' answers to practice problems and problem sets, to control access to problem sets (in conjunction with an applet that reads a due date file from the web), and for certain demonstrations (such as the "Let's Make a Deal" problem). It is also used to generate html used repeatedly in different documents, so that I can control the appearance of many of the pages from a central location.
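As one example of the kind of randomly generated data that could be passed to an applet, the sketch below generates points from a bivariate normal distribution with a given correlation coefficient. It is a minimal illustration, not the code used in these pages: the function names are hypothetical, and the Box-Muller transform is simply one standard way to produce normal variates.

```javascript
// Sketch: generate n points from a bivariate normal distribution with correlation r.
// rng is any function returning numbers in [0, 1); it defaults to Math.random.

// One standard normal variate, by the Box-Muller transform.
function standardNormal(rng = Math.random) {
  const u = 1 - rng(); // lies in (0, 1], so the logarithm is finite
  const v = rng();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

function bivariateNormal(n, r, rng = Math.random) {
  if (!(r >= -1 && r <= 1)) {
    throw new RangeError("correlation must be between -1 and 1");
  }
  const x = [], y = [];
  for (let i = 0; i < n; i++) {
    const z1 = standardNormal(rng);
    const z2 = standardNormal(rng);
    x.push(z1);
    // Mixing z1 with an independent z2 gives y variance 1 and correlation r with x.
    y.push(r * z1 + Math.sqrt(1 - r * r) * z2);
  }
  return { x: x, y: y };
}

// In a page, the result could then be handed to an applet through a method
// such as setVariables; the argument format that method expects is not
// specified in the text, so that call is left out here.
```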
Here is an example of using JavaScript (including an html form) to make a page interactive:
The form just above and the response are generated with the following JavaScript code:
<form method="POST">
<script language="JavaScript"><!--
// Respond to the head count typed into the text box below.
// n arrives as a string; the comparison with 10 converts it to a number.
function howMany(n) {
    if (n > 10) {
        // Ask for confirmation, then respond only if the user clicks OK.
        if (confirm("Gee, Sacha, can you believe that " + n.toString() +
                    " people care what your dad has to say?")) {
            alert("And some of them are still awake!");
        }
    } else {
        alert("How depressing!");
    }
}
// -->
</script>
<div align="center"><center><p><strong>How
many people are here today? </strong>
<input type="text" size="10" onchange="howMany(value);"> </p>
</center></div></form>
JavaScript and Java have complementary functionality; using both together allows a great deal of flexibility.
Here is an example of using JavaScript to send data to an applet; in this case, just the value of a correlation coefficient to use in generating a random bivariate normal scatterplot:
The html, JavaScript, and Java used to generate this is:
<form method="post">
"Cookies" are small text strings associated with a document or a directory on the web. Cookies are stored on the client's hard drive. They can be written and read using JavaScript. They are a way of having persistent "state" as the user navigates among pages: the programmer can test whether a cookie has been set, and what its contents are, so the page can be "personalized" for the user. In the material I will show today, cookies are to store identifying information needed to submit a problem set, to ensure that a student navigated to a page along the right route, and to store the student's answers to the problem set on his or her machine when the problem set is submitted for grading. That way, if something goes wrong with the network while the student's answers to the problem set are in transit from the client to the server, the student can automatically recall his or her answers and re-submit them, without having to start from scratch. The cookies I use are encrypted to protect students' privacy.
The browsers impose limits on the length of individual cookies, the total number of cookies, and the number of cookies from each web domain.
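Reading and writing cookies from JavaScript comes down to building and parsing strings of the form name=value. The sketch below shows one way to do that; it is an illustration, not the code used for the problem sets. The function names, the cookie name "answers", and the 4096-character limit are assumptions (actual limits vary by browser), and the encryption mentioned above is omitted.

```javascript
// Sketch: build and parse cookie strings of the kind used with document.cookie.
const MAX_COOKIE_LENGTH = 4096; // assumed per-cookie limit; actual limits vary by browser

// Build "name=value; expires=...; path=..." with the name and value URL-encoded.
function makeCookie(name, value, expires, path) {
  let cookie = encodeURIComponent(name) + "=" + encodeURIComponent(value);
  if (cookie.length > MAX_COOKIE_LENGTH) {
    throw new RangeError("cookie is longer than the assumed limit");
  }
  if (expires) cookie += "; expires=" + expires.toUTCString();
  if (path) cookie += "; path=" + path;
  return cookie;
}

// Parse a string such as document.cookie ("a=1; b=2") into an object.
function parseCookies(cookieString) {
  const result = {};
  for (const pair of cookieString.split(";")) {
    const trimmed = pair.trim();
    if (trimmed === "") continue;
    const eq = trimmed.indexOf("=");
    const name = eq < 0 ? trimmed : trimmed.slice(0, eq);
    const value = eq < 0 ? "" : trimmed.slice(eq + 1);
    result[decodeURIComponent(name)] = decodeURIComponent(value);
  }
  return result;
}

// In a browser, saving and recalling a student's answers might look like:
//   document.cookie = makeCookie("answers", "q1=3.5&q2=b");
//   const saved = parseCookies(document.cookie).answers;
```

Encoding the value keeps characters such as "=", "&", and ";" inside the answers from being mistaken for cookie syntax.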
©1998, P.B. Stark. All rights reserved.
Last modified 28 October 1998.