TILE - Program Evaluation Literature


Summaries of existing program evaluation literature in the context of TILE:

Gross Davis, Barbara and Sheila Humphreys. 1985. Evaluating Intervention Programs-Applications from Women's Programs in Math and Science. New York: Teachers College Press.

Anandam, Kamala and J. Terence Kelly. 1981. "Evaluating the use of Technology in Education." Journal of Educational Technology Systems. 10, 1: 21-31.

Mausner, Bernard, Edward F. Wolff, Richard W. Evans, Mary M. DeBoer, Steven P. Gulkus, Anita D'Amore, and Samuel Hirsch. 1983. "A Program of Computer Assisted Instruction for a Personalized Instructional Course in Statistics." Teaching of Psychology. 10, 4, December: 195-200.

Ross, Steven. 1984. "Matching the Lesson to the Student: Alternative Adaptive Designs for Individualized Learning Systems." Journal of Computer-Based Instruction. 11, 2, Spring: 42-48.

Duncan, Nancy C. 1993. "Evaluation of instructional software: Design considerations and recommendations." Behavior Research Methods, Instruments and Computers.  25, 2: 223-227.

Castellan, N. John. 1993. "Evaluating information technology in teaching and learning." Behavior Research Methods, Instruments and Computers. 25, 2: 233-237.

Ransdell, Sarah. 1993. "Educational software evaluation research: Balancing internal, external, and ecological validity." Behavior Research Methods, Instruments and Computers. 25, 2: 228-232.

Welsh, Josephine. 1993. "The effectiveness of computerized instruction at the college level." Behavior Research Methods, Instruments and Computers. 25, 2: 220-222.


Gross Davis, Barbara and Sheila Humphreys. 1985. Evaluating Intervention Programs-Applications from Women's Programs in Math and Science. New York: Teachers College Press.

A description of three basic types of evaluation is provided:

Preformative: during the planning phases.

Formative: to improve a program that is still being developed.

Summative: reports on the overall quality and effectiveness of a program.

To help in creating the evaluation, the authors provide ideas and processes that can help generate evaluation goals. Possible sources for evaluation questions include:

Discussions with the audiences for the evaluation (e.g., funders)

Discussions among the program staff, including the following issues:

Evaluation goals and objectives can further be defined by re-reading the original proposal, observing the program in action and reading previous evaluations.

Depending on what the evaluation aims to measure (cognitive change, attitudinal change, or both), different instruments or data sources are suggested:

Cognitive:
- administering standardized tests
- teacher's judgment of students' performance
- reviewing students' past assignments/tests

Attitudinal:
- interviewing students
- administering questionnaires

Both:
- observing students in the lab

If one of the questions to be answered concerns cognitive change, administering tests would be appropriate. Two types of tests are identified:

Norm-Referenced Tests provide information on how well an individual or group does in comparison to other individuals taking the same test.

Criterion-Referenced Tests provide information about an individual's or group's performance relative to certain independently defined standards rather than relative to the performance of others.
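To illustrate the distinction, the sketch below scores the same hypothetical result both ways; the scores and the cutoff of 70 are invented for illustration and do not come from the source.

```python
# Illustrative sketch: the same raw score interpreted norm-referenced
# (relative standing in the group) vs. criterion-referenced (pass/fail
# against an independently defined standard). All numbers are hypothetical.

def percentile_rank(score, all_scores):
    """Norm-referenced: percentage of test takers scoring at or below this score."""
    return 100.0 * sum(s <= score for s in all_scores) / len(all_scores)

def meets_criterion(score, cutoff=70):
    """Criterion-referenced: comparison against a fixed, independent standard."""
    return score >= cutoff

scores = [55, 62, 68, 71, 74, 80, 85, 91]   # hypothetical group results
student = 71

print(f"Percentile rank: {percentile_rank(student, scores):.0f}")   # relative standing
print(f"Meets criterion (>= 70): {meets_criterion(student)}")       # absolute standard
```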

The attitudinal dimension, on the other hand, if applied to statistics or math, may include:

The authors recommend drawing up an evaluation plan that specifies the instruments/data sources, the sample, and the data collection times for each evaluation question to be answered. The following is a synthesis of how this might look when applied to programs similar to TILE:

Evaluation question: Is the lab well attended?
Instrument/data source: observation; attendance list
Sample: teacher; audience
Data collection: beginning and end of each session

Evaluation question: What is the immediate impact of a lab session on the audience's attitude/knowledge?
Instrument/data source: questionnaire; interview; test
Sample: audience
Data collection: end of each session

Evaluation question: What is the audience's reaction to the entire program?
Instrument/data source: questionnaire; interview
Sample: those who attended at least one lab session
Data collection: end of program/semester
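For programs like TILE, such a plan can also be kept in a simple machine-readable form so that instruments and collection times are not lost track of. The sketch below is only one possible representation: the record structure is an assumption, while the field values follow the plan above.

```python
# Minimal sketch of the evaluation plan above as a list of records; the
# structure is an assumption, the contents follow the plan.
evaluation_plan = [
    {"question": "Is the lab well attended?",
     "instruments": ["observation", "attendance list"],
     "sample": "teacher; audience",
     "collection": "beginning and end of each session"},
    {"question": "Immediate impact of a lab session on the audience's attitude/knowledge?",
     "instruments": ["questionnaire", "interview", "test"],
     "sample": "audience",
     "collection": "end of each session"},
    {"question": "Audience's reaction to the entire program?",
     "instruments": ["questionnaire", "interview"],
     "sample": "those who attended at least one lab session",
     "collection": "end of program/semester"},
]

for item in evaluation_plan:
    print(f"{item['question']} -> {', '.join(item['instruments'])} ({item['collection']})")
```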

Tools of evaluation are discussed in more detail; the authors list different means of conducting an evaluation and discuss the pros and cons of each:

Questionnaires
Advantages: anonymous; cover a wide range of topics
Disadvantages: structured and inflexible; respondents may interpret questions differently (reliability issue)

Interviews
Advantages: flexible/adaptable; allow in-depth probing
Disadvantages: responses difficult to summarize/analyze

Observations
Advantages: provide information about the natural setting; may uncover issues not revealed through other means
Disadvantages: data difficult to synthesize

Tests
Advantages: may provide the most convincing evidence for some audiences (e.g., funders)
Disadvantages: inappropriate for interventions of limited duration

Documents, records, materials
Advantages: provide background information; may uncover issues not revealed through other sources
Disadvantages: potentially low payoff; interpretations/explanations may be lacking

For questionnaire development, the authors point out the importance of reliability and validity issues:

Reliability Issue: Does the questionnaire mean the same thing to various people at various times?
Validity Issue: Does the questionnaire measure what it purports to measure (are the questions the right indicators for the variable(s) we are trying to measure)?

Examples of how to frame questions, for instance as open- or closed-ended questions, are given. A checklist for good questions is provided:

Gross Davis and Humphreys further suggest the pilot testing of questions by

The questions then can be rewritten, and, if time allows, the pretest can be given to the pilot group again after two weeks to check for reliability.
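The two-week retest described above amounts to correlating the two administrations for the pilot group. A minimal sketch follows, assuming hypothetical pilot scores and Python 3.10+ for statistics.correlation:

```python
# Hypothetical sketch of a test-retest reliability check: the same pilot
# group answers the piloted questions twice, about two weeks apart, and the
# two score sets are correlated. The scores below are invented.
from statistics import correlation  # available in Python 3.10+

week_0 = [12, 15, 9, 18, 14, 11, 16, 13]   # hypothetical pilot scores
week_2 = [13, 14, 10, 17, 15, 10, 16, 12]  # hypothetical retest scores

r = correlation(week_0, week_2)  # Pearson r between the two administrations
print(f"Test-retest reliability (Pearson r): {r:.2f}")
# Values near 1 suggest respondents interpret the questions consistently over
# time; low values flag the reliability problem described above.
```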

For a formative evaluation, questionnaires may be administered in the following ways:

As for interviews, the authors recommend structured interviews (with a choice of answers already provided) or semi-structured interviews (built around a specific set of questions) rather than unstructured interviews--the more you know what information you want and the less experienced the interviewers are, the more structured the interviews should be. As with questionnaires, the framing of questions and pilot testing are also important for interviews.

For the evaluation of courses or curricula, it is particularly important that the interviewer is not the teacher or professor.

Observation as a tool for data gathering is also explored. The authors, again, distinguish between different formats:

A list of what to observe is also provided:

If the program evaluation does not involve a control group, the authors recommend the following alternatives as suitable comparison groups:

 


Anandam, Kamala and J. Terence Kelly. 1981. "Evaluating the use of Technology in Education." Journal of Educational Technology Systems. 10, 1: 21-31.

The authors distinguish between several instructional uses of the computer. In addition to "learning about the computer," they differentiate between:

[see K.L. Zinn, Instructional Uses of Computers in Higher Education, The Fourth Inventory of Computers in Higher Education: An Interpretive Report, EDUCOM, Princeton, New Jersey, pp. 103-126, 1979]

Studies in the 1970s concluded that all of these computerized methods are at least as effective as non-computerized methods in bringing about learning gains. CAI applications, however, have not considered student characteristics and subject-matter uniqueness, due to a "passive view of the student" in a "frame-oriented approach" to learning.

Anandam and Kelly stress that CAI really is more "instruction" than "learning" oriented. For learning to occur, more "individualization" is needed, for which there are four levels:

  1. Arranging a predetermined instructional sequence conditional on different responses to prespecified questions with immediate feedback.
  2. Choosing subsequent instruction based on a dynamic measure of performance on previous materials.
  3. Providing different presentation modes or instructional sequences based on individual differences such as aptitudes, interests, or personality.
  4. Hypothesizing a model of learning for each student consisting of procedures for presentation of instructional materials and assessment of performance (model is modified as learning occurs, allowing student to learn the material and gain insight into learning itself).

[see: G.P. Kearsley, Some Conceptional Issues in Computer-Assisted Instruction, Journal of Computer-based Instruction, 4, pp.8-16, August 1977]
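The first of the four levels above, a predetermined sequence with prespecified questions and immediate feedback, is the frame-oriented design the authors describe as passive. A minimal sketch of what such a sequence looks like is shown below; the frame content is hypothetical and not taken from the article.

```python
# Illustrative sketch of level 1 only (not code from the source): a
# predetermined sequence of frames, each with a prespecified question,
# immediate feedback, and a fixed branch for wrong answers. Frame content
# is hypothetical.

frames = [
    {"text": "The mean is the sum of scores divided by their number.",
     "question": "Mean of 2, 4, 6? ", "answer": "4",
     "feedback_wrong": "Add the scores (12) and divide by 3."},
    {"text": "The median is the middle score of an ordered list.",
     "question": "Median of 1, 3, 9? ", "answer": "3",
     "feedback_wrong": "Order the scores and take the middle one."},
]

def run_sequence(frames):
    for frame in frames:
        print(frame["text"])
        while True:                              # repeat the frame until correct
            reply = input(frame["question"]).strip()
            if reply == frame["answer"]:
                print("Correct.")                # immediate feedback
                break
            print("Not quite. " + frame["feedback_wrong"])

if __name__ == "__main__":
    run_sequence(frames)
```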

The authors conclude that, for effective learning to occur, flexibility and selectivity are needed. To promote effectiveness, the following curriculum questions arise:



Mausner, Bernard, Edward F. Wolff, Richard W. Evans, Mary M. DeBoer, Steven P. Gulkus, Anita D'Amore, and Samuel Hirsch. 1983. "A Program of Computer Assisted Instruction for a Personalized Instructional Course in Statistics." Teaching of Psychology. 10, 4, December: 195-200.

The article is an evaluation of highly interactive computer units, a CAI System Program for Statistics at Beaver College (CAI-Stat).

The objective of the software was to develop a procedure by which students learn underlying concepts of descriptive and inferential statistics (problem solving).

This occurred in a largely self-paced process (students could move as fast or slowly as they liked) of computer interaction with immediate feedback. After each completed unit of the program, tests were administered to the student. Lastly, a final exam involved concepts and problem-solving exercises.

The features of the software included:

The authors identify two principles of courseware design for instructional units:

Principle #1: Context or "Problem-Oriented Instruction"

Principle #2: Employing the Expert's Problem-Solving Procedure: "When several solution principles are to be taught within the context of a relatively complex problem, the order of instruction should follow an expert's order of access to these problem-solving principles"

In the program, most statistical procedures were taught according to these principles.

For the evaluation, a posttest of problem solving ability was administered to two groups and produced the following results:

Experimental group: Computer-based course (mean of 6.61 correct answers)
Control group: Workbook-based course (mean of 4.64 correct answers)

[ The TEST can be found in: Evans, R.W. A computerized course in elementary statistics: Educational objectives and methods. In Proceedings of NECC 1981: National Educational Computing Conference, pp. 254-258 ]
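The article reports only the two group means. The sketch below shows how such a between-groups posttest comparison is commonly analyzed; the individual scores are invented for illustration, and the use of SciPy and Welch's t-test is an assumption, not the authors' procedure.

```python
# Hypothetical sketch of the kind of between-groups comparison reported
# above; the individual scores below are invented, only the two group
# means (6.61 vs. 4.64) come from the article.
from scipy import stats

cai_group      = [7, 6, 8, 7, 5, 7, 6, 7]   # hypothetical computer-based course scores
workbook_group = [5, 4, 6, 5, 4, 5, 4, 4]   # hypothetical workbook-based course scores

t, p = stats.ttest_ind(cai_group, workbook_group, equal_var=False)  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.3f}")
```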

The authors concluded that improved performance in problem-solving ability is most likely a result of the design principles employed in the creation of the courseware.

Furthermore, three criteria for evaluating CAI programs are discussed. The program/software should:

  1. include adequate training of tutors/teachers in using the computer as an instructional aid
  2. fit the needs of the students and should blend into the existing curriculum
  3. utilize the unique capacities of the computer as an interactive tool, especially in the form of branching capabilities, which a textbook cannot offer

Furthermore, an analysis of individual student characteristics related to performance in the CAI course was made.

This was based on:

1) Log-file including data on:

2) Pretest:

  1. Perception of teacher
  2. Anxiety towards math
  3. Value of mathematics in society
  4. Self-concept in math
  5. Enjoyment of math
  6. Motivation in math

3) Final Grades

All of the math-attitude factor scores, except for self-concept, were significantly and highly correlated with final grades, especially math anxiety (correlation: -.70).

On the attitudinal dimension, students reacted positively to the program.

Finally, the authors emphasize the relationship between self-pacing and different types of students: students oriented toward external rewards (grades, graduate school, mastery) expect positive consequences from regular and systematic work, whereas those oriented toward internal rewards (avoidance of anxiety, creativity, long-term memory) expect lower utility from self-pacing. Encouraging students to use the program is therefore vital.



Ross, Steven. 1984. "Matching the Lesson to the Student: Alternative Adaptive Designs for Individualized Learning Systems." Journal of Computer-Based Instruction. 11, 2, Spring: 42-48.

Ross outlines typical educational computer programs, which tend to incorporate the following orientations of "control":

Ross' research broadens the types of instructional properties and bases adaptive decisions on more extensive information about learner background and current needs. He examines adaptation types, including

which were applied to the teaching of basic concepts in an undergraduate statistics course.

"Program Control" was implemented for a self-instructional lesson (CMI) covering 10 algebraic rules which comprised prerequisite learning for a statistics course. The "individual adaptive strategy" in the computer program incorporated the following steps (steps 4 - 6 represented a loop until all lessons were completed):

  1. Entry Test (pretest, aptitude, locus of control, anxiety, etc.)
  2. Regression Prediction (a set of 10 predicted scores was generated for each student using multiple regression equations)
  3. Adaptive Prescription
  4. Lesson
  5. Immediate Posttest (after each lesson)
  6. Refinements (posttest scores were used to refine the prescription for the next lesson)
  7. Cumulative posttest
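A minimal sketch of the loop described in steps 1-7 follows. The regression coefficients, the prescription rule, and the refinement rule are all hypothetical stand-ins; only the overall step structure follows the summary above.

```python
# Minimal sketch (assumptions labeled) of the adaptive CMI loop described
# above. All numeric rules are hypothetical stand-ins.

def predict_scores(entry):
    """Step 2: one predicted score per lesson from entry-test measures
    (hypothetical weights standing in for the 10 regression equations)."""
    base = 0.6 * entry["pretest"] + 0.3 * entry["aptitude"] - 0.2 * entry["anxiety"]
    return [max(0.0, min(1.0, (base + 2 * i) / 100)) for i in range(10)]

def prescribe(predicted):
    """Step 3: weaker predicted performance -> more practice items (assumed rule)."""
    return 15 if predicted < 0.5 else 10 if predicted < 0.75 else 5

def run_course(entry, take_lesson):
    predictions = predict_scores(entry)             # step 2
    results = []
    for lesson, predicted in enumerate(predictions):
        items = prescribe(predicted)                # step 3: adaptive prescription
        posttest = take_lesson(lesson, items)       # steps 4-5: lesson + immediate posttest
        results.append(posttest)
        # Step 6: refine the next prediction using the observed posttest score
        if lesson + 1 < len(predictions):
            predictions[lesson + 1] = 0.5 * predictions[lesson + 1] + 0.5 * posttest
    return sum(results) / len(results)              # stand-in for the cumulative posttest

# Example run with a simulated learner (hypothetical entry-test values):
avg = run_course({"pretest": 60, "aptitude": 70, "anxiety": 40},
                 take_lesson=lambda lesson, items: min(1.0, 0.55 + 0.02 * items))
print(f"simulated cumulative performance: {avg:.2f}")
```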

The evaluation of the computer software occurred in different studies.

Adaptation Study 1

The following are results (based on % answers correct on the posttest) for the first adaptive study, involving several strategies:

  1. Individualized-adaptive strategy: 75% correct
  2. Group-adaptive strategy: 62% correct
  3. Nonadaptive strategy: 57% correct

Based on these findings, the author concludes that an individualized-adaptive strategy is well suited to CMI models of instruction. Four subsequent studies were conducted. Study 2, a replication of the first study, again favored the individualized-adaptive treatment.

Studies 3 and 4 involved an experiment on rewards or incentives, while the purpose of Study 5 was an evaluation of program versus learner control in PSI models.

Program control surpassed learner control and lecture on the immediate posttest, and surpassed all treatments on the delayed posttest. Learner control was associated with the lowest performance. Ross concludes that program control was most beneficial relative to learner control when pretest scores were low.

Ross further emphasizes that adaptation of context may increase conceptual retention.

In an experiment, statistical probability rules were presented in contexts varying in relatedness to subjects' academic majors.

The hypothesis to be tested, thus, was that adaptive (familiar) contexts facilitate assimilation of new information in memory and are more likely to promote meaningful learning.

An evaluation of Education, Medical, and Abstract contexts supported this hypothesis:

Some benefits were observed in comparison with nonadaptive contexts, and especially in comparison with abstract contexts.

In conclusion, the author stresses the importance of adaptive contexts and suggests that they support three types of functions:

  1. Generate interest in the task
  2. Activate relevant past experiences as conceptual anchoring for information
  3. Associate rules in memory with a meaningful set of ideas



Duncan, Nancy C. 1993. "Evaluation of instructional software: Design considerations and recommendations." Behavior Research Methods, Instruments and Computers. 25, 2: 223-227.

Duncan discusses general evaluation questions, including sources of internal invalidity (non-randomness in the selection process), the types of educational activity used for comparison, and outcome measures.
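Non-randomness in selection is typically addressed by random assignment to conditions. The sketch below is a generic illustration with an invented roster, not a procedure from the article.

```python
# Hypothetical sketch: random assignment of a roster to CAI and comparison
# sections, one way to avoid the volunteer-selection bias discussed here.
# The roster and seed are invented for illustration.
import random

roster = [f"student_{i:02d}" for i in range(1, 21)]   # hypothetical class roster

random.seed(42)                 # fixed seed so the split is reproducible
random.shuffle(roster)
half = len(roster) // 2
cai_section, comparison_section = roster[:half], roster[half:]

print("CAI section:       ", cai_section)
print("Comparison section:", comparison_section)
```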

Possible sources of internal invalidity include selection bias: results may be difficult to interpret if participants are selected as volunteers. (Another problem mentioned in this article is students' awareness that some are receiving CAI while others are not.) The author stresses the importance of measuring learner characteristics in order to construct an effective evaluation; however, which characteristics should be assessed, and which instruments should be used to assess them, are flexible and depend on each evaluation design. Besides tests, other possible assessments of student characteristics include:

The potential of using the computer itself to assess the nature of the learning process (similar to "tracing" student use) is also discussed by Duncan. Several packages are mentioned, such as:

Computer-based supplemental exercises are seen by many students not as an educational opportunity but as an added requirement. Making educational software attractive to students is therefore an important task.

The evaluation of TILE concerns two subcategories:

Assessment of students' perceptions of the usefulness of the program and any problems they encountered (some example questions are listed in Ransdell, p. 231, same issue).

Assessment of the effectiveness of the software by comparing test results of "treatment" and "control" groups. The selection of a proper activity for comparison can be a difficult task. This part of the evaluation will run into validity problems, such as the dilemmas of between-group comparisons and factors such as students' study habits and variable interest. Measurement could also take long-term factors into account by looking at how the educational software has stimulated interest in course content, or even in major or career choices.


Castellan, N. John. 1993. "Evaluating information technology in teaching and learning." Behavior Research Methods, Instruments and Computers. 25, 2: 233-237.

Castellan discusses "strategic evaluation," which emphasizes technical accuracy; this is crucial to ensure that students are not hindered by an inability to use the technology. All procedures involving technical or computer skills should be well explained.

Furthermore, pedagogical soundness is emphasized: the software must convey the content and concepts to be learned. Clearly articulated instructional goals are also considered important. Some questions to ask when designing instructional software include: Does the technology encourage testing ideas and concepts? Can the skills and concepts learned be transferred beyond the context in which they were introduced?

Then, substantive fidelity is outlined: the material has to be accurate and worth learning. Moreover, integrative flexibility is emphasized: the class/course structure needs to be modified along with the introduction of the new technology.

Finally, cyclic improvement calls for evaluations during and after the course, so that results can be compared with evaluations made before the software is used. For instance: do students' opinions of CAI (computer-assisted instruction) change after experience with the program?


Ransdell, Sarah. 1993. "Educational software evaluation research: Balancing internal, external, and ecological validity." Behavior Research Methods, Instruments and Computers. 25, 2: 228-232.

Ransdell discusses the difficulties in evaluating software in terms of tradeoffs between internal and external or ecological validity.

Internal validity refers to the "degree to which a design allows for unconfounded results." The problem of comparing computer-based instruction with traditional forms of teaching is one example of an internal-validity risk: media, rather than "messages" (symbols carrying meaning), end up being compared. Further threats to internal validity include "improvements or declines in performance due to students' attendance or study habits" as well as "variable interest in, and difficulty of, individual topics."

External validity is the "degree to which results can be generalized to apply to other populations, settings, or levels of variables." Threats to external and ecological validity include "studies of short duration," "homogeneous samples of college student ability," students' respondent biases, and instructors' observer biases.

The evaluation discussed by the author, involving two groups of students (one from a community college, the other from a university), is based on a survey administered along with a midterm exam. It includes some potentially useful questionnaire items for a formative evaluation:

Q: When watching the computer activities, were you ever frustrated by them? If so, can you describe them?

Q: Did you usually understand the computer program's main objective? If not, which activity was difficult to see what it was for?

Q: Are the concepts and terminology in the computer activities related to the material covered in the class lectures?

Q: Describe any activities particularly interesting to you and those that were tedious or boring.

 



Welsh, Josephine. 1993. "The effectiveness of computerized instruction at the college level." Behavior Research Methods, Instruments and Computers. 25, 2: 220-222.

Five suggestions for a successful software implementation are discussed:

  1. The resource conservation (e.g., efficiency and cost-effectiveness) that is possible with CAI and CBI (computer-based instruction), rather than resource loss, should be emphasized.
  2. Software should be chosen wisely, and teachers should be involved in its evaluation.
  3. Be aware that some students fear computers, even though, at the college level, the problem of attitude is not as great as at the secondary school level.
  4. The use of computers should not be limited to computer-assisted instruction, as the effectiveness of computer-based instruction has been repeatedly demonstrated.
  5. A continued investigation of both CAI and CBI, in the form of comparative research, is still needed.

Relevant to TILE is the author's note that students may pay more attention to screens that contain interactive examples or provide review questions with immediate feedback. Welsh found that students tend to hurry through screens of text and spend time on the demonstrations. While this can make the software attractive, the author cautions that it may not, by itself, compel students to learn.


