Gross Davis, Barbara and Sheila Humphreys. 1985. Evaluating Intervention Programs: Applications from Women's Programs in Math and Science. New York: Teachers College Press.
A description of three basic types of evaluations is provided:
Preformative: during the planning phases.
Formative: to improve a program that is still being developed.
Summative: reports on the overall quality and effectiveness of a program.
To help in creating the evaluation, the authors provide ideas and processes that can help generate evaluation goals. Possible sources for evaluation questions include:
Discussions with the audiences for the evaluation (e.g., funders)
Discussions among the program staff, including the following issues:
Evaluation goals and objectives can further be defined by re-reading the original proposal, observing the program in action and reading previous evaluations.
Depending on what the evaluation is intended to measure (cognitive or attitudinal change, or both), different instruments or data sources are suggested:
Dimension | Suggested instruments/data sources |
Cognitive | administering standardized tests; teacher's judgment of students' performance; reviewing students' past assignments/tests |
Attitudinal | interviewing students; administering questionnaires |
Both | observing students in the lab |
If one of the questions to be answered concerns cognitive change, administering tests would be appropriate. Two types of tests are identified:
Norm-Referenced Tests provide information on how well
an individual or group does in comparison to other individuals taking the same test.
Criterion-Referenced Tests provide information about an individual's or group's performance relative to certain independently defined standards rather than relative to the performance of others.
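To make the distinction concrete, the following is a minimal Python sketch contrasting the two interpretations of a score; the scores and the 70% mastery cutoff are hypothetical, invented for this illustration.

```python
# Hypothetical illustration of norm- vs. criterion-referenced interpretation.
# The scores and the 70% mastery cutoff are invented for this sketch.

scores = [55, 62, 70, 74, 74, 81, 88, 93]  # percent correct for a group

def percentile_rank(score, group):
    """Norm-referenced: how a score compares to the rest of the group."""
    below = sum(1 for s in group if s < score)
    return 100 * below / len(group)

def mastery(score, cutoff=70):
    """Criterion-referenced: pass/fail against an independent standard."""
    return score >= cutoff

for s in scores:
    print(f"{s}%: percentile rank {percentile_rank(s, scores):.0f}, "
          f"mastery: {'yes' if mastery(s) else 'no'}")
```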
The attitudinal dimension, on the other hand, if applied to statistics or math, may include:
The authors recommend developing an evaluation plan that specifies the necessary instruments/data sources, the sample, and the data collection times for each evaluation question to be answered. The following is a synthesis of how this might look if applied to programs similar to TILE:
Evaluation Question | Instrument/ Data Source | Sample | Data Collection |
Is the lab well attended? | Observation; attendance list | Teacher; audience | Beginning and end of each session |
What is the immediate impact of a lab session on the audience's attitude/knowledge? | Questionnaire; interview; test | Audience | End of each session |
What is the audience's reaction to the entire program? | Questionnaire; interview | Sample of those who attended 1+ lab session | End of program/semester |
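One way to keep such a plan actionable is to record each row as structured data. The sketch below mirrors the table above; the field names are our own, not the authors'.

```python
from dataclasses import dataclass

@dataclass
class PlanItem:
    question: str       # evaluation question
    instruments: list   # instruments / data sources
    sample: str
    collection: str     # data collection times

plan = [
    PlanItem("Is the lab well attended?",
             ["observation", "attendance list"],
             "teacher; audience",
             "beginning and end of each session"),
    PlanItem("Immediate impact of a session on attitude/knowledge?",
             ["questionnaire", "interview", "test"],
             "audience",
             "end of each session"),
    PlanItem("Audience's reaction to the entire program?",
             ["questionnaire", "interview"],
             "sample of those who attended 1+ session",
             "end of program/semester"),
]

for item in plan:
    print(item.question, "->", ", ".join(item.instruments))
```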
Tools of evaluation are discussed in more detail; the authors provide a list of different means of conducting an evaluation, with the pros and cons of each:
Technique | Advantages | Disadvantages |
Questionnaires | anonymous; cover a wide range of topics | structured and inflexible; respondents may interpret questions differently (reliability issue) |
Interviews | flexible/adaptable; in-depth (probing) | responses difficult to summarize/analyze |
Observations | information about the natural setting; may uncover issues not revealed through other means | data difficult to synthesize |
Tests | may provide the most convincing evidence for some audiences (e.g., funders) | inappropriate for interventions of limited duration |
Documents, Records, Materials | provide background information; may uncover issues not revealed through other sources | potentially low payoff; interpretations/explanations may be lacking |
For questionnaire development, the authors point out the importance of reliability and validity issues:
Reliability Issue: | Does the questionnaire mean the same thing to various people at various times? |
Validity Issue: | Does the questionnaire measure what it purports to measure (are the questions the right indicators for the variable(s) we are trying to measure)? |
Examples of how to frame questions, for example as open- or closed-ended questions, are shown, and a checklist for good questions is provided.
Gross Davis and Humphreys further suggest pilot testing the questions. The questions can then be rewritten, and, if time allows, the pretest can be given to the pilot group again after two weeks to check for reliability.
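The two-week retest amounts to a test-retest reliability check: correlating the pilot group's first and second rounds of answers. A minimal sketch in plain Python, with hypothetical ratings:

```python
# Test-retest reliability for a pilot questionnaire: Pearson correlation
# between the same respondents' answers two weeks apart.
# The ratings below are hypothetical.

first  = [4, 3, 5, 2, 4, 3, 5, 1]   # pilot responses, week 0
second = [4, 2, 5, 2, 3, 3, 5, 2]   # same respondents, week 2

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(f"test-retest reliability r = {pearson_r(first, second):.2f}")
```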
For a formative evaluation, questionnaires may be administered in the following ways:
As for interviews, the authors recommend structured interviews (with a choice of answers already provided) or semi-structured interviews (built around a specific set of questions) rather than unstructured interviews: the more precisely you know what information you want, and the less experienced the interviewers are, the more structured the interviews should be. As with questionnaires, the framing of questions and pilot testing are also important for interviews.
For the evaluation of courses or curricula, it is particularly important that the interviewer not be the teacher or professor.
Observation as a tool for data gathering is also explored. The authors, again, distinguish between different formats:
A list of what to observe is also provided:
If the program evaluation does not involve a control group, the authors recommend the following alternatives as suitable comparison groups:
Anandam, Kamala and J. Terence Kelly. 1981. "Evaluating the Use of Technology in Education." Journal of Educational Technology Systems. 10, 1: 21-31.
The authors distinguish between the following instructional uses of the computer. In addition to "learning about the computer", they differentiate between:
[see K.L. Zinn, Instructional Uses of Computers in Higher Education, The Fourth Inventory of Computers in Higher Education: An Interpretive Report, EDUCOM, Princeton, New Jersey, pp. 103-126, 1979]
Studies in the 1970s concluded that all of these computerized methods are at least as effective as non-computerized methods in bringing about learning gains. CAI applications, however, have not taken student characteristics and subject-matter uniqueness into account, owing to a "passive view of the student" in a "frame-oriented approach" to learning.
Anandam and Kelly stress that CAI really is more "instruction" than "learning" oriented. For learning to occur, more "individualization" is needed, for which there are four levels:
[see: G.P. Kearsley, Some Conceptional Issues in Computer-Assisted Instruction, Journal of Computer-based Instruction, 4, pp.8-16, August 1977]
The authors conclude that, for effective learning to occur, flexibility and selectivity are needed. To promote effectiveness, the following curriculum questions arise:
Mausner, Bernard, Edward F. Wolff, Richard W. Evans, Mary M. DeBoer, Steven P. Gulkus, Anita D'Amore, and Samuel Hirsch. 1983. "A Program of Computer Assisted Instruction for a Personalized Instructional Course in Statistics." Teaching of Psychology. 10, 4, December: 195-200.
The article is an evaluation of highly interactive computer units, a CAI System Program for Statistics at Beaver College (CAI-Stat).
The objective of the software was to develop a procedure by which students learn underlying concepts of descriptive and inferential statistics (problem solving).
This occurred in a largely self-paced process (students could move as fast or slowly as they liked) of computer interaction with immediate feedback. After each completed unit of the program, tests were administered to the student. Lastly, a final exam involved concepts and problem-solving exercises.
The features of the software included:
The authors identify two principles for the design of instructional courseware units:
Principle #1 Context or "Problem-Oriented Instruction"
Principle #2 Employing the Expert's Problem Solving Procedure: "When several solution principles are to be taught within the context of a relatively complex problem, the order of instruction should follow an expert's order of access to these problem-solving principles"
In the program, most statistical procedures were taught according to these principles.
For the evaluation, a posttest of problem solving ability was administered to two groups and produced the following results:
Experimental group: Computer-based course (mean of 6.61 correct answers)
Control group: Workbook-based course (mean of 4.64 correct answers)
[ The TEST can be found in: Evans, R.W. A computerized course in elementary statistics: Educational objectives and methods. In Proceedings of NECC 1981: National Educational Computing Conference, pp. 254-258 ]
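The article reports group means only and does not state which significance test was used. As a hedged illustration of how such a two-group comparison is commonly tested, here is a sketch using SciPy's two-sample (Welch's) t test on hypothetical raw scores chosen to land near the reported means:

```python
# The article reports group means only (6.61 vs. 4.64 correct answers);
# the raw scores below are hypothetical, invented to illustrate how such
# a between-group comparison is commonly tested.
from scipy import stats

experimental = [7, 6, 8, 5, 7, 6, 7, 8, 6, 7]  # computer-based course
control      = [5, 4, 6, 3, 5, 4, 5, 6, 4, 5]  # workbook-based course

t, p = stats.ttest_ind(experimental, control, equal_var=False)  # Welch's t
print(f"t = {t:.2f}, p = {p:.4f}")
```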
The authors concluded that improved performance in problem-solving ability is most likely a result of the design principles employed in the creation of the courseware.
Furthermore, three criteria for evaluating CAI programs are discussed. The program/software should:
Furthermore, an analysis of individual student characteristics related to performance in the CAI course was made.
This was based on:
1) Log-file including data on:
2) Pretest:
3) Final Grades
All of the math attitude factor scores, except for self-concept, were significantly and highly correlated with final grades, especially math anxiety (correlation: -.70).
The attitudinal dimension yielded a positive reaction of students to the program.
Finally, the authors emphasize the relationship between self-pacing and different types of students: students oriented toward externals, grades, grad school, or mastery expect positive consequences from regular and systematic work, while those oriented toward internal rewards, avoidance of anxiety, creativity, or long-term memory expect lower utility from self-pacing. Encouraging students to use the program is therefore vital.
Ross, Steven. 1984. "Matching the Lesson to the Student: Alternative Adaptive Designs for Individualized Learning Systems." Journal of Computer-Based Instruction. 11, 2, Spring: 42-48.
Ross outlines typical educational computer programs, which tend to incorporate the following orientations of "control":
Ross' research broadens the types of instructional properties considered and bases adaptive decisions on more extensive information about learner background and current needs. He examines several adaptation types, which were applied to the teaching of basic concepts in an undergraduate statistics course.
"Program Control" was implemented for a self-instructional lesson (CMI) covering 10 algebraic rules which comprised prerequisite learning for a statistics course. The "individual adaptive strategy" in the computer program incorporated the following steps (steps 4 - 6 represented a loop until all lessons were completed):
The evaluation of the computer software occurred in different studies.
Adaptation Study 1
The following are results (based on % answers correct on the posttest) for the first adaptive study, involving several strategies:
Based on these findings, the author concludes that an individualized adaptive strategy is well suited to CMI models of instruction. Four subsequent studies were conducted. A replication of the first study (Study 2) again favored the individualized adaptive treatment.
Studies 3 and 4 involved an experiment on rewards or incentives, while the purpose of Study 5 was an evaluation of Program Control vs. Learner Control in PSI models.
Program Control surpassed Learner Control and lecture on the immediate posttest, and surpassed all treatments on the delayed posttest; Learner Control was associated with the lowest performance. Ross concludes that Program Control was most beneficial relative to Learner Control when pretest scores were low.
Ross further emphasizes that adaptation of context may increase conceptual retention.
In an experiment, statistical probability rules were presented in contexts varying in relatedness to subjects' academic majors.
The hypothesis to be tested was thus that adaptive (familiar) contexts facilitate the assimilation of new information in memory and are more likely to promote meaningful learning.
An evaluation of Education, Medical, and Abstract contexts supported this hypothesis: some benefits were observed in comparison to nonadaptive contexts, and especially in comparison to abstract contexts.
In conclusion, the author stresses the importance of adaptive contexts, and suggests the support of three types of functions through adaptive contexts:
Duncan, Nancy C. 1993. "Evaluation of instructional software: Design considerations and recommendations." Behavior Research Methods, Instruments and Computers. 25, 2: 223-227.
Duncan discusses general evaluation questions, including sources of internal invalidity (non-randomness in the selection process), and considers the types of educational activity being compared and the outcome measures used.
Possible sources of internal invalidity include selection bias: results may be difficult to assess if the volunteer selection method is used. (Another problem mentioned in this article is students' awareness that some are receiving CAI while others are not.) The author stresses the importance of measuring learner characteristics in order to construct an effective evaluation; however, which characteristics should be assessed, and what instruments should be used to assess them, is flexible and depends on each evaluation design. Besides tests, other possible assessments of student characteristics include:
The potential of having the computer assess the nature of the learning process (similar to "tracing" student use) is also discussed by Duncan, who mentions several packages for this purpose.
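The specific packages are not reproduced here. As a minimal illustration of the "tracing" idea, a program might append a timestamped record of each student interaction to a log file; the file name and fields below are invented for this sketch.

```python
# Minimal illustration of "tracing" student use: append one timestamped
# record per interaction to a log file. File name and fields are invented.
import csv, time

def log_event(student_id, screen, response, correct, path="usage_log.csv"):
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(
            [time.strftime("%Y-%m-%d %H:%M:%S"),
             student_id, screen, response, int(correct)])

log_event("s042", "unit3/item7", "b", correct=False)
```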
Computer-based supplemental exercises are seen by many students not as an educational opportunity but as an added requirement; making educational software attractive to students is therefore an important task.
The evaluation of TILE concerns two subcategories:
Assessment of students' perceptions of the usefulness of the program, and of any problems they encountered (some example questions are listed in Ransdell, p. 231, same issue).
Assessment of the effectiveness of the software by comparing test results of "treatment" and "control" groups. Selecting a proper activity for comparison can be difficult, and this part of the evaluation runs into validity problems, such as the dilemmas of between-group comparisons and factors such as students' study habits and variable interest. Measurement could also take long-term factors into account, by looking at how the educational software has stimulated interest in course content, or even in major or career choices.
Castellan, N. John. 1993. "Evaluating information technology in teaching and learning." Behavior Research Methods, Instruments and Computers. 25, 2: 233-237.
Castellan discusses "strategic evaluation", which emphasizes technical accuracy; this is crucial to ensure that students are not hindered by an inability to use the technology. All procedures involving technical or computer skills should be well explained.
Furthermore, pedagogical soundness is emphasized: the importance of the software conveying the content and concepts to be learned. Clearly articulated instructional goals are also considered important. Questions to ask when designing instructional software include: Does the technology encourage testing ideas and concepts? Can the skills and concepts learned be transferred beyond the context in which they were introduced?
Then, substantive fidelity is outlined: the material has to be accurate and worth learning. Moreover, integrative flexibility is emphasized: the need to modify the class/course structure along with the introduction of the new technology.
Finally, cyclic improvement calls for evaluations during and after the course, so that the results can be compared with evaluations made before the software is used. For instance: do students' opinions of CAI (computer-assisted instruction) change after experience with the program?
Ransdell, Sarah. 1993. "Educational software evaluation research: Balancing internal, external, and ecological validity." Behavior Research Methods, Instruments and Computers. 25, 2: 228-232.
Ransdell discusses the difficulties in evaluating software in terms of tradeoffs between internal and external or ecological validity.
Internal validity refers to the "degree to which a design allows for unconfounded results." The problem of comparing computer-based instruction with traditional forms of teaching is one example of an internal validity risk, when media rather than "messages" (symbols carrying meaning) are compared. Further threats to internal validity include "improvements or declines in performance due to students' attendance or study habits" as well as "variable interest in, and difficulty of, individual topics."
External validity is the "degree to which results can be generalized to apply to other populations, settings, or levels of variables." Threats to external and ecological validity include "studies of short duration," "homogeneous samples of college student ability," and students' respondent and instructors' observer biases.
The evaluation discussed by the author, involving two groups of students (one from a community college, the other from a university), is based on a survey administered along with a midterm exam; it includes some potentially useful questionnaire items for a formative evaluation:
Q: When watching the computer activities, were you ever frustrated by them? If so, can you describe them?
Q: Did you usually understand the computer program's main objective? If not, for which activity was it difficult to see the purpose?
Q: Are the concepts and terminology in the computer activities related to the material covered in the class lectures?
Q: Describe any activities particularly interesting to you and those that were tedious or boring.
Welsh, Josephine. 1993. "The effectiveness of computerized instruction at the college level." Behavior Research Methods, Instruments and Computers. 25, 2: 220-222.
Five suggestions for a successful software implementation are discussed:
Relevant to TILE is the author's note that students may pay more attention to screens that contain interactive examples or provide review questions with immediate feedback. Welsh found that students tend to hurry through screens of text and spend their time on the demonstrations. While this can make the software attractive, the author cautions that it may not compel students to learn.