As defined earlier, traditional assessment generally refers to written testing, such as multiple choice, matching, true/false, and fill in the blank. Written assessments must typically be completed within a specific amount of time. There is a single correct response for each item. The assessment, or test, assumes that all students should learn the same thing, and relies on rote memorization of facts. Responses are often machine scored, and offer little opportunity for a demonstration of the thought processes characteristic of critical thinking skills.
One shortcoming is that traditional assessment approaches are generally instructor centered and measure performance against an empirical standard. Traditional assessment uses fairly simple grading matrices, such as the one shown in Figure 5-2. The persistent problem with this type of assessment is that a performance graded satisfactory on the first lesson may be graded unsatisfactory on lesson three.
Still, tests of this nature do have a place in the assessment hierarchy. Multiple choice, supply type, and other such tests are useful in assessing the student’s grasp of information, concepts, terms, processes, and rules—factual knowledge that forms the foundation needed for the student to advance to higher levels of learning.
Characteristics of a Good Written Assessment (Test)
Whether an instructor designs his or her own tests or uses commercially available test banks, it is important to know the components of an effective test. (Note: This section is intended to introduce basic concepts of written test design. Please see Appendix A for testing and test writing publications.)
A test is a set of questions, problems, or exercises intended to determine whether the student possesses a particular knowledge or skill. A test can consist of just one test item, but it usually consists of a number of test items. A test item measures a single objective and calls for a single response. A test could be as simple as answering a single essay question or as complex as completing a knowledge or practical test. Regardless of the underlying purpose, effective tests share certain characteristics. [Figure 5-3]
Reliability is the degree to which test results are consistent across repeated measurements. If identical measurements are obtained every time a certain instrument is applied to a certain dimension, the instrument is considered reliable. The reliability of a written test is judged by whether it gives consistent measurement to a particular individual or group. Keep in mind, though, that knowledge, skills, and understanding can improve with subsequent attempts at taking the same test, because the first test serves as a learning device.
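One way to put a number on this kind of consistency is to give the same test twice to the same group and correlate the two sets of scores. The sketch below is only an illustration under stated assumptions: the text does not prescribe a statistic, so the use of the Pearson correlation, the function name retest_reliability, and the sample scores are all hypothetical.

```python
from math import sqrt


def retest_reliability(first_scores, second_scores):
    """Pearson correlation between two administrations of the same test.

    Values near 1.0 suggest the test ranks students consistently.
    The choice of Pearson correlation is an assumption of this sketch.
    """
    if len(first_scores) != len(second_scores) or len(first_scores) < 2:
        raise ValueError("need two equally long score lists, 2+ students")
    n = len(first_scores)
    mean_a = sum(first_scores) / n
    mean_b = sum(second_scores) / n
    # Sum of products of deviations from each mean (unnormalized covariance)
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(first_scores, second_scores))
    var_a = sum((a - mean_a) ** 2 for a in first_scores)
    var_b = sum((b - mean_b) ** 2 for b in second_scores)
    if var_a == 0 or var_b == 0:
        raise ValueError("scores must vary to compute a correlation")
    return cov / sqrt(var_a * var_b)


# Hypothetical scores for five students on two sittings of one test.
first = [70, 82, 90, 65, 88]
second = [74, 85, 93, 70, 90]
reliability = retest_reliability(first, second)
```

In this made-up data every student scores a little higher the second time, which matches the learning effect noted above, yet the correlation stays high because the students keep roughly the same standing relative to one another.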
Validity is the extent to which a test measures what it is supposed to measure, and it is the most important consideration in test evaluation. The instructor must carefully consider whether the test actually measures what it is supposed to measure. To estimate validity, several instructors read the test critically and consider its content relative to the stated objectives of the instruction. Items that do not pertain directly to the objectives of the course should be modified or eliminated.
Usability refers to the functionality of tests. A usable written test is easy to give: it is printed in a type size large enough for students to read easily, and the wording of both the directions for taking the test and the test items is clear and concise. Graphics, charts, and illustrations appropriate to the test items must be clearly drawn, and the test should be easily graded.
Objectivity describes singleness of scoring of a test: the same response should receive the same score regardless of who grades it. Essay questions illustrate how difficult this is to achieve. It is nearly impossible to prevent an instructor’s own knowledge and experience in the subject area, writing style, or grammar from affecting the grade awarded. Selection-type test items, such as true/false or multiple choice, are much easier to grade objectively.
Comprehensiveness is the degree to which a test measures the overall objectives. Suppose, for example, an AMT wants to measure the compression of an aircraft engine. Measuring compression on a single cylinder would not provide an indication of the condition of the entire engine. Similarly, a written test must sample an appropriate cross-section of the objectives of instruction. The instructor has to make certain the evaluation includes a representative and comprehensive sampling of the objectives of the course.
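A simple bookkeeping check can help confirm that a draft test samples every objective. The following is a minimal sketch, assuming each test item has been tagged with the one objective it measures; the function name, the item numbers, and the objective labels are hypothetical and serve only as an illustration.

```python
def uncovered_objectives(course_objectives, item_objectives):
    """Return the course objectives that no test item measures.

    course_objectives: list of objective labels for the course.
    item_objectives: dict mapping each test item to the single
        objective it measures (one objective per item).
    """
    covered = set(item_objectives.values())
    return [obj for obj in course_objectives if obj not in covered]


# Hypothetical course objectives and a draft five-item test.
objectives = ["airspace", "weather", "regulations", "performance"]
draft_test = {
    1: "airspace",
    2: "weather",
    3: "weather",
    4: "regulations",
    5: "airspace",
}
missing = uncovered_objectives(objectives, draft_test)
```

Here the draft test never touches the "performance" objective, much like checking compression on only some of the cylinders. A check like this shows only whether an objective is sampled at all; judging whether the sampling is representative remains the instructor's task.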
Discrimination is the degree to which a test distinguishes between students of differing achievement. In classroom evaluation, a test must measure small differences in achievement in relation to the objectives of the course. A test constructed to identify differences in student achievement has three features:
- A wide range of scores
- All levels of difficulty
- Items that distinguish between students with differing levels of achievement of the course objectives
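The third feature can be checked item by item after a test has been given. The sketch below uses an upper-lower group comparison, which is one common approach and not something this section prescribes. The function name, the 27 percent group size, and the sample data are assumptions made for illustration.

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """Upper-lower discrimination index for one test item.

    total_scores: each student's overall test score.
    item_correct: 1 if that student answered this item correctly, else 0.
    fraction: share of students placed in each comparison group
        (0.27 is a conventional choice, assumed here).
    Returns a value from -1.0 to 1.0; higher values mean the item
    separates high scorers from low scorers more sharply.
    """
    if len(total_scores) != len(item_correct):
        raise ValueError("one item result is needed per student")
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    if 2 * group_size > n:
        raise ValueError("not enough students for two separate groups")
    # Rank students from highest to lowest overall score.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper = order[:group_size]
    lower = order[-group_size:]
    upper_correct = sum(item_correct[i] for i in upper)
    lower_correct = sum(item_correct[i] for i in lower)
    return (upper_correct - lower_correct) / group_size


# Hypothetical results for ten students on one item.
totals = [95, 90, 85, 80, 75, 70, 65, 60, 55, 50]
item = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
d_index = discrimination_index(totals, item)
```

In this made-up example the three highest scorers all answer the item correctly and the three lowest scorers all miss it, so the index is at its maximum. An item that everyone answers correctly yields an index of zero, because it tells the instructor nothing about differences in achievement.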
Please see the reference section for information on the advantages and disadvantages of multiple-choice, supply-type, and other written assessment instruments, as well as guidance on creating effective test items.