People often ask, “How do you decide what to put on the test?” and “How do you create the test?”
Producing a Stimulus and Questions
The development of TOEFL test content begins with test developers who are experts in a wide variety of subjects. These test developers search for source material that is typical of what can be found in a first- or second-year university-level class. They consult textbooks and published research studies, work with professors and researchers, and draw on their own expertise to produce what we call a stimulus: material such as a reading passage or a lecture. They then create questions that ask about the content of the stimulus. But things don’t end there.
Fairness, Reliability and Validity
Next, the stimulus and questions go through multiple reviews by other experts. They are reviewed for content accuracy. They are also reviewed for fairness, reliability and validity. To be fair, test content must not be biased toward one group over another. It must not include content that could be unnecessarily upsetting to test takers; this could cause them to perform poorly due to the stress of focusing on the emotionally charged content. To be reliable, tests given at different times must be of a similar difficulty to one another. To be valid, the test must be designed to assess only the skills that need to be measured. In other words, we must test what we say we are testing. A test of English should not ask people to solve complex math problems because we are not testing math skills—we are testing English skills.
Multiple-Choice Questions
For multiple-choice questions in the Reading and Listening sections, reviewers ensure that there is one and only one right answer, unless the question was specifically designed to have more than one right answer. They also ensure that right answers are definitely right and that wrong answers are definitely wrong. For Reading, right answers represent what a good, competent reader of English would understand from the passage. For Listening, right answers represent what a good, competent listener of English would understand from the conversation or lecture. Each set of questions covers the content of an entire reading or listening stimulus.
Editing and Design
The stimulus and questions are then modified as needed, in consultation with the original test developer who produced the material. Then they are sent to editors who fact-check the material, correct any grammatical and typographical errors, and ensure a consistent style. If an image is needed, such as artwork or a photograph, it is created or obtained by our art department. If copyright permission is needed, it is acquired by our copyright permissions department. If recordings are needed, they are produced in a studio by voice actors who read scripts containing the test material. Once all the materials are gathered, they are entered into a system that displays the content exactly as it will be seen by test takers, and everything is proofed several more times.
Pretesting and Finalizing
Some of the content is pretested to determine whether the test questions perform as anticipated. We then analyze the pretest results to determine whether a question is too difficult, not difficult enough, or more difficult for one population than for a different population of the same ability level. For example, did females perform better than males of the same ability level on a question? This would suggest that there is a problem; perhaps something about the question is not fair to males. We want to be fair and equitable to everyone, so such a question would need to be examined and probably revised. We can also see from the pretest results whether a large percentage of people of high English proficiency chose an incorrect answer. This could mean that the incorrect answer choice they chose is faulty in some way. It could also mean that the correct answer is not precisely or clearly written. Problematic answer choices are then modified. After all analyses are complete and any necessary modifications are made, the question can be used in a test.
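To make the three checks described above concrete, here is a minimal Python sketch. It is an illustration only, not the actual analysis procedure: the response records, the "high"/"low" ability bands, and all function names are hypothetical, and a real analysis would use far larger samples and formal statistical methods. It computes how many people answered one question correctly, compares two groups within the same ability band, and lists the wrong answers most often chosen by high-proficiency test takers.

```python
from collections import Counter, defaultdict

# Hypothetical pretest records for ONE question: (group, ability_band, chosen_option).
# The ability band is an assumed coarse proficiency level taken from the rest of the test.
RESPONSES = [
    ("F", "high", "B"), ("F", "high", "B"), ("F", "high", "B"), ("F", "high", "C"),
    ("M", "high", "B"), ("M", "high", "C"), ("M", "high", "C"), ("M", "high", "C"),
    ("F", "low", "B"), ("F", "low", "A"), ("F", "low", "D"), ("F", "low", "B"),
    ("M", "low", "A"), ("M", "low", "B"), ("M", "low", "D"), ("M", "low", "C"),
]
KEY = "B"  # the intended correct answer (hypothetical)


def proportion_correct(responses, key):
    """Share of all test takers who chose the keyed answer (item difficulty)."""
    return sum(1 for _, _, choice in responses if choice == key) / len(responses)


def gap_by_ability_band(responses, key, group_a, group_b):
    """Proportion correct for group_a minus group_b, within each ability band."""
    buckets = defaultdict(list)
    for group, band, choice in responses:
        buckets[(band, group)].append(choice == key)
    gaps = {}
    for band in {band for band, _ in buckets}:
        a = buckets[(band, group_a)]
        b = buckets[(band, group_b)]
        if a and b:  # only compare bands where both groups are present
            gaps[band] = sum(a) / len(a) - sum(b) / len(b)
    return gaps


def popular_wrong_choices(responses, key, band="high"):
    """Wrong options chosen within one ability band, most frequent first."""
    wrong = Counter(
        choice for _, b, choice in responses if b == band and choice != key
    )
    return wrong.most_common()


print(proportion_correct(RESPONSES, KEY))
print(gap_by_ability_band(RESPONSES, KEY, "F", "M"))
print(popular_wrong_choices(RESPONSES, KEY))
```

In this made-up data, females outperform males within both ability bands, and option C attracts many high-proficiency test takers. Both patterns are the kind that would send a question back for review.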
Test Assembly and Certification
All finalized stimuli and questions are sent to a database from which test assemblers choose material and assemble a test. Experts then take the entire test, going through the questions one by one. These experts also consider an assembled form as a whole. If overlapping content is found, it is replaced. The form is then certified as ready to be administered at test centers around the world.
But that’s not enough for us. Once test takers have taken the test and the data is returned, we look at the results to make sure that all the test questions performed as intended. If any question did not, it is thrown out and does not count toward a test taker’s score.
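As a rough illustration of what "does not count" means, the Python sketch below counts correct answers while skipping any question that has been thrown out. The question IDs, answer key, and `raw_score` function are hypothetical, and the sketch covers only a simple count of correct answers, not how reported scores are actually calculated.

```python
def raw_score(answers, key, dropped=frozenset()):
    """Count correct answers, ignoring any question IDs listed in `dropped`."""
    return sum(
        1
        for qid, correct in key.items()
        if qid not in dropped and answers.get(qid) == correct
    )


# Hypothetical answer key and one test taker's responses.
KEY = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
ANSWERS = {"q1": "A", "q2": "B", "q3": "B", "q4": "D"}

print(raw_score(ANSWERS, KEY))                  # all four questions count
print(raw_score(ANSWERS, KEY, dropped={"q2"}))  # q2 thrown out after review
```

Here the test taker missed only q2, so throwing that question out leaves the count of correct answers unchanged while reducing the number of questions that are scored.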
Our Responsibility to You
So whenever you take your TOEFL iBT® test, you can be assured that hundreds of ETS employees have worked to make sure your score accurately reflects your ability. We know that our tests have an impact on real people, so it is our responsibility to create the highest-quality assessments possible, one question at a time.