STATISTICAL METHODS FOR ANALYZING THE QUALITY OF TEST QUESTIONS AND TASKS
Farzon Nosiri
e-mail: farzon@kth.se
During my diploma project at university I helped develop a set of programs to automate the testing system at the Khujand Branch of the Technological University of Tajikistan. My task was to assess the quality of test questions and tasks. Using the mathematical model of classical test theory, I analyzed the results of the spring term 2009 exams. Different models and methods for analyzing test questions and tasks were studied, and the appropriate statistical and analytical characteristics were applied to the current system.
Analysis of the tasks by the coefficients of difficulty and easiness. These are the first characteristics included in the analysis module. They can be called the Index of Easiness of the task (IE) and the Index of Difficulty (ID):
$$IE_j = \frac{x_{avg,j}}{x_{max,j}}, \qquad ID_j = 1 - IE_j, \qquad x_{avg,j} = \frac{1}{N}\sum_{i=1}^{N} x_{ij} \tag{1.1}$$
where x_{avg,j} is the average of the grades received by all tested students for task j,
x_{ij} is the grade received by student i for task j,
x_{max,j} is the maximum number of marks that can be awarded for task j,
N is the number of tested students.
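The two indices can be illustrated with a minimal Python sketch. It assumes that IE is the ratio of the average mark to the maximum mark and that ID is its complement; the function names and the sample marks are hypothetical and are not taken from the PHP module described in this paper.

```python
def easiness_index(scores, max_score):
    """Index of Easiness: average awarded mark divided by the maximum mark."""
    if not scores:
        raise ValueError("at least one student score is required")
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    average = sum(scores) / len(scores)  # x_avg for this task
    return average / max_score


def difficulty_index(scores, max_score):
    """Index of Difficulty, assumed here to be the complement of IE."""
    return 1.0 - easiness_index(scores, max_score)


# Hypothetical marks of five students on one task worth 2 marks.
task_scores = [2, 1, 0, 2, 1]
ie = easiness_index(task_scores, max_score=2)
id_ = difficulty_index(task_scores, max_score=2)
print(f"IE = {ie:.2f}, ID = {id_:.2f}")
```

For these sample marks the average is 1.2 out of 2, so IE is 0.6 and ID is 0.4.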
A quantitative measure of difficulty matters because tasks can differentiate students by their level of preparedness only if the difficulty of the tasks matches that level. In general, a test should include a range of tasks, from the easiest to the most difficult. However, very easy tasks, which everybody answers correctly, and very difficult tasks, which nobody answers correctly, cannot differentiate students by their level of preparedness, and therefore cannot be considered testing tasks.
The other characteristic is the dispersion (variance) of the results of a testing task, which is calculated as:
$$D_j = \frac{1}{N}\sum_{i=1}^{N}\left(x_{ij} - x_{avg,j}\right)^2 \tag{1.2}$$
where x_{ij} is the grade of student i for task j and x_{avg,j} is the average grade for task j. This characteristic shows the spread of the grades received by the N tested students for a particular task j of the test. If all tested students answer the question identically, the spread of grades is 0. Tasks with zero or very low dispersion have very little ability to separate students by their preparedness and therefore need to be excluded from the test. The higher the dispersion of a task, the better it differentiates students and the higher the quality of the test.
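A short Python sketch of the dispersion of one task follows. It assumes the population form of the variance (division by N); a sample variance with N - 1 would serve the same purpose. The function name and the sample marks are hypothetical.

```python
def task_dispersion(scores):
    """Dispersion of the marks for one task (population variance, divisor N)."""
    n = len(scores)
    if n == 0:
        raise ValueError("at least one student score is required")
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n


# Hypothetical marks: everybody scores the same, so nothing is differentiated.
uniform_task = [1, 1, 1, 1]
# Hypothetical marks: half the students fail and half succeed.
split_task = [0, 2, 0, 2]

print(task_dispersion(uniform_task))
print(task_dispersion(split_task))
```

The first task has zero dispersion and would be a candidate for exclusion, while the second one spreads the students apart.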
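One more very important statistical characteristic, which measures the differentiating ability of a test task, is the Coefficient of Differentiation (CD). It is calculated as:
$$CD_j = \frac{\sum_{i=1}^{N}\left(x_{ij} - x_{avg,j}\right)\left(s_i - S_{avg}\right)}{N\sqrt{D_j\,D_S}} \tag{1.3}$$
where
$$D_S = \frac{1}{N}\sum_{i=1}^{N}\left(s_i - S_{avg}\right)^2$$
and
x_{ij} − the grade of student i for task j, with average x_{avg,j} (1.1);
D_j − dispersion of the results of task j (1.2);
D_S − dispersion of the total results of the tested students over all test tasks;
S_{avg} − average of the marks received by all N tested students for the whole test;
s_i − the sum of the marks of student i over all test tasks.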
This characteristic is the coefficient of correlation between the students' results on a particular task and their results on the whole test. It takes values between -1 and +1 and measures the ability of a task to distinguish strong students from weak ones. Positive values correspond to tasks that really do separate “strong” and “weak” students. Negative values show that poorly prepared students answer the task better on average than well-prepared students. Such tasks are evidently poorly formulated; they cannot serve as test tasks and should be removed from the test.
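The correlation described above can be sketched in Python as a Pearson coefficient between the marks for one task and the total test marks. This is an illustration under that assumption, not the original PHP code; the function name and the data are hypothetical, and returning None for a task with no spread is a choice made for this sketch.

```python
import math


def differentiation_coefficient(task_scores, total_scores):
    """Pearson correlation between marks for one task and total test marks.

    Returns None when either series has zero spread, because the
    coefficient is undefined in that case.
    """
    n = len(task_scores)
    if n == 0 or n != len(total_scores):
        raise ValueError("both lists must be non-empty and of equal length")
    mean_x = sum(task_scores) / n
    mean_s = sum(total_scores) / n
    # Sums of products of deviations; the common divisor N cancels out.
    cov = sum((x - mean_x) * (s - mean_s)
              for x, s in zip(task_scores, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in task_scores)
    var_s = sum((s - mean_s) ** 2 for s in total_scores)
    if var_x == 0 or var_s == 0:
        return None
    return cov / math.sqrt(var_x * var_s)


# Hypothetical data: total marks of four students and their marks on two tasks.
totals = [2, 3, 7, 8]
good_task = [0, 0, 1, 1]   # solved only by the stronger students
bad_task = [1, 1, 0, 0]    # solved only by the weaker students

print(differentiation_coefficient(good_task, totals))
print(differentiation_coefficient(bad_task, totals))
```

The first task yields a positive coefficient and the second a negative one of the same magnitude, which marks it as a candidate for removal.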
The next characteristic of the test analysis is a graphical view, i.e. diagrams. The students are divided into 3 or 4 groups according to their level of performance, and the diagram shows, for each task separately, the average percentage of students in each group who answered correctly. A good test task produces a rising diagonal. If the diagram has a zigzag or diverging form, the task can be considered of poor quality. Students are grouped by one of two criteria: the first criterion divides the students by their previous grades; the second divides them by the grades received in the current test.
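The data behind such a diagram can be sketched in Python as follows. The sketch assumes the second criterion (grouping by the grades of the current test), groups of nearly equal size, and a simple non-decreasing check as a stand-in for the "rising diagonal"; all names and sample data are hypothetical.

```python
def group_success_rates(total_scores, task_correct, n_groups=3):
    """Percentage of correct answers to one task in each group of students.

    Students are ranked by total test score and cut into n_groups groups
    of nearly equal size, from the weakest group to the strongest.
    """
    if len(total_scores) != len(task_correct):
        raise ValueError("both lists must have the same length")
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    n = len(order)
    rates = []
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not members:
            rates.append(None)  # too few students to fill this group
            continue
        correct = sum(1 for i in members if task_correct[i])
        rates.append(100.0 * correct / len(members))
    return rates


def is_rising(rates):
    """True when the percentages never decrease from weaker to stronger groups."""
    values = [r for r in rates if r is not None]
    return all(a <= b for a, b in zip(values, values[1:]))


# Hypothetical total scores of six students and their answers to two tasks.
totals = [3, 9, 5, 1, 8, 6]
rising_task = [False, True, True, False, True, False]
zigzag_task = [True, True, False, True, True, False]

rising_rates = group_success_rates(totals, rising_task)
zigzag_rates = group_success_rates(totals, zigzag_task)
print(rising_rates, is_rising(rising_rates))
print(zigzag_rates, is_rising(zigzag_rates))
```

Plotting each list of percentages against the group number gives the diagram for the corresponding task; grouping by previous grades would only change the list passed as the ranking key.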
Based on these methods, I wrote a program in PHP. The program has been tested and debugged and is currently in use at the Khujand Branch of the Technological University of Tajikistan. Although automated testing is common at universities elsewhere in the world, it is new for Tajik universities, and other schools can adopt this system too.
References:
1. Rasch G. Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen, Denmark: Danish Institute for Educational Research, 1960.
2. Item Analysis in “Moodle”. http://opp.psy.msu.ru/help.php?module=quiz&file=itemanalysis.html
3. Minin M.G., Stas N.F., Zhidkova E.V., Rodkevich O.B. Statistical analysis of test quality, applied to control knowledge in chemistry. Russia: News of Tomsk Polytechnic University. 2007. C. 310. No. 1