Test Scoring and Analysis Using SAS (2014)
Chapter 10. Detecting Cheating on Multiple-Choice Tests
Introduction
This chapter covers several methods for detecting cheating on multiple-choice exams. Claims of cheating must be investigated very carefully, because an accusation of cheating has such serious consequences. We recommend using the methods and programs presented here only after a question of cheating has been raised by independent means. Examples include suspicious behavior observed by a proctor or other independent evidence that cheating occurred. This matters because if you examine every student in a large class, the probability of finding an apparent cheating incident by chance alone increases. This is analogous to the inflation of the Type I error rate when many hypothesis tests are performed.
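A simplified calculation shows why screening a whole class is risky. Purely for illustration, assume that m comparisons are independent and that each is judged at significance level α. In practice the comparisons are not independent, because they all involve Student 1's answers. Under those assumptions, the chance of at least one false alarm is:

```latex
\begin{aligned}
P(\text{at least one false alarm}) &= 1 - (1 - \alpha)^{m} \\
\alpha = 0.05,\; m = 100: &\quad 1 - 0.95^{100} \approx 0.994 \\
\alpha = 0.01,\; m = 100: &\quad 1 - 0.99^{100} \approx 0.634 \\
\alpha = 10^{-5},\; m = 100: &\quad 1 - (1 - 10^{-5})^{100} \approx 0.001
\end{aligned}
```

With these hypothetical 100 comparisons, a .01 cutoff would produce at least one false alarm roughly two times in three. A cutoff of 10⁻⁵ keeps that chance near one in a thousand.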
The methods presented here look at the pattern of wrong answer choices. The first method starts with the set of items that one student (whom we will call Student 1) answered incorrectly. For every other student, it then counts the number of those items on which that student chose the same wrong answer. A similar method uses the set of items that two students both answered incorrectly (called joint-wrongs). Both methods require that the test contain enough difficult questions. If a test is too easy (or too short), the set of wrong answers or joint-wrongs will be too small for a meaningful analysis. The conventional significance levels used in statistical tests (alpha = .05 or .01) are not used in this type of investigation. Typically, p-values of 10⁻⁵ or lower are required.
How to Detect Cheating: Method One
This first method is described in a paper (Cody, 1985). It takes the set of items that Student 1 answered incorrectly and, for every other member of the class, counts the number of those items on which that student chose the same wrong answer as Student 1. The mean and standard deviation of these counts are computed, excluding Student 2 (the student possibly involved in cheating). The distribution of the number of same wrong answers is approximately normal (this can be tested). You can therefore compute a z-score and p-value for the number of same wrong answers shared by Students 1 and 2.

Keep in mind that the probability of two students selecting the same wrong answers also depends on each student's ability. Two weak students are more likely to answer an item incorrectly than two strong students, and are therefore more likely to select the same wrong answer. As a check, you can run the analysis for all students with grades close to Student 2's and inspect the z-scores and p-values for these students.
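The calculation can be written out as follows. The notation is introduced here for illustration and is not the book's:

- a_{ki} is student k's answer to item i.
- K_i is the key for item i.
- R is the set of students other than Students 1 and 2, and n = |R|.

```latex
\begin{aligned}
W_1 &= \{\, i : a_{1i} \neq K_i \,\} && \text{items Student 1 answered incorrectly} \\
M_k &= \sum_{i \in W_1} \mathbf{1}\!\left[a_{ki} = a_{1i}\right] && \text{same wrong answers as Student 1} \\
\bar{M} &= \frac{1}{n}\sum_{k \in R} M_k, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{k \in R}\left(M_k - \bar{M}\right)^2} \\
z &= \frac{M_2 - \bar{M}}{s}, \qquad p = 1 - \Phi(z)
\end{aligned}
```

Φ is the standard normal cumulative distribution function, which the programs evaluate with the SAS PROBNORM function. The n - 1 divisor matches the default standard deviation computed by PROC MEANS.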
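Before you begin examining the SAS program that implements this method of detection, let's focus on a very simple data file (test_cheat.txt) to see exactly how the analysis works. The first record is the answer key, positioned so that its answers begin in column 4, the same column as the student answers. Each remaining record contains a three-digit student ID in columns 1-3, followed by that student's answers to the six items. Here is a listing of test_cheat.txt: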
Listing of Data Set test_cheat.txt
   ABCDEA
001AAAAAA
002ABCEEA
003AAAEEA
004ABCAAC
005AAAAAA
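To reproduce this example, you can write the file with a short DATA _NULL_ step. This is a sketch. The path is the one used in the macro calls later in this chapter, so change it to a location on your system. The key is written starting in column 4 so that it lines up with the student answers.

```sas
/* Sketch: write the sample file test_cheat.txt.
   The path is an assumption about your system; change it as needed. */
data _null_;
   file "c:\books\test scoring\test_cheat.txt";
   put @4 'ABCDEA';    /* answer key, starting in column 4          */
   put '001AAAAAA';    /* ID in columns 1-3, answers in columns 4-9 */
   put '002ABCEEA';
   put '003AAAEEA';
   put '004ABCAAC';
   put '005AAAAAA';
run;
```

Because this step has no SET or INPUT statement, it executes once and writes six records.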
The table below marks each student's wrong answers. It also shows the number of items on which each student chose the same wrong answer as Student 1:
Table Showing Wrong Answers and Same Wrong Answers for Each Student
Key or ID   Item 1   Item 2   Item 3   Item 4   Item 5   Item 6   # of Same Wrong
Key         A        B        C        D        E        A
001         A        |A|      |A|      |A|      |A|      A
002         A        B        C        |E|      E        A        0
003         A        ||A||    ||A||    |E|      E        A        2
004         A        B        C        ||A||    ||A||    |C|      2
005         A        ||A||    ||A||    ||A||    ||A||    A        4
Table Key: Single bars |X| indicate a wrong answer; double bars ||X|| indicate the same wrong answer as 001
In this example, Student 001 is Student 1, with four wrong answers. The last column shows the number of items where the student chose the same wrong answer as Student 1. The programs described in this chapter do not require that the data for Student 1 be the first record in the file (following the answer key).
Only a general discussion of the program that follows is provided.
Program 10.1: Program to Detect Cheating: Method One
*Macro Compare_Wrong uses the set of wrong answers from ID1
and computes the number of same wrong answers for all students
in the data file. It then computes the mean and standard
deviation of the number of same wrong answers (variable Num_Match)
with both ID1 and ID2 removed from the calculation. Finally,
it computes a z-value for the number of same wrongs for student ID2;
%macro Compare_wrong ➊
(File=, /*Name of text file containing key and test data */
Length_ID=, /*Number of bytes in the ID */
Start=, /*Starting column of student answers */
ID1=, /*ID of first student */
ID2=, /*ID of second student */
Nitems= /*Number of items on the test */ );
data ID_one(keep=ID Num_wrong_One Ans_One1-Ans_One&Nitems ➋
Wrong_One1-Wrong_One&Nitems)
Not_One(keep=ID Num_wrong Ans1-Ans&Nitems
Wrong1-Wrong&Nitems);
/* Data set ID_One contains values for Student 1
Data set Not_one contains data on other students
Arrays with "one" in the variable names are data from
ID1.
*/
infile "&File" end=last pad;
/*First record is the answer key*/
array Ans[&Nitems] $ 1;
array Ans_One[&Nitems] $ 1;
array Key[&Nitems] $ 1;
array Wrong[&Nitems];
array Wrong_One[&Nitems];
retain Key1-Key&Nitems;
if _n_ = 1 then input @&Start (Key1-Key&Nitems)($1.); ➌
input @1 ID $&Length_ID..
@&Start (Ans1-Ans&Nitems)($1.);
if ID = "&ID1" then do; ➍
do i = 1 to &Nitems;
Wrong_One[i] = Key[i] ne Ans[i];
Ans_One[i] = Ans[i];
end;
Num_Wrong_One = sum(of Wrong_One1-Wrong_One&Nitems);
output ID_one;
return;
end;
do i = 1 to &Nitems; ➎
Wrong[i] = Key[i] ne Ans[i];
end;
Num_Wrong = sum(of Wrong1-Wrong&Nitems);
output Not_One;
run;
/*
DATA step COMPARE counts the number of same wrong answers as Student
ID1.
*/
data compare; ➏
if _n_ = 1 then set ID_One;
set Not_One;
array Ans[&Nitems] $ 1;
array Ans_One[&Nitems] $ 1;
array Wrong[&Nitems];
array Wrong_One[&Nitems];
Num_Match = 0;
do i = 1 to &Nitems;
if Wrong_One[i] = 1 then Num_Match + Ans[i] eq Ans_One[i];
end;
keep Num_Match ID;
run;
proc sgplot data=compare; ➐
title 'Distribution of the number of matches between';
title2 "Students &ID1, &ID2, and the rest of the class";
title3 "Data file is &File";
histogram Num_Match;
run;
/*
Compute the mean and standard deviation on the number of same
wrong answers as ID1 but eliminate both ID1 and ID2 from the
calculation
*/
proc means data=compare(where=(ID not in ("&ID1" "&ID2")))noprint; ➑
var Num_Match;
output out=Mean_SD mean=Mean_Num_match std=SD_Num_match;
run;
data _null_; ➒
file print;
title1 "Exam file name: &File";
title2 "Number of Items: &Nitems";
title3 "Statistics for students &ID1 and &ID2";
set compare (where=(ID = "&ID2"));
set mean_sd;
set ID_One;
Diff = Num_Match - Mean_Num_Match;
z = Diff / SD_Num_match;
Prob = 1 - probnorm(z);
put // "Student &ID1 got " Num_wrong_One "items wrong" /
"Students &ID1 and &ID2 have " Num_Match "wrong answers in common" /
"The mean number of matches is" Mean_Num_Match 6.3/
"The standard deviation is" SD_Num_match 6.3/
"The z-score is " z 6.3 " with a probability of" Prob;
run;
proc datasets library=work noprint; ➓
delete ID_One Not_One Compare Mean_SD;
quit;
%mend compare_wrong;
➊ The calling arguments to this macro are the name of the text file containing the test data, the length of the student ID field, the position in the file where the student answers start, the two IDs of interest, and the number of items on the test.
➋ Two data sets are created. ID_One contains the data for ID1. Variables in this data set include the answers to each of the items (Ans_One variables), which items were answered incorrectly (Wrong_One variables equal to 1), and the number of wrong answers by ID1 (Num_Wrong_One). Data set Not_One contains the test data for all of the other students in the class.
➌ The first record, the answer key, is read into the Key variables. Each subsequent record supplies a student ID and that student's answers.
➍ If the ID is equal to ID1, store the answers in the Ans_One variables and set the Wrong_One variables to 1 for all wrong answers (and 0 for correct answers). Then output these values to the ID_One data set.
➎ For all the other students, score the test in the usual way.
➏ Compute the number of the same wrong answers for each of the students.
➐ Use PROC SGPLOT to produce a histogram of the number of the same wrong answers.
➑ Compute the mean and standard deviation for the number of the same wrong answers, excluding Students 1 and 2.
➒ Compute the z- and p-values and produce a report.
➓ Delete work data sets created by the macro.
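Earlier, the text noted that the distribution of the number of same wrong answers is approximately normal and that this can be tested. One way to check is to add a PROC UNIVARIATE step to the macro. This is an addition, not part of Program 10.1. It is a sketch that assumes the step is placed inside %Compare_Wrong, after DATA step COMPARE and before the PROC DATASETS step, so that data set COMPARE and the macro variables ID1 and ID2 exist.

```sas
/* Sketch (not part of Program 10.1): check the normality assumption
   for the match counts. Assumes placement inside %Compare_Wrong,
   after DATA step COMPARE and before the PROC DATASETS cleanup.   */
proc univariate data=compare(where=(ID not in ("&ID1" "&ID2"))) normal;
   var Num_Match;                   /* number of same wrong answers  */
   histogram Num_Match / normal;    /* overlay a fitted normal curve */
   qqplot Num_Match / normal(mu=est sigma=est); /* near-straight line suggests normality */
run;
```

The NORMAL option adds tests for normality, such as Shapiro-Wilk and Kolmogorov-Smirnov, to the output. Num_Match is a small integer count, so in a large class a formal test may reject exact normality even when the approximation is adequate for a z-score. For that reason, the plots are arguably as informative as the test statistics.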
Let's first test the macro using the test_cheat.txt file described previously. For this file, the length of the ID field is 3, and the student answers start in column 4. If Student 001 is to be ID1 and Student 005 is to be ID2, the calling sequence looks like this:
Calling Sequence for Macro %Compare_Wrong
%Compare_Wrong(File=c:\books\test scoring\test_cheat.txt,
Length_ID=3,
Start=4,
ID1=001,
ID2=005,
Nitems=6)
Here are the results:
Output from Macro %Compare_Wrong
Exam file name: c:\books\test scoring\test_cheat.txt
Number of Items: 6
Statistics for students 001 and 005
Student 001 got 4 items wrong
Students 001 and 005 have 4 wrong answers in common
The mean number of matches is 1.333
The standard deviation is 1.155
The z-score is 2.309 with a probability of 0.0104606677
You can verify that the number of matches, the mean, and the standard deviation of the number of matches are consistent with the sample data in the table presented earlier in this chapter.
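Here is that arithmetic worked by hand for this call. ID1 is 001 and ID2 is 005, so the mean and standard deviation are based on students 002, 003, and 004, whose match counts in the table are 0, 2, and 2. Student 005 has 4 matches, and the standard deviation uses the n - 1 divisor:

```latex
\begin{aligned}
\bar{M} &= \frac{0 + 2 + 2}{3} = \frac{4}{3} \approx 1.333 \\
s &= \sqrt{\frac{\left(0 - \tfrac{4}{3}\right)^2 + 2\left(2 - \tfrac{4}{3}\right)^2}{3 - 1}}
   = \sqrt{\frac{4}{3}} \approx 1.155 \\
z &= \frac{4 - \tfrac{4}{3}}{\sqrt{4/3}} = \frac{4\sqrt{3}}{3} \approx 2.309, \qquad
p = 1 - \Phi(2.309) \approx 0.0105
\end{aligned}
```

With only three students in the reference group, the normal approximation is purely illustrative here. The point is that each number in the output can be reproduced from the table.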
For a more realistic test, the test data from the 56-item statistics test was doctored so that two students had similar wrong answers. The two students with modified data were ID 123456789 and ID 987654321. Calling the %Compare_Wrong macro with this information looks like this:
Calling the %Compare_Wrong Macro with IDs 123456789 and 987654321 Identified
%compare_wrong(File=c:\books\test scoring\stat_cheat.txt,
Length_ID=9,
Start=11,
ID1=123456789,
ID2=987654321,
Nitems=56)
Output from the %Compare_Wrong Macro
Exam file name: c:\books\test scoring\stat_cheat.txt
Number of Items: 56
Statistics for students 123456789 and 987654321
Student 123456789 got 11 items wrong
Students 123456789 and 987654321 have 10 wrong answers in common
The mean number of matches is 2.279
The standard deviation is 1.315
The z-score is 5.872 with a probability of 2.1533318E-9
Results like this would lead you to the conclusion that there was cheating between students 123456789 and 987654321.
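Before drawing that conclusion, you can automate the check suggested at the start of this chapter: repeating the analysis for students whose grades are close to Student 2's. The macro below is a hypothetical helper, not one of the chapter's programs. It assumes %Compare_Wrong has already been compiled and simply calls it once for each ID in a space-separated list that you supply.

```sas
/* Hypothetical helper (not one of the chapter's programs):
   call %Compare_Wrong once for each ID in ID_List, keeping ID1 fixed.
   ID_List is a space-separated list of IDs that you choose, for
   example students whose total scores are close to Student 2's.   */
%macro Check_List(File=, Length_ID=, Start=, ID1=, Nitems=, ID_List=);
   %local k Next;
   %let k = 1;
   %let Next = %scan(&ID_List, &k, %str( ));
   %do %while(%length(&Next) > 0);
      %Compare_Wrong(File=&File, Length_ID=&Length_ID, Start=&Start,
                     ID1=&ID1, ID2=&Next, Nitems=&Nitems)
      %let k = %eval(&k + 1);
      %let Next = %scan(&ID_List, &k, %str( ));
   %end;
%mend Check_List;
```

You would call it with the same File, Length_ID, Start, ID1, and Nitems values used above, plus ID_List. The evidence is arguably more persuasive if Student 2 is the only comparison with an extreme z-score than if several students of similar ability also produce small p-values.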
How to Detect Cheating: Method Two
An alternative method of detecting cheating uses the items that both students in question got wrong, known as joint-wrongs. The program that implements this method is similar to the previous one. You first find the set of joint-wrongs. Then, for every student in the class, you count the number of these items on which that student chose the same wrong answer as Student 1. A program that uses this method is presented next (without detailed explanation):
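The statistic can be written as follows. The notation is introduced here for illustration and is not the book's: a_{ki} is student k's answer to item i, and K_i is the key. The sketch assumes Student 2's statistic is the number of joint-wrong items on which the two students chose the same wrong answer.

```latex
\begin{aligned}
J &= \{\, i : a_{1i} \neq K_i \ \text{and}\ a_{2i} \neq K_i \,\} && \text{joint-wrong items} \\
M^{J}_k &= \sum_{i \in J} \mathbf{1}\!\left[a_{ki} = a_{1i}\right] && \text{same wrong answers on } J \\
z &= \frac{M^{J}_2 - \bar{M}^{J}}{s^{J}}, \qquad p = 1 - \Phi(z)
\end{aligned}
```

Here \bar{M}^{J} and s^{J} are the mean and standard deviation of M^{J}_k over all students other than Students 1 and 2. Because M^{J}_2 can be at most |J|, the statistic is largest when the two students chose the same wrong option on every item they both missed.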
Program 10.2: Using the Method of Joint-Wrongs to Detect Cheating
*Macro Joint_Wrong is similar to Compare_Wrong except that it uses
joint-wrong items (items that both ID1 and ID2 got wrong) as the basis
for the calculations;
%macro Joint_Wrong
(File=, /*Name of text file containing key and
test data */
Length_ID=, /*Number of bytes in the ID */
Start=, /*Starting column of student answers */
ID1=, /*ID of first student */
ID2=, /*ID of second student */
Nitems= /*Number of items on the test */ );
data ID_one(keep=ID Num_Wrong_One Ans_One1-Ans_One&Nitems
Wrong_One1-Wrong_One&Nitems)
ID_two(keep=ID Num_Wrong_Two Ans_Two1-Ans_Two&Nitems
Wrong_Two1-Wrong_Two&Nitems)
Others(keep=ID Num_wrong Ans1-Ans&Nitems Wrong1-Wrong&Nitems);
/* Data set ID_One contains values for Student 1
Data set ID_Two contains values for Student 2
Data set Others contains data on other students
*/
infile "&File" end=last pad;
/*First record is the answer key*/
array Ans[&Nitems] $ 1;
array Ans_One[&Nitems] $ 1;
array Ans_Two[&Nitems] $ 1;
array Key[&Nitems] $ 1;
array Wrong[&Nitems];
array Wrong_One[&Nitems];
array Wrong_Two[&Nitems];
array Joint[&Nitems];
retain Key1-Key&Nitems;
if _n_ = 1 then input @&Start (Key1-Key&Nitems)($1.);
input @1 ID $&Length_ID..
@&Start (Ans1-Ans&Nitems)($1.);
if ID = "&ID1" then do;
do i = 1 to &Nitems;
Wrong_One[i] = Key[i] ne Ans[i];
Ans_One[i] = Ans[i];
end;
Num_Wrong_One = sum(of Wrong_One1-Wrong_One&Nitems);
output ID_One others;
return;
end;
if ID = "&ID2" then do;
do i = 1 to &Nitems;
Wrong_Two[i] = Key[i] ne Ans[i];
Ans_Two[i] = Ans[i];
end;
Num_Wrong_Two = sum(of Wrong_Two1-Wrong_Two&Nitems);
output ID_Two others;
return;
end;
/*Compute wrong answers for the class, not including ID1 and ID2 */
Num_Wrong = 0;
do i = 1 to &Nitems;
Wrong[i] = Key[i] ne Ans[i];
end;
Num_Wrong = sum(of Wrong1-Wrong&Nitems);
output Others;
run;
*DATA step ID1ID2 flags the joint-wrong items (Joint[i] = 1) and counts them;
Data ID1ID2;
array Wrong_One[&Nitems];
array Wrong_Two[&Nitems];
array Joint[&Nitems];
set ID_One(keep=Wrong_One1-Wrong_One&Nitems);
Set ID_Two(keep=Wrong_Two1-Wrong_Two&Nitems);
Num_Wrong_Both = 0;
do i = 1 to &Nitems;
Joint[i] = Wrong_One[i] and Wrong_Two[i];
Num_Wrong_Both + Wrong_One[i] and Wrong_Two[i];
end;
drop i;
run;
*DATA step COMPARE counts the number of same wrong answers on joint-wrongs.;
data compare;
if _n_ = 1 then do;
set ID_One(keep=Ans_One1-Ans_One&Nitems);
set ID1ID2;
end;
set others;
array Ans[&Nitems] $ 1;
array Ans_One[&Nitems] $ 1;
array Joint[&Nitems];
Num_Match = 0;
do i = 1 to &Nitems;
if Joint[i] = 1 then Num_Match + Ans[i] eq Ans_One[i];
end;
keep Num_Match ID;
run;
proc sgplot data=compare;
title 'Distribution of the number of matches between';
title2 "Students &ID1, &ID2, and the rest of the class";
title3 "Data file is &File";
histogram Num_Match;
run;
/*
Compute the mean and standard deviation on the number of same
wrong answers as ID1, but eliminate both ID1 and ID2 from the
calculation
*/
proc means data=compare(where=(ID not in ("&ID1" "&ID2")))noprint;
var Num_Match;
output out=Mean_SD mean=Mean_Num_match std=SD_Num_match;
run;
data _null_;
file print;
title1 "Exam file name: &File";
title2 "Number of Items: &Nitems";
title3 "Statistics for students &ID1 and &ID2";
set mean_sd;
set ID_One(Keep=Num_Wrong_One);
set ID_Two(keep=Num_Wrong_Two);
set ID1ID2(Keep=Num_Wrong_Both);
set compare(where=(ID eq "&ID2"));
*Use ID2's matches on the joint-wrong items, the same quantity averaged in Mean_Num_Match;
Diff = Num_Match - Mean_Num_Match;
z = Diff / SD_Num_match;
Prob = 1 - probnorm(z);
put // "Student &ID1 has " Num_Wrong_One "items wrong" /
"Student &ID2 has " Num_Wrong_Two "items wrong" /
"Students &ID1 and &ID2 have " Num_Wrong_Both
"wrong answers in common" /
"Students &ID1 and &ID2 have " Num_Match
"items with the same wrong answer" /
73*'-' /
"The mean number of matches is" Mean_Num_Match 6.3 /
"The standard deviation is " SD_Num_match 6.3 /
"The z-score is" z 6.3 " with a probability of " Prob;
run;
proc datasets library=work noprint;
delete ID_One ID_Two Others compare ID1ID2 Mean_SD;
quit;
%mend joint_wrong;
The program starts by creating three data sets:

- ID_One and ID_Two contain one observation each, with the data for Students ID1 and ID2, respectively.
- Others contains test data for the remaining students. The records for ID1 and ID2 are also written to Others so that their match counts can be computed.

Data set ID1ID2 flags the joint-wrong items. Data set Compare computes, for each student, the number of joint-wrong items on which that student chose the same wrong answer as ID1. PROC SGPLOT plots a histogram of these counts. PROC MEANS computes their mean and standard deviation, excluding students ID1 and ID2. The final DATA _NULL_ step computes a z-score and p-value for the two students.
The following macro call computes statistics for the same test file used in the previous example:
Calling the %Joint_Wrong Macro with IDs 123456789 and 987654321 Identified
%joint_wrong(File=c:\books\test scoring\stat_cheat.txt,
Length_ID=9,
Start=11,
ID1=123456789,
ID2=987654321,
Nitems=56)
Output from %Joint_Wrong Macro
Exam file name: c:\books\test scoring\stat_cheat.txt
Number of Items: 56
Statistics for students 123456789 and 987654321
Student 123456789 has 11 items wrong
Student 987654321 has 12 items wrong
Students 123456789 and 987654321 have 10 wrong answers in common
Students 123456789 and 987654321 have 10 items with the same wrong answer
-------------------------------------------------------------
The mean number of matches is 2.243
The standard deviation is 1.262
The z-score is 6.147 with a probability of 3.946199E-10
This method of cheating detection also produces convincing evidence that cheating occurred.
Searching for a Match
The last section in this chapter describes a program that searches a file for wrong-answer matches that suggest possible cheating. The program is very similar to Program 10.1. The difference is that you supply a single ID, and the program compares that student's wrong answer choices with those of every other student in the class. Instead of computing a z-score and p-value for one particular student, this program lists every student whose number of same wrong answers as Student 1 has a probability below a predetermined threshold value. A listing of the program follows:
Program 10.3: Searching for IDs Where the Number of the Same Wrong Answers is Unlikely
*Macro Search searches a file and computes the number of same
wrong answers as a student identified as ID1. The macro outputs
the z- and p-values for all students with similar wrong answer
choices, with a p-value cutoff determined in the macro call;
%macro search
(File=, /*Name of text file containing key and
test data */
Length_ID=, /*Number of bytes in the ID */
Start=, /*Starting column of student answers */
ID1=, /*ID of first student */
Threshold=.01, /*Probability threshold */
Nitems= /*Number of items on the test */);
***This DATA step finds the item numbers incorrect in the first ID;
data ID_one(keep=ID Num_wrong_One Ans_One1-Ans_One&Nitems
Wrong_One1-Wrong_One&Nitems)
Not_One(keep=ID Num_wrong Ans1-Ans&Nitems Wrong1-Wrong&Nitems);
/* Data set ID_One contains values for Student 1
Data set Not_one contains data on other students
Arrays with "one" in the variable names are data from
ID1.
*/
infile "&File" end=last pad;
retain Key1-Key&Nitems;
/*First record is the answer key*/
array Ans[&Nitems] $ 1;
array Ans_One[&Nitems] $ 1;
array Key[&Nitems] $ 1;
array Wrong[&Nitems];
array Wrong_One[&Nitems];
if _n_ = 1 then input @&Start (Key1-Key&Nitems)($1.);
input @1 ID $&Length_ID..
@&Start (Ans1-Ans&Nitems)($1.);
if ID = "&ID1" then do;
do i = 1 to &Nitems;
Wrong_One[i] = Key[i] ne Ans[i];
Ans_One[i] = Ans[i];
end;
Num_Wrong_One = sum(of Wrong_One1-Wrong_One&Nitems);
output ID_one;
return;
end;
do i = 1 to &Nitems;
Wrong[i] = Key[i] ne Ans[i];
end;
Num_Wrong = sum(of Wrong1-Wrong&Nitems);
drop i;
output Not_One;
run;
data compare;
array Ans[&Nitems] $ 1;
array Wrong[&Nitems];
array Wrong_One[&Nitems];
array Ans_One[&Nitems] $ 1;
set Not_One;
if _n_ = 1 then set ID_One(drop=ID);
***Compute # matches on set of wrong answers;
Num_Match = 0;
do i = 1 to &Nitems;
if Wrong_One[i] = 1 then Num_Match + Ans[i] eq Ans_One[i];
end;
keep ID Num_Match Num_Wrong_One;
run;
proc means data=compare(where=(ID ne "&ID1")) noprint;
var Num_Match;
output out=means(drop=_type_ _freq_) mean=Mean_match std=Sd_match;
run;
title 'Distribution of the number of matches between';
title2 "Student &ID1 and the rest of the class";
title3 "Data file is &File";
proc sgplot data=compare;
histogram Num_Match / binwidth=1;
run;
data _null_;
file print;
title "Statistics for student &ID1";
if _n_ = 1 then set means;
set compare;
z = (Num_Match - Mean_match) / Sd_match;
Prob = 1 - probnorm(z);
if Prob < &Threshold then
put /
"ID = " ID "had " @20 Num_Match " wrong answers compare, "
"Prob = " Prob;
run;
proc datasets library=work noprint;
delete compare ID_One means Not_One;
quit;
%mend search;
The logic of this program is similar to Program 10.1, except that, in this case, you want to compute the z- and p-values for every student in the class and list those students where the p-value is below the threshold.
Let's first run this macro on the test_cheat.txt file. The macro call looks like this:
Calling the Macro to Detect Possible Cheaters in the test_cheat.txt File
%search(File=c:\books\test scoring\test_cheat.txt,
Length_ID=3,
Start=4,
ID1=001,
Threshold=.6,
Nitems=6)
This calling sequence produces the following output:
Output from Program 10.3
Statistics for student 001
ID = 003 had 2 wrong answers compare, Prob = 0.5
ID = 004 had 2 wrong answers compare, Prob = 0.5
ID = 005 had 4 wrong answers compare, Prob = 0.110335681
In normal practice, you would set the threshold value quite small. To see a more realistic example, let's run the macro on the stat_cheat.txt file like this:
Running the Search Macro on the stat_cheat.txt File
%search(File=c:\books\test scoring\stat_cheat.txt,
Length_ID=9,
Start=11,
ID1=123456789,
Threshold=.01,
Nitems=56)
Here are the results:
Running the Search Macro on the stat_cheat.txt File with a Threshold of .01
Statistics for student 123456789
ID = 987654321 had 10 wrong answers compare, Prob = 8.6804384E-8
ID = 957897193 had 7 wrong answers compare, Prob = 0.0007360208
ID = 605568642 had 6 wrong answers compare, Prob = 0.0062390769
ID = 700024487 had 6 wrong answers compare, Prob = 0.0062390769
The only highly significant match is with student 987654321. The probability for student 987654321 here (8.7 × 10⁻⁸) is larger than the probability computed by the first method described in this chapter (2.2 × 10⁻⁹). The reason is that the first method computes the mean and standard deviation of the number of same wrong answers as Student 1 with both Students 1 and 2 removed from the calculation. The search program, by contrast, includes Student 2 in the computation. Because Students 1 and 2 share so many of the same wrong answers, leaving Student 2 in the calculation inflates the mean and standard deviation, which lowers the z-score. If the search program identifies a possible occurrence of cheating, we suggest that you then run either the first program (Compare_Wrong) or the second program (Joint_Wrong) with the two students in question.
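The effect shows up clearly in the small test_cheat.txt example, comparing Student 005 with Student 001. %Compare_Wrong excludes 005 from the reference group, leaving match counts of 0, 2, and 2. %Search keeps 005 in, giving 0, 2, 2, and 4. In both cases the standard deviation uses the n - 1 divisor:

```latex
\begin{aligned}
\text{Compare\_Wrong:}\quad & \bar{M} \approx 1.333,\ s \approx 1.155,\quad
  z = \frac{4 - 1.333}{1.155} \approx 2.309,\quad p \approx 0.0105 \\
\text{Search:}\quad & \bar{M} = 2,\ s = \sqrt{8/3} \approx 1.633,\quad
  z = \frac{4 - 2}{1.633} \approx 1.225,\quad p \approx 0.110
\end{aligned}
```

These values agree with the two outputs for test_cheat.txt (0.0104606677 and 0.110335681). With so few students, a single high count moves both the mean and the standard deviation substantially.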
Conclusion
This chapter presented three programs (Compare_Wrong, Joint_Wrong, and Search) that you can use to investigate possible cheating on multiple-choice exams. Great care is needed when using these programs, because the consequences are so serious for the students in question and for the institution conducting the test.
References
Cody, Ron. 1985. “Statistical Analysis of Examinations to Detect Cheating.” Journal of Medical Education 60, no. 2 (February): 136-137.