This project is about how statistics can improve fairness in a high-stakes educational test taken by millions of students every year in Brazil.
Enem is an exam similar to the SAT that, since 2009, has determined who gets one of the few spots at the best universities in the country. Admission to these courses is a life-changing opportunity for both wealthy and poor students. It also means earning a college degree for free, since Brazilian public universities do not charge tuition or fees.
With so much at stake, security is critical. Federal authorities and the police have frequently reported catching people trying to cheat on the exam, especially by using radio systems to receive answers from accomplices outside the rooms during the test.
But how effective is this security approach? Society did not know.
To answer this question, we applied statistical models from academic studies that measure how unlikely the similarity between two or more sets of answers is. Matching right answers matter, but matching wrong answers matter even more, since each of the 180 questions in every annual test has four incorrect options. We analyzed more than 3 million exams.
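To build intuition for why matching wrong answers are so telling, here is a minimal sketch (the actual analysis was done in R with the Angoff and M4 indexes; this toy binomial model, the function name, and the numbers are our own illustration). It assumes that when two honest students miss the same question, each of the four wrong options is equally likely, so an identical wrong choice happens about a quarter of the time:

```python
from math import comb

def p_matching_wrong_answers(shared_misses, matches, p_match=0.25):
    """Probability of at least `matches` identical wrong options among
    `shared_misses` questions both students missed, under a naive
    independence model with four equally likely wrong options.
    A toy illustration, not the Angoff or M4 index."""
    return sum(
        comb(shared_misses, k) * p_match**k * (1 - p_match)**(shared_misses - k)
        for k in range(matches, shared_misses + 1)
    )

# Two students who missed 30 of the same questions and chose the
# identical wrong option on 25 of them:
p = p_matching_wrong_answers(30, 25)
print(p)  # far below the one-in-a-thousand threshold
```

Under this naive model, 25 identical wrong choices out of 30 shared misses is already astronomically unlikely; the real indexes refine the calculation by weighting question difficulty and student ability.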
After three months of exploration, we flagged 1,125 tests taken between 2011 and 2016 as probably fraudulent. That is almost 80 times more than was officially reported by Inep (the federal agency that runs the exam).
In our model, a pair of exams was flagged as probable fraud when the probability of their observed similarity occurring by chance was less than one in a thousand.
To increase the reliability of our statistical study, we shared our code and all findings with Inep and the Federal Police, which is also in charge of exam security. They raised no objections.
We also resorted to classic shoe-leather journalism to track down some of the people responsible for these suspicious answers, trying to better understand the schemes and add more journalistic value to the project.
The dataset we analyzed is anonymized, but we managed to identify some names by cross-referencing different datasets. We then reached one of these suspects, who gave multiple, nonsensical answers to our questions, which gave us even more confidence in our findings.
After our project was published, the federal agency announced new security measures for the exam and said it will pay more attention to statistics as a tool to prevent and detect fraud.
This project was published in Folha de S.Paulo, one of the largest newspapers in Brazil, for the general public.
What makes this project innovative?
Statistical techniques have rarely been used in newsrooms around the world to make official exams fairer and more accountable, as we did in this project. It is also rare to find journalistic work that applies robust statistical models to a large database.
What was the impact of your project? How did you measure it?
This project had above-average audience and retention time at Folha de S.Paulo (one of the main media outlets in Brazil). In the printed edition, it was the main headline on the front page. Additional security measures were also taken by the federal agency after publication, such as new metal detectors at exam sites. The agency also announced that it will take statistical models more into account to prevent and detect fraud in the exam.
Source and methodology
We downloaded the data for every student who took the Enem exam from 2012 to 2016 from the website of Inep (the federal agency responsible for Enem). The data contain the answer given to each item and information about the students (city of residence, test location, age, etc.). As we were only interested in high-scoring tests, the ones that could earn a place at a public university, we removed all tests outside the top 10% of grades.

From there we calculated two indexes that show how unlikely the similarity between a pair of tests is. We first calculated the Angoff index, which estimates the likelihood of the observed similarity from the pattern of matches on wrong and right answers. This allowed us to select pairs of suspicious tests. For these pairs we then calculated the M4 index, a slower but statistically more robust test, which also incorporates information about the students' abilities and the difficulty of each question. A pair of tests was flagged as probably fraudulent when the chance of the observed similarity, according to the M4 index, was less than one in a thousand. Because the exam is taken by millions of students each year, this threshold applies after the necessary multiple-comparison corrections.

Comparing all the suspicious tests against each other, we identified clusters of three or more related tests. Looking at where the tests were taken, we noticed that these clusters were geographically close together, which is expected, since the gangs that run these scams advertise their services by word of mouth. Another corroboration of our analysis was that some of the cities where we found suspicious tests had already been targets of Federal Police operations involving Enem fraud.
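The screening pipeline described above can be sketched as follows. The real analysis was done in R with the Angoff and M4 indexes; this Python sketch substitutes a naive binomial p-value for those indexes, uses a Bonferroni-style adjustment as one common way to implement the multiple-comparison correction the methodology mentions (the text does not specify which correction was used), and all function names and toy data are our own:

```python
from itertools import combinations
from math import comb

ALPHA = 1e-3  # the one-in-a-thousand threshold from the methodology

def p_value(shared_misses, matches, p_match=0.25):
    """Right-tail binomial probability of at least `matches` identical
    wrong options among `shared_misses` jointly missed questions.
    A toy stand-in for the Angoff/M4 indexes."""
    return sum(
        comb(shared_misses, k) * p_match**k * (1 - p_match)**(shared_misses - k)
        for k in range(matches, shared_misses + 1)
    )

def flag_suspicious(tests, key):
    """tests: dict mapping student id -> answer string (one character
    per question); key: the answer key. Flags pairs whose similarity
    survives a Bonferroni-style division of ALPHA by the number of
    pairs compared."""
    pairs = list(combinations(sorted(tests), 2))
    corrected_alpha = ALPHA / max(len(pairs), 1)
    flagged = []
    for a, b in pairs:
        triples = list(zip(tests[a], tests[b], key))
        # Questions both students missed, and those missed identically:
        shared = sum(1 for x, y, k in triples if x != k and y != k)
        matches = sum(1 for x, y, k in triples if x != k and y != k and x == y)
        if p_value(shared, matches) < corrected_alpha:
            flagged.append((a, b))
    return flagged

# Two students with identical wrong answers on 15 of 20 questions,
# plus one student who answered everything correctly:
key = "ABCD" * 5
tests = {"s1": "B" * 20, "s2": "B" * 20, "s3": key}
print(flag_suspicious(tests, key))  # -> [('s1', 's2')]
```

Clustering the flagged pairs (tests sharing a suspect member) would then yield the groups of three or more related tests described above.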
Finally, we showed that nearby cities in the northeast of the country had clusters of suspicious tests for three consecutive years, which suggests that a fraud scheme may have operated in the region during this period without being disturbed by the authorities.
All analyses and data formatting were done in R.
Fábio Takahashi, Mariana Zylberkan and Paulo Saldaña