Data was collected at https://openpsychometrics.org/tests/characters/. Users found the link through the front page of the host website, through Google searches, or through links shared on social media.
In the main body of the quiz, users rated themselves on a series of bipolar adjective pairs using a slider that ranged from 1 to 100. The slider value was also displayed as a percentage for each pole, updating as the slider moved. The screenshot below shows how this looked to the user.
After this, the user was asked whether they would be willing to rate some characters for research purposes. If they agreed, they were asked to select from a list any fictional universes they were familiar with.
Characters from the selected universes were then paired, and the user was asked to rate how similar the two characters in each pair were in terms of personality.
Field | Description |
--- | --- |
universes_selected / universes_unselected | Two lists: one of the universes the user selected, indicating that they could answer questions about characters from them, and one of the remaining unselected universes. The number of universes offered to a user for selection varied over time. |
age_group | The user was prompted to enter their age into a text box, and the entry was categorized into one of the following age groups: 1 = 13-19, 2 = 20-27, 3 = 28-80. If the age could not be parsed as an integer or was not in one of these ranges, the record is not included. |
engNat | "Is English your native language?" 1=Yes, 2=No (0=unanswered) |
gender | "What is your gender?" 1=Male, 2=Female, 3=Other (0=unanswered) |
network_country | ISO country code of the country in which the user's network is located, inferred from technical information rather than asked as a question. This value is very rarely wrong, but it is often misleading, e.g. for tourists (common) or proxied traffic (probably rare). Cross-reference with engNat, but remain mindful that, e.g., a non-native English speaker connecting from Russia could be a German tourist. Set to "" if the value occurs fewer than 100 times in the dataset. |
network_region | ISO code for the next-level subdivision of network_country (in the USA, the state). Set to "" if the value occurs fewer than 100 times in the dataset. |
occupation | "What is your current occupation or way of life (e.g. 'truck driver', 'lawyer', 'student', 'homemaker', 'unemployed', 'travelling', 'imprisoned', etc.)?" Answers were capitalized, and occupations with fewer than 100 matching entries were removed. |
screen | The size of the user's screen. 1 if both dimensions of the screen are larger than 700 pixels (most laptops and desktops), 2 otherwise (most phones) |
year | Year the user started the test |
introelapse | Time the user spent on the introduction page in seconds |
testelapse | Time the user spent on the main body of the self report quiz in seconds |
endelapse | Time the user spent on the page after the quiz where they were asked if they would volunteer to do the survey |
surveyelapse | Time the user spent on the optional character rating survey |
quiz_items | JSON array of arrays. Each element is the answer to a single question in the main self-report quiz: [0] -> the position of that question in the quiz (question order was shuffled for each user); [1] -> the question ID (see the key for the text of each item); [2] -> the user's response (originally on a 1-100 scale, but rounded to the nearest 10 here for privacy protection); [3] -> the time between question load and answer in milliseconds. Answers were not included in the dataset if they were skipped or given in less than 1000 ms. Also, for user protection, the last 20% of answers were removed. |
survey_ratings | JSON array of arrays. Each element corresponds to the user's rating of one character pair: [0] -> first character ID; [1] -> second character ID; [2] -> the user's rating, where 1=Extremely similar, 2=Similar, 3=Slightly similar, 4=Slightly different, 5=Different, 6=Extremely different; [3] -> the time in ms the user took to answer. Responses made in less than 1000 ms, as well as skipped ones, were removed. Records are only retained in the dataset if they contain more than one valid rating. |
surveyconfig | Which pool of characters the user was asked to rate from. See the pools in /resources/characters_pairsimilarity.php |
quiz_form | Which set of self-report items the user answered; see /resources/quiz_forms.js |
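The age_group coding can be reproduced from a raw age entry. The sketch below is not the code used to build the dataset; it assumes "parsed as an integer" behaves like Python's `int()` on the stripped text, and returns `None` for entries that would have caused the record to be excluded.

```python
from typing import Optional

# Age ranges and codes as listed in the age_group row of the codebook.
AGE_GROUPS = [((13, 19), 1), ((20, 27), 2), ((28, 80), 3)]


def age_group(raw_age: str) -> Optional[int]:
    """Map a free-text age entry to its age_group code.

    Returns None when the text is not an integer or falls outside every
    range; such records are not included in the dataset.
    """
    try:
        age = int(raw_age.strip())
    except ValueError:
        return None  # e.g. "twenty" or "17.5"
    for (low, high), code in AGE_GROUPS:
        if low <= age <= high:  # ranges are inclusive at both ends
            return code
    return None


for entry in ["17", " 24 ", "45", "12", "81", "twenty"]:
    print(repr(entry), "->", age_group(entry))
```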
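The network_country, network_region, and occupation fields all suppress rare values (fewer than 100 occurrences in the dataset). A minimal sketch of that rule, assuming suppression is a plain frequency count over the whole column; the function name `suppress_rare` is hypothetical, and this is not the published pipeline (for occupation, the codebook says rare entries were removed, which this sketch represents as "").

```python
from collections import Counter
from typing import List


def suppress_rare(values: List[str], min_count: int = 100) -> List[str]:
    """Replace every value that occurs fewer than min_count times with "".

    Hypothetical reconstruction of the "Set to '' if less than 100 in
    dataset" rule; counts are taken over the full column passed in.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_count else "" for v in values]


countries = ["US"] * 150 + ["IS"] * 3
cleaned = suppress_rare(countries)
print(cleaned.count("US"), cleaned.count(""))
```

Values that occur exactly 100 times are kept, since only counts below 100 are suppressed.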
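Each quiz_items value is a JSON string whose elements follow the [position, question_id, response, elapsed_ms] layout described in the table. The sketch below parses one record and checks it against the documented filters. The question IDs in the example record are invented for illustration; real IDs are given in the item key.

```python
import json
from typing import Dict


def parse_quiz_items(raw: str) -> Dict[str, int]:
    """Map question ID -> rounded response for one record's quiz_items field.

    Each element is assumed to be [position, question_id, response, elapsed_ms].
    """
    items = json.loads(raw)
    # Sort by position so the dict follows the order this user saw the questions.
    items.sort(key=lambda item: item[0])
    return {str(qid): int(response) for _pos, qid, response, _ms in items}


def meets_stated_filters(raw: str) -> bool:
    """True if every answer respects the documented filters: responses
    rounded to a multiple of 10, answer times of at least 1000 ms."""
    return all(
        response % 10 == 0 and elapsed_ms >= 1000
        for _pos, _qid, response, elapsed_ms in json.loads(raw)
    )


# Hypothetical record; question IDs here are made up.
record = '[[2, "Q7", 60, 2300], [0, "Q3", 90, 1500], [1, "Q12", 10, 4100]]'
print(parse_quiz_items(record))      # {'Q3': 90, 'Q12': 10, 'Q7': 60}
print(meets_stated_filters(record))  # True
```

Since the last 20% of answers were removed, the number of parsed items is expected to be smaller than the number of questions in the user's quiz_form.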
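The survey_ratings field uses the same JSON-array layout, with a 1 to 6 similarity scale. A sketch that turns one record into named tuples carrying the scale labels from the table; `PairRating` and `parse_survey_ratings` are hypothetical names, and the character ID format in the example record is invented.

```python
import json
from typing import List, NamedTuple

# Rating scale as listed in the survey_ratings row of the codebook.
SIMILARITY_LABELS = {
    1: "Extremely similar",
    2: "Similar",
    3: "Slightly similar",
    4: "Slightly different",
    5: "Different",
    6: "Extremely different",
}


class PairRating(NamedTuple):
    first_id: str
    second_id: str
    rating: int
    elapsed_ms: int

    @property
    def label(self) -> str:
        """Human-readable name of the 1-6 rating."""
        return SIMILARITY_LABELS[self.rating]


def parse_survey_ratings(raw: str) -> List[PairRating]:
    """Turn one record's survey_ratings JSON string into PairRating tuples."""
    return [
        PairRating(str(first), str(second), int(rating), int(ms))
        for first, second, rating, ms in json.loads(raw)
    ]


# Hypothetical record; real character IDs follow the survey's own scheme.
record = '[["A/1", "A/4", 2, 3400], ["B/2", "A/1", 6, 2100]]'
for pair in parse_survey_ratings(record):
    print(pair.first_id, pair.second_id, pair.label)
```

Because records are only retained with more than one valid rating, every record in the dataset should parse to at least two `PairRating` entries.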
The code used to field the survey is in /resources/characters_pairsimilarity.php. It changed over time, but this file is generally representative.