Paired similarity survey codebook

The data was collected at https://openpsychometrics.org/tests/characters/. Users found the link through the front page of the host website, through Google searches, or from the link being shared on social media.

In the main body of the quiz, users rated themselves on a series of bipolar adjective pairs using a slider that ranged from 1 to 100. The slider value was also displayed as a percentage for each pole, which updated when the slider was moved. The screenshot below shows how this looked to the user.

After this the user was asked if they would be willing to rate some characters for research purposes. If they said yes, they were asked to select from a list any fictional universes they knew about.

Characters from the selected universes were paired together and the user was asked to rate how similar that pair was in terms of their personalities.

Fields

universes_selected / universes_unselected Two lists: one of the universes the user selected, indicating that they could answer questions about the characters, and one of the remaining unselected universes. The number of universes offered for selection varied over time.
age_group The user was prompted to enter their age into a text box. They were categorized into the following age groups.
(13-19) = 1
(20-27) = 2
(28-80) = 3
If the age could not be parsed as an integer or was not in one of these ranges, that record is not included.
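The age grouping above can be sketched in Python as follows. This is only an illustration of the stated rules: the function name age_group_code is hypothetical, and the code actually used to parse the text box is not shown in this codebook.

```python
from typing import Optional


def age_group_code(raw: str) -> Optional[int]:
    """Map a raw text-box entry to an age_group code.

    Returns None when the entry cannot be parsed as an integer or falls
    outside every range, which corresponds to a record being excluded.
    (Hypothetical helper; the original parsing code is an assumption.)
    """
    try:
        age = int(raw.strip())
    except ValueError:
        return None
    if 13 <= age <= 19:
        return 1
    if 20 <= age <= 27:
        return 2
    if 28 <= age <= 80:
        return 3
    return None
```

Under this sketch, entries such as "12", "81", or "twenty" all yield None, so the corresponding records would not appear in the dataset.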
engNat "Is English your native language?" 1=Yes, 2=No (0=unanswered)
gender "What is your gender?" 1=Male, 2=Female, 3=Other (0=unanswered)
network_country ISO country code of the country where the network the user connected from is located, inferred from technical information rather than asked as a question. This data is very rarely wrong, but it is often misleading (e.g. tourists, which are common, or proxied traffic, which is probably rare). Cross-reference with engNat, but still be mindful that e.g. a non-native English speaker connecting from Russia could be a German tourist. Set to "" if fewer than 100 in dataset.
network_region ISO code for the next highest subdivision of network_country (in the USA these are states). Set to "" if fewer than 100 in dataset.
occupation "What is your current occupation or way of life (e.g. 'truck driver', 'lawyer', 'student', 'homemaker', 'unemployed', 'travelling', 'imprisoned', etc.)?" Responses were capitalized. Entries with fewer than 100 matches were removed.
screen The size of the user's screen. 1 if both dimensions of the screen are larger than 700 pixels (most laptops and desktops), 2 otherwise (most phones)
year Year the user started the test
introelapse Time the user spent on the introduction page in seconds
testelapse Time the user spent on the main body of the self report quiz in seconds
endelapse Time the user spent on the page after the quiz where they were asked if they would volunteer to do the survey
surveyelapse Time the user spent on the optional character rating survey
quiz_items JSON array of arrays. Each element is the answer to a single question in the main self report quiz.
[0] -> the position of that question in the quiz (question order was shuffled for each user).
[1] -> the question ID, see the key for the text of each item
[2] -> the user's response (originally 1 - 100 scale, but rounded to nearest 10 here for privacy protection)
[3] -> time between question load and answer in milliseconds
Answers were excluded from the dataset if they were skipped or given in less than 1000 ms. Also, for user protection, the last 20% of answers were removed.
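As a minimal sketch of reading the quiz_items field, the Python below parses the JSON string and names the four positions described above. The names QuizAnswer and parse_quiz_items are hypothetical, the sample value is made up for illustration, and the element types (for example whether question IDs are numbers or strings) are an assumption not confirmed by this codebook.

```python
import json
from typing import Any, List, NamedTuple


class QuizAnswer(NamedTuple):
    position: Any    # [0] position of the question in the shuffled quiz
    item_id: Any     # [1] question ID
    response: Any    # [2] response, rounded to the nearest 10
    elapsed_ms: Any  # [3] time between question load and answer, in ms


def parse_quiz_items(raw: str) -> List[QuizAnswer]:
    """Decode one quiz_items value into named tuples (hypothetical helper)."""
    return [QuizAnswer(*row[:4]) for row in json.loads(raw)]


# Made-up example value, not taken from the dataset.
sample = "[[3, 12, 70, 2450], [1, 7, 20, 1800]]"

# Restore the order in which this user saw the questions.
answers = sorted(parse_quiz_items(sample), key=lambda a: a.position)
```

Because skipped answers, fast answers, and the last 20% of answers were removed, the positions in a record will generally have gaps.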
survey_ratings JSON array of arrays. Each element corresponds to the user's reaction to one character pair.
[0] -> first character ID
[1] -> second character ID
[2] -> the user's rating. 1=Extremely similar, 2=Similar, 3=Slightly similar, 4=Slightly different, 5=Different, 6=Extremely different
[3] -> the time in ms the user took to answer
Responses made in less than 1000 ms, as well as skipped ones, were removed. Records were only retained in the dataset if they had more than one valid rating.
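The survey_ratings field can be decoded in a similar way. The sketch below attaches the rating labels listed above to each pair. The names RATING_LABELS and parse_survey_ratings are hypothetical, the sample value and its character IDs are made up, and it assumes ratings are stored as integers.

```python
import json

# Labels copied from the rating scale described in this codebook.
RATING_LABELS = {
    1: "Extremely similar",
    2: "Similar",
    3: "Slightly similar",
    4: "Slightly different",
    5: "Different",
    6: "Extremely different",
}


def parse_survey_ratings(raw):
    """Decode one survey_ratings value into a list of dicts (hypothetical helper)."""
    ratings = []
    for first_id, second_id, rating, elapsed_ms in json.loads(raw):
        ratings.append({
            "pair": (first_id, second_id),
            "rating": rating,
            # Assumes integer ratings; anything else is labeled "unknown".
            "label": RATING_LABELS.get(rating, "unknown"),
            "elapsed_ms": elapsed_ms,
        })
    return ratings


# Made-up example value; real character IDs may look different.
sample = '[["A1", "B2", 2, 3100], ["A1", "C3", 6, 1500]]'
parsed = parse_survey_ratings(sample)
```

Note that lower numbers mean more similar, so the scale may need to be reversed before treating it as a similarity score.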
surveyconfig Which pool of characters the user was asked to rate from. See the pools in /resources/characters_pairsimilarity.php
quiz_form Which set of self report items the user answered. See /resources/quiz_forms.js

The code used to field the survey is in /resources/characters_pairsimilarity.php. It changed over time, but this file is generally representative.