Department of Engineering

IT Services

Qualtrics-based surveys

Diary

  • Fri Dec 14 - mjm61 (Madeline McKerchar) and tl136 (Tim Love) met to assess what can be done to improve the survey and reduce workload. We decided that we could produce a new system for Mich 2019, muddling through the rest of this academic year.
    • Madeline is prepared to produce surveys for Lent and Easter in the same way that she produced the Mich surveys.
    • Madeline can produce per-course results. Overview output is required on 01.07.19 by committees. This is currently too onerous to prepare in the time available. Tim said that by then he'll automate production of such output.
    • The new system will automate survey production and processing as much as possible. Now would be a good time to re-assess the surveys. In particular -
      • Are the questions asked appropriate? If the answers aren't looked at, need the questions be asked?
      • What does the TO need the survey for? How should the results be processed to fulfill these aims?
      • To what extent should surveys be customisable by lecturers? There's a balance between ensuring that the data is useful for lecturers and ensuring that the modules can be compared via the aggregate report. Customisation will also be more labour-intensive.
      • How much should the survey design and processing be designed to benefit students wanting to know about the courses and lecturers?
      • Can small changes be made to the design that would greatly simplify automated survey generation and processing?
    • Madeline will make Tim a collaborator and/or send him samples of the survey and output. Tim will try to replicate (using programs) some of the sample output in the hope of automating this for the end of Lent.
    • There's an SSJC meeting on 28th Jan
  • Thu Dec 20 - I (tl136) have looked at the task. The information for the 1A survey is in 2 files that I'll refer to as the Survey file and the Replies file -
    • Survey file - this *.qsf file is a complete definition of the survey (actually it can be loaded in to produce a survey). It's a JSON file - everything in it has a label, and units are nested/grouped inside other units.
    • Replies file - this CSV file has 3 rows of headings then a row for each student's submission. The first 17 columns of each submission contain info about the user, time of submission, etc, and the others contain the replies. The file doesn't contain all the info about the survey - it lacks some text and structuring. For example I can't determine which columns are to do with coursework.
    I can get data from the Replies file (for example, I've retrieved all the data and comments for the 1A Python course) but it involves some guesswork - the "hours taken" data is in column 392, etc. So what I'll need to do to gather the Mich Coursework data is as follows -
    • Find the unit in the survey that lists the coursework - it's identified by having 'QuestionText' equal to 'Which Michaelmas Term coursework activities would you like to give feedback on (please tick all that apply)?'. This unit contains the IDs of each of the coursework units.
    • Those IDs are in the column headings of the CSV file, so I can read the appropriate data for each piece of coursework
    I can use the same approach to get info on the other coursework, the lectures and the general questions. The loop facility and Matrix Table questions complicate matters, and the general questions are in several formats each requiring different treatment, but for the most part it's a matter of making lists and following cross-references. I can use the manually-produced output to check the correctness of my code.
    Note that a simple question tag like "Q2" can have digits before or after it ("Q2_3" for example), or indicate the type of data (e.g. "Q2_TEXT"). The labels associated with a question may be in a different unit to the question, necessitating more cross-referencing.
  • Mon Dec 31 - I (tl136) have produced python code to process the 1st year survey replies. The code is in ~tl136/qualtricsdir on the central linux system - see the README there for details. Output is currently PDF (via LaTeX). I've added some graphs and summary pages to show what can be done.
    I'll work out how to deal with each question type as required. Questions have a type (e.g. "Matrix") and various other features, some of which control the behaviour of the survey (e.g. "ForceReplyType", which I don't care about) and some of which determine the sub-type (e.g. "SubSelector":"SingleAnswer").
  • Fri Mar 15 - Time for a summary of work done during the Mich term. The code I (tl136) wrote earlier to process Mich results needed tweaking to cope with Lent's replies, but I got the "half term" IDP info out ok. I've not written a general processor for all possible qualtrics surveys, but I think I can complete the programs for internal use without too much trouble. The plan is for mjm61 to process the surveys once Lent is over, and for tl136 to produce equivalent output, using mjm61's output to check accuracy. Then tl136's programs will be used to produce output at the end of the academic year.
    I'm meeting mjm61 on 25th March to agree on what output's needed - PDF? HTML? per-course output? year-overviews? Will results be made available to students, comments included?
    I've spent a fair bit of time looking at automated survey production, a topic that deserves a separate section on this page.

Survey production

There are 3 options for automating the production of qualtrics surveys -

  • Use a "Simple" text file to create the survey. The file has to be written in a format described online, a format which is simple for humans to write using Word, etc. Here for example is how a multiple choice question is created.
    1. This is a multiple choice question. Every question starts with a
    number followed by a period. There should be a blank line between
    the question text and the choices.
    
    a
    b
    c
    d
    
    Online is an example file with several questions. If you log into the qualtrics site and import this file, you'll get a survey. It's easy to create files like this from a program that adds CUED-specific information.

    Alas, only simple Multiple Choice and Matrix questions can be used, so it's of limited use to us.
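It's easy to create such files programmatically. As a flavour, here's a hypothetical generator (the question texts and choices are invented) that emits a survey in the "Simple" layout shown above:

```python
def simple_question(number, text, choices):
    """Return one question in the "Simple" import layout:
    'number. text', a blank line, then one choice per line."""
    return "{}. {}\n\n{}\n".format(number, text, "\n".join(choices))

# Invented CUED-flavoured questions, just to show the mechanics
questions = [
    ("How useful were the lectures?", ["very", "fairly", "not at all"]),
    ("How useful were the labs?", ["very", "fairly", "not at all"]),
]
survey = "\n".join(simple_question(i + 1, text, choices)
                   for i, (text, choices) in enumerate(questions))
```

Saving `survey` to a text file and importing it on the qualtrics site should give a two-question survey.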

  • Use an "Advanced Format" text file to create the survey. These files are harder for humans to understand than "Simple" files are, but more facilities (described online) are available. Here for example is how a matrix question is created.
    [[Question:Matrix]]
    This question is a matrix question.
    
    It has lots of question text on multiple lines and uses
    advanced answers.
    
    [[Choices]]
    statement a
    statement b
    statement c
    [[AdvancedAnswers]]
    [[Answer]]
    answer 1
    [[Answer]]
    answer 2
    [[Answer]]
    answer 3
    
    Like "Simple" files, it's easy to create files like this from a program that adds CUED-specific information. Also like "Simple" files, many features are unavailable, which is why Qualtrics doesn't offer the option of Exporting a survey in this format - surveys (CUED's in particular) are likely to use features that can't currently be described in "Advanced Format" files.
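A generator for this layout is only a sketch away too (the statement and answer texts below are placeholders):

```python
def matrix_question(text, statements, answers):
    """Emit one matrix question in the "Advanced Format" layout above:
    the question text, the [[Choices]] statements, then each answer
    under its own [[Answer]] tag."""
    lines = ["[[Question:Matrix]]", text, "", "[[Choices]]"]
    lines += statements
    lines.append("[[AdvancedAnswers]]")
    for answer in answers:
        lines += ["[[Answer]]", answer]
    return "\n".join(lines)

block = matrix_question("This question is a matrix question.",
                        ["statement a", "statement b", "statement c"],
                        ["answer 1", "answer 2", "answer 3"])
```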
  • Use a QSF file. These are text files, but they're much harder for humans to read (the 1A Survey's QSF file is 110k, which is itself a challenge). To download a QSF file via the web interface
    • Log in. Choose survey
    • Choose Survey -> Tools -> Import/Export -> Export Survey
    These files can express all a survey's features. They can be imported and exported. It's possible to create QSF files from a program that adds CUED-specific information, but there are several difficulties. See below.

About QSF files

QSF files are JSON files - structured text. I can't find full details of the format. Quickstart Guide to understanding the Qualtrics Survey File is a community attempt to write a description of the format; however, it begins with "This information is likely to quickly become outdated when Qualtrics next changes the formatting of the QSF file." Another page, describing some Qualtrics survey-producing software, now says - "We are sad to announce that we have decided to retire the QRTEngine and support. The continuing and often unpredictable developments in the Qualtrics environment make it increasingly difficult to provide a stable QRTE platform."

The QSF file downloaded from the GUI is quite different from what you get when downloading using the API. This is mentioned on https://www.qualtrics.com/community/discussion/2708/exporting-survey-details-to-txt-or-qsf-format where, regarding getting the formats to match, it says - "the Qualtrics team has mentioned they are going to consider this for a feature request (November 2018)".

By reverse engineering I've been able to work out many format details. Each question type (there are 17) requires a new investigation. Details for a multiple choice question are 5 layers down in the structure, and there are traps for the unwary - e.g. the choices of a multiple choice question aren't necessarily presented in the survey in the same order as they appear in the QSF file; a ChoiceOrder variable controls the order.

Such files aren't intrinsically complex, but they're cumbersome to deal with, and anything beyond simple 1-to-1 representations is a challenge. For example, I still haven't got to the bottom of how the "Loop and Merge" feature is represented.

I'm writing an inspect.py program which, given a QSF file, will try to describe its content. It's not ready yet, but to give you a flavour of the format, part of the file (starting a few layers down) is on the right.

The BlockElements entry of a Payload whose Description names a section contains the Question IDs of the constituent questions. Note that a question may involve variations of the basic id: Q23, Q23_1, Q23_2, Q23_TEXT, QID23, etc.

That said, it's easy to read a QSF file into a program, make major adjustments to the details, and create a new survey, so I won't produce a QSF file from scratch. I'll export a survey to create a QSF file that I'll use as a basis to create variants.
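A sketch of that round trip, using a toy stand-in for an exported file (in the files I've seen, the survey name sits under SurveyEntry -> SurveyName, but treat that as an assumption):

```python
import json

# Toy stand-in for a QSF file exported from an existing survey
template = json.dumps({"SurveyEntry": {"SurveyName": "1A Mich survey"},
                       "SurveyElements": []})

qsf = json.loads(template)                            # read the export
qsf["SurveyEntry"]["SurveyName"] = "1A Lent survey"   # adjust details
variant = json.dumps(qsf, indent=1)                   # new importable QSF
```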

There's a surveyhelper python package available, but rather like the QuickStart guide mentioned above, it's incomplete. While I'm writing code for our needs I'm tempted to contribute to the online attempt to deal with this issue. I've already written code that I think others would find useful.

Survey Flow - always appears once in the QSF. Its payload contains a list of blocks ordered according to the flow of the survey.

Survey Questions - Each question contains a PrimaryElement which is its Question ID, a SecondaryElement which is the first 99 characters of the Question Text, and a Payload with the details of the question.

The PrimaryAttribute is the same as the QuestionID for questions. It is always "QID" followed by some numbers. The numbers usually go in the order of the creation of questions as the survey was made. These QIDs are distinct from the DataExportTags which users can customize in the Qualtrics interface. QIDs cannot be changed.
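When cross-referencing, a small helper (the name is mine) recovers the underlying QID from the tag variants listed above:

```python
import re

def base_qid(tag):
    """Return the QIDnn embedded in variants like '3_QID23_TEXT' or
    'QID23_4'; return None for DataExportTag-style names like 'Q83'."""
    m = re.search(r"QID\d+", tag)
    return m.group(0) if m else None
```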

Survey processing

To download a CSV file via the web interface,

  • Log in. Choose survey
  • Choose Data & Analysis -> Data
  • Choose "Export & Import" on the right
  • Choose Export CSV

You'll get a zip file. Unzipped, ours is typically 1M.

The CSV file contains all the replies and much about the questions, though structural information is lacking. The first 17 columns of the replies contain meta information - times and locations of replies, etc.

Row 1 has column headers. Row 2 has text related to the contents of the column (e.g. the text of the question). Row 3 contains info to help cross-reference the information. Each subsequent row contains a reply. Note that the CSV file can contain columns that begin like the following, where the row-1 tag and the row-3 ImportId don't match

Q52
Any comments on the examples papers:
{"ImportId":"QID86_TEXT"}

Usually the 1st and 3rd row match as in

Q83
How can we improve the coursework/labs?
{"ImportId":"QID83_TEXT"}

The format partly depends on the options used to produce the CSV file.

  • If you select Split multi-value fields into columns, each choice gets its own column in the dataset (Q1_1, Q1_2, ...). On each participant's row, there will be a "1" in the columns of the choices they selected and a blank elsewhere.
  • If you deselect Split multi-value fields into columns, your question will have only one column in your data set, and the values of the selected choices will be comma-separated beneath it.

For single-select variations of the Likert Matrix Table, where a participant can choose one answer per row of your Matrix Table, you'll find one column in your data set for each row of your Matrix Table (Q1_1, Q1_2, ...).

Each choice in your question has a “coded value”, or number associated with the choice. The number for the selected choice is displayed in each participant’s row.
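Putting the three heading rows together, locating a reply column can be sketched as follows (the sample data is invented but follows the layout described above; row 3's small JSON objects provide the ImportIds):

```python
import csv, io, json

# Invented two-column sample in the three-heading-row layout
sample = io.StringIO(
    "Q83,QID50_1\n"
    "How can we improve the coursework/labs?,Python\n"
    '"{""ImportId"":""QID83_TEXT""}","{""ImportId"":""QID50_1""}"\n'
    "More examples please,1\n")

rows = list(csv.reader(sample))
# Row 3 cross-references reliably, even when row 1's tag doesn't match
col_of = {json.loads(cell)["ImportId"]: n
          for n, cell in enumerate(rows[2])}
replies = rows[3:]
comment = replies[0][col_of["QID83_TEXT"]]
```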

I've written some python programs to produce reports for surveys. They're somewhat ad-hoc and non-general, but may be of use to others so I'll release them in time. As an example of my approach, here's what I do to gather the Michaelmas Coursework data -

  • I find the unit in the QSF file that lists the coursework - it's identified by having 'QuestionText' equal to 'Which Michaelmas Term coursework activities would you like to give feedback on (please tick all that apply)?'. This unit contains the IDs of each of the coursework units.
  • Those IDs are in the column headings of the CSV file, so I read the appropriate data for each piece of coursework from the CSV using routines to sum columns of CSV files, work out averages, etc.
  • I produce LaTeX files using the pgfplots package to produce reports with graphics.
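The summing and averaging routines mentioned above amount to something like this (the column layout and figures are invented):

```python
def column_stats(rows, col):
    """Sum and average the numeric entries in one column of reply
    rows, skipping blanks (unanswered questions)."""
    values = [float(row[col]) for row in rows if row[col] != ""]
    return sum(values), sum(values) / len(values)

# Invented replies: an "hours taken" figure in column 0, blank = no answer
replies = [["4"], [""], ["6"], ["5"]]
total, mean = column_stats(replies, 0)
```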

Loop and Merge

This survey-creating facility complicates the QSF file. With a question like the one on the right, associated with a table, 3 multiple choice questions appear in the final survey - "pick a hat colourone", "pick a hat colourtwo", and "pick a hat colourthree". The text is an example of "Piped Text" which can contain various types of dynamic information - see the cascading menus above. By picking the "Field 1: one, two, three" option the ${lm://Field/1} code is automatically inserted into the text field.


In the QSF file, the question appears in a Survey Element (the 0th in this example) with an Element value of "BL" (Block?). The Payload of this element contains the QuestionID (in this case QID3) and looping information. In this case the information is Static (i.e. the same for all users at all times) and the table of information to be looped over is displayed. The Static entry has 3 items (matching the 3 rows of data).


In this example, the 9th Survey Element has the Primary Attribute QID3. The Element type is "SQ" (Sequence?). The Payload contains the question text and the DataExportTag (used in the CSV file?)

The resulting CSV file includes the information in the table below when a reply with answers "white", "blue", and "red" was submitted -

row 1: one - pick a hat colour[Field-1] | two - pick a hat colour[Field-1] | three - pick a hat colour[Field-1]
row 2: 1_Q1                  | 2_Q1                  | 3_Q1
row 3: {"ImportId":"1_QID3"} | {"ImportId":"2_QID3"} | {"ImportId":"3_QID3"}
row 4: white                 | blue                  | red

From this you can see that row 1 shows first the value of "Field-1" used for that question - i.e. the question referred to in the 1st column here is "pick a hat colourone". Note also how the looped questions are numbered, prefixing the base tag with "number_".
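Splitting off that prefix when processing the CSV can be done like this (the helper name is mine):

```python
def loop_iteration(import_id):
    """Split a loop-and-merge tag such as '2_QID3' into
    (iteration, base tag); plain tags get iteration None."""
    head, _, tail = import_id.partition("_")
    if head.isdigit():
        return int(head), tail
    return None, import_id
```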

Downloading using the API

This is the draft of a program that works for us, downloading the CSV and JSON files for a named list of projects. It's a merge of 2 programs from the WWW.

import requests
import zipfile
import json
import urllib.request
import sys
import io,os
import shutil

import time

# secret token - Go to Account Settings in the user dropdown, then go to Qualtrics IDs
def usage():
   print ("2 args required. You supplied",len(sys.argv)-1) 
   print (sys.argv[0], "project_names (comma sep)","targetdir")
   # E.g. [program name] "4M1: French, 4C8: Vehicle Dynamics" 4Lent2019      


# List of Argument taken from the command line

if len(sys.argv) <3:
   usage()
   sys.exit()

project_names = sys.argv[1]

lst = []
for i in project_names.split(','):
    lst.append(i.strip())

target_dir = sys.argv[2]
base_url = "https://cambridge.eu.qualtrics.com"
try:
    secret_token = os.environ['X_API_TOKEN']
except KeyError:
    print("set environment variable X_API_TOKEN")
    sys.exit(2) 

# (1) Getting a list of ID

def get_project_id(names):
    '''The function takes a list of project names and returns a dict
    mapping each name to its survey ID
    '''
    report_dic = {}
    url = base_url + '/API/v3/surveys/'
    header = {'X-API-TOKEN': secret_token}

    # (1) generating the request object
    req = urllib.request.Request(url,None,header)

    # (2) Make request
    response = urllib.request.urlopen(req)
    data = json.load(response)
    # (3) Find Id for each project
    for target_name in names:
        print (target_name)
        for i in data['result']['elements']:
             if i['name'] == target_name:
                report_dic[target_name] = i['id']

    return report_dic

# (2) Get Metadata

# https://www.qualtrics.com/community/discussion/2708/exporting-survey-details-to-txt-or-qsf-format
# I reviewed the output from QSF and the api response and other than the flow logic - which I'd prefer to have for an import - the field names and structure also seems to be different. "SurveyName" in qsf v/s 'name' in Json.

# At the moment, this is not quite sufficient for my needs, but the Qualtrics team has mentioned they are going to consider this for a feature request.
# November 2018



def get_survey_metadata(report_dic, target_dir):
    '''Takes survey ID and create json file in a specified directory'''

    for k, v in report_dic.items():

        url = "https://cambridge.eu.qualtrics.com/API/v3" + '/surveys/' + v
        header = {'X-API-TOKEN': secret_token}

        req = urllib.request.Request(url,None,header)
        response = urllib.request.urlopen(req)

        data = json.load(response)
#        pretty = json.dumps(data, sort_keys=False, indent=4)
        pretty = json.dumps(data, sort_keys=True, indent=4)
        with open('./' + target_dir + '/' + k + '_meta.json', 'w') as file:
            file.write(pretty)
        print('Metadata File for %s Generated!' % (k))


def create_dir(target_dir):
    # (Re)create the target directory, emptying it if it already exists
    direc = "./" + target_dir

    if os.path.exists(direc):
        shutil.rmtree(direc)
    os.makedirs(direc)
    print('New directory %s has been created' % (target_dir))

# (3) Exporting reports

fileFormat = "csv"
headers = {"content-type": "application/json","x-api-token": secret_token}

def bulk_exports(report_dic):
    '''This function takes a dict of ids and creates a data export for each'''

    for key, val in report_dic.items():
        # Step 1: Creating Data Export
        print(key, val)
        requestCheckProgress = 0.0
        progressStatus = "inProgress"
        baseUrl = "https://cambridge.eu.qualtrics.com/API/v3/surveys/{0}/export-responses/".format(val)
        downloadRequestUrl = baseUrl
        downloadRequestPayload = '{"format":"' + fileFormat + '"}'
        downloadRequestResponse = requests.request("POST", downloadRequestUrl, \
        data=downloadRequestPayload, headers=headers)
        progressId = downloadRequestResponse.json()["result"]["progressId"]
        while progressStatus != "complete" and progressStatus != "failed":
           print ("progressStatus=", progressStatus)
           requestCheckUrl = baseUrl + progressId
           requestCheckResponse = requests.request("GET", requestCheckUrl, headers=headers)
           requestCheckProgress = requestCheckResponse.json()["result"]["percentComplete"]
           print("Download is " + str(requestCheckProgress) + " complete")
           progressStatus = requestCheckResponse.json()["result"]["status"]
        if progressStatus == "failed":
             raise Exception("export failed")
        fileId = requestCheckResponse.json()["result"]["fileId"]

        # Step 3: Downloading file
        requestDownloadUrl = baseUrl + fileId + '/file'
        requestDownload = requests.request("GET", requestDownloadUrl, headers=headers, stream=True)
        zipfile.ZipFile(io.BytesIO(requestDownload.content)).extractall(target_dir)  

   
        for filename in os.listdir(target_dir):
           print(filename,"created")
    
# (1) get the dictionary with report name and report id to export
reports = get_project_id(lst)
print(reports)

create_dir(target_dir)

bulk_exports(reports)

get_survey_metadata(reports, target_dir)

Inspecting QSF Files

This is the draft of a program that inspects QSF files downloaded using the API.

import json
import csv
import sys

questiontype={"TE":"Text entry","MC":"Multiple choice","DB":"Descriptive Block","RO":"Rank order","Matrix":"Matrix","DD":"DD??"}

selectortype={"SAHR":"single answer horizontal","MAVR":"multiple answer vertical","MAHR":"multiple answer horizontal","SAVR":"single answer vertical","MSB":"Multiple Choice Select Box","MACOL":"Multiple Answer Column","SACOL":"Single Answer Column","DL":"Dropdown List","SB":"Select Box","Bipolar":"Bipolar","Likert":"Likert","SL":"Slider?","ESTB":"ESTB?","TB":"Text box?","ML":"Merge Loop???" }

subselectortype={"SingleAnswer":"Single Answer","DL":"???", "TX":"Text?"}

blocktype={"BL":"Block","SQ":"Sequence","FL":"Survey Flow","SO":"Survey Options","SCO":"Scoring","PROJ":"PROJ???","STAT":"Survey Statistic","QC":"Question Count","RS":"Response Set"}

def investigateAPIsurveyquestion(blockid,qidstring,question,qtoblockmapping,loopinfo,i=None):
    '''
    Deal with one question. Return True if at end of loop question
    '''
    returnvalue=True
    field1=None
    field2=None
    field3=None
    if 'questionText' not in question.keys():
        return
    questiontext=question['questionText']
    if type(loopinfo) == dict and loopinfo!={} and type(qtoblockmapping) == dict and type(qtoblockmapping[qidstring])==str and blockid in loopinfo.keys():
          loop=loopinfo[blockid]
          if 'field1' in loop['columnNames']:
             field1=loop['columnNames']['field1']
          if 'field2' in loop['columnNames']:
             field2=loop['columnNames']['field2']
          if 'field3' in loop['columnNames']:
             field3=loop['columnNames']['field3']
          if field1!=None and i!=None and len(field1)>i and field1[i]!=None:
             questiontext=questiontext.replace("${lm://Field/1}",field1[i])
             returnvalue=False
          if field2!=None and i!=None and len(field2)>i and field2[i]!=None:
             questiontext=questiontext.replace("${lm://Field/2}",field2[i])
             returnvalue=False
          if field3!=None and i!=None and len(field3)>i and field3[i]!=None:
             questiontext=questiontext.replace("${lm://Field/3}",field3[i])
             returnvalue=False
          if returnvalue==True:
             return returnvalue
    else:
         print(qidstring,"is not in a block?")
    print(qidstring,"questionText=",questiontext)
    if i==None:
       xrefqidstring=qidstring
    else:
       xrefqidstring=str(i+1)+"_"+qidstring
    thetype=question['questionType']['type']
    print("questionType=",thetype,"({})".format(questiontype[thetype]))
    if 'choices' in question.keys() and question['choices']!=None:
       print("choices =",end="")
       for i,v in question['choices'].items():
           print("\"{}\"".format(v['choiceText']),end=" ")
       print("")
    if  'subQuestions' in question.keys() and question['subQuestions']!=None:
       print("subQuestions -")
       for i,v in question['subQuestions'].items():
           print("*",v['choiceText'])
       tmp=1
       print("ImportId=",end=" ")
       for i,v in question['subQuestions'].items():
          print(xrefqidstring+"_"+str(tmp),end=", ")       
          tmp=tmp+1
       print("")
    else:
       if thetype=="TE":
          print("ImportIdx="+xrefqidstring+"_TEXT")
       else:
          if thetype!="DB":
             print("ImportIdx=",xrefqidstring)
    return returnvalue

def processaloopblock(x,blockid,listofquestions,qtoblockmapping,loopinfo):
    '''
    Loop blocks require special treatment
    '''
    i=0
    endofloop=False    
    #for i in range(0,10): #10 is arbitrary 
    while endofloop==False:
        for q in listofquestions:
           if 'questionId' in q.keys():
              endofloop=investigateAPIsurveyquestion(blockid,q['questionId'],x['result']['questions'][q['questionId']],qtoblockmapping,loopinfo,i)
        i=i+1

def investigateAPIblock(blockid,x,loopinfo,qtoblockmapping):
           ''' 
           Process the block
           '''
           v=x['result']['blocks'][blockid]
           print("***",v['description'])
           if blockid in loopinfo.keys():
              #print("loop question")
              processaloopblock(x,blockid,v['elements'],qtoblockmapping,loopinfo)
           i=0
           for l in v['elements']:
              if 'questionId' in l.keys():                 
                 qtoblockmapping[l['questionId']]=blockid
                 q=l['questionId']
                 investigateAPIsurveyquestion(blockid,q,x['result']['questions'][q],qtoblockmapping,loopinfo)
                 print("")
              else:
                 print(l['type'])
              i=i+1


def processAPIfile(x):
       '''
       Read the main blocks, collecting data for later
       '''
       qtoblockmapping=dict()
       print ("This file has been downloaded via the API.")
       theformat="APIdownload"
       #print("*** result")
       print(x['result']['name'],"created on",x['result']['creationDate'])
       #print("*** flow -")
       flowIDs=[]
       for tmp in x['result']['flow']:
          flowIDs.append(tmp['id'])    
       loopinfo=x['result']['loopAndMerge']
       for k,v in x['result']['blocks'].items():
          for l in v['elements']:
              if 'questionId' in l.keys():
                 qtoblockmapping[l['questionId']]=k

       for blockid in flowIDs:
          investigateAPIblock(blockid,x,loopinfo,qtoblockmapping)           

       for k,v in x['result']['blocks'].items():
           #print(v['description'])
           for l in v['elements']:
              if 'questionId' in l.keys():
                 qtoblockmapping[l['questionId']]=k
              else:
                 pass
                 #print(l['type'])
       #print("*** exportColumnMap -")
       for k,v in x['result']['exportColumnMap'].items():
           if "subQuestion" in v.keys():
              xx=k.split("_")
              subquestionnumber=xx[-1]
              tmp=subquestionnumber.find("(")
              if tmp!=-1:
                 subquestionnumber=subquestionnumber[:tmp]
                 #print(k,"->",v['question'], "subQuestion",v['subQuestion'],x['result']['questions'][v['question']]['subQuestions'][subquestionnumber]['choiceText'])

           else:
              pass
              #print(k,"->",v['question'],x['result']['questions'][v['question']]['questionText'])

       sys.exit()

def run():
   if len(sys.argv) != 2:
      print ("Provide one QSF filename as argument")
      sys.exit()

   with open(sys.argv[1], 'r') as f:
      try:
         x = json.load(f)
      except ValueError as err:
         print("Error parsing file -", err.msg,"at line:",err.lineno)
         sys.exit()
      k=list(x.keys())
      for n in k:
         print(n," has ", len(x[n]),"elements")
      if list(x.keys())[0]=="meta":
         processAPIfile(x)
      else:
         print ("This file has been downloaded via the GUI")
         print ("so you'll need to use a different program")


# Start here
if __name__ == '__main__':
       run()