WO2008102146A1 - Assessment method


Info

Publication number
WO2008102146A1
Authority
WO
WIPO (PCT)
Application number
PCT/GB2008/000603
Other languages
French (fr)
Inventor
John Sargeant
Mary Mcgee Wood
Christos Tselonis
Craig Alan Jones
Christian Thomas Beck
Original Assignee
Assessment21 Ltd
Application filed by Assessment21 Ltd
Publication of WO2008102146A1


Classifications

    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B7/00: Electrically-operated teaching apparatus or devices working with questions and answers
    • G09B7/02: Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student
    • G09B7/04: Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student, characterised by modifying the teaching programme in response to a wrong answer, e.g. repeating the question, supplying a further explanation

Definitions

  • the present invention relates to assessment methods.
  • the present invention relates to computer-aided assessment methods.
  • the invention also relates to methods for processing textual data.
  • a selected answer question is a question requiring a student to identify one or more correct answers or parts of answers from a plurality of possible answers presented to the student.
  • An example of a selected answer question is a multiple-choice question.
  • Students sitting a computer-aided assessment based on selected answer questions via a computer are presented with a graphical user interface displaying a question and a number of possible answers. Each possible answer is associated with a selection box.
  • a student can manipulate the selection box associated with each answer they believe to be correct, for instance using a mouse or other input device, in order to indicate their choice(s).
  • the marking process may be completely automated by simply comparing answers selected by students with a list of correct answers.
  • Computer-aided assessment can completely remove the burden of marking for selected answer questions. Indeed, selected answer questions are predominant in known computer-aided assessment methods.
  • a constructed answer question is a question that requires the student to construct an answer, for instance in the form of text or diagrams, rather than select an answer from a number of possibilities.
  • Open questions are defined as questions where there is not a single or a small number of correct answers, for instance where a student is asked to provide an original example or argument.
  • a closed question is a question requiring a student to provide a specific piece or set of information.
  • Accurate and fully automated marking of constructed answers is known to be particularly difficult to achieve for open constructed answer questions.
  • Studies conducted using data gathered from real university assessments have shown that a significant source of complexity results from the fact that students may attempt to answer even seemingly simple questions in a large number of diverse ways. Even for short text answer questions, research has shown that a group of students may provide a large number of different correct answers, not all of which may be anticipated when setting the questions.
  • even for correct answers, if students are allowed to enter free text answers, there may be a significant amount of redundant information beyond the information required to gain marks.
  • Human computer collaborative assessment can be considered to be a sub-field of computer-aided assessment.
  • answers may be constructed (for example using text or diagrams) rather than selected from possible answers.
  • Human computer collaborative assessment techniques can, however, support both selected answer and constructed answer question formats.
  • a computer aided assessment method comprising: processing a submitted answer to a question to determine a degree of conformance of the submitted answer to a regular expression indicative of a correct answer format for the question; and generating an output signal based upon said processing.
  • An advantage of the first aspect of the present invention is that by using regular expressions to constrain the format of answers submitted by a group of students undergoing computer-aided assessment, the number of distinct answers (both correct and incorrect) submitted may be reduced. Consequently, identical or closely similar answers may be grouped and marked at the same time, thus reducing the marking time taken, and reducing the workload of assessment markers by reducing the number of distinct marking judgements that need to be made.
  • the submitted answer may comprise a text answer to the question.
  • the method may further comprise displaying a warning message if said output signal indicates that the submitted answer does not conform to the regular expression.
  • the method may further comprise processing the submitted answer to determine a degree of conformance of the submitted answer to a concatenation of two or more regular expressions.
  • the method may further comprise processing the submitted answer to determine a degree of conformance of the submitted answer to two or more alternative regular expressions.
  • the method may further comprise receiving a first input indicative of the question; and receiving a second input indicative of a correct answer format for that question; wherein the second input comprises a regular expression.
  • a computer implemented method of generating an assessment comprising: receiving a first input indicative of a question to be included in said assessment; and receiving a second input indicative of a correct answer format for that question; wherein the second input comprises a regular expression.
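  • By way of illustration, a conformance check of this kind can be sketched using the Java regular expressions package (which the detailed description below states is used). The class and method names here are illustrative only, not part of the disclosed system, and the unsigned-integer pattern is just an example constraint.

```java
import java.util.regex.Pattern;

// Illustrative sketch: checking a submitted answer against a regular
// expression that defines the required answer format.
public class AnswerFormatChecker {

    private final Pattern answerFormat;

    public AnswerFormatChecker(String regularExpression) {
        this.answerFormat = Pattern.compile(regularExpression);
    }

    /** Returns true if the whole submitted answer conforms to the required format. */
    public boolean conforms(String submittedAnswer) {
        return answerFormat.matcher(submittedAnswer.trim()).matches();
    }

    public static void main(String[] args) {
        // Hypothetical constraint: the answer must be an unsigned integer.
        AnswerFormatChecker checker = new AnswerFormatChecker("\\d+");
        String submitted = "forty three";
        if (!checker.conforms(submitted)) {
            // In the assessment client this would trigger a warning message to
            // the student rather than printing to the console.
            System.out.println("Warning: answer does not match the required format.");
        }
    }
}
```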
  • a computer aided assessment method comprising: comparing a submitted answer to a question with a standard answer to the question to determine a similarity metric for the submitted answer indicative of how similar the submitted answer is to the standard answer; wherein the standard answer comprises a plurality of answer graph objects comprising portions of one or more acceptable answers to the question, said comparing comprising matching each of said answer graph objects to the submitted answer, determining a similarity metric for each answer graph object based upon said matching and determining a similarity metric for the submitted answer based upon the similarity metrics for each answer graph object.
  • An advantage of the third aspect of the present invention is that portions of submitted constructed answers for questions requiring, for instance, a diagram to be drawn can be automatically matched to portions of a standard or model answer. This improves the efficiency of the marking process, as a human marker is only required to confirm a mark already generated. Furthermore, the standard answer may be refined dynamically to take account of unanticipated correct portions of submitted answers. Thus the consistency and accuracy of marking can be improved.
  • the submitted answer may comprise a constructed diagram.
  • the method may further comprise representing the submitted answer as a submitted answer graph object.
  • At least two answer graph objects may comprise overlapping portions of one or more acceptable answers.
  • the answer graph objects may define a structure of a portion of one or more acceptable answers.
  • the answer graph objects may also, or may alternatively, define one or more parameters of a portion of one or more acceptable answers.
  • Said matching may therefore comprise determining the extent to which the structure of each answer graph object corresponds to the structure of a portion of the submitted answer or determining the extent to which a parameter of each answer graph object corresponds to a parameter of a portion of the submitted answer.
  • the standard answer may comprise an AND/OR tree comprising a plurality of nodes arranged under one or more AND or OR branches, each node comprising an answer graph object.
  • Determining a mark for the submitted answer may comprise determining a mark for each of a group of two or more answer graph objects arranged underneath an OR node and determining the highest marked answer graph object of that group or determining a mark for each of a group of answer graph objects arranged under an AND node and adding together the marks for each answer graph object in the group.
  • Determining a similarity metric for the submitted answer may comprise determining a mark for all of said plurality of answer graph objects arranged under a root node of the AND/OR tree.
  • the method may further comprise adding a new answer graph object to said plurality of answer graph objects based upon identifying a new portion of an acceptable answer within a submitted answer.
  • a standard answer for a question within a computer aided assessment method comprising: an AND/OR tree comprising a plurality of nodes arranged under one or more AND or OR branches, each node comprising an answer graph object; wherein each answer graph object comprises a portion of one or more acceptable answers to the question.
  • a carrier medium may be provided carrying the standard answer for a question.
  • a computer aided assessment method comprising: comparing a first submitted answer to a question with a second answer to the question to determine a similarity metric indicative of how similar the first submitted answer is to the second answer; wherein said comparing comprises determining a number of occurrences of a plurality of words within the first submitted answer and the second answer and representing the number of occurrences of the plurality of words within the submitted answer and the second answer as a pair of vectors, the similarity metric being indicative of a relationship between the pair of vectors.
  • An advantage of the fifth aspect of the present invention is that by clustering groups of answers by their similarity to one another or to a standard answer, identical or closely similar answers may be grouped and marked at the same time, thus reducing the marking time taken, and reducing the workload of assessment markers by reducing the number of distinct marking judgements that need to be made. The consistency of marking can also be increased as similar answers can readily be marked at the same time.
  • the method may further comprise comparing a plurality of submitted answers to the question with the second answer.
  • the second answer may comprise a standard answer to the question.
  • the second answer may be a particular submitted answer.
  • the method may further comprise determining the number of occurrences within each submitted answer and the second answer of each word contained within any submitted answer or the second answer.
  • the relationship between the or each pair of vectors may comprise a Euclidean distance between the or each pair of vectors.
  • the relationship between the or each pair of vectors may comprise a cosine of an angle between the or each pair of vectors.
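  • As a minimal sketch of the two vector relationships named above, the following Java code computes the Euclidean distance and the cosine of the angle between a pair of term-count vectors. It assumes the vectors have already been built over a shared vocabulary; the names and example counts are illustrative only.

```java
// Sketch of the two vector relationships: Euclidean distance and the cosine
// of the angle between a pair of term-count vectors with a shared vocabulary.
public class VectorSimilarity {

    public static double euclideanDistance(int[] a, int[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double difference = a[i] - b[i];
            sum += difference * difference;
        }
        return Math.sqrt(sum);
    }

    public static double cosineSimilarity(int[] a, int[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty answer has no direction to compare
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Illustrative counts of three terms (A, B and C) in a submitted answer
        // and in a standard answer.
        int[] submitted = {2, 1, 0};
        int[] standard = {1, 1, 1};
        System.out.println("Euclidean distance: " + euclideanDistance(submitted, standard));
        System.out.println("Cosine similarity:  " + cosineSimilarity(submitted, standard));
    }
}
```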
  • the method may further comprise a number of techniques for reducing vector size. For example, some of the words to be represented may be removed or combined.
  • the method may further comprise correcting the spelling of words within the or each submitted answer and the second answer, thus reducing many variant spellings to a single word.
  • the method may also further comprise removing one or more words within the or each submitted answer or the second answer.
  • the method may also further comprise truncating ("stemming") one or more words within the or each submitted answer or the second answer, thus reducing variant forms of a word (such as singular and plural nouns) to a single word.
  • the method may also comprise summarising the first submitted answer.
  • Determining a number of occurrences of a plurality of words may comprise weighting one or more words within the or each submitted answer or the second answer.
  • the method may further comprise clustering submitted answers according to the similarity metric for each submitted answer. Said clustering may comprise assigning each submitted answer to a separate cluster and amalgamating pairs of clusters with the most similar similarity metrics.
  • the clustering may further comprise amalgamating pairs of clusters until the number of clusters has been reduced by a predetermined amount. Alternatively, the clustering may further comprise amalgamating pairs of clusters until the variance between the similarity metrics for submitted answers within any cluster exceeds a maximum value.
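  • The clustering step described above might be sketched as follows in Java: each answer starts in its own cluster and the two most similar clusters are repeatedly amalgamated until a target number of clusters remains. Average pairwise similarity is used here as the cluster-to-cluster measure, which is an assumption for illustration; the variance-based stopping rule mentioned above could be substituted for the simple count-based one shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Sketch of agglomerative clustering of answers: every answer starts in its
// own cluster and the two most similar clusters are repeatedly amalgamated
// until the target number of clusters remains.
public class AnswerClusterer {

    public static List<List<String>> cluster(List<String> answers,
                                             BiFunction<String, String, Double> similarity,
                                             int targetClusterCount) {
        List<List<String>> clusters = new ArrayList<>();
        for (String answer : answers) {
            List<String> singleton = new ArrayList<>();
            singleton.add(answer);
            clusters.add(singleton);
        }
        while (clusters.size() > targetClusterCount && clusters.size() > 1) {
            int bestI = 0, bestJ = 1;
            double bestSimilarity = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < clusters.size(); i++) {
                for (int j = i + 1; j < clusters.size(); j++) {
                    double s = averageSimilarity(clusters.get(i), clusters.get(j), similarity);
                    if (s > bestSimilarity) {
                        bestSimilarity = s;
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            // Amalgamate the most similar pair (bestJ > bestI, so the index of
            // bestI is unaffected by the removal).
            clusters.get(bestI).addAll(clusters.remove(bestJ));
        }
        return clusters;
    }

    private static double averageSimilarity(List<String> a, List<String> b,
                                            BiFunction<String, String, Double> similarity) {
        double total = 0.0;
        for (String x : a) {
            for (String y : b) {
                total += similarity.apply(x, y);
            }
        }
        return total / (a.size() * b.size());
    }
}
```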
  • the method may further comprise processing the submitted answer to determine a degree of conformance of the submitted answer to a regular expression indicative of a correct answer format to the question; and generating an output signal based upon said processing.
  • the method may further comprise displaying a warning message if said output signal indicates that the submitted answer does not conform to the regular expression.
  • the method may further comprise processing the submitted answer to determine a degree of conformance of the submitted answer to a series of two or more regular expressions.
  • the method may further comprise processing the submitted answer to determine a degree of conformance of the submitted answer to two or more alternative regular expressions.
  • the method may further comprise receiving a first input indicative of the question; and receiving a second input indicative of a correct answer format for that question; wherein the second input comprises the regular expression.
  • the method may further comprise a human assessment marker performing a further processing step upon the or each submitted answer based upon an output indicative of said comparison or said clustering.
  • An advantage of aspects of the present invention is that human markers are freed from much of the routine portions of marking, thus allowing greater time to make more important marking judgements.
  • the accuracy and consistency of marking may also be improved.
  • the overall time for marking a set of answers may be significantly reduced. For instance, embodiments of the present invention have shown that the time to mark a set of answers may be reduced by a factor of two or more.
  • a method for comparing first and second data items in a computer system comprising: generating a first difference data item representing a difference between the first data item and a reference data item, generating a second difference data item representing a difference between the second data item and the reference data item, comparing said first and second difference data items to provide data indicative of similarity between said first and second data items.
  • the first and second data items and the reference data item may take the form of alphanumeric strings.
  • the first and second data items may be submitted answers to an assessment question, while the reference data item may be a standard answer to the assessment question.
  • the difference data items will typically be shorter than the first and second data items and are therefore amenable to more computationally efficient processing.
  • Groups of data items may be clustered, for example using methods such as those described above.
  • the clustering may be based upon the data indicative of similarity.
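  • A deliberately simplified sketch of this aspect follows, under two assumptions that the source does not fix: that a difference data item can be approximated by the set of words in an answer that do not occur in the reference (standard) answer, and that two difference sets are then compared by their Jaccard overlap. Both choices are illustrative only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified sketch: compare two submitted answers via their word-level
// differences from a reference answer, rather than via the full answers.
public class DifferenceComparer {

    static Set<String> difference(String item, String reference) {
        Set<String> itemWords = new HashSet<>(Arrays.asList(item.toLowerCase().split("\\s+")));
        itemWords.removeAll(new HashSet<>(Arrays.asList(reference.toLowerCase().split("\\s+"))));
        return itemWords; // words in the item that the reference does not contain
    }

    /** Jaccard similarity of the two difference sets. */
    static double similarity(String first, String second, String reference) {
        Set<String> d1 = difference(first, reference);
        Set<String> d2 = difference(second, reference);
        if (d1.isEmpty() && d2.isEmpty()) {
            return 1.0; // both answers only use words found in the reference
        }
        Set<String> intersection = new HashSet<>(d1);
        intersection.retainAll(d2);
        Set<String> union = new HashSet<>(d1);
        union.addAll(d2);
        return (double) intersection.size() / union.size();
    }
}
```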
  • a method for clustering textual data items comprising: processing a plurality of textual data items to generate a plurality of summarised textual data items, each summarised textual data item representing a summary of a respective data item, clustering said data items by processing said summarised data items.
  • clustering data items in this way offers greater computational efficiency given that the summarised data items are shorter than the originally provided textual data items.
  • Each textual data item is an answer to an assessment question.
  • the clustering may comprise determining similarities between the summarised data items.
  • Each textual data item may be an answer to an assessment question.
  • At least one of the data items may be a submitted answer to an assessment question.
  • At least one of the data items may be a standard answer to an assessment question.
  • a method of processing textual data comprising: receiving textual data, processing said textual data to generate a plurality of summaries of said textual data, determining similarity between said summaries, and generating an output indicating a property of said textual data based upon said similarity.
  • comparing a plurality of summaries of particular textual data can provide useful information about that textual data.
  • where the summaries generated by different summarising algorithms are similar to one another, this is likely to be an indication of good written style in the textual data.
  • Such a method can be applied in a computer aided assessment method.
  • Processing to generate at least two summaries may comprise applying a plurality of summarising algorithms to the textual data, each summarising algorithm generating a respective one of the plurality of summaries. At least one of the summarising algorithms may perform an analysis of word frequencies in the received textual data.
  • the property indicated by the generated output may be an indication of writing style.
  • the textual data may be an answer to an assessment question.
  • a computer aided assessment method comprising: receiving a student answer to an assessment question, accessing a standard answer to said assessment question, aligning tokens of said student answer to tokens of said standard answer, and outputting data for use in said assessment based upon said alignment.
  • the data based upon the alignment may indicate differences of word order between the student answer and the model answer.
  • Outputting the data based upon the alignment may comprise displaying the student answer on a display device, displaying data based upon the alignment in connection with the student answer, and highlighting tokens of the student answer.
  • Aligning tokens of the student answer to tokens of the standard answer may comprise generating a similarity matrix and generating alignment data based upon the similarity matrix.
  • the similarity matrix may be generated using an algorithm based upon the Needleman-Wunsch algorithm.
  • the method may also comprise generating a score associated with the student answer based upon the alignment. Generating the score may comprise processing data indicating a number of differences between the student answer and the model answer.
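  • The description states that the similarity matrix may be generated using an algorithm based upon the Needleman-Wunsch algorithm. The sketch below fills such a matrix over the tokens of the two answers, using assumed scores of +1 for a match and -1 for a mismatch or gap (the actual scoring scheme is not specified in the source). Alignment data would then be obtained by tracing back from the bottom-right cell of the matrix.

```java
// Sketch of filling a Needleman-Wunsch matrix over the tokens of a student
// answer and a standard answer. Scoring values are assumptions.
public class TokenAligner {

    public static int[][] similarityMatrix(String[] studentTokens, String[] standardTokens) {
        int n = studentTokens.length;
        int m = standardTokens.length;
        int[][] matrix = new int[n + 1][m + 1];

        // Initialisation: aligning a prefix against nothing costs one gap per token.
        for (int i = 0; i <= n; i++) matrix[i][0] = -i;
        for (int j = 0; j <= m; j++) matrix[0][j] = -j;

        // Fill: each cell takes the best of a diagonal (match/mismatch) move or a gap move.
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int matchScore = studentTokens[i - 1].equalsIgnoreCase(standardTokens[j - 1]) ? 1 : -1;
                int diagonal = matrix[i - 1][j - 1] + matchScore;
                int up = matrix[i - 1][j] - 1;
                int left = matrix[i][j - 1] - 1;
                matrix[i][j] = Math.max(diagonal, Math.max(up, left));
            }
        }
        return matrix;
    }
}
```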
  • a computer aided assessment method comprising: receiving a student answer to an assessment question, receiving an annotated version of said student answer to said assessment question, processing the student answer and the annotated version of the student answer to identify annotations to said student answer, said processing comprising determining whether the annotations comprise deletion of text included in said student answer.
  • the processing may comprise applying a sequence alignment algorithm.
  • the sequence alignment algorithm may comprise identifying a marker occurring exactly once in each of the student answer and the annotated version of the student answer and the method may further comprise aligning the marker in the student answer with the marker in the annotated version of the student answer.
  • the method may further comprise identifying a plurality of markers, each marker occurring exactly once in each of the student answer and the annotated version of the student answer; and aligning a marker and tokens of the student answer with a corresponding marker and tokens of the annotated version of the student answer, the tokens of the student answer occurring between the marker and a subsequent marker in the student answer.
  • Embodiments of the present invention may be implemented in software.
  • a carrier medium carrying computer readable code for controlling a computer to carry out the above aspects of the invention may be provided.
  • a computer apparatus comprising a memory storing processor readable instructions and a processor configured to read and execute instructions stored in said memory may be provided.
  • the processor readable instructions stored in said memory may comprise instructions controlling the processor to carry out the above aspects of the invention.
  • the computer apparatus may comprise bespoke hardware arranged such that a suite of assessment setting and marking tools are provided on a bespoke server computer, suitable for access by client software provided on a bespoke client computer.
  • the invention may be implemented in any suitable form including as a system, method or apparatus.
  • Figure 1 schematically illustrates a computer network suitable for implementing embodiments of the present invention
  • Figure 2 schematically illustrates a computer network suitable for implementing a marking process for a computer-aided assessment in accordance with an embodiment of the present invention
  • Figure 3 is a screen shot of a question screen displayed to students undergoing a computer-aided assessment in accordance with embodiments of the present invention
  • Figure 4 is a screen shot of a marking tool for a computer-aided assessment in accordance with an embodiment of the present invention
  • Figure 5 is a screen shot of a user interface for selecting a regular expression to constrain an answer to a question in accordance with an embodiment of the present invention
  • Figure 6 is a screen shot of a diagram drawing tool used for submitting constructed diagram answers in accordance with an embodiment of the present invention
  • Figure 7 is a screen shot of a palette builder for preparing the diagram drawing tool of Figure 6;
  • Figure 8 illustrates a diagram matching metaformat for storing a standard answer for a constructed diagram question in accordance with an embodiment of the present invention
  • Figures 9A and 9B illustrate alternative correct answers to a constructed diagram question for which the standard answer is defined by Figure 8;
  • Figure 10 schematically illustrates a possible submitted answer to a constructed diagram question for which the standard answer is defined by Figure 8;
  • Figure 11 schematically illustrates the process of determining a similarity metric between two answers using a vector space model in accordance with an embodiment of the present invention
  • Figure 12 is a flow chart representing the process of determining a similarity metric between two answers using the vector space model in accordance with an embodiment of the present invention
  • Figure 13 is a screen shot of a cluster of submitted answers, clustered according to the process of Figure 12;
  • Figure 14 is a flow chart of a process for generating a similarity matrix;
  • Figure 15 is an example similarity matrix after initialisation
  • Figure 16 is a completed similarity matrix
  • Figure 17 is a screenshot from a computer aided assessment application in which sequence alignment has been applied to answers;
  • Figure 18 is a flow chart showing a process for identifying annotations to a submitted answer.
  • Figure 19 is a flowchart showing a sequence alignment algorithm.
  • Human computer collaborative assessment can be considered to be a sub-field of computer-aided assessment.
  • answers may be constructed (for example using text or diagrams) rather than selected from possible answers.
  • Human computer collaborative assessment techniques can, however, support both selected answer and constructed answer question formats.
  • Marking is a process of active collaboration between software and a human marker.
  • Human computer collaborative assessment methods offer significant benefits over both traditional paper based assessment and conventional forms of computer-aided assessment. Such methods offer flexibility in the manner in which questions can be set, allowing the form of answers to be constrained by educational rather than technological considerations.
  • the speed of marking can be significantly increased. Consistency and accuracy of marking may also be improved.
  • the reasoning behind the allocation of marks can also be explicitly recorded.
  • a series of computer programs is provided for setting assessment questions, taking a computer aided assessment, administering a computer aided assessment and marking answers submitted in response to questions in an assessment.
  • the programs form four main components.
  • a first component is an assessment system (or exam client) in which students are presented with questions via a graphical user interface, and given the opportunity to enter their answers.
  • a second component is an assessment setting tool allowing an assessment marker to prepare questions to be presented within the assessment system.
  • a third component is a marking tool allowing an assessment marker to mark submitted answers from a group of students.
  • a fourth component comprises an administration tool, which provides the facility to control and invigilate exams.
  • All four components can be implemented on a server computer, which may be accessed by client computers.
  • the client computers may be directly connected to the server in the event that an examination is taking place within a closed assessment room.
  • the server computer and client computers can be connected to a local area network (LAN).
  • remote access to the server may be provided, for example using the Internet.
  • all three components may be provided in a stand-alone configuration. That is, it is not necessary to be connected to the server computer when marking or setting an assessment.
  • Referring to Figure 1, a computer network suitable for implementing embodiments of the present invention is schematically illustrated.
  • a server computer 1 runs the assessment system, the administration tool and the assessment setting and marking tool.
  • a plurality of client computers 2 is shown. Each client computer 2 may be used by a student to access the assessment system when sitting an assessment. Alternatively, a client computer 2 may be used by an assessment marker when marking a set of submitted answers or when preparing a new assessment. Finally, a client computer 2 may be used by an assessment invigilator to initiate and control a computer aided assessment.
  • Figure 2 illustrates a configuration suitable for using the marking tool implemented by the server computer 1 to mark submitted assessment answers in accordance with embodiments of the present invention.
  • a client program 4 operating on a remote client computer 2 is arranged to access the server 1.
  • the client program 4 may be a web browser or a Java application.
  • the server 1 stores an answer set 5, comprising a standard answer file 6 and a set of submitted answers 7.
  • the answer set 5 is stored in an Extensible Markup Language (XML) format.
  • Storing data in XML is advantageous because it is a standardised and portable data storage format. This allows for flexibility in that the assessment system is not restricted to a single proprietary database format. As well as portability, XML is advantageous because it is an efficient, lightweight and robust data format.
  • the standard answer file 6 comprises one or more standard answers to each question, and optionally feedback to be presented to a student in response to their answer. Such feedback is typically used in formative, rather than summative assessments.
  • the standard answer file is used to partially automate part of the marking process in a human computer collaborative assessment method in accordance with embodiments of the present invention. The standard answer file and the marking process will be described in greater detail below.
  • the submitted answers 7 are received from client computers in communication with the server 1, are processed by the server 1 and passed to the client program 4 for marking. Marking may be an iterative process, for instance if a human marker 8 operating the client program 4 changes marking parameters (as discussed below). After marking of a question or a group of questions is completed, the marks may be passed to the server 1 and output as a comma-separated file 9 for further processing (for instance, using a spreadsheet application). It will be appreciated that a comma-separated file is only one of a large number of known standardised output formats that could be used. It will be appreciated that the client and server can be arranged in any convenient way.
  • the client program may take the form of a web browser as described above and in such a case the server 1 may provide appropriate web pages which are displayed to the client. Such web pages can be configured to receive data and transmit that data back to the server 1.
  • the use of webpages in this way will be readily understood by one of ordinary skill in the art.
  • setting and marking of assessments may be performed off line (that is, when a client computer 2 is not connected to the server computer 1).
  • In order to mark an assessment, all that is required is that the marking tool, the assessment (that is, the questions), the standard answers and the set of submitted answers are downloaded to a client computer from the server computer via the administration tool. Once these are downloaded, the marking can be performed offline on any computer, with the marks uploaded, if required, to the server computer at a later time.
  • the assessment setting tool may also operate in a stand-alone configuration, allowing a person setting an assessment to create questions and standard answers offline. Indeed, in preferred embodiments of the present invention, marking is carried out as an offline process.
  • although Figure 2 relates to operation of the marking tool, it will be appreciated that the assessment system used by students when sitting an assessment can be implemented in a similar way. That is, a student sitting an assessment may access the assessment system by simply using a web browser to download and navigate appropriate webpages.
  • This has the advantage that the user interface, which is to be presented to a user sitting an assessment, is provided from the server. Accordingly, there is no specific software to be installed and maintained at the client computer; rather the interface can easily be updated by simply making appropriate changes at the server.
  • the administration tool used by assessment invigilators can be implemented in a similar manner to the marking tool operation illustrated in Figure 2.
  • Assessments are generated by an examiner constructing a series of questions, which may be hierarchically structured into sections, questions, and part questions. Upon sitting an assessment, a student or a group of students must first log on to the assessment system. Assessments may be conducted in a controlled assessment hall, similar to conventional paper based assessments, or students may be able to sit the assessment by logging on to the assessment system from a remote computer. After providing the assessment system with appropriate credentials uniquely identifying the student, the student is presented with instructions for sitting the assessment.
  • Partially completed assessments can be continued, both for formative assessments and for summative assessments conducted under open conditions (that is, where a student is allowed to leave the assessment room) over a longer period of time. Furthermore, a key advantage of being able to continue with partially completed assessments is that it allows recovery from a backed up copy of the partially completed assessment in the event of technical problems during an assessment.
  • the state of a partially completed assessment can be saved in one of two ways. Firstly, students being assessed may be given the opportunity to manually initiate a backup periodically. Alternatively, backups may happen automatically.
  • a backup may be initiated whenever a change in the state of the partially completed assessment is detected, in order to avoid large numbers of identical backups.
  • the client computer sends a Java serialised data stream representing the current state of the student's work to the server computer.
  • the server computer is responsible for converting the serialised data to XML and storing the data as a single file per student per backup.
  • the current state of the student's answers may be locally backed up on the client computer for additional robustness.
  • a student is presented with a main assessment screen, an example of which is depicted in Figure 3.
  • the main assessment screen is split into a question area 10 and a button area 11.
  • the question area 10 provides a series of questions 12a, 12b, and 12c arranged as tabs across the top of the question area 10.
  • questions may be navigated by selecting questions from a list (not shown in Figure 3).
  • Alternative methods of navigating the questions within an assessment, including buttons linking between questions and scrolling, may be provided.
  • the question 12a is arranged as a series of part questions 13 that students are required to answer.
  • Answer areas 14a, 14b for the student to enter their answers are provided. Two types of answer area are shown in Figure 3.
  • a first type comprises a text entry box 14a allowing the student to type a constructed answer to the preceding question.
  • a second type comprises a group of selection boxes 14b allowing the student to select one or more correct answers from a group of possible correct answers.
  • a diagram drawing tool for constructing diagrammatic answers to questions is depicted in Figure 6 and described below.
  • the button area 11 displays an indication of the time left 17 for answering questions.
  • a button 18 labelled “view rubric” allows the student to return to a screen displaying the instructions for sitting the assessment.
  • a button 19 labelled “finish test” allows the student to save their answers upon finishing the assessment. Partially completed answers are also regularly saved automatically throughout the duration of the assessment.
  • a series of additional utilities may allow students to select the language of text input, insert special characters (in particular mathematical symbols) and provide tools to assist students, such as a calculator.
  • For setting questions for a new assessment, the assessment setting tool is used. This provides tools for performing all of the essential administrative tasks of setting up and running a computer-aided assessment.
  • an assessment file is generated containing the questions and all other material presented to a student when sitting the assessment.
  • a second optional file may be prepared containing standard or model answers to the questions, and possibly standard feedback to be presented to students either when the marks to the questions are returned or during the assessment (particularly in the case of formative assessments).
  • the standard answer file may be used during the marking process in accordance with embodiments of the present invention to allow the marking tool to automate parts of the marking process, as will be described in greater detail below.
  • the standard answer file contains a representation of at least one possible answer and the marks associated with each part of the answer. In principle this is similar to a conventional paper based marking schedule.
  • the marking tool also allows repeated answers and part answers to be identified, allowing for more efficient marking as will be described in greater detail below.
  • the standard answer file, alternatively referred to as the answer representation (the representation of possible answers and their marks), may be enlarged and refined during the marking process rather than being fixed in advance. That is, as answers are processed and further correct answers are identified, such correct answers may be added to the answer representation.
  • the existing answer representation may be reused, providing further efficiency gains. If an answer representation shows that there are a large number of alternative correct answers, this can be used to tighten up the question statement and thereby improve the quality of the question.
  • the marking tool allows markers to, amongst other things, group questions across the whole set of students' answers in order to make the marking process more efficient.
  • the assessment file and a set of answers file (together with the standard answers file) are loaded into the marking tool.
  • the marker is then presented with a series of optional tools for automating parts of the marking process.
  • a suitable marking screen is shown in Figure 4.
  • a question area 24 displays the question currently being marked.
  • An answer area 25 displays one or more submitted answers 26 to the displayed question 24, including a standard answer 27.
  • a question tree 28a allows navigation between questions organised hierarchically into sections, questions and part questions. Further methods of navigating between questions, such as buttons, may be provided. It can be seen that answers from all students who attempted a part of a question can be displayed together in the answer area 25. This can improve the consistency of marking by allowing the marker to mark all answers to a single part of a question at the same time. For performance reasons the answers may be displayed in blocks if there are a large number of answers to a question. For instance, submitted answers may be displayed twenty at a time.
  • next to each displayed answer 26 is a mark box 29 allowing the human marker to enter a mark for that answer.
  • Computer-aided assessment methods in accordance with embodiments of the present invention provide significant advantages over conventional marking of handwritten answers. Elimination of the need to decipher handwriting, and the overhead of sorting through large numbers of separate answer papers significantly speeds up the marking process. Marking all answers to part questions together serves to improve consistency of marking as well as to improve efficiency.
  • the marker may be presented with the option of allowing automatic marking for certain answers to certain questions.
  • Automatic marking could include awarding a mark of zero if a student does not answer a question.
  • a mark of zero may be awarded for any answers over and above the number of questions a student is required to answer.
  • Dynamic marking may be used for certain types of answers. If a first answer is marked, then all identical answers can be automatically awarded the same mark. This is particularly applicable for answers consisting of a single word or number or a small number of words or numbers.
  • buttons may be displayed allowing the marker to sort the answers by marking status (marked/unmarked), by length or by similarity to keywords in the standard answer or another submitted answer (for instance sorting by the number of keywords contained in the submitted answers).
  • the last option requires the marker to define keywords within the standard answer, either during the question setting process or dynamically during marking of an assessment (for instance if an important, but unanticipated, word appears frequently in a number of students' answers). Keywords can be used to reflect key pieces of text in an answer that may attract marks. Similarly, keywords can be used to identify common misconceptions.
  • Keywords may be highlighted in the answers, or used for a simple similarity sort as discussed above. Keywords may be matched only when there is an exact correlation between the text in an answer and the keyword.
  • an examiner can define a regular expression to take into account possible variations in spelling a word or the order of words in a key phrase.
  • a further option is to define a maximum error tolerance or edit distance for a word to match. For instance an edit distance of two corresponds to two small errors such as two letters being transposed. An edit distance may also be defined as being proportional to the length of the keyword being matched.
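  • The edit distance tolerance described above corresponds to the standard Levenshtein distance, a minimal Java sketch of which follows. The one-error-per-four-characters ratio used for the proportional tolerance is an assumed value for illustration only.

```java
// Sketch: matching a keyword against a word in a submitted answer within a
// maximum edit distance (standard Levenshtein dynamic programming).
public class KeywordMatcher {

    public static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,      // deletion
                                            d[i][j - 1] + 1),     // insertion
                                   d[i - 1][j - 1] + cost);       // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    public static boolean matches(String keyword, String word, int maxErrors) {
        return editDistance(keyword.toLowerCase(), word.toLowerCase()) <= maxErrors;
    }

    public static boolean matchesProportional(String keyword, String word) {
        int tolerance = keyword.length() / 4; // assumed proportionality for illustration
        return matches(keyword, word, tolerance);
    }
}
```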
  • Word variants, including misspellings, and context dependent synonyms limit the usefulness of keyword sorting.
  • Context dependent synonyms may not always be predictable in advance, although in accordance with embodiments of the present invention once a new context dependent synonym is identified in a submitted answer the answer representation contained in the standard answer file can be updated by adding the synonym as a new keyword.
  • keywords within submitted answers may be matched by using a regular expression to account for variations in spelling of the keyword.
  • the form of a constructed answer could be constrained, rather than allowing arbitrary free text. For instance, the user interface could be constrained to only allow numbers (for instance) in a particular format.
  • the answer could be constrained to be a number typed into a box of limited size.
  • regular expressions can be used to apply constraints to what students can enter within a constructed text answer in a computer-aided assessment.
  • the answer format may be constrained, for instance to specified number formats (for instance, defining the format of an exponent), date formats, or more generally (for instance specifying that the answer should be no more than four words long).
  • Constraining answer formats according to a regular expression offers the opportunity to increase the degree to which answers can be automatically marked. Should a student attempt to write an answer in the wrong format, one possible approach is to allow this but trigger a warning message to the student. Alternatively the student may be prevented from submitting that answer for marking.
  • an examiner can define one or more regular expressions defining the required format of answers.
  • the Java regular expressions package is used.
  • Examiners may either pick from a predetermined list of possible regular expressions, or advanced users can define a regular expression manually.
  • Once a regular expression for an answer format is defined, a description of the required answer format can optionally be inserted automatically at the end of the question. Alternatively, the examiner can manually enter a description of the answer format.
  • Figure 5 is a screenshot illustrating an interface, forming part of the assessment setting and marking tool, that allows a marker to define a regular expression to constrain the format of submitted answers for a question.
  • a first drop down menu 30 is provided allowing the marker to choose a general category for a regular expression, for instance "number" as shown in Figure 5.
  • a second drop down menu 31 allows the marker to choose a predefined regular expression, for instance "unsigned integers".
  • An example of an answer meeting the constraint selected in drop down menu 31 is shown in text area 32.
  • a customised regular expression can be generated by selecting button 33 labelled "custom”. Selecting button 33 opens a further window allowing the user to define a new customised regular expression.
  • Button 34 labelled "add” allows the marker to add the regular expression selected from drop down menu 31 or custom defined to the regular expression generated so far for the answer format.
  • a regular expression comprising a series of concatenated regular expressions can be built up.
  • Three tabs are provided relating to the regular expression generated so far.
  • a first tab 35a displays a text description of the constraint to be applied to the answer.
  • a regular expression has been built up comprising three concatenated regular expressions.
  • a submitted answer is required to have three parts: an alphanumeric string followed by a custom generated regular expression manually created by the marker and finally an unsigned integer.
  • a second tab 35b displays the text to be displayed to a candidate student taking the assessment explaining the answer constraint.
  • a third tab 35c defines the answer constraint in the form of a formal regular expression.
  • Text box 36 allows the marker to test the answer constraint by entering a test answer and then selecting the test button 37.
  • the marker has entered an alphanumeric string "apple” followed by a period ".” (which may conform to the required custom generated regular expression) and then an unsigned integer "43".
  • Once a marker has entered a sample answer into text box 36 and selected test button 37, a message is displayed confirming whether or not the test answer meets the selected regular expression.
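  • The three-part constraint built up in Figure 5 can be expressed as a single concatenated Java regular expression. In the sketch below the custom middle sub-expression is assumed, purely for illustration, to be a literal full stop, so that the test answer "apple.43" from the example above conforms.

```java
import java.util.regex.Pattern;

// Sketch of the three-part constraint from Figure 5: an alphanumeric string,
// a custom sub-expression (assumed here to be a literal full stop), and an
// unsigned integer, concatenated into one regular expression.
public class ConcatenatedConstraintDemo {
    public static void main(String[] args) {
        Pattern constraint = Pattern.compile("([A-Za-z0-9]+)(\\.)(\\d+)");
        String testAnswer = "apple.43";
        boolean conforms = constraint.matcher(testAnswer).matches();
        System.out.println(conforms
                ? "The test answer meets the selected regular expression."
                : "The test answer does not meet the selected regular expression.");
    }
}
```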
  • Buttons 38a, 38b, 38c allow the user to undo the last change to the generated regular expression shown in tab 35a, redo any undone change or clear the whole generated regular expression respectively.
  • Button 39 labelled “apply marking constraint” allows the marker to apply the selected regular expression to constrain answers submitted to that question.
  • Button 40 labelled “cancel” allows the user to cancel a previous application of the regular expression to the current question (which may restore any previous regular expression that has been overridden).
  • a checkbox 41 is provided labelled “allow additional spaces in the answer”.
  • a further area 42 of the interface illustrated in Figure 5, labelled "add current selection", provides advanced options for creating regular expressions, as an alternative to the method described above whereby selecting button 34 adds a selected regular expression to the regular expression already created.
  • the advanced option allows the creation of sub-expressions, which can be built up recursively. Each sub-expression is wrapped within parentheses, as is the overall expression so far before the new subexpression is added.
  • a sub-expression is generated as described above by either choosing a regular expression category and predefined expression using menus 30 and 31 or custom defining a regular expression via button 33.
  • the generated regular expression appears in text field 43, which is labelled "add".
  • Drop down menu 44 allows that sub-expression to be added either at the end of the overall regular expression so far (in which case it is simply appended) or as an alternative to the overall regular expression so far (in which case the logical OR operator is applied between the existing regular expression and the new regular expression).
  • Button 45 labelled "create group” operates by opening parentheses when generating a new sub-expression in text field 43.
  • Button 46 labelled “exit group” operates by closing parentheses when generating a new sub-expression in text field 43.
  • Button 47 labelled “insert” functions like the add button 34, in that what is currently selected in text field 43 is added to the overall regular expression answer constraint.
  • constructed diagrams are considered to be diagrams that are composed from a set of basic components that may be tailored with labels (inside or outside of the component). Diagrams are an important part of many assessments. Diagrams formed from boxes and connectors can readily be drawn using conventional computer input devices. The resulting structures can be matched against each other to determine similarity. It is known that a simple heuristic process is effective in finding similarities between such diagrams, for instance between a submitted diagram and a standard answer diagram.
  • diagrams such as UML class diagrams, entity-relationship diagrams, chemical molecules and electronic circuits can be represented as boxes joined by connectors. Such a structure can in turn be represented by a graph.
  • a graph is defined as a set of vertices, in which pairs of vertices are connected by edges.
  • each box is a vertex and each connector is an edge.
  • graphs are matched in order to assess submitted answers and determine marks for constructed graphs in an assessment.
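  • A minimal sketch of representing a constructed diagram as a graph of this kind is given below: each box becomes a labelled vertex and each connector a labelled edge between two vertices. The class and field names are illustrative and do not reflect the actual data model of the described system.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a constructed diagram as a graph of labelled boxes (vertices)
// joined by labelled connectors (edges).
public class DiagramGraph {

    static class Vertex {
        final String label;              // e.g. the text inside a box
        Vertex(String label) { this.label = label; }
    }

    static class Edge {
        final Vertex from;
        final Vertex to;
        final String label;              // e.g. the text attached to a connector
        Edge(Vertex from, Vertex to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    final List<Vertex> vertices = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();

    Vertex addBox(String label) {
        Vertex vertex = new Vertex(label);
        vertices.add(vertex);
        return vertex;
    }

    void addConnector(Vertex from, Vertex to, String label) {
        edges.add(new Edge(from, to, label));
    }

    public static void main(String[] args) {
        // A fragment of a UML class diagram: a Book class connected to a Chapter class.
        DiagramGraph answer = new DiagramGraph();
        Vertex book = answer.addBox("Book");
        Vertex chapter = answer.addBox("Chapter");
        answer.addConnector(book, chapter, "contains");
    }
}
```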
  • a human computer collaborative approach is used, as discussed above, to present results more efficiently to a human marker, rather than fully automatic marking. For instance, diagrams that are functionally identical (that is, they can be represented by the same graph) are clustered, thus reducing the workload of a marker.
  • the assessment system further comprises a diagram drawing tool.
  • An exemplary diagram drawing tool is shown in Figure 6.
  • the diagram tool consists of a palette 50 and a drawing area 51.
  • the palette 50 is further divided into a box palette 52 and a connector palette 53.
  • Connectors can then be selected and used to join boxes together. Both boxes and connectors can be labelled using standard keyboard characters by selecting a label editor 54.
  • Special characters can be incorporated within a label by using a special character editor 55.
  • the palette is fully customisable when setting an assessment, and boxes and connectors can be created according to the kind of diagram expected within the answer. Palettes are created using a palette builder tool.
  • a screenshot of an exemplary palette builder tool is shown in Figure 7.
  • a first portion of the palette builder comprises a palette editor 60 where new boxes and connectors are defined.
  • Test drawing area 61 allows a partially completed palette 62 to be experimented with.
  • a series of selectable menus allow aspects of the boxes and connectors to be defined, including defining label locations 63 and defining line styles 64.
  • a generic representation metaformat for storing complex structures as graph objects has been defined in accordance with embodiments of the present invention.
  • the metaformat is particularly applicable to constructed diagram answers, however it will be appreciated that it may be used for matching any kind of constructed answer against a predefined but dynamically extendable and adjustable model.
  • the generic metaformat has particular utility for matching constructed answers comprising mathematical expressions, software programs and short factual text.
  • the metaformat, referred to herein as a Gree, comprises a dynamically extendable AND/OR tree whose leaf nodes are overlapping graph object fragments.
  • the Gree metaformat allows for extendibility, reusability and modularity of a predefined standard answer.
  • the Gree metaformat can efficiently represent data structures along with any extra information needed. Submitted answers can be compared to the standard answer stored in the Gree metaformat and partially or fully matched. Alternatively, submitted answers may be compared with another submitted answer.
  • the process of matching constructed answers to a standard answer using the Gree metaformat comprises a modular scoring approach. Parts of the standard answer are separately matched to parts of a submitted answer. This allows a marking scheme to be accurately defined (though later dynamically amended if necessary) in which the process of awarding individual marks for the presence or absence of certain features of an answer is explicitly recorded. Multiple alternative acceptable parts of an answer, each of which would earn the same portion of the marks, may be defined. Furthermore, different parts of the standard answer may be weighted differently according to their relative importance in terms of providing a correct answer to the question.
  • the use of the Gree metaformat in marking constructed answers, in particular for constructed diagram answers, allows marking judgements to be made in a systematic and consistent manner across a large number of submitted answers.
  • the matching process may alternatively be considered to be a simplified yet quantitative unordered regular expression check for complex structures in which the Gree is the pattern and a submitted answer converted to an answer metagraph is the subject.
  • a number of correct answers to the question are possible, two of which are shown in Figures 9A and 9B.
  • the Gree shown in Figure 8 is intended to show concisely the range of possible answers by separating out individual portions of correct answers.
  • the Gree metaformat shown in Figure 8 comprises a series of nodes 70 labelled A to H.
  • Each node 70 comprises a possible part of a correct answer.
  • Each part answer comprises a portion of an acceptable class diagram depicted diagrammatically in combination with a series of parameters of the answer portion defining aspects of the answer portion that may be matched in order for the node to be matched.
  • the nodes 70 are connected in a tree structure by an AND node 71 and three OR nodes 72. For a submitted answer to be awarded all available marks each part answer node 70 must form part of the submitted answer if that node 70 is placed directly under an AND node 71.
  • a submitted answer must thus contain the following part answer nodes: (A OR B) AND C AND (D OR E) AND F AND (G OR H).
  • Each part answer node 70 includes the marks to be awarded if that part answer is included in the submitted answer.
  • the content of the part answer nodes may comprise partially overlapping elements of a submitted answer.
  • the alternative part answers shown in nodes A and B both contain a chapter class and a book class. Nodes A and B are directed to marking the relationship indicated between the chapter and book classes.
  • nodes C and F which are concerned with awarding marks for the relationship between the chapter and book classes and other classes.
  • the matching process starts by considering the Gree's root node and continues visiting the nodes down the tree in a predetermined manner. Once a part answer node 70 is encountered (i.e. a node that is not an AND node 71 or an OR node 72) a score for that node compared to the submitted answer is calculated.
  • the allocation of marks can be weighted for each node differently, according to the relative importance for that node of the structure of the diagram fragment or the attributes listed in the node for that part answer.
  • For part answer nodes 70 arranged under an OR node 72, the node 70 providing the highest score is considered to be the closest match, and its score is taken as the score for that OR node 72.
  • For nodes arranged under an AND node 71, the scores are added. Once the matching process has completed, the score given by the root node (in the example of Figure 8, the single AND node 71) is the mark awarded to the submitted answer.
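  • The scoring rules just described (take the best-scoring child under an OR node, sum the children under an AND node, and read the mark off the root) can be sketched as follows. The Gree node classes and the abstract leaf-matching step are illustrative only.

```java
import java.util.List;

// Sketch of scoring a submitted answer against a Gree: an OR node takes the
// best-scoring child, an AND node sums its children, and the score at the
// root is the mark awarded. Leaf matching is left abstract.
public abstract class GreeNode {

    public abstract double score(Object submittedAnswer);

    public static class OrNode extends GreeNode {
        private final List<GreeNode> children;
        public OrNode(List<GreeNode> children) { this.children = children; }
        @Override
        public double score(Object submittedAnswer) {
            double best = 0.0;
            for (GreeNode child : children) {
                best = Math.max(best, child.score(submittedAnswer));
            }
            return best;
        }
    }

    public static class AndNode extends GreeNode {
        private final List<GreeNode> children;
        public AndNode(List<GreeNode> children) { this.children = children; }
        @Override
        public double score(Object submittedAnswer) {
            double total = 0.0;
            for (GreeNode child : children) {
                total += child.score(submittedAnswer);
            }
            return total;
        }
    }

    // A leaf node would hold an answer graph fragment, its mark weighting, and a
    // matching routine comparing the fragment to the submitted answer.
}
```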
  • the Gree metaformat for the standard answer can be adapted dynamically during the marking process in the light of previously unconsidered alternative correct parts of submitted answers.
  • Such adaptation involves adding additional nodes 70 to the Gree.
  • an additional alternative diagram fragment may be added to the Gree underneath an OR node 72.
  • the whole Gree may be revised, for instance by splitting the marks allocated to one node of the Gree into two parts and generating two new nodes to replace that node arranged under an AND node such that both nodes must be matched in order to achieve all of the available marks.
  • the process of adding a new node to the Gree or revising an existing node dynamically during marking of an assessment is the same as the process of generating a new Gree.
  • Once a Gree has been amended, answers that have previously been marked may be automatically remarked in order to take account of the changes. For instance, if another student has submitted an answer that also contains a portion corresponding to the node added to the Gree, then the mark awarded to that answer is likely to change. This represents a significant improvement over conventional paper based marking methods, for which, if the marking scheme changes part way through the process of marking a large number of answers, there is no choice but to review all answers marked so far to determine if the marks awarded must be changed.
  • the structure of a Gree is stored in XML allowing it to be imported into the marking tool described above in connection with Figure 2.
  • the marking process is able to continue in the same manner as for the marking of multiple submitted text answers. For instance, portions of submitted answers that match different nodes 70 of the Gree may be highlighted.
  • a human marker is able to review the marks awarded automatically by matching the Gree to a submitted answer, and either amend the marks or amend the Gree as appropriate.
  • the submitted answers could be grouped automatically by matching to the Gree standard answer before presentation to the human marker.
  • answers may be sorted by keyword similarity to a standard answer (or to a particular submitted answer).
  • One method of sorting a large number of answers by similarity to keywords is to measure the occurrence of keywords or phrases in order to try to capture the essential structure of constructed answers.
  • the distance between keywords or phrases is also measured in order to sort constructed answers.
  • Answers may be grouped according to relative similarity to keywords within a standard answer based on generic text clustering techniques drawn from natural language engineering. Clusters can be displayed in an area 28b ( Figure 4). Natural language engineering techniques can be used in computer aided assessment for sorting answers.
  • Clustering is the process of grouping similar objects together. A measurement of similarity or distance is used to group objects within a set into subsets or clusters. Clustering is known in other fields such as bioinformatics for finding the closest neighbours of a document, or for organising search engine results on the Internet.
  • In computer-aided assessment, clustering of text answers offers a number of benefits. Clustering similar answers can help the human marker as it provides a review mechanism to check that marking is consistent, and potentially offers a basis for rapid formative feedback by allowing the marker to provide feedback per cluster, rather than individually per answer.
  • Embodiments of the present invention relate to the clustering of complete texts of short answers.
  • Clustering answers offers a trade off.
  • a measure of similarity known for use within information retrieval systems is the vector space model.
  • Documents are expressed as vectors within an n-dimensional space where n is the total number of unique terms (words) that appear in any of the documents in a set of documents to be clustered.
  • the similarity between two documents is calculated as the distance between their respective vectors.
  • the vector space model is applied within a human computer collaborative assessment method for calculating the similarity of two answers (one of which may be a standard answer) to a question.
  • a first step of the vector space model is the creation of a term by document matrix, comprising a list of all the terms (words) contained in any submitted answers and a count of the number of times they appear in each submitted answer or the standard answer.
  • Each term will form a dimension in the vector space.
  • three terms within a pair of answers are defined, terms A, B and C.
  • these terms are represented by three orthogonal axes 80, 81 and 82 respectively. It will be appreciated that this process may be readily automated.
  • a vector may be defined for each answer or the standard answer. The vector comprises an ordered list of integers relating to how often each term occurs.
  • Figure 11 represents two answers as vectors 83, 84 within the three dimensional space defined by axes 80, 81, 82.
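  • A hedged sketch of this first step follows; whitespace tokenisation and the assumption that answers have already been pre-processed are simplifications made for illustration:

      def build_vectors(answers):
          """Build a term-by-document matrix: an ordered vocabulary and one
          term-count vector per answer (each vocabulary term is one dimension)."""
          terms = sorted({word for answer in answers for word in answer.split()})
          vectors = []
          for answer in answers:
              counts = {}
              for word in answer.split():
                  counts[word] = counts.get(word, 0) + 1
              vectors.append([counts.get(term, 0) for term in terms])
          return terms, vectors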
  • a second step of the vector space model is to calculate the similarity between two vectors.
  • One option is to calculate the Euclidean distance between the vectors giving a result between zero and one.
  • the Euclidean distance is schematically represented in Figure 11 by line 85.
  • a result of zero indicates that two answers share nothing in common and a result of one indicates that (after pre-processing) the answers are the same (at least in so far as they contain the same terms, though not necessarily in the same order).
  • This similarity metric can be used to cluster the answers.
  • An alternative similarity metric comprises calculating the cosine of the angle between two vectors. For the example of Figure 11, this is the cosine of angle 86.
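  • The cosine metric, for example, might be computed as in the sketch below (the vectors are assumed to be term-count lists of equal length, such as those produced by the fragment above):

      import math

      def cosine_similarity(v1, v2):
          """Cosine of the angle between two term-count vectors: 1.0 when the
          answers contain the same terms, 0.0 when they share no terms."""
          dot = sum(a * b for a, b in zip(v1, v2))
          norm1 = math.sqrt(sum(a * a for a in v1))
          norm2 = math.sqrt(sum(b * b for b in v2))
          return dot / (norm1 * norm2) if norm1 and norm2 else 0.0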
  • Pre-processing includes spelling correction step S1 (which may be manual or automatic) and removing stop words step S2 (common words of little interest such as "the"). Stop word removal may be performed using a fixed list of stop words. Alternatively, the list of stop words may be defined dynamically during the marking process.
  • a stemming step S3 removes the suffixes of words to leave a common stem (for example "interpreter" is stemmed to "interpret").
  • Different weights may be applied to different terms within answers at step S4 in order to increase or decrease their relative importance in determining answer similarity.
  • the weighting can be used to increase the importance of keywords within submitted answers, or may be binary in order to ignore certain terms.
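  • The pre-processing of steps S1 to S4 might be sketched as follows; the stop-word list, the crude suffix stemmer and the weighting table are illustrative assumptions rather than the actual implementation (spelling correction, step S1, is assumed to have been applied already):

      STOP_WORDS = {"the", "a", "an", "of", "is", "and"}   # assumed stop-word list (step S2)
      SUFFIXES = ("ing", "er", "ed", "s")                  # crude stemmer, for illustration (step S3)

      def stem(word):
          for suffix in SUFFIXES:
              if word.endswith(suffix) and len(word) > len(suffix) + 2:
                  return word[:-len(suffix)]               # e.g. "interpreter" -> "interpret"
          return word

      def preprocess(answer, weights=None):
          """Return weighted term counts for one answer; `weights` maps a stem to
          its weight (step S4) - a weight of 0 effectively ignores that term."""
          weights = weights or {}
          counts = {}
          for word in answer.lower().split():
              if word in STOP_WORDS:
                  continue
              stemmed = stem(word)
              counts[stemmed] = counts.get(stemmed, 0) + weights.get(stemmed, 1)
          return counts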
  • agglomerative hierarchical clustering is used to cluster constructed text answers in computer-aided assessment at step S6 of Figure 12.
  • Agglomerative hierarchical clustering begins by assigning each object, in this case each submitted answer, to a separate cluster. In a first step the two most similar clusters are determined. In a second step these two clusters are combined.
  • the process of determining the two most similar clusters is as follows. Each submitted answer (that is, each point in a cluster) is stored as a vector. The distance between a pair of vectors may be simply calculated, for instance using a similarity metric such as the cosine of the angle between the vectors. Average linkage may be used as the measure of similarity between two clusters. Average linkage is calculated by taking each point within a first cluster, calculating the distance from that point to all of the points in a second cluster, and calculating the mean distance; the mean of these mean distances over all points in the first cluster is then calculated. The average linkage between two clusters corresponds approximately to the distance between the centres of the clusters.
  • the two steps are repeated until a predetermined stop point is reached.
  • This may, for instance be defined as the point at which the number of clusters is reduced to a predetermined proportion of the initial number of clusters.
  • a measure of the minimum allowable similarity between elements of a cluster may be used to determine the stop point.
  • the metric "average within cluster similarity" is defined as a measure of how similar answers are to each other within any given cluster.
  • the distance between each pair of points within a cluster, for instance the cosine of the angle between the pair of vectors defined by the points, is calculated and stored within a two dimensional matrix.
  • the average within cluster similarity is calculated as the mean of the distances between all of the points within the same cluster. A value approaching one indicates that all of the answers within a cluster are closely similar to one another.
  • a minimum average within cluster similarity figure may be defined as the stop point for agglomerative hierarchical clustering.
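  • A compact sketch of agglomerative hierarchical clustering with average linkage and an average within-cluster similarity stop point is given below; the similarity function (for instance the cosine measure sketched above) and the threshold value are assumptions for illustration:

      def average_linkage(cluster_a, cluster_b, similarity):
          """Mean pairwise similarity between the members of two clusters."""
          pairs = [similarity(a, b) for a in cluster_a for b in cluster_b]
          return sum(pairs) / len(pairs)

      def agglomerative_cluster(vectors, similarity, min_within=0.5):
          """Assign each answer to its own cluster, then repeatedly merge the two
          most similar clusters, stopping before any merged cluster's average
          within-cluster similarity would fall below `min_within`."""
          clusters = [[v] for v in vectors]
          while len(clusters) > 1:
              i, j = max(((a, b) for a in range(len(clusters))
                          for b in range(a + 1, len(clusters))),
                         key=lambda ab: average_linkage(clusters[ab[0]], clusters[ab[1]], similarity))
              merged = clusters[i] + clusters[j]
              within = [similarity(merged[a], merged[b])
                        for a in range(len(merged)) for b in range(a + 1, len(merged))]
              if sum(within) / len(within) < min_within:
                  break
              clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
          return clusters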
  • Agglomerative hierarchical clustering may be used in combination with the vector space model, with the vector space model being used to determine the similarity of submitted answers to a standard answer or a particular submitted answer, and agglomerative clustering being used to reduce the number of clusters output from the clustering process at step S7.
  • this is a screen shot illustrating a cluster of answers from a set of submitted answers that have been clustered using agglomerative hierarchical clustering.
  • the screen shot is similar in layout to the screen shot of the marking tool shown in Figure 4.
  • a question portion 90 of the screen displays the questions answered by the students.
  • the question set is "what single measurement would you make to confirm that an individual is anaemic?".
  • the set of submitted answers is clustered as described above in relation to Figure 12.
  • a series of clusters is created and each cluster may be selected for display from a list of clusters 91.
  • a standard answer 92 (labelled as model answer) to the question is displayed. All of the submitted answers 93 within the selected cluster are displayed below the model answer 92.
  • the submitted answers within the selected cluster comprise 13 minor variations of "haemoglobin concentration of the blood”.
  • the selected cluster comprises 11 distinct text strings, which were correctly reduced in number by the preprocessing steps described in relation to Figure 12.
  • the effectiveness of clustering in reducing the workload of a human marker in human computer collaborative assessment varies significantly with the type of question. Clustering is most effective at grouping submitted answers that are to be awarded similar marks for very short text answers. However, answers where word order is significant, or where students are required to submit an original example, present greater difficulty.
  • the clustering techniques described above offer considerable benefits as has been described. However, the vectors created from submitted answers can become very large. Processing of the type described above and carried out using large vectors is computationally expensive. In order to improve processing performance, various methods can be usefully employed. In some embodiments of the present invention, each submitted answer is compared to a standard answer so as to produce data indicating a difference between the submitted answer and the standard answer. Typically, the difference will be considerably shorter than the submitted answer.
  • the differences can be represented by a vector of lower dimensionality than the submitted answers. For example, if a question requires that a student names three components of a system, many students are likely to identify two components correctly. Only the third, incorrect part of a student's answer will be represented by the difference data and consequently encoded in a vector. When respective difference data items have been created for each submitted answer these difference data items can be represented as vectors and then processed as described above. Given that these vectors have a lower dimensionality, it will be appreciated that the vector processing can be carried out considerably more efficiently.
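  • One way of realising this reduction is sketched below; representing the difference as the multiset of terms not accounted for by the standard answer is an assumption made purely for illustration:

      from collections import Counter

      def difference_data(submitted_answer, standard_answer):
          """Return the terms of the submitted answer that the standard answer does
          not account for; this shorter 'difference' can be vectorised and clustered
          in place of the full answer, giving vectors of lower dimensionality."""
          remaining = Counter(submitted_answer.lower().split())
          remaining.subtract(Counter(standard_answer.lower().split()))
          return Counter({term: count for term, count in remaining.items() if count > 0})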
  • submitted answers are summarised. That is, where answers comprise relatively long and relatively free form text, such answers may be compressed using natural language engineering techniques of text summarisation.
  • Various summarisation methods are known. Such methods operate by reducing the length of text by extracting sentences containing significant terms or keywords, where these keywords are specified in advance or identified automatically.
  • by summarising submitted answers in this way it is then possible to process the summarised answers in order to allow clustering of the type described above. In this way, it can again be appreciated that the summarised answers will be of lower dimensionality than the originally submitted answers. Therefore, processing the summarised answers allows the clustering to be carried out in a more efficient way.
  • a method for processing textual data items by applying to each data item a plurality of methods for automatic summarisation.
  • Many automatic summarisation algorithms perform an analysis of word frequencies in a given text to determine which sentences are most informative and extract those sentences.
  • a number of automatic summarisation algorithms will be known to those skilled in the art. For example, an algorithm based entirely on word frequency, with medium-frequency words being considered to be the best identifiers of the topic of a text, is described in Luhn, H.P. 1958: "The automatic creation of literature abstracts", IBM Journal of Research and Development 2(2): 159-165, April 1958. An alternative algorithm was proposed by Hovy and Lin in 1997.
  • This method is referred to as the location method and is used in the SUMMARIST system.
  • the location method relies on the position of a sentence in the text to indicate its importance and is described in E. Hovy and C-Y Lin: "Automated Text Summarization in SUMMARIST", Proceedings of the Workshop on Intelligent Scalable Text Summarization.
  • Other suitable algorithms are described in Mani, Inderjeet and Maybury, Mark T. (1999), 'Advances in Automatic Text Summarization', MIT Press, including an algorithm for "chain computing" (pages 111 to 112) which considers how words in a text are related in meaning and cohesion.
  • summaries are generated for a single answer by a plurality of different algorithms.
  • the resulting summaries are compared using techniques such as those described above and the degree of similarity of the summaries to each other is determined.
  • the inventors have surprisingly found by experiment that, for any given text, the degree of similarity of the summaries generated by a plurality of differing summarisation algorithms corresponds closely to a human assessment of the quality of writing style. Experiments have shown that answers for which summaries generated by different algorithms are very similar are likely to be judged to be well written by human assessors, while answers for which the summaries vary greatly are likely to be judged to be poorly written.
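  • The idea can be sketched roughly as below; the two summarisers are deliberately simplistic stand-ins (one frequency-based in the spirit of Luhn's method, one position-based in the spirit of the location method), and the word-overlap measure is an assumption, not the system's actual similarity metric:

      import re

      def frequency_summary(text, n_sentences=2):
          """Pick the sentences containing the most frequent non-trivial words."""
          sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
          words = re.findall(r"[a-z]+", text.lower())
          freq = {w: words.count(w) for w in set(words) if len(w) > 3}
          def score(sentence):
              return sum(freq.get(w, 0) for w in re.findall(r"[a-z]+", sentence.lower()))
          return sorted(sentences, key=score, reverse=True)[:n_sentences]

      def location_summary(text, n_sentences=2):
          """Pick the leading sentences, assuming early sentences carry the topic."""
          sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
          return sentences[:n_sentences]

      def style_indicator(text):
          """Word overlap between the two summaries; per the observation above,
          higher agreement tends to accompany answers judged to be well written."""
          a = set(" ".join(frequency_summary(text)).lower().split())
          b = set(" ".join(location_summary(text)).lower().split())
          return len(a & b) / len(a | b) if a | b else 0.0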
  • sequence alignment techniques are used to match a student's answer to one or more standard answers. In this way, it is possible to get an indication of an answer's correctness automatically.
  • a student's answer is aligned to a standard answer using a modified Needleman-Wunsch algorithm described in Needleman, S. B. and Wunsch, C. D. (1970) 'A general method applicable to the search for similarities in the amino acid sequence of two proteins', Journal of Molecular Biology, 48(3): 443-53, the contents of which is incorporated herein by reference.
  • sequence alignment first creates a similarity matrix, using a process shown in the flowchart of Figure 14.
  • the processing of Figure 14 is described with reference to a simple example, involving aligning a student answer of:
  • step S8 of Figure 14 the standard answer is tokenised into i tokens.
  • the tokens may be word level tokens as in this example, character level tokens or any tokens generated by a tokeniser as is known to those skilled in the art.
  • processing passes from step S10 to step S11.
  • Steps S11 to S16 populate the remaining cells in the similarity matrix.
  • step S11 two counter variables m and n are initialised to values of 1.
  • the counter variable m counts through the rows of the matrix, while the counter variable n counts through the columns of the matrix.
  • edit(m,n) is a function arranged to calculate an edit distance between the mth token of the student answer and the nth token of the standard answer.
  • the edit distance is defined to be the number of operations required to transform the mth token of the student answer into the nth token of the standard answer.
  • the first and third inputs to the minimum function of equation (1) include an addition of 1.
  • the nature of the algorithm means that, because of the addition of 1 in the first and third inputs, the second input will only provide the minimum if there is no more than one character difference between the mth token of the student answer and the nth token of the standard answer.
  • a gap penalty of 1 is said to be used.
  • the algorithm treats the mth token of the student answer as matching the nth token of the standard answer.
  • the use of a gap penalty of 1 means that a single character difference will still allow particular tokens to be considered as matching. This allows for, for example, simple spelling mistakes.
  • step S13 it is determined whether m references the final row of the current column n of the matrix. If it does not, processing passes to step S14 where m is set to reference the next row in the current column of the matrix. Processing then passes back to step S12. If at step S13 it is determined that m does reference the final row of the current column n, processing passes to step S15. At step S15 it is determined whether n references the final column of values in the matrix. If it does not, processing passes to step S16 where n is set to reference the next column of values in the matrix and m is set to 1 such that it references the first (non-initialised) row of values in the current column n, and processing passes back to step S12. If at step S15 it is determined that n does reference the final column of values in the matrix, processing passes to step S17 and the completed matrix is returned.
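  • Since equation (1) itself is not reproduced here, the sketch below shows one plausible reading of the matrix-filling procedure of Figure 14; word-level tokens, the zero-to-gap initialisation of the first row and column, and the use of a character-level edit distance on the diagonal are assumptions consistent with the description above rather than the patented formula:

      def token_edit_distance(a, b):
          """Character-level Levenshtein distance between two tokens."""
          previous = list(range(len(b) + 1))
          for i, char_a in enumerate(a, 1):
              current = [i]
              for j, char_b in enumerate(b, 1):
                  current.append(min(previous[j] + 1,              # delete a character
                                     current[j - 1] + 1,           # insert a character
                                     previous[j - 1] + (char_a != char_b)))
              previous = current
          return previous[len(b)]

      def similarity_matrix(student_tokens, standard_tokens, gap=1):
          """Fill a (len(student)+1) x (len(standard)+1) matrix: a diagonal move
          costs the token edit distance, while skipping a token in either answer
          costs `gap` (1 here, so one-character slips can still match)."""
          rows, cols = len(student_tokens) + 1, len(standard_tokens) + 1
          matrix = [[0] * cols for _ in range(rows)]
          for m in range(rows):
              matrix[m][0] = m * gap
          for n in range(cols):
              matrix[0][n] = n * gap
          for m in range(1, rows):
              for n in range(1, cols):
                  matrix[m][n] = min(
                      matrix[m - 1][n] + gap,        # student token aligned to a gap
                      matrix[m - 1][n - 1]
                          + token_edit_distance(student_tokens[m - 1], standard_tokens[n - 1]),
                      matrix[m][n - 1] + gap)        # standard token aligned to a gap
          return matrix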
  • Figure 16 shows the similarity matrix generated for the present example by the processing of Figure 14.
  • the similarity matrix is used to calculate the best global alignment of the standard answer and the student answer. This is accomplished by backtracking through the matrix along the optimum route starting from the bottom right hand corner.
  • the optimum route in the example shown in Figure 16 is highlighted in bold characters.
  • To determine the optimum route through the table of Figure 16 a determination is carried out based upon how the value of a cell of the matrix was determined.
  • the first step is to determine how the value of the current matrix cell [m][n] was calculated. This effectively involves determining which of the inputs to the minimum function of equation (1) generated the value included in the matrix during the matrix generation stage.
  • If equation (2) is satisfied, then the next matrix cell to be processed is [m-1][n-1]. Otherwise, if equation (3) is satisfied, the next matrix cell to be processed is [m-1][n]. Otherwise, if equation (4) is satisfied, the next matrix cell to be processed is [m][n-1]. If more than one of equations (2), (3) or (4) is satisfied, this indicates that there may be more than one optimal alignment. In this case an action corresponding to one of the satisfied equations may be selected arbitrarily.
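  • A hedged sketch of the backtracking stage follows; it consumes a matrix such as the one produced by the previous fragment, and resolves ties in an arbitrary but fixed order, as the description above permits:

      def backtrack(matrix, student_tokens, standard_tokens, gap=1):
          """Walk from the bottom-right cell of the similarity matrix back to the
          top-left, recovering one optimal alignment as (student, standard) pairs;
          None marks a token aligned to a gap."""
          alignment, m, n = [], len(student_tokens), len(standard_tokens)
          while m > 0 or n > 0:
              if m > 0 and matrix[m][n] == matrix[m - 1][n] + gap:      # gap in the standard answer
                  alignment.append((student_tokens[m - 1], None))
                  m -= 1
              elif n > 0 and matrix[m][n] == matrix[m][n - 1] + gap:    # gap in the student answer
                  alignment.append((None, standard_tokens[n - 1]))
                  n -= 1
              else:                                                     # diagonal move: tokens matched
                  alignment.append((student_tokens[m - 1], standard_tokens[n - 1]))
                  m, n = m - 1, n - 1
          return list(reversed(alignment))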
  • the value in the bottom right hand cell represents the minimum number of modifications required to transform the student answer into the standard answer. The lower the score in the bottom right hand cell, the closer the student's answer is to the standard answer.
  • a plurality of standard answers can be provided, each indicating an acceptable answer to a question set.
  • a student's answer can be compared with each of the plurality of standard answers to determine a standard answer which the student's answer matches most closely. This determination can be carried out using the similarity score between the student's answer and each standard answer, the similarity score being included in the bottom right cell of the matrix.
  • the similarity score between a student's answer and a standard answer may also be used to automatically assign an initial score to the student's answer. The higher the value of the bottom right cell of the matrix, the lower the assigned mark.
  • embodiments of the present invention may utilise the generated alignment to highlight and classify errors in a student's answer. Referring to Figure 17, this is a screenshot of a marking tool which uses alignment of the type facilitated by the methods described above to support a human marker. The example is based upon a translation question 95 requiring translation of the displayed German language text "Hat sei Orangenction als Aepfel?" into English. A standard answer 96 to the question is also displayed.
  • a set of student answers 97 is shown below the standard answer 96.
  • Each student answer 97a, 97b, 97c, 97d has a corresponding area showing an initial score automatically allocated to the answer by the alignment process described above.
  • the present invention provides facilities for Human Computer Collaborative marking through buttons 99.
  • the buttons 99 allow a human marker to alter the initial marks generated according to the similarity matrix. Errors in the student text may be classified and highlighted according to the error classification key 100 providing differing highlighting for different types of error. In student answer 97a the word "appels" is highlighted to indicate a spelling mistake.
  • In student answer 97c, errors are highlighted to indicate erroneous words (rather than spelling mistakes).
  • methods are provided to allow annotating of student essays on screen. Assessment of essays on screen often requires markers to annotate the essays, either to give feedback to students, or to identify where marks are gained and lost, or both.
  • the present invention allows a marker to make arbitrary annotations on a particular text submitted by a student. Sequence alignment techniques are used to display the differences between the annotated text and the original. Annotations may be classified into safe annotations and unsafe annotations. In some circumstances it is important to preserve the original text, such as during summative assessment, whereas in other circumstances, such as formative assessment, preserving the original text is often of less importance.
  • Safe annotations are annotations that preserve the original text, such as annotations comprising additions or formatting changes.
  • Unsafe annotations are annotations that do not preserve the original text, such as annotations comprising modifications, or deletions or the reordering of words. Methods are provided to determine if a text has been safely annotated, and if it has not, to show where the unsafe annotations are in the annotated text.
  • An example algorithm to determine if a text (e.g. an answer) has been safely annotated will now be described with reference to Figure 18, beginning at step S20.
  • step S21 each text is tokenised into a sequence of words, resulting in two sequences of tokens, a sequence generated from the original text and a sequence generated from the annotated text.
  • step S22 a reference variable m is initialised to reference a first token in the sequence of tokens generated from the original text and a second reference variable n is initialised to reference a first token in the sequence of tokens generated from the annotated text. Processing then passes to step S23.
  • step S23 it is determined whether the mth token in the sequence of tokens generated from the original text matches the nth token in the sequence of tokens generated from the annotated text. If the tokens match, processing passes to step S24 where m and n are both incremented by 1, such that they now reference the next token in each sequence. Processing then passes to step S25 where it is determined whether m references the last token in the sequence of tokens generated from the original text such that all tokens in the sequence of tokens generated from the original text have been processed. If this is the case, all tokens generated from the original text appear, in order, in the sequence of tokens generated from the annotated text and processing finishes at step S26.
  • If at step S25 it is determined that m does not reference the end of the original sequence, processing passes back to step S23 and the further tokens from the two sequences are compared. If at step S23 it is determined that the mth token in the sequence generated from the original text does not match the nth token in the sequence generated from the annotated text, processing passes to step S27 where n is incremented by 1 such that it references the next token in the annotated text. Processing then passes to step S28 where it is determined if n references the end of the annotated text such that all words in the annotated text have been compared. If this is true then all tokens in the sequence generated from the annotated text have been processed without matching tokens which were generated from the original text.
  • processing then ends at step S28a. If it is determined at step S28 that n does not reference the end of the annotated text, processing passes back to step S23 and the currently referenced words in each sequence are compared.
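  • In essence, the check of Figure 18 asks whether the original token sequence survives, in order, as a subsequence of the annotated token sequence; a compact sketch (not the flowchart step for step, and assuming formatting has already been stripped) is:

      def is_safely_annotated(original_text, annotated_text):
          """True if every token of the original appears, in order, in the annotated
          version, i.e. the annotations only add material; deletions, modifications
          and reorderings of original tokens make the annotation unsafe."""
          annotated = iter(annotated_text.split())
          return all(token in annotated for token in original_text.split())

      # Hypothetical examples: an addition is safe, a deletion is not.
      # is_safely_annotated("the cat sat", "the black cat sat [well put]")  -> True
      # is_safely_annotated("the cat sat", "the cat")                       -> False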
  • sequence alignment is used to determine where the unsafe annotations occur.
  • Alignment algorithms such as the modified Needleman-Wunsch as described above have a time and space complexity of O(N²). This makes such algorithms too computationally expensive for aligning essays containing thousands of words.
  • step S29 all formatting is removed from the two copies of the text, the original and the annotated version and processing passes to step S30.
  • each text is tokenised into a sequence of tokens, resulting in two sequences of tokens, a sequence M generated from the original text and a sequence N generated from the annotated text.
  • Processing then passes to step S31.
  • unique markers are chosen from each text. The unique markers are each defined as a word or a series of words which appear exactly once in each text. That is, if the two text strings were:
  • the unique markers chosen may be the five tokens ⁇ does, she, like, oranges, apples ⁇ as these each appear exactly once in each text.
  • Unique markers may be individual words as in the current example, bigrams, N-grams or character sequences independent of the words, as required by the particular text being processed and the application. For example, when processing a text in which no single token appears exactly once in each sequence, it will be necessary to use a sequence of tokens as a marker; conversely, if the number of tokens appearing exactly once in each sequence is too large, using sequences of tokens can reduce the number of markers.
  • processing passes to step S32.
  • the context of the unique markers is defined.
  • the context of the unique markers is defined as the unique marker itself and all of the tokens up to but not including the next unique marker. Using the above example, the contexts of the marker oranges are: ⁇ oranges, or ⁇ for sequence N; and
  • Each unique marker and its context define individual chunks of the sequences.
  • processing passes to step S33.
  • step S33 an alignment is performed upon the unique markers of the two sequences. Alignment may be performed using standard algorithms, such as the Needleman-Wunsch algorithm, providing that the number of unique markers is small enough.
  • step S34 it is determined if the remaining chunks are small enough to be aligned directly using standard algorithms. If this is true, the remaining chunks are aligned using standard algorithms and processing ends. If this is not the case, processing passes to step S35.
  • At step S35, sequence N is set to be the context defined from the sequence N and sequence M is set to be the context defined from the sequence M. Processing now passes back to step S31 and the contexts of the unique markers in each sequence are aligned recursively using steps S31 to S33 until the remaining chunks of the sequences are small enough to be aligned directly.
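  • A simplified, hedged variant of this divide-and-conquer alignment is sketched below; the chunk-size threshold, the fall-back behaviour when no markers exist, and the assumption that the markers occur in the same relative order in both texts are illustrative choices rather than the patented method:

      from collections import Counter

      def unique_markers(seq_m, seq_n):
          """Tokens appearing exactly once in each sequence, in seq_m order."""
          count_m, count_n = Counter(seq_m), Counter(seq_n)
          return [t for t in seq_m if count_m[t] == 1 and count_n[t] == 1]

      def anchored_align(seq_m, seq_n, align_small, max_direct=50):
          """Align two token sequences: chunks small enough go straight to
          `align_small` (e.g. a Needleman-Wunsch aligner); larger chunks are split
          at unique markers and the intervening contexts aligned recursively."""
          if not seq_m and not seq_n:
              return []
          if len(seq_m) <= max_direct and len(seq_n) <= max_direct:
              return align_small(seq_m, seq_n)
          markers = unique_markers(seq_m, seq_n)
          if not markers:
              return align_small(seq_m, seq_n)          # no anchors: align directly
          alignment, prev_m, prev_n = [], 0, 0
          for marker in markers:
              i, j = seq_m.index(marker), seq_n.index(marker)
              alignment += anchored_align(seq_m[prev_m:i], seq_n[prev_n:j], align_small, max_direct)
              alignment.append((marker, marker))        # each marker aligns with itself
              prev_m, prev_n = i + 1, j + 1
          alignment += anchored_align(seq_m[prev_m:], seq_n[prev_n:], align_small, max_direct)
          return alignment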
  • the result can be interpreted according to application specific requirements. For example, unsafe annotations can be detected and displayed in the original text.
  • alignment may be used in conjunction with the clustering methods described above. Regular expressions may be utilised alongside alignment to reduce the number of variants and thus make the alignment more effective.

Abstract

The invention relates to various computer aided assessment methods and various methods for processing textual data. One computer aided assessment method provided by the invention comprises processing a submitted answer to a question to determine a degree of conformance of the submitted answer to a regular expression indicative of a correct answer format for the question and generating an output signal based upon said processing.

Description

ASSESSMENT METHOD
The present invention relates to assessment methods. In particular, the present invention relates to computer-aided assessment methods. The invention also relates to methods for processing textual data.
It is known to use computers to assist in setting and marking assessments such as examinations and tests in an academic environment. Computer-aided assessment has been shown to be of value in reducing the time taken to mark assessments taken by large numbers of students. In particular, it is known to automate assessments based on selected answer questions. A selected answer question is a question requiring a student to identify one or more correct answers or parts of answers from a plurality of possible answers presented to the student. An example of a selected answer question is a multiple-choice question.
Students sitting a computer-aided assessment based on selected answer questions via a computer (either in a closed assessment room or remotely, for instance over the Internet) are presented with a graphical user interface displaying a question and a number of possible answers. Each possible answer is associated with a selection box. A student can manipulate the selection box associated with each answer they believe to be correct, for instance using a mouse or other input device, in order to indicate their choice(s). It will be readily appreciated that for a selective answer question based assessment, the marking process may be completely automated by simply comparing answers selected by students with a list of correct answers. Computer-aided assessment can completely remove the burden of marking for selected answer questions. Indeed, selected answer questions are predominant in known computer-aided assessment methods.
However, the pedagogical value of selected answer question based assessment is limited. For example, it is generally considered that multiple-choice questions have greater value for formative assessment of students (for instance, informal assessments undertaken throughout a course to consolidate information taught within the course) than for summative assessment of students (for instance, formal assessments at the end of a course, the results of which determine a grade for a student). It is known that it is relatively easy for students to correctly identify a correct answer presented alongside (even closely similar) incorrect answers, compared with the task of recalling a correct answer without prompting. Consequently, in order to support the pedagogical goals of assessment at a higher level, for instance at university level, it is commonly preferred to assess students using constructed answer questions. A constructed answer question is a question that requires the student to construct an answer, for instance in the form of text or diagrams, rather than select an answer from a number of possibilities.
Open questions are defined as questions where there is not a single or a small number of correct answers, for instance where a student is asked to provide an original example or argument. Conversely, a closed question is a question requiring a student to provide a specific piece or set of information. Accurate and fully automated marking of constructed answers is known to be particularly difficult to achieve for open constructed answer questions. Studies conducted using data gathered from real university assessments have shown that a significant source of complexity results from the fact that students may attempt to answer even seemingly simple questions in a large number of diverse ways. Even for short text answer questions, research has shown that a group of students may provide a large number of different correct answers, not all of which may be anticipated when setting the questions. Furthermore, even for correct answers, if students are allowed to enter free text answers there may be a significant amount of redundant information beyond the information required to gain marks.
It is an object of embodiments of the present invention to obviate one or more of the problems of the prior art, whether identified above or elsewhere. Human computer collaborative assessment can be considered to be a sub-field of computer- aided assessment. In human computer collaborative assessment, answers may be constructed (for example using text or diagrams) rather than selected from possible answers. Human computer collaborative assessment techniques can, however, support both selected answer and constructed answer question formats. Specifically, it is an object of embodiments of the present invention to provide improved methods of marking constructed answers utilising a human computer collaborative approach whereby software performs routine processing of a set of answers and a human marker is thus only required to make important marking judgements.
According to a first aspect of the present invention there is a computer aided assessment method comprising: processing a submitted answer to a question to determine a degree of conformance of the submitted answer to a regular expression indicative of a correct answer format for the question; and generating an output signal based upon said processing.
An advantage of the first aspect of the present invention is that by using regular expressions to constrain the format of answers submitted by a group of students undergoing computer-aided assessment, the number of distinct answers (both correct and incorrect) submitted may be reduced. Consequently, identical or closely similar answers may be grouped and marked at the same time, thus reducing the marking time taken, and reducing the workload of assessment markers by reducing the number of distinct marking judgements that need to be made.
The submitted answer may comprise a text answer to the question.
The method may further comprise displaying a warning message if said output signal indicates that the submitted answer does not conform to the regular expression.
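By way of a hedged illustration of this aspect, the following Python fragment checks a submitted answer against a regular expression and produces an output signal and warning; the date format, the pattern and the warning text are invented for the example and do not form part of the invention:

    import re

    # Hypothetical constraint: the answer to a date question must be in DD/MM/YYYY form.
    ANSWER_FORMAT = re.compile(r"\d{2}/\d{2}/\d{4}")

    def check_answer_format(submitted_answer):
        """Return True if the submitted answer conforms to the expected format,
        otherwise display a warning and return False (the output signal)."""
        conforms = ANSWER_FORMAT.fullmatch(submitted_answer.strip()) is not None
        if not conforms:
            print("Warning: your answer does not match the expected format DD/MM/YYYY.")
        return conforms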
The method may further comprise processing the submitted answer to determine a degree of conformance of the submitted answer to a concatenation of two or more regular expressions. Alternatively, or in addition the method may further comprise processing the submitted answer to determine a degree of conformance of the submitted answer to two or more alternative regular expressions.
The method may further comprise receiving a first input indicative of the question; and receiving a second input indicative of a correct answer format for that question; wherein the second input comprises a regular expression.
According to a second aspect of the present invention there is provided a computer implemented method of generating an assessment, the method comprising: receiving a first input indicative of a question to be included in said assessment; and receiving a second input indicative of a correct answer format for that question; wherein the second input comprises the regular expression.
According to a third aspect of the present invention there is provided a computer aided assessment method comprising: comparing a submitted answer to a question with a standard answer to the question to determine a similarity metric for the submitted answer indicative of how similar the submitted answer is to the standard answer; wherein the standard answer comprises a plurality of answer graph objects comprising portions of one or more acceptable answers to the question, said comparing comprising matching each of said answer graph objects to the submitted answer, determining a similarity metric for each answer graph object based upon said matching and determining a similarity metric for the submitted answer based upon the similarity metrics for each answer graph object.
An advantage of the third aspect of the present invention is that portions of submitted constructed answers for questions requiring, for instance, a diagram to be drawn can be automatically matched to portions of a standard or model answer. This improves the efficiency of the marking process, as a human marker is only required to confirm a mark already generated. Furthermore, the standard answer may be refined dynamically to take account of unanticipated correct portions of submitted answers. Thus the consistency and accuracy of marking can be improved.
The submitted answer may comprise a constructed diagram.
The method may further comprise representing the submitted answer as a submitted answer graph object.
At least two answer graph objects may comprise overlapping portions of one or more acceptable answers. Furthermore, the answer graph objects may define a structure of a portion of one or more acceptable answers. The answer graph objects may also, or may alternatively, define one or more parameters of a portion of one or more acceptable answers.
Determining a similarity metric for each answer graph object may comprise determining a mark for each answer graph object. Determining a similarity metric for the submitted answer may comprise determining a mark for the submitted answer.
Said matching may therefore comprise determining the extent to which the structure of each answer graph object corresponds to the structure of a portion of the submitted answer or determining the extent to which a parameter of each answer graph object corresponds to a parameter of a portion of the submitted answer.
Determining a mark for each answer graph object may comprise determining a portion of a mark allocation for that answer graph object according to the extent to which the structure or a parameter of each answer graph object corresponds to a portion of the submitted answer. Determining a portion of a mark allocation may comprise weighting the structure and parameters of the answer graph objects according to relative importance in determining the extent to which an answer graph object matches a portion of a submitted answer. The standard answer may comprise an AND/OR tree comprising a plurality of nodes arranged under one or more AND or OR branches, each node comprising an answer graph object. Determining a mark for the submitted answer may comprise determining a mark for each of a group of two or more answer graph objects arranged underneath an OR node and determining the highest marked answer graph object of that group or determining a mark for each of a group of answer graph objects arranged under an AND node and adding together the marks for each answer graph object in the group. Determining a similarity metric for the submitted answer may comprise determining a mark for all of said plurality of answer graph objects arranged under a root node of the AND/OR tree.
The method may further comprise adding a new answer graph object to said plurality of answer graph objects based upon identifying a new portion of an acceptable answer within a submitted answer.
According to a fourth aspect of the present invention there is provided a standard answer for a question within a computer aided assessment method, the standard answer comprising: an AND/OR tree comprising a plurality of nodes arranged under one or more AND or OR branches, each node comprising an answer graph object; wherein each answer graph object comprises a portion of one or more acceptable answers to the question. A carrier medium may be provided carrying the standard answer for a question.
According to a fifth aspect of the present invention there is provided a computer aided assessment method comprising: comparing a first submitted answer to a question with a second answer to the question to determine a similarity metric indicative of how similar the first submitted answer is to the second answer; wherein said comparing comprises determining a number of occurrences of a plurality of words within the first submitted answer and the second answer and representing the number of occurrences of the plurality of words within the submitted answer and the second answer as a pair of vectors, the similarity metric being indicative of a relationship between the pair of vectors. An advantage of the fifth aspect of the present invention is that by clustering groups of answers by their similarity to one another or to a standard answer, identical or closely similar answers may be grouped and marked at the same time, thus reducing the marking time taken, and reducing the workload of assessment markers by reducing the number of distinct marking judgements that need to be made. The consistency of marking can also be increased as similar answers can readily be marked at the same time.
The method may further comprise comparing a plurality of submitted answers to the question with the second answer. The second answer may comprise a standard answer to the question. Alternatively, the second answer may be a particular submitted answer.
The method may further comprise determining the number of occurrences within each submitted answer and the second answer of each word contained within any submitted answer or the second answer.
The relationship between the or each pair of vectors may comprise a Euclidean distance between the or each pair of vectors. Alternatively, the relationship between the or each pair of vectors may comprise a cosine of an angle between the or each pair of vectors. As these vectors, representing every word occurring in any answer, rapidly become very large, the method may further comprise a number of techniques for reducing vector size. For example, the words to be represented may be removed or combined.
For example, the method may further comprise correcting the spelling of words within the or each submitted answer and the second answer, thus reducing many variant spellings to a single word. The method may also further comprise removing one or more words within the or each submitted answer or the second answer. The method may also further comprise truncating ("stemming") one or more words within the or each submitted answer or the second answer, thus reducing variant forms of a word (such as singular and plural nouns) to a single word. The method may also comprise summarising the first submitted answer.
Determining a number of occurrences of a plurality of words may comprise weighting one or more words within the or each submitted answer or the second answer. The method may further comprise clustering submitted answers according to the similarity metric for each submitted answer. Said clustering may comprise assigning each submitted answer to a separate cluster and amalgamating pairs of clusters with the most similar similarity metrics. The clustering may further comprise amalgamating pairs of clusters until the number of clusters has been reduced by a predetermined amount. Alternatively, the clustering may further comprise amalgamating pairs of clusters until the variance between the similarity metrics for submitted answers within any cluster exceeds a maximum value. The method may further comprise processing the submitted answer to determine a degree of conformance of the submitted answer to a regular expression indicative of a correct answer format to the question; and generating an output signal based upon said processing.
The method may further comprise displaying a warning message if said output signal indicates that the submitted answer does not conform to the regular expression.
The method may further comprise processing the submitted answer to determine a degree of conformance of the submitted answer to a series of two or more regular expressions.
The method may further comprise processing the submitted answer to determine a degree of conformance of the submitted answer to two or more alternative regular expressions.
The method may further comprise receiving a first input indicative of the question; and receiving a second input indicative of a correct answer format for that question; wherein the second input comprises the regular expression. The method may further comprise a human assessment marker performing a further processing step upon the or each submitted answer based upon an output indicative of said comparison or said clustering.
An advantage of aspects of the present invention is that human markers are freed from much of the routine portions of marking, thus allowing greater time to make more important marking judgements. The accuracy and consistency of marking may also be improved. Furthermore, the overall time for marking a set of answers may be significantly reduced. For instance, embodiments of the present invention have shown that the time to mark a set of answers may be reduced by a factor of two or more. According to a sixth aspect of the present invention, there is provided, a method for comparing first and second data items in a computer system, the method comprising: generating a first difference data item representing a difference between the first data item and a reference data item, generating a second difference data item representing a difference between the second data item and the reference data item, comparing said first and second difference data items to provide data indicative of similarity between said first and second data items.
The inventors have realised that processing difference data items in this way is computationally efficient given that difference data items will typically be shorter than the first and second data items.
The first and second data items and the reference data item may take the form of alphanumeric strings. The first and second data items may be submitted answers to an assessment question, while the reference data item may be a standard answer to the assessment question. In this case, by processing the first and second data items in accordance with the sixth aspect of the invention, it will be appreciated that the difference data items will typically be shorter than the first and second data items and therefore liable to more computationally efficient processing.
Groups of data items may be clustered, for example using methods such as those described above. The clustering may be based upon the data indicative of similarity.
According to a seventh aspect of the present invention, there is provided, a method for clustering textual data items, the method comprising: processing a plurality of textual data items to generate a plurality of summarised textual data items, each summarised textual data item representing a summary of a respective data item, clustering said data items by processing said summarised data items.
The inventors have found that clustering data items in this way offers greater computational efficiency given that the summarised data items are shorter than the originally provided textual data items. Each textual data item is an answer to an assessment question.
The clustering may comprise determining similarities between the summarised data items. Each textual data item may be an answer to an assessment question. At least one of the data items may be a submitted answer to an assessment question. At least one of the data items may be a standard answer to an assessment question. According to an eighth aspect of the present invention, there is provided, a method of processing textual data, the method comprising: receiving textual data, processing said textual data to generate a plurality of summaries of said textual data, determining similarity between said summaries, and generating an output indicating a property of said textual data based upon said similarity.
The inventors of the present application have surprisingly discovered that comparing a plurality of summaries of particular textual data can provide useful information about that textual data. In particular, where two summaries generated by different summarising algorithms have great similarity this is likely to be an indication of good written style in the textual data. Such a method can be applied in a computer aided assessment method.
Processing to generate at least two summaries may comprise applying a plurality of summarising algorithms to the textual data, each summarising algorithm generating a respective one of the plurality of summaries. At least one of the summarising algorithms may perform an analysis of word frequencies in the received textual data.
The property indicated by the generated output may be an indication of writing style.
The textual data may be an answer to an assessment question.
According to a ninth aspect of the present invention, there is provided, a computer aided assessment method, the method comprising: receiving a student answer to an assessment question, accessing a standard answer to said assessment question, aligning tokens of said student answer to tokens of said standard answer, and outputting data for use in said assessment based upon said alignment.
The data based upon the alignment may indicate differences of word order between the student answer and the model answer.
Outputting the data based upon the alignment may comprise displaying the student answer on a display device, displaying data based upon the alignment in connection with the student answer, and highlighting tokens of the student answer.
Aligning tokens of the student answer to tokens of the standard answer may comprise generating a similarity matrix and generating alignment data based upon the similarity matrix. The similarity matrix may be generated using an algorithm based upon the Needleman-Wunsch algorithm.
The method may also comprise generating a score associated with the student answer based upon the alignment. Generating the score may comprise processing data indicating a number of differences between the student answer and the model answer. According to a tenth aspect of a present invention, there is provided, a computer aided assessment method comprising: receiving a student answer to an assessment question, receiving an annotated version of said student answer to said assessment question, processing the student answer and the annotated version of the student answer to identify annotations to said student answer, said processing comprising determining whether the annotations comprise deletion of text included in said student answer.
The processing may comprise applying a sequence alignment algorithm. The sequence alignment algorithm may comprise identifying a marker occurring exactly once in each of the student answer and the annotated version of the student answer and the method may further comprise aligning the marker in the student answer with the marker in the annotated version of the student answer.
The method may further comprise identifying a plurality of markers, each marker occurring exactly once in each of the student answer and the annotated version of the student answer; and aligning a marker, together with tokens of the student answer, with a corresponding marker and tokens of the annotated version of the student answer, the tokens of the student answer occurring between the marker and a subsequent marker in the student answer.
Embodiments of the present invention may be implemented in software. For example a carrier medium carrying computer readable code for controlling a computer to carry out the above aspects of the invention may be provided. Alternatively, a computer apparatus comprising a memory storing processor readable instructions and a processor configured to read and execute instructions stored in said memory may be provided. The processor readable instructions stored in said memory may comprise instructions controlling the processor to carry out the above aspects of the invention.
The computer apparatus may comprise bespoke hardware arranged such that a suite of assessment setting and marking tools are provided on a bespoke server computer, suitable for access by client software provided on a bespoke client computer.
The invention may be implemented in any suitable form including as a system, method or apparatus.
Embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which: Figure 1 schematically illustrates a computer network suitable for implementing embodiments of the present invention;
Figure 2 schematically illustrates a computer network suitable for implementing a marking process for a computer-aided assessment in accordance with an embodiment of the present invention;
Figure 3 is a screen shot of a question screen displayed to students undergoing a computer-aided assessment in accordance with embodiments of the present invention;
Figure 4 is a screen shot of a marking tool for a computer-aided assessment in accordance with an embodiment of the present invention;
Figure 5 is a screen shot of a user interface for selecting a regular expression to constrain an answer to a question in accordance with an embodiment of the present invention;
Figure 6 is a screen shot of a diagram drawing tool used for submitting constructed diagram answers in accordance with an embodiment of the present invention;
Figure 7 is a screen shot of a palette builder for preparing the diagram drawing tool of Figure 6;
Figure 8 illustrates a diagram matching metaformat for storing a standard answer for a constructed diagram question in accordance with an embodiment of the present invention;
Figures 9A and 9B illustrate alternative correct answers to a constructed diagram question for which the standard answer is defined by Figure 8;
Figure 10 schematically illustrates a possible submitted answer to a constructed diagram question for which the standard answer is defined by Figure 8;
Figure 11 schematically illustrates the process of determining a similarity metric between two answers using a vector space model in accordance with an embodiment of the present invention;
Figure 12 is a flow chart representing the process of determining a similarity metric between two answers using the vector space model in accordance with an embodiment of the present invention;
Figure 13 is a screen shot of a cluster of submitted answers, clustered according to the process of Figure 12;
Figure 14 is a flow chart of a process for generating a similarity matrix;
Figure 15 is an example similarity matrix after initialisation;
Figure 16 is a completed similarity matrix;
Figure 17 is a screenshot from a computer aided assessment application in which sequence alignment has been applied to answers;
Figure 18 is a flow chart showing a process for identifying annotations to a submitted answer; and
Figure 19 is a flowchart showing a sequence alignment algorithm.
In accordance with embodiments of the present invention a human computer collaborative assessment system is provided. Human computer collaborative assessment can be considered to be a sub-field of computer-aided assessment. In human computer collaborative assessment, answers may be constructed (for example using text or diagrams) rather than selected from possible answers. Human computer collaborative assessment techniques can, however, support both selected answer and constructed answer question formats. Marking is a process of active collaboration between software and a human marker. Human computer collaborative assessment methods offer significant benefits over both traditional paper based assessment and conventional forms of computer-aided assessment. Such methods offer flexibility in the manner in which questions can be set, allowing the form of answers to be constrained by educational rather than technological considerations. The speed of marking can be significantly increased. Consistency and accuracy of marking may also be improved. Advantageously, the reasoning behind the allocation of marks can also be explicitly recorded.
In accordance with embodiments of the present invention a series of computer programs are provided for setting assessment questions, taking a computer aided assessment, administering a computer aided assessment and marking answers submitted in response to questions in an assessment. The programs form four main components. A first component is an assessment system (or exam client) in which students are presented with questions via a graphical user interface, and given the opportunity to enter their answers. A second component is an assessment setting tool allowing an assessment marker to prepare questions to be presented within the assessment system. A third component is a marking tool allowing an assessment marker to mark submitted answers from a group of students. A fourth component comprises an administration tool, which provides the facility to control and invigilate exams.
All four components can be implemented on a server computer, which may be accessed by client computers. The client computers may be directly connected to the server in the event that an examination is taking place within a closed assessment room. For example, the server computer and client computers can be connected to a local area network (LAN). Alternatively, remote access to the server may be provided, for example using the Internet. However, it will be appreciated that the components may also be provided in a stand-alone configuration. That is, it is not necessary to be connected to the server computer when marking or setting an assessment.
Referring now to Figure 1 a computer network suitable for implementing embodiments of the present invention is schematically illustrated. A server computer
1 is provided. The server computer 1 runs the assessment system, the administration tool and the assessment setting and marking tool. A plurality of client computers 2 is shown. Each client computer 2 may be used by a student to access the assessment system when sitting an assessment. Alternatively, a client computer 2 may be used by an assessment marker when marking a set of submitted answers or when preparing a new assessment. Finally, a client computer 2 may be used by an assessment invigilator to initiate and control a computer aided assessment. The client computers
2 and the server computer 1 are connected to the Internet 3 allowing remote access by the client computers 2 to the server 1. Data and instructions communicated between a client computer 2 and the server 1 are exchanged via the Internet 3.
Referring now to Figure 2, this illustrates a configuration suitable for using the marking tool implemented by server computer 1 to mark submitted assessment answers in accordance with embodiments of the present invention. A client program
4 operating on a remote client computer 2 is arranged to access the server 1. The client program 4 may be a web browser or a Java application. The server 1 stores an answer set 5, comprising a standard answer file 6 and a set of submitted answers 7. The answer set 5 is stored in an extensible markup language (XML) format.
Additionally, other data such as the standard answer file or the questions may also be stored in XML format. Storing data in XML is advantageous because it is a standardised and portable data storage format. This allows for flexibility in that the assessment system is not restricted to a single proprietary database format. As well as portability, XML is advantageous because it is an efficient, lightweight and robust data format.
The standard answer file 6 comprises one or more standard answers to each question, and optionally feedback to be presented to a student in response to their answer. Such feedback is typically used in formative, rather than summative assessments. The standard answer file is used to partially automate part of the marking process in a human computer collaborative assessment method in accordance with embodiments of the present invention. The standard answer file and the marking process will be described in greater detail below.
The submitted answers 7 are received from client computers in communication with the server 1, are processed by the server 1 and passed to the client program 4 for marking. Marking may be an iterative process, for instance if a human marker 8 operating the client program 4 changes marking parameters (as discussed below). After marking of a question or a group of questions is completed the marks may be passed to the server 1 and output as a comma separated file 9 for further processing (for instance, using a spreadsheet application). It will be appreciated that a comma separated file is only one of a large number of known standardised output formats that could be used. It will be appreciated that the client and server can be arranged in any convenient way. For example, the client program may take the form of a web browser as described above and in such a case the server 1 may provide appropriate web pages which are displayed to the client. Such web pages can be configured to receive data and transmit that data back to the server 1. The use of webpages in this way will be readily understood by one of ordinary skill in the art.
It will be appreciated that the arrangement of Figure 2 is merely exemplary. For instance, setting and marking of assessments may be performed off line (that is, when a client computer 2 is not connected to the server computer 1). In order to mark an assessment all that is required is that the marking tool, the assessment (that is, the questions), the standard answers and the set of submitted answers are downloaded to a client computer from the server computer via the administration tool. Once these are downloaded, the marking can be performed offline on any computer, with the marks uploaded, if required, to the server computer at a later time. Similarly, the assessment setting tool may also operate in a stand-alone configuration allowing a person setting an assessment to create questions and standard answers off line. Indeed, in preferred embodiments of the present invention, marking is carried out as an offline process.
Although the description of Figure 2 relates to operation of the marking tool it will be appreciated that the assessment system used by students when sitting an assessment can be implemented in a similar way. That is, a student sitting an assessment may access the assessment system by simply using a web browser to download and navigate appropriate webpages. This has the advantage that the user interface, which is to be presented to a user sitting an assessment, is provided from the server. Accordingly, there is no specific software to be installed and maintained at the client computer; rather the interface can easily be updated by simply making appropriate changes at the server.
Furthermore, it will also be appreciated that the administration tool used by assessment invigilators can be implemented in a similar manner to the marking tool operation illustrated in Figure 2.
Assessments are generated by an examiner constructing a series of questions, which may be hierarchically structured into sections, questions, and part questions. Upon sitting an assessment, a student or a group of students must first log onto an assessment system. Assessments may be conducted in a controlled assessment hall, similar to conventional paper based assessments, or students may be able to sit the assessment by logging on to the assessment system from a remote computer. After providing the assessment system with appropriate credentials uniquely identifying the student, the student is presented with instructions for sitting the assessment.
It may be possible to continue with a previously saved partially completed assessment. When required to start the assessment each student can begin by selecting a "start test" button. Partially completed assessments can be continued both for formative assessments and for summative assessments conducted under open conditions (that is, a student being allowed to leave the assessment room) over a longer period of time. Furthermore, a key advantage of being able to continue with partially completed assessments is that it allows recovery from a backed up copy of the partially completed assessment in the event of technical problems during an assessment. The state of a partially completed assessment can be saved in one of two ways. Firstly, students being assessed may be given the opportunity to manually initiate a back up periodically. Alternatively, back ups may happen automatically. For a short duration formative assessment automatic back ups may occur every few minutes such that if a student's computer fails in the middle of an assessment they may simply move to a new computer and carry on, having lost relatively little work. For a longer duration assessment, such as an open assessment, a back up may be initiated whenever a change in the state of the completed assessment is detected in order to avoid large numbers of identical backups. When a back up is initiated the client computer sends a Java serialised data stream representing the current state of the student's work to the server computer. The server computer is responsible for converting the serialised data to XML and storing the data as a single file per student per back up. Alternatively, or in addition, the current state of the student's answers may be locally backed up on the client computer for additional robustness.
During an assessment a student is presented with a main assessment screen, an example of which is depicted in Figure 3. The main assessment screen is split into a question area 10 and a button area 11. The question area 10 provides a series of questions 12a, 12b, and 12c arranged as tabs across the top of the question area 10. Alternatively, questions may be navigated by selecting questions from a list (not shown in Figure 3). Alternative methods of navigating the questions within an assessment, including buttons linking between questions and scrolling, may be provided. It can be seen that the question 12a is arranged as a series of part questions 13 that students are required to answer. Answer areas 14a, 14b for the student to enter their answers are provided. Two types of answer area are shown in Figure 3. A first type comprises a text entry box 14a allowing the student to type a constructed answer to the preceding question. A second type comprises a group of selection boxes 14b allowing the student to select one or more correct answers from a group of possible correct answers. A diagram drawing tool for constructing diagrammatic answers to questions is depicted in Figure 6 and described below.
The button area 11 displays an indication of the time left 17 for answering questions. A button 18 labelled "view rubric" allows the student to return to a screen displaying the instructions for sitting the assessment. A button 19 labelled "finish test" allows the student to save their answers upon finishing the assessment. Partially completed answers are also regularly saved automatically throughout the duration of the assessment.
To allow questions in the question area 10 to be answered, a series of additional utilities (not shown in Figure 3) may allow students to select the language of text input, insert special characters (in particular mathematical symbols) and provide tools to assist students, such as a calculator.
For setting questions for a new assessment, the assessment setting tool is used, which provides tools for performing all of the essential administrative tasks of setting up and running a computer-aided assessment. When preparing a new assessment an assessment file is generated containing the questions and all other material presented to a student when sitting the assessment. A second optional file may be prepared containing standard or model answers to the questions, and possibly standard feedback to be presented to students either when the marks to the questions are returned or during the assessment (particularly in the case of formative assessments).
The standard answer file may be used during the marking process in accordance with embodiments of the present invention to allow the marking tool to automate parts of the marking process, as will be described in greater detail below. The standard answer file contains a representation of at least one possible answer and the marks associated with each part of the answer. In principle this is similar to a conventional paper based marking schedule. The marking tool also allows repeated answers and part answers to be identified, allowing for more efficient marking as will be described in greater detail below.
For human computer collaborative assessment the standard answer file, alternatively referred to as the answer representation (the representation of possible answers and their marks), may be enlarged and refined during the marking process and not fixed in advance. That is, as answers are processed and further correct answers are identified such correct answers may be added to the answer representation. For assessments where questions are reused (for instance from assessments used to assess a previous year group of students) the existing answer representation may be reused providing further efficiency gains. If an answer representation shows that there are a large number of alternative correct answers this can be used to tighten up the question statement and thereby improve the quality of the question.
During the marking process the marking tool allows markers to, amongst other things, group questions across the whole set of students' answers in order to make the marking process more efficient. The assessment file and a set of answers file (together with the standard answers file) are loaded into the marking tool. The marker is then presented with a series of optional tools for automating parts of the marking process. A suitable marking screen is shown in Figure 4.
As shown in Figure 4 the main marking screen is split into three main sections. A question area 24 displays the question currently being marked. An answer area 25 displays one or more submitted answers 26 to the displayed question 24, including a standard answer 27. A question tree 28a allows navigation between questions organised hierarchically into sections, questions and part questions. Further methods of navigating between questions, such as buttons, may be provided. It can be seen that answers from all students who attempted a part of a question can be displayed together in the answer area 25. This can improve the consistency of marking by allowing the marker to mark all answers to a single part of a question at the same time. For performance reasons the answers may be displayed in blocks of answers if there are a large number of answers to a question. For instance, submitted answers may be displayed twenty at a time. Alongside each displayed answer 26 is a mark box 29 allowing the human marker to enter a mark for that answer.
Computer-aided assessment methods in accordance with embodiments of the present invention provide significant advantages over conventional marking of handwritten answers. Elimination of the need to decipher handwriting, and the overhead of sorting through large numbers of separate answer papers significantly speeds up the marking process. Marking all answers to part questions together serves to improve consistency of marking as well as to improve efficiency.
After a set of answers have been opened, the marker may be presented with the option of allowing automatic marking for certain answers to certain questions. Automatic marking could include awarding a mark of zero if a student does not answer a question. For assessments in which a group of questions are presented and students are only required to submit answers to some of the questions, a mark of zero may be awarded for any answers over and above the number of questions a student is required to answer. Furthermore, if any multiple-choice questions or other types of selected questions are included then these may readily be automatically marked. Dynamic marking may be used for certain types of answers. If a first answer is marked, then all identical answers can be automatically awarded the same mark. This is particularly applicable for answers consisting of a single word or number or a small number of words or numbers.
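By way of illustration, the following is a minimal Java sketch of dynamic marking of the kind described above, in which a mark entered by a human marker is automatically propagated to identical answers. The class and method names, and the simple trim-and-lower-case normalisation, are assumptions made for illustration rather than details of the described system.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of dynamic marking: once a human marker has marked one
// answer, identical (normalised) answers are automatically given the same mark.
public class DynamicMarker {

    private final Map<String, Integer> markedAnswers = new HashMap<>();

    // Normalisation here is deliberately simple: trim and lower-case.
    private static String normalise(String answer) {
        return answer.trim().toLowerCase();
    }

    // Record a mark entered by the human marker.
    public void recordMark(String answer, int mark) {
        markedAnswers.put(normalise(answer), mark);
    }

    // Return the previously awarded mark for an identical answer, or null
    // if this answer still needs a human marking judgement.
    public Integer suggestMark(String answer) {
        return markedAnswers.get(normalise(answer));
    }

    public static void main(String[] args) {
        DynamicMarker marker = new DynamicMarker();
        marker.recordMark("Haemoglobin concentration", 2);
        System.out.println(marker.suggestMark("haemoglobin concentration ")); // 2
        System.out.println(marker.suggestMark("red cell count"));             // null
    }
}
```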
In order to speed up the marking process buttons may be displayed allowing the marker to sort the answers by marking status (marked/unmarked), by length or by similarity to keywords in the standard answer or another submitted answer (for instance sorting by the number of keywords contained in the submitted answers). The last option requires the marker to define keywords within the standard answer either during the question setting process or dynamically during marking of an assessment (for instance if an important, but unanticipated, word appears frequently in a number of students' answers). Keywords can be used to reflect key pieces of text in an answer that may attract marks. Similarly keywords can be used to identify common misconceptions.
As well as matching individual keywords, multiple word key phrases may be matched. Keywords may be highlighted in the answers, or used for a simple similarity sort as discussed above. Keywords may be matched only when there is an exact correlation between the text in an answer and the keyword. Alternatively, when setting keywords an examiner can define a regular expression to take into account possible variations in spelling a word or the order of words in a key phrase. A further option is to define a maximum error tolerance or edit distance for a word to match. For instance an edit distance of two corresponds to two small errors such as two letters being transposed. An edit distance may also be defined as being proportional to the length of the keyword being matched. Word variants, including misspellings, and context dependent synonyms limit the usefulness of keyword sorting. The use of an edit distance can mitigate these problems although this is achieved at the risk of inaccuracy. Additionally a list of expected synonyms can be applied. Context dependent synonyms may not always be predictable in advance, although in accordance with embodiments of the present invention once a new context dependent synonym is identified in a submitted answer the answer representation contained in the standard answer file can be updated by adding the synonym as a new keyword. As discussed above, when applying keywords to highlight and sort constructed text answers by keywords, keywords within submitted answers may be matched by using a regular expression to account for variations in spelling of the keyword.

In order to reduce the variability of constructed answers, the form of a constructed answer could be constrained, rather than allowing arbitrary free text. For instance, the user interface could be constrained to only allow numbers (for instance) in a particular format. Alternatively, for instance, for a question requiring the result of a calculation as the answer, the answer could be constrained to be a number typed into a box of limited size. In accordance with embodiments of the present invention, regular expressions can be used to apply constraints to what students can enter within a constructed text answer in a computer-aided assessment. For instance, and according to the type of question being asked, the answer format may be constrained, for instance to specified number formats (for instance, defining the format of an exponent), date formats, or more generally (for instance specifying that the answer should be no more than four words long). Constraining answer formats according to a regular expression offers the opportunity to increase the degree to which answers can be automatically marked. Should a student attempt to write an answer in the wrong format, one possible approach is to allow this but trigger a warning message to the student. Alternatively the student may be prevented from submitting that answer for marking.
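By way of illustration of the keyword matching with an edit distance tolerance described above, the following is a minimal Java sketch. The Levenshtein edit distance implementation and the proportional tolerance rule (roughly one permitted edit per four characters of keyword) are assumptions made for illustration; they are not taken from the described embodiments.

```java
// Illustrative sketch: matching a keyword against words of a submitted answer
// with an edit distance tolerance. The tolerance rule (roughly one edit per
// four characters of the keyword) is an assumption for illustration only.
public class KeywordMatcher {

    // Standard Levenshtein edit distance between two strings.
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // A word matches the keyword if it is within the allowed edit distance,
    // here made proportional to the keyword's length.
    static boolean matches(String keyword, String word) {
        int tolerance = Math.max(1, keyword.length() / 4);
        return editDistance(keyword.toLowerCase(), word.toLowerCase()) <= tolerance;
    }

    public static void main(String[] args) {
        System.out.println(matches("haemoglobin", "heamoglobin")); // true: two transposed letters
        System.out.println(matches("haemoglobin", "glucose"));     // false
    }
}
```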
When generating questions an examiner can define one or more regular expressions defining the required format of answers. In an embodiment of the present invention the Java regular expressions package is used. Examiners may either pick from a predetermined list of possible regular expressions, or advanced users can define a regular expression manually. When a regular expression for an answer format is defined a description of the required answer format can optionally be automatically inserted at the end of the question. Alternatively, the examiner can manually enter a description of the answer format.
Referring to Figure 5, this is a screenshot illustrating an interface forming part of the assessment setting and marking tool allowing a marker to define a regular expression to constrain the format of submitted answers for the question. A first drop down menu 30 is provided allowing the marker to choose a general category for a regular expression, for instance "number" as shown in Figure 5. A second drop down menu 31 allows the marker to choose a predefined regular expression, for instance "unsigned integers". An example of an answer meeting the constraint selected in drop down menu 31 is shown in text area 32. Alternatively, a customised regular expression can be generated by selecting button 33 labelled "custom". Selecting button 33 opens a further window allowing the user to define a new customised regular expression.
Button 34 labelled "add" allows the marker to add the regular expression selected from drop down menu 31 or custom defined to the regular expression generated so far for the answer format. By selecting button 34 a regular expression comprising a series of concatenated regular expressions can be built up. Three tabs are provided relating to the regular expression generated so far. A first tab 35a displays a text description of the constraint to be applied to the answer. In the example of Figure 5 a regular expression has been built up comprising three concatenated regular expressions. A submitted answer is required to have three parts: an alphanumeric string followed by a custom generated regular expression manually created by the marker and finally an unsigned integer. A second tab 35b displays the text to be displayed to a candidate student taking the assessment explaining the answer constraint. A third tab 35c defines the answer constraint in the form of a formal regular expression. Text box 36 allows the marker to test the answer constraint by entering a test answer and then selecting the test button 37. In the example of Figure 5 the marker has entered an alphanumeric string "apple" followed by a period "." (which may conform to the required custom generated regular expression) and then an unsigned integer "43". When a marker has entered a sample answer into text box 36 and selected test button 37 a message is displayed confirming whether or not the test answer meets the selected regular expression. Buttons 38a, 38b, 38c allow the user to undo the last change to the generated regular expression shown in tab 35a, redo any undone change or clear the whole generated regular expression respectively.
Button 39 labelled "apply marking constraint" allows the marker to apply the selected regular expression to constrain answers submitted to that question. Button 40 labelled "cancel" allows the user to cancel a previous application of the regular expression to the current question (which may restore any previous regular expression that has been overridden). A checkbox 41 is provided labelled "allow additional spaces in the answer".
If this checkbox is selected then any additional blank spaces in a student's answer are ignored when applying the regular expression constraint. For certain types of answers any additional blank spaces are irrelevant (for instance, some forms of mathematical answer).
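By way of illustration, the following is a minimal Java sketch of applying a regular expression answer constraint, including the option of ignoring additional spaces, using the standard java.util.regex package mentioned above. The example pattern loosely mirrors the Figure 5 example and is an assumption for illustration only.

```java
import java.util.regex.Pattern;

// Illustrative sketch of applying a concatenated regular expression constraint
// to a submitted answer, with optional removal of additional spaces (as for
// the "allow additional spaces in the answer" option described above).
public class AnswerConstraint {

    private final Pattern pattern;
    private final boolean ignoreSpaces;

    public AnswerConstraint(String regex, boolean ignoreSpaces) {
        this.pattern = Pattern.compile(regex);
        this.ignoreSpaces = ignoreSpaces;
    }

    public boolean accepts(String submittedAnswer) {
        String candidate = ignoreSpaces
                ? submittedAnswer.replaceAll("\\s+", "")
                : submittedAnswer.trim();
        return pattern.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        // Assumed example constraint: an alphanumeric string, a full stop,
        // then an unsigned integer (loosely mirroring the Figure 5 example).
        AnswerConstraint constraint =
                new AnswerConstraint("[A-Za-z0-9]+\\.\\d+", true);
        System.out.println(constraint.accepts("apple.43"));   // true
        System.out.println(constraint.accepts("apple . 43")); // true: spaces ignored
        System.out.println(constraint.accepts("apple,43"));   // false
    }
}
```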
A further area 42 labelled "add current selection" of the interface illustrated in Figure 5 allows advanced options for creating regular expressions, as an alternative to the method described above whereby selecting button 34 adds a selected regular expression to the regular expression already created. The advanced option allows the creation of sub-expressions, which can be built up recursively. Each sub-expression is wrapped within parentheses, as is the overall expression so far before the new sub-expression is added.
A sub-expression is generated as described above by either choosing a regular expression category and predefined expression using menus 30 and 31 or custom defining a regular expression via button 33. The generated regular expression appears in text field 43, which is labelled "add". Drop down menu 44 allows that sub-expression to be added either at the end of the overall regular expression so far (in which case it is simply appended to the end) or as an alternative to the overall regular expression so far (in which case the logical OR operator is applied between the existing regular expression and the new regular expression).
Button 45 labelled "create group" operates by opening parentheses when generating a new sub-expression in text field 43. Button 46 labelled "exit group" operates by closing parentheses when generating a new sub-expression in text field 43. Button 47 labelled "insert" functions as the add button 34 in that what is currently selected in text field 43 is added to the overall regular expression answer constraint.
The use of regular expressions to constrain answer formats typically significantly reduces the number of distinct answers submitted by large groups of students, resulting in an increase in speed of marking due to the same marks being allocated to identical answers after a first example has been marked.

Embodiments of the present invention allow students to answer questions by submitting constructed diagrams. In this context, constructed diagrams are considered to be diagrams that are composed from a set of basic components that may be tailored with labels (inside or outside of the component). Diagrams are an important part of many assessments. Diagrams formed from boxes and connectors can readily be drawn using conventional computer input devices. The resulting structures can be matched against each other to determine similarity. It is known that a simple heuristic process is effective in finding similarities between such diagrams, for instance between a submitted diagram and a standard answer diagram.
Many types of diagrams such as UML class diagrams, entity-relationship diagrams, chemical molecules and electronic circuits can be represented as boxes joined by connectors. Such a structure can in turn be represented by a graph. A graph is defined as a series of vertices, for which pairs of vertices are connected by edges. In the present context, each box is a vertex and each connector is an edge. In accordance with embodiments of the present invention graphs are matched in order to assess submitted answers and determine marks for constructed graphs in an assessment. A human computer collaborative approach is used, as discussed above, to present results more efficiently to a human marker, rather than fully automatic marking. For instance, diagrams that are functionally identical (that is, they can be represented by the same graph) are clustered, thus reducing the workload of a marker.
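By way of illustration, the following is a minimal Java sketch of representing a box-and-connector diagram as a labelled graph, with boxes as vertices and connectors as edges. The class names and the crude structural summary are assumptions for illustration; they do not reflect the internal format of the described system.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: a constructed box-and-connector diagram represented as
// a labelled graph, with boxes as vertices and connectors as edges. The class
// names are assumptions for illustration, not the system's internal format.
public class DiagramGraph {

    record Box(String id, String label) {}
    record Connector(String fromId, String toId, String label) {}

    private final List<Box> boxes = new ArrayList<>();
    private final List<Connector> connectors = new ArrayList<>();

    public void addBox(String id, String label) {
        boxes.add(new Box(id, label));
    }

    public void connect(String fromId, String toId, String label) {
        connectors.add(new Connector(fromId, toId, label));
    }

    // A crude structural summary that two functionally identical diagrams
    // would share, regardless of where boxes are placed on the drawing area.
    public String structuralSummary() {
        return boxes.size() + " boxes, " + connectors.size() + " connectors";
    }

    public static void main(String[] args) {
        DiagramGraph g = new DiagramGraph();
        g.addBox("b1", "Book");
        g.addBox("b2", "Chapter");
        g.connect("b2", "b1", "is part of");
        System.out.println(g.structuralSummary()); // 2 boxes, 1 connectors
    }
}
```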
In accordance with embodiments of the present invention, if students are expected to submit a diagram as an answer to a question in a computer-aided assessment the assessment system further comprises a diagram drawing tool. An exemplary diagram drawing tool is shown in Figure 6. The diagram tool consists of a palette 50 and a drawing area 51. The palette 50 is further divided into a box palette 52 and a connector palette 53. In order to minimise the potential complexity of submitted answers the number of boxes and connector types is limited. In order to construct diagrams within the drawing area boxes may be selected from the boxes palette 52 and placed anywhere in the drawing area 51, for instance by dragging a box using a mouse. Connectors can then be selected and used to join boxes together. Both boxes and connectors can be labelled using standard keyboard characters by selecting a label editor 54. Special characters can be incorporated within a label by using a special character editor 55. The palette is fully customisable when setting an assessment, and boxes and connectors can be created according to the kind of diagram expected within the answer.

Palettes are created using a palette builder tool. A screenshot of an exemplary palette builder tool is shown in Figure 7. A first portion of the palette builder comprises a palette editor 60 where new boxes and connectors are defined. Test drawing area 61 allows a partially completed palette 62 to be experimented with. A series of selectable menus allow aspects of the boxes and connectors to be defined, including defining label locations 63 and defining line styles 64.

In order to allow for versatile partially automated marking of constructed answers a generic representation metaformat for storing complex structures as graph objects has been defined in accordance with embodiments of the present invention. The metaformat is particularly applicable to constructed diagram answers, however it will be appreciated that it may be used for matching any kind of constructed answer against a predefined but dynamically extendable and adjustable model. As well as diagrammatic constructed answers, the generic metaformat has particular utility for matching constructed answers comprising mathematical expressions, software programs and short factual text. The metaformat, referred to herein as a Gree, comprises a dynamically extendable AND/OR tree for which the leaf nodes are overlapping graph object fragments. The Gree metaformat allows for extendibility, reusability and modularity of a predefined standard answer. The Gree metaformat can efficiently represent data structures along with any extra information needed. Submitted answers can be compared to the standard answer stored in the Gree metaformat and partially or fully matched. Alternatively, submitted answers may be compared with another submitted answer.
The process of matching constructed answers to a standard answer using the Gree metaformat comprises a modular scoring approach. Parts of the standard answer are separately matched to parts of a submitted answer. This allows a marking scheme to be accurately defined (though later dynamically amended if necessary) in which the process of awarding individual marks for the presence or absence of certain features of an answer is explicitly recorded. Multiple alternative acceptable parts of an answer, each of which would earn the same portion of the marks, may be defined. Furthermore, different parts of the standard answer may be weighted differently according to their relative importance in terms of providing a correct answer to the question.
The use of the Gree metaformat in marking constructed answers, in particular for constructed diagram answers, allows marking judgements to be made in a systematic and consistent manner across a large number of submitted answers. The matching process may alternatively be considered to be a simplified yet quantitative unordered regular expression check for complex structures in which the Gree is the pattern and a submitted answer converted to an answer metagraph is the subject.
Referring to Figure 8, this represents a standard answer represented as a Gree for a question in which students are asked to draw a skeleton class diagram defining the relationships between five defined classes relating to the operation of an online book information system. A number of correct answers to the question are possible, two of which are shown in Figures 9A and 9B. The Gree shown in Figure 8 is intended to show concisely the range of possible answers by separating out individual portions of correct answers.
The Gree metaformat shown in Figure 8 comprises a series of nodes 70 labelled A to H. Each node 70 comprises a possible part of a correct answer. Each part answer comprises a portion of an acceptable class diagram depicted diagrammatically in combination with a series of parameters of the answer portion defining aspects of the answer portion that may be matched in order for the node to be matched. The nodes 70 are connected in a tree structure by an AND node 71 and three OR nodes 72. For a submitted answer to be awarded all available marks each part answer node 70 must form part of the submitted answer if that node 70 is placed directly under an AND node 71. For a group of part answer nodes 70 placed directly under an OR node 72, only one of the group of part answer nodes 70 need form part of the submitted answer if the submitted answer is to be awarded all available marks. For the Gree example of Figure 8, to be awarded full marks a submitted answer must thus contain the following part answer nodes: (A OR B) AND C AND (D OR E) AND F AND (G OR H). Each part answer node 70 includes the marks to be awarded if that part answer is included in the submitted answer. The content of the part answer nodes may comprise partially overlapping elements of a submitted answer. For instance, the alternative part answers shown in nodes A and B both contain a chapter class and a book class. Nodes A and B are directed to marking the relationship indicated between the chapter and book classes. The same two classes also appear in nodes C and F, which are concerned with awarding marks for the relationship between the chapter and book classes and other classes.

In order to match a submitted answer, such as is shown in Figure 10, using the Gree shown in Figure 8, the matching process starts by considering the Gree's root node and continues visiting the nodes down the tree in a predetermined manner. Once a part answer node 70 is encountered (i.e. a node that is not an AND node 71 or an OR node 72) a score for that node compared to the submitted answer is calculated. The allocation of marks can be weighted for each node differently, according to the relative importance for that node of the structure of the diagram fragment or the attributes listed in the node for that part answer. For all nodes 70 depending directly from an OR node 72 the node 70 providing the highest score is considered to be the closest match and is thus considered to be the score for that OR node 72. For all nodes depending directly from an AND node 71 the scores are added. Once the matching process has completed the score given by the root node (in the example of Figure 8, the single AND node 71) is the mark awarded to the submitted answer.
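By way of illustration, the following is a minimal Java sketch of scoring a submitted answer against an AND/OR tree of part answer nodes in the manner described above: AND nodes sum the scores of their children, OR nodes take the best matching child, and leaf nodes award their marks according to how well the corresponding fragment matches. The toy keyword-presence matcher stands in for real diagram fragment matching and is an assumption for illustration.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Illustrative sketch of scoring a submitted answer against an AND/OR tree of
// part-answer nodes: AND nodes sum their children's scores, OR nodes take the
// best-matching child, and leaf nodes award their marks according to how well
// the corresponding fragment matches. The leaf matcher is an assumption here.
public class AndOrTreeScorer {

    interface Node {
        double score(String submittedAnswer);
    }

    record AndNode(List<Node> children) implements Node {
        public double score(String submittedAnswer) {
            return children.stream().mapToDouble(c -> c.score(submittedAnswer)).sum();
        }
    }

    record OrNode(List<Node> children) implements Node {
        public double score(String submittedAnswer) {
            return children.stream().mapToDouble(c -> c.score(submittedAnswer)).max().orElse(0);
        }
    }

    // A leaf carries the marks available and a matcher returning a value
    // between 0 and 1 indicating how well the fragment is present in the answer.
    record Leaf(double marks, ToDoubleFunction<String> matcher) implements Node {
        public double score(String submittedAnswer) {
            return marks * matcher.applyAsDouble(submittedAnswer);
        }
    }

    public static void main(String[] args) {
        // (A OR B) AND C, with a toy keyword-presence matcher standing in for
        // real diagram-fragment matching.
        Node root = new AndNode(List.of(
                new OrNode(List.of(
                        new Leaf(2, a -> a.contains("chapter") ? 1 : 0),
                        new Leaf(2, a -> a.contains("section") ? 1 : 0))),
                new Leaf(3, a -> a.contains("book") ? 1 : 0)));
        System.out.println(root.score("a book is made of chapters")); // 5.0
    }
}
```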
As noted above, the Gree metaformat for the standard answer can be adapted dynamically during the marking process in the light of previously unconsidered alternative correct parts of submitted answers. Such adaptation involves adding additional nodes 70 to the Gree. For instance, an additional alternative diagram fragment may be added to the Gree underneath an OR node 72. Alternatively, the whole Gree may be revised, for instance by splitting the marks allocated to one node of the Gree into two parts and generating two new nodes to replace that node arranged under an AND node such that both nodes must be matched in order to achieve all of the available marks.
The process of adding a new node to the Gree or revising an existing node dynamically during marking of an assessment is the same as the process of generating a new Gree. Once a Gree has been amended answers that have previously been marked may be automatically remarked in order to take account of the changes. For instance, if another student has submitted an answer that also contains a portion corresponding to the node added to the Gree then the mark awarded to that answer is likely to change. This represents a significant improvement over conventional paper based marking methods, for which if the marking scheme changes part way through the process of marking a large number of answers there is no choice but to review all answers marked so far to determine if the marks awarded must be changed.

The structure of a Gree is stored in XML allowing it to be imported into the marking tool described above in connection with Figure 2. The marking process is able to continue in the same manner as for the marking of multiple submitted text answers. For instance, portions of submitted answers that match different nodes 70 of the Gree may be highlighted. A human marker is able to review the marks awarded automatically by matching the Gree to a submitted answer, and either amend the marks or amend the Gree as appropriate. As discussed above, the submitted answers could be grouped automatically by matching to the Gree standard answer before presentation to the human marker.

As discussed above, answers may be sorted by keyword similarity to a standard answer (or to a particular submitted answer). One method of sorting a large number of answers by similarity to keywords is to measure the occurrence of keywords or phrases in order to try to capture the essential structure of constructed answers. In accordance with embodiments of the present invention the distance between keywords or phrases is also measured in order to sort constructed answers.
Answers may be grouped according to relative similarity to keywords within a standard answer based on generic text clustering techniques drawn from natural language engineering. Clusters can be displayed in an area 28b (Figure 4). Natural language engineering techniques can be used in computer aided assessment for sorting answers.
In accordance with embodiments of the present invention lightweight, robust, generic natural language engineering techniques for text clustering are applied within a human computer collaborative assessment method in order to streamline the process of marking constructed text answers. Clustering is the process of grouping similar objects together. A measurement of similarity or distance is used to group objects within a set into subsets or clusters. Clustering is known in other fields such as bioinformatics for finding the closest neighbours of a document, or for organising search engine results on the Internet. In the context of computer-aided assessment clustering of text answers offers a number of benefits. Clustering similar answers can help the human marker as it provides a review mechanism to check that marking is consistent, and potentially offers a basis for rapid formative feedback by allowing the marker to provide feedback per cluster, rather than individually per answer. Embodiments of the present invention relate to the clustering of complete texts of short answers.
Clustering answers offers a trade-off. The larger the clusters the fewer discrete marking judgements need to be made. However, for large clusters there is less similarity between the answers within the cluster. Particularly for summative assessment, it is commonly necessary to have a high degree of accuracy, which typically can limit the size of clusters.
A measure of similarity known for use within information retrieval systems is the vector space model. Documents are expressed as vectors within an n-dimensional space where n is the total number of unique terms (words) that appear in any of the documents in a set of documents to be clustered. The similarity between two documents is calculated as the distance between their respective vectors. In accordance with an embodiment of the present invention the vector space model is applied within a human computer collaborative assessment method for calculating the similarity of two answers (one of which may be a standard answer) to a question.
Referring now to Figure 11, this illustrates a simplified example of how the vector space model can be used to determine how similar a first submitted answer is to a second submitted answer.
A first step of the vector space model is the creation of a term by document matrix, comprising a list of all the terms (words) contained in any submitted answers and a count of the number of times they appear in each submitted answer or the standard answer. Each term will form a dimension in the vector space. In the example of Figure 11, three terms within a pair of answers are defined, terms A, B and C. In Figure 11, these terms are represented by three orthogonal axes 80, 81 and 82 respectively. It will be appreciated that this process may be readily automated. A vector may be defined for each answer or the standard answer. The vector comprises an ordered list of integers relating to how often each term occurs. Figure 11 represents two answers as vectors 83, 84 within the three dimensional space defined by axes 80, 81, 82.

A second step of the vector space model is to calculate the similarity between two vectors. One option is to calculate the Euclidean distance between the vectors giving a result between zero and one. The Euclidean distance is schematically represented in Figure 11 by line 85. A result of zero indicates that two answers share nothing in common and a result of one indicates that (after pre-processing) the answers are the same (at least in so far as they contain the same terms, though not necessarily in the same order). This similarity metric can be used to cluster the answers. An alternative similarity metric comprises calculating the cosine of the angle between two vectors. For the example of Figure 11, this is the cosine of angle 86.
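By way of illustration, the following is a minimal Java sketch of the vector space model described above: each answer is converted to a vector of term counts over the combined vocabulary, and similarity is computed as the cosine of the angle between the two vectors. The tokenisation and the absence of pre-processing are simplifications made for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the vector space model: each answer becomes a vector
// of term counts over the combined vocabulary, and similarity is the cosine of
// the angle between the two vectors (1.0 = same terms, 0.0 = nothing shared).
public class VectorSpaceSimilarity {

    static Map<String, Integer> termCounts(String answer) {
        Map<String, Integer> counts = new HashMap<>();
        for (String term : answer.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    static double cosineSimilarity(String answerA, String answerB) {
        Map<String, Integer> a = termCounts(answerA);
        Map<String, Integer> b = termCounts(answerB);
        Set<String> vocabulary = new LinkedHashSet<>(a.keySet());
        vocabulary.addAll(b.keySet());
        double dot = 0, normA = 0, normB = 0;
        for (String term : vocabulary) {
            int x = a.getOrDefault(term, 0);
            int y = b.getOrDefault(term, 0);
            dot += x * y;
            normA += x * x;
            normB += y * y;
        }
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        System.out.println(cosineSimilarity(
                "haemoglobin concentration of the blood",
                "blood haemoglobin concentration")); // high, close to 1
        System.out.println(cosineSimilarity(
                "haemoglobin concentration", "red cell count")); // 0.0
    }
}
```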
Several pre-processing steps can be performed in order to improve performance. Referring to Figure 12, this schematically represents in the form of a flow chart the process of clustering answers in accordance with an embodiment of the present invention, including pre-processing steps S0. Pre-processing includes a spelling correction step S1 (which may be manual or automatic) and a stop word removal step S2 (stop words being common words of little interest such as "the"). Stop word removal may be performed using a fixed list of stop words. Alternatively, the list of stop words may be defined dynamically during the marking process. A stemming step S3 removes the suffixes of words to leave a common stem (for example "interpreter" is stemmed to "interpret"). Different weights may be applied to different terms within answers at step S4 in order to increase or decrease their relative importance in determining answer similarity. The weighting can be used to increase the importance of keywords within submitted answers, or may be binary in order to ignore certain terms. The process of determining a similarity metric as discussed above follows the pre-processing steps S0 at step S5.
In some embodiments agglomerative hierarchical clustering is used to cluster constructed text answers in computer-aided assessment at step S6 of Figure 12. Agglomerative hierarchical clustering begins by assigning each object, in this case each submitted answer, to a separate cluster. In a first step the two most similar clusters are determined. In a second step these two clusters are combined.
The process of determining the two most similar clusters is as follows. Each submitted answer (that is, point in a cluster) is stored as a vector. The distance between a pair of vectors may be simply calculated, for instance using a similarity metric such as the cosine of the angle between the vectors. Average linkage may be used as the measure of similarity between two clusters. Average linkage is calculated by calculating for each point within a first cluster the distance from that point to all of the points in a second cluster, and calculating the mean distance. The process is then repeated for all points within the first cluster, and the mean of the calculated mean distances for each point in the first cluster is calculated. The average linkage between two clusters corresponds approximately to the distance between the centres of the clusters.

The two steps are repeated until a predetermined stop point is reached. This may, for instance, be defined as the point at which the number of clusters is reduced to a predetermined proportion of the initial number of clusters. Alternatively, a measure of the minimum allowable similarity between elements of a cluster may be used to determine the stop point. The metric "average within cluster similarity" is defined as a measure of how similar answers are to each other within any given cluster. The distance between each pair of points within a cluster, for instance the cosine of the angle between the pair of vectors defined by the points, is calculated and stored within a two dimensional matrix. The average within cluster similarity is calculated as the mean of the distances between all of the points within the same cluster. A value approaching one indicates that all of the answers within a cluster are closely similar to one another. A minimum average within cluster similarity figure may be defined as the stop point for agglomerative hierarchical clustering.
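By way of illustration, the following is a minimal Java sketch of agglomerative hierarchical clustering with average linkage as described above. Pairwise similarities between answers are assumed to have been computed already (for instance as cosine similarities), and the minimum-similarity stop point is passed in as a parameter; these details are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of agglomerative hierarchical clustering with average
// linkage: every answer starts in its own cluster, and the two most similar
// clusters are repeatedly merged until no pair exceeds a minimum similarity.
public class AgglomerativeClusterer {

    // similarity[i][j] is the similarity between answer i and answer j (0..1).
    public static List<List<Integer>> cluster(double[][] similarity, double minSimilarity) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < similarity.length; i++) {
            clusters.add(new ArrayList<>(List.of(i)));
        }
        while (clusters.size() > 1) {
            int bestA = -1, bestB = -1;
            double best = -1;
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double link = averageLinkage(clusters.get(a), clusters.get(b), similarity);
                    if (link > best) { best = link; bestA = a; bestB = b; }
                }
            }
            if (best < minSimilarity) break;                     // stop point reached
            clusters.get(bestA).addAll(clusters.remove(bestB));  // merge the closest pair
        }
        return clusters;
    }

    // Average linkage: the mean pairwise similarity between members of the two clusters.
    private static double averageLinkage(List<Integer> a, List<Integer> b, double[][] sim) {
        double total = 0;
        for (int i : a) for (int j : b) total += sim[i][j];
        return total / (a.size() * b.size());
    }

    public static void main(String[] args) {
        double[][] sim = {
                {1.0, 0.9, 0.1},
                {0.9, 1.0, 0.2},
                {0.1, 0.2, 1.0}};
        System.out.println(cluster(sim, 0.5)); // [[0, 1], [2]]
    }
}
```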
Agglomerative hierarchical clustering may be used in combination with the vector space model, with the vector space model being used to determine the similarity of submitted answers to a standard answer or a particular submitted answer, and agglomerative clustering being used to reduce the number of clusters output from the clustering process at step S7.
Referring to Figure 13, this is a screen shot illustrating a cluster of answers from a set of submitted answers that have been clustered using agglomerative hierarchical clustering. The screen shot is similar in layout to the screen shot of the marking tool shown in Figure 4. A question portion 90 of the screen displays the questions answered by the students. The question set is "what single measurement would you make to confirm that an individual is anaemic?". The set of submitted answers is clustered as described above in relation to Figure 12. A series of clusters is created and each cluster may be selected to be displayed from a list of clusters 91. A standard answer 92 (labelled as model answer) to the question is displayed. All of the submitted answers 93 within the selected cluster are displayed below the model answer 92. The submitted answers within the selected cluster comprise 13 minor variations of "haemoglobin concentration of the blood". The selected cluster comprises 11 distinct text strings, which were correctly reduced in number by the pre-processing steps described in relation to Figure 12.
The value of clustering in reducing the workload of a human marker in human computer collaborative assessment varies significantly with the type of question. Clustering is most effective at grouping submitted answers that are to be awarded similar marks for very short text answers. However, answers where word order is significant, or where students are required to submit an original example, present greater difficulty.

The clustering techniques described above offer considerable benefits as has been described. However, the vectors created from submitted answers can become very large. Processing of the type described above and carried out using large vectors is computationally expensive. In order to improve processing performance, various methods can be usefully employed.

In some embodiments of the present invention, each submitted answer is compared to a standard answer so as to produce data indicating a difference between the submitted answer and the standard answer. Typically, the difference will be considerably shorter than the submitted answer. This is particularly true for tightly constrained answers. It will be appreciated that the differences can be represented by a vector of lower dimensionality than the submitted answers. For example, if a question requires that a student names three components of a system, many students are likely to identify two components correctly. Only the third, incorrect part of a student's answer will be represented by the difference data and consequently encoded in a vector. When respective difference data items have been created for each submitted answer these difference data items can be represented as vectors and then processed as described above. Given that these vectors have a lower dimensionality, it will be appreciated that the vector processing can be carried out considerably more efficiently.
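By way of illustration, the following is a minimal Java sketch of the difference data idea described above: each submitted answer is reduced to the terms on which it differs from the standard answer before any vectors are built. The simple, order-insensitive set difference used here is an assumption for illustration only.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative sketch of the difference-data idea: represent each submitted
// answer only by the terms on which it differs from the standard answer, so
// that the vectors built afterwards have far fewer dimensions. The simple
// set difference used here is an assumption; word order is ignored.
public class AnswerDifference {

    static Set<String> terms(String text) {
        Set<String> result = new LinkedHashSet<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) result.add(t);
        }
        return result;
    }

    // Terms present in the submitted answer but absent from the standard answer.
    static Set<String> difference(String submitted, String standard) {
        Set<String> diff = terms(submitted);
        diff.removeAll(terms(standard));
        return diff;
    }

    public static void main(String[] args) {
        String standard = "keyboard mouse monitor";
        System.out.println(difference("keyboard mouse speaker", standard)); // [speaker]
        System.out.println(difference("keyboard mouse monitor", standard)); // [] (exact match)
    }
}
```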
It should be noted that where an answer exactly matches the standard answer the difference will be an empty set.
From the preceding description it can be seen that the generation of difference data as described above provides a convenient way of processing data items to allow clustering. Additionally or alternatively, in some embodiments of the invention submitted answers are summarised. That is, where answers comprise relatively long and relatively free form text, such answers may be compressed using natural language engineering techniques of text summarisation. Various summarisation methods are known. Such methods operate by reducing the length of text by extracting sentences containing significant terms or keywords, where these keywords are specified in advance or identified automatically. By summarising submitted answers in this way it is then possible to process the summarised answers in order to allow clustering of the type described above. In this way, it can again be appreciated that the summarised answers will be of lower dimensionality than the originally submitted answers. Therefore, processing the summarised answers allows the clustering to be carried out in a more efficient way.
In some embodiments of the present invention a method is provided for processing textual data items by applying to each data item a plurality of methods for automatic summarisation. Many automatic summarisation algorithms perform an analysis of word frequencies in a given text to determine which sentences are most informative and extract those sentences. A number of automatic summarisation algorithms will be known to those skilled in the art. For example, an algorithm based entirely on word frequency, with medium-frequency words being considered to be the best identifiers of the topic of a text, is described in Luhn, H.P. 1958: "The automatic creation of literature abstracts", IBM Journal of Research and Development 2(2): 159-165, April 1958. An alternative algorithm was proposed by Hovy and Lin in 1997. This method is referred to as the location method and is used in the SUMMARIST system. The location method relies on the position of a sentence in the text to indicate its importance and is described in E. Hovy and C-Y Lin: "Automated Text Summarization in SUMMARIST", Proceedings of the Workshop on Intelligent Scalable Text Summarization. Other suitable algorithms are described in Mani, Inderjeet and Maybury, Mark T. (1999), 'Advances in Automatic Text Summarization', MIT Press, including an algorithm for "chain computing" (pages 111 to 112) which considers how words in a text are related in meaning and cohesion. The contents of each of the references cited in this paragraph are incorporated herein by reference.

In some embodiments of the present invention, summaries are generated for a single answer by a plurality of different algorithms. The resulting summaries are compared using techniques such as those described above and the degree of similarity of the summaries to each other is determined. The inventors have surprisingly found by experiment that, for any given text, the degree of similarity of the summaries generated by a plurality of differing summarisation algorithms corresponds closely to a human assessment of the quality of writing style. Experiments have shown that answers for which summaries generated by different algorithms are very similar are likely to be judged to be well written by human assessors, while answers for which the summaries vary greatly are likely to be judged to be poorly written.
In some embodiments of the present invention, sequence alignment techniques are used to match a student's answer to one or more standard answers. In this way, it is possible to get an indication of an answer's correctness automatically. A student's answer is aligned to a standard answer using a modified Needleman-Wunsch algorithm described in Needleman, S. B. and Wunsch, C. D. (1970) 'A general method applicable to the search for similarities in the amino acid sequence of two proteins', Journal of Molecular Biology, 48(3): 443-53, the contents of which are incorporated herein by reference.
The process of sequence alignment first creates a similarity matrix, using a process shown in the flowchart of Figure 14. The processing of Figure 14 is described with reference to a simple example, involving aligning a student answer of:
"what does she like better oranges or apples" with a standard answer of:
"does she like oranges better than apples" At step S8 of Figure 14 the standard answer is tokenised into i tokens. In this example the tokens are generated at word level giving i = 7. Processing then passes to step S9 where the student answer is tokenised into j tokens, again at the word level, givingy = 8. The tokens may be word level tokens as in this example, character level tokens or any tokens generated by a tokeniser as is known to those skilled in the art. After both the standard answer and the student answer have been tokenised processing passes to step SlO. At step SlO, a first row and a first column of a matrix are initialised. In Matrix row [0], each column is initialised with values in the range from 0 to i. That is matrix cell [O][O] is initialised to 0, matrix cell [I][O] is initialised to 1 and so on until matrix cell [/][0] is initialised to i, in this case i=l
In Matrix column [0], each row is initialised with values in the range from 0 to j. Figure 15 shows a similarity matrix after the initialisation step S10.
After appropriate initialisation of the first row and first column of the similarity matrix, processing passes from step S10 to step S11. Steps S11 to S16 populate the remaining cells in the similarity matrix.
At step S11 two counter variables m and n are initialised to values of 1. The counter variable m counts through the rows of the matrix, while the counter variable n counts through the columns of the matrix.
The value for a cell [m][n] of the similarity matrix is calculated at step S12, according to equation (1):

matrix[m][n] = min( matrix[m-1][n] + 1, matrix[m-1][n-1] + edit(m,n), matrix[m][n-1] + 1 )    (1)
where edit(m,n) is a function arranged to calculate an edit distance between the mth token of the student answer and the nth token of the standard answer. The edit distance is defined to be the number of operations required to transform the mth token of the student answer into the nth token of the standard answer.
By way of explanation, it can be seen that the first and third inputs to the minimum function of equation (1) include an addition of 1. The nature of the algorithm used means that the inclusion of the addition of 1 in the first and third inputs is such that the second input will only provide the minimum if there is no more than one character difference between the mth token of the student answer and the nth token of the standard answer. In this case a gap penalty of 1 is said to be used. When the second input to the minimum function provides the minimum value, the algorithm treats the mth token of the student answer as matching the nth token of the standard answer. The use of a gap penalty of 1 means that a single character difference will still allow particular tokens to be considered as matching. This allows for, for example, simple spelling mistakes. Similarly, if an addition of, say, 2 was included in the first and third inputs to the minimum function, up to two characters could differ as between compared tokens while still allowing tokens to be considered to match. If more than one input to the minimum function of equation (1) provides the same value, this indicates that there may be more than one optimal alignment. In this case, one of the minimum outputs may be chosen arbitrarily.
Processing now passes to step S13 where it is determined whether m references the final row of the current column n of the matrix. If it does not, processing passes to step S14 where m is set to reference the next row in the current column of the matrix. Processing then passes back to step S12. If at step S13 it is determined that m does reference the final row of the current column n, processing passes to step S15. At step S15 it is determined whether n references the final column of values in the matrix. If it does not, processing passes to step S16 where n is set to reference the next column of values in the matrix and m is set to 1 such that it references the first (non-initialised) row of values in the current column n, and processing passes back to step S12. If at step S15 it is determined that n does reference the final column of values in the matrix, processing passes to step S17 and the completed matrix is returned.
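By way of illustration, the following is a minimal Java sketch of building the similarity matrix according to the process of Figure 14 and equation (1). Word-level tokenisation is used, and edit(m,n) is taken to be the character-level Levenshtein distance between the two tokens; both choices are assumptions made for illustration.

```java
// Illustrative sketch of building the similarity matrix of Figure 14 using
// equation (1). Tokens are words, and edit(m, n) is taken here to be the
// character-level Levenshtein distance between the two tokens (an assumption;
// any token-level edit distance could be substituted).
public class SimilarityMatrix {

    public static int[][] build(String standardAnswer, String studentAnswer) {
        String[] std = standardAnswer.toLowerCase().split("\\s+");   // i tokens
        String[] stu = studentAnswer.toLowerCase().split("\\s+");    // j tokens
        int[][] matrix = new int[stu.length + 1][std.length + 1];
        for (int n = 0; n <= std.length; n++) matrix[0][n] = n;      // first row: 0..i
        for (int m = 0; m <= stu.length; m++) matrix[m][0] = m;      // first column: 0..j
        for (int m = 1; m <= stu.length; m++) {
            for (int n = 1; n <= std.length; n++) {
                matrix[m][n] = Math.min(
                        Math.min(matrix[m - 1][n] + 1, matrix[m][n - 1] + 1),
                        matrix[m - 1][n - 1] + edit(stu[m - 1], std[n - 1]));
            }
        }
        return matrix;
    }

    // Character-level Levenshtein distance between two tokens.
    static int edit(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        int[][] m = build("does she like oranges better than apples",
                          "what does she like better oranges or apples");
        // The bottom right-hand cell holds the overall alignment score.
        System.out.println(m[m.length - 1][m[0].length - 1]);
    }
}
```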
Figure 16 shows the similarity matrix generated for the present example by the processing of Figure 14. The similarity matrix is used to calculate the best global alignment of the standard answer and the student answer. This is accomplished by backtracking through the matrix along the optimum route starting from the bottom right hand corner. The optimum route in the example shown in Figure 16 is highlighted in bold characters. To determine the optimum route through the table of Figure 16 a determination is carried out based upon how the value of a cell of the matrix was determined. The first step is to determine how the value of the current matrix cell [m][n] was calculated. This effectively involves determining which of the inputs to the minimum function of equation (1) in the matrix generation stage generated the value for inclusion in the matrix.
This can be carried out by determining which of equations (2), (3) and (4) is true:

matrix[m][n] == matrix[m-1][n-1] + edit(m,n)   (2)
matrix[m][n] == matrix[m-1][n] + 1             (3)
matrix[m][n] == matrix[m][n-1] + 1             (4)
where == is used to indicate equality.
If equation (2) is satisfied, the next matrix cell to be processed is [m-1][n-1]. Otherwise, if equation (3) is satisfied, the next matrix cell to be processed is [m-1][n]. Otherwise, if equation (4) is satisfied, the next matrix cell to be processed is [m][n-1]. If more than one of equations (2), (3) and (4) is satisfied, this indicates that there may be more than one optimal alignment. In this case an action corresponding to one of the satisfied equations may be selected arbitrarily.
This process is repeated until matrix cell [0][0] is reached. Generation of a route through the matrix as described above allows the student's answer and the standard answer to be aligned. Table 1 shows the alignment generated for the present example.
-     does  she  like  -       oranges  -   better  than  apples
what  does  she  like  better  oranges  or  -       -     apples

Table 1
As the route generated is optimal, the value in the bottom right hand cell represents the minimum number of modifications required to transform the student answer into the standard answer. The lower the score in the bottom right hand cell, the closer the student's answer is to the standard answer.
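Backtracking according to equations (2) to (4) can be sketched as follows, reusing the illustrative helper functions from the sketch above. The tie-breaking order and the printed output format are assumptions made for illustration.

```python
# Illustrative backtracking over the similarity matrix using equations (2)-(4).
# Reuses edit() and similarity_matrix() from the sketch above; ties are broken
# arbitrarily by testing equation (2) first.

def backtrack(matrix, student, standard, gap=1):
    """Walk from the bottom right cell towards [0][0], emitting two gap-padded rows."""
    m, n = len(student), len(standard)
    row_student, row_standard = [], []
    while m > 0 or n > 0:
        if m > 0 and n > 0 and matrix[m][n] == matrix[m - 1][n - 1] + edit(student[m - 1], standard[n - 1]):
            row_student.append(student[m - 1])     # equation (2): tokens aligned
            row_standard.append(standard[n - 1])
            m, n = m - 1, n - 1
        elif m > 0 and matrix[m][n] == matrix[m - 1][n] + gap:
            row_student.append(student[m - 1])     # equation (3): gap in standard answer
            row_standard.append("-")
            m -= 1
        else:
            row_student.append("-")                # equation (4): gap in student answer
            row_standard.append(standard[n - 1])
            n -= 1
    return list(reversed(row_student)), list(reversed(row_standard))


student = "does she like oranges better than apples".split()
standard = "what does she like better oranges or apples".split()
matrix = similarity_matrix(student, standard)
aligned_student, aligned_standard = backtrack(matrix, student, standard)
print(" ".join(aligned_student))
print(" ".join(aligned_standard))
print("score:", matrix[len(student)][len(standard)])   # bottom right hand cell
```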
A plurality of standard answers can be provided, each indicating an acceptable answer to a question set. A student's answer can be compared with each of the plurality of standard answers to determine the standard answer which the student's answer matches most closely. This determination can be carried out using the similarity score between the student's answer and each standard answer, the similarity score being the value included in the bottom right hand cell of the corresponding matrix.
The similarity score between a student's answer and a standard answer may also be used to automatically assign an initial score to the student's answer. The higher the value of the bottom right hand cell of the matrix, the lower the assigned mark. In addition to scoring a student's answer, embodiments of the present invention may utilise the generated alignment to highlight and classify errors in a student's answer. Figure 17 is a screenshot of a marking tool which uses alignment of the type facilitated by the methods described above to support a human marker. The example is based upon a translation question 95 requiring translation of the displayed German language text "Hat sie Orangen lieber als Aepfel?" into English. A standard answer 96 to the question is also displayed.
A set of student answers 97 is shown below the standard answer 96. Each student answer 97a, 97b, 97c, 97d has a corresponding area showing an initial score automatically allocated to the answer by the alignment process described above. As student answers may contain variants that have not been anticipated in the set of standard answers, the present invention provides facilities for Human Computer Collaborative marking through buttons 99. The buttons 99 allow a human marker to alter the initial marks generated according to the similarity matrix. Errors in the student text may be classified and highlighted according to the error classification key 100, which provides differing highlighting for different types of error. In student answer 97a the word "appels" is highlighted to indicate a spelling mistake. In student answers 97b and 97c, errors are highlighted to indicate erroneous words (rather than spelling mistakes).
In some embodiments of the present invention, methods are provided to allow annotating of student essays on screen. Assessment of essays on screen often requires markers to annotate the essays, either to give feedback to students, or to identify where marks are gained and lost, or both. The present invention allows a marker to make arbitrary annotations on a particular text submitted by a student. Sequence alignment techniques are used to display the differences between the annotated text and the original. Annotations may be classified into safe annotations and unsafe annotations. In some circumstances it is important to preserve the original text, such as during summative assessment, whereas in other circumstances, such as formative assessment, preserving the original text is often of less importance. Safe annotations are annotations that preserve the original text, such as annotations comprising additions or formatting changes. Unsafe annotations are annotations that do not preserve the original text, such as annotations comprising modifications, deletions or the reordering of words. Methods are provided to determine whether a text has been safely annotated and, if it has not, to show where the unsafe annotations are in the annotated text.
An example algorithm to determine if a text (e.g. an answer) has been safely annotated will now be described with reference to Figure 18. At step S20 all formatting is removed from an original text and a corresponding annotated text. Processing then passes to step S21. At step S21, each text is tokenised into a sequence of words, resulting in two sequences of tokens: a sequence generated from the original text and a sequence generated from the annotated text. Processing then passes to step S22. At step S22 a reference variable m is initialised to reference a first token in the sequence of tokens generated from the original text and a second reference variable n is initialised to reference a first token in the sequence of tokens generated from the annotated text. Processing then passes to step S23. At step S23 it is determined whether the mth token in the sequence of tokens generated from the original text matches the nth token in the sequence of tokens generated from the annotated text. If the tokens match, processing passes to step S24 where m and n are both incremented by 1, such that they now reference the next token in each sequence. Processing then passes to step S25 where it is determined whether m references the last token in the sequence of tokens generated from the original text, such that all tokens in the sequence of tokens generated from the original text have been processed. If this is the case, all tokens generated from the original text appear, in order, in the sequence of tokens generated from the annotated text and processing finishes at step S26. If at step S25 it is determined that m does not reference the end of the original sequence, processing passes back to step S23 and the further tokens from the two sequences are compared. If at step S23 it is determined that the mth token in the sequence generated from the original text does not match the nth token in the sequence generated from the annotated text, processing passes to step S27 where n is incremented by 1 such that it references the next token in the annotated text. Processing then passes to step S28 where it is determined if n references the end of the annotated text, such that all words in the annotated text have been compared. If this is true, then all tokens in the sequence generated from the annotated text have been processed without matching all of the tokens which were generated from the original text. This indicates that there were words in the original text that do not appear in the annotated text, i.e. that unsafe modifications have been made to the original text. Processing then ends at step S28a. If it is determined at step S28 that n does not reference the end of the annotated text, processing passes back to step S23 and the currently referenced words in each sequence are compared.
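By way of a hedged illustration, the check of Figure 18 amounts to verifying that the word tokens of the original text occur, in order, as a subsequence of the word tokens of the annotated text. In the following Python sketch the tokenisation and the function names are assumptions made for illustration only.

```python
# Illustrative sketch of the safe-annotation check of Figure 18: the annotation is
# safe if every token of the original text appears, in the same order, in the
# annotated text. Tokenisation here is a simplifying assumption.

import re

def tokenise(text):
    """Strip formatting (here, case and punctuation) and split into word tokens."""
    return re.findall(r"[\w']+", text.lower())

def safely_annotated(original, annotated):
    orig_tokens = tokenise(original)
    anno_tokens = tokenise(annotated)
    m = 0                                    # index of the next unmatched original token
    for token in anno_tokens:                # step through the annotated text
        if m < len(orig_tokens) and token == orig_tokens[m]:
            m += 1                           # steps S23/S24: tokens match, advance both
    return m == len(orig_tokens)             # step S25: all original tokens found in order

# Additions are safe; deletions or reordering of original words are not.
print(safely_annotated("the cat sat", "good: the cat sat [well put]"))   # True
print(safely_annotated("the cat sat", "the sat cat"))                    # False
```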
If it is determined that the text has been unsafely annotated, it is desirable to show where the unsafe annotations occur in the annotated text. Preferably, sequence alignment is used to determine where the unsafe annotations occur. Alignment algorithms such as the modified Needleman-Wunsch algorithm described above have a time and space complexity of O(N²). This makes such algorithms too computationally expensive for aligning essays containing thousands of words. An alternative algorithm is now described with reference to Figure 19.
At step S29 all formatting is removed from the two copies of the text, the original and the annotated version, and processing passes to step S30. At step S30, each text is tokenised into a sequence of tokens, resulting in two sequences of tokens: a sequence M generated from the original text and a sequence N generated from the annotated text. Processing then passes to step S31. At step S31 unique markers are chosen from each text. The unique markers are each defined as a word or a series of words which appears exactly once in each text. That is, if the two text strings were:
Sequence N = what does she like better oranges or apples; and
Sequence M = does she like oranges better than apples
the unique markers chosen may be the five tokens {does, she, like, oranges, apples}, as these each appear exactly once in each text. Unique markers may be individual words, as in the current example, bigrams, N-grams or character sequences independent of the words, as required by the particular text being processed and the application. For example, when processing a text in which no single token appears exactly once in each sequence, it will be necessary to use a sequence of tokens as a marker; similarly, if the number of tokens appearing exactly once in each sequence is too large, using sequences of tokens can reduce the number of markers. When the unique markers have been chosen, processing passes to step S32. At step S32, the context of each unique marker is defined. The context of a unique marker is defined as the unique marker itself and all of the tokens up to but not including the next unique marker. Using the above example, the contexts of the marker oranges are: {oranges, or} for sequence N; and
{oranges, better, than} for sequence M.
Each unique marker and its context define individual chunks of the sequences. Once the unique markers and the contexts have been defined, processing passes to step S33. At step S33 an alignment is performed upon the unique markers of the two sequences. Alignment may be performed using standard algorithms, such as the Needleman-Wunsch algorithm, provided that the number of unique markers is small enough. Processing now passes to step S34. At step S34 it is determined whether the remaining chunks are small enough to be aligned directly using standard algorithms. If this is true, the remaining chunks are aligned using standard algorithms and processing ends. If this is not the case, processing passes to step S35. At step S35, sequence N is set to be the contexts defined from sequence N and sequence M is set to be the contexts defined from sequence M. Processing now passes back to step S31 and the contexts of the unique markers in each sequence are aligned recursively using steps S31 to S33 until the remaining chunks of sequence are small enough to be aligned directly.
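The divide-and-conquer strategy of Figure 19 can be sketched as below. This is an assumption-laden illustration: the chunk-size threshold, the base_align parameter (any standard aligner returning aligned token pairs, such as the Needleman-Wunsch sketch above) and the simplification that the unique markers occur in the same relative order in both sequences (the description instead aligns the markers themselves) are all choices made here for brevity, not details taken from the description.

```python
# Illustrative sketch of the recursive marker-based alignment of Figure 19.
# Assumptions: markers are single tokens occurring exactly once in each sequence
# and in the same relative order in both; base_align is any standard aligner
# applied only to chunks that are small enough.

from collections import Counter

SMALL = 50   # assumed chunk size below which an O(N^2) aligner is affordable

def unique_markers(m_seq, n_seq):
    """Tokens appearing exactly once in each sequence, in order of appearance."""
    m_counts, n_counts = Counter(m_seq), Counter(n_seq)
    return [t for t in m_seq if m_counts[t] == 1 and n_counts[t] == 1]

def align(m_seq, n_seq, base_align):
    """Recursively split on unique markers; chunks that are small enough go to base_align."""
    markers = unique_markers(m_seq, n_seq)
    if (len(m_seq) <= SMALL and len(n_seq) <= SMALL) or not markers:
        return base_align(m_seq, n_seq)              # step S34: align the chunk directly
    aligned, mi, ni = [], 0, 0
    for marker in markers:
        mj, nj = m_seq.index(marker), n_seq.index(marker)
        aligned += align(m_seq[mi:mj], n_seq[ni:nj], base_align)   # recurse between markers
        aligned.append((marker, marker))             # matched markers align with themselves
        mi, ni = mj + 1, nj + 1
    aligned += align(m_seq[mi:], n_seq[ni:], base_align)
    return aligned
```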
Once an alignment has been formed, the result can be interpreted according to application-specific requirements. For example, unsafe annotations can be detected and displayed against the original text.
It will be appreciated that the above examples are merely exemplary and that particular implementations may vary. For example, alignment may be used in conjunction with the clustering methods described above. Regular expressions may be utilised alongside alignment to reduce the number of variants and thus make the alignment more effective.
It will also be appreciated that particular algorithms discussed, for example with reference to computing similarity matrices and determining optimal alignments, are provided merely by way of example and that other algorithms may be used as are known to those skilled in the art.
Further modifications and advantages of the present invention will be readily apparent to the appropriately skilled person from the teaching herein, without departing from the scope of the appended claims.

Claims

1. A computer aided assessment method comprising: processing a submitted answer to a question to determine a degree of conformance of the submitted answer to a regular expression indicative of a correct answer format for the question; and generating an output signal based upon said processing.
2. A computer aided assessment method according to claim 1, wherein the submitted answer comprises a text answer to the question.
3. A computer aided assessment method according to claim 1 or claim 2, further comprising displaying a warning message if said output signal indicates that the submitted answer does not conform to the regular expression.
4. A computer aided assessment method according to any one of the preceding claims, further comprising processing the submitted answer to determine a degree of conformance of the submitted answer to a concatenation of two or more regular expressions.
5. A computer aided assessment method according to any one of the preceding claims, further comprising processing the submitted answer to determine a degree of conformance of the submitted answer to two or more alternative regular expressions.
6. A computer aided assessment method according to any one of the preceding claims, further comprising: receiving a first input indicative of the question; and receiving a second input indicative of a correct answer format for that question; wherein the second input comprises the regular expression.
7. A carrier medium carrying computer readable code for controlling a computer to carry out the method of any one of claims 1 to 6.
8. A computer apparatus for a computer aided assessment method, the apparatus comprising: a program memory storing processor readable instructions; and a processor configured to read and execute instructions stored in said program memory; wherein the processor readable instructions comprise instructions controlling the processor to carry out the method of any one of claims 1 to 6.
9. A computer implemented method of generating an assessment, the method comprising: receiving a first input indicative of a question to be included in said assessment; and receiving a second input indicative of a correct answer format for that question; wherein the second input comprises a regular expression.
10. A carrier medium carrying computer readable code for controlling a computer to carry out the method of claim 9.
11. A computer apparatus for a computer aided assessment method, the apparatus comprising: a program memory storing processor readable instructions; and a processor configured to read and execute instructions stored in said program memory; wherein the processor readable instructions comprise instructions controlling the processor to carry out the method of claim 9.
12. A computer aided assessment method comprising: comparing a submitted answer to a question with a standard answer to the question to determine a similarity metric for the submitted answer indicative of how similar the submitted answer is to the standard answer; wherein the standard answer comprises a plurality of answer graph objects comprising portions of one or more acceptable answers to the question, said comparing comprising matching each of said answer graph objects to the submitted answer, determining a similarity metric for each answer graph object based upon said matching and determining a similarity metric for the submitted answer based upon the similarity metrics for each answer graph object.
13. A computer aided assessment method according to claim 12, wherein the submitted answer comprises a constructed diagram.
14. A computer aided assessment method according to claim 12 or claim 13, further comprising representing the submitted answer as a submitted answer graph object.
15. A computer aided assessment method according to any one of claims 12 to 14, wherein at least two answer graph objects comprise overlapping portions of one or more acceptable answers.
16. A computer aided assessment method according to any one of claims 12 to 15, wherein the answer graph objects define a structure of a portion of one or more acceptable answers.
17. A computer aided assessment method according to claim 16, wherein the answer graph objects define one or more parameters of a portion of one or more acceptable answers.
18. A computer aided assessment method according to claim 16 or claim 17, wherein determining a similarity metric for each answer graph object comprises determining a mark for each answer graph object, and determining a similarity metric for the submitted answer comprises determining a mark for the submitted answer.
19. A computer aided assessment method according to claim 18, wherein said matching comprises determining the extent to which the structure of each answer graph object corresponds to the structure of a portion of the submitted answer.
20. A computer aided assessment method according to claim 18 or claim 19, wherein said matching comprises determining the extent to which a parameter of each answer graph object corresponds to a parameter of a portion of the submitted answer.
21. A computer aided assessment method according to claim 18 or claim 20, wherein determining a mark for each answer graph object comprises determining a portion of a mark allocation for that answer graph object according to the extent to which the structure or a parameter of each answer graph object corresponds to a portion of the submitted answer.
22. A computer aided assessment method according to claim 21, wherein determining a portion of a mark allocation comprises weighting the structure and parameters of the answer graph objects according to relative importance in determining the extent to which an answer graph object matches a portion of a submitted answer.
23. A computer aided assessment method according to any one of claims 18 to 22, wherein the standard answer comprises an AND/OR tree comprising a plurality of nodes arranged under one or more AND or OR branches, each node comprising an answer graph object.
24. A computer aided assessment method according to claim 23, wherein determining a mark for the submitted answer comprises determining a mark for each of a group of two or more answer graph objects arranged underneath an OR node and determining the highest marked answer graph object of that group.
25. A computer aided assessment method according to claim 23 or claim 24, wherein determining a mark for the submitted answer comprises determining a mark for each of a group of answer graph objects arranged under an AND node and adding together the marks for each answer graph object in the group.
26. A computer aided assessment method according to any one of claims 23 to 25, wherein determining a similarity metric for the submitted answer comprises determining a mark for all of said plurality of answer graph objects arranged under a root node of the AND/OR tree.
27. A computer aided assessment method according to any one of claims 12 to 26, further comprising adding a new answer graph object to said plurality of answer graph objects based upon identifying a new portion of an acceptable answer within a submitted answer.
28. A carrier medium carrying computer readable code for controlling a computer to carry out the method of any one of claims 12 to 27.
29. A computer apparatus for a computer aided assessment method, the apparatus comprising: a program memory storing processor readable instructions; and a processor configured to read and execute instructions stored in said program memory; wherein the processor readable instructions comprise instructions controlling the processor to carry out the method of any one of claims 12 to 27.
30. A standard answer for a question within a computer aided assessment method, the standard answer comprising: an AND/OR tree comprising a plurality of nodes arranged under one or more AND or OR branches, each node comprising an answer graph object; wherein each answer graph object comprises a portion of one or more acceptable answers to the question.
31. A carrier medium carrying a standard answer for a question within a computer aided assessment method according to claim 30.
32. A computer aided assessment method comprising: comparing a first submitted answer to a question with a second answer to the question to determine a similarity metric indicative of how similar the first submitted answer is to the second answer; wherein said comparing comprises determining a number of occurrences of a plurality of words within the first submitted answer and the second answer and representing the number of occurrences of the plurality of words within the submitted answer and the second answer as a pair of vectors, the similarity metric being indicative of a relationship between the pair of vectors.
33. A computer aided assessment method according to claim 32, further comprising comparing a plurality of submitted answers to the question with the second answer.
34. A computer aided assessment method according to claim 32 or claim 33, wherein the second answer comprises a standard answer to the question.
35. A computer aided assessment method according to claim 33 or claim 34, further comprising determining the number of occurrences within each submitted answer and the second answer of each word contained within any submitted answer or the second answer.
36. A computer aided assessment method according to any one of claims 32 to 35, wherein the relationship between the or each pair of vectors comprises a Euclidean distance between the or each pair of vectors.
37. A computer aided assessment method according to any one of claims 32 to 35, wherein the relationship between the or each pair of vectors comprises a cosine of an angle between the or each pair of vectors.
38. A computer aided assessment method according to any one of claims 32 to 37, further comprising correcting the spelling of words within the or each submitted answer and the second answer.
39. A computer aided assessment method according to any one of claims 32 to 38, further comprising removing one or more words within the or each submitted answer or the second answer.
40. A computer aided assessment method according to any one of claims 32 to 39, further comprising truncating one or more words within the or each submitted answer or the second answer.
41. A computer aided assessment method according to any one of claims 32 to 40, further comprising summarising the first submitted answer.
42. A computer aided assessment method according to any one of claims 32 to 41, further comprising: processing the first submitted answer and a second submitted answer to determine respective differences between the first and second submitted answers and the second answer; and processing said differences to compare said first and second submitted answers.
43. A computer aided assessment method according to any one of claims 32 to 41, wherein determining a number of occurrences of a plurality of words comprises weighting one or more words within the or each submitted answer or the second answer.
44. A computer aided assessment method according to claim 33 or any claim dependent thereon, further comprising clustering submitted answers according to the similarity metric for each submitted answer.
45. A computer aided assessment method according to claim 44, wherein said clustering comprises assigning each submitted answer to a separate cluster and amalgamating pairs of clusters with the most similar similarity metrics.
46. A computer aided assessment method according to claim 45, further comprising amalgamating pairs of clusters until the number of clusters has been reduced by a predetermined amount.
47. A computer aided assessment method according to claim 45, further comprising amalgamating pairs of clusters until the variance between the similarity metrics for submitted answers within any cluster exceeds a maximum value.
48. A computer aided assessment method according to any one of claims 32 to 47, further comprising: processing the submitted answer to determine a degree of conformance of the submitted answer to a regular expression indicative of a correct answer format to the question; and generating an output signal based upon said processing.
49. A computer aided assessment method according to claim 48, further comprising displaying a warning message if said output signal indicates that the submitted answer does not conform to the regular expression.
50. A computer aided assessment method according to claim 48 or claim 49, further comprising processing the submitted answer to determine a degree of conformance of the submitted answer to a series of two or more regular expressions.
51. A computer aided assessment method according to any one of claims 48 to 50, further comprising processing the submitted answer to determine a degree of conformance of the submitted answer to two or more alternative regular expressions.
52. A computer aided assessment method according to any one of claims 48 to 51, further comprising: receiving a first input indicative of the question; and receiving a second input indicative of a correct answer format for that question; wherein the second input comprises the regular expression.
53. A computer aided assessment method according to any one of claims 32 to 52, further comprising a human assessment marker performing a further processing step upon the or each submitted answer based upon an output indicative of said comparison or said clustering.
54. A carrier medium carrying computer readable code for controlling a computer to carry out the method of any one of claims 32 to 53.
55. A computer apparatus for a computer aided assessment method, the apparatus comprising: a program memory storing processor readable instructions; and a processor configured to read and execute instructions stored in said program memory; wherein the processor readable instructions comprise instructions controlling the processor to carry out the method of any one of claims 32 to 53.
56. A method for comparing first and second data items in a computer system, the method comprising: generating a first difference data item representing a difference between the first data item and a reference data item; generating a second difference data item representing a difference between the second data item and the reference data item; comparing said first and second difference data items to provide data indicative of similarity between said first and second data items.
57. A method according to claim 56, wherein first and second data items and said reference data item are alphanumeric strings.
58. A method according to claim 56 or 57, wherein the first and second data items are submitted answers to an assessment question, and the reference data item is a standard answer to the assessment question.
59. A method according to any one of claims 56 to 58, further comprising clustering groups of data items, said clustering being based upon said data indicative of similarity.
60. A carrier medium carrying computer readable code for controlling a computer to carry out the method of any one of claims 56 to 59.
61. A computer apparatus for comparing data items, the apparatus comprising: a program memory storing processor readable instructions; and a processor configured to read and execute instructions stored in said program memory; wherein the processor readable instructions comprise instructions controlling the processor to carry out the method of any one of claims 56 to 59.
62. A method for clustering textual data items, the method comprising: processing a plurality of textual data items to generate a plurality of summarised textual data items, each summarised textual data item representing a summary of a respective data item; and clustering said data items by processing said summarised data items; wherein each textual data item is an answer to an assessment question.
63. A method according to claim 62, wherein said clustering comprises determining similarities between said summarised data items.
64. A method according to claim 62 or 63, wherein at least one of said data items is a submitted answer to an assessment question.
65. A method according to claim 62, 63 or 64, wherein at least one of said data items is a standard answer to an assessment question.
66. A carrier medium carrying computer readable code for controlling a computer to carry out the method of any one of claims 62 to 65.
67. A computer apparatus for clustering data items, the apparatus comprising: a program memory storing processor readable instructions; and a processor configured to read and execute instructions stored in said program memory; wherein the processor readable instructions comprise instructions controlling the processor to carry out the method of any one of claims 62 to 66.
68. A method of processing textual data, the method comprising: receiving textual data; processing said textual data to generate a plurality of summaries of said textual data; determining similarity between said summaries; and generating an output indicating a property of said textual data based upon said similarity.
69. A method according to claim 68, wherein said processing to generate at least two summaries comprises applying a plurality of summarising algorithms to said textual data, each summarising algorithm generating a respective one of said plurality of summaries.
70. A method according to claim 69, wherein at least one of said summarising algorithms performs an analysis of word frequencies in said received textual data.
71. A method according to any one of claims 68 to 70, wherein said property of said textual data is an indication of writing style.
72. A method according to any one of claims 68 to 71, wherein said textual data is an answer to an assessment question.
73. A carrier medium carrying computer readable code for controlling a computer to carry out the method of any one of claims 68 to 72.
74. A computer apparatus for processing textual data, the apparatus comprising: a program memory storing processor readable instructions; and a processor configured to read and execute instructions stored in said program memory; wherein the processor readable instructions comprise instructions controlling the processor to carry out the method of any of claims 68 to 72.
75. A computer aided assessment method, the method comprising: receiving a student answer to an assessment question; accessing a standard answer to said assessment question; aligning tokens of said student answer to tokens of said standard answer; and outputting data for use in said assessment based upon said alignment.
76. A computer aided assessment method according to claim 75, wherein said data based upon said alignment indicates differences of word order between said student answer and said standard answer.
77. A computer aided assessment method according to claim 75 or 76, wherein outputting said data based upon said alignment comprises: displaying said student answer on a display device; and displaying data based upon said alignment in connection with said student answer.
78. A computer aided assessment method according to claim 77, wherein displaying data based upon said alignment in connection with said student answer comprises highlighting tokens of said student answer.
79. A computer aided assessment method according to any one of claims 75 to 78, wherein aligning tokens of said student answer to tokens of said standard answer comprises: generating a similarity matrix; and generating alignment data based upon said similarity matrix.
80. A computer aided assessment method according to claim 79, wherein generating said similarity matrix comprises generating said similarity matrix using an algorithm based upon the Needleman-Wunsch algorithm.
81. A computer aided assessment method according to any one of claims 75 to 80, further comprising generating a score associated with said student answer based upon said alignment.
82. A computer aided assessment method according to claim 81, wherein generating said score comprises processing data indicating a number of differences between said student answer and said standard answer.
83. A carrier medium carrying computer readable code for controlling a computer to carry out the method of any one of claims 75 to 82.
84. A computer apparatus for providing a computer aided assessment method, the apparatus comprising: a program memory storing processor readable instructions; and a processor configured to read and execute instructions stored in said program memory; wherein the processor readable instructions comprise instructions controlling the processor to carry out the method of any of claims 75 to 82.
85. A computer aided assessment method comprising: receiving a student answer to an assessment question; receiving an annotated version of said student answer to said assessment question; processing the student answer and the annotated version of the student answer to identify annotations to said student answer, said processing comprising determining whether the annotations comprise deletion or rearrangement of text included in said student answer.
86. A computer aided assessment method according to claim 85, wherein said processing comprises applying a sequence alignment algorithm.
87. A computer aided assessment method according to claim 86, wherein said sequence alignment algorithm comprises: identifying a marker occurring exactly once in each of said student answer and said annotated version of said student answer.
88. A computer aided assessment method according to claim 87, further comprising: aligning said marker in said student answer with said marker in the annotated version of the student answer.
89. A computer aided assessment method according to claim 87 or 88, further comprising: identifying a plurality of markers, each marker occurring exactly once in each of said student answer and said annotated version of said student answer; aligning a marker and tokens of said student answer with a corresponding marker and tokens of said annotated version of said student answer, said tokens of said student answer occurring between said marker and a subsequent marker in said student answer.
90. A carrier medium carrying computer readable code for controlling a computer to carry out the method of any one of claims 85 to 89.
91. A computer apparatus for a computer aided assessment method, the apparatus comprising: a program memory storing processor readable instructions; and a processor configured to read and execute instructions stored in said program memory; wherein the processor readable instructions comprise instructions controlling the processor to carry out the method of any of claims 85 to 89.
PCT/GB2008/000603 2007-02-23 2008-02-21 Assessment method WO2008102146A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US90322607P 2007-02-23 2007-02-23
US60/903,226 2007-02-23

Publications (1)

Publication Number Publication Date
WO2008102146A1 true WO2008102146A1 (en) 2008-08-28

Family

ID=39430969

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2008/000603 WO2008102146A1 (en) 2007-02-23 2008-02-21 Assessment method

Country Status (1)

Country Link
WO (1) WO2008102146A1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5517405A (en) * 1993-10-14 1996-05-14 Aetna Life And Casualty Company Expert system for providing interactive assistance in solving problems such as health care management

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699658A (en) * 2020-12-31 2021-04-23 科大讯飞华南人工智能研究院(广州)有限公司 Text comparison method and related device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08709486

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08709486

Country of ref document: EP

Kind code of ref document: A1