Partial Credit Grading of DFAs: Automation vs Human Graders
We examined the efficacy of automatic partial credit approaches for assignments that ask students to construct a Deterministic Finite Automaton (DFA) for a given language. We chose two DFA problems and generated a representative sample of 10 benchmark submissions for each. Next, to obtain an accurate baseline for human grading, we asked professors at our university to share their grader guides with us. We found that the grader guides, at least within our institution, were very consistent but also quite problem-specific and reliant on human understanding, and hence unlikely to lead to an automated process applicable to all DFA problems. We produced a ``consensus grader guide'' and graded each benchmark submission with it, obtaining a baseline human partial credit score. We then assessed the submissions using three techniques proposed by Alur et al. The Solution Syntactic Difference score corresponds to the number of changes that must be made to the submitted DFA to make it correct. The Problem Syntactic Difference score is based on converting each DFA into Monadic Second-Order (MSO) logic and counting the changes needed to reach a correct description. The Problem Semantic Difference score is the limit, as string length increases, of the ratio of incorrectly to correctly classified strings. The final score is the maximum of these three scores. In general, the automatic results closely matched the consensus grades, but Problem Semantic Difference produced some peculiarities. Additionally, for each problem, one submission contained two distinct types of mistakes; these submissions received automatic grades much lower than their consensus grades.
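To make the Problem Semantic Difference metric concrete, the sketch below shows one way such a score could be approximated. This is not the tool evaluated in the study: the `Dfa` class, the `semantic_difference_score` function, the 0-10 scale, and the length cutoff `max_len` are illustrative choices of ours, and the true measure is a limit over string length rather than a finite enumeration.

```python
from itertools import product

class Dfa:
    """Minimal DFA: integer states, transitions keyed by (state, symbol)."""
    def __init__(self, alphabet, transitions, start, accepting):
        self.alphabet = alphabet        # e.g. {"a", "b"}
        self.transitions = transitions  # dict: (state, symbol) -> state
        self.start = start
        self.accepting = accepting      # set of accepting states

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.transitions[(state, symbol)]
        return state in self.accepting

def semantic_difference_score(reference, submission, max_len=12):
    """Estimate a Problem-Semantic-Difference-style score: the fraction of
    strings (up to max_len) on which the submission disagrees with the
    reference DFA, mapped onto an assumed 0-10 partial-credit scale."""
    total = disagreements = 0
    for length in range(max_len + 1):
        for word in product(sorted(reference.alphabet), repeat=length):
            total += 1
            if reference.accepts(word) != submission.accepts(word):
                disagreements += 1
    return round(10 * (1 - disagreements / total))

# Toy problem: strings over {a, b} containing an even number of a's.
reference = Dfa({"a", "b"},
                {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
                start=0, accepting={0})

# Buggy submission: both states are accepting, so every string is accepted.
# It disagrees with the reference on roughly half of all strings.
submission = Dfa({"a", "b"},
                 {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
                 start=0, accepting={0, 1})

print(semantic_difference_score(reference, submission))  # -> 5
```

A full implementation would compute the density of the symmetric-difference language analytically (for example, from the product automaton) rather than by enumerating strings up to a cutoff; the finite enumeration here is only for illustration.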