
Massive MOOC Grading Problem – Stanford HCI Group Tackles Peer Assessment

Six weeks into Coursera’s Passion Driven Statistics course from Wesleyan University, students received a notice that they would participate in a new kind of peer-based grading exercise for their final projects. While nothing has been said publicly about the experiment until now, this marks a radical departure from the usual quiz-based examinations provided by MOOCs.

What is different about this approach, and why is it worth watching even in its first commercial implementations? In this column I will explore the process they are testing and examine the potential for peer assessments to change how MOOCs are used. If you are thinking about what course to take next, check to see if you too can be involved in peer assessments.

The problem with grading thousands of student projects

College course grades are often the result of subjective comments and assessments by instructors, particularly for written and project-based work. Up until now, MOOCs have relied on quizzes and tests where there is a clearly defined “right” answer. This limitation severely inhibits the potential for Coursera and other MOOC providers if they wish to offer a wide spectrum of courses. The challenge is that instructors cannot review essays or other open-ended work from thousands of students as they do in smaller class settings. In order to remove this limitation, MOOC providers are looking to peer-based assessments, in which students learn to review the work of their peers.

Peer assessment research to the rescue

Recently, peer assessments have been the focus of extended research as an outgrowth of the remarkable help some MOOC students gave their classmates via discussions and ad-hoc learning groups. When a class grows to over 1,000 students, Stanford professors found that students tend to support each other and rely less on the staff for answers to their questions. For example, the first Stanford AI class taught by Sebastian Thrun and Peter Norvig featured one (yes, 1) teaching assistant.

What if students could be even more active? Could they be taught to grade the work of their peers?

from “Assessing Design” by Scott Klemmer

In his Human-Computer Interaction course, Scott Klemmer and others found that students can be put to work reviewing the work of their peers. While fewer than 3% of students completed the course, those who did transcribed video lectures into 13 languages and created language-specific study groups to discuss them.

In the process, students reported that their own learning improved. Among other benefits, the students-helping-students approach fostered empathy and equality, scaled naturally and required serious community buy-in. Along the way, Klemmer established methods for peer assessment on a mass scale that appear to be at work in the Passion Driven Statistics course now in progress.

from “Assessing Design” by Scott Klemmer

Peer assessments are widely practiced in schools, but they have never involved thousands of students using their own systems all over the world. Messages from Coursera to Passion Driven Statistics students reveal the complexities involved. Even though all students were invited to post their projects on any blog or internet-accessible system, a week after the first announcement they were warned not to use anything that required a software download or installation of plug-ins that might contain malware.

Making the grade through rubrics, calibration and student review

The process for managing project-based assignments and grading them has remarkable similarities across a wide range of subject matter. Students must first identify a problem they will address in their project. They then need to form some hypothesis to test or a plan to execute. These can be graduate-level problems or they can be straightforward tasks, such as making a paper airplane that flies.

In order to be graded, students need to present their project and demonstrate or describe their findings. In the trivial example of making a paper airplane, students might submit videos of their design and show the finished work in flight. They might comment on the research they performed and what they learned by doing the project.

Project-based assignments have always been tough on teachers. How can student evaluations consistently match the results of trained staff? Klemmer found he could train students with rubrics and a calibration exercise before they assessed their peers. Their assessments then correlated at 0.80 with those graded by staff.

Consider our paper-airplane project again. A rubric for grading it might award points on a scale of 0 to 3: 0 = unsatisfactory, 1 = bare minimum, 2 = satisfactory, and 3 = above and beyond. In this example, three categories are assessed: aircraft design, in-flight performance and depth of presentation. For each assignment, the instructor describes the characteristics that map each category to a number grade.
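To make the idea concrete, here is a minimal sketch of that rubric in Python. The category names and the 0-to-3 scale come from the paper-airplane example above; the data structures and the `score_submission` function are my own illustration, not code from Coursera or Klemmer’s course:

```python
# Illustrative rubric for the paper-airplane example: three categories,
# each scored 0-3 (0 = unsatisfactory ... 3 = above and beyond).
RUBRIC_SCALE = {
    0: "unsatisfactory",
    1: "bare minimum",
    2: "satisfactory",
    3: "above and beyond",
}

CATEGORIES = ["aircraft design", "in-flight performance", "depth of presentation"]

def score_submission(ratings):
    """Total one reviewer's per-category ratings, validating against the rubric."""
    if set(ratings) != set(CATEGORIES):
        raise ValueError("every rubric category must be rated")
    if any(r not in RUBRIC_SCALE for r in ratings.values()):
        raise ValueError("ratings must be on the 0-3 scale")
    return sum(ratings.values())

# One reviewer's assessment of a single airplane project:
example = {"aircraft design": 2, "in-flight performance": 3, "depth of presentation": 1}
print(score_submission(example))  # 6 out of a possible 9
```

The point of forcing every category to be rated is the same one the rubric serves in class: a reviewer cannot skip a dimension of the work, so two reviewers are always comparing the same things.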

Using such a rubric, students are given a test assignment to grade. They are then shown how the staff assessed the same assignment so they can calibrate their grading to a norm for the class. With this training in place, students are required to assess the work of five of their peers in order to complete the course. Their own project must receive a passing score as well. Final grades mix peer-assessed project scores with quiz and test scores.
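The article does not say how the five peer reviews are reduced to a single project score. One plausible sketch, assuming a median is taken per category to blunt the effect of a single harsh or generous grader (an assumption of mine, not a documented Coursera method):

```python
import statistics

def aggregate_peer_scores(peer_ratings):
    """Combine several peers' per-category ratings into one project score.

    peer_ratings: list of dicts mapping category -> 0-3 rating, one per reviewer.
    Takes the median rating per category, then totals the medians.
    """
    categories = peer_ratings[0].keys()
    per_category = {
        cat: statistics.median(r[cat] for r in peer_ratings) for cat in categories
    }
    return per_category, sum(per_category.values())

# Five reviewers, as in the course requirement described above
# (category names shortened for brevity):
reviews = [
    {"design": 2, "flight": 3, "presentation": 2},
    {"design": 2, "flight": 2, "presentation": 1},
    {"design": 3, "flight": 3, "presentation": 2},
    {"design": 1, "flight": 2, "presentation": 2},
    {"design": 2, "flight": 3, "presentation": 3},
]
per_cat, total = aggregate_peer_scores(reviews)
print(per_cat, total)  # medians: design 2, flight 3, presentation 2 -> total 7
```

A median rather than a mean is one way to make calibration failures less costly: one outlier reviewer shifts a mean, but with five reviews the median ignores them entirely.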

I recently completed a two-week video production course on the Skillshare platform that used peer reviews. While the course was not for college credit and the only rubric used was to foster positive comments even while pointing out areas for improvement, knowing I was commenting on the work of my peers made me dive into the course more deeply. I felt a certain sense of fairness at play: I was going to make my project live up to the standard I set for others. In the end, I put in more hours reviewing comments and working on my projects than I spent watching the instructor’s videos and engaging in other assignments. The comments I received from my peers spurred me to rethink, revise and improve my project.

Reviewing work by other students not only forced me to consider shortcomings lurking in my project but inspired me to try new approaches based on what I saw from my peers. I solved problems that I never would have identified on my own, and my presentation was better because of it.

I’ve heard some teachers say they split their efforts between delivering course information and motivating students to believe that what they are learning is worth their time.

If my experience is any indication, students seem to be willing to do more of the heavy lifting when they benefit from the examples of their peers. Given that MOOCs are often criticized for their lack of student engagement, peer assessments could be a breakthrough that validates their mass scale. I believe we will be hearing more on this when Passion Driven Statistics has run its course.

I would like to know if you will look for opportunities to assess your peers as well. How much confidence would you have in peer assessment if careful training and calibration were involved? Is the accuracy of your peers’ assessments even important if the process still helped you engage more deeply with the material? Please add your comments to this article.

John Duhring (7 Posts)

John Duhring has been a founding team member at nine startups, including Supermac Software and Bitmenu. During his career he has also applied technology to learning at large companies such as Prentice-Hall, Apple and AOL. Follow him on Twitter @duhring.


  1. I’m just finishing Coursera’s Modernism and Post Modernism class where grading was based on up to 9 papers, each written to a specific rubric. It was an interesting experience, one I analyzed based on some professional experience in the grading of subjective materials (such as papers and other work products):

    • Rubrics solve a significant part of the puzzle but not all of it. At mass scale, matching students to papers to be graded is non-trivial. For example, Google’s open-source Course Builder code has been upgraded with a peer assessment capability. The entire process deserves deeper review.

  2. We used peer reviews for our final project in the E-learning and Digital Cultures Coursera course and I was impressed with how well it worked – a clear rubric and THREE reviewers for each project. I thought the 3-reviewer aspect was brilliant.

  3. Peer assessment certainly has a place in education/training. Once someone is in the workforce, their evaluations are based on a rubric or peer review. If k-12 educators used more rubric and peer review assessments students would be better prepared for life after school. This would also force educators to think more deeply about how and what to assess. As a middle/high school teacher I know how easy it is to throw together a 10-item multiple choice quiz that has no validity in finding out what students know and what I need to do to adjust my curriculum.

    • “If k-12 educators used more rubric and peer review assessments students would be better prepared for life after school.”

      The use of rubrics in the workplace often tends to result in poor workers gaming the system, and good workers being punished for doing what really needed to be done.

      There’s little we can do about this in the world of work, but do we really want to encourage children to “follow the script” instead of challenging them to develop cognitively?