Encouraging reflection on practice while grading an artifact: A thought on badges

When I started teaching, I thought back to all of those teachers who made me write meaningless papers into which I put little effort and for which I received stellar grades, and I vowed not to be that teacher. I promised myself and my future students that we – as equals – would discuss the literature as relevant historical artifacts that are still being read because the authors still have something to comment on in today’s society.
But then I stepped into the classroom and faced opposition from my colleagues who thought my methods would not provide students with the opportunities to master the knowledge of the standards. Worst of all, some teachers actually punished students who came from my class because they “knew” the students had not learned how to write or analyze since I did not give traditional tests or grade in a traditional way. 

And I started to doubt myself. I faced that giant stack of essays and thought, “Maybe I should give in.” And I tried it. I tried grading the essays with a strict traditional rubric instead of talking through the papers and the thought processes that went into writing them. I tried giving a final grade instead of encouraging rewrites and reflections on rewrites. And it totally failed. I wasn’t happy, the students were miserable, and most importantly, no one was learning anything but how to write to a rubric. So I went back to following my gut. Students wrote and revised and reflected on their papers. They posted to discussion forums about the relevance of Huck Finn in the 21st century and how Romantic paintings made them feel. This opened up so much conversation and reflection, and allowed me to have students practice an abstract and difficult concept like literary analysis over and over again in meaningful, relevant ways in many different contexts.
When I began my work with Daniel Hickey and his research team designing curricula and refining models of curriculum development at IU, I ran into, and am still running into, the same roadblocks. It is difficult to ask an English teacher not to grade the essay because, in the end, a graded essay is what teachers, students, and parents are comfortable with. They like seeing the 78% score in red on the paper and in the gradebook because it is concrete. And there is certainly merit in receiving direct feedback on something one has created, even if that feedback may encourage only a shallow engagement with the directions of the assignment rather than deeper thinking about how the creator of the artifact engaged in a practice with a concept. So when I approach a teacher with whom I am designing a curricular unit and tell them that we are only going to grade the reflection, not the artifact at all, I am not really surprised at the resistance I face.
Karen Jeffrey, in her blog post ePortfolios as Badges – A Badge System Design for Learning by Creating (a response to Dan Hickey’s blog post Some Things about Assessment that Badge Developers Might Find Helpful), suggests creating badges both for the artifact and for the reflection. Brilliant. In addition to designing curricula with teachers, our team is gearing up to host a HackJam this summer, and we have been thinking about how to use badges in this context. We want to award a badge for deeper reflection, but realize the importance of acknowledging the accomplishment of completing the hacktivity. Awarding two types of badges may be just the ticket. Perhaps the badges could range from an automatic badge for completing the activity, through a mentor-awarded badge for reflection, to a highly coveted badge awarded by the community.
By awarding separate badges for the reflection and the artifact, students are encouraged to engage in deep reflective thinking about their use of a concept within a particular context, thereby learning to reflect on their own thinking and practices. At the same time, students receive meaningful, direct feedback on the artifact, and everyone involved (students, teachers, and parents) feels the satisfaction that such feedback brings.
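For readers who want to picture how such a tiered system might hang together, here is a minimal sketch in TypeScript. Every name in it (types, fields, functions) is a hypothetical illustration of the three levels described above, not any existing badge platform’s API.

```typescript
// Hypothetical three-tier badge model: an automatic completion badge, a
// mentor-awarded reflection badge, and a community-awarded badge.

type BadgeTier = "completion" | "reflection" | "community";

interface Badge {
  tier: BadgeTier;
  activity: string;                              // e.g., the HackJam hacktivity being badged
  awardedBy: "system" | "mentor" | "community";
  evidenceUrl?: string;                          // link to the artifact or the reflection
  awardedOn: Date;
}

// Completion badges can be issued automatically when the artifact is submitted.
function awardCompletionBadge(activity: string, artifactUrl: string): Badge {
  return {
    tier: "completion",
    activity,
    awardedBy: "system",
    evidenceUrl: artifactUrl,
    awardedOn: new Date(),
  };
}

// Reflection badges require human judgment: a mentor decides whether the
// reflection shows deep engagement with the concept before issuing the badge.
function awardReflectionBadge(activity: string, reflectionUrl: string): Badge {
  return {
    tier: "reflection",
    activity,
    awardedBy: "mentor",
    evidenceUrl: reflectionUrl,
    awardedOn: new Date(),
  };
}
```

The community badge is deliberately left without an issuing function here, since how a community negotiates that award is exactly the open design question.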

Flipping Classrooms or Transforming Education?

Dan Hickey and John Walsh
Surely you have heard about it by now.  Find (or make) the perfect online video lecture for teaching particular concepts and have students watch it before class.  Then use the class for more interactive discussion.  In advance of presenting at Ben Motz’s Pedagogy Seminar at Indiana University on March 22, we are going to raise some questions about this practice.  We will then describe a comprehensive alternative that leads to a rather different way of using online videos, while still accommodating prevailing expectations for coverage, class structure, and accountability.

Compared to What?
A March 21 webinar by Jonathan Bergmann that was hosted by eSchool News (and sponsored by Camtasia screen-capture software) described the flipped classroom as a place where “educators are actively transferring the responsibility and ownership of learning from the teacher to the students.”  That sounds pretty appealing when Bergmann compares it to “teachers as dispensers of facts” and students as “receptacles of information.”

Indeed, compared to many traditional alternatives, the ability to re-watch videos and to include more interactive demonstrations is appealing.  We have seen descriptions of teachers having students watch a video version of a text at home - like a film of Huck Finn - and arrive at class prepared to write a comparison/contrast essay about the novel.  Some innovators are assigning online course work, such as watching Khan Academy videos, to be done with parents. This circumvents school-based internet bandwidth limitations and provides a promising context for parental engagement.  We are intrigued by ambitious efforts to have students meet at an arranged time, outside of class, for an immersive experience in an online virtual world like the NASA RealWorld/Inworld Challenge.  While we have yet to see much formal encouragement, we suspect that these activities might lead some students toward the more transformational uses of online videos that we describe below.

Concerns about Flipping Classrooms
As Audrey Watters pointed out recently, this idea has actually been around for a while.  But it has exploded in popularity along with the Khan Academy.  One of the biggest concerns is that the videos simply won’t be meaningful to students.  RMA co-blogger Rebecca Itow said that when she was teaching English last year in LA, the flipped classrooms she tried ended up being just “lectures about online lectures.”  As Watters articulated in her widely-cited “Wrath of Khan” post last year, “it is really a matter of form, not content, that is new.”  So yes, simply streaming videos of traditional “expository” lectures is not really new.  And if students lack the prior experience to make sense of the concepts a video presents, being able to watch it over and over won’t help.  We agree with such concerns and worry about simplistic expectations about the potential of streaming videos.  But we also worry about the more sophisticated uses because of the way they reify course knowledge as static “stuff” to be learned outside of the context of meaningful use.

Theorists like John Seely Brown, James Gee, Mimi Ito, and Henry Jenkins suggest that incremental changes like flipped classrooms barely exploit the broader potential of new networked media for supporting deeper learning.  More importantly, they don’t prepare our students for the quickening pace of change in the disciplinary knowledge we are teaching.  As Seely Brown argued in his DML Keynote, it is all about context.  An online video is even more removed from students’ prior experience than a lecture.  Even a streaming video of your own teaching is more decontextualized for your students than your live lectures. This is because a live lecture can take into account reactions, questions, puzzled looks, etc.

The fundamental concern is that streaming videos reify course knowledge as isolated abstractions.  Don’t get us wrong.  As we detail below, we see a crucial role for streaming videos and other open education resources.  But first consider Seely Brown’s argument that digital knowledge networks are shrinking the “half-life” of what we teach our students.  You might quibble with his precision in claiming that it is down to five years.  But we celebrate his audacity because it helps communicate this crucial point:  21st Century Learning is about providing students with the contextual knowledge they need to continue using the facts and concepts we want them to learn.  The only thing we know about the contexts in which our students will use what we teach them is that those contexts will most likely consist of digital networks of user-generated content.  To borrow a construct from our IULS colleague Melissa Gresalfi, what students need in the 21st century is the disposition to consider how their new knowledge takes on different meaning in different contexts.  And this results from guided practice doing just that.

The Insidious Role of Assessment
We suspect that in many cases, educators assume that streaming videos “work” because they help students succeed on classroom assessments and possibly even achievement tests.  This is likely to be what they find in several high-profile randomized experiments currently under way.  This is in part because students can watch videos over and over until they have memorized their contents well enough to reproduce that information later.

The problem is that most classroom assessments fail to ask students to use course knowledge in a different context in a meaningful way.  And while external achievement tests might provide more valid evidence of a very specific kind of knowledge transfer, such tests ask students to guess which of five associations is least wrong.  Such ‘recognition level’ knowledge is gained quite readily—and just as readily forgotten.  In the backlash against multiple-choice tests, many teachers have shifted to more open-ended essays and performance assessments.  But such assessments are often so closely aligned to the context of the instruction that they encourage students to memorize definitions of concepts so they can repeat them back.  Jim Popham has been railing against this for decades to little apparent effect.  Meanwhile, open-ended assessments are laborious for teachers and students.  They are particularly laborious when high stakes for individuals are attached to them, and most teachers struggle to provide feedback on them that is both useful and used.

A Situative Alternative for Transforming Education
Situated learning and participatory assessment guide us beyond a search for the perfect networked tool for teaching isolated skills and abstract concepts.  Rather, course knowledge is reframed as procedural and conceptual tools.  Learning can then be framed as interactive practice using those tools appropriately in different networked and conventional contexts. Doing so increases efficiency and fosters dispositions to consider how course knowledge takes on new meaning in different contexts.  This promotes transfer to a wide range of subsequent contexts.  These contexts include (but are not limited to) conventional classroom assessments and achievement tests.  Significantly, treating assessments and tests as just one of the contexts where students can practice using course knowledge enhances the validity of the resulting evidence.  But we further believe that such formal assessments and tests should mostly be used to evaluate one’s success as a teacher.  Grades should instead be based more on artifacts that students create in class.  As we will show in our presentation, a more efficient and formative alternative is to grade student reflections.

Our alternative approach is currently referred to as “Participatory Assessment.”  It embraces the much broader view of assessment that has been outlined recently by theorists like Jim Gee, Jim Greeno, and Andreas Lund.  It aims to transform educational practices, while still respecting prevailing expectations for course structures, coverage of domain content, and accountability for that coverage.  Our presentation will show how this approach can be used to transform college learning.  John is a Senior Lecturer in Telecommunications and a doctoral student in Learning Sciences.  In John’s 125-student lecture course on cinematic production theory, students sign up for one of five “craft roles” such as film editor or lighting designer.  They then use networked forums to consider and discuss the relative relevance of key concepts each week from those perspectives.  After two semesters, it seems to be working pretty well. The fact that these students spontaneously elected to search for and link to YouTube videos to make their points suggests that they are “trying out” professional discourse and “trying on” professional identities and learning to become real 21st Century Learners; scores on a rigorous exam help motivate and document broad engagement.  Here is John describing it at a presentation last fall.

Dan will show examples from his online graduate course in Learning and Cognition in Education.  In this course, busy working teachers make sense of otherwise-abstract concepts like encoding and retrieval in a challenging text by considering the most relevant and least relevant implications from the perspective of a specific instructional goal and a particular educational domain (here is an earlier webinar and working example about it).  These two examples just skim the surface of what seems possible when we flip the relationship between concepts and contexts.  If we have time we will describe how Language Culture and Literacy Education doctoral student Tara Kelly is remixing these same core ideas in her Freshman Composition courses.

As for the Khan Academy videos, LS doctoral students Rebecca Itow and Andi Strackeljahn have been piloting and validating curricular modules for secondary English and Algebra.  Each module is aligned to a primary and a secondary Common Core State Standard and has students consider how the concepts outlined in the standards are or are not relevant in the context of different Open Educational Resources.  These include multiple Khan Academy videos, as well as lots of other contexts like online graphing calculators and topical discussion forums.  Thus, students can practice using course knowledge tools conceptually by discussing their appropriateness in different expository resources like streaming videos.  Furthermore, they can practice using those knowledge tools procedurally in interactive resources like graphing calculators, historically in encyclopedic resources like Wikipedia, and personally in social resources like the Algebra Forum.  We hope to be able to discuss what this would look like in intro psych courses as well.
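To picture that structure, here is a hypothetical sketch of how such a module might be represented. The type names, tags, and placeholders are illustrations of the design described above, not the project’s actual data model.

```typescript
// Hypothetical sketch: each module pairs Common Core standards with Open
// Educational Resources, tagging each resource with the kind of knowledge
// practice it supports. All names here are illustrative assumptions.

type PracticeMode = "conceptual" | "procedural" | "historical" | "personal";

interface OerResource {
  title: string;        // e.g., a Khan Academy video or an online graphing calculator
  url: string;
  mode: PracticeMode;   // expository video -> conceptual; calculator -> procedural;
                        // Wikipedia -> historical; discussion forum -> personal
}

interface Module {
  primaryStandard: string;     // a Common Core State Standard identifier
  secondaryStandard: string;
  resources: OerResource[];    // students discuss the standard's relevance in each context
}
```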

We have a lot of work to do, and hopefully a pending NSF proposal will allow us to do so. But in the meantime, we are quite encouraged.  It takes a while for teachers to see what we are doing.  But the students seem to “get it” pretty quickly.  And we are finding that the students who have been struggling the most (who presumably lack sufficient experience to make sense of course concepts when presented in the abstract) respond particularly well.  And we have obtained statistically significant gains on tests consisting of randomly sampled items aligned to the standards but independent of the curriculum.  You can check out our current modules and ask our teacher collaborators about their experience at the PLAnet.

Some Things about Assessment that Badge Developers Might Find Helpful

Erin Knight, Director of Learning at the Mozilla Foundation, was kind enough to introduce me to Greg Wilson, the founder of the non-profit Software Carpentry. Mozilla is supporting their efforts to teach basic computer skills to scientists to help them manage their data and be more productive. Greg and I discussed the challenges and opportunities in assessing the impact of their hybrid mix of face-to-face workshops and online courses. More about that later.
Greg is as passionate about education as he is about programming. We discussed Audrey Watters’ recent tweet regarding “things every techie should know about education.” But the subject of “education” seemed too vast for me right now. Watching the debate unfold around the DML badges competition suggested something more modest and tentative. I have been trying to figure out how existing research literature on assessment, accountability, and validity is (and is not) relevant to the funded and unfunded badge development proposals. In particular, I want to explore whether distinctions that are widely held in the assessment community can help clarify some of the concerns that people have raised about badges (nicely captured at David Theo Goldberg’s “Threading the Needle…” DML post). Greg’s inspiration resulted in six pages, which I managed to trim (!) back to the following with a focus on badges. (An abbreviated version is posted at the HASTAC blog)

A. There seem to be three types of primary goals for badge practices.
A review of the funded and unfunded proposals shows quite a range of goals for badges. Based on the information posted at the DML competition website, the primary goal of most of the badging proposals falls into one of three categories:
  1. Use badges to show what somebody has done or might be able to do. This seems like the goal of badges in the Badgework for Vets and 4-H Robotics proposals.
  2. Use badges to motivate more individuals to do or learn more. Badges in 3D Game Lab and BuzzMath proposals seem to accomplish this.
  3. Use badges to transform or even create learning systems. This is what badges have accomplished in Stackoverflow and seems like the goal of badges in the MOUSE Wins! and Pathways for Lifelong Learning proposals.
These are not mutually exclusive categories. In many cases the second goal encompasses the first goal and the third goal encompasses the first and second goals.
B. These three types of goals appear to correspond with the three primary assessment functions.
Most of these goals require some form of assessment. Whether we like it or not, assessment is complex. Arguably these three goals correspond with three assessment functions (or what others have labeled as purposes):
  1. Summative functions, which are often called assessment OF learning.
  2. Formative functions for individuals, which are often called assessment FOR learning.
  3. Transformative functions for systems, which a few are calling assessment AS learning.
C. Different assessment functions generally follow from different theories of knowing and learning, but these assumptions are often taken for granted. And assumptions about learning and assessment practices are often in tension with one another.
  1. Summative functions generally follow from conventional associationist views of learning as building organized hierarchies of very specific associations. These concerns are generally consistent with the “folk psychology” views of learning as “more stuff.”
  2. Formative functions follow from modern constructivist theories of learning as constructing conceptual schema by making sense of the world. These include socio-constructivist theories that emphasize the role of sociotechnical contexts in the way that individuals construct meaning.
  3. Transformative functions follow from newer sociocultural theories of learning as participating in social and technological practices.
The key point here is that these three assessment functions often conflict with each other in complex ways. In particular, summative functions often undermine formative and transformative functions. This is because ratcheting up the stakes associated with summative functions (i.e., the value of the badge) often requires assessments that are “indirect” and “objective” like an achievement test. As John Frederiksen and Allan Collins pointed out back in 1989, such assessments have limited formative and transformative potential, compared to more direct and subjective performance and portfolio assessments. Appreciation of this point requires a foray into the complex realm of validity:
D. Each assessment function raises distinct validity concerns.
Getting serious about assessment means thinking about evidence, and that quickly raises the issue of the trustworthiness of the evidence. Validity has to do with the trustworthiness of the evidence for whatever claims one wishes to make based on an assessment; validity is not the same as reliability, which is a property of the assessment itself. Each of the three assessment functions raises different concerns about the validity of the corresponding evidence (a sketch after this list pulls the three strands together):
  1. Summative functions raise concerns about evidential validity: How convincing is the evidence that this person has done or will be able to do what this badge represents? Many assessment theorists like Jim Popham break this down further into content-related, criterion-related, and construct-related evidence. Measurement theorists like Sam Messick break it down even more, but these distinctions are probably too nuanced for now.
  2. Formative functions raise concerns about consequential validity: How convincing is the evidence that these badges led individuals to do or learn more? Consequential validity is often broken down into intended consequences (always desirable) and unintended consequences (usually undesirable).
  3. Transformative functions raise concerns about systemic validity: How trustworthy is the evidence that this educational system might not exist if we had not used badges? Frederiksen and Collins pointed out that the systemic validity of an assessment practice is linked to its directness and subjectivity.
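For badge developers who think in code, the correspondence running through sections A through D collapses into a small lookup structure. The sketch below restates the post’s three-way mapping from goals to assessment functions to validity concerns; the type and field names are my own, not any established schema.

```typescript
// One compact restatement of sections A-D. The labels are the post's own
// terms; the structure is just a convenient way to hold them side by side.

interface AssessmentProfile {
  goal: string;                                        // A: primary goal of the badge practice
  fn: "summative" | "formative" | "transformative";    // B: assessment function
  slogan: string;                                      // B: common shorthand
  validityConcern: string;                             // D: the validity question it raises
}

const profiles: AssessmentProfile[] = [
  {
    goal: "show what somebody has done or might be able to do",
    fn: "summative",
    slogan: "assessment OF learning",
    validityConcern: "evidential: how convincing is the evidence behind the badge?",
  },
  {
    goal: "motivate individuals to do or learn more",
    fn: "formative",
    slogan: "assessment FOR learning",
    validityConcern: "consequential: did the badges lead individuals to do or learn more?",
  },
  {
    goal: "transform or create learning systems",
    fn: "transformative",
    slogan: "assessment AS learning",
    validityConcern: "systemic: would this system exist if badges had not been used?",
  },
];
```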
This is where having multiple goals for a single set of badges, or different goals for different badges within a single system, can get exceedingly complicated. My point here is that badge developers should consider the various goals for their badges, and the assumptions behind those goals. Failing to do so can create “wicked” tensions that are impossible to resolve. This can be toxic to educational systems because stakeholders ascribe those tensions to other things (politics, laziness, culture, faddism, etc.).
In response to my first draft of this post, Greg summarized my point more succinctly and more generally:
People have different philosophical models of education (whether they realize it or not) and that is why they talk across each other so often.
Greg also inspired me to suggest the following contribution to Audrey Watters’ top ten list of questions you can ask to find out if somebody really knows education; this one tests whether they know about educational assessment:
Do you understand the difference between summative, formative, and transformative functions of assessment and how they interact?

Initial Consequences of the DML 2012 Badges for Lifelong Learning Competition

Daniel T. Hickey

The announcement of the final awards in MacArthur’s Badges for Lifelong Learning competition on March 2 was quite exciting. It concluded one of the most innovative (and complicated) research competitions ever seen in education-related research. Of course there was some grumbling about the complexity and the reviewing process. And of course the finalists who did not come away with awards were disappointed. But has there ever been a competition without grumbling about the process or the outcome?

A Complicated Competition
The competition was complicated. There were over 300 initial submissions a few months back; a Teacher Mastery category was added at the last minute. Dozens of winners of Stage 1 (Content and Program) and Stage 2 (Design and Tech) went to San Francisco before the DML conference to pitch their ideas to a panel of esteemed judges.

After what must have been a terrifically complicated matching process, Stage 1 and Stage 2 winners were paired up. This yielded the final Stage 3 winners who would actually be awarded grants to carry out the work.

The list of funded proposals reveals some terrific ideas. As an assessment researcher, I am particularly excited about the intended broader consequences of the proposed badging practices beyond simply awarding badges. See David Theo Goldberg’s recent post about the broader goals of the initiative. And of course, the unintended consequences (both positive and negative) will be even more interesting and are likely to be even more far reaching. I agree with Cathy Davidson when she suggested that we may well look back on this competition as a "tipping point" for digital media and learning. I will begin summarizing the intended consequences and speculating about the unintended consequences in posts to follow. Before that I want to consider the consequences of this initiative before the awards were even announced.

The Indirect Consequences of the Competition
The parade of directors and department heads from DOE, NASA, Veterans Affairs, and elsewhere at the September 2011 launch event suggested that this initiative was going to have some impact. Given that other education funding agencies routinely spend far more on a single project, this level of attention for a $2M competition must have raised some eyebrows in DC (more at http://bit.ly/w3Jxc0).

By compelling over 300 individuals and teams to draft proposals in the first place, DML generated tremendous interest in the very idea of open badges. Then the Stage 1 and Stage 2 winners clearly invested a lot of additional energy into these ideas. I spoke with a number of Stage 1 winners at the conference who indicated that they hoped to pursue their proposed effort regardless of whether they were funded. And this gets at the transformational/disruptive potential of badges.

Badges as a Useful Disruption
On one hand, it is simple to add open badges to an existing educational ecosystem. With the Open Badge Infrastructure (OBI) being developed by Stage 3 awardee Philipp Schmidt and Peer 2 Peer University, virtually anybody should be able to easily offer digital badges for accomplishments. By structuring and simplifying the peer reviewing process, communities will be able to negotiate criteria and establish validity and value.
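To make “easily offer digital badges” concrete: an OBI-style badge ultimately boils down to a small machine-readable assertion that an issuer publishes about a recipient. The sketch below illustrates that idea only; the field names are assumptions on my part, and the authoritative schema is Mozilla’s actual OBI specification.

```typescript
// Illustrative shape of a badge assertion: a verifiable, publishable claim
// that a recipient earned a badge from an issuer. Not the real OBI schema.

interface BadgeAssertion {
  recipient: string;       // earner's identity (the real OBI hashes the email address)
  evidence?: string;       // URL to the work that earned the badge
  issuedOn: string;        // ISO 8601 date
  badge: {
    name: string;          // hypothetical example: "HTML Hacker, Level 1"
    description: string;
    image: string;         // URL to the badge image
    criteria: string;      // URL describing what earning the badge requires
    issuer: {
      name: string;
      origin: string;      // issuer's web origin, where the assertion is hosted
    };
  };
}
```

Because the assertion is just data published at a URL, virtually anybody can issue one; as argued above, the hard part is the community process that gives a badge its validity and value.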

But there is more to it. Barry Joseph of Global Kids put it perfectly at the end of the meeting: “Introducing badges into an educational ecosystem is like developing a new website within a company or an organization.” Barry explained how the seemingly simple process of creating a website often reveals unexamined sources of power and information, and forces communities to explicate reams of previously tacit information. Introducing badges forces learning organizations to do the same. Simply drafting a Stage 1 proposal surely led those proposers to consider and reconsider how learning was being acknowledged and rewarded. (BTW, stay tuned for an important announcement from Global Kids in this regard).

Return on Investment
In any socio-historical context, the energy that DML catalyzed around open badges would be a noteworthy development. This development is even more notable when juxtaposed with test-based accountability, with its focus on exceedingly narrow kinds of learning and corrosive use of rewards and punishment. The structure and community that emerged from just the competition phase seem like an unprecedented outcome. Given that other agencies and foundations routinely commit far more funds to community-building efforts that often fizzle, this seems like a pretty wise investment.


Open Badges and the Future of Assessment

Of course I followed the roll out of MacArthur’s Badges for Lifelong Learning competition quite closely. I have studied participatory approaches to assessment and motivation for many years.  

EXCITEMENT OVER BADGES
While the Digital Media and Learning program committed a relatively modest sum (initially $2M), it generated massive attention and energy.  I was not the only one who was surprised by the scope of the Badges initiative.  In September 2011, one week before the launch of the competition, I was meeting with an education program officer at the National Science Foundation.  I asked her if she had heard about the upcoming press conference/webinar.  Turns out she had been reading the press release just before our meeting.  She indicated that the NSF had learned about the competition and many of the program officers were asking about it.  Like me, many of them were impressed that Education Secretary Duncan and the heads of several other federal agencies were scheduled to speak at the launch event at the Hirshhorn Museum.

THE DEBATE OVER BADGES AND REWARDS
As the competition unfolded, I followed the inevitable debate over the consequences of “extrinsic rewards” like badges on student motivation.  Thanks in part to Daniel Pink’s widely read book Drive, many worried that badges would trivialize deep learning and leave learners with decreased intrinsic motivation to learn. The debate played out nicely (and objectively) at the HASTAC blog via posts from Mitch Resnick and Cathy Davidson.  I have been arguing in obscure academic journals for years that sociocultural views of learning call for an agnostic stance towards incentives.  In particular, I believe that the negative impact of rewards and competition says more about the lack of feedback and opportunity to improve in traditional classrooms than about the rewards themselves.  There is a brief summary of these issues in a chapter on sociocultural and situative theories of motivation that Education.com commissioned me to write a few years ago.  One of the things I tried to do in that article and the other articles it references is show why rewards like badges are fundamentally problematic for constructionists like Mitch, and how newer situative theories of motivation promise to resolve that tension.  One of the things that has been overlooked in the debate is that situative theories reveal the value of rewards without resorting to simplistic behaviorist theories of reinforcing and punishing desired behaviors.

BACKGROUND [FOR MOTIVATION GEEKS AND GRAD STUDENTS]
Historically speaking, the initial thesis of behaviorism assumed that rewards were a fine way to build sufficient proficiency so that success would allow the environment to reward continued engagement; constructionism and other rationalist theories emerged as the antithesis to behaviorism.  The emphasis on internal sense making explains why constructionists like Mitch react so strongly to rewards and badges.  My argument builds on Greeno's situative dialectical synthesis, which suggests that the motivational consequences of rewards are highly contextual and depend on the forms of participation that are either encouraged or discouraged in that reward context.

My own thinking about motivation was strongly influenced by a paper in one of those same obscure journals (the Elementary School Journal) by DML director Connie Yowell called “Self-Regulation in Democratic Communities.”  I assume that Connie's ideas about self-regulation being “stretched across” communities of students and teachers and communities of learners and mentors were indeed influential in her thinking about the competition.  I considered weighing in on the debate, but I was pretty swamped at the time and figured I might be better off sitting on the sidelines and watching the debate unfold.  In particular, I was curious whether anyone would point to the irony in arguing that badges that promised to empower individuals and give them recognition for their accomplishments could simultaneously disempower them.  This might have come up, but I did not see any references to it.

A HOTBED FOR INNOVATION (DML NAILS IT)
When the 70 or so Stage 1 winners were posted, they ran the gamut and many were quite ambitious.  Meanwhile, the debate over intrinsic motivation proved its irrelevance for actual educational practice and disappeared.  In a blog post noting that the number of badge developer submissions had passed 100 (they got something like 300!), Cathy Davidson suggested that someday we will look back at the competition as a “turning point” for digital media and learning.  I think she might be right.  In particular, I think that the many proposals for teacher proficiencies really have the potential to accomplish the crucial goal of helping teachers think about learning in completely new ways.  There were so many teacher proficiency proposals that MacArthur elected to fund and run a separate competition.  While there were fewer submissions for the later badge development and research competitions, they also ran the gamut.  The student proposals and the research proposals looked particularly promising.

I have talked to a bunch of people today who are in San Francisco this week pitching their proposals to the panel.  There are 70 proposals, and only 20 will be funded.  Many of the proposers have said that they are going to pursue open badges even if they don’t get an award.  I think that is awesome, and I look forward to finding ways to help that work better.

I am on my way to the meeting where the final stage winners are to be announced.  I will summarize the winners in my next post.