A test that felt like science
When I think about my “best assessment”, I don't remember the grade I received. That alone tells you something important about why it was so effective.
It was my senior-year capstone course in evolutionary biology. The assessment was a collaborative case study analysis about anoles (a type of lizard) that stretched across two class sessions, examining a real research scenario where evolutionary principles intersected with conservation biology. We worked in teams of four, armed with primary literature, field data, and open access to any resources we could find, including our notes, our textbook, and all of our work from the semester so far. No memorization required. No time pressure. Just us wrestling with messy, complex problems that didn't have tidy answers waiting in the back of some textbook.
[Figure: Phylogenetic tree of Anolis lizards showing evolutionary relationships among species (adapted from Wogan et al., 2023)]
What made this assessment extraordinary becomes clearer when I consider Schneider and Hutt's (2013) analysis of how grading evolved in American education. They argue that grades began as "an intimate communication tool among teachers, parents, and students used largely to inform and instruct" but transformed into "useful tools in an organizational rather than pedagogical enterprise, tools that would facilitate movement, communication and coordination." My capstone assessment felt like a return to that original purpose: it informed learning rather than just ranking us.
The learning objectives were complex and interconnected: synthesize evolutionary theory with real-world applications, evaluate evidence (which is never as neat as the scenarios we were normally presented with), collaborate effectively, and communicate scientific reasoning. None of these could be reduced to multiple-choice questions or memorized facts. The process required real teamwork; we built understanding together, challenged each other's interpretations, and refined our thinking through discussion.
The open-book format was revolutionary for me. Rather than testing whether I could regurgitate information under artificial time constraints, it assessed whether I could actually use evolutionary concepts to solve novel problems. This aligns with my own beliefs about knowledge: it is not just stored facts but the ability to apply concepts in new contexts.
This design reflected what Broadfoot and Black (2004) identify as a crucial but often-overlooked principle: assessments should match their purpose. They note that assessment practices "can be employed for a great variety of purposes, some of which are potentially of great educational value but are not currently well understood" (p. 10). My professors understood something fundamental: they weren't just measuring what I'd memorized; they were assessing whether I could think like an evolutionary biologist.
The validity of this assessment was obvious. In real scientific work, you have access to resources. You collaborate with colleagues. You wrestle with incomplete data and competing hypotheses. The assessment mirrored authentic practice in a way that traditional testing never had in all my previous years of schooling. It was the first time a test felt like it might actually be relevant to my career.
Looking back now as someone who studies education, what strikes me is how different this felt from the kinds of assessments I usually see. The lower-stakes environment reduced anxiety and let us focus on genuine learning rather than competition over grades. It also pushed back against the standardization that Schneider and Hutt (2013) describe in the history of grading. Instead of flattening nuance into a single letter or number, this assessment embraced complexity and context in the way real learning often does.
Broadfoot and Black (2004) warn about the tension between assessment that empowers learners and assessment used punitively to raise standards. They argue that "any possible short-term gains that the more or less extreme instrumentalism of the latter engenders...is bought at the price of turning many students off formal learning forever" (p. 11). My capstone assessment sat firmly in the empowerment camp: it made me want to keep learning, not just to perform for a grade.
What stays with me is not the grade or even whether our group’s analysis was “right.” It is the memory of puzzling through incomplete data, disagreeing with teammates, and realizing that doing science often means sitting with uncertainty. I cannot say this experience gave me a blueprint for how every assessment should look, but it reshaped how I think about what they could be. Even now, I wonder how many of my other classes might have felt different if they had asked me to use knowledge instead of just recall it.
References
Broadfoot, P., & Black, P. (2004). Redefining assessment? The first ten years of Assessment in Education. Assessment in Education: Principles, Policy & Practice, 11(1), 7–26. https://doi.org/10.1080/0969594042000208976
Schneider, J., & Hutt, E. (2013). Making the grade: A history of the A–F marking scheme. Journal of Curriculum Studies. https://doi.org/10.1080/00220272.2013.790480
Wogan, G. O. U., Yuan, M. L., Mahler, D. L., & Wang, I. J. (2023). Hybridization and transgressive evolution generate diversity in an adaptive radiation of Anolis lizards. Systematic Biology, 72(4), 874–884. https://doi.org/10.1093/sysbio/syad024