To Test Or Not To Test? That Is The Question – Using Retrieval Practice To Make Stuff Stick


If there’s one way to instantly divide opinion in the teaching profession, it’s to mention the word ‘testing’. It is seen by some as the evil part of education: created to destroy both pupil and teacher self-esteem. In primary education it is often perceived as a one-hour written task, completed in silence, with the resulting data used to scrutinise the quality of teaching and learning. At its worst, it is the high-stakes, end of key stage tests for Year 6 in England, on which so much accountability is placed. Pupils sit a one-hour reading test, currently not linked to the curriculum; a 45-minute grammar test; a 20-minute spelling test; and three maths tests (one 30-minute arithmetic and two 40-minute reasoning papers), totalling just under four hours. Make no mistake, schools are heavily judged and compared by this data, not just by Ofsted, but by the local authority, parents and, worse, each other. So is it really that surprising that tests have such a bad reputation?

Testing: assessment of learning or assessment for learning?

One of the issues with testing is that schools provide tests that are done to the pupils and not with them. For example, primary schools often collect data from tests at three points in the academic year, one per term. These are usually published written tests in reading, grammar and maths. But does this help to identify what the pupils do and don’t know?

“The distinction between assessment of learning and assessment for learning is basically about the intention behind the assessment.  So, if you’re assessing in order to help you teach better, that’s assessment for learning, and if you’re assessing in order to grade students, to rank them or to give them a score on a test, then that’s assessment of learning.  But in classrooms I see plenty of what I would call formative intention but very little formative action. Teachers often say to me that they collect information in order to take action to help students, but if you follow it through, you find that the data never get acted on and the teaching never changes direction…If you’re not using the evidence to do something that you couldn’t have done without the evidence, you’re not doing formative assessment.” Dylan Wiliam, Institute of Education, University of London, Keynote (2006)

Tim Oates, of Cambridge Assessment, talks of the principle that assessment should be about children producing more ‘stuff’, and that ‘stuff’ can be looked at by teachers. Teachers should be assessment kleptomaniacs, collecting evidence to support learning and to assess whether the children have understood the key ideas, concepts, knowledge and skills.

Working in a primary school, we also believe that testing needs to be done more, not less. In fact, we would argue that in most schools testing is going on all the time – teachers just don’t realise that what they are doing is testing! And one of the most beneficial testing techniques, for students of all ages, for enhancing both recall and comprehension, is retrieval practice (Dunlosky et al., 2013).

Types of testing (retrieval practice):

  • Verbal Questioning

A staple of any teacher’s arsenal is questioning. And why do teachers ask questions? To test whether pupils can remember prior learning. There are, however, several aspects to consider when questioning (though I will not address them all here):

The most commonly used questioning methods are the least effective. There is plenty wrong with asking a question to a class, where some hands go up, the teacher chooses a pupil who answers correctly and then moves on. One problem is that the teacher only learns what one student thinks, not how all the rest would have answered. Geoff Petty argues that assertive questioning is more beneficial because, for one, it means that all students are thinking and they don’t have the choice of ‘opting out’ (Evidence-Based Teaching, 2017).

Doug Lemov, in Teach Like A Champion (2015), also stresses the importance of wait time when asking questions, to ensure every child has adequate time to respond. Teachers are sometimes so concerned with keeping the pace and flow of a lesson that they move on too quickly.

Questions are good. Just be sure every child has an opportunity to think hard and not opt out.


  • Multiple Choice Quizzing (MCQ)

Have you, as an adult, ever taken part in a quiz, be it a ‘pub quiz’, family board game or play-along with a television show, such as Who Wants To Be A Millionaire? Have you noticed other people’s reactions when they get an answer correct? Perhaps a “YESSSSSS!” with a triumphant fist-pump for good measure! Try giving an MCQ to your class during and after teaching them about a given topic. Watch their reactions as they get more and more questions correct. The real beauty of MCQ is that if feedback is provided on incorrect answers, whether to individuals or the whole class, pupils will do better next time. MCQ is a powerful tool for learning that is, in essence, retrieval practice. Further reading can be accessed here.

One word of warning when creating MCQs: Daisy Christodoulou comments on the need to ensure that incorrect answers are plausible. It is no use, for example, asking ‘Who led the Nazi party in 1939?’ if the provided choices are too obscure or ‘silly’:

  1. Adolf Hitler
  2. Ed Sheeran
  3. Germany
  4. Sir Francis Drake

This would consequently test deduction more than knowledge.
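
To make this concrete, here is a minimal sketch of an MCQ with immediate feedback, written in Python purely for illustration (the question format, wording and distractors here are my own invention; tools such as Socrative do all of this for you). Note that the distractors are plausible figures from the same period, so the quiz tests knowledge rather than deduction:

```python
import random

QUESTIONS = [
    {
        "prompt": "Who led the Nazi Party in 1939?",
        # Plausible distractors from the same era, so guessing by elimination is harder.
        "options": ["Adolf Hitler", "Joseph Goebbels", "Hermann Goering", "Rudolf Hess"],
        "answer": "Adolf Hitler",
    },
]

def run_quiz(questions):
    score = 0
    for q in questions:
        options = random.sample(q["options"], k=len(q["options"]))  # shuffle order
        print(q["prompt"])
        for i, opt in enumerate(options, start=1):
            print(f"  {i}. {opt}")
        choice = options[int(input("Your answer (number): ")) - 1]
        if choice == q["answer"]:
            score += 1
            print("Correct!\n")
        else:
            # Immediate corrective feedback is what turns a quiz into learning.
            print(f"Not quite - the answer is {q['answer']}.\n")
    print(f"Score: {score}/{len(questions)}")

run_quiz(QUESTIONS)
```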


Online software such as Socrative can be used to test curriculum knowledge through MCQ. Children complete assigned quizzes individually or through teacher-led sessions. Furthermore, they can be aligned with Knowledge Organisers to test key knowledge throughout the year.


A Year 6 History Example of Multiple Choice Quizzing


  • Brain Dumps

A brain dump is a complete transfer of knowledge about a subject from a pupil’s brain to some other storage medium. This could be in the form of recorded speech, drawings or writing on paper.

The example below is from a Year 3 unit of work on the Ancient Egyptians. After several weeks, the children were asked to do a brain dump on plain paper. The examples are from a pupil with low prior attainment (left) and high prior attainment (right).

brain dump y3


The teacher was pleased with some of the retention, particularly around the usefulness of the River Nile. As we can see, the pupil on the left was able to talk about papyrus and where it came from. The pupil on the right remembered that flax came from a plant and was in turn manufactured into linen clothes. Incidentally, this knowledge was obtained through reading and instruction.

My colleague Carl Badger, Assistant Headteacher, took brain dumping further by asking pupils to complete one for a foundation subject scrutiny we did together. Rather than the traditional book trawl, looking at the quality and quantity of tasks as well as reviewing presentation and even marking, pupils were prompted to show what they had learnt on paper through drawings and annotations. This made learning very real indeed.

We then took this one step further and presented the dumps back to staff for reflection. Below is an example of a reflection sheet that was completed by staff:

brain dump reflection

Think about your current practice after a learning scrutiny/book trawl. Would your systems facilitate this kind of reflection from teachers?


  • Flashcards

If you asked EYFS practitioners how often they tested children, what would you imagine their response to be? Take phonics, for instance. The use of flashcards is a key feature of helping children to remember phonemes and their corresponding graphemes. Yet not every teacher of phonics would realise that what they are actually doing on a daily basis is both retrieval practice (a flashcard is shown and children respond with the appropriate phoneme) and interleaving (a new phoneme may be introduced daily, but practice is mixed up: blending, retrieval of ‘tricky’ words and written practice).


  • Knowledge Organisers (KOs)

Knowledge Organisers are being used increasingly in schools to refine the fundamental aspects of learning within a unit. Although not a test itself, a KO can form part of retrieval practice: pupils are given versions that are partially or fully blanked out, to see how much information they can remember. Once completed, the teacher may share class or individual feedback and use these ‘gaps’ to revisit prior lessons.
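
As a concrete illustration, here is a small Python sketch of blanking out a KO for retrieval. The facts are loosely based on the Ancient Egyptians unit mentioned earlier, but the exact content and the ‘blank half of it’ rule are my own assumptions, not a prescribed method:

```python
import random

# A toy Knowledge Organiser: prompt -> key fact.
knowledge_organiser = {
    "Longest river in Egypt": "The Nile",
    "Paper-like material made from reeds": "Papyrus",
    "Plant manufactured into linen clothes": "Flax",
    "Egyptian picture writing": "Hieroglyphics",
}

def blank_out(ko, proportion=0.5, seed=None):
    """Return a copy of the KO with a proportion of the answers blanked."""
    rng = random.Random(seed)
    to_blank = set(rng.sample(list(ko), k=max(1, round(len(ko) * proportion))))
    return {prompt: ("________" if prompt in to_blank else fact)
            for prompt, fact in ko.items()}

# Print a half-blanked KO for pupils to fill in from memory.
for prompt, fact in blank_out(knowledge_organiser, proportion=0.5).items():
    print(f"{prompt}: {fact}")
```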



Although all of these ‘tests’ can be used as a form of assessment, retrieval practice is first and foremost a learning tool.

The importance of spacing

Prior research has shown that the difficulty of initial retrieval is correlated with later retention (Karpicke & Roediger, 2007; Benjamin, Bjork & Schwartz, 1998), and there is direct evidence that delaying an initial retrieval attempt enhances performance on a later criterial test (Jacoby, 1978; Whitten & Bjork, 1977). It is therefore recommended that a gap in time is left between retrieval attempts. This will vary depending on the age of the child: whereas a seven-day gap may be ideal for older pupils, for younger ones (aged 4-6) a one-day gap may prove sufficient. So use retrieval practice to help make stuff stick, and space your learning and testing so that it sticks for longer.
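
If it helps to see the arithmetic, here is a tiny Python sketch that turns those rough rules of thumb into a review calendar. The gap sizes, dates and number of reviews are illustrative assumptions, not prescriptions from the research:

```python
from datetime import date, timedelta

def retrieval_schedule(first_taught, gap_days, n_reviews=4):
    """Suggest dates for retrieval attempts, spaced by a fixed gap."""
    return [first_taught + timedelta(days=gap_days * i)
            for i in range(1, n_reviews + 1)]

# Older pupils: roughly weekly retrieval after first teaching.
print(retrieval_schedule(date(2017, 10, 2), gap_days=7))
# Ages 4-6 (e.g. daily phonics flashcards): roughly daily retrieval.
print(retrieval_schedule(date(2017, 10, 2), gap_days=1))
```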

Happy testing!


C.Westby, Old Hill Primary School. Twitter: @eggegg80

C.Badger, Old Hill Primary School. Twitter: @badgeml1968




Agarwal, P. K., Bain, P. M., & Chamberlain, R. W. (2012). The value of applied research: Retrieval practice improves classroom learning and recommendations from a teacher, a principal, and a scientist. Educational Psychology Review, 24, 437–448

Benjamin, A. S., Bjork, R. A., & Schwartz, B. L. (1998). The mismeasure of memory: When retrieval fluency is misleading as a metamnemonic index. Journal of Experimental Psychology: General, 127, 55– 68.

Dempster, F. N. (1997). Distributing and managing the conditions of encoding and practice. In E. L. Bjork & R. A. Bjork (Eds.), Human Memory (pp. 197–236). San Diego, CA: Academic Press.

Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students’ learning with effective learning techniques: Promising directions from cognitive and educational psychology. Psychological Science in the Public Interest, 14, 4–58.

Karpicke, J. D., & Roediger, H. L. (2007). Expanding retrieval practice promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 704–719.

McDaniel, M. A., Agarwal, P. K., Huelser, B. J., McDermott, K. B., & Roediger, H. L. (2011). Test-enhanced learning in a middle school science classroom: The effects of quiz frequency and placement. Journal of Educational Psychology, 103, 399–414.

Roediger, H. L., III, & Karpicke, J. D. (2006). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181–210.



Writing Assessment: There And Back Again (Part 2)

there back again 2

NEW: Added test prompt examples, whole school objectives & example of Insight data

DISCLAIMER: If you missed Part 1, you can find it here. Anything I write here is of course just my personal opinion and experiences. I am not trying to convert anyone to one form of assessment over another. Writing this down does help me to organise my thoughts. I have deliberately kept this blog citation-free but most of what I have said can be supported by evidence. I hope that any feedback we receive, whether positive or negative, will further help to develop our understanding of writing assessment. I appreciate this blog is longgggggg, but I feel it’s vital to cover some of our journey and reasons behind our decisions. I also believe that by continuing to share ideas within the profession, together we will reflect and improve on our own practice and experiences. I would hate to live in an echo chamber.

Before I share our writing assessment system, I would like to discuss the main assessment systems we already have.

When we examine current writing assessment models, they really fall into three main categories:

  1. Secure fit
  2. Best fit
  3. Comparative Judgement

Let’s take a brief look at each…

Secure Fit

This is where a pupil’s writing should meet all the statements within the standard at which they are judged. However, the 2017-18 guidance does state:

“teachers can use their discretion to ensure that, on occasion, a particular weakness does not prevent an accurate judgement being made of a pupil’s attainment overall.”

Many feel this is an improvement on last year’s all-or-nothing approach, but it is far from perfect.

Best Fit

Prior to secure fit, rubrics were produced by the STA under three main strands:

  • Sentence structure & Punctuation
  • Text structure & Organisation
  • Composition & Effect

Comparative Judgement

Comparing pieces of writing to decide which is better. This can be done as a manual sort or using an online tool.
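
For the curious, the online tools work by collecting many pairwise judgements (‘which of these two scripts is better?’) and fitting them to a scale. Below is a simplified Python sketch of that idea using Bradley-Terry-style updates; it is an illustration of the principle, not how any particular tool (e.g. No More Marking) actually implements it:

```python
from collections import defaultdict

def fit_scores(judgements, n_iters=100):
    """Estimate a quality score per script from (winner, loser) pairs.

    Uses the classic iterative (Zermelo/MM) update for a Bradley-Terry model.
    Scripts that never win end up with a score of 0; real tools are more robust.
    """
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    scripts = set()
    for winner, loser in judgements:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        scripts.update((winner, loser))

    strength = {s: 1.0 for s in scripts}
    for _ in range(n_iters):
        updated = {}
        for s in scripts:
            denom = sum(count / (strength[s] + strength[t])
                        for pair, count in pair_counts.items() if s in pair
                        for t in pair if t != s)
            updated[s] = wins[s] / denom if denom else strength[s]
        scale = len(scripts) / sum(updated.values())  # keep scores comparable
        strength = {s: v * scale for s, v in updated.items()}
    return strength

# Toy example: script A beats B twice, A beats C, B beats C.
judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]
for script, score in sorted(fit_scores(judgements).items(), key=lambda kv: -kv[1]):
    print(script, round(score, 2))
```

The more judgements each script receives, the more stable its position on the scale; that stability is broadly what the reliability figures quoted later (around 0.8) summarise.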

So which is best?

I think to answer this question we have to examine the purpose of assessment. What is it for? Who is it for? No really, be honest. Why are you giving a child a writing assessment? Is it because you want to find out what they can or cannot show in a particular task, or is it because it’s the end of the autumn term and, well, historically, that’s when we do writing assessments?! Or is it so you can hand in some lovely-looking data for pupil progress meetings, governors etc.?

Over the years, I have read a lot along the lines of ‘using model x has told us this’ or ‘by using y we have a reliability score of 0.8’. But whenever I ask colleagues this key question (and I’ve asked it multiple times, both on Twitter and in ‘real life’):

Did your assessments tell you anything that you didn’t already know?

I inevitably get a muted response.

When teachers are brutally honest, half of the things that they do aren’t done because they help them. Why? I think it is because the profession likes to create solutions for problems that don’t exist. And the perception is that if you can make your data look like you are achieving or identifying gaps, then, ‘Hurrah!’ What a great teacher/leader/school you are! I’ll say it again: most teachers will know this without the need to jump through these assessment hoops – ours included!

And most teachers will know exactly what children are and aren’t capable of through their daily writing. The only aspect this doesn’t cover is the ‘small’ matter of independence. I know that Ros Wilson’s Big Write plans in independent writing opportunities on a weekly basis, thus promoting a true reflection of what a pupil is capable of. Not what they are capable of with modelling, scaffolding, word banks and the like. It is no wonder end of key stage assessment is so unreliable.

Which brings me on to comparative judgement. One of the online programs is lauded as an assessment system that can achieve high reliability. That it can. I’ve seen (and personally experienced) many examples of data outcomes. The problem is that the scripts being judged may not have had equal input. So whilst the process of judging scripts is reliable, I would argue that the testing conditions are not. For summative purposes, if test conditions are controlled, then I think comparative judgement could be used for internal assessments and to aid the moderation process, but I don’t believe it is appropriate for end of key stage assessments in the current climate. Because, if you are a Year 2 or Year 6 teacher, using CJ is not an option (well, it is, but you will be doing double the workload by doing CJ and then re-assessing against the national criteria).

You see, we have so many conflicts in our system that it isn’t surprising we are struggling to get writing assessment right. The national curriculum is a set of subjects and standards used by primary and secondary schools so children learn the same things. It covers which subjects are taught and the standards children should reach in each subject (for mainstream schools). Schools are different, and therefore use and interpret the NC in different ways. Consequently, in writing, schools may not teach children all the same things in the same year groups. One of the hardest tasks since the abolition of levels has been moderating writing cross-school. Not because of logistics: online tools such as Nomoremarking make sharing writing simple. No. It’s because there is no set standard for year groups other than Y2 and Y6 in primary writing. Don’t forget, the English curriculum is divided into lower and upper KS2 rather than year-group-specific objectives. So what might be ‘on-track’ for one school in, say, Y3 may not be for a Y3 cohort in a different school.

Why did we stop using CJ to assess writing? Simple. Because until we’re told otherwise, we have to teach children to meet the expected standard outlined by the DfE at the end of Y2 and Y6. And actually, I think there should be some non-negotiables in writing. If our school continued to use CJ alone to assess writing, we would only be fooling ourselves into believing that children who are meeting our standard are also meeting the national standard. Although advocates of CJ herald the non-rubric approach, staff I have spoken to who have used CJ all seem to have similar internal principles that they use to judge writing:

  • Handwriting (bias)!
  • Sentence structure & grammar
  • Vocabulary choices
  • Composition

Lo and behold! Isn’t that similar to the STA grids?

So, to answer the question ‘which is best?’ Who knows? But having used all three models above, Carl and I have, in a sense, combined secure and best fit with CJ. This is no silver bullet. It’s far from perfect, but by sharing it within the community perhaps we can make it better, together.


One last point needs addressing. I’m sorry, I know writing tests are far from perfect, but using teacher assessments for high-stakes accountability just doesn’t work! We are advocates of testing what we teach. You only get an honest account by giving children a (writing) test in one form or another. Of course, there are limitations to this, too. It’s one piece, on one particular day, etc. etc. … but it’s honest! Perhaps what the child wrote was rubbish compared to ‘normal’. But it’s honest! Again, I’ll refer back to Ros Wilson’s approach to highlight the importance of writing independence.

So now you know some of our thought-processes, without further ado I’ll show you our new writing assessment system that…



These are the exact notes I wrote in preparation for staff meeting.

1st Staff Meeting

  • Ask question: did using CJ help you to know what to teach next in your class?
  • Show template above – explain thought process & limitations CJ (gives overview but doesn’t help teacher to know what their class is capable of)
  • Each class to be given streamlined objectives – discuss and agree in staff meeting
  • Staff to write their own test based on what children have covered in the curriculum (1 fiction / 1 non-fiction). To ensure independence, do not set a task that has been recently covered, e.g. if you have just asked children to write a diary of a Roman soldier, don’t set this as the task. CW & CB to provide example.

2nd Staff Meeting

  • After the students have written both tests, 2nd staff meeting to judge and moderate:
    1. Confirm 2 students who meet the ‘on-track’ criteria (through whole-staff agreement). There may be some leeway with certain objective components, e.g. a pupil who has not shown they can use ‘;’
    2. Use manual CJ to sort into 3 piles: working towards, on-track & greater depth
    3. Staff will then complete the assessment sheet above and complete the online Insight tracker (yes/no field), which will provide percentages for pupil groups etc. (see the sketch after this list)
    4. Teach what points are missing to ‘working towards pupils’ to get more on track
    5. Use reading and writing opportunities to extend those that are on-track/greater depth
    6. Profit!
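
For anyone curious what that yes/no roll-up amounts to, here is a toy Python sketch. The pupil names reuse the examples from the grid further down, the pupil-premium group is invented, and this is not Insight’s actual data format – just the kind of percentage it reports:

```python
# yes/no 'on-track' judgements per pupil (True = on track).
on_track = {"Sarah": True, "Jack": True, "Layla": True, "Tom": False, "Mia": False}

pupil_premium = {"Jack", "Mia"}  # hypothetical pupil group

def percent_on_track(pupils, group=None):
    """Percentage of pupils on track, optionally filtered to one group."""
    names = [p for p in pupils if group is None or p in group]
    return 100 * sum(pupils[p] for p in names) / len(names)

print(f"Whole class on track: {percent_on_track(on_track):.0f}%")                    # 60%
print(f"Pupil premium on track: {percent_on_track(on_track, pupil_premium):.0f}%")   # 50%
```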


Whole School Objectives (we aren’t 100% happy with these and they will likely evolve over the year)

writing school objectives

Writing Prompt Examples

Year 2 – Non-fiction (based on a real life experience)

writing assessment prompt Y2

Year 5 – Non-fiction (Space)

You have been exploring deep Space for several years. You discover an unknown planet and are assigned to make a detailed report about its suitability for living. You have been asked to write about three main areas:

  • What the planet is like (surface, temperature etc.) and if it has suitable conditions for living
  • What life is currently there
  • Any interesting discoveries you have made on the planet

space pics

Year 6 – Fiction (linked to science topic ‘electricity’ & based on the animation ‘Replay’ which they watched last year with me) 

y6 1 y6 2

Comparative Judgement Element

This grid is used during the 3-pile book sort. Really, we’re focusing on those children who aren’t on track. There may be a few children who meet 4/5 points for on-track (such as Sarah, Jack & Layla in the table below) and won’t be ‘penalised’ for not hitting all five, dependent on which objective they aren’t secure in. We also provide opportunities for those children working at ‘greater depth’ to continue to experiment.

Assessment writing grid

How Data could look using Insighttracking

insight 1 insight 2

The above takes no time to generate and is very helpful for teachers, subject leaders, HTs, governors etc. to see an overview of the class (again, this is a snapshot and should not be used to bash staff with).



I don’t agree with tests. They don’t provide a fair reflection of what children are capable of.

Fine, wallow in your false assessments. How will you really know what your pupils can do by themselves? And consequently, what to teach next. We are well aware of the limitations of testing writing as well as the benefits.


Here’s a great discussion we had via Twitter. I would like to thank @egtexts for stimulating debate.

twitter 3

In answer to the last question, we aren’t testing features of a diary/report etc. We are testing their basic writing skills. What separates good writers from really good ones is where the comparative judgement element of our assessment comes in.

There was also some interesting interpretation as to what writing tests should look like. Some colleagues still have the impression that a writing test must look like the Y6 SATs of years gone by: one longer and one shorter timed piece. We are not suggesting this. In fact, teachers can be flexible with sessions and how long these are.

What made you choose the objectives?

The principles behind our decision boiled down to one main factor: what are the essential skills that writers should be using autonomously, so that working memory can be freed up for improving content and for experimentation?

What about the other objectives in the National Curriculum? Where are the relative clauses etc?

See above. We believe over-practising basic skills is more beneficial than spending time teaching writers more complex skills that they aren’t ready for, which consequently causes cognitive overload.

So you don’t teach all of the writing National Curriculum? 

In Year 5, for example, we believe it is more important to practise certain skills over say, active/passive voice. There’s no point in teaching active and passive voice if a pupil can’t use full stops or inverted commas correctly. Of course, teachers should teach to the needs of their class.

What about your high prior attainers (HPA)? How will teachers know what to teach them to improve? 

Because of our heavily text-based curriculum, we find these children transfer reading knowledge well into their own writing. Outcomes from the manual CJ aspect can also be used to gauge what HPA need to do next with their writing.

What about progress?

You can’t measure it! So… we are stealing Clare Sealy’s school’s system of: here’s the first piece they did, here’s the next, here’s the last. Have they made progress? Yes or no?


So there you have it! If you made it this far, I salute you. Now bring on the critique!

Writing Assessment by C.Westby @eggegg80 & C.Badger @badgeml1968 
Old Hill Primary School

Writing Assessment: There And Back Again (Part 1)

therebackagain

For almost two years, we have been experimenting with writing assessments and trying to make them more meaningful. We have gone from criterion scales to best fit, to NC objectives, to nothing, to comparative judgement. And now, here we are at yet another attempt to find the balance between workload and the usefulness of primary writing assessments. We haven’t even started our new idea yet, but I would welcome suggestions and potential pitfalls along the way. Hence, I’ll be sharing our idea in Part 2.

I know that many colleagues are in the midst of trialling CJ. I have written a few blogs about our experience here. I have also been in contact with several schools about setting it up, as well as some of the things you can do with it (I think I’m right in saying we were one of the first to try out anchoring for showing progress). More recently, I have been asked by a few other schools if we would like to do cross-school moderation/assessment judgements with them. We aren’t using online CJ this year, but here are a few reasons why you should consider it:


+ CPD – all staff get to see a whole lot of writing in the school

+ Can provide a ‘flavour’ of what writing is like throughout the school

+ Reliability score – no disputing the numbers or agreements (typically around 0.8)

+ Speed when moderating (this can’t be emphasized enough)


As you can see, we found many positives using online CJ. However, after we had discussed the outcomes. Shared the scripts with both staff and pupils. Explored the rankings. Examined the graphs. Looked at the extremities. Used the feedback from staff to identify next steps in writing for the school. Shared data with governors. You know what we found out? Nothing new. Our children aren’t great at grammar (which we knew). Lots carn’t spel (we knew this). Some children are more creative or have a stronger writing voice (which we knew) and some are a little more r-o-b-o-t-i-c (which we knew). A few can’t help but can’t help but repeat themselves (knew). Some have no ideas for story writing (we knew – they are usually the reluctant readers). Hardly any could make their colons the correct size! 😉


The issue with the way we did it, in hindsight (damn you, hindsight!), was that teachers didn’t know which scripts they were marking. ‘Duh, that’s the point!’ I hear you say. But actually, is it? Isn’t the point of assessment, for teachers, to find out about their class: what they can and can’t do? By setting up CJ the way we did, we removed this crucial aspect. So the staff went away with generic whole-school issues, still blissfully unaware of what their class were capable of without reading through all of the scripts again (which defeats the purpose in the first place, right?). Let me emphasize that these are our findings. I’m sure there are schools out there that have used CJ far more effectively than we did, and I am very much watching this space.


It’s interesting to read that NNM (who have been great, by the way) are aware of some of my current feelings towards (writing) assessment:

Of course, you could argue that the traditional teacher assessment also provides pupils with regular useful feedback, whereas comparative judgement is just providing an intermittent grade. We’ll deal with this point more in future posts, but for now, briefly, we’d argue that the feedback pupils get from traditional assessment, often in the form of a written comment taken from the frameworks, is not actually that helpful.


In the cold light of day, it all boils down to (idioms galore): why are we doing this? Really, though. Think about your staff. Do they really need to go through *insert whatever form of writing assessment you currently do* to have a good understanding of what to teach next? Does the impact from said assessment process warrant the time it takes? What about measuring progress? I’ll leave that one for James Pembroke to hammer home, here and here.


Why am I writing this blog? Well, I have an itch in the form of writing assessment at the moment. It’s the annoying itch in the middle of your back where, try as you might, you can’t quite scratch the right spot. I still love CJ and we are incorporating it into our new method of assessing writing, which I will be sharing in Part 2. But perhaps our school has missed a trick with it. And I don’t want to miss it. So if you have read this and are sat at your screen thinking, ‘Duh! What about x, y or z.’ I’d love to hear it. After all, we’re all in the same boat and it would be great to hear others’ views and ideas to develop assessment together, assidere style!

CJ, Anchors & Our Spring Assessment

So a few weeks ago, I blogged, somewhat controversially, on the Sharing Standards experience. It did ruffle a few feathers but, in a way, I’m glad. NoMoreMarking has made great strides in bringing to the assessment table something that could really help teachers. Refining the test conditions, the presentation of scripts and the user interface is still, I believe, a work in progress.

I’d like to share our spring data with you. Just to remind you that we were very specific in our test conditions:

  • Children from Years 1-6 wrote a narrative from a choice of 3 images in the autumn term
  • Children from Years 1-5 wrote a diary entry from a choice of 3 stimuli in spring (Y6 were involved in Sharing Standards, so they were not included)
  • No prior teaching, modelling or sharing similar pieces of writing
  • No pooling of ideas or any guidance was given.
  • No redrafting or feedback was given
  • Each child completed the writing during the morning session



In the autumn term we had identified narrative as an area of focus for our school. Consequently this was the task for the autumn CJ. Our assistant head, Carl Badger, then led training on improving narrative writing. We could have used CJ again to compare the before and after to measure the impact of the intervention. This would be one good use of CJ to measure progress over a specific area. But quite frankly, it wasn’t necessary. We didn’t need an assessment to see which children had or hadn’t improved. This was evidenced in books. Most had, some hadn’t.

So in spring we decided to give the children a diary entry as we wanted to have a look at non-fiction. Diaries can incorporate elements of narrative so although it isn’t ideal to compare the two, the text types aren’t a world apart.



So here it is, in all its glory!

progress cj

Thank you Chris @nomoremarking for producing this for me.

Overall we can see ‘progress’. I use the word tentatively, as we are only comparing two pieces of writing. We realise this isn’t enough to make firm conclusions about the learning across the school. Nevertheless it’s interesting! In case you don’t know, the dots are the extremes. On the face of it, Year 4 have made the most progress, with Year 2 showing some rather peculiar (and extreme) outcomes for that task. They’ve obviously been untaught and the Year 2 teacher needs to go!


Because the top two scripts have a lot of personal information in them, I cannot share them, but here’s the highest script from Y4 (ranked 3rd overall) – a true reflection of how this child writes, completely independently:

spring best script spring best script 2

Even with this highly ranked script, it’s simple to pick out next steps for this child (language, paragraphing, punctuation range e.g. ‘four-sided’, handwriting, to name a few).


We used anchors from the autumn assessment to compare the spring scripts on a scaled score. Below is the highest ranked script overall from the autumn (a Y6 pupil), which became one of the anchors.

ladder of curiosity

We used a total of 8 anchors.
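
To show roughly what anchoring buys you, here is a small Python sketch (3.10+ for statistics.covariance). The scores are invented, and this is a simple linear-equating illustration, not No More Marking’s actual anchoring method: scripts judged in both windows let you express spring scores on the autumn scale, so the terms become comparable:

```python
import statistics

# Scaled scores the anchor scripts earned in the autumn session.
autumn_anchor_scores = {"anchor1": 70, "anchor2": 55, "anchor3": 40}

# Raw scores from the spring session (anchors were judged again alongside new work).
spring_scores = {"anchor1": 1.8, "anchor2": 0.9, "anchor3": 0.1, "pupil_x": 1.2}

# Fit a line mapping spring scale -> autumn scale, using only the anchors.
xs = [spring_scores[a] for a in autumn_anchor_scores]
ys = list(autumn_anchor_scores.values())
slope = statistics.covariance(xs, ys) / statistics.variance(xs)
intercept = statistics.mean(ys) - slope * statistics.mean(xs)

equated = {s: intercept + slope * v for s, v in spring_scores.items()}
print(round(equated["pupil_x"], 1))  # pupil_x's spring writing on the autumn scale
```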


What now?

We have already begun trialling direct instruction to improve writing standards, and we will use CJ to compare writing from before the intervention with writing from its conclusion. I will be sharing the results of this.

At the end of this year we will choose another piece of writing to compare as part of our ‘testing the waters’ with a view to using CJ to compare portfolios in the next academic year. I’m still not convinced about trying to compare 6 different pieces of writing on screen all at once using CJ. We will either have set forms/genres that appear in the same order on-screen or will judge pieces 1 vs 1 and produce some sort of average.


Food for thought

School leaders and governors need to be able to talk about pupil groups, slow learners etc. as well as how well staff are performing. With this in mind, here’s a few questions our school is currently considering:

  • As with any assessment, for what purpose are you using CJ?
  • Who is this assessment for?
  • Does it reveal anything we don’t know already?
  • Will doing this assessment have a positive impact on learning (considering workload/time etc.)?


I have always maintained that I like CJ, and like all assessment systems, careful consideration should be given to why it is being used, how much it actually informs us, and the impact it will have on standards.

Comparative Judgement: The Sharing Standards Experience

Those who read my 2016 blog on CJ will know how interested I am in using CJ to improve our assessment of writing. Like other schools, we took part in the KS2 ‘Sharing Standards’ project. We were really pleased to be in a project where schools across the UK shared their Year 6 writing, and we have high hopes that it will be a vehicle for driving assessment strategies forward.

The difference from the first two internal writing assessments we made using CJ was that we only assessed 30 children. However, each of the 30 children assessed had a portfolio (3 pieces of writing, differing in genre and form) to compare against. Whilst I understand the principle behind the thinking (a range of writing needs to be compared, not just one piece), for us it didn’t work as well. Here’s why…


Apples and Oranges

The first issue was that we weren’t comparing like for like. Some children had written a story, diary and report, whilst others had submitted instructions, a story and a biography. Even in the Mars and football example Chris Wheadon often refers to, a report was compared to a story. Now, whilst some may argue this shouldn’t matter, I would question how much prior information the ‘Mars’ child was given in order to write the report. Furthermore, when the scripts appeared on screen, they weren’t in any particular order of form or genre. Some had completely different genres. Both of these made it very difficult to compare.

Rather than:

cj colour shades 1

We were doing:

cj colours shades 2

Staff had to scroll back and forth to try and make comparisons from the sets of work. I could see the look of frustration on their faces. It wasn’t as quick as it should be. There wasn’t the same buzz in the room we had had previously. There was less discussion. It was hard. Overall, it was a completely different experience for staff compared to when we had used CJ before, and not in a good way.



In future, I would suggest that the scripts from each school be organised so that they appear on the screen in a set order:

  • script 1, diary entry
  • script 2, newspaper report
  • script 3, story

This would make judging them through comparison much easier.

Another thought is to hold separate sessions so that genres/forms are ranked individually. So session 1: diaries; session 2: newspapers and session 3: stories. The ranks could then be averaged out to show the mean rank. I’m not amazing with data and there is probably a huge flaw in doing this but just putting it out there!
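
Here is what that averaging could look like as a quick Python sketch (the names and ranks are invented). One genuine flaw is noted in the comments:

```python
from statistics import mean

# Hypothetical per-session rankings (1 = best) from three separate CJ sessions.
session_ranks = {
    "diaries":    {"Amy": 1, "Ben": 2, "Cara": 3},
    "newspapers": {"Amy": 2, "Ben": 1, "Cara": 3},
    "stories":    {"Amy": 1, "Ben": 3, "Cara": 2},
}

pupils = {p for ranks in session_ranks.values() for p in ranks}
mean_rank = {p: mean(ranks[p] for ranks in session_ranks.values()) for p in pupils}

# One flaw: ranks hide the size of the gaps between scripts, so a narrow win
# and a runaway win count the same. Averaging the underlying scaled scores
# (where available) would preserve that information.
for pupil, r in sorted(mean_rank.items(), key=lambda kv: kv[1]):
    print(f"{pupil}: mean rank {r:.2f}")
```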


Final thought 

I still think CJ is good and it’s still in its (primary writing assessment) infancy. With more consideration it could be great. But, please, let’s not try and compare apples and oranges.



Comparative Judgement Day – exploring a different way of assessing writing

For many years now, I have felt that the way writing was assessed in primary schools was wrong. But, ironically, as with judging writing itself, I couldn’t quite put my finger on why. With the more recent changes in assessment came the secure-fit model, whereby pupils have to show evidence of all of the standards, without exception. I won’t go into how much I despise this form of assessment, other than to say I have witnessed numerous situations where pieces of work are down-graded (or sometimes up-graded) because of handwriting, or because there is no evidence of the conjunction ‘but’, despite showing creative flair or voice. Is this really the best our country can come up with? Surely there is a better way…

Then I remembered watching the excellent (as always) speakers at the Beyond Levels conference in Sheffield. Ally Daubney & Prof. Martin Fautley spoke about music assessment, and this got me thinking about writing: specifically, how we judge the art of writing as though it were mathematics. I found myself asking questions: why do some people like certain pieces of music over others? Similarly, the same could be said about books. The 50 Shades of *insert your funniest replacement here* trilogy certainly captured the interest of reluctant adult readers like no other before it. A few years ago we even had a parent write a review of the book and say just how much it had changed her life – I didn’t delve further!

So, what I am waffling on about is that although we know we have lots of great writers at our school, the way in which we were assessing them, internally, didn’t show this. Yes, our children could use fronted adverbials. Yes, they could use colons before a list and to provide further explanation of the previous independent clause. And yes, grammar and punctuation are very important in writing. But where many were falling down was when they had to produce their own ideas for writing, independently. And then of course there were those adorable writers with the messiest scrawl who were able to take you away into their own world without a passive-voice sentence in sight. Don’t get me wrong, our school does an extremely good job with both attainment and progress at the end of KS1 and KS2. But it just felt wrong to use a form of assessment that emphasises grammar and spelling so heavily. We didn’t give enough credence to creativity.

Then I stumbled upon this blog by Daisy Christodoulou about comparative judgement. Then another excellent article from David Didau here (there are two further parts that are a must-read). I began blabbering uncontrollably about CJ to Carl (our assistant headteacher). This is it! It makes sense!  And so after a million-and-one emails to Chris, we finally judged our first session. This is how it went…


How did you prepare for your first judging session?

We first had a practice at judging within the SMT. Carl and I uploaded some pieces of writing from the standards files to judge, so that a few of us were familiar with the process. Next, we introduced CJ via a staff meeting. I found it helpful to play snippets of Daisy’s presentation from researchED 2016 alongside our own thought processes. Carl and I spent quite some time deliberating how best to assess the children. Eventually we decided:

  • The first assessment task would be based on narrative (we had originally chosen a descriptive piece but decided against it)
  • Every child from years 1 to 6 would take part.
  • The whole school would write on the same day from the same stimulus (they had a choice of three images)
  • We clarified that no working walls were to be used; nor pooling of vocabulary or sharing ideas – basically, the children were given the stimulus and a piece of paper to plan from. We contemplated giving the children a planning format but decided against it – after all, they should be able to plan a story on their own, shouldn’t they?
  • We would only give the children two pages (or sides) to write on
  • Children were allowed as long as they wanted (within reason)
  • The QR code ‘answer sheets’ (lined paper for the pupils to write on) needed to be made larger for Year 1.
  • A handful of Year 1 children were omitted from the assessments due to accessibility

Before we carried out our judging session we made sure all of the staff (including LSPs and apprentices) had emails and knew how to log on! This may sound daft, but I wanted everything to be as efficient as possible. Laptops were all set up and ready to go. Our staff meeting is an hour. I wanted staff to sit down, click a link and start judging right away. Which is pretty much what happened.

Note: I would recommend using iPads or tablets, so as to make use of pinch-to-zoom.

How many scripts were judged by how many teachers, and how long did it take?

19 members of staff judged 66 scripts each: 176 pieces of writing and 1,195 judgements in total, in approximately 55 minutes, with a reliability of 0.87.

What feedback did you get from the session? Did you feel that the judging itself was a useful experience?

Staff have been to many moderation sessions, both internally and externally, and we can honestly say this was the most pleasurable experience we have had. There was a real buzz in the room. Staff from opposite ends of the school were engaged in conversations about the children’s writing, sharing snippets of quality, humour and downright ridiculousness that comes from brilliant young writers. I will echo Daisy: using CJ to moderate achieves more in a shorter timescale, with everyone agreeing, than traditional moderation ever did.

What have you learned from the data?

After the judging, Chris Wheadon came to our school to help me answer some remaining questions I had about using CJ. Within five seconds of Chris opening up his laptop and showing me a graph of our results, I was ready to kiss him!

Below is a graph of the results from our autumn assessments.


The dots are individual pupils (note the Year 3 blank script that found its way into the scanner!). The white rectangle shows the middle 50% of scores, and the black line the median.


So what does our data tell us?

Lots! Including:

  • There is a high-performing Year 1 writer in the school who is actually writing at a similar standard to Year 5. This came as a big surprise to us. We knew the pupil was a proficient writer, but not to such an extent.
  • Year 3 and 4 cohorts are generally writing at the same standard, as are Year 5 and 6 – consolidation anyone?!
  • There is a child in Year 6 who we hadn’t identified as being so ‘low’

We will not be using any of the data to bash staff over the head with!


How will you use the judging to help you improve teaching & learning?

Carl produced a feedback sheet for each staff member to complete after judging: what were the good elements of writing? What were the main areas for development? This gave staff a focus, and gave us an insight into exactly what different members of staff deemed to be signposts of good writing. Another question asked was, ‘When you came across two scripts that were of a similar standard, how did you finally make a decision?’ Further discussion is evidently needed.

Once we have collated the data, we will plan a whole school initiative around improving one or two key areas of development. For example, already it seems there is evidence to suggest that children throughout the school could use more lessons on narrative planning and plot structure as well as punctuation (when isn’t this the case?!).

Example scripts can be used in each year group classroom to share with pupils; make comparisons between structurally sound pieces of writing; and explore other points of interest such as looking at use of vocabulary or effective punctuation.


What will you do next?

We are planning to give children more opportunities to write freely. No success criteria, help with language etc. Of course, we will continue to explore texts to help improve writing and immerse children in high quality writing – much of which will be other children’s in the school, as opposed to within their own class.

A repeat of this assessment will be undertaken in the spring term. All children will write with special pens for their next assessment, to redress the issue of not being able to read a small number of Year 1 scripts clearly. We had planned to use a non-fiction stimulus rather than a narrative but decided that we couldn’t make fair comparisons between the two. Chris suggested giving different children different forms and genres of writing: ‘If the pupils find different genres difficult then you may find that “progress” is going down not due to the pupils’ performance but due to the difficulties of the tasks. If you mix them up you get a better overall picture but a less reliable picture at the individual level.’ Ideally, we will build up small writing portfolios for each child, across a range of genres, to make further judgements. Chris explained that we can anchor some autumn scripts to use as benchmarks for further assessments, to begin looking at progress.

So is this really judgement day for writing assessment? It’s too early to say for sure but the outlook looks promising. Perhaps with this system we will finally reward the worthy.


Craig Westby is the Deputy Headteacher at Old Hill Primary School in Sandwell.