Education Forum

Both Sides Archive

New assessment system does not pass test

Emeritus Professor of Education Warwick B Elley, 06 February 2003

Has NCEA fulfilled its promise? Warwick Elley looks at the results and thinks not

The Minister of Education appears to be pleased with the first NCEA results. At last, he says, students can see how they performed in each part of their subjects. If only it were true! A cursory glance at the summary statistics is enough to show how arbitrary and basically incredible the whole system is.

First, it is clear that some standards are much more difficult to attain than others within the same subject. Of the 12,000 students who took the seven Geography standards, 90 per cent were able to achieve the required standard in "describing a geographic issue, and evaluating outcomes" (whatever that means), but only 34 per cent were able to "examine population patterns, processes and issues" (whatever that means). In Biology, 81 per cent of students achieved the standard in carrying out a practical investigation, but only 39 per cent were successful in "understanding how humans use and are affected by micro-organisms". Incredibly, it was the easy standard that was given four credits, while the hard one received only two. In History, 92 per cent of the 8000 candidates were able to "interpret historical sources" (with some unknown level of quality), but only 43 per cent reached the required standard in "describing the impact of a development in an historical setting..." Yet all the History standards were given the same weight in cumulating students' total credits. Clearly the hurdles are higher for some students than for others.

At the other end of the scale, there were tremendous variations in the percentage of students given Excellent grades for each standard. Many standards had no Excellent grades allocated at all, while in others they were given to 50 per cent or more of the candidates. Is this because some of the standards do not belong in the Year 11 curriculum, or because the examiners' standards were wide of the mark? As one who has spent years studying examination results, I know which explanation I would accept. Examiners in most subjects cannot tell in advance the difficulty of the questions they set.

To illustrate, I was intrigued by the finding that 89 per cent of our 40,000 English candidates can read and understand unfamiliar text, but only 68 per cent can do so for familiar text. Do we conclude that unfamiliar text is easier to read than familiar text - or that the examiners set harder questions for the familiar text? In Latin, the familiar texts were the easier ones.

I was most impressed by the fact that 94 per cent of the present generation of French students can "converse in French in a familiar context" at an acceptable level. The teaching of oral French must have improved a great deal since our day! Furthermore, it is extraordinary to learn that our foreign language students are better at giving a "prepared speech" in the language they are studying than native speakers are at giving one in English. The pass rates in German, Chinese, Samoan, Indonesian and Cook Islands Maori were all over 95 per cent for this task. For English they were only 76 per cent. Do we have consistent standards? Of course not.

Some of the Mathematics results are also hard to explain. For instance, the worst result is found in the standard for "estimating and determining probabilities", which 49 per cent of students failed. Yet in the IEA surveys of 1994 and 1998, this topic was our best. It was tested by means of a broad range of searching questions, under examination conditions, and compared with students in 40 countries; in both years it was our best mathematics topic for Year 9 students. How could we have slipped so much? Because there is no rational framework in NCEA to provide a basis for interpreting the standards. It's all terribly arbitrary.

There is much more. Nearly 44 per cent of all the standards attempted by the candidates in Biology, and 45 per cent in Technology, were failed. The corresponding figures for Visual Arts and Chemistry, for instance, were 18 per cent and 16 per cent. In language subjects, the failure rates were low in German, Samoan and Cook Islands Maori (five per cent, eight per cent and zero per cent), but much higher in Japanese, NZ Maori and Korean (21 per cent, 24 per cent and 22 per cent). Can we have confidence in these figures?

It is not that standards-based assessment is bad in principle. It is just inappropriate in "high stakes" assessment where thousands of students are to be measured against a common (vaguely expressed) standard. Good teachers have always tried to spell out the expected learning outcomes to their pupils, and tried to assess just those outcomes, both during and after the teaching of a unit. This is good pedagogy. Students know just where they stand, and teachers and learners can modify their behaviour as they go. But it doesn't work in "high stakes" examinations, across a nation. Some believe that the problems with the new qualification will go away, once it is better resourced, and teachers get used to it. However, in my view the problems are more fundamental. For those who believe we should continue down the NCEA track, I would like to put forward these persistent, basic concerns.

  1. The standards developed in each subject are nowhere near clear enough for teachers to assess their students against, or for employers to interpret in appointing employees. The real standard, if there is one, is hidden in the small print - the assessment specifications, or in the minds of the back-room moderators. To be useful, properly defined standards require a clearly specified activity and an expected level of attainment: can jump over 1.8 metres; can type 50 wpm. This problem is inherent in all knowledge-based subjects. It was there at the outset, in 1991, and is not resolved 12 years later.

     
  2. The tasks set by teachers in each subject will rarely be of similar difficulty from school to school. The Assessment Specifications do not, and cannot, give enough guidance to teachers to ensure that the questions set are of similar levels of challenge. Thus, for English 1.7, students have to "read and show understanding of an unfamiliar prose passage" and various other tasks. Much research has shown us that students' level of understanding varies according to the particular passage chosen, the particular questions asked, the kind of question format, the wording of the question, and the time allowed, to name a few. Change the wording of the question and you change the pass rate. These are not problems in traditional examinations, where the questions and conditions are the same for all, and examiners do not pretend to understand exactly what the students know.

     
  3. The cut-off points for judging student achievement levels are not determined by any rational or empirically-based procedures. For instance, some guidelines typically state that six out of 10 pieces of appropriate data earn a Credit, while eight will earn a Merit grade. Some standards require 100 per cent mastery before students are passed. These cutoff points are entirely arbitrary. Overseas research suggests that pre-testing of each task is required to produce a defensible system. But overworked teachers will never have such a luxury.

     
  4. An "Excellent" grade in one subject will not be in any way equivalent to an "Excellent" grade in another subject, or another year. In Bursary Examinations, an elaborate system ensures that the challenge involved in attaining an A grade is similar from subject to subject, and from year to year. Under NCEA, there is no way that assessments derived from pre-ordained cutoff points can control the number of Excellent grades. The 2002 results illustrate this problem clearly.

     
  5. The moderation procedures cannot ensure similar standards from school to school. Internally assessed work is set by teachers, according to a generic requirement, and thus varies a great deal in difficulty. It is judged by these same teachers, according to generic criteria, illustrated in (different) examples. Other teachers check the standards, on a sample of students' work, without full knowledge of the conditions under which it was produced - how much teacher input, time allowed, home help, peer discussion, and repeated efforts, and often with limited knowledge of the specific content. Surveys of teacher opinion show that they too do not believe the system works.

     
  6. Students who fail to achieve the standards will not be treated in similar fashion from school to school. The NCEA UPDATE for May 2002 gives a range of options for schools, from talking to the student after the test in order to get further evidence, to obtaining evidence from earlier work by the student, to repeating the same task again, to setting a new activity. This level of variability makes moderation seem pointless.

     
  7. The use of four levels of achievement will not be sufficient for the needs of tertiary institutions. Teachers can make much finer distinctions between students' levels of achievement - at least 20 in a typical class, in most subjects. Thus, much information is lost, information that could be used to make fair selection decisions for competitive entry courses. And converting the totals to a percentage mark at the end of the process does not restore the lost information. The students are the ones who suffer.

     
  8. The results for each standard will not be as reliable as the marks for a traditional examination. It takes about three hours in a traditional examination to produce a reliability index of 0.90. This typically means an error factor of plus or minus five per cent on a percentage scale. To produce six or eight outcome measures at this level of reliability adds up to enormous workloads for teachers. So we will have to be content with large error rates.

     
  9. The aggregation of weighted credits will often be unfair to students. Firstly, the effective weighting of credits depends on the spread (or standard deviation) of the students' grades, not on the weights assigned by NZQA. So, if most students in a class receive the same grade for a particular standard, as frequently happens, then no amount of weighting will influence the results. Secondly, the differences in performance levels within grades will be huge. A good student who just misses out on a Merit grade will be given the same result as a plodder who just avoided a Not Achieved grade, after a couple of re-sits. So a 3-credit unit, weighted 2, will generate 3 x 2 = 6 points for each. Meanwhile, another plodder, who just missed out, gets 0 x 2 = 0. In other words, small differences in performance will amount to big differences in totals. And big differences in performance will frequently be given the same mark.

     
  10. Employers will rarely be able to make better selection decisions. A few may, but only if the particular profile of skills they are seeking happens to match the profile that NZQA generated, and the job applicants were all assessed at about the same time, and they have not lapsed or gained on the assessed skills, and the results are credible, and interpretable. However, many employers, faced with a mass of reported standards, assessed in different years, in different ways, with different rules about re-sitting, will probably ask for a reliable, overall assessment of the applicants' ability to learn new skills in the area of mathematics, or technology, or English language, or whatever their job will entail. That is precisely what the traditional system gave them.
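
The arithmetic behind points 8 and 9 can be sketched in a few lines of Python. The sketch rests on assumptions that are not in the NCEA documentation: it uses classical test theory (the Spearman-Brown formula and the standard error of measurement), a spread of marks of 16 percentage points, and hypothetical cut-off marks of 50, 70 and 85. The function names are illustrative only.

```python
import math

def spearman_brown(reliability, length_factor):
    # Predicted reliability when testing time is multiplied by length_factor.
    # Assumes the added or removed material is comparable to the original.
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

def standard_error(sd, reliability):
    # Standard error of measurement under classical test theory.
    return sd * math.sqrt(1 - reliability)

# Hypothetical cut-off marks on a percentage scale (assumptions, not NZQA figures).
CUTS = [(85, "Excellence"), (70, "Merit"), (50, "Achieved")]

def grade(mark):
    # Return the highest grade whose cut-off the mark reaches.
    for cut, label in CUTS:
        if mark >= cut:
            return label
    return "Not Achieved"

def points(mark, credits, weight):
    # Credits are all-or-nothing: earned in full at Achieved or better.
    credits_earned = credits if grade(mark) != "Not Achieved" else 0
    return credits_earned * weight

# Point 8: a three-hour examination with reliability 0.90, cut to half an hour.
three_hour = 0.90
half_hour = spearman_brown(three_hour, 1 / 6)
print(round(half_hour, 2))                       # 0.6
print(round(standard_error(16, three_hour), 2))  # 5.06
print(round(standard_error(16, half_hour), 2))   # 10.12

# Point 9: a 3-credit unit, weighted 2, for three hypothetical students.
marks = {"good student": 69, "plodder": 50, "another plodder": 49}
for name, mark in marks.items():
    print(name, grade(mark), points(mark, credits=3, weight=2))
# good student Achieved 6
# plodder Achieved 6
# another plodder Not Achieved 0
```

Under these assumptions, an assessment lasting half an hour would have a reliability of about 0.60 and roughly twice the error band of the three-hour paper. A one-mark gap at the cut-off costs six points, while a 19-mark gap within a grade costs none.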

In summary, a standards-based system is unsuitable in a "high stakes" academic qualification. Nobody overseas has solved the problems listed above for knowledge-based subjects, when assessing for important qualifications. The visionaries who proposed this novel approach to national qualifications were no doubt well intentioned. Would that they had studied the pitfalls before they persuaded others that New Zealand was to be the pioneer in a new world of assessment! Our students deserve better than this.
