This is a summary of the second half of an online video seminar entitled “Assessment and Highly Effective Intervention in Light of Advances in Understanding Word-Level Reading” (the first half I recently summarised here), which I hope encourages you to watch the whole thing.
It’s by Dr David Kilpatrick, was recorded at the 2017 Reading in the Rockies conference, and is on the Colorado Dept of Education website. Thanks so much to all those involved in putting this great information in the public domain.
The talk’s summary and conclusions are a good place to start:
- We have not been working from a scientifically-established understanding about how words are learned.
- Our intervention approaches have been around for decades, but are not informed by word-learning research.
- The “culprit” in poor word-level reading is phonology.
- Skilled readers have letter-sound proficiency and phonemic proficiency; weak readers do not.
- Interventions that address these skill deficits have the best results, by far.
The better we understand the word-learning research, the better equipped we’ll be to ensure all kids successfully learn to read. So here’s the video, and I’ll summarise what I see as key points below. Bracketed numbers are the times on the video clock at which each point is discussed, to help you go back later and listen to the bits that interest you most, or play a section to someone else.
Does your school have a formal phonemic awareness program for all young children?
Do you work in a school with a formal phonological awareness program for five and six-year-olds?
Few schools in the US have these, but a Tier 1 phonological awareness program is a key part of the Response To Intervention (RTI) approach recommended by the National Reading Panel.
If a school doesn’t have such a program, it’s not really doing RTI.
(6:20) Young children can be taught to say individual phonemes, because it’s a concrete task. However, many adults teach consonant phonemes incorrectly, by adding a little “schwa” vowel to them (saying “tuh” instead of /t/), and this interferes with blending.
Stop sounds (p, b, t, d, k, g) are the hardest, because they don’t really have much of a sound of their own; they gain their reality from the position your mouth is in when they interrupt vowels.
Letter names or sounds first?
(11:50) The research on this question is evenly split, so it’s not possible to say which should come first. The sounds map onto the pronunciation of words. The names are useful in instruction.
As well as teaching sounds that relate to each individual letter, we should teach digraphs (two letters, one sound) as a continuation of the alphabet, particularly the ones that represent sounds different from those usually represented by one of their component letters (e.g. ch, sh, th, ph), because they function like separate letters.
Digraphs should not be confused with blends, where you do hear the individual sounds.
Multisensory learning is good for letters, but not words
(15:10) Working in a multi-sensory way is great for paired associate learning, where you’re teaching arbitrary things like sound-letter relationships. This kind of learning is difficult, and easily forgotten if it’s not constantly reinforced, so drawing letters in sand or shaving cream or whatever while saying the sounds/names can help to cement these links in.
(17:41) Distributed practice (spread through the day) helps kids remember sound-letter relationships better than just practising once a day. Flashcards and random pointing to letter charts or friezes can be great for this too, if used well.
We do not need multisensory learning for words, because we learn words via orthographic memory. Multisensory learning of words is not efficient or evidence-based.
Embedded picture mnemonics
(18:40) About a dozen studies over three decades consistently show that putting a picture with a letter (e.g. an apple with the letter “a”) doesn’t really help kids learn sound-letter relationships.
However, embedding relevant pictures inside letters (e.g. having a snake in the shape of an “s” behind the letter “s”) does help kids learn the sounds.
Levels of phonological awareness
(19:55) There are three very broad levels of phonological awareness: syllable, onset-rime and phoneme. However, not all tasks at each level are of equal difficulty, and not all are equally efficient or effective.
Manipulation tasks seem to work best, in that they correlate most highly with reading.
Children in preschool or the first year of schooling usually can’t do phoneme manipulation tasks, so should stick to manipulation of syllables or onsets/rimes.
Our goal is to develop phonemic proficiency in kids, so that they can instantly, unconsciously segment any word that’s thrown at them into its component sounds.
Many ineffective programs are sold as “research-based”
(24:37) Most programs marketed as “research-based” have never been subjected to research.
More remarkably, at least four well-known, widely-used programs have been researched and found not to work, yet are still marketed as “research-based”. Their marketing material even includes links and/or references to the research showing that their programs are not effective.
This marketing relies on most people not actually having the time or skills to read and understand the research. They’re persuaded programs are good by synopses, anecdotes and links to the research, because nobody believes they’d include these links if the research showed that the program doesn’t work.
Do raw scores, statistical significance and Effect Sizes help us find the best programs?
(26:48) Children whose raw scores (the number of items they get correct) on tests go up can still be falling further and further behind, because their classmates are making bigger improvements than they are. Improved test raw scores thus tell us that children are learning stuff, but don’t tell us whether they’re catching up.
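To make the raw-score point concrete, here is a minimal Python sketch using made-up numbers (not from the talk): a child’s raw score rises from one term to the next, yet their position relative to classmates falls. The class lists and the `relative_standing` helper are hypothetical illustrations; standing is expressed as a z-score, i.e. the distance from the class mean in standard deviations.

```python
from statistics import mean, pstdev

def relative_standing(child_score, class_scores):
    """How many standard deviations a raw score sits from the class mean."""
    return (child_score - mean(class_scores)) / pstdev(class_scores)

# Hypothetical raw scores for one small class; the first entry is the child we track.
autumn_scores = [22, 26, 30, 34, 38]
spring_scores = [30, 42, 46, 50, 57]

autumn_z = relative_standing(autumn_scores[0], autumn_scores)
spring_z = relative_standing(spring_scores[0], spring_scores)

print(f"Raw score: {autumn_scores[0]} -> {spring_scores[0]}")
print(f"Standing relative to class: {autumn_z:.2f} -> {spring_z:.2f} SDs")
```

With these numbers the child’s raw score rises by 8 points, yet they slip from about 1.41 to about 1.67 standard deviations below the class mean: they are learning, but not catching up.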
(27:40) The only thing statistical significance tells us in research is that the difference between the control and experimental groups is unlikely to be due to chance. Many studies show statistically significant differences between groups, but the differences are actually very small, e.g. three Standard Score points (where 100 is average and 85-115 is the average range). Nobody will notice much reading improvement in a child whose reading score goes from 80 to 83.
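Standard Scores are conventionally scaled to a mean of 100 and a standard deviation of 15, which is why 85-115 is the average range. Assuming scores are normally distributed on that scale, this short Python sketch uses the standard library’s `statistics.NormalDist` to convert Standard Scores to approximate percentile ranks (and back), showing what an 80-to-83 change amounts to. The helper names are illustrative only.

```python
from statistics import NormalDist

# Assumption: Standard Scores are normally distributed with mean 100 and SD 15.
STANDARD_SCORE_SCALE = NormalDist(mu=100, sigma=15)

def percentile_rank(standard_score):
    """Approximate percentage of the norm group scoring below this Standard Score."""
    return 100 * STANDARD_SCORE_SCALE.cdf(standard_score)

def standard_score_at(percentile):
    """Standard Score at a given percentile (the inverse of percentile_rank)."""
    return STANDARD_SCORE_SCALE.inv_cdf(percentile / 100)

for score in (80, 83, 85, 100, 115):
    print(f"Standard Score {score}: about the {percentile_rank(score):.0f}th percentile")
print(f"30th percentile: Standard Score of about {standard_score_at(30):.0f}")
```

On this assumed scale, a score of 80 sits at about the 9th percentile and 83 at about the 13th, so the child is still well below average. A score of 85 marks the 16th percentile cut-off for “below average”, and the 30th percentile works out to a Standard Score of about 92.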
(29:26) Effect Sizes (where +0.4 is considered worth noticing and +1.0 is considered highly effective) are also unhelpful in working out which programs are most effective, because they only tell you the magnitude of the difference between the control and experimental groups.
There can be a large effect size between a control group and an experimental group doing a mediocre or ineffective intervention. One study included in the National Reading Panel had an effect size of 0.49, but the experimental group kids had zero improvement on their test Standard Scores. The reason was that the control group’s Standard Scores went down, so the effect size made it look like an effective intervention, but in fact the kids in the intervention group made no progress compared to kids across the country.
Control groups in reading research typically aren’t actually under experimenter control (we’re supposed to call them comparison groups nowadays), so if the school is running a good program with the comparison group, the effect size between it and an experimental group doing a highly effective intervention can be fairly small. This happened in a Torgesen et al study reported in the 2010 Annals of Dyslexia, where the intervention group made 22-point Standard Score gains, but the comparison group made 14-point gains, so the effect size was only 0.53, similar to the program with no improvement at all in Standard Scores.
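One common way to compute an effect size is as a standardised mean difference: the experimental group’s mean gain minus the comparison group’s mean gain, divided by a pooled standard deviation. The Python sketch below assumes a pooled SD of 15 Standard Score points purely for illustration (real studies use their own sample SDs), and the scenarios are hypothetical, loosely echoing the figures above. It shows how similar effect sizes can come from very different Standard Score gains.

```python
def effect_size(mean_gain_experimental, mean_gain_comparison, pooled_sd=15.0):
    """Standardised mean difference between two groups' Standard Score gains.

    Assumption: pooled_sd defaults to 15, the Standard Score SD; published
    studies compute it from their own samples, so their values will differ.
    """
    return (mean_gain_experimental - mean_gain_comparison) / pooled_sd

# Hypothetical scenarios: (experimental gain, comparison gain) in Standard Score points.
scenarios = {
    "no progress, comparison group declines": (0, -7.35),
    "22-point gain, strong comparison group": (22, 14),
    "22-point gain, comparison group makes no gain": (22, 0),
}

for label, (experimental, comparison) in scenarios.items():
    print(f"{label}: effect size {effect_size(experimental, comparison):.2f}")
```

Under these assumptions, zero progress against a declining comparison group gives about 0.49, while a 22-point gain against a 14-point comparison gain gives only about 0.53. The same 22-point gain against a comparison group that made no gain would give about 1.47. The effect size reflects the comparison group as much as the intervention.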
(38) Another study, published in 2017 in the Journal of Learning Disabilities by researchers from MIT and Harvard, had an effect size of 0.96, so it seemed to be amazingly effective, but the kids made only about half a Standard Score point of gain. It was a summer tutoring program for dyslexic kids, and the comparison group’s Standard Scores went down because they had no intervention over the summer.
Standard Scores CAN help us find the best programs
(39:34) The National Reading Panel, What Works Clearinghouse, etc. all conclude that lots of programs are effective based on their Effect Sizes, but when you look at their Standard Scores, many really aren’t very effective. Studies of Reading Recovery that do report Standard Scores show that kids gain only about three Standard Score points.
(41:28) However, a number of key researchers have suggested doing a review of research based on Standard Scores. It would be worth having a fresh look at the 75% of studies on the What Works Clearinghouse which do report Standard Scores, to see which ones actually do show effective interventions.
(42:39) People promoting ineffective programs always say “you can’t use Standard Scores because they don’t show smaller improvements”. Right. We want BIG improvements, and we’ve got plenty of studies showing big improvements in Standard Score points. And the exciting thing is that these correspond very directly with the word reading research of Ehri (Orthographic Mapping) and Share (the Self-Teaching hypothesis), even though the two bodies of research are completely independent.
While effect sizes are based on a moving target, the five or so major untimed, standardised word identification tests do offer us a stable reference point. The results of these tests all correlate very strongly, showing that they are giving us a good picture of how well kids can read words (unlike reading comprehension tests, which are highly inconsistent).
Research showing fast, amazing Tier 2 progress
(43:50) Vellutino et al (1996) conducted reading intervention research in a middle-class school district of 1400 kids. They took the bottom 15% of kids and gave them 15 weeks of intensive, one-to-one intervention focussed on phonemic awareness and phonics in first grade. A few of the weakest kids also did an extra 8-10 weeks at the start of second grade.
They didn’t realise then that they didn’t need to do one-to-one. There is plenty of research to show that working one-to-three is just as effective, as long as the kids have similar needs.
At the end of the Vellutino et al intervention, only 1.5% of children were below average (under the 16th percentile) on word reading.
Only 3% were below the 30th percentile.
These results were maintained three years later.
Research showing fast, amazing Tier 3 progress
(45) Torgesen et al worked with the bottom two percent of third to fifth graders, the most severely reading-disabled kids. They made one Standard Deviation of gain (15 Standard Score points), and two years later this had risen to 18 points. Even after the study was over, their skills continued to grow.
The intervention was 8 weeks of 50 minutes of one-to-one intervention, twice a day, so the kids had a total of 60 hours of intervention. There are 17 other studies that have produced a Standard Deviation of gain with older kids, and only about three of them went for more than six months. Big improvements are possible in a short time with the right intensive intervention.
It’s not possible to say this conclusively, but if you look at these studies through the lens of the word learning research, you’d conclude that they made these kids good orthographic mappers.
Bureaucracy has derailed RTI – it should be all about instructional content
(51) The US government saw the results of studies like these and said “we need to ramp this up across the country”, and that’s how Response To Intervention (RTI) got started.
However, the content, i.e. the instruction which produced the gains, somehow got lost in the bureaucratic process of implementing RTI.
Three levels of intervention effectiveness
(56) There have been about two dozen reviews of the intervention research since 1999, and if you go back through them looking through the lens of Standard Score gains, you find that what we can control – the instruction – is what makes the most difference to outcomes.
Studies can be divided up into three groups:
- Minimal results: 0-5 Standard Score points (mostly 2-4 points)
- Moderate results: 6-9 Standard Score points (mostly 6-7 points)
- Highly successful: 12.5-25 Standard Score points (mostly 14-17 points)
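The three bands can be expressed as a small classifier. This Python sketch uses the thresholds listed above; how to label gains that fall between or outside the bands (for example 10 points, or a loss) is an assumption of this sketch, as is the Standard Score scale (mean 100, SD 15) used to show what each band might mean for a hypothetical child starting at a Standard Score of 80.

```python
from statistics import NormalDist

# Assumed Standard Score scale: mean 100, standard deviation 15.
SCALE = NormalDist(mu=100, sigma=15)

def result_band(gain):
    """Label a Standard Score gain using the bands from the talk.

    Gains outside the listed ranges (e.g. 10 points, or a loss) are
    reported as such; that handling is an assumption of this sketch.
    """
    if 0 <= gain <= 5:
        return "minimal"
    if 6 <= gain <= 9:
        return "moderate"
    if gain >= 12.5:
        return "highly successful"
    return "outside the reported bands"

def percentile(standard_score):
    """Approximate percentile rank on the assumed normal scale."""
    return 100 * SCALE.cdf(standard_score)

start = 80  # hypothetical child's starting Standard Score
for gain in (3, 7, 15):
    end = start + gain
    print(f"+{gain} points ({result_band(gain)}): "
          f"{percentile(start):.0f}th -> {percentile(end):.0f}th percentile")
```

For a child starting at 80 (about the 9th percentile on this assumed scale), a 15-point gain would move them to roughly the 37th percentile, whereas a 3-point gain leaves them around the 13th.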
(57:18) The programs used in all three groups included reading practice. Most also did phonics, and importantly, not a single intervention approach that didn’t use phonics made it out of the Minimal Results category. So any time you hear someone promoting a program that doesn’t use phonics, just reply that it will only get minimal results, and chances are even those won’t be maintained. Phonics is necessary but not sufficient in intervention.
The Moderate Results group trained phonological segmentation and blending, so they built basic phonemic awareness, and took kids up to the second level of phonemic awareness (see first graphic in previous blog post about these seminars).
The highly successful group aggressively developed advanced phonemic awareness skills, as well as doing systematic phonics and reading connected text.
(58:50) This finding closely aligns with Ehri’s and Share’s theories. They’re saying we need phonemic proficiency and letter-sound proficiency, which we use when reading to anchor new words in long-term memory and build our sight vocabulary.
If kids are struggling with phonemic skills and we don’t work on phonemic awareness, we don’t get good results.
If we do only enough phonemic awareness work for them to get good at sounding out words, they become good at sounding out words, but they don’t remember the words.
If we go after and develop the phonemic skills, and we develop the letter-sound skills, both to automaticity, we get great results.
What about kids with low IQs?
(59:34) There was one study in the “highly successful” group with a Standard Score gain of only 10.
It was included in that group because its participants were unusual: the average IQ of the children studied was 85 (where 100 is average and 85-115 is the average range).
Even the kids with lower IQs doing highly successful programs outperformed all the other approaches.
Programs with Minimal Results
(1:00:43) The following interventions have been studied in the empirical reading literature and have been shown to yield improvements of 2 to 4 Standard Score points:
- Repeated Readings
- Read 180
- Reading Recovery
- Fast ForWord
- Read Naturally
- Failure Free Reading
- Seeing Stars
- Great Leaps
Most of these studies have “statistically significant” results, so they can all call themselves “research-based”, but students almost never catch up with these approaches.
Likewise, “gold standard” phonics approaches like Wilson, Orton-Gillingham and DISTAR/Reading Mastery can yield huge improvements in word attack (15-25 Standard Score points) but only modest improvements in general word identification (3-5 Standard Score points). They do not develop phonological proficiency, which is needed for orthographic mapping/sight word development.
Kids with phonological-core deficits only develop as much phonemic awareness/proficiency as we teach them.
A lot of people are working on reading comprehension with kids whose language comprehension is fine, but whose word reading problems mean they fail reading comprehension tests. These programs are (understandably) minimally effective.
Examples of successful programs
(1:03:00) Highly successful programs all include three key elements:
- Aggressive training of phonological awareness to the advanced level (beyond blending and segmenting, using tasks like phoneme substitution, deletion and manipulation).
- Teaching and/or reinforcement of sound-letter knowledge and skills (phonics).
- Extensive opportunities to read connected text.
Here’s Kilpatrick’s slide giving some examples of highly successful programs:
(Alison’s aside: These are North American programs, but a number of programs from elsewhere include the three key elements; the one I am most familiar with is Sounds~Write.)
Is fluency its own separate Thing?
(1:07:45) The elusive key to fluency is the size of your sight vocabulary (how many words you can read without conscious effort). Intonation matters a bit, so there’s a bit more to it than that, but list reading and paragraph reading correlate very highly, so the lion’s share of your fluency is determined by your sight vocabulary.
If fluency were a separate reading-related skill, it would not be affected by how easy or hard a text is. But kids who read texts that are too hard for them become dysfluent, and kids reading texts that are too easy for them become fluent. So fluency is not its own separate Thing, even though the National Reading Panel got everyone talking about the Big Five.
You want to know how to work on fluency? Turn kids into good orthographic mappers.
Fluency tasks are, however, great screening tools. For example, the Test of Silent Word Reading Fluency can be administered to a group and takes 3 minutes. It presents words all run together, and kids have to draw lines between the words. It’s great for identifying kids with all kinds of problems for further assessment. Or you can just make up your own test like this, run it across the whole school, and you’ve got your own local norms.
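Building local norms mostly means turning each child’s raw score into a percentile rank within your own school. Here is a minimal Python sketch using the common mid-rank definition of percentile rank (the percentage of scores below, plus half of those tied). The student IDs, the scores and the 16th-percentile follow-up cut-off are hypothetical choices for illustration, not part of any published test.

```python
from bisect import bisect_left, bisect_right

def local_percentile_ranks(raw_scores):
    """Map each distinct raw score to its percentile rank within this group.

    Mid-rank definition: percent of scores strictly below the value,
    plus half the percent exactly equal to it.
    """
    ordered = sorted(raw_scores)
    n = len(ordered)
    ranks = {}
    for score in set(ordered):
        below = bisect_left(ordered, score)
        tied = bisect_right(ordered, score) - below
        ranks[score] = 100 * (below + 0.5 * tied) / n
    return ranks

def flag_for_follow_up(scores_by_student, cutoff_percentile=16):
    """Return IDs of students whose local percentile rank is below the cut-off.

    The default cut-off of 16 is an assumption for illustration.
    """
    ranks = local_percentile_ranks(scores_by_student.values())
    return sorted(
        student for student, score in scores_by_student.items()
        if ranks[score] < cutoff_percentile
    )

# Hypothetical raw scores for ten students.
scores = {
    "s01": 8, "s02": 15, "s03": 22, "s04": 25, "s05": 27,
    "s06": 30, "s07": 31, "s08": 33, "s09": 36, "s10": 40,
}
print(flag_for_follow_up(scores))
```

With these made-up scores, s01 and s02 fall below the 16th local percentile and would be flagged for further assessment. A real school-wide sample would be much larger, which makes the percentile ranks far more stable.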
The culprit in poor reading is phonology. That should be obvious: our writing system is based on representing phonemes.
It’s exciting that the word learning research corresponds so closely with the intervention research. The most effective interventions develop phonemic proficiency (not just awareness) and sound-letter proficiency, and include plenty of reading connected text.
Question: Does group size matter?
Torgesen says that one-to-four is the magic number, but kids need to be well matched for skills, because if you have two kids at two different levels you’re doing two different programs. Some programs regroup kids every two weeks to ensure they are closely matched. You of course also don’t get the same results with four kids on the All-Ritalin team in a group.
Keeping the intensity of intervention up in groups can be a challenge. Kids need to be responding all the time to get the same kinds of results as in the research, so you need a lot of short tasks and must keep moving from one to the next. Another thing that works is choral-solo-choral-solo reading, making sure you don’t just go round the group but choose the next student randomly, so nobody knows who’s next and they all have to pay attention.
Question: What about English Language Learners?
They fit into the Simple View of Reading like everyone else. If a kid can read in their first language, that predicts their ability to read in their second. Phonemic awareness carries over from one language to another.
Question: Minimum time per day?
Some effective programs do 35 hours, some do 100. What matters is what you do instructionally, using the time well.
For early years prevention you need a good Tier 1 program, then smaller Tier 2 groups of three or four children to turn out a better product.
Question: Should children be encouraged to guess words from context? (I think)
Think about what is involved in orthographic mapping. Mapping is making a connection between a word’s pronunciation and its spelling pattern. Children who guess words from context and move on miss that opportunity.
Question: Once kids are good at orthographic mapping, will spelling skills follow?
We don’t really know; normally spelling doesn’t just take off. The problem is that every word gets added to the sight vocabulary one word at a time.
Learning words takes time, and spelling requires a more precise level of orthographic processing than reading. So you won’t see a magical improvement in spelling; it will take time.
Question: Is there a good assessment of word reading efficiency for older kids?
No. The Test of Word Reading Efficiency is OK for middle school, but isn’t particularly good for high school students, as older kids don’t get to the more challenging words. The Kaufman Test of Educational Achievement also has some timed items but seems to have much the same problem: only the most severely affected kids show up.