I came across an interesting question recently: "If you understand 80% of the words in a piece of text, what percentage of the meaning of the text do you have?" On the face of it the answer is in the question, i.e. 80%. But I suggest that it is not that simple. Let me give (actually pass on; this is not original!) an example.
The original of the following text (which I give in full at the end of this post) is 61 words long, so 20% is about 12 words. So here is the text with 12 words missing:
The ___ between Governor Gachagua and Nyeri Town MP Murugi over the ______ of a clinic into a _______ centre ______ yesterday, after ____ of the ______ stormed the _____ to kick out addicts. While the county government wants the health centre ______ into a ______ centre, Ms. Murugi has ______ this change and wants it to remain a ______ _____.
Would you agree with me that it is rather difficult to work out what is going on here? In other words, losing 20% of the words means losing rather more than 20% of the meaning.
But what happens if I change the 20% of the words which are missing?
___ row between Governor Gachagua and Nyeri _____ MP Murugi over the conversion __ ____ clinic into a rehabilitation centre escalated yesterday after supporters of the lawmaker stormed the facility ____ kick out addicts. While the _____ government wants the health centre converted _____ a rehabilitation ______, Ms. Murugi ____ opposed _____ change and wants ____ to remain a maternity _____.
Still 20% of the words missing, but would you agree with me that the loss in meaning of the overall text is negligible? So, coming back to the original question, if you don't know 20% of the words in a piece of text, your understanding of that text may be anything from 0 to 100%, depending on your prior knowledge and which words are missing.
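The word counting, at least, is easy to check mechanically. The short Python sketch below is my own illustration, not part of the original example: it splits the full text (given at the end of this post) on spaces and blanks out the two sets of word positions used above. The position sets CONTENT_GAPS and FUNCTION_GAPS, and the helper mask, are hypothetical names; the positions are my reconstruction from the two gapped versions, counted from 0. Punctuation attached to a word gets blanked along with it.

```python
# Illustrative sketch: blank out chosen word positions and report what share
# of the words is missing. Gap positions are a hypothetical reconstruction
# from the two gapped versions in the post (0-based, split on whitespace).

ORIGINAL = (
    "The row between Governor Gachagua and Nyeri Town MP Murugi over the "
    "conversion of a clinic into a rehabilitation centre escalated yesterday "
    "after supporters of the lawmaker stormed the facility to kick out addicts. "
    "While the county government wants the health centre converted into a "
    "rehabilitation centre, Ms. Murugi has opposed this change and wants it to "
    "remain a maternity ward."
)
WORDS = ORIGINAL.split()

# First version: mostly the long, meaning-carrying words are missing.
CONTENT_GAPS = {1, 12, 18, 20, 23, 26, 29, 42, 45, 50, 59, 60}
# Second version: mostly short, predictable words are missing.
FUNCTION_GAPS = {0, 7, 13, 14, 30, 36, 43, 46, 49, 51, 55, 60}


def mask(words, gaps):
    """Replace each word whose position is in `gaps` with a blank.

    Punctuation attached to a word is blanked too, so the output differs
    slightly from the hand-made versions above.
    """
    return " ".join("____" if i in gaps else w for i, w in enumerate(words))


for label, gaps in [("mostly long words", CONTENT_GAPS),
                    ("mostly short words", FUNCTION_GAPS)]:
    share = len(gaps) / len(WORDS)
    print(f"{label}: {len(gaps)} of {len(WORDS)} words missing ({share:.0%})")
    print(mask(WORDS, gaps))
```

Both runs report 12 of 61 words missing (20%), which is rather the point: nothing in the count tells you how much of the meaning survives.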
I was quite struck by this example because it connects up with a number of other bits and pieces from over the years. Back in 2007 (I think) I went to an international mathematics education conference in Cyprus. It was understood that the language of the conference was English, but many of the participants were not first-language English speakers. It rapidly became apparent that, in that particular environment, the best way to be understood was to use as many technical words as possible. This is partly because these are words which participants are likely to know, and partly because the rule seems to be that the longer and more technical a word is, the more likely it is to be similar in other languages, certainly other European languages.
Similarly, back in 1999 I had the opportunity, as a funded PhD student, to go to Peru to give a lecture on "Underachievement in mathematics in the UK, with comparative data from the USA and Peru", having been on a church trip a few months before. When it came to preparing for the Peruvian part of this, I discovered that the major Latin American education collection in the UK is housed in the Bodleian Library, Oxford. Much of what I read there was in Spanish, yet, knowing just a very small amount of the language and armed with a Spanish-English dictionary, I was able to make sense of what I was reading relatively easily. Again, the longer technical words were fairly clear: I didn't need the dictionary to work out what pedagógica means. When I did use the dictionary to look up sin embargo, I was slightly nonplussed to discover that it means 'but'.
(Parenthetical thought: that visit to the Bodleian Library was all quite surreal. I phoned before going and was told that, if I produced my Institute of Education, University of London library card, that would be enough to get in, without e.g. a letter from my supervisor. When I arrived and tried to get access, I was met by some of the nicest people on the planet, who were implementing a bureaucratic system of a level of complexity - and arbitrariness - that Stalin himself would have been proud of. Notwithstanding the phone call, there was a lot of tutting about my lack of paperwork beyond my library card. After this had gone on for an ice age or two, I was then required to read out a declaration promising to abide by all the rules of the library, with a particular pledge 'not to kindle a fire therein'. I mean, there are lots of things one may not do in libraries, including holding wild parties, all of which are covered by the catch-all 'obey all rules of the library', so why single out kindling a fire? I can only imagine that some 16th-century student either maliciously committed arson or had an accident with his (it would have been 'he' in those days) candle. But really... Somehow I managed to get through the statement with a straight face, but I'm not quite sure how.
Then into the library, to discover what should, in retrospect, have been obvious: the books I wanted were not on the shelves but in deep storage, so it would take some time for them to be retrieved. By the time this happened it was nearly time to finish for the day. That said, when I returned two days later I was able to get straight in with my temporary reader's card, the books were waiting for me, and within minutes of entering the building I was settling down to my task. Credit where credit's due! But ... 'kindle a fire therein'. Really!
While I'm still in parenthetical mode, I would say that the lecture in Peru is a story all of its own. I'll not tell it here, but suffice to say it is the only occasion in my life so far when I've found myself thinking, "This is a dream, I'll wake up in a moment", only to realise that, no, I wasn't dreaming: the events around me really were happening. If you're interested, let me know and I'll either tell you privately or include it in a future blog post. End of parenthetical thought.)
So, if you know the underlying themes and recognise the key words, you can extract a fair bit of the meaning without knowing all the small words. Of course, for children learning to read, not least in a language other than their mother tongue, it is very likely to be the longer, more technical words that they don't know, making the reading much more of a grind than might be expected. And it is in the nature of children's acquisition of language that they can often speak with a high level of fluency and a good accent but with quite a small vocabulary base. So adults around them, particularly teachers, can assume that, because their conversational language is good, their academic language is also good, which may not be the case at all. As a PhD student I came across the idea that youngsters who come to the UK at the age of, let's say, 10, with no English on arrival, typically achieve conversational fluency in 2 years and academic fluency in 7, leaving a substantial period in which they may be understanding considerably less than is assumed.
I'm not quite sure where I'm heading with this. As regular readers will know, language is of considerable interest to me, currently in a context where many people are working and studying in their second, third or fourth language, though I originally come from a context where everything is done in just the one. It does reinforce my rule number two of living out here: assume nothing; if you've something important to communicate, check, double check and triple check that the message has got through. Rule number one? Don't drink the tap water.
Picking up on the point that percentages do not always work out the way you might reasonably expect, can I give my favourite example of this? Suppose a medical condition affects 1 in 10 000 people, and you have a test for this condition which is 99% accurate. If a person tests positive for the condition, what is the probability that this person actually has it?
As with the opening question, it would appear at first sight that the answer is in the question, i.e. 99%. But this is not correct, or anywhere close. Let me try to explain why.
Start with a population of 1 000 000 people. 1 in 10 000 have the condition, which means 100 people do and 999 900 don't. Of the 100 people who do, 99 will test positive and 1 negative. Of the 999 900 who don't have the condition, 99% will test negative but 1%, so 9 999 people, will falsely test positive.
So, a total of 99 + 9 999 = 10 098 test positive, of whom 99 actually have the condition. This means that the probability of somebody who tests positive actually having the condition is 99/10 098, which is just less than 1%.
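The same calculation can be written out in a few lines of Python. This is a sketch of my own: positive_predictive_value is just my label for the function (positive predictive value is the usual medical-statistics term for this probability), and, as in the working above, "99% accurate" is taken to mean that 99% of people with the condition test positive and 99% of people without it test negative. Exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction


def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(has condition | tests positive), by Bayes' theorem.

    sensitivity: share of people WITH the condition who test positive.
    specificity: share of people WITHOUT the condition who test negative.
    """
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


prevalence = Fraction(1, 10_000)  # 1 in 10 000 people have the condition
accuracy = Fraction(99, 100)      # assumed to apply to both groups

# Frequency version, mirroring the worked example with 1 000 000 people.
population = 1_000_000
with_condition = population * prevalence               # 100
without_condition = population - with_condition        # 999 900
true_positives = with_condition * accuracy             # 99
false_positives = without_condition * (1 - accuracy)   # 9 999
print(true_positives, false_positives, true_positives + false_positives)

ppv = positive_predictive_value(prevalence, accuracy, accuracy)
print(ppv, f"{float(ppv):.2%}")
```

Both routes give 99/10 098, which simplifies to exactly 1/102, or about 0.98%: the 99 genuine positives are outnumbered roughly a hundred to one by the 9 999 false ones.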
While on the face of it this is totally counter-intuitive, the point here is that the genuine positives are swamped by a much larger number of false positives. I know virtually nothing about medical statistics, but am told by people who do that there is indeed a problem here in some situations. Mathematically similar scenarios can be constructed which call into question the value of eyewitness testimony; if you're interested, I can dig out an example on request.
Aware I'm in danger of starting to ramble - if I haven't started already - so I'll stop here for this week. Thank you for reading, more coming soon!
Text used at the beginning of this post in full:
The row between Governor Gachagua and Nyeri Town MP Murugi over the conversion of a clinic into a rehabilitation centre escalated yesterday after supporters of the lawmaker stormed the facility to kick out addicts. While the county government wants the health centre converted into a rehabilitation centre, Ms. Murugi has opposed this change and wants it to remain a maternity ward.