The g Factor   General Intelligence and its Implications   Christopher BRAND

 

 

I - UNITY IN DIVERSITY

 


  • The quest for scientific psychology circa 1900; Watson, behaviourism and anti-mentalism.
  • From Binet to Wechsler and Cattell: the early history of mental testing.
  • Why do mental tests and test-items inter-correlate?
  • Six ways of avoiding talk of general intelligence (g).
  • Jensen's six defences of the validity and fairness of tests of g.
  • There must be more than g?
  • Five special, non-g-related differential ability dimensions (viz. the 'Big Five').

OUTLINE

Is it reasonable to be impressed by someone who knows the dates of the battles of Flodden, Leipzig and Stalingrad? This person may just have specialized in history, or seen a TV programme about 'great battles that changed the world'. If not, the best guess might still be that the person merely has an unusually good memory for figures. To be sure of sampling a person's general knowledge requires care, a large number of items, and only such errors of measurement as happily cancel each other out.

This chapter outlines the discovery of easily-administered questions, correct answers to which by children and adults are indeed widely predictive. The intellectual background and the early achievements of the 'founding fathers' of IQ-testing are indicated; the case for talking of a general factor among mental abilities is critically discussed; and five more specialized (yet still very broad) differences between people in abilities and strategies are suggested as supplements to the general factor.

Do relations between widely varying test items require and license talk of a unitary dimension of 'general intelligence' (g)? (g is the usual symbol for those psychological differences that are apparently reflected in tests of Mental Age and IQ.) Why do people who do well at one mental test tend also to do well at many others? The existence of real individual differences along a g dimension provides one hypothesis; and, if the mental tests that have the highest and widest links are any guide, the g factor would seem especially involved in the detection of meaning and of the regular patterns made by symbols (i.e. words, numbers and shapes). However, other ways of explaining relations between mental measures may seem just as likely. Influences of educational, motivational, socio-economic, racial and 'labelling' differences are particularly examined. Perhaps it is merely these external, local and artificial differences that bring about the positive relations that are found in the general population between many mental tests? Perhaps the unfair advantages enjoyed by some testees can do all the explanatory work without there being any further reality to the hypothetical g factor?

Whatever the linkages that give rise to talk of IQ and g, there are other ability variations that occur independently - even if they account for much less of the overall variation between people in mental abilities. Using modern personality theory to supplement today's data, five dimensions of contrast are suggested in personal strategies of information-handling and problem-solving.


    The beginning of the twentieth century was a momentous period for psychology. In Britain, Germany and the USA, there was talk of a 'new', 'scientific' psychology. The laboratory of Wilhelm Wundt (1832-1920) in Leipzig had been operational for twenty years. A version of Wundt's stress on experimentation had spread to the United States: there, Edward Titchener (1867-1927) and students practised introspective reporting of consciously experienced sensations under precise conditions. Yet at the same time there were many strains. In Vienna, Franz Brentano (1838-1917) and his School stressed the unique character of thought and action as always being about something: unlike tables and chairs, which are not 'about' anything at all, thoughts and actions are inherently 'intensional' and do not just exist in mere quantity or quality. (Though sensations and reflexes might be adequately described in such terms, not to know what is the object of a thought or the purpose of an action is to miss something that is essential.) Also in Vienna, Sigmund Freud (1856-1939) was coming to emphasize the importance of unconscious mental life. If Brentano and Freud were right, even the most rigorous introspection in the laboratory would miss the real business of the human mind - and of the human heart and soul. So whose would be the 'new' psychology?

    In the University of Chicago, John Broadus Watson (1878-1958) was about to become the youngest-ever PhD graduate of his University's combined department of psychology and philosophy. Watson would reject the uncertainties of studying experiences, thoughts or mental entities of any kind, and set psychology to study and account for behaviour. The spirited son of an alcoholic South Carolina father (though named John Broadus by his mother after a Baptist preacher of that name), Watson had grown up in the outback and was used to animals and their training. To Watson - who had a juvenile arrest-record and a secret first marriage (contracted to frustrate the opposition of his girlfriend's brother) - the experimental psychology of his day seemed dull. It required Chicago students to sit quietly reporting the subjective duration and intensity of tone signals and light-patches. Watson was to write: "I hated to serve as a subject. I didn't like the stuffy, artificial instructions given to subjects. I was always uncomfortable and acted unnaturally." Unsurprisingly, Watson's subsequent research for his PhD involved the more objective material of physiological psychology. (His thesis examined the growth of myelin sheaths around the nerves of young animals: as it takes place, myelination makes for developmental advances in perception and motor skills - and perhaps much else.) Yet Watson was not pointing psychology towards physiology just for a glimpse of its own underpinnings: rather, he would go on to claim that psychology need be literally nothing but physiology. Watson supposed that thoughts, emotions and the rest of experience and behaviour would soon be revealed as nothing more than reflex arcs. Once appointed to Johns Hopkins University, Watson resolved to take psychology by the horns. In 1914, his first book, Behavior: an Introduction to Comparative Psychology, outlined his radical plan for a psychology that would lead to experiments on behaviour - though mainly on the behaviour of animals.

    Despite much academic criticism - and even astonishment - Watson's behaviourist proposals would long guide the study and state-funded practice of psychology. Watson's idea was that human thinking, feeling and motivation, and all abstract mental entities such as intelligence and personality traits might be conveniently 'reduced to' (or understood in terms of) chains and constellations of observable reflexes and 'habits'. Tirelessly anticipating such discoveries and thus the prospect of behavioural control, Watson's followers would maintain in principle that anyone could be trained ('conditioned') to do anything whatsoever. In particular, behaviourists would decline to believe people could be 'retarded', by any vague impediments in 'intelligence' or other faculties. Such undiscriminating and unhelpful accounts of human handicaps would be best left to philosophers and lay people, and abjured by the true science of psychology. As Watson (1924) put it (1):

      "There are inheritable differences in structure, but we no longer believe in inherited capacities, talent, temperament, mental constitution and characteristics. Give me a dozen healthy infants, and my own world to bring them up in, and I'll guarantee to train any one of them to become any type of specialist I might select - doctor, lawyer, artist, merchant chief, and even beggar-man or thief."

    Once he was driven out of academic life (by his brother-in-law reporting him for an affair with his star student, Rosalie Rayner) Watson (once married to Rosalie) would write a popular baby-book that counselled fashionably against the 'spoiling' of children before turning his skills to the marketing of cigarettes and toothpaste. Though himself no steady devotee of the study of the rat, his enthusiasm for conditioning proved infectious - especially in departments of psychology in American universities by 1940, and later in Britain. In experimental psychology, behaviourism would long provide the main academic challenge to the existence of human faculties - and to the reality of intelligence.


    While Watson hatched behaviourism in Chicago, in Paris the career of the fifty-year-old Alfred Binet (1857-1911), Director of Physiological Psychology at the Sorbonne, and writer of popular melodramas, was reaching its culmination. Whereas Watson had grown restive at the slow progress of psychologists and philosophers with the complexities of human consciousness and mentality, Binet's discontent was with the medical profession and its tendency to dogmatism. Binet's father, himself a doctor, had once frightened his five-year-old son by showing him a corpse; and the sensitive Binet eventually dropped out of medical school to study a human science that could in those days be studied in libraries - psychology. Later, Binet's career in psychology - involving work on hysteria and hypnosis - had brought him into contact with startling psychiatric symptomatology and with the imposing medical and neurological personages of his day. Taking the physicians seriously as they demonstrated 'hysterical dissociation' by switching patients' paralyses from one arm to another, Binet underestimated for a while the sheer conformity and obedience of the patients in the great doctors' hospitals. Sharp and public academic criticism of his own gullibility was the painful result (Fancher, 1985)(2). Later, once a young doctor, Théodore Simon (1873-1961) had arranged his first access to the mentally retarded, Binet had discovered that the medical 'diagnoses' of idiocy, imbecility and cretinism in children were also less authoritative than they sounded: two different doctors could easily fail to agree a categorization, and children's diagnoses might be changed alarmingly from one certificate of mental deficiency to another, "as if they had been drawn by chance out of a sack" (Binet & Simon, 1905). Thus Binet, though no slouch at physiological psychology, arrived at an ambition distinct from Watson's, yet one that would have a similar type of appeal for twentieth-century psychologists. 
Just as Watson hoped to liberate psychology from philosophy, introspection and all mentalism, so Binet hoped to liberate it from the presumptions of medicine.

    In fact, the doctors of Binet's day were doing their best to assist with the problems that had arisen for all European schools from nation states making school attendance compulsory. Before North American success at growing wheat led to the collapse of European agricultural prices and to a surge in urbanization (Stone, 1988?), children could simply be left in illiteracy with their peasant parents - for they would still prove employable and marriageable in the countryside. However, this relaxed attitude could not continue after 1870 as agriculture declined and urban squalor increased; so legal compulsion forced handicapped, retarded, hyperactive and unwilling children alike into schools that did not see themselves as providing mere playgrounds or child-minding services. Soon teachers sought legal ways of dealing with those children they judged ineducable by normal methods. Especially, they appealed to the fast-rising authority of medicine(3).

    Yet there was something strange about what teachers were doing. Why, asked Binet, should a child's need for special or slower-paced schooling be judged only 'pedagogically' (by the child's school results) or medically (by physical signs, symptoms, stigmata, and the skull readings of the phrenologists)? After all, school failure could have causes of various, quite different kinds. Some children might do poorly at school because of lack of home encouragement or because of a poor relationship with a teacher - common enough in days when the use of physical punishment was widespread; and plenty of children with physical and neurological problems are actually educable in a largely normal way given a little patience, suitable remedial opportunities and firm suppression of bullying. At most some, not all school failure need be attributed to lack of ability; and Binet (1900) had shown that "cephalic measures" were unreliable and made rather little distinction even between schoolchildren judged to be at the extremes of ability. Thus, in 1904, Binet found himself commissioned by the French minister of public education to answer the question that he himself had posed: how to measure a child's capacity for learning - to common sense, its intelligence - by a direct psychological method, rather than indirectly via attainments or neurological problems.

    How should Binet begin? In London, the Victorian gentleman-scientist, Sir Francis Galton (1822-1911) (the half-cousin of Charles Darwin, co-founder of the journal Nature, discoverer of the anticyclone, inventor of the modern weather map, explorer of what is now Namibia and eventual founder of the Eugenics Laboratory(4) in London University) had tried. He had hoped that quickness of reaction time and acuity of sensory discrimination might relate to teachers' estimates of children's intelligence. As a result of his passion for numbers (a top British psychiatrist once called it an obsession), Galton (1886) had come up with a calculus for quantifying degree of 'correlation' (i.e. strength of association) between variables. Soon developed by his student and colleague, Karl Pearson (1857-1936), the statistic r involved multiplying individuals' deviations on any two variables from their group's mean scores on them. An individual with high deviations from average on both variables would thus contribute highly to r; by contrast, an individual with average scores on both variables would contribute nothing. The r statistic thus expressed the strength of association between two variables on a scale from +1.00 (perfect positive correlation), through zero (no correlation), to -1.00 (perfect negative correlation)(5).
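    The product-moment calculation just described can be sketched in a few lines of Python (a minimal modern illustration, not any historical procedure - the function name is mine): deviations from each group mean are multiplied together, summed, and scaled by the two standard deviations.

```python
def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient r.

    Multiplies each individual's deviations from the two group means,
    sums the cross-products, and scales by both standard deviations,
    yielding a value between -1.0 and +1.0.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Scaling terms: square roots of the summed squared deviations
    sx = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sy = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cross / (sx * sy)
```

    Note how the arithmetic matches the intuition in the text: an individual near both means contributes almost nothing to the cross-product sum, while one far above (or below) average on both variables contributes strongly to r.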

    The r correlation coefficient provides a measure of strength of the relation between two variables: this allows psychology to go beyond merely talking about significant relations which often arise just because a large number of subjects has been studied or because (e.g. with animals) a big experimental manipulation has been contrived. Concern with significance rather than with strength of relation was to prove all too congenial to the many twentieth century psychologists who themselves found no strong effects to study - except by gross and technical laboratory manipulations of no relevance to questions about human personality. Such psychologists preferred to draw a veil over the issue and to complain that natural correlations between variables 'did not reveal causation'. However, quantitative assessment of strength of relationship is integral to any science that has moved beyond the nursery; and it was not the followers of Galton who would unfailingly interpret correlations between X-at-Time-1 (say, parental handling of child) and Y-at-Time-2 (say, adolescent delinquency) as showing causation by X - rather than looking for wider social or biological, 'Z' factors that might have been causing variations in both X and Y (see Meehl, 1990 and Cohen, 1994.)

    Just as futuristically, Galton had set up a laboratory (in South Kensington) to record anthropometric data, reaction times and high-frequency auditory sensitivity in members of the general public - who paid for the privilege. However, though professional people had somewhat faster reaction times than unskilled workers (see Johnson et al. , 1985), Galton had used no problem-solving measures and found no promise in his three measures of sensory and motor ability - indeed, he seems not to have troubled to analyze his copious data. There were discouraging problems of unreliability and of only moderate similarities between siblings - despite siblings sharing family environment and fifty per cent of those genetic variations that occur at all commonly between people. More seriously, Galton's middle-class adult volunteers - enthusiasts for science as they were - did not vary especially widely: they would have shown only rather low correlations amongst their scores even if Galton's procedures had actually been effective in tapping intelligence differences. In contrast with Galton, Hermann Ebbinghaus (1850-1909), working in Würzburg and Breslau, had actually begun to use promising tests of 'closure' (whether the testee could fill a gap made by the tester in a simple word or sentence); but, since Ebbinghaus was not studying the big developmental differences of childhood and lacked Galton's technique of calculating the strength of correlation between variables, he did not realize that he had an early measure of intelligence in his hands.
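    The point about Galton's narrowly-varying volunteers is a general statistical one - restriction of range: a sample selected on one variable shows a weaker correlation between that variable and anything else than the full population would. A small simulation (an illustrative model with arbitrary values, not Galton's data) makes the effect visible:

```python
import random

random.seed(0)

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cross / (sx * sy)

# Two noisy tests of one underlying ability (illustrative model)
ability = [random.gauss(0, 1) for _ in range(20000)]
test_a = [a + random.gauss(0, 1) for a in ability]
test_b = [a + random.gauss(0, 1) for a in ability]

r_full = pearson_r(test_a, test_b)  # roughly +0.5 in the unrestricted sample

# Now keep only the top quarter of scorers on test_a - a narrow,
# self-selected sample like Galton's middle-class enthusiasts
cutoff = sorted(test_a)[int(len(test_a) * 0.75)]
pairs = [(a, b) for a, b in zip(test_a, test_b) if a >= cutoff]
r_restricted = pearson_r([a for a, _ in pairs], [b for _, b in pairs])
# r_restricted comes out markedly lower than r_full
```

    Even with procedures that do tap a common underlying ability, a restricted sample can thus make the intercorrelations look unimpressive - which is just the predicament attributed to Galton above.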

    Fortunately for his own assignment, Binet did not stick to searching for any single or 'theoretically basic' measure of intelligence. Neither was he pre-occupied with any one definition of intelligence. From his study of his two growing daughters - one a budding young scientist, the other more artistically inclined - Binet was impressed that good intelligence could easily take different forms. Instead of locking intelligence prematurely to 'understanding', 'judgement', 'reasoning', 'learning', 'memory', 'speed', 'perception', 'concentration' or 'imagination', Binet realized that there was only one general point about human intelligence on which there was wide agreement. Whatever it might be said to be and however it might be measured, intelligence is usually thought to increase through childhood (at least until mid-adolescence) without that increase requiring any unusual sensory acuity or any special education or training. Binet's otherwise comprehensive search was guided by these two simple constraints. (For detail of Binet's ten-year programme, see Matarazzo, 1992.)

    Binet began to question children to see whether they could name simple colours, unwrap and eat a sweet, pick out the longer of two lines (3cm., 4cm.), remember shopping lists, arrange weights in order (3, 6, 9, 12 and 15 grams), make rough copies of a line-drawn square, diamond and cylinder (see Figure 1), or construct sentences containing given words (e.g. 'Paris', 'fortune' and 'river').

     

     

        Figure I.1: In the Figure Copying Test, a child is shown one simple figure on each trial and then asked to draw it from memory. (The child can be shown the figure again - though copying will then start afresh. Detailed accuracy and neatness of the child's copy are unimportant.) The square can be copied by the average white 5-year-old child, the diamond by age 8, and the cylinder by age 10. Jensen (1980, pp.662-665) finds that this ability correlates very highly with many others in childhood. The problem is not perceptual or manual - since an 8-year-old who fails at copying the diamond will have been quite able to copy the square; as Jensen says, "it is the child's analytic concept of the figure that governs performance." The ability itself cannot be trained: 5-year-olds who are trained (with some difficulty) to copy the cylinder will show no gains when asked to copy the diamond or other intermediate items (e.g. triangle) on which they have not been specifically trained.

     

    Binet quite often asked children to re-discover meaning: he would ask them to re-assemble sentences, paragraphs and pictures that had been violated or scrambled into their component parts (thus extending the principle of 'closure' tests). Yet comprehensiveness was Binet's keynote: a wide range of 'tests' was tried - including requests that children shake hands or copy the strange adult tester in putting their fingers into comical positions on their noses and ears. Avoidance of tasks resembling schoolwork was strict: "It is the intelligence alone that we seek to measure, by disregarding in so far as possible the degree of instruction which the child possesses.... We give him nothing to read, nothing to write, and submit him to no test in which he might succeed by means of rote learning" (Binet & Simon, 1905). Some items were harder than others and were typically passed only by older children; and children who passed one such hard item were more likely than other children to pass other items of a similar degree of difficulty. At age five, for example, the average child could say which of two objects was heavier, copy a square, and count four coins. Thus, even without the r statistic, Binet came to:

      1. talk of the level of a child's performance across the entire range of items;
      2. summarize that level by saying that a particular child performed like the average child of a certain chronological age (CA); and
      3. attribute that CA as a mental age (MA) to the child.

    Expressing MA in relation to the child's own CA as suggested by the eminent German psychologist and personality theorist, William Stern (1871-1938), the Intelligence Quotient (IQ) was born:

    IQ = (MA / CA) x 100.
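    Stern's ratio is simple enough to state as code (a trivial sketch; the function name is mine):

```python
def ratio_iq(mental_age, chronological_age):
    """Stern's ratio IQ: mental age expressed relative to
    chronological age, scaled so the average child scores 100."""
    return 100.0 * mental_age / chronological_age
```

    A ten-year-old who performs like the average twelve-year-old thus has IQ 120; one who performs like the average eight-year-old has IQ 80; and a child whose MA equals its CA has, by construction, IQ 100.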

    Binet himself always wanted to provide more than such a single number by which to express a child's intelligence, and so did Stern; but they were to have no more luck than did Galton with his own daydreams. In the absence of further findings of equivalent interest about mental abilities, and in view of the clear rationale for MA in the regularities of children's performance, the IQ number caught on. Mental abilities correlate as strongly as human height and weight are correlated: just as people vary generally in overall physical size and development, so they vary in general intelligence. There are many intriguing complications - just as good athletes are usually better at some events than at others. Yet when the whole population is considered, generality of ability is the more striking phenomenon - just as athletes who excel at one or two events will be well above the population average of ability at virtually any event. Eventually, in the 1930's, the American psychometrician-psychologist, David Wechsler (1896-1981) would try out similar individually administered tests for adults. By this time, the correlation coefficient was a commonplace; so Wechsler supplied correlations between his tests on a standardized sample of adults of all ages, all socio-economic groups and both sexes. The tests included general factual knowledge, everyday comprehension (e.g. of how to post a letter), memory for short strings of numbers, picture completion, jig-saw puzzles, and re-arranging scrambled, wordless cartoon frames into the right order. Wechsler found that his ten-minute tests showed striking correlations, of around +.65, from one occasion of testing to another; and they correlated strongly with each other, at around +.50 - a much stronger relation than is found between variables investigated in most psychological and social-scientific researches. As had happened for Binet, all reliable mental tests, however different their content, apparently had something in common. 
To allow for the fact that MA does not increase after age 15 in line with CA, Wechsler's Adult Intelligence Scale (WAIS) calculated an adult's IQ by comparison with the levels achieved by age peers. A mean of 100 and a standard deviation of 15 were retained so that 98% of IQ's fall in the range 55-145, as do 98% of the IQ's of children when Stern's original (MA/CA) x 100 equation is used.(6) [By contrast, the tests of Cattell (see below), which came to be used by MENSA, the society for high-IQ people, use a fixed standard deviation of 24 points. Cattell observed that, if adult CA was set at 15 (so that IQ's do not decline with advancing chronological age), the true standard deviation of (MA/CA) x 100 scores in the full population, including representatives of all age groups, was actually 24 IQ points. This greater range of IQ's occurs because of the particularly low MA's obtained by young children and by some older adults (see Chapter II). An IQ of 130 using the Wechsler-type standard deviation for calculating IQ is thus equivalent to a Cattell-type IQ of 148 - both scores indicating that the testee falls just within the top 2.5% of age-peers.]
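    The Wechsler-to-Cattell equivalence in the bracketed note follows from preserving the testee's standing relative to age peers (the z-score) while changing the scale's standard deviation. A sketch (the function name is mine):

```python
def convert_iq(score, sd_from, sd_to, mean=100.0):
    """Re-express a deviation IQ on a scale with a different
    standard deviation, preserving the testee's z-score
    (i.e. standing relative to age peers)."""
    z = (score - mean) / sd_from  # standing in standard-deviation units
    return mean + z * sd_to
```

    So a Wechsler-type IQ of 130 (two standard deviations of 15 above the mean) maps to 100 + 2 x 24 = 148 on the Cattell-type scale, exactly as the text states; converting in the opposite direction recovers 130.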

    Especially when put in multiple-choice form for administration to whole groups of testees, IQ tests found a ready market in the heterogeneous USA. Here, true mental differences between people would often have been overshadowed by the big cultural and linguistic differences between immigrants; so IQ tests could reasonably be expected to prove fairer than were school records to the basic abilities of the country's diverse minorities. At Stanford University, Lewis Terman (1877-1956) adapted and expanded Binet's package of tests(7) and campaigned tirelessly for using IQ test results to assist the implementation of public policy. The Harvard psychologist, Robert M. Yerkes (1876-1956), headed a team (including Terman and Henry H. Goddard (1866-1957)) which pioneered 'group' testing to assist military selection procedures and thereby collected the first large-scale data on mental abilities beyond childhood. The 'Army Alpha' test was provided for literate recruits; other testees, together with recruits who did poorly on Alpha, were to take the 'Army Beta'. Beta was primarily a pictorial test including mazes, jigsaws, cube counting and picture completion; it required only comprehension of spoken instructions in English, accompanied by blackboard demonstrations; and, though numbers had to be recognized, no arithmetical ability was required. Any remaining uncertainties were supposed to be resolved by Binet-type individual testing: this would be necessary when a testee had failed even the simplest items, perhaps because of some difficulty in following the instructions.

    Over 1917-18, 1.75 million U.S. Army draftees and others participated in the programme - though sometimes in overcrowded conditions supervised by army staff lacking training in and understanding of the exercise. Notable findings were the high IQ's of conscientious objectors (something of a surprise to Colonel Yerkes) and the low IQ's of prostitutes ("from 30 to 60 per cent of prostitutes are deficient and for the most part high-grade morons" [i.e. below MA 12, or IQ 80] (Yerkes, 1921)). Alpha scores correlated highly, at .75, with education; but cause and effect remained to be decided. Though this correlation strongly suggested that education conferred advantages under the testing conditions mentioned, it was, in principle, still compatible with the non-environmental theory "that native intelligence is one of the most important conditioning factors in continuance in school...." (Yerkes et al., 1921). Interestingly, immigrants who had come from different ethnic backgrounds had MA's that differed more than could be explained by their differences in familiarity with English:

    • Immigrants from Canada and the British Isles had MA 13.8;
    • Immigrants from Germany, Holland and Scandinavia had MA 13.0;
    • Immigrants from Mediterranean countries had MA 11.4.

    Additionally, the more recent immigrants had the lower MA's - which a young colleague of Yerkes blamed on the rising proportions of immigrants from southern and eastern Europe (Brigham, 1923). Thus Terman's ambition was fulfilled as the earliest large-scale evidence about IQ was advanced in the course of the great debates that culminated, in 1924, in the USA's restricting further immigration to the proportions from different countries that had originally obtained in the U.S. population in 1890. IQ had arrived in politics - for all that most countries would long continue to admit migrants chiefly by the more elementary principles of how much money migrants were prepared to spend and whether voters liked the look of them.(8)

    The likely relevance of IQ test evidence to more immediate, practical problems was equally appreciated. Foreshadowing what would be a long-running use of the tests to assist the judiciary, the court testimony of Terman in 1918 as to the low IQ of a seventeen-year-old Hispanic, Alberto Flores, procured the latter's exemption from the death penalty for sexual assault and murder. By the 1920's the tests were occasionally used by Goddard on New York's Ellis Island to furnish 'clinical' evidence as to which of Europe's refugees from famine, nationalism, communism and civil war could be expected to cope in the free-enterprise, English-speaking and qualification-conscious USA(9). The enthusiasm for measuring intelligence affected even Britain - a country renowned for its complacency about theories and experts of any kind. By 1923 the tests had been used in the selection of 30,000 British civil servants (mainly for clerical posts) (Spearman, 1923, p.2); and, by 1940, tests involving 'matrices' (see Figure I.2) were in use by the British Army in the selection of officers, spies and high-grade technical personnel. As Wilson (1993) has outlined, in societies locked into racial, ethnic, class, sexual, and 'old-school-tie' discrimination, IQ testing was a liberating and improving force. In situations where choice clearly had to be made, it offered a simple, inexpensive and relatively reliable way of identifying: (a) the ablest potential students, employees and fellow citizens; and (b) the least able who might need special education, guardianship and "the surveillance and protection of society" that Terman urged.

        Figure I.2: In 'matrices' items the testee completes the overall pattern made by the figures on the left by selecting for the question box one of the figures on the right (see e.g. Raven, 1989).

     

    Not everyone agreed that what was being tapped by the tests was intelligence. In 1922, America's star newspaper columnist, Walter Lippman, responding to the report of a conference on IQ (Journal of Educational Psychology, 1921), declared it laughable to claim from scores on the tests that the average American Army recruit (while admittedly having greater knowledge, experience and acquired skills) had little more general intelligence than a normal 13-year-old (see Block & Dworkin, 1976); and, indeed, Harvard's Edwin Boring (1886-1968) (one of the most distinguished psychologists of his day, who himself possessed a prodigious intellect and had gone to university at age 12) had obliged would-be critics by joking to the conference that 'all we know about intelligence is that it is what the tests test'. Such would long remain the principal objections to the tests from critics who doubted their value and feared their misuse; and there is no doubt that the early use of tests had raced ahead without the checks that would be required today. The low 13.1-year MA of 'the average recruit' in the US Army data was indeed something of an artefact. There had been far too many zero scores on some subtests - suggesting that some testees had simply not understood instructions. The recruits had been compared to a non-Army sample that over-represented high school pupils and educated adults; and the Army had allowed illiterate recruits to attempt Alpha and then often not found the time to test them with Beta. However, there was correct and understandable excitement at the now massive empirical evidence that intelligence did not generally increase beyond age 15 and at the tests' relevance to officer selection.

    Eventually, following in the footsteps of Wechsler and a somewhat chastened Brigham (1930), the Staffordshire-born Raymond Cattell (b. 1905, taking up psychology and objective measurement along with socialism in response to the horrors of World War I, and working as the Leicester Area School Psychologist before emigrating to the U.S.A.) would conclude that it was best to recognize two partly distinguishable types of test for general intelligence (g) - resembling the abilities required for success at Beta and Alpha. Some mental tests require mental work on the spot with largely unfamiliar materials and problems (e.g. to find what is missing from a drawing - perhaps one of a dog's ears(10) - or to solve simple jig-saw puzzles from their pieces alone). Others require stored knowledge (e.g. of the meanings of words and proverbs, or of simple mental arithmetical operations) which, for the time being, the testee either possesses or lacks. Cattell called these types of intelligence 'fluid' (gf) and 'crystallized' (gc) respectively. Cattell particularly confirmed suggestions arising from Wechsler's work that, although half-hour tests of gf and gc correlated very strongly, at around .70, in normal ranges of children and young adults, the gf and gc scores of one person in eight diverge significantly. For example, children who were much below-average in exposure to normal schooling (e.g. Britain's canal boat children and American children of poor-white, rural families) showed the pattern gf > gc; on the other hand, in late-middle-aged and elderly people, gc would often 'hold' well while gf declined.

    Still, for Binet, Wechsler and Cattell to have identified a plausible age curve for gf,(11) to have shown how gc sometimes diverges from gf, and to have provided a wide range of types of test indexing both expressions of general intelligence was soon to seem but a slight achievement. In the generation after 1945, it would become unfashionable to regard intelligence and IQ levels as enduring and consequential characteristics of individuals.(12) As Binet himself might have wondered: could two substantially correlated IQ's - gf and gc (or Performance and Verbal, as Wechsler called them) - be much of an advance on just one? Might not the correlations between mental tests be explained without referring to any hypothetical general intelligence at all? Might not mental tests be found that would simply not correlate so highly? Could there perhaps have been some way in which, whatever the improving ambitions and objectivity of the early testers, truths about human intelligence that were at once more complex and more intrinsically 'social' had slipped through their fingers? Perhaps Lippmann and Boring had articulated the very reservations that had actually made Binet cautious about his psychometric breakthrough and led his colleague, Théodore Simon, to denounce the use of global, general IQ scores as "treachery"?


    According to a leading modern critic of the g factor, the distinguished Harvard biologist, Stephen J. Gould (1981/1982, p.315), "The fact of pervasive intercorrelation between mental tests must be among the most unsurprising major discoveries in the history of science." Thus Gould professes no more surprise at test intercorrelation than would an opposing theorist who was prepared to talk of 'real' individual differences in general intelligence. Yet a normal expectation is that time spent in one activity is time that is lost for another: an evening spent doing crossword puzzles or metaphysics is an evening lost to practising jigsaws or swotting up metallurgy. Thus, in so far as 'practice makes perfect' and time is finite, the pervasive intercorrelation between mental abilities should actually tend to be negative; and a prediction of negative correlation should particularly be made by anyone who, like Gould, is inclined to treat measured IQ-type abilities as collections of attainments. Why, then, do mental tests inter-correlate positively? And how is it that Gould is unsurprised? Could IQ-type tests (and their diverse subtests) reflect influences quite distinct from the g levels whose reality Gould doubts? Perhaps there are better accounts of the 'positive manifold' of correlations between all tests requiring work with symbols?(13) And perhaps such accounts are more connected with 'levels' that are social than with levels of anything attributable to the individual? (The question of whether social factors affect g itself is considered in Chapter III. Here the question is whether talk of the g factor might be avoided altogether.)

    Examination of possible answers to this question in modern times has been the concern of a figure rather like Binet - especially in his interests in the arts, his painful learning of the need to challenge conventional professional wisdom, and his sustained scholarly interest in work that might help children who have learning difficulties. As Arthur Jensen (b. 1923) grew up in San Diego, his hero was the anti-imperialist Indian leader, Mahatma Gandhi, and his great love was for classical music. However, although he was a reasonable clarinetist who had played for the San Diego symphony orchestra, he was advised that he had little chance of the career as a conductor that he wanted. He turned to social work and clinical psychology and thus achieved a teaching post at Berkeley, his alma mater, in 1958. Over the next ten years, Jensen's interests moved gradually towards intelligence:(14) from his early interests in 'projective' personality assessments (using the famous ink-blots) and in memory (the serial position effect whereby the ends of lists are recalled better than their middles), he shifted towards trying to develop culture-fair assessment of intelligence and towards assessing the results of the early remedial Head Start programmes (to be considered especially in Chapter IV). In his major book presenting his research and scholarship so far, Jensen (1980) focussed on allegations that IQ tests were 'biased' against minorities (especially against black people): he asked repeatedly whether the covariation between mental tests came about for reasons having less to do with intelligence than with opportunities - or their absence. There are in fact six main explanatory options which avoid postulating that IQ tests principally reflect intelligence. All of them have enjoyed support as ways of denying real intelligence differences between individuals and - important in multi-ethnic societies - between human groups; but all have attendant problems, as Jensen was to tease out.

  1. The existence and nature of any 'test' may chiefly reflect its own inventor's ideas about what to measure and about how to measure it. Such are the suspicions of testers that even an eminent scientist could write to the leading journal, Science (Hubbard, 1972): "The IQ tests ignore much in us that is artistic, contemplative and nonverbal. They were constructed to predict success in the kinds of schools that have prevailed in Europe and the United States." Even today, best-selling American psychologists such as Robert Sternberg (1984, 1985; Allman, 1994) and Howard Gardner (1983, 1993a,b,d) go out of their way to note that IQ tests were devised for predicting success in school, measure chiefly academic intelligence and should not be thought to tap into many equally valuable abilities such as common sense, organisational skill or creativity. In particular, Gardner criticizes the tests for using 'pencil and paper techniques'.

    Perhaps the keener schoolchild will necessarily feel happier with such procedures than will the child who suffers school phobia or whose parents spend little time reading or writing? Anyhow, merely being tested by some kind of authority figure may be thought to require a child who is well-drilled and acquiescent rather than 'generally intelligent'. However, to say that IQ tests are 'narrowly academic' reveals an untested assumption more than actual familiarity with the tests themselves.
     
    • (a) Different psychologists from four countries were involved in devising even the earliest mental tests; and today's IQ-type tests are validated, standardized and scrutinized worldwide for the reliability and consistency of the correlations among their items and subtests.
       
    • (b) All the main constructors of IQ tests have thought it folly to try to measure intelligence with only one technique; and most have been persistently curious to discover new and perhaps quite independent types of intelligence (e.g. 'social intelligence', 'empathic understanding', 'perspective taking', 'creativity' and 'moral reasoning'). Thus there has always been a continuous stream of would-be tests becoming available - allowing correlations between supposedly different types of test to be assessed empirically. Sometimes, indeed, testers are criticized for troubling to use everyday social knowledge, as when child testees are asked 'Why is it generally better to give money to an organised charity than to a street beggar?' (According to Evans and Waites (1981, p.131), this item makes the "questionable assumption that organised charities ensure that money goes to those most in need.") However: (i) such an item taps understanding of how to be genuinely charitable - generally speaking, expressly allowing for exceptions; (ii) the correct answer will be especially obvious to any child whose circumstances provide serious familiarity with street beggars; (iii) any spirited child who appears restive with the question's assumption will simply be asked by the tester 'Why is it said that...?'; (iv) in any case, such an item, like others, is only used because it does simply turn out in fact to correlate well with many other items.
       
    • (c) For better or worse, IQ-test constructors have never been remotely entranced by the merits of predicting school attainment. Most of them (like Binet himself) seem to have wanted to see brighter children from all social backgrounds being offered a chance to be 'stretched' by a suitably exacting school experience (even if some children's own scholastic attainments were previously poor and their parents entertained few academic aspirations for them); and to see duller children given special help that might compensate for, if not actually eradicate, their educational handicaps.
       
    • (d) Very few of the traditional tests used in measuring IQ are in fact of the pencil-and-paper type. For example, only one of Wechsler's eleven subtests of the Wechsler Adult Intelligence Scale (WAIS) involves the testee using pencil and paper; and even this subtest involves copying simple and novel symbols for which no drawing skill is required (see Figure I,3). Tests for use with groups admittedly use pencil and paper: but their results correlate very highly (at around .80) with results from individual testing.

       

          Figure I,3: In Digit Symbol, the testee is first shown the code (at the top): this shows which symbol is to go with which number. Then, with the code always available for inspection, the testee completes as many of the empty boxes as possible within 1.5 minutes. Although of limited reliability (test-retest r = .50) because of its brevity, this test still correlates at around .35 with mental tests that require no use of pencil or paper.

    • (e) Across three-quarters of a century, none of IQ's critics has been able to provide any competing test of that 'non-IQ-type' intelligence to which allusion is so readily made in semi-erudite conversation. Although 'mere academic intelligence' is casually scorned, no-one has any validated test of 'non-academic' intelligence that can even be examined by potential enthusiasts. Hopes of such tests have often been entertained - of the 'British Ability Scales', of the Illinois Test of Psycholinguistic Abilities, of the Kaufman Ability Battery, of some of Jean Piaget's methods (see Chapter II) and of many others; but the putatively 'new' measures of non-IQ-type intelligence invariably turn out to correlate highly with the Binet and Wechsler scales and thus to measure little that was not included in IQ-type tests and in g . For example, in the 1970's some psychologists hoped that the measures of intelligence favoured by Piaget and his followers - e.g. measures of 'conservation' (realizing that water does not increase in volume when it is poured into a taller and thinner container - i.e. that its volume is 'conserved' across a superficial transformation) - might yield 'a new IQ'. However, it soon turned out that Piagetian measures correlated as highly as their own reliabilities allowed with conventional measures of g , and especially with gf. The only problem-solving tests that have achieved a fleeting 'independence' of IQ are those that are simply unreliable - usually because they have only recently been thought up and have not been checked out for suitability with a wide range of testees in everyday circumstances. (Many of the tests produced as part of J. P. Guilford's great quest to identify and measure no less than 150 hypothetically unrelated mental abilities were sadly of this type.) In the past decade, Sternberg (e.g. 1988) has advocated a 'triarchic' view of 'cognitive behavior' that breaks it into three aspects - performance, monitoring and skill-acquisition; but no more than any other theorist has he shown any degree of actual empirical independence for the individual differences belonging to his three categories.
    • (f) Despite years of search for other predictors, g still provides the only way of predicting success in most occupations. Even such critics of IQ as Evans & Waites (1981, p.140) allow that lawyers, engineers and chemists virtually never have IQ's below 100. The capacity of g to predict success extends even into areas like the military where modern educators may like to jest that 'academic aptitude' is not highly regarded. Much has been hoped by occupational psychologists of 'differential abilities' (see below) and questionnaire self-assessments of personality, but little has been delivered by comparison with what g manages to predict. By definition, it cannot be 'narrow academic skills' that boost efficiency ratings and remuneration across a wide range of job types: grasping capitalist employers and crime-busting police chiefs will surely not be taken in for long by mere scholasticism. Rather, something of wider relevance - like the hypothesized g factor - would seem to make for higher levels of real-world attainment as much as for success at mental tests. Today, for American adolescents, IQ correlates almost as highly with nonacademic knowledge (r = .78) as with academic knowledge (r = .82) (Humphreys, 1994): the idea that g is only 'academic intelligence' is make-believe.
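The claim in (e) that Piagetian measures 'correlated as highly as their own reliabilities allowed' rests on the standard psychometric correction for attenuation: an observed correlation between two imperfectly reliable tests cannot exceed the square root of the product of their reliabilities. A minimal sketch of the calculation (the reliability and correlation figures below are illustrative assumptions, not data from the text):

```python
import math

def disattenuate(r_observed, rel_x, rel_y):
    """Correct an observed correlation for the unreliability of both measures.
    Estimated 'true' correlation = r_xy / sqrt(rel_x * rel_y)."""
    return r_observed / math.sqrt(rel_x * rel_y)

# Illustrative (assumed) figures: a conservation-type score with reliability
# .60 correlates .55 with a gf test of reliability .90.
r_true = disattenuate(0.55, 0.60, 0.90)
print(round(r_true, 2))  # 0.75
```

On these assumed figures, an observed correlation of .55 implies a disattenuated correlation of about .75 - near the ceiling that the two reliabilities permit, which is the sense in which an unreliable test can correlate 'as highly as its reliability allows'.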
       
  2. Even if g is not some narrow, scholastic ability, will not mood and motivation influence testees' scores? Perhaps high scores can be achieved by those who merely 'try harder' for whatever reason (including even personal vanity or shameless conformism)? Likewise, might not low scores on a range of tests result from feeling poorly, or depressed by domestic circumstances; or from anxiety about mankind's worldwide problems; or from having been upset by an insensitive teacher or personnel manager on the day of the test? If so, would not motivational differences be sufficient to explain the positive manifold?
    The answer to such worries is certainly 'yes'. IQ scores do indeed vary a little from one occasion of testing to another: testees can increase their IQ-test scores by some seven points if they practise for a few hours and receive feedback on the particular type of IQ test at which they wish to 'succeed'; those children who are especially inhibited and unresponsive will 'warm up' and put on perhaps 8-10 IQ points if play sessions are provided (Jensen, 1969, p.100); and the reliability of even a 'full scale' 1.5-hour IQ test over a six months' gap is not an ideal +1.00. However, in normal circumstances the reliability of Binet and Wechsler IQ's is still around .93: such reliability is far higher than is found for any other important individual, non-biographical measurement across the entire range of twentieth-century psychology and indeed social science. Recently, a large study in New Zealand has shown children's Wechsler IQ's correlate at .80 from age 9 to age 13 - over which range modern educators typically assume there is much change and inter-individual variability due to different children having different experiences and differently timed 'growth spurts' (Moffitt et al., 1993). The researchers particularly observed that "the reliable change that does take place appears to be very idiosyncratic: it is not systematically associated with environmental changes." In Canada, across the adult age-range 20 to 60, a long-term follow-up of normal Second World War conscripts first tested in 1944 found test-retest correlation of .78 (Schwartzman et al., 1987).
Such reliability should not be too surprising: attempts to influence IQ-test performance by offering cash prizes have been unsuccessful; IQ-test performance is not systematically affected by testees being made more or less anxious about the significance of the test(15) - for some testees work a little better under pressure; and individual levels of self-reported worry and depression show little relation with IQ. IQ correlates weakly and negatively, at around -.15, with self-reported anxiety. Since anxiety actually correlates at a similar level but positively, around +.15, with academic attainments in higher education, there is no reason to think that anxiety lowers intelligence. It is equally likely that lower intelligence itself creates minor life stresses for people or that it directly makes people feel anxious and less able to cope. As for IQ tests being fakeable, so are Snellen eye tests and thermometer readings: it is only within sensible parameters that tests will measure what they are supposed to measure. IQ results have been observed to be "remarkably robust" across minor illness, fatigue, ambient temperature and ambient noise (Humphreys, 1994).
     
  3. Perhaps motivational effects are not so much short-term as long-term? Perhaps some testees generally have more long-term motivation to succeed at the tests of psychologists and educators? Perhaps the 'high motivation' of some testees may even last a lifetime - enabling them to score more successes at tests and comparable examinations even at age 60? Such ideas are plausible in so far as many measures of 'achievement motivation' constructed by psychologists (e.g. asking whether testees 'enjoy work' and 'prefer competitive to co-operative activities') do indeed show a modest (.30) correlation with IQ. (In fact, many supposed tests of achievement striving do not seem to measure anything very much unless there are intelligence differences between the testees being compared: the correlation between achievement motivation and IQ is quite often as high as the reliability of the achievement tests themselves, thus leaving nothing else for these tests to measure (Fineman, 1977).) Yet higher-IQ people do not in fact perform well at all those tricks and puzzles that psychologists can dream up. Measures of 'simple reaction time' (e.g. speed of pressing a button, as instructed, when a single light comes on) and of 'rote memory' (e.g. for nonsense syllables like VAW, TOQ and DEH) show only a slight advantage for subjects of higher IQ (see Jensen, 1987, and Lynn & Wilson, 1990). Why should the hypothesized 'higher motivation' of higher-IQ testees fail them on tasks where no use of symbols is involved or when overnight retention of meaningless material is examined?
Although success in exams and in life is often attributed to 'hard work', no psychologist has ever produced confirmation of this popular causal story - let alone of hard work actually raising intelligence or IQ-test scores.(16) Even if there were any demonstrable correlation between hard work and IQ, higher intelligence may just have led its possessors to work hard because their efforts are more successful and earn them larger rewards.
     
  4. Perhaps scholasticism, motivations and work habits provide little purchase on why some testees perform better than others on most mental tests. Perhaps it would be easier to admit that real, intellectual differences are involved on the tests while claiming that these g differences themselves reflect long-term advantages (and disadvantages) that different children experience in view of the 'social classes' into which they are born.(17) This claim accepts the validity of the tests as reflectors of g ; and it expresses a concern with the causation of individual differences in perfectly real levels of g (a topic to be pursued in Chapter IV). However, there may be a distinct appeal to 'social class' that continues to dispute the very existence of g . People's differences may be claimed to arise because of the tests' class-related invalidity. According to this hypothesis, failures of items and subtests to tap intelligence make the tests unsuitable for use with testees from particular social groups - at least if the intention is to measure g . This 'invalidity' criticism is not that the tests reflect g and environmental influences on g , but that, at least for some people, they do not measure g at all. However, even if this interesting question is examined quite independently of the main nature-nurture question about g (for which see Chapter III), four difficulties arise.
    • (a) Whatever may have happened in history, differences in parental social class of origin in the modern West have, quite simply, very modest associations with the educational attainments of children by their early twenties. White (1982) reviewed a hundred studies in the USA and estimated the correlation at around .22; and similar correlations have been reported from Ireland (Greaney & Kellaghan, 1984; Lynn, 1984). Evidently parental SES today scarcely correlates with, and so simply cannot be influencing, such a crucial variable as educational attainment in young adults. Thus it is quite unclear how it could turn hypothetically unrelated mental test scores into substantial correlates of each other across the social class range.
    • (b) Allowing for restriction of ability range, the same correlations between IQ subtests occur within families as occur for children drawn from different families. The sister who does better on some mental tests will do better than her siblings on others, despite all the children being in a home of the same SES and thus not differing in whether the tests are 'valid for them'.
    • (c) The actual correlation between parental SES and full-scale IQ in adolescence is substantial; but it is still only .40 for what are, after all, the two major variables of Western social science. Even if genetic and environmental influences on g itself are altogether discounted, parental class could account for only .40² = 16% of children's differences in measured ability: at least 84% of IQ variation does not result from class effects of any kind. {See also Box I,1, below.}
    • (d) Although lower-SES children perform worse on gc than on gf tests, the correlations amongst the different subtests, and between IQ and external criteria are the same for them as for other children. Even when low-SES children have special handicaps, these do not weaken general correlations or remotely suggest test invalidity. There are certainly some real effects of home environment on child IQ (see Chapter IV); yet that is not because the tests are failing to measure intelligence properly in lower-SES children, but because they are succeeding.
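The variance arithmetic in point (c) can be written out in full: the proportion of variance in one variable accounted for by its correlation with another is the square of that correlation. As a worked sketch:

```latex
r_{\mathrm{SES,\,IQ}} \approx .40
\quad\Rightarrow\quad
r^{2} = (.40)^{2} = .16,
\qquad
1 - r^{2} = .84
```

So even taking the .40 correlation entirely at face value, parental class shares at most 16% of the variance in children's IQ, leaving at least 84% to sources other than class of any kind.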
       
  5. There is an important variant on the idea that IQ reflects inappropriately and invalidly the degree of privilege of one's background. It is that some people, notably amongst ethnic minorities and the physically or sensorily handicapped, may find themselves unfairly and incorrectly judged as dull by IQ-type tests. To the extent that this happens, apparent g differences across an entire population will arise if some sub-groups are especially disadvantaged on a wide range of tests - not now by SES alone, or by genuine intellectual limitations, but by detrimental factors specific to minority status. Once more, the combination of groups for whom the tests are invalid with groups for whom they are valid might tend to yield a spurious g dimension in the population as a whole. (Of course, the minority groups would have to do badly on the tests despite their hypothetically normal intelligence. To the extent that a critic allows it quite possible to do well on a test that is invalid as a measure of one's intelligence, this argument for a false g factor cannot be used.) As with the case of SES (above, (4)), two quite different claims need to be disentangled.
     
    • (a) One is the claim that racial prejudice, sensory handicap and physical incapacity to explore or learn from the environment all actually cause genuinely lower intelligence in victims. This claim does not dispute the validity of IQ tests: indeed, the tests may serve as very useful measures of the degree of harm actually done to the intelligence of victims by their environmental or constitutional problems. For example, deaf children have entirely normal levels of performance on gf tests despite having missed much of the supposedly enriching and stimulating world of language and verbal communication; but, especially in childhood, they do have lower scores on gc tests requiring knowledge of language (Braden, 1994). In both cases, the tests are valid; but one type, gc, requiring normal verbal skills, registers - quite properly, and indeed quite fairly - a real handicap.(18)  
       
    • (b) On the other hand, it is sometimes believed that IQ tests are 'invalid' for particular groups, as if the low IQ estimates yielded in some cases were not in fact seriously meaningful - perhaps because black children should be thought to have their very own dialect of English which is especially rich in monosyllables (e.g. Labov, 1973). However, for this criticism to succeed would require that mental tests and subtests did not show the same correlations within minority group testees as they do within the rest of the population; and this is far from what actually happens. IQ tests are just as 'good' at measuring, amongst black or Asian minority testees,(19) whatever they normally measure elsewhere. A 'Black Intelligence Test for Children' was constructed in the early 1970's (asking about knowledge of what were then distinctive Afro-Caribbean colloquialisms, like 'The Bump' and 'going down on'); and white children duly had the lower scores (Williams, 1972; Matarazzo & Wiens, 1977; Jensen, 1980, pp. 679-681). Yet this 'intelligence test' turned out not to predict any kind of educational, occupational or sociometric success even amongst black youngsters themselves: it was no more a test of intelligence than is expertise in Cockney or Glaswegian for most British children. Conventional IQ tests are just as reliable, internally valid (i.e. self-consistent as between their different parts or items) and externally predictive for blacks as they are for whites; and black children do not improve their IQ's when tests are translated into black ghetto dialect by linguistics specialists (e.g. Quay, 1974).

    Reviewing her study of three thousand children (Grades 3-8) in Philadelphia state schools, two thirds of them black, one third white, Scarr-Salapatek (1971, 1972) found measures of aptitude to predict school achievement equally well in both racial groups. She wrote: "Many would like to claim that the low average IQ scores of disadvantaged children result from measurement invalidity, but I find no support whatsoever in my data for this assertion." For adults in the USA, IQ tests correlate just as well with job performance in all racial groups. (If anything, the tests slightly over-predict scholastic and workplace performance by blacks and are to that extent unfair to whites and Asians in competition for the same positions - see Hartigan & Wigdor, 1989.) Nor is there any general problem of test-taking motivation for minority children: black children do perfectly well at laboratory tests that are not correlated with IQ - such as drawing a straight line, threading beads, or recalling past events (Montie & Fagan, 1988); and deaf children, despite their gross cultural deprivation, have no special problems with non-verbal tests that are well known as good measures of IQ.(20) The question of what causes Afro-Caribbeans to have low IQ-type scores (even as early as age 3 and even when selected as having mothers who are married and have enjoyed a college education) is of great interest (and will be considered in Chapter IV); but it seems the answer will have to involve test validity, not invalidity. Even when particular IQ items are identified by sociologists and educationists as appearing 'culturally unfair' to minorities, investigation shows that black children actually do a little better on these (often requiring memory and learning) than on items selected as 'unbiassed' (and requiring gf) (McGurk, 1975 - in a review of 105 published articles).
At every age and at every level of family income, black children are no worse on Wechsler Vocabulary than they are at Block Design (Roberts, 1971). Jensen's demonstrations were an important beginning of the answer to the 1979 ruling of a Californian court banning the use of IQ tests by state authorities:(21) Judge Peckham's ruling had expressly remarked on the lack of evidence from testing personnel about the validity of the tests in use with black children and adults themselves, but such evidence was soon available in quantity. Overall, the slight degree to which IQ test variance is attributable to either social class or racial differences in the US population is shown in Box I,1:
     

      Do IQ tests discriminate mostly along the lines of race and social class? Do IQ differences chiefly reflect children's differences in social advantage vs disadvantage - i.e. in the class (socio-economic status (SES)) or race (black, white) of their parents?

      Arthur Jensen (1980, pp. 42-3, 57-9) examined the Wechsler IQ's of 622 black and 622 white children from 98 Californian schools. Both groups comprised children aged 5-12 who were representative of their racial groups in SES levels. [These data had been collected by Dr Jane Mercer; the Wechsler Intelligence Scale for Children (Revised) (Wechsler, 1974) was used; and SES was calculated as by O. D. Duncan (e.g. 1968) from parental occupations.] The correlations between the three variables, IQ, race and SES, were all around .45; so the partial correlations amongst any two of these variables, controlling for the remaining variable, were around .30. For comparative variance between siblings, between average families and between random individuals, Jensen used his own Lorge-Thorndike IQ data and WISC-R standardization data.
      The following Table shows: (1) the percentage of variance in IQ attributable to each source; and (2) the average IQ difference attributable to each source.

      Table 1.

      Source of variance                                   (1) % IQ variance   (2) Average IQ difference
      Between races, independently of SES                         14           12 pts (between black and white)
      Between SES groups, independently of race                    8            6 pts (between High and Low)
      Between families, within race & SES groups                  29            9 pts (between random families)
      Within families (thus within same race & SES)               44           12 pts (between siblings)
      Measurement error (using one-month reliability)              5            4 pts (between self 1 and self 2)
      TOTAL                                                      100           17 pts (between two random individuals)

       

      Jensen explains: "The reason that race and SES account for relatively little (i.e. 22 per cent) of the total variance of IQ....is that there is so much variability within racial and SES groups relative to the difference between the means....Average differences among groups may seem overwhelming until they are viewed in [conjunction with] the total variation in the population."

      The following Figure illustrates the race and class differences. It may be seen how these group differences account for little of the population range in individual IQ scores.
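The step in the Box from three pairwise correlations of about .45 to partial correlations of about .30 follows from the standard first-order partial-correlation formula; a minimal sketch of the calculation:

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z:
    (r_xy - r_xz*r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# With all three pairwise correlations (IQ, race, SES) near .45, as in
# Jensen's Californian data:
print(round(partial_corr(0.45, 0.45, 0.45), 2))  # 0.31
```

With each of r(IQ, race), r(IQ, SES) and r(race, SES) near .45, controlling for the third variable leaves a partial correlation of about .31 between any two - Jensen's 'around .30'.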

  6. A last doubt about IQ-test validity is that 'measured' differences may be little but the products of other people's expectations, 'labels' and self-fulfilling prophecies. Once more, there are two versions of such a claim.
     
    • (a) One is that differences in expectations (e.g. by children's teachers) may have real effects on intelligence. This is a claim for which no evidence has ever been offered other than from IQ-type testing; and, if IQ-test evidence is considered relevant, the claimant is accepting IQ-test validity.
       
    • (b) The other version is that expectancies may particularly affect only IQ scores. Such invalid scores may eventually become reality via subsequent differential provision of educational opportunities. The idea is that differential treatment, in response to initial IQ scores, may yield real, 'self-fulfilling prophecy' effects on intelligence itself. Fortunately, though it is now well recognized that one-off perceptual judgments and children's achievements in swimming, athletics and laboratory learning can sometimes reflect initially erroneous expectancies (of teachers, parents or pupils), hundreds of studies in the past twenty-five years(22) have found little general effect of such 'labelling' on IQ. In the most systematic study in a normal school setting (Kellaghan et al, 1982), expectancies of teachers supplied with IQ information about pupils did not generally change children's IQ's or attainments over a school year. (There was a slight boost to the end-of-the-year achievements of those (genuinely) higher-IQ children who came from relatively low-SES families: the teachers may have been trying to discount background SES and to 'bring on' such children towards the attainment levels normally expected from children of such IQ's.) Far from labelling or self-labelling giving rise to IQ-type differences and so to spurious correlations and a g dimension among mental tests, it is noticeable that many genuinely bright people have a misleadingly modest impression of their own abilities - often claiming on TV shows to be 'poor spellers', for example; while vanity amongst people of mediocre intelligence is probably easier to find (see Brand et al., 1994).

    Thus all the six proposed escape routes from admitting that IQ tests measure intelligence turn out to be blind alleys. As much as household thermometers or kitchen scales, IQ tests are generally reliable and make distinctions that are not interpretable as discriminatory. None of the above six arguments has concerned what is the nature of intelligence, how intelligence differences arise, or whether intelligence is of any importance. (These three matters are the respective concerns of the next three chapters.) However, it transpires that, in trying to explain why mental tests correlate and why the same testees who do well on one type of test also do well on others, the proposition that there are real and general mental differences between people is very hard to avoid. These differences have no ready interpretation other than that they must consist in something approximating 'general intelligence.' Testers' whims, testees' moods and motivational idiosyncrasies, the social classes of testees' parents, minority group statuses and 'labelling' effects are quite unable to explain the correlations between tests or to account for more than tiny proportions of population variation in IQ except in so far as it is precisely admitted that estimates of IQ do indeed reflect people's relatively enduring levels of general intelligence. A report published by the National Institute of Education in the USA once complained of "the myth of measurability" and insisted that "a person's abilities, activities and attitudes cannot be measured" (Tyler & White, 1979, p.376). For general mental ability, at least, there is no such "myth". Complaining about the validity and fairness of IQ-type tests has been a popular way of avoiding serious consideration of the other questions about IQ differences - about their unity, essence, origins and function; but the complaints do not withstand scrutiny. 
As empirical testimony, two massive research programmes on the use of IQ tests in occupational selection in the USA have shown the tests to be equally useful (i.e. valid and predictive) with all racial groups. Reynolds & Brown (1984) brought together the main strands of the voluminous evidence on whether and when IQ tests were unfair to minorities. Blinkhorn (1985) provides a review and observes that "... the problem is not that tests under-predict the performance of blacks [in industry] but that they over-predict it."

    IQ has undoubtedly been more fully checked for possible bias than has any other variable in psychology (Barrett & Depinet, 1991; Humphreys, 1992). More recently, Schmidt et al. (1992) find no evidence of test unfairness in the largest psychometric investigation of all time - into the use of testing by the U.S. Army (Project Alpha). Again, if bias is simply assumed to have resulted from racist prejudices of the early testers, then why do the tests not reflect Victorian sexism by yielding a sizeable difference between the sexes? Finally, if IQ tests are still somehow to be called 'unfair', the critic should say what is fairer. Is it fairer to decline to recognize people's differences - and thus to treat unlike cases alike, i.e. with injustice? Is it better to disbelieve in differences - as if one could somehow rig one's own beliefs in such matters when so many tests correlate? Or can the critic actually indicate a measure of intelligence that is 'fairer'? Quite conspicuously, despite 75 years of opportunity, no alternative to IQ testing has appeared - though those independent, non-g mental ability differences that do emerge from such searches must now be considered.(23)

     


As Binet asserted, people do not differ only in their level of general intelligence. They also differ in more particular ways, for example in musical, numerical, map-reading and empathic abilities. How extensive is such specificity on tasks requiring mental work? How much do people vary in ways that are reliably measurable yet not attributable to their g differences alone? What proportion of people's tested differences reflects g differences, and what proportions reflect independent specifics? After sixty years of hunting for non-g mental ability specifics, there exists no standard package of tests for them nor any conspicuous agreement on their number. The lack of g-free ability tests occurs partly because the everyday demand of employers is for the specific ability together with the advantage that higher g will also convey:(24) in practice, an employer seeking high levels of, say, the spatio-mechanical aptitude that makes people good with gadgets will happily use a test which, though it uses problems involving levers and pulleys, is also correlated about .50 with g . Perhaps more surprisingly, there is only modest agreement between theoreticians as to what are the main specifics that can be isolated even once the g factor is set aside:(25) theoreticians have been more interested in what has proved the thankless task of breaking up g itself so as to seize substantial chunks of its variance to increase the size of their own preferred 'specifics'. Once g is set aside (or 'partialled out') in representative samples, specifics like verbal, spatial and associational (memory) abilities can usually be found if relevant tests have been used; but they each account for only about a sixth of the ability variation between people that is attributable to g (e.g. Blaha & Wallbrown, 1982; Jensen, 1985; Carroll, 1993). 
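The arithmetic of setting g aside ('partialling out') can be sketched with the standard partial-correlation formula. The figures below are purely illustrative - they are not estimates from any of the studies cited - but they show how two tests that each draw heavily on g can look strongly related until their shared g component is removed.

```python
from math import sqrt

def partial_corr(r_xy, r_xg, r_yg):
    """Correlation between tests x and y with the third variable g
    partialled out (classical first-order partial correlation)."""
    return (r_xy - r_xg * r_yg) / sqrt((1 - r_xg**2) * (1 - r_yg**2))

# Hypothetical values: a verbal and a spatial test each correlating
# .70 with g, and .55 with one another.
residual = partial_corr(0.55, 0.70, 0.70)
print(round(residual, 2))  # the verbal-spatial link left once g is removed
```

On these assumed figures the raw correlation of .55 shrinks to about .12 once g is held constant - consistent with the text's point that specifics account for only a modest fraction of the variation that g carries.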
Probably the search for non-g dimensions needs to be extended to include strategy differences: such bipolar differences occur in so far as choices need to be made between partly incompatible goals such as speed vs accuracy or concentration vs breadth-of-attention. An obvious starting point in any consideration of such contrasting strategies is with the 'Big Five-or-Six' dimensions of human personality difference in self-report data that have lately attracted psychometrician-psychologists (e.g. Brand, 1984a; Deary & Matthews, 1993; Brand, 1994a; Ormerod et al. , 1995). Taking a very broad overview of historical and current research into human differences in both abilities and self-reported preferences,(26) some five specific ability contrasts might be suggested to be largely unrelated to g (as in Brand, 1994b) - though, astonishingly,(27) researchers have yet to adopt Cattell's practice of collecting both questionnaire and ability data from the same subjects. Like g itself, these five dimensions can probably be seen in both 'fluid' and 'crystallized' forms - if measures of temperament and attitudes are expressly used (see Brand, 1994b).

 

    1. The longest-running distinction between mental abilities dates back to Wechsler's work in the 1930's and the similar discovery by Britain's National Institute for Industrial Psychology that verbal tests were less helpful than performance tests in selection for skilled apprenticeships (Evans & Waites, 1981, p.78). However, the personality distinction between tough- and tender-mindedness can be found in Shakespeare (in King Lear) and William James (1842-1910) (e.g. 1976); and tender-mindedness of personality was first identified systematically in questionnaire data by Cattell (as a mixture of premsia (sensitivity), affectothymia (interest in people) and good manners (Cattell's N)) (Cattell, 1973; Cattell & Kline, 1977). Today a broad contrast might be made along all of the following lines - although the verbal / spatial distinction is the best known and most commonly tested by measurement of the different abilities.

    VERBAL
    vs
    SPATIAL, 'PERFORMANCE', CONCRETE
    MUSICAL, AUDITORY
    vs
    MECHANICAL, VISUAL
    THEORETICAL, ABSTRACT
    vs
    PRACTICAL, CONCRETE
    INTUITIVE, IMAGINATIVE
    vs
    PERCEPTUAL, SENSORY
    INTEREST IN PEOPLE
    vs
    INTEREST IN MATERIAL THINGS

    Classically the V-P discrepancy was often held to be related to personality and to type of psychopathology - with delinquents, criminals and personality-disordered patients scoring relatively 'low-Verbal' (accounting for the CIA's long-standing interest in V-P differences when testing potential spies and informers). Typically, women are more 'verbal/intuitive' than men; likewise women score higher on the moderately correlated personality measures of tender-mindedness, Openness/Imagination, affection, empathy, trust, idealism and aesthetic and religious values (Minton & Schneider, 1980; Sidanius & Ekehammer, 1982; Gibson, 1979; Vernon, 1982). Apparently this broad dimension of contrast between specifics is one of sensitivity to the higher, less prosaic elements of culture and social experience; and it involves response to symbolic significance and a relatively wide receptivity to experience as opposed to closer reality contact. At the verbal/intuitive end, a broad intake of in-context material is probably achieved by operating abstractly and at a distance from the coarser aspects of reality that sometimes require the relatively direct, quick, perceptually driven and practical responding of the higher-Performance person. The higher-verbal, higher-idealism person is perhaps taking in more by standing back further from the scene, but at the cost of 'miniaturizing' what is viewed and sometimes sacrificing important practical details. In line with the sex difference, the distinction would usually correlate with arts versus science interests. (In everyday life this 'opposition' is usually obscured since higher-g people have more interest in both art and science - just as they have more interests of both masculine and feminine 'types' (e.g. Hamilton, 1995).) Such questionnaire dimensions as Openness and tender-mindedness have strong empirical links to the Jungian contrast between perception by intuition and perception by the senses (e.g. McCrae & Costa, 1989). 
The broad distinction suggested here may seem easy to confuse with the crystallized-fluid distinction (between gc and gf - see above): but the latter is linked to age and knowledge rather than to feminine sensibilities and intuition vs practical abilities. The present distinction involves greater preoccupation with the mental and socio-emotional than with the material and tangible aspects of the world.
     

    2. The second dimension also seems related to how information is taken in from the world. Aspects of it would be as follows.

    FIELD INDEPENDENCE
    vs
    FIELD DEPENDENCE
    ANALYSIS
    vs
    SYNTHESIS
    RATIONALITY
    vs
    EMPIRICISM
    DEDUCTION
    vs
    INDUCTION


    First identified by the USAF psychologist, Herman Witkin (1916-1979), in the 1950's, and soon shown by Cattell (e.g. 1973) to be connected with personal qualities of 'independence / assertiveness / self-sufficiency vs subduedness / agreeability / group dependence', this dimension also yields something of a sex difference. Males perform better on tasks requiring narrow attention that sets aside demands of context that are irrelevant to the current task. The classic measure of this dimension is the Embedded Figures Test (EFT): testees are asked to detect simple figures enmeshed within complex visual designs.(28) But, like many tests once intended to tap specific, non-g abilities, the EFT correlates at around .45 with g ; so a more strictly perceptual test, the Rod-and-Frame Test (RFT) is sometimes used instead. In the RFT, the testee tries to rotate a rod to the true vertical position while the square frame around the rod is itself rotated so as to provide what for many testees are quite powerfully misleading visual cues as to the true vertical. Field independence involves attending narrowly and ignoring currently unwanted influences of context. By contrast, field-dependent people often seem better at taking in and using a wide range of cues of less immediate relevance - as is often helpful in fast-changing, unplanned social situations. Similar dimensional contrasts called 'independence' and 'self-awareness' emerge from other procedures (Kline & Barratt, 1983; Bekker, 1993). Strictly analytic abilities usually seem related to field-independence; in contrast, using social cues pointing to correct answers is more of a speciality of field-dependent people. 
The ability to ignore distraction is sometimes thought to enable shifts in approaches to tasks and to be subserved by the brain's frontal lobes.(29) McCrae & Costa (1989) find their Antagonism vs Agreeableness dimension to link especially to the Jungian contrast between making decisions by reason and making decisions by feeling.
     

    3. Around 1970, a third specific, non-g ability distinction had come to light in the work of the 'London School' psychologist and personality theorist, Hans Eysenck, and was developed further in the work of his son, Michael Eysenck.

    SHORT-TERM MEMORY
    vs
    LONG-TERM MEMORY
    AROUSAL CONTROL
    vs
    AROUSAL SUSCEPTIBILITY
    BEHAVIOURAL SPEED
    vs
    POWER (of processing)
    BREADTH
    vs
    DEPTH (of processing)


    Hans Eysenck had long presumed quiet and serious 'introverts' to be likely to do relatively well at laboratory tasks requiring vigilance, attention, persistence and memory. In fact, it emerged that, by and large, it was fun-loving 'extraverts' who were better at coming to terms with the novel (and often trivial) tasks of the experimental psychologist's laboratory: they tended to score better in the short term. Introverts did better chiefly if testing was extended over several days and required long term memory storage and recall (see Matthews, 1993; Brand, 1994b). Apparently, extraverts can free attentional resources for rapid performance in the task at hand by the expedient of not engaging in so much long-term storage of what is going on. They can be said to process what is going on less deeply than introverts. The latter analyse input more fully (for meaning, not just for sound or spelling) and link it more widely to what is already stored in memory. It is as if the introvert provides a more 'powerful', memory-establishing treatment of incoming happenings and stimuli; but this extra processing means that the immediately required reaction to the experimenter's problem-stimulus takes longer to arrange. (30) As with other mental ability distinctions having little relation to g , it must be stressed that both 'extraverted' and 'introverted' strategies (or styles) have their own special advantages; and that higher levels of g will improve people's performances at both short-term and long-term memory for meaningful material.
     

    4. By 1980 a fourth dimension required recognition despite early disputes as to its reality. This was a dimension that contrasted loose, fluent, original, bizarre and sometimes 'creative' thinking with a more prosaic, down-to-earth and accuracy-seeking style.

    CREATIVITY
    vs
    CONVENTIONALITY
    ORIGINALITY
    vs
    ACCURACY
    LOOSENESS
    vs
    TIGHTNESS (of associations)
    FLUENCY
    vs
    SUPPRESSION (of associations)


    In particular, Hans Eysenck came to agree with an important strand in the suggestions of J.P.Guilford (1897-1987), Liam Hudson and the Glaswegian enfant terrible of British psychiatry, R.D.Laing (1927-1989). This idea was that psychotic (especially, schizophrenic) people might have looser patterns of association to stimuli - perhaps through lacking the normal 'lateral inhibiting mechanisms' that keep most thinking within conventional pathways and allow the elimination of irrelevant responses (cf De La Casa et al., 1993). Called psychoticism by Eysenck (e.g. 1995), the dimension contrasts spontaneity, imagination, impulsivity and a certain indelicacy of expression with a more thorough, scrupulous, or even pedantic and obsessional approach that is highly suited to the once-prized achievement of clerical accuracy. Higher levels of conventionality, conscientiousness and control have sometimes been found in association with what a Freudian would call 'anal' personality features, and also with more traditional, conservative social attitudes (Kline & Barratt, 1983). Once again, in real life, those versions of 'creativity' and 'clerical accuracy' that are actually in any serious demand will usually involve above-average levels of g ; so the value-free, bipolar contrast in conscientiousness is really between meticulously careful and cautious versus lax and laid-back approaches. Higher conscientiousness may particularly involve a stronger influence of multifactorial models, patterns and regularities - at the expense of situational flexibility. McCrae & Costa (1989) find this dimension especially linked to another Jungian contrast between problem-solving styles: some people try to arrive at principled judgements (whether, in particular, in accordance with reasons or feelings) and others try to collect more evidence (whether from intuition or the senses).  

    5. Lastly, beyond g , there is a major dimension of learning differences that must be mentioned even though it is not itself much concerned with performance on distinctively mental or symbolic tasks.(31)

    CONDITIONABILITY
    vs
    EXTINCTIONABILITY
    PUNISHMENT LEARNING
    vs
    VOLUNTARY UNLEARNING


    To many forms of learning (or 'conditioning'), some kind of motivation (by reward, loss of reward, punishment or relief from punishment) seems essential. Such emotional and experiential learning is especially important for anticipating crises, and it typically seems to involve neural routes in the midbrain that draw little on cortical processing (cf Gray, 1991; Le Doux, 1994; Epstein, 1994). Some people seem to learn especially well under such conditions - perhaps because they bring extra internal 'multipliers' of motivation, drive or emotional arousal to the task. People of a more emotional disposition are more readily moved into extreme mood states (especially into the four main negative mood states of fear, depression, fatigue and hostility) and have been thought to show more susceptibility to motivated learning - as seen in marked long-term preservation of 'neurotic' habits of reaction that they themselves would rather be without. Motivated learning involves identification and recall of events and sequences - allowing whole chunks of behaviour to be copied or shifted around a person's repertoire. In some recent researches, people of higher neuroticism have shown better recall for the details of past events such as their first day at secondary school and their first kiss - regardless of whether they experienced these events as happy or stressful at the time (Brand, 1996/7). It is almost as if life means more to the more neurotic, more emotional person. Whether a high degree of storage for past events is helpful on a day-to-day basis will presumably depend on the degree to which the events were genuinely of importance and can have their features intelligently extracted on later demand. Presumably problems can arise from the overloading of consciousness with useless memories - which is what some people of higher emotionality seem to report.

    The above five non-g -related ability distinctions can provide a serious hypothesis as to the variance that exists beyond the main human mental ability difference in g itself. For the past decade they have been the subject of what has been called a "converging consensus" in the field of personality research (using questionnaires that probably tap more crystallized aspects of them(32)). Still, the plain truth is that, seventy-five years after Binet's tests were translated and organized into a usable form by English-speaking psychologists in Stanford and London, and despite many psychologists having sympathized with Binet's own preference for 'going beyond general intelligence', there is astonishingly little agreement in psychology about such 'other dimensions' of ability at present.(33) Evans & Waites (1981, p.183) expressed a common and long-standing aspiration of many psychologists to dispute the claim of the universal involvement of any g factor in mental abilities: apparently "comparatively recent findings" showed that "cognitive tests can in fact be devised which do not correlate with conventional psychometric tests." Yet g still held sway - as in the classic review of the field by Gustafsson (1984). Reviewing the recent literature of personnel selection research, Schmidt et al. (1992) note the continuing lack of support for non-g mental abilities: "Research evidence against differential aptitude theory mounts, leading to a renewed emphasis on the importance of general mental ability." Major North American psychometric programmes (e.g. Snow et al., 1984) come up with little more than the faint empirical distinction between numerical, verbal and spatial abilities (normally correlating at .70 when reliably measured) that dates from Thurstone (1938) and which long ago inspired the Alice Heim Tests of intelligence that have proved so popular for testing high-level intelligence in Britain.(34) Even Howard Gardner, the foremost champion of multiple independent abilities over the past fifteen years, has been unable to deliver any package of demonstrably uncorrelated mental tests (e.g. Gardner, 1993a,c; Krechevsky & Gardner, 1994).(35) Nor has the enormous US Air Force programme of testing in Texas - using all the concepts and distinctions of modern cognitive psychology (working memory, declarative learning, procedural knowledge etc.) - realized empirically the multidimensional ambitions of J.P.Guilford (see Kyllonen, 1994).(36) Lastly, the long-standing wish of North America's leading academic entrepreneur for cognitive psychology (and for his own 'triarchic theory') to go beyond g has likewise yielded nothing that yet does that job (see Sternberg, 1994).(37) Hence it seems preferable to invoke modern personality distinctions and strategic contrasts to provide the needed complement to the influence of g alone.
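How much a single general component can carry a set of uniformly intercorrelated tests is simple arithmetic: a p x p correlation matrix whose off-diagonal entries all equal r has largest eigenvalue 1 + (p - 1)r, so the first (general) component accounts for (1 + (p - 1)r) / p of the total variance. The sketch below evaluates this for the hypothetical case of three abilities intercorrelating at the .70 level mentioned above; the function name is ours, not a standard one.

```python
def general_factor_share(p, r):
    """Proportion of total variance carried by the first principal
    component of a p x p correlation matrix with uniform off-diagonal r.
    (The largest eigenvalue is 1 + (p - 1) * r; total variance is p.)"""
    return (1 + (p - 1) * r) / p

# Three abilities (numerical, verbal, spatial) intercorrelating at .70:
print(round(general_factor_share(3, 0.70), 2))
```

On this assumption the first component absorbs 80% of the variance, leaving only 20% for everything 'differential' - which is the arithmetic behind the 'faint empirical distinction' noted in the text.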

    Interestingly, non-g differences such as the five contrasts suggested above are more easily seen in people of relatively high IQ (Brand, 1988; Detterman & Daniel, 1989; Lynn, 1992a; Detterman, 1993; Deary et al. , 1995/6).(38) At higher levels of g it can be said with better psychometric authority that some people really are more 'verbal', others more 'spatial', some more 'clerical' and others more 'creative', and so on. Self-reported personality features also seem to differ more sharply amongst higher-IQ testees - sometimes yielding more extreme scores and sometimes yielding new personality factors altogether (Brand, Egan & Deary, 1994; Brand, 1995). This may be the reason why so many psychologists - themselves presumably fairly high in g - find it easy to believe that there are many quite distinct types of intelligence and aptitude, and that they and their friends and colleagues all have numerous non-g -related intellectual strengths and weaknesses; and it accords with Binet's own view that intelligence was more unitary and thus more readily 'measured' in the below-average range.(39) Just why intelligence and intellectual styles should appear more 'differentiated'(40) at higher levels of g , Mental Age and IQ will be considered at the end of Chapter 2.

    The striking central phenomenon, however, is twentieth-century psychology's overwhelming and continuing vindication of Binet's main finding - while not of Binet's or others' disunitarian ideas. To the surprise of many psychologists themselves, reliable tests of mental abilities intercorrelate positively and defy interpretation in terms of bias: Binet-type tests have repeatedly shown the meaningfulness of overall MA, and thus of g . Virtually all mental tests could serve to help indicate intelligence: as Binet and Simon (1911) themselves had put it, "It matters very little what the tests are so long as they are numerous." Lack of grip on the larger aspects of mind had driven Watson and his followers to the study of the rat; but Binet's common sense and empiricism arguably netted the key reality of human mental differences. Like heat, intelligence has proved satisfactorily quantifiable; and measurement should yield the same sorts of advance as occurred in science and medicine after the development of the thermometer. Psychological understanding of this reality may have left much to be desired, as has often been complained; but theorizing and research are barely even attempted by those who are determined to doubt the reality of g . Rather than use the available thermometers, critics of IQ behave like alchemists who would smash their measuring tools rather than learn the truths they tell. Watsonian opposition to broad and central dimensions of human psychological difference has always been a luxury that psychology could ill afford; but to couple ideological behaviourism with accusations that 'no one really knows what intelligence is' adds hypocrisy to frivolity. It is simply wrong to talk of "the relatively small correlations that have been reported" between cognitive tasks (Russell, 1990). Binet provided the basis on which others could build - if and when they were so minded.


CONCLUSIONS

  1. Many turn-of-the-century psychologists doubted that quantitative assessment of human faculties could advance psychology; but Binet realized Galton's dream of finding that most mental abilities are systematically related. To think that people thus differ in their levels of general intelligence (g ) appeared a reasonable and economic way of summarizing the picture that Binet had first disclosed.
     

  2. Other interpretations of 'the positive manifold' of mental ability correlations repeatedly make incorrect predictions. Notions that some testees lack motivation, concentration or high enough expectations of themselves predict that such testees (whether from low-SES groups or ethnic minorities) will perform poorly on virtually any test whatsoever. In fact, simple reaction times, motor skills and memory for nonsense syllables have little connection with g (i.e. with mental ability tasks involving symbol use), so low-IQ testees perform perfectly well on them. The same positive relations are found between mental tests that require the use of symbols even when testees are all drawn from the same SES levels, from the same minority groups or from the same families.
     

  3. Despite the omnipresence of g differences, most psychologists have envisaged that there are some additional tendencies to covariation among mental tests that allow talk of other, more specific mental abilities. 'Fluid' and 'crystallized' forms of g were first identified around 1930 - although only one person in eight in the general population will have scores that differ significantly on these two substantially correlated types of ability. Other more specific abilities certainly exist (e.g. for map reading, constructing objects from diagrams, and being verbally and ideationally fluent); but these still involve g or, when they do not, are irrelevant to capturing more than a small fraction of the practically important differences between random members of the population. Currently, despite decades of search for and belief in 'differential aptitudes', there is in fact no agreed nomenclature or scheme for abilities other than g even though terms like 'verbal', 'spatial' and 'clerical' are often heard. The 'Big Five' dimensions of personality have approached something of a consensus among modern psychometrician-psychologists. They probably come nearer than any other method to indicating the main human ability differences beyond those for which g differences can adequately account. The special links of the Big Five are to the ability-contrasts: verbal vs spatial; field independence vs field dependence; short-term memory vs long-term memory; originality vs accuracy; and conditionability vs extinctionability.
     

  4. Even these bipolar distinctions may themselves be hard to isolate in testees of low general intelligence. Intelligence and personality seem more 'differentiated' in people who are above-average in g: though the g factor is unitary, higher g levels yield more diversity. Binet was right to suspect that differences in general intelligence were both more important and more measurable among the lower-IQ.


ENDNOTES to Chapter I

  1. A child's 'social class of origin' would normally be thought to be determined by the status, wealth, income and influence of its parents. Typically parental socio-economic status is assessed by the 'level' of the father's occupation; or by some formula that essentially multiplies the father's income by his educational level.

  2. Braden (op.cit.) especially considers the idea that minority children are handicapped in access to the ways of the 'dominant culture' - e.g. because their parents do not know it, do not like it, or anyhow cannot communicate it to their children; and thus that minority children will be deficient in the knowledge which is sometimes thought to be especially tapped by IQ-type tests. By such criteria, deaf children clearly have a massive handicap in accessing the 'dominant culture'; yet they have entirely normal levels of gf.

  3. Afro-Caribbeans in America and Britain have been of particular interest concerning the validity of tests. Whereas some ethnic minorities have their own language, religion, trade specializations and musical preferences, blacks in America and Britain are very similar in their general cultural exposure and aspirations to local white populations. Yet blacks show g deficits - especially on those gf tests that are the least conspicuously dependent on 'culture' (and which give higher IQ estimates for white children from poor families).

  4. Schonemann (1985) claimed there might be artefacts of test construction that would lead to minority groups doing poorly on IQ-type tests. But Braden (1989) found that deaf children - who are notably isolated from mainstream American culture and stigmatized by their peers - only have problems with verbal IQ tests. It is precisely on verbal tests that black children typically do rather well - compared to their overall IQ results.

  5. Larry P et al. v. Wilson Riles et al. , US District Court (Northern California) Judge's Opinion, filed 16 x 1979, p.3.

  6. Rosenthal & Jacobson (1968) provided the sensational initial report of labelling-induced IQ-'blooming' in six-year-olds; but their effect was achieved only with very young children on a most unusual test - which classified most of the children as mentally subnormal even though they were in a normal school; anyhow, the 'Pygmalion effect' proved hard to replicate for IQ. Rosenthal (1994) himself reviews the extensive literature and estimates that expectancy effects achieved for combined tests of 'ability and learning' average (in correlational terms) only .26. Since no 'learning tests' are as reliable as IQ - and IQ, being the more stable measure, is the less likely to reflect such short-term influences as expectancy effects - the Pygmalion effect for IQ has to be still lower.

  7. Replacement of unfair tests and items could occur by finding items on which the lower-IQ racial minorities perform relatively well. Such items can be found: for example, black people do relatively well on tests of simple reaction time and rote memory. However, the problem is that not even IQ's sternest critics think these tests measure intelligence.

  8. There is the Differential Aptitude Test, of which disunitarian theorists entertained so many hopes over the years - but its sub-scales typically correlate at around .35.

  9. It is often thought that there are many different 'cognitive abilities' that must have been identified in the last twenty years' work since experimental psychologists forsook the rat and once again studied people. However, cognitive psychologists normally study psychology students or other educated young people who do not differ much in g: thus many of the variables of the cognitivist's laboratory have simply not been investigated as to how they correlate with g . Exceptionally, where cognitivist investigators have made the proper investigations, their measures of attention and memory correlate substantially with g - see Chapter II. Johnson-Laird (see Johnson-Laird & Byrne, 1993) has distinguished some five different types of thought which bear some resemblance to the five non-g dimensions of difference that Chapter I outlines. Again, the first two of the five dimensions set out here (Verbal vs Spatial, and Analysis vs Synthesis) closely resemble the two distinguished in a substantial review of how other mental tests correlate independently of the g factor of the classic Raven's Matrices (Carpenter et al., 1990).

  10. Many studies admittedly involve all too slight a range of IQ's. For example, in the largest single project on personality differences in 'normal adults', in Baltimore (e.g. McCrae & Costa, 1989), no fewer than a quarter of the adult testees have doctorates. Such artificial restriction of the g range allows 'special' factors to appear relatively important compared to g.
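The statistical effect of such sampling can be sketched with a small simulation (the population correlation of .50 and the top-quarter selection are illustrative assumptions, not figures from the Baltimore project):

```python
# Restriction of range: a variable correlating .5 with g in the full
# population correlates much more weakly within a high-g subsample,
# leaving room for 'special' factors to look relatively important.
import math
import random

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)
N, r_true = 20000, 0.5
g = [random.gauss(0, 1) for _ in range(N)]
# Criterion = r*g + noise, giving a population correlation near .5.
y = [r_true * gi + math.sqrt(1 - r_true ** 2) * random.gauss(0, 1) for gi in g]

cut = sorted(g)[3 * N // 4]                    # keep the top quarter on g
kept = [(gi, yi) for gi, yi in zip(g, y) if gi >= cut]
r_full = pearson(g, y)
r_restricted = pearson([p[0] for p in kept], [p[1] for p in kept])

assert r_restricted < r_full    # sharply attenuated in the elite sample
print(round(r_full, 2), round(r_restricted, 2))
```

With a top-quarter cut, the within-sample standard deviation of g falls to about half its population value, so by the classic range-restriction formula the observed correlation drops from .50 to roughly .27.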

  11. Researchers of personality do not invariably look for underlying abilities - partly by theoretical choice, and partly because testing abilities is more demanding of subjects. On the other hand, researchers of abilities tend to feel there is little point in administering questionnaires when these produce results that are much less reliable and predictive than ability measures (especially when ability measures tap into g variance, whether by accident or design). Thus, bizarrely, 'intelligence' and 'personality' are conventionally treated as separate domains by most researchers. Over the years, Cattell has provided the one conspicuous exception: his personality questionnaires always include an intelligence scale.

  12. The task is similar to that seen in children's comics, where the child has to find, say, how many 'monkeys' can be detected in a drawing of people on a crowded beach. Finding relevant detail embedded in irrelevant material often figured as one of the primary factors found by followers of Thurstone - see Baker, 1974, p.455.

  13. Dempster (1991) summarizes evidence linking field independence to the Wisconsin Card Sorting Test (in which testees are required to change sorting principles repeatedly during the test) and to the Stroop task (in which testees must name the colour of the ink in which a colour word is printed while ignoring the word itself - e.g. the ink colour of 'blue' printed in red takes longer to name than that of 'blue' printed in blue).

  14. For the idea that there is a trade-off relation between storage and current processing of information, see Just & Carpenter, 1992. Very striking extravert-introvert differences occur in response to the McCollough Effect - whether subjects readily see phantom colours after viewing black-and-white grids (Logue & Byth, 1993). These differences apparently reflect differences in the functioning of the cholinergic fibres that are known to be involved in enabling consolidation of memory traces.

  15. For an account of the range of effects in which something like classical conditioning may be involved, see Turkkan, 1989 and Krank, 1989.

  16. In questionnaire research, the five dimensions are currently known by such titles as:
    • (i) Openness, affection (a), tender-mindedness vs realism, cynicism, projected hostility.
    • (ii) Independence, will (w), disagreeableness vs subduedness, deference.
    • (iii) Extraversion, energy (e), surgency vs introversion, gravity, sobriety.
    • (iv) Control, conscientiousness (c) vs laxity, impulsivity, casualness.

  17. (v) Emotionality, neuroticism (n) vs stability, sluggishness, composure.

  18. The g dimension quite often fuses with Openness/Tender-mindedness to yield a factor that is usually called Intellectance. See e.g. Deary & Matthews, 1993; Brand, 1984, 1994a, 1994b.

  19. That is to say that, despite the best intentions both of critics of g and of defenders who would deem it wise to admit some non-g variance, there is simply not a single psychologist or publishing house in the 1990's issuing mental tests that are at once (1) reliable, (2) of proven predictive power for a range of important human achievements, and (3) uncorrelated with g when given to representative samples of the population. Such is the extent of the calamity for 'disunitarian' theorists wishing there were a wide variety of abilities so that all could, by happy chance, be good at something. Full modern evidence for the overwhelming paramountcy of the g factor is set out by Carroll (1993) (and summarised by Brand, 1993). Carroll admittedly talks of there being seven second-order ability factors that are distinguishable once a third-order g factor is removed. However, (i) along with g-visual, g-auditory, g-speed, g-idea-production and g-memory, Carroll's seven factors include gf and gc, which plainly are not generally independent and thus cast great doubt on the independence of the other five; (ii) Carroll's seven show no more correspondence with the schemes of other psychologists than would be expected from the five ability contrasts selected in this Chapter. Carroll's scheme has a family resemblance to those of R.B. Ekstrom and J. Horn (see Kline, 1992); but the schemes of Ekstrom and Horn also suffer classically from a tendency to claim as independent and distinguishable those factors that are well known for the ease with which they are found to correlate in studies involving the full range of the general population. (For example, Horn & Noll (1994, p.189) claim factors of gf and gc to correlate at only .19 "based on 154 7-year-olds"; and they suppose that gf and gc "stem from different genetic determinants, the effects of which can be seen early in development." However, the "7-year-olds" had all been in intensive care as neonates (though Horn & Noll think it "unlikely" that this would have made them atypical of normal development); and Horn & Noll cite virtually no evidence since 1980 to support their disunitarian claims.)

  20. As Baker (1974) once put it: "Among those who allow the existence of semi-specific primary factors or group factors, general agreement has not been reached as to the number of them that should be recognized...."

  21. Gardner's failure has even led him to deny that his theory of 'multiple intelligences' constitutes any definite "set of hypotheses and predictions." Apparently he thinks this exempts him from testing his theory before urging it on educational practitioners - even though he would seem plainly committed to the eminently testable proposition that there are several major variations in mental abilities that are independent of g. He tries to explain (Gardner, 1994): "multiple intelligences....is an organized framework for configuring an ensemble of data about human cognition in different cultures. I bristle at the notion that educational work in this vein should grind to a halt while some kind of decisive scientific test is carried out." Again, Krechevsky & Gardner (1994) frankly opt out of the too daunting task of trying to break up g into the promised 'multiple intelligences'. They write (p.302): "Overall, we intend our theory to be an expanding and unifying conception, rather than one which directly confronts or refutes psychological trait and factor analytic approaches." It may prove easier for Gardner to 'disprove' the unity of intelligence by the course which he has sometimes favoured of defining it as encompassing artistic and even athletic ability (Gardner, 1983) - and perhaps throwing in capacities for alcohol consumption and sexual vigour for good measure.

  22. In the USAF programme, 'working memory' turns out to correlate with most other tests (e.g. Kyllonen, 1994, p.314). (Working memory, when the trouble is taken to measure it reliably, is simply a good measure of g - see Chapter II.) Like Guilford, Kyllonen has a taxonomy of intelligence: called the Cognitive Abilities Measurement (CAM) framework, it distinguishes at least 144 types of intelligence (Kyllonen, 1994, p.328). However, "the CAM framework is definitely work-in-progress, rather than a fully articulated 'theory' of individual differences in cognition" (p.352). More importantly, empirical evidence for any great independence of the proposed abilities remains to be delivered. (The US Air Force Human Resources Laboratory is today called the Armstrong Laboratory.)

  23. Like Gardner, Sternberg often opts out of any immediate confrontation with London School claims. Apparently, "the goal of triarchic theory is not to replace previous theories of intelligence, but rather, to incorporate them, and particularly, their best aspects" (Sternberg, 1994, p. 378). To some this may seem reasonable enough. But the London School claims that, with representative samples, g accounts for more mental ability variance than all other mental abilities put together; so it is not clear how its central claim could be "incorporated" into Sternberg's theory that "conventional intelligence tests can predict only 5%-10% of the variation in various measures of life adjustment and success." If both Sternberg and the (incorporated) London School are right, some 80%-90% of life (etc.) variance will, on Sternberg's own account, be unexplained. Can it really be worth formulating a grand, incorporative 'theory' to explain a measly and quite arbitrary 15% of life (etc.) variance? If mental abilities (g plus all others) explain so little, should not Sternberg announce some non-mental abilities or other factors with which he would propose to plug the gap? (In fact, Sternberg has greatly underestimated g's importance in generating life (etc.) variance - especially across the lower half of the IQ range, and using reliable and valid indices of life success. See Chapter IV.)

  24. The general idea that differentiation occurs at higher levels of ability, maturation and social enrichment has a history going back to 1919: for a review see Anastasi, 1970. That 'parallel' IQ-type tests do not give such closely similar results in testees of above-average intelligence was especially remarked by Terman & Merrill (1937, pp. 44-47). It could be that IQ tests are less reliable outwith the ranges for which they were primarily designed (as is suggested by Spitz, 1986, pp.45-53). However, the phenomenon of differentiation occurs in ratings as well as in tests: when three raters estimated the IQ's of eminent and creative men, their IQ estimates were closer for eminent men of mediocre intelligence than for those for whom the average of the three ratings was higher (Cox, 1926, p.54, pull-out supplement). It is just as likely that differentiation of intelligence makes for lower correlations between tests as that some intrinsic unreliability accounts for the many observations of differentiation.
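The differentiation hypothesis itself can be given a toy simulation (the specific-factor weights of 0.4 and 1.2 below are hypothetical choices, not estimates from any study cited here):

```python
# Differentiation: if specific abilities carry more weight at higher
# levels of g, two tests cohere less among abler testees - mirroring
# the lower inter-test correlations reported for the above-average.
import math
import random

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(2)
rows = []
for _ in range(20000):
    g = random.gauss(0, 1)
    w = 0.4 if g < 0 else 1.2      # specific factors weigh more at high g
    rows.append((g, g + w * random.gauss(0, 1), g + w * random.gauss(0, 1)))

low = [(t1, t2) for g, t1, t2 in rows if g < 0]
high = [(t1, t2) for g, t1, t2 in rows if g >= 0]
r_low = pearson([a for a, _ in low], [b for _, b in low])
r_high = pearson([a for a, _ in high], [b for _, b in high])

assert r_low > r_high            # tests inter-correlate less at high g
print(round(r_low, 2), round(r_high, 2))
```

Within this toy model the two tests correlate around .7 in the below-average half of the sample but only around .2 in the above-average half - the qualitative pattern remarked by Terman & Merrill and visible in Cox's ratings.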

  25. People's friends will be similar to themselves in IQ and educational level: thus higher-IQ people will have more experience of people in whom intelligence has differentiated into some specialized forms of intelligence and not into others. Binet & Simon (1908, transl. R.E. Fancher) remark: "We are of the opinion that the most valuable applications of our scales will not be for the normal, but instead for the inferior degrees of intelligence." In the USA, Wissler (1901, pp. 54-55) had drawn a similar conclusion that tests of weight discrimination, two-point threshold and colour naming had greater inter-correlation (and thus most to offer as indicators of intelligence) "when applied to children in the lower schools."

  26. i.e. 'differentiated' into different types of mental ability. The idea is that dimensions such as the five non-g -dimensions outlined earlier will emerge more clearly - as distinct from each other and from g - among testees of above-average intelligence.

The g Factor   Christopher BRAND - www.douance.org/qi/brandtgf.htm - REDISTRIBUTION FORBIDDEN