Historical Roots of the Global Testing Culture in Education

Contemporary education is characterised by a global testing culture, reflecting the fact that students’ learning outcomes and standards are the focus of policymakers worldwide. This culture therefore plays a significant role in educational policies in different national contexts. We offer a brief outline of the precursors and preconditions that have facilitated the rise of today’s global testing culture. The article distinguishes two chronological stages: the first encompasses a confluence of comparative education, the rise of applied psychology, and the formation of transnational organisational structures prior to World War II. The second stage features the emergence of international organisations immediately after World War II. We argue that these developments subsequently conflated into a trajectory fostered by Cold War policies and became dominant from the 1990s onwards.

synonymous with accountability which becomes synonymous with education quality'. In other words, the conception of a global testing culture reflects the observation that practices based on rankings, performance indicators, and accountability derived from various test results are in evidence across the globe (Rizvi & Lingard, 2010; Lindblad et al., 2015). In parallel, the global testing culture is closely affiliated with what Pasi Sahlberg has called the Global Education Reform Movement (GERM). The GERM is an education reform approach that broadly follows the tenets of New Public Management and neoliberalism. It is structured around a common set of policy ideas, including standards-based management, performance evaluation, and accountability (Fuller & Stevenson, 2019).
Education data and their presentation frame and shape the political and public discourse on education: '[International large-scale assessments] can be seen as a practice showing what is educationally possible' (Lindblad, Pettersson, & Popkewitz, 2015, p. 39). Furthermore, such assessments also influence the very ideas and ideals, purposes, values, and aims of schooling and teaching (Biesta, 2015). Our concern in this respect is not that national education systems are becoming uniform but that the data not only depict certain empirical findings but also express a normative worldview, which is then embodied in the very system of indicators (Desrosières, 1998; Rose, 1999). The testing culture thus ultimately affects educational access and social mobility, along with the performance of and benefits given to different groups. It also plays a significant role in educational policies and conditions in different national contexts (Allan & Artiles, 2017). This testing culture, however, did not emerge ex nihilo. Its history features a long, but not necessarily coherent, development that can be traced to comparative education's foundation as a research field (Brickman, 1966, 2010), the establishment of international networks and organisations engaged with the field of education (Fuchs, 2007; Lawn, 2008), and the ascent of applied psychology in general and psychometrics in particular (Danziger, 1998). We argue that these precedents, building on an amalgam established mainly during the interwar years, conflated into a unified trajectory fostered by Cold War policies and grew dominant from the 1990s onwards.
To sustain this argument, we briefly outline the precursors, antecedents, and preconditions that facilitated the rise of today's global testing culture in education. The article considers two chronological stages: the first encompasses a confluence of comparative education, the rise of applied psychology, and the transnational organisational structures that began materialising prior to World War II (WWII). The second stage features the emergence of foundations and organisations immediately after WWII, a period concerned with educational measurement and comparisons.

State of the Art: Sharpening our Focus
Significant policy research has been conducted on the functioning of the global testing culture (e.g. Grek, 2009; Meyer & Benavot, 2013; Rubenson, 2008; Smith, 2016). A key insight is that there has been no inevitable policy convergence due to international large-scale assessments. Instead, specific contextual factors seem to influence how test results and policy recommendations are interpreted and adapted for specific national schooling systems (Bieber & Martens, 2011; Carvalho & Costa, 2014). Conversely, comprehensive research (e.g. Grek, 2010; Lawn, 2011; Ozga et al., 2011) argues that experts and international organisations create data that transcend national policy debates, because the data enable cultural exchanges across borders and places, creating a new type of virtual, borderless policy space. This is a core feature of the global testing culture.
While policy studies examine the global testing culture's comparative impact, historical studies investigate its various components. American historiographers have explored how the foundations of contemporary educational testing rest on 19th-century developments (Reese, 2013). A main point of Reese (2013, p. 4) is that educational reformers prior to the American Civil War in 1861 'were the first to rank urban teachers, students, and schools based on quantitative scores, to shame the worst and honor the best'.
The majority of historical studies of educational testing are, like Reese's, tied to a national reference frame, mostly concerned with the North American context. The international, and even transnational, nature of the global testing culture has been addressed in only a limited number of publications. Cardoso and Steiner-Khamsi's (2017) groundbreaking article examines education indicator research during three time periods. Their article is organised around three influential figures and institutions in the history of education indicator research: Jullien de Paris (1775-1848), Teachers College at Columbia University, and the United Nations Educational, Scientific and Cultural Organization (UNESCO) Institute for Statistics. Using these three focal points, the article identifies discursive shifts in the policy usage of educational statistics affiliated with the three historical processes of modernisation/nation building, colonisation/development, and standardisation/globalisation. Their article thus describes a core development in comparative education. Although our article follows in the wake of Cardoso and Steiner-Khamsi, it offers a slightly different perspective on the necessary conditions (or core building blocks) of the contemporary global testing culture and adds other factors, such as applied psychology and the organisational landscape in the two chronological stages treated here.
In this regard, Lawn's 2008 volume concerning the International Examinations Inquiry (IEI) and his 2014 follow-up article are pivotal. The IEI originated from a 1930s initiative of Columbia University and the Carnegie Corporation. Its purpose was to improve ways of identifying students suitable for secondary school (Hegarty, 2014).
In addition to examinations, the inquiry also addressed intelligence, one of the most important psychological issues at the time. Lawn (2014, p. 24) demonstrates that the IEI published what could be considered the first data-driven research inquiry into comparative education, covering nine countries, and argues that the IEI formed 'a space in which pupil tests and statistical foundations prefigured the post-war expansion of comparative data in education and its use in governing education'. Lawn (2014, p. 21) also notes that data collection on education began accelerating from the 1930s onwards: 'The growth of cross-border expert engagement in the mid-twentieth century created the basis for the later internationalization of education data and comparison'.
Lawn's work reflects the spatial turn in the history of education (Fuchs, 2014; Popkewitz, 2013), emphasising the importance of a transnational flow of expertise in the workings of global education. Its key components are the move beyond methodological nationalism and an understanding of the dynamics between place and space (Christensen & Ydesen, 2015; Lawn, 2014). As Nordin and Sundberg (2014, p. 15) state, education is 'transnational and national at the same time', meaning that place is understood as the setting or location, while space is where interaction, confluence, and exchanges happen.
While this research certainly transcends national reference frames and offers interesting findings relating to the workings and impacts of these organisations and their roles in education, it is largely limited to specific decades. This observation also holds for Lawn's studies. However, we infer from these studies, and from Barnett and Finnemore (2004), that international organisations have great autonomy and significant power in shaping education globally.
Given these historiographies, this article offers a long-term perspective on 20th-century education history to enhance our understanding of the rise of the global testing culture. Although the article paints with a broad brush, the analysis contributes knowledge about recurring themes, perspectives, continuities, and ruptures in the history of the global testing culture in education.

Antecedents of the Global Testing Culture: Before WWII
This section focuses on the confluence of comparative education, the rise of applied psychology, and the organisational structures that began forming before WWII.

Comparative Education
The rise of comparative education as an academic field has a long history and constitutes a necessary condition for the contemporary global testing culture, even though comparativists might consider it an inadvertent development.
There are three main reasons for this connection. First, from its outset, comparative education instituted a comparative mindset, that is, a logic based on the measurement, qualitative or quantitative, of one education system against another, with the aim of learning from comparisons to improve a given system (Cardoso & Steiner-Khamsi, 2017). From a historical perspective, the roots of such comparative studies can be traced far back; an early example is Friedrich August Hecht's 1795 book De re scholastica anglia cum germanica comparata, which compares schools in England with those in the German states (Petterson et al., 2015). Hecht anticipates the question later famously posed by Sir Michael Ernest Sadler (1861-1943): 'What can we learn from the study of foreign systems?' (Bereday, 1964). Comparative education has also manifested itself in other practices, such as exhibitions and fairs, which became recurring events in the second half of the 19th century (Lundahl, 2016; Lundahl & Lawn, 2015). These exhibitions promoted the application of a comparative logic among national education systems and, as Sobe and Boven (2014) argue, 'international expositions allowed for educational systems and practices to be "audited" by lay and expert audiences'. Notably, in the 19th century, the words examination and exhibition were often used synonymously (Reese, 2013, p. 2).
Second, comparative education was historically permeated by a distinct colonial discourse rooted in civilisation theory. It has therefore been a Eurocentric approach to education, with a global outlook aimed at elevating the Third World. This strand in comparative education remains generally evident in the Programme for International Student Assessment (PISA) and its offshoot PISA for Development, both designed according to standards defined by the Global North (Cardoso & Steiner-Khamsi, 2017). Teachers College, Columbia University was a central hub for the expansion of American colonialism in education (Takayama, Sriprakash, & Connell, 2017). The point is that comparative education has often operated with hierarchisations of education systems and with varied notions about the best working practices in education.
Third, comparative education has been concerned with developing and refining an arsenal of methodologies and vocabularies for scientific and valid comparisons among education systems (Beech, 2006; Schriewer, 2012; Steiner-Khamsi, 2002). Note the concepts of juxtaposition, tertium comparationis, decontextualisation, borrowing, silent borrowing, and transferring, as well as the entire array of quantitative and statistical tools assembled to measure and sanctify the results (Bereday, 1967). As Cardoso and Steiner-Khamsi (2017, p. 401) state, 'the use of indicators makes educational systems comparable regardless of how different they are'.

Applied Psychology
Science and cooperation among its practitioners in different national contexts represent another backdrop for the birth of comparative practice within education. This pertains especially to psychology as both a science and a scientific field, which, since its earliest days, has been characterised by transnational cooperation, mutual inspiration, and the exchange of research results and theories (e.g. Hearnshaw, 1979).
Interestingly, educational testing appeared on the educational scene at roughly the same time in most Western countries. Intelligence testing, for example, originated in Paris and travelled to California, Hamburg, New York, London, Edinburgh, and the rest of the world (Ydesen, 2011). Scientific standardisation was essential to this movement, since it enabled people to work across borders (Grek et al., 2009).
Due to the endeavours of psychologists to have psychology recognised and established as a 'real science' and academic field, some practitioners in this area adapted themselves to, and were strongly influenced by, the positivist paradigm dominating the late 1800s and early 1900s. Several education scholars committed themselves to research that followed positivist ideas, for instance by conducting controlled experiments or administering different tests to compare the results, and that was characterised by attempts to identify what could be considered general human traits, such as intelligence (Danziger, 1998).
For such purposes, standardised testing was developed and soon became common as both a tool and a technology.
These ideas and trends went on to influence the realm of education. They entered this field through applied psychology and psychologists' questions related to education, for instance in experimental pedagogy, which had been founded circa 1900 (e.g. Claparède, 1911), and in research on intelligence during the same period. These new theories and ideas were soon disseminated via publications and activities in associations and organisations, inspiring the practices of pedagogues, educational psychologists, and other professionals and academics throughout most of the Western world.
The rise of applied psychology in the interwar years was closely affiliated with the progressive educational movement. Many leading testing protagonists were members of, and worked actively in, such progressive education organisations as the New Education Fellowship (NEF; Ydesen, 2011). The progressive education movement at large was the standard-bearer of a humanistic line of thought aimed at emancipating the child from its surrounding society, allowing it to develop freely. Conversely, testing protagonists were driven by an experimental scientific line aimed at disclosing the nature of the child and adapting the educational system to these findings to maximise society's perceived benefits. The common denominator between the wider audience of progressive educators and the testing protagonists was a critical attitude towards teachers' traditional examinations, which they considered a subjective evaluation tool, and an optimistic view of testing as a just and efficient differentiation tool compatible with meritocratic ideals (ibid.). Nonetheless, the testing protagonists tended to view pedagogy as merely applied psychology. Today, in the global testing culture, we are witnessing a similar reductionist mechanism: education is transformed into learning and learning goals, learning is transformed into measurable performance according to such goals, and measurable performance is transformed into testing.

Organisational Structures
In terms of organisational structures, the NEF formed a space in which new progressive ideas could flourish, including notions about the benefits of mental tests. In August 1929, the NEF held its largest conference in Denmark, with around 2,000 participants from 43 nations (Fuchs, 2004). The conference was of great importance in the international educational field, and its report states, 'It is no exaggeration to say that this book contains the truest account available anywhere of the various currents of progressive educational thought in the world at this critical time' (Sadler, 1930, p. xi).
A remarkable feature of the NEF conference was the inclusion, for the first time, of a conference group titled 'Mental Tests' (Ydesen, 2011, p. 83).
The International Bureau of Education (IBE) also constitutes an interesting organisation. Drawing on the work of Rasmussen (2001), Hofstetter and Schneuwly (2013) argue that the IBE represents the transnational turn in the early 20th century. The IBE assigned itself the task of creating a platform to rally the numerous organisations at work worldwide that promoted intellectual cooperation, international solidarity, and educational renewal. Comparative education was upheld as the model discipline, and its purpose was to 'bring together diversity and not to reduce it to unity' (ibid., p. 225).
These transnational organisations significantly promoted and inspired work with educational experimentation and cross-border initiatives. Numerous experiments were conducted across educational systems during the interwar and postwar periods, promoting a comparative mindset, even among those working in classroom settings.

Internationalisation of Education after WWII
The internationalisation of education prior to WWII was supported by different kinds
The World Bank has also played a role in shaping a global education space (Heyneman, 2003; Jones, 1992). For our purposes, however, we find that its economic approach to education is broadly covered by our discussion of other organisations.
In the 1950s, the systematic collection of educational statistics was seen as an activity that UNESCO could manage and that broadly supported the collection of information about education systems, schools, and outcomes, including student performance. Additionally, the use of standardised testing played a central role in supporting data collection of a presumed comparative nature (Smyth, 2005). UNESCO (1949, p. 14) had a strong interest in the development of compulsory education systems, and one of the first tasks assigned to its clearinghouse in 1950, in cooperation with the IBE, was to launch a study concerning 'problems involved in making free compulsory primary education more nearly universal and of longer duration throughout the world'.
In 1952, the UNESCO Institute for Education, originally focusing on comparative education, was founded (Elfert, 2015; Landsheere, 1997). Several conferences were held under its auspices during the 1950s. The institute hosted meetings for educational researchers where participants discussed such matters as measurement in education in general, evaluation, and problems related to examinations in educational systems. The meetings were attended by prominent researchers then dominating the field, such as Swedish psychologist Torsten Husén and American educational psychologist Benjamin Bloom. The attendees shared an interest in cross-national, and thus comparative, research within education and attempted to use comparative research to address various educational problems. For instance, individual countries were considered too small and too homogeneous to provide the variation needed to explain differences in school performance (Landahl, 2017).
These meetings nurtured ideas on how to conduct large comparative international surveys, the first attempt being a pilot study initiated in the late 1950s and called the 'Twelve Country Study' (Keeves, 2011; Landsheere, 1997). The project was successful, and the formation of the International Association for the Evaluation of Educational Achievement (IEA) was initiated soon after, with, among others, Husén and Danish psychometrician Georg Rasch as important contributors (Keeves, 2011).

Seeking an Evidence-Based and Efficient Pedagogy
The interest in improving pedagogy and supporting efficiency in education soon created a new and dominant practice within some areas of educational research.
Researchers from different scientific areas -such as educational psychology, comparative education, intelligence testing, and the statistics of education -found a common interest in attempts to improve basic teaching and students' performance.
The new technologies to assess and conduct surveys facilitated the collection, analysis, and comparison of large datasets across national education systems. These early international large-scale assessments were also important tools, paving the way for new attempts to improve educational systems and to identify so-called 'best practices' and efficient pedagogy as understood and identified on the basis of test results.
as well as pupils in some of the other Nordic countries, can be seen as forming the early background for the implementation of such a testing practice, by changing the predominant understanding of Danish pupils as being skilled readers (Andreasen, Kelly, Kousholt, McNess, & Ydesen, 2015; Gustafsson, 2012).
Another addition to these comparative endeavours was the school effectiveness movement, which appeared in the late 1970s, focused on 'effective schools', and worked to identify best practices in pedagogy and school leadership. The movement can be viewed as paralleling the IEA, since it was based on similar ideas (Goldstein & Woodhouse, 2000; Townsend, 2007). The movement manifested itself as a formal organisation in 1988 with the International Congress for School Effectiveness and Improvement, which published a journal and convened an annual congress. Its focus has been on identifying 'effective teaching and leadership' using a variety of international surveys. The movement has gained a strong footing in some countries via such reports as 'Exceptional Effectiveness: Taking a Comparative Perspective on Educational Performance' (Harris & Hargreaves, 2015). The IEA and the school effectiveness movement can be categorised as promoters of an influential 'what works', best-practice, evidence-based policy paradigm that is popular in contemporary education policy (Connell, 2013). Thus, a picture emerges of certain international organisations serving as arbiters of a positivist statistical agenda in education policy.
UNESCO's reasons for launching new initiatives were largely informed by its aims to expand and strengthen compulsory education for purposes of promoting development, extending modern citizens' skills, and fostering international understanding (Boel, 2016). Yet another player would enter the educational arena in the 1960s in support of the 'what works' paradigm noted above: the Organisation for Economic Co-operation and Development (OECD), a highly influential organisation that also heavily promoted international comparisons across national school systems.
For decades, the OECD has promoted a vision of education as a means of providing human capital to improve the economies of nation-states (Papadopoulos, 2011; Tröhler, 2010). Although the OECD is essentially an economic organisation, education appeared on the agenda of its predecessor, the Organisation for European Economic Co-operation (OEEC), in 1958 due to the Soviet Sputnik satellite launch the previous year (Kogan, 1979; Tröhler, 2010). Education gradually came to play a defining role in understanding the economic capabilities and potential of nation-states (Petterson, 2014; Ydesen, 2013). Since then, the OECD has developed into one of the most powerful agencies in terms of shaping a global education space, because of its country reviews, test programmes, and reports (Bürgi, 2012; Grek, 2009; Martens, 2007; Moutsios, 2009).
to the OECD, for example, on teacher-student ratios, factors affecting student choice in education programmes, and progress reports on educational investment planning, the counsellor was tasked with advising central and local authorities about educational investment planning.² The EIP held that education must employ more effective planning processes, using the latest quantitative methods, to optimise its results regarding economic growth and thus win the technology race against the Eastern bloc. In 1968, to strengthen its focus and initiatives concerning educational improvements, the OECD founded the Centre for Educational Research and Innovation (CERI).

Conclusion
The global testing culture dominating current educational policies and practices worldwide has a lengthy and fascinating pedigree, as we have described. The historical developments presented represent necessary but not sufficient conditions for the rise of the global testing culture; that is, they should be considered stepping stones towards the contemporary workings of global education. The processes leading to the global testing culture's formation include developments and practices from numerous scientific and political areas. Some seem to have merged over time despite their different origins and their differing, at times even conflicting, purposes.
The years before WWII witnessed the first steps in the formation of a new comparative practice in educational research. Inspired by ideas from experimental pedagogy and developments within psychology, including the rise of mental testing, and driven by efforts to improve educational systems as well as a common and more general interest in educational research, such initiatives gained a new platform and were
² Danish National Archives, Ministry of Education, International Office, 1959-1970, Cases Concerning International Organisations, OE 2 1963-4 1963, General Memorandum, 9 November 1964.
The process has been dominated by organisations such as UNESCO, the IEA, and the OECD, even though they have supported such activities for differing reasons and purposes, with UNESCO and the IEA focusing on improving pedagogy and identifying best practices, in contrast to the OECD, which pursues a clearly defined economic policy agenda.
Before the 1990s, international comparative assessments in education were primarily initiated and administered by such non-governmental organisations as the IEA; since the 1990s, however, the OECD has also adapted and launched such assessments. The OECD's well-established authority conveys high status in member as well as non-member countries, which strengthens the impact of both the processes and the results.
The comparative turn in global education policy advocated and promoted by the OECD must be understood in light of cross-national comparison being considered the best engine for promoting educational quality (Martens, 2007). Note, however, that this turn entails a shift from research to policy (Wagemaker, 2013), as well as a shift in focus from pedagogic practice to academic performance. In other words, the OECD has pursued a path of identifying best practices designed to improve education systems around the world by using comparisons and developing various monitoring tools. This activity has often been accomplished in close cooperation with the European Commission, which is engaged in the mutual identification of educational problems (Grek, 2010).
The global testing culture has been strongly criticised for its influence on school systems and pedagogy. Its core features are a stronger emphasis on national and international comparisons, student performance, and the control of education, for instance through learning goals with corresponding assessments and standardised testing at the national level. These methods have been criticised for sacrificing a focus on pedagogy and Bildung, whose success is more difficult to assess (Biesta, 2015). In addition, the global testing culture tends to strongly influence what is considered normal and leaves less room for deviations therefrom. Consequently, cultural and/or language minorities are at risk of discrimination in these processes (Andreasen & Kousholt, 2018).
Recently, critical voices have spoken out against not only these processes but also the organisations and programmes orchestrating them (the OECD, PISA, the IEA) and their political influence in member and even non-member states.
One point of criticism addresses the data and information generated and distributed: the conditions underlying the statistics are difficult to determine. Even though skilled educational statisticians have strongly criticised conclusions drawn from the data, they seem to have little influence (e.g. Kreiner & Christensen, 2014). Another point of contention highlights the conflict between democratic ideals and governance guided by comparative statistics. Organisations such as the OECD are political by nature, but their influence on education in both member and non-member states has become increasingly direct (Lewis, 2017). Such direct influence compromises and threatens democracy and democratic processes, but it also helps explain the recent uniform developments in educational systems. For instance, representatives to PISA's governing board are appointed by each member country (OECD, 2017), such that the individuals serving in this capacity are not democratically accountable. Such problems could not have been foreseen at the outset of these processes, but given their gravity, they deserve careful attention in the future.