Objective evaluations are essential to improving physical education (PE) policy and practice, and the System for Observing Fitness Instruction Time (SOFIT) is a valid and reliable tool designed for this purpose. This review assesses peer-reviewed studies that used SOFIT to describe preK-12 PE in schools outside the US. Methods were informed by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), and articles were located by searching nine library databases and Google Scholar. A total of 739 records were located, 567 were screened, and 29 full-text articles were scrutinized. Data extraction was conducted to evaluate the characteristics of the 29 studies and to synthesize commonly reported SOFIT variables. The studies, conducted on 5 continents, included direct observations of 2703 lessons in 348 schools taught by more than 600 teachers in 10 different countries. There was substantial variability in study characteristics, in how results were reported, and in study outcomes. All studies assessed physical activity (PA), and 90% (n = 26) assessed both PA and lesson context. More than two-thirds of the studies (69%; n = 20) assessed PA, lesson context, and teacher behavior. A common goal of the reviewed studies was to describe PE using SOFIT; however, researcher modifications to the established protocol and variability in how results were reported limited data syntheses and generalizations. Because SOFIT is widely endorsed for assessing PE policies and practices, researchers could improve the generalizability of their findings by adhering to the standard SOFIT protocol and reporting results consistently.
The World Health Organization (WHO) recommends that children and adolescents engage in at least 60 minutes of moderate-to-vigorous physical activity (MVPA) daily, including muscle- and bone-strengthening activities at least three times per week (WHO, 2011). Unfortunately, more than 80% of adolescents do not meet these guidelines (WHO, 2011), and increasing physical activity (PA) among school-age children is a global priority (WHO, 2018a).
The consequences of physical inactivity are severe, as sedentary living is associated with numerous health conditions. Physical inactivity is associated with increased risk for overweight and obesity, and the consequences become apparent at a young age. The WHO, for example, has indicated that the prevalence of obesity worldwide has tripled since the onset of the obesity crisis in the 1970s and that millions of children worldwide are already overweight or obese by age five (WHO, 2018b).
There is global consensus that physical education (PE) is an essential program in preK to grade 12 (preK-12) schools, largely because of its potential to increase PA and its important role in obesity prevention (UNESCO, 2015). Schools reach nearly all children, and most countries have established recommendations for PE that recognize the importance of engaging students in health-enhancing MVPA during PE to develop their physical fitness and motor skills and to promote lifelong PA (Hardman, 2014).
Although key stakeholders recognize that quality PE programs are a worthwhile public health investment, numerous barriers affect both the quantity and quality of PE, including limited schedules, inadequately trained teachers, lack of curricular resources, and insufficient equipment and facilities (McKenzie & Lounsbery, 2009). Assessing how PE is conducted is an important step in overcoming these barriers.
Global efforts to evaluate children’s PA and the quality of PE and other school-based PA opportunities are currently underway (Hardman, 2014; Tremblay et al., 2016). The Active Healthy Kids Global Alliance, for example, recently published Report Cards on the PA of children and youth in 38 countries on 6 continents (Tremblay et al., 2016). In addition, in 2013 the United Nations Educational, Scientific and Cultural Organization (UNESCO) published the results of a worldwide survey of PE administered in 232 countries (Hardman, 2014). These efforts demonstrate a commitment to monitoring and improving the quality of PE worldwide; experts acknowledge, however, that current data are limited, partly because objective assessment tools have not been widely adopted (Hardman, 2014; Tremblay et al., 2016).
The System for Observing Fitness Instruction Time (SOFIT) is a valid and reliable instrument for objectively assessing PE programs (McKenzie, 2012; McKenzie, Sallis, & Nader, 1991a; McKenzie & Smith, 2017). SOFIT provides objective, contextually rich data on the conduct of PE lessons and has been widely used. Observers are trained to use SOFIT via a standardized protocol that includes video segments for both instruction and assessment. Momentary time sampling (i.e., 10 seconds observe, 10 seconds record) is used to simultaneously code student PA levels (i.e., lying down, sitting, standing, walking/moderate, vigorous), lesson context (i.e., how lesson time is spent: management, knowledge, fitness, skill development, game play, free time), and either teacher behavior (i.e., time spent promoting fitness, demonstrating fitness, instructing generally, managing, observing, or doing other tasks) or teacher interactions (i.e., instances of promoting “in-class” or “out-of-class” PA). Observers also record lesson start and end times, lesson location, target student gender, teacher gender, grade level, and the number of boys and girls engaged in the lesson.
SOFIT student activity codes have been validated using a variety of methods, including heart rate monitoring, accelerometry, and pedometry (McKenzie et al., 1991a; Ridgers, Stratton, & McKenzie, 2010; McNamee & van der Mars, 2005). The validity of the contextual and behavioral categories is also well established, with studies consistently reporting significant relationships among student PA levels, how lesson time is allocated, and how teachers spend their time and interact with students (McKenzie et al., 1991a; McKenzie, Sallis, & Nader, 1991b; McKenzie et al., 1995; McKenzie, Marshall, Sallis, & Conway, 2000; Smith, Monnat, & Lounsbery, 2015). A recent review of SOFIT studies conducted in the US found consistently high interobserver agreement (i.e., reliabilities > 85%; McKenzie & Smith, 2017).
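To make the momentary time sampling logic concrete, the sketch below shows how a sequence of interval-by-interval activity codes for one target student could be summarized into the percentages SOFIT studies report. It assumes each observed interval has already been coded with one of the five activity levels named above; the string labels and the function name `summarize_activity` are hypothetical, and MVPA% is taken as the sum of the walking/moderate and vigorous percentages.

```python
from collections import Counter

# Hypothetical labels for the five SOFIT activity levels described above.
ACTIVITY_LEVELS = ("lying", "sitting", "standing", "walking", "vigorous")


def summarize_activity(codes: list[str]) -> dict[str, float]:
    """Return the percentage of observed intervals spent at each activity level.

    `codes` holds one activity code per observation interval
    (10 s observe / 10 s record). MVPA% is the sum of the
    walking/moderate and vigorous percentages.
    """
    if not codes:
        raise ValueError("at least one coded interval is required")
    unknown = set(codes) - set(ACTIVITY_LEVELS)
    if unknown:
        raise ValueError(f"unrecognized activity codes: {sorted(unknown)}")

    counts = Counter(codes)
    total = len(codes)
    summary = {level: 100.0 * counts[level] / total for level in ACTIVITY_LEVELS}
    summary["MVPA"] = summary["walking"] + summary["vigorous"]
    return summary


if __name__ == "__main__":
    # Five illustrative intervals (not data from any reviewed study).
    lesson = ["standing", "walking", "walking", "vigorous", "sitting"]
    print(summarize_activity(lesson)["MVPA"])  # 60.0
```

Working from interval counts rather than lesson averages keeps the summary tied to what observers actually recorded.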
The current investigation reviews studies that used SOFIT to assess PE in preK-12 schools located outside the US. Specifically, our objectives are to describe the characteristics of international SOFIT studies and to quantitatively synthesize results for the main SOFIT variables (i.e., student PA levels, lesson context, teacher behavior) and two other commonly reported variables: class size and lesson length.
SOFIT has been widely used to assess PE internationally, and this investigation complements a review of SOFIT studies published in the US between 1991 and 2016 (McKenzie & Smith, 2017). This review increases awareness of findings from studies that have used SOFIT to describe PA, lesson contexts, and teacher promotion of PA in international settings. The findings have important implications for public health stakeholders, teacher preparation programs, and researchers. Foremost, they increase awareness of the potential of PE to increase PA internationally. This is important because objective evidence is needed about opportunities for children and adolescents to accrue health-related PA. The SOFIT data specifically shed light on how teachers allocate lesson time and interact with students during PE, factors with important implications for designing professional development for current and future teachers. This review also identifies the strengths and limitations of existing international SOFIT studies, which should help improve data collection methods and the reporting of results in future studies. Finally, because SOFIT has been recommended for surveillance (McKenzie & Smith, 2017; IOM, 2013), our data summaries for student activity, lesson context, teacher behavior, class size, and lesson length contribute to efforts to monitor PE globally (WHO, 2018a; UNESCO, 2015; Hardman, 2014).
Based on the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; see
To be included in the review, studies had to: 1) use the standard SOFIT protocol; 2) describe PE lessons taught in typical preK-12 schools located outside the US; and 3) be published in English in a peer-reviewed journal between 1991 and 2017.
We searched nine databases for full-text, peer-reviewed research articles using the terms (“physical education” OR “PE”) AND (“System for Observing Fitness Instruction Time” OR “SOFIT”) AND “lesson context.” The databases were: 1) Academic Search Ultimate; 2) CINAHL Plus with Full Text (EBSCO); 3) Education Research Complete (EBSCO); 4) PsycINFO; 5) SPORTDiscus with full text (EBSCO); 6) Physical Education Index (ProQuest); 7) PubMed; 8) Science Direct (Elsevier); and 9) Web of Science. We also searched the reference lists of selected papers and used Google Scholar to locate additional relevant papers.
All authors played a role in the review process. The first author was responsible for initial data extraction with help from the third author and two student assistants. The first and third authors reviewed full texts independently, and in the rare case of a disagreement, the second author arbitrated the final decision. The study characteristics extracted from the 29 papers that met the initial inclusion criteria were: 1) author; 2) publication year [1991-2017]; 3) country; 4) study design [intervention/descriptive]; 5) study aims; 6) sample size [i.e., schools, lessons, teachers, classes]; 7) reliability [i.e., certification of observers prior to data collection and the maintenance of reliability throughout the study]; 8) main SOFIT categories [i.e., student PA levels, lesson context, teacher behavior, and teacher interaction]; and 9) analyses of other selected variables [e.g., student gender, teacher preparation, lesson location, PE dosage, energy expenditure, interaction between lesson context and MVPA, class size, and lesson length] (
| First Author (Yr.), State/Region, Study Design (D, I)¹ | Description, Design, Sample | Observer Reliabilities Identified² | Main SOFIT Categories³ | Analysis⁴ | Other⁵ |
|---|---|---|---|---|---|
| | Schools (S); Lessons (L); Teachers (T); Classes (C) | CR; FR | PA; LC; TB1; TB2 | SG; TP; LL | PED; T; EE; I; CS |
| Preschools (n = 2) | | | | | |
| Chow et al. (2015), Hong Kong (D) | Children’s PA and associated variables. Cross-sectional. S = 4; L = 90; T = 25; C = 23 | CR⁶; FR | PA; LC; TB1 | SG; LL | PED; T; EE; I; CS |
| Van Cauwenberghe (2012), Belgium (D) | PA levels and association with LC and TB. Cross-sectional. S = 35; L = 35; T = 35; C = 35 | CR⁶; FR⁷ | PA; LC; TB1 | SG; TP | PED⁹; T; I; CS |
| Elementary Schools (n = 16) | | | | | |
| Barnett et al. (2002), Australia (D) | How active are rural children in Australian PE? Non-experimental. S = 18; L = 231 | CR⁶; FR | PA; LC | SG | T; I |
| Cardon et al. (2004), Belgium (D) | PA levels in elementary school PE: swimming vs non-swimming classes. S = 16; L = 78; C = 39 | CR⁶; FR⁷ | PA | TP⁸; LL | PED⁹; T |
| Chow et al. (2008), Hong Kong (D) | Children’s PA and environmental influences during PE. S = 42; L = 368; T = 105; C = 126 | CR⁶; FR | PA; LC; TB1 | TP⁸; LL | PED⁹; T; I; CS |
| da Cunha et al. (2016), Brazil (I) | Effect of educational program on children’s energy expenditure during PE. Randomized controlled trial. S = 8; L = 79; T = 53; C = 48 | FR⁷ | PA; LC; TB1 | TP | T; EE |
| Da Costa et al. (2016), Brazil (D) | PA and LC in PE. Cross-sectional sub-analyses. S = 5; L = 12; T = 12; C = 12 | | PA; LC | LL | PED⁹; T; I; CS |
| Gharib et al. (2015), Mexico (D) | Influence of LC and TB on PA. Cross-sectional. S = 20; L = 58; T = 58; C = 58 | | PA; LC; TB1 | SG | PED; T; I |
| Hall-Lopez et al. (2017), Mexico (D) | MVPA during outdoor PE and recess. Cross-sectional comparative analysis. S = 23; L = 63; T = 63; C = 63 | CR⁶; FR | PA; LC | | T |
| Jennings-Aburto et al. (2009), Mexico (D) | PA during the school day in public primary schools in Mexico City. Cross-sectional. S = 12; L = 26; T = 12; C = 12 | CR⁶; FR⁶ | PA; LC | SG | PED; T; I |
| Miller et al. (2016), Australia (I) | Efficacy of a game-centered approach on MVPA in PE. Randomized controlled trial. S = 1; L = 30; T = 4; C = 4 | CR⁶ | PA | SG; TP | |
| Powell et al. (2016), United Kingdom (I) | Effectiveness of the SHARP Principles Model on MVPA. Quasi-experimental, non-equivalent groups. S = 2; L = 28; T = 15; C = 4 | CR⁶; FR⁶ | PA; LC; TB2 | TP | CS |
| Safdie et al. (2013), Mexico (I) | Impact of school-based intervention on obesity risk factors in Mexican children. Randomized controlled trial. S = 27; L = 60; T = 38; C = 38 | | PA | | PED⁹ |
| Sheehan (2015), Canada (D) | Assessment of MVPA during PE taught by PE specialist. Cross-sectional. S = 1; L = 54; T = 2; C = 14 | | PA; LC; TB1 | SG; TP⁸ | I |
| Telford et al. (2016), Australia (I) | Four-year specialist-taught PE and PA: LOOK. Cluster randomized controlled trial. S = 29; L = 193; C = 68 | CR⁶ | PA; LC; TB1 | SG; TP | PED; T |
| Usher et al. (2016), Australia (D) | PA levels during PE. Cross-sectional. S = 10; L = 30; T = 10; C = 10 | CR⁶ | PA; LC; TB2 | TP⁸ | PED |
| van Beurden et al. (2003), Australia (I) | Evaluation of “Move it Groove it”. Quasi-experimental. S = 18; L = 465 | CR⁶; FR | PA; LC | SG; TP | T; I |
| Verstraete et al. (2007), Belgium (I) | Effectiveness of a two-year health-related PE intervention. Quasi-experimental pretest-posttest design. S = 16; L = 78; T = 16; C = 39 | CR; FR⁷ | PA; LC | SG; TP⁸ | PED⁹; I |
| Secondary Schools (n = 10) | | | | | |
| Chow et al. (2009), Hong Kong (D) | PA and environmental influences during secondary PE. Cross-sectional. S = 30; L = 238; T = 65; C = 123 | CR⁶; FR | PA; LC; TB1 | SG; TP⁸; LL | PED; T; I; CS |
| Curtner-Smith et al. (1995), SW England (D) | PA during diverse activities in a health-related fitness PE program. S = 5; L = 40; T = 20; C = 40 | CR; FR⁷ | PA; LC; TB1 | TP⁸ | T; CS |
| Curtner-Smith et al. (1996), SW England (D) | PE during summer term in one English town. Cross-sectional. S = 5; L = 40; T = 20; C = 40 | CR; FR⁷ | PA; LC; TB1 | TP⁸ | T; CS |
| Dudley (2012), Australia (D) | Changes in PA, LC, and TB during PE. Longitudinal cross-sectional descriptive. S = 6; L = 132; C = 48 | CR⁶; FR | PA; LC; TB2 | | T; CS |
| Dudley (2012), Australia (D) | PA levels and movement skill instruction in PE. Baseline cross-sectional. S = 6; L = 81; C = 27 | CR⁶; FR | PA; LC; TB2 | SG | T; I; CS |
| Fairclough & Stratton (2006), England (I) | Effects of a PE intervention to improve student PA. Quasi-experimental. S = 1; L = 12; T = 2; C = 2 | CR⁶; FR⁷ | PA; LC; TB1 | SG; TP⁸ | T; EE; CS |
| Marques et al. (2017), Portugal (D) | Description of PE in one secondary school. Cross-sectional. S = 1; L = 30; T = 10 | FR⁷ | PA; LC; TB1 | | PED⁹; T |
| Mersh & Fairclough (2010), England (D) | PA, LC, and TB in one middle school. Case study. S = 1; L = 30; T = 2; C = 2 | CR | PA; LC; TB1 | SG; TP⁸ | PED; T; I |
| Smith et al. (2015), East of England (I) | PA of boys and girls during invasion games: direct instruction vs. tactical games. Quasi-experimental pre-test/post-test design. S = 2; L = 48; T = 4; C = 4 | CR⁶; FR | PA; LC; TB1 | SG; TP | T |
| Sutherland et al. (2016), Australia (D) | PA, LC, and TB in PE. Cross-sectional descriptive. S = 10; L = 100 | CR⁶; FR | PA; LC; TB2 | SG; TP; LL | T; CS |
| Combined Schools (n = 1) | | | | | |
| Santa Maria et al. (2010), Argentina (D) | Energy expenditure in PE in private elementary schools. L = 55; C = 55 | CR⁶ | PA; LC; TB1 | SG | T; EE; CS |

Notes: ¹Descriptive (D); Intervention (I). ²Certification Reliability (CR); Field Reliability (FR). ³Physical Activity (PA); Lesson Context (LC); Teacher Behavior (TB1); Teacher PA Promotion (TB2). ⁴Student Gender (SG); Teacher Preparation (TP); Lesson Location (LL). ⁵PE Dosage (PED); Lesson Length (T); Estimated Energy Expenditure (EE); Interaction between lesson context and physical activity (I); Class Size (CS). ⁶Data not shown. ⁷Video or audio recorded. ⁸All lessons taught by PE specialists. ⁹Not measured objectively.
We limited quantitative data syntheses to the main SOFIT variables (i.e., student PA levels, lesson context, teacher behavior) and two other commonly reported variables (class size and lesson length). Mean scores, standard deviations, and sample sizes were extracted using an Excel tool. The range of mean scores was determined by sorting data from low to high values for each variable. Lower and upper bounds of the 95% confidence interval were estimated for MVPA%, a measure of PA intensity during lessons, using the formula

$$\mu = M \pm t\,s_M$$

where $M$ is the sample mean, $t$ is the critical value of the t distribution for the chosen confidence level, and $s_M$ is the standard error of the mean (Stangroom, 2018). Excel for Mac version 15.30 was used to compute the median, first and third quartiles, and interquartile range.
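The calculations described above can be sketched in a few lines. This is a minimal illustration, assuming the confidence interval is built from a study's reported mean, SD, and number of lessons, with $s_M = SD/\sqrt{n}$ and the critical t value supplied by the caller (about 1.984 for 95% confidence with df = 99). Quartiles use Python's inclusive (linear-interpolation) method, which we believe corresponds to Excel's QUARTILE.INC; the review does not state which Excel quartile function was used. Function names and example values are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(mean: float, sd: float, n: int,
                             t_crit: float) -> tuple[float, float]:
    """Return (lower, upper) for mu = M +/- t * s_M, where s_M = sd / sqrt(n).

    `t_crit` is the two-tailed critical t value for the desired confidence
    level with n - 1 degrees of freedom (about 1.984 for 95% and df = 99).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    standard_error = sd / math.sqrt(n)
    half_width = t_crit * standard_error
    return mean - half_width, mean + half_width


def quartile_summary(values: list[float]) -> dict[str, float]:
    """Median, Q1, Q3, and IQR using inclusive (linear-interpolation) quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"median": median, "q1": q1, "q3": q3, "iqr": q3 - q1}


if __name__ == "__main__":
    # Hypothetical study: mean MVPA% = 45.0, SD = 10.0, 100 lessons.
    low, high = mean_confidence_interval(45.0, 10.0, 100, t_crit=1.984)
    print(round(low, 3), round(high, 3))  # 43.016 46.984
    # Hypothetical study-level means, not values from the review.
    print(quartile_summary([10.0, 20.0, 30.0, 40.0, 50.0]))
```

Passing the critical t value in explicitly keeps the sketch dependency-free; a statistics library could compute it from the degrees of freedom instead.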
Variable | Studies¹ | Lessons² | Range of Means | Median | Q1 | Q3 | IQR |
---|---|---|---|---|---|---|---|
PHYSICAL ACTIVITY (% of intervals) | 12 | 1,170 | | | | | |
Lying Down | 12 | 1,170 | 0.0 - 3.0 | 0.3 | 0.37 | 0.74 | 0.37 |
Sitting | 12 | 1,170 | 10.9 - 31.7 | 18.2 | 12.7 | 27.2 | 14.5 |
Standing | 12 | 1,170 | 16.3 - 65.8 | 37.4 | 29.4 | 42.2 | 12.8 |
Walking/moderate | 12 | 1,170 | 4.4 - 39.1 | 28.5 | 18.6 | 33.4 | 14.8 |
Vigorous | 12 | 1,170 | 9.0 - 23.8 | 18.8 | 13.4 | 21.5 | 8.1 |
MVPA | 12 | 1,170 | 20.9 - 58.2 | 41.9 | 37.8 | 50.5 | 12.7 |
LESSON CONTEXT (% of intervals) | 9 | 1,050 | | | | | |
Management | 9 | 1,050 | 14.0 - 30.8 | 19.5 | 17.4 | 22.4 | 5.0 |
Knowledge | 9 | 1,050 | 7.1 - 26.3 | 15.2 | 12.4 | 17.7 | 5.3 |
Fitness Activity | 9 | 1,050 | 7.1 - 32.5 | 14.5 | 11.3 | 19.8 | 8.5 |
Skill Practice | 9 | 1,050 | 5.2 - 43.8 | 16.6 | 12.2 | 34.3 | 22.1 |
Game Play | 9 | 1,050 | 5.1 - 31.2 | 12.2 | 11.6 | 31.1 | 19.5 |
Other | 9 | 1,050 | 0.0 - 10.0 | 2.1 | 1.1 | 3.1 | 2.0 |
TEACHER BEHAVIOR (% of intervals) | 7 | 841 | | | | | |
Promotes fitness | 7 | 841 | 0.0 - 21.2 | 9.3 | 0.22 | 12.0 | 11.8 |
Demonstrates | 6 | 751 | 0.0 - 13.0 | 5.8 | 1.1 | 9.8 | 8.7 |
General Instruction | 7 | 841 | 6.7 - 69.2 | 54.7 | 50.5 | 64.0 | 13.5 |
Class Management | 7 | 841 | 18.1 - 46.5 | 23.2 | 20.9 | 24.0 | 3.1 |
Observe | 7 | 841 | 3.1 - 25.4 | 10.6 | 4.9 | 10.9 | 6.0 |
Other Tasks | 6 | 801 | 0.1 - 0.7 | 0.3 | 0.2 | 0.55 | 0.4 |
TEACHER INTERACTIONS (% of intervals) | 3 | 232 | | | | | |
IN Class | 3 | 232 | 10.1 - 30.8 | 28.6 | 19.4 | 29.7 | 10.4 |
OUT of Class³ | 1 | 100 | - | - | - | - | - |
None | 3 | 232 | 68.9 - 89.7 | 71.2 | 70.1 | 80.5 | 10.4 |
CLASS SIZE (students) | 4 | 816 | 18.5 - 32.8 | 22.6 | 20.6 | 26.1 | 5.5 |
LESSON LENGTH (min) | 6 | 344 | 19.8 - 50.0 | 39.9 | 36.9 | 43.3 | 6.4 |

Notes: ¹Number of studies; ²Number of lessons observed; ³Promotion of out-of-class activity was reported in only one study (Sutherland et al., 2016; mean intervals = 0.3%; SD = 0.8%).
meeting inclusion criteria. They included the direct observations of 2703 PE lessons that were taught by at least 603 teachers in more than 348 schools.
The studies were conducted in preschool (n = 2), elementary (n = 16), and secondary (n = 10) school settings, and one included both elementary and secondary grade levels. Studies took place on five continents [Australia (n = 8), Europe (n = 10), North America (n = 5), South America (n = 3), and Asia (n = 3)]. They included 10 different countries/territories, with most studies taking place in Australia (n = 8), England (n = 5), and Mexico (n = 4).
Twenty studies (69%) were descriptive (D) and nine (31%) were part of an intervention (I). All 29 used the SOFIT PA codes, and 26 (90%) also assessed lesson context. More than two-thirds (n = 20; 69%) described all three major categories: PA, lesson context, and teacher behavior. More studies used the original six-category teacher behavior codes (n = 15; 52%) than the newer three-category teacher interaction codes (n = 5; 17%).
Twenty-three studies (79%) described how data collectors were certified before data collection began, and 20 (69%) described the periodic assessment of observer reliability in the field during data collection. Studies consistently reported reliability scores that met or exceeded the criterion standard (≥85% agreement; McKenzie, 2012), with interobserver agreement ranging from 80% to 90% for each main SOFIT variable (mode = 85%), and from 84% to 100% for PA, 86% to 100% for lesson context, and 80% to 96% for teacher behavior.
Seventeen studies (59%) examined student gender, including 13 that compared boys and girls within the same lessons and four that investigated differences by class gender composition (i.e., boys-only, girls-only, and coeducational classes). Eight studies (28%) examined differences based on teacher preparation, mainly PE specialists vs classroom teachers, and ten (34%) described lessons taught only by PE specialists. Six studies (21%) investigated lesson location, with most comparing lessons taught indoors vs outdoors. Cardon et al. (2004), however, compared swimming and non-swimming lessons, and Sutherland et al. (2016) compared lessons taught in rural and urban schools.
Twenty-three studies (79%) reported actual (i.e., observed) lesson length and 13 (45%) provided scheduled lesson length. PE dosage (i.e., lesson frequency × lesson length) was reported anecdotally but not objectively assessed. Thirteen studies (45%) reported the number of boys and girls present in class, and 13 described student activity levels during the different lesson contexts. Only four studies (14%) reported estimated student energy expenditure rates (i.e., an overall measure of PA intensity).
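The ≥85% criterion can be illustrated with interval-by-interval percent agreement between two observers who code the same lesson independently. This is a minimal sketch, assuming both observers' codes are aligned interval by interval; the function names are hypothetical, and the calculation shown (agreements divided by total intervals) is the common interval-agreement formula rather than a procedure confirmed for every reviewed study.

```python
def interval_agreement(observer_a: list[str], observer_b: list[str]) -> float:
    """Percent of intervals on which two observers recorded the same code."""
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("observers must code the same, non-empty set of intervals")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100.0 * agreements / len(observer_a)


def meets_sofit_criterion(agreement_pct: float, threshold: float = 85.0) -> bool:
    """True when agreement meets or exceeds the threshold (default 85%)."""
    return agreement_pct >= threshold


if __name__ == "__main__":
    # Illustrative codes for 20 intervals; the observers disagree on 3.
    a = ["standing"] * 10 + ["walking"] * 10
    b = ["standing"] * 8 + ["sitting"] * 2 + ["walking"] * 9 + ["vigorous"]
    pct = interval_agreement(a, b)
    print(pct, meets_sofit_criterion(pct))  # 85.0 True
```

Agreement would typically be computed separately for each SOFIT category (PA, lesson context, teacher behavior), which is how category-specific ranges like those above arise.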
Twelve of the 29 studies (41%) met the inclusion criteria for quantitative syntheses of PA. They included two preschool, three elementary, and six secondary school studies and a total of 1,170 lessons (n = 125 preschool; n = 465 elementary; n = 580 secondary) from 170 schools taught by more than 323 teachers (
MVPA. Mean MVPA% was above the median in the two preschool studies (45.8%, 49.9%), but below it in four secondary school studies and one elementary school study.
Analyses of student gender were reported in 6 of the 12 studies that met the criterion for PA% syntheses (data not shown). Boys were typically more physically active than girls, both when compared within coeducational lessons and when class gender composition (i.e., boys-only vs girls-only lessons) was considered. For example, students in boys-only classes in Hong Kong secondary schools engaged in MVPA during 38.2% of lesson time, compared with 31.8% for students in girls-only classes (Chow, McKenzie, & Louie, 2009). The one exception was Verstraete et al. (2007), who reported no gender differences in MVPA% in their elementary school study in Belgium.
Other significant findings related to MVPA% were reported. For example, Van Cauwenberghe et al. (2011) reported that students accumulated more MVPA during lessons taught by early childhood specialists than by non-specialists in Belgian preschools. Additionally, Sutherland et al. (2016) found higher MVPA% in Australian secondary school lessons taught by more experienced teachers and in those conducted in urban rather than rural schools. Cardon et al. (2004) also reported that MVPA% was higher during swimming lessons than during non-swimming lessons in Belgian elementary school PE (mean MVPA% = 52% vs. 40%).
Nine studies (31%) met the inclusion criteria for quantitative syntheses of lesson context. These included two preschool, two elementary, and five secondary school studies, for a total of 1,050 lessons (n = 125 preschool; n = 426 elementary; n = 500 secondary) in 150 schools taught by more than 304 teachers (
Noteworthy differences in skill practice and game play emerged by school level and by country. Skill practice was the most prevalent lesson context in the two preschool studies (Chow, McKenzie, & Louie, 2015; Van Cauwenberghe, Labarque, Gubbels, DeBourdeaudhuij, & Cardon, 2011), where it averaged 41.7% and 43.8% of lesson time. In comparison, game play was the most prevalent context in four of the five secondary school studies. Across these four studies, mean game play time ranged from 12.1% to 46.6% of the lesson, and mean skill practice time ranged from 5.2% to 16.5% (data not shown). The exception was the Hong Kong secondary school study (Chow et al., 2009), which reported that students spent 36.5% of lesson time in skill practice and 12.1% in game play. By country, skill practice was the most prevalent context in all three Hong Kong studies (Chow, McKenzie, & Louie, 2008; Chow et al., 2009; Chow et al., 2015), regardless of school level (preschool, elementary, secondary), and game play was the most prevalent context in all three Australian secondary school studies (Dudley, Okely, Cotton, Pearson, & Caputi, 2012a; Dudley, Okely, Pearson, Cotton, & Caputi, 2012b; Sutherland, Campbell, Lubans et al., 2016).
Only six studies assessed MVPA% during different lesson contexts. Generally, lesson time allocated to fitness activities, skill practice, and game play was positively associated with MVPA%, and time for management and knowledge was negatively associated with it (Chow et al., 2008; Chow et al., 2009; Chow et al., 2015; van Beurden et al., 2003; Van Cauwenberghe et al., 2011; Verstraete et al., 2007). Verstraete et al. (2007) found that involving teachers in a professional development intervention led them to allocate lesson time more efficiently, which in turn increased student MVPA%.
Seven studies (24%) met the inclusion criteria for a quantitative synthesis of teacher behavior. These included two preschool, one elementary, and four secondary school studies, for a total of 841 lessons (n = 125 preschool, n = 368 elementary, and n = 348 secondary) from 122 schools taught by 280 teachers (
Five studies (17%) described teacher interactions, but only three secondary studies met the inclusion criteria for a quantitative synthesis. These included a total of 232 lessons from 22 schools taught by more than 48 teachers (
Thirteen studies (45%) reported observed class size, but only four (14%), met the criteria for inclusion in a quantitative synthesis. These included one preschool study (n = 125 lessons) and three secondary school studies (n = 318 lessons) for a total of 408 observed lessons in 44 schools taught by 130 teachers (Median = 22.6 students; IQR = 20.6 - 26.1;
Lesson length was described in 23 studies (79%), but only six (21%) met the inclusion criteria for a quantitative synthesis. These included two preschool, one elementary, and three secondary school studies for a total of 344 total lessons (n = 125 preschool; n = 39 elementary; n = 180 secondary) in 75 schools and taught by more than 100 teachers (
Thirteen studies (45%) reported the number of PE minutes scheduled weekly, but only Chow et al. (2015) indicated students (preschool) had PE daily (between 25 - 30 minutes a day). The other 12 studies reported that students were typically scheduled to have PE lessons 1 - 2 days per week (Mode = 2 days per week) that they were between 20 - 120 minutes long (data not shown).
Actual observed lesson length was typically shorter than the scheduled lesson length because of student transitions to the instructional areas. Studies in Hong Kong elementary and secondary schools reported actual observed lessons were from 22% to 27% shorter than their scheduled lengths (Chow et al., 2008; Chow et al., 2009) . In the Cardon et al. (2004) study, mean scheduled time was much longer for swimming lessons than regular lessons (83.0 min; SD = 22.0 min vs. 50.8 mi; SD = 7.1 min; data not shown), however, lesson scheduled length was not significantly associated with the proportion of time that students were engaged in MVPA.
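The dosage definition used in this review (lesson frequency × lesson length) and the gap between scheduled and observed lesson length can be combined in a short calculation. The sketch below uses hypothetical numbers (two 40-minute lessons scheduled per week, 30 minutes observed per lesson), not values from any reviewed study; the function names are illustrative.

```python
def weekly_pe_minutes(lessons_per_week: int, minutes_per_lesson: float) -> float:
    """PE dosage: lesson frequency x lesson length, in minutes per week."""
    return lessons_per_week * minutes_per_lesson


def percent_shorter(scheduled_min: float, observed_min: float) -> float:
    """How much shorter the observed lesson was, as a % of its scheduled length."""
    if scheduled_min <= 0:
        raise ValueError("scheduled length must be positive")
    return 100.0 * (scheduled_min - observed_min) / scheduled_min


if __name__ == "__main__":
    scheduled = weekly_pe_minutes(2, 40.0)  # scheduled minutes per week
    observed = weekly_pe_minutes(2, 30.0)   # observed minutes per week
    print(scheduled, observed, percent_shorter(40.0, 30.0))  # 80.0 60.0 25.0
```

In this hypothetical case a 25% shortfall per lesson removes 20 minutes of PE each week, which is why scheduled minutes alone can overstate dosage.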
Our purpose was to review SOFIT PE studies conducted in preK-12 schools outside the US. We located 739 records and systematically assessed 29 studies that were conducted in 10 different countries on 5 continents. Data for these studies were obtained via trained observers that used the same SOFIT instrument reliably to directly assess 2703 lessons that were taught by more than 603 teachers in 348 schools. Most of the 29 studies were conducted in elementary and secondary schools, but two involved preschools.
All 29 studies used SOFIT to describe PA, 90% described PA and lesson context, and 69% assessed PA, lesson context, and teacher behavior. Relative to teacher behavior, more studies assessed how teachers spent lesson time generally (i.e., teacher behavior categories, n = 15; 52%) rather than assessing teacher interactions related to promoting PA (teacher interaction, n = 5; 17%). Assessments of teachers promoting PA “in” and “out” of PE lessons are thus limited; as teacher promotion of PA is important, future studies should focus on it.
Although 90% of studies examined both PA and lesson contexts, only 13 (45%) assessed PA levels during the different contexts. Such an analysis requires entering data line-by-line data rather than entering lesson summary scores only. Entering data line-by-line is especially recommended for intervention studies because it will enable a more fine-tuned analysis of how changes in MVPA came about.
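The advantage of line-by-line entry can be seen in a small sketch: with one record per observation interval, MVPA% can be computed separately within each lesson context, which lesson summary scores cannot support. The record layout `(lesson_context, activity_code)` and the activity labels are assumptions for illustration, not a format prescribed by SOFIT.

```python
from collections import defaultdict

# Assumed activity labels that count toward MVPA (walking/moderate + vigorous).
MVPA_CODES = {"walking", "vigorous"}


def mvpa_by_context(intervals: list[tuple[str, str]]) -> dict[str, float]:
    """MVPA% within each lesson context from line-by-line interval records.

    Each record is (lesson_context, activity_code) for one observed interval.
    """
    totals: dict[str, int] = defaultdict(int)
    active: dict[str, int] = defaultdict(int)
    for context, activity in intervals:
        totals[context] += 1
        if activity in MVPA_CODES:
            active[context] += 1
    return {context: 100.0 * active[context] / n for context, n in totals.items()}


if __name__ == "__main__":
    records = [
        ("management", "standing"),
        ("management", "walking"),
        ("game play", "vigorous"),
        ("game play", "walking"),
        ("knowledge", "sitting"),
    ]
    print(mvpa_by_context(records))
    # {'management': 50.0, 'game play': 100.0, 'knowledge': 0.0}
```

Because each context's percentage uses only its own intervals, an intervention's effect can be traced to a shift in context allocation, a change in activity within a context, or both.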
Synthesizing the results of studies was challenging because papers often did not always report specific information, such as for sample sizes (i.e., number of schools, teachers, and/or classes), field reliability tests, and standard deviations. Precision in sample size (i.e., number of schools, teachers, and classes) was lacking in numerous studies. Specifically, within the 29 studies where data were synthesized, one paper did not identify the number of schools, six did not identify the number of teachers, and five did not indicate the number of different classes observed. Additionally, it was not always clear if “lessons” and “classes” were distinct or if the terms were synonymous. Accurate and complete reporting of sample sizes is essential for understanding the scope of studies and should be reported consistently (e.g., how many schools were included, how many teachers, and how many distinct classes).
In some cases, the trustworthiness of the data was limited because observer reliabilities were not reported. Reliabilities were reported for 25 of the 29 studies, and the results consistently exceeded the established SOFIT protocol standard (i.e., >85% agreement). Not all studies reported detailed scores for certification and field tests, and subsequently 12 (41%) were excluded from quantitative syntheses because they did not provide sufficient evidence of data reliability throughout the study. Of these, four did not report any reliabilities, seven described reliabilities only during observer training, and one reported a low kappa value (i.e., kappa = 0.091). A strength of SOFIT is that following a standardized protocol makes comparisons among studies possible, but this is only appropriate when the data are trustworthy (i.e., reliable).
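Kappa differs from percent agreement because it corrects observed agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreeing intervals and p_e is the chance agreement implied by each observer's code frequencies. A minimal sketch of Cohen's kappa for two observers' interval codes follows; the function name and example codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(observer_a: list[str], observer_b: list[str]) -> float:
    """Cohen's kappa for two observers coding the same intervals."""
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("observers must code the same, non-empty set of intervals")
    n = len(observer_a)
    p_observed = sum(a == b for a, b in zip(observer_a, observer_b)) / n
    counts_a = Counter(observer_a)
    counts_b = Counter(observer_b)
    # Chance agreement: product of each observer's marginal proportions per code.
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    a = ["sitting", "sitting", "walking", "walking"]
    b = ["sitting", "walking", "walking", "walking"]
    print(cohens_kappa(a, b))  # 0.5
```

Here 75% raw agreement corresponds to a kappa of only 0.5, illustrating why percent agreement and kappa values reported by different studies are not directly comparable.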
Syntheses of lesson length and class size were limited, mainly because standard deviation scores and or reliabilities were not reported. Nearly 80% of studies (n = 23) described actual lesson length and 45% (n = 13) reported class size; however, only six and four studies, respectively, were synthesized, primarily because standard deviation scores were not reported and/or it was not clear if observer reliabilities were maintained. Lesson length and class size have important implications for PE dosage and program quality, and it is important that this information be included in studies. Future reports should also include standard deviation scores and results of reliability assessments.
Only 12 out of 29 studies met the criteria for synthesis of PA% and fewer studies qualified for syntheses of other variables (i.e., lesson context, teacher behavior, teacher interactions, class size, and lesson length). Nonetheless, important findings emerged relative to the variability of study means scores and ranges of means among studies (
Only five of the 29 studies met the public health goal of 50% MVPA. Further, there were differences by student gender, with boys accruing more MVPA than girls in both coeducational lessons and single-sex (boys-only and girls-only) lessons. Teachers should strive to achieve the public health goal of 50% MVPA and to provide more equitable PA opportunities for boys and girls.
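The 50% MVPA benchmark applies to the share of observed intervals in which students are coded as active. The Python sketch below assumes the usual SOFIT five-level activity scale (1 = lying down, 2 = sitting, 3 = standing, 4 = walking, 5 = very active), with walking and very active counted as MVPA; the interval data for boys and girls are invented for illustration.

```python
# Assumed SOFIT activity scale: 1 lying down, 2 sitting, 3 standing, 4 walking, 5 very active.
MVPA_CODES = {4, 5}        # walking and very active are counted as MVPA
PUBLIC_HEALTH_GOAL = 50.0  # percent of lesson time in MVPA, as stated in the text


def mvpa_percent(codes):
    """Percentage of observed intervals coded as walking or very active."""
    if not codes:
        raise ValueError("no intervals were coded")
    return 100.0 * sum(code in MVPA_CODES for code in codes) / len(codes)


def meets_public_health_goal(codes):
    """True if MVPA makes up at least 50% of observed intervals."""
    return mvpa_percent(codes) >= PUBLIC_HEALTH_GOAL


# Invented interval codes for boys and girls in one coeducational lesson.
boys = [4, 5, 5, 3, 4, 2, 5, 4]
girls = [3, 4, 2, 3, 5, 3, 2, 4]
print(mvpa_percent(boys), meets_public_health_goal(boys))    # 75.0 True
print(mvpa_percent(girls), meets_public_health_goal(girls))  # 37.5 False
```

Computing the percentage separately by gender, as here, is what allows gender differences within the same lesson to be detected.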
An important finding was the large variability among studies in how time was allocated to the different lesson contexts (
There was also variability in teacher behavior among the studies. Teachers spent more time on general instruction than on demonstrating and promoting fitness. In the four studies that assessed teacher interactions, teacher promotion of MVPA beyond the immediate lesson was rarely observed (during less than 1% of observation intervals).
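Lesson context and teacher behavior results are reported as the percentage of observation intervals spent in each category. A minimal Python sketch follows. The one-letter codes and category names are assumptions modeled on the SOFIT categories discussed here (e.g., general instruction, demonstrating and promoting fitness) and should be checked against the SOFIT manual; the lesson data are hypothetical.

```python
from collections import Counter

# Assumed one-letter codes; verify against the SOFIT manual before use.
CONTEXT_LABELS = {"M": "management", "K": "knowledge", "F": "fitness",
                  "S": "skill practice", "G": "game play", "O": "other"}
TEACHER_LABELS = {"P": "promotes fitness", "D": "demonstrates fitness",
                  "I": "instructs generally", "M": "manages",
                  "O": "observes", "T": "off-task"}


def time_allocation(codes, labels):
    """Percentage of observed intervals spent in each category, zeros included."""
    if not codes:
        raise ValueError("no intervals were coded")
    counts = Counter(codes)
    unknown = set(counts) - set(labels)
    if unknown:
        raise ValueError(f"unrecognized codes: {sorted(unknown)}")
    return {name: 100.0 * counts[code] / len(codes) for code, name in labels.items()}


# Invented 20-interval lesson, one code per interval.
context = "MMMKKSSSSSSGGGGGGFFM"
teacher = "MMIIIIOOOOOOIIDOOPMM"

print(time_allocation(context, CONTEXT_LABELS))
print(time_allocation(teacher, TEACHER_LABELS))
```

In this invented lesson the teacher spends 30% of intervals instructing generally but only 10% demonstrating or promoting fitness, the kind of pattern described in the text.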
The variability in PA, time spent in lesson contexts, and teacher behavior within and among studies illustrates that the conduct of PE differs substantially and may depend on where a child lives and goes to school. Time allocations for different lesson contexts and teacher behaviors reflect both programmatic goals and teacher expertise. PE stakeholders can benefit from ongoing dialogue about PE curricula and instructional methods, with the aim of greater consistency within and among programs worldwide.
Class size and lesson length varied widely, and these variables have important implications for program outcomes. Chow et al. (2008) counted an average of 33.6 students in Hong Kong lessons (range = 15 to 45), nearly twice as many as in the two studies by Curtner-Smith and colleagues in England in 1995 and 1996, and a third more than in the two secondary school studies in Australia reported by Dudley and colleagues in 2012. PE was typically offered only two days per week; daily PE was identified only for the children in Hong Kong preschools. The total minutes per lesson (e.g., 20 to 120 minutes) and per week also varied widely. In many cases, investigators reported that there were regional recommendations for PE time, but they also noted that school administrators were responsible for making site-based scheduling decisions for PE. Greater consistency in class size and lesson length at the school site level could help ensure that students have more equitable opportunities to become physically educated regardless of where they live.
The findings of this investigation are similar to those reported in our review of SOFIT studies conducted in the US (McKenzie & Smith, 2017). For example, data synthesis was similarly hindered because important information was left out or reported inconsistently (i.e., observer reliabilities, sample sizes, and standard deviations). There was also similar variability in lesson characteristics in both the US studies and the international studies reviewed here (e.g., how time was spent in lesson contexts).
A major difference between the US and international studies is sample size, especially the number of lessons and schools observed. The 29 US studies included observations of 12,256 lessons, roughly four and a half times the 2703 lessons observed in the 29 international studies. This difference is likely because SOFIT was used in randomized controlled trials (e.g., SPARK, MSPAN, CATCH, TAAG) that were conducted in the US and sponsored by the National Institutes of Health (NIH).
The current description is limited to the peer-reviewed reports of 29 different investigations that used SOFIT to directly observe 2703 lessons in schools in 10 countries. Our syntheses of the main SOFIT variables were restricted to the 12 studies that included at least 30 typical PE lessons not influenced by an experiment or intervention, identified mean scores and standard deviations for the main SOFIT variables, and provided evidence of observer reliability throughout the study. As the original study locations (e.g., country, city, school district, and schools) and the lessons themselves were not selected at random, our results may not accurately reflect the conduct of PE globally.
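The three synthesis criteria described above can be expressed as a simple screening rule. The Python sketch below is hypothetical: the `StudySummary` fields and the example studies are invented, and %MVPA stands in for the main SOFIT variables.

```python
from dataclasses import dataclass
from typing import Optional

MIN_TYPICAL_LESSONS = 30  # threshold stated in the review's synthesis criteria


@dataclass
class StudySummary:
    """Hypothetical per-study fields needed to screen a SOFIT study for synthesis."""
    name: str
    typical_lessons: int        # lessons not influenced by an experiment or intervention
    mvpa_mean: Optional[float]  # mean %MVPA; None if not reported
    mvpa_sd: Optional[float]    # SD of %MVPA; None if not reported
    reliable_throughout: bool   # reliability shown during data collection, not only training


def eligible_for_synthesis(study: StudySummary) -> bool:
    """Return True only if a study meets all three synthesis criteria."""
    return (study.typical_lessons >= MIN_TYPICAL_LESSONS
            and study.mvpa_mean is not None
            and study.mvpa_sd is not None
            and study.reliable_throughout)


# Invented examples: only Study A satisfies every criterion.
studies = [
    StudySummary("Study A", typical_lessons=48, mvpa_mean=41.2, mvpa_sd=9.8, reliable_throughout=True),
    StudySummary("Study B", typical_lessons=24, mvpa_mean=38.0, mvpa_sd=11.5, reliable_throughout=True),
    StudySummary("Study C", typical_lessons=60, mvpa_mean=52.3, mvpa_sd=None, reliable_throughout=True),
    StudySummary("Study D", typical_lessons=90, mvpa_mean=35.1, mvpa_sd=8.7, reliable_throughout=False),
]
eligible = [s.name for s in studies if eligible_for_synthesis(s)]
print(eligible)  # ['Study A']
```

Each invented study fails a different criterion (too few typical lessons, no SD, reliability shown only in training), mirroring the reasons studies were excluded.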
Nonetheless, the review has important implications for increasing awareness about the characteristics of preK-12 PE in international schools and for the conduct of future PE studies. Assessing PE is essential for improving its quality, and SOFIT has potential as a ground-truthing tool that can inform programmatic and instructional improvement efforts. To realize this potential, however, there is a need for additional observations of PE in preK-12 international schools and for greater consistency in study design and in how results are reported.
To inform policy and best practices that could improve PE globally, future investigations using direct observation should establish observer reliability before data collection begins and continue to assess it throughout the study. The utility and generalizability of the results of these studies can also be improved by reporting sample sizes, means, and standard deviations in a consistent manner, and by adhering to the standard SOFIT protocol and using the observer training videos that are available at no cost on YouTube. For larger studies, investigators should consider using the iSOFIT iOS application. This app is free and has the potential to streamline data entry and reporting (e.g., it generates data graphs immediately and can export data files via email).
SOFIT provides objective data on student physical activity levels and how teachers allocate lesson time and behave during lessons. The resulting information can be used to assess how well these factors align with programmatic and instructional goals. PE goals may differ by country, state/province, school district, school, grade level, and even teacher. SOFIT was developed with the belief that PE should be conducted in a pleasant environment that provides students with ample amounts of MVPA in order for them to accrue health benefits while simultaneously becoming physically fit and motorically skilled. The instrument examines the potential of lessons relative to these goals; it does not assess opportunities for students to reach other relevant PE goals such as cognitive, social, and emotional outcomes.
We thank California State University Fresno students Jenna Aoki and Calixte Aholu for their assistance with data extraction.
The authors declare no conflicts of interest regarding the publication of this paper.
The study was conceptualized by NS and TM. NS was responsible for all aspects of the process including the literature search, study selection, data extraction, data synthesis, and manuscript preparation. TM guided study conceptualization and methodology and made substantial contributions during the writing process. AH assisted with study selection, data extraction, and assessment of reliabilities. All authors read and reviewed the final version of the manuscript and agree with the order of presentation of the authors.
This research received no funding from agencies in the public, commercial, or not-for-profit sectors.
Supplementary data are available upon request.
Smith, N. J., McKenzie, T. L., & Hammons, A. J. (2019). International Studies of Physical Education Using SOFIT: A Review. Advances in Physical Education, 9, 53-74. https://doi.org/10.4236/ape.2019.91005
CS Class size
I Interaction between LC and MVPA
LC Lesson context
LL Lesson location
PA Physical activity
PE Physical education
PreK-12 Prekindergarten (preschool) through 12th grade
MVPA Moderate to vigorous physical activity
SOFIT System for Observing Fitness Instruction Time
T Lesson length
TB1 Teacher behavior
TB2 Teacher interaction