Open Journal of Statistics
Vol.06 No.05(2016), Article ID:71439,9 pages
10.4236/ojs.2016.65075
Study of University Dropout Reason Based on Survival Model
Juan C. Juajibioy
Fundación Universidad Autónoma de Colombia, Bogotá, Colombia
Copyright © 2016 by author and Scientific Research Publishing Inc.
This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).
http://creativecommons.org/licenses/by/4.0/
Received: July 28, 2016; Accepted: October 21, 2016; Published: October 24, 2016
ABSTRACT
In this paper, we use survival modelling methodology to identify factors that may influence university dropout. Using the database provided by the Fundación Universidad Autónoma de Colombia and the semiparametric Cox proportional hazards model, we identify these risk factors.
Keywords:
Dropout, Survival Models
1. Introduction
According to SPADIES1, around 20% of students who begin an undergraduate program at Colombian higher education institutions drop out in their first year. This is a global phenomenon: the number of graduates is usually smaller than the number of entrants. Dropout is driven by academic, social and economic variables, and several studies have addressed it. Two main questions arise from this phenomenon:
・ What factors influence student dropout?
・ How long does it take a student to drop out of university?
Most of the literature on the first question falls into two branches: Tinto’s student integration model and Bean and Metzner’s student attrition model (1985). The first focuses on the student’s integration process and the second on the student’s individual variables; see [1] [2] and the references therein for a detailed description.
Regarding the second question, survival models, which are designed for time-to-event data, have been extensively developed.
2. Discrete Duration Analysis
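Following [3] [4], we introduce the necessary background. Let $T$ be the discrete random variable representing the duration of studies, measured in semesters, $T \in \{1, 2, \dots, 12\}$. The survival function is defined as
$$S(t) = P(T > t), \qquad t = 0, 1, \dots, 12. \qquad (1)$$
Since $f(t) = P(T = t)$ and $S(0) = 1$, we have
$$f(t) = S(t-1) - S(t). \qquad (2)$$
The hazard function is defined as
$$\lambda(t) = P(T = t \mid T \ge t) = \frac{f(t)}{S(t-1)}. \qquad (3)$$
Notice that, since $P(T \ge t) = S(t-1)$, by using (2) and (3) we have
$$\lambda(t) = 1 - \frac{S(t)}{S(t-1)}, \qquad (4)$$
so the survival function can be written as
$$S(t) = \prod_{j=1}^{t} \bigl(1 - \lambda(j)\bigr). \qquad (5)$$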
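The discrete-time identities $f(t) = S(t-1) - S(t)$, $\lambda(t) = f(t)/S(t-1)$ and $S(t) = \prod_{j \le t}(1 - \lambda(j))$ can be checked numerically. The following is a minimal Python sketch. It assumes that semesters are indexed 1, 2, … and that $S(0) = 1$. The hazard values used below are hypothetical illustrative numbers, not estimates from the study data.

```python
from typing import List

def survival_from_hazard(hazard: List[float]) -> List[float]:
    """Return [S(0), S(1), ..., S(K)] with S(0) = 1 and S(t) = prod_{j<=t} (1 - lambda(j))."""
    surv = [1.0]
    for lam in hazard:
        if not 0.0 <= lam <= 1.0:
            raise ValueError("a discrete-time hazard must lie in [0, 1]")
        surv.append(surv[-1] * (1.0 - lam))
    return surv

def pmf_from_survival(surv: List[float]) -> List[float]:
    """Return [f(1), ..., f(K)] using f(t) = S(t-1) - S(t)."""
    return [surv[t - 1] - surv[t] for t in range(1, len(surv))]

def hazard_from_survival(surv: List[float]) -> List[float]:
    """Return [lambda(1), ..., lambda(K)] using lambda(t) = f(t) / S(t-1); assumes S(t-1) > 0."""
    return [(surv[t - 1] - surv[t]) / surv[t - 1] for t in range(1, len(surv))]
```

For example, with the hypothetical hazards `[0.10, 0.08, 0.06, 0.05]` for semesters 1 to 4, `survival_from_hazard` ends at S(4) ≈ 0.7394. `hazard_from_survival` then recovers the input hazards up to floating-point rounding.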
2.1. The Nonparametric Kaplan-Meier Estimator
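Let $t_{(1)} < t_{(2)} < \dots < t_{(k)}$ be the distinct failure times, $d_j$ the number of events that occur at time $t_{(j)}$, and $n_j$ the number of individuals at risk of experiencing the event immediately prior to $t_{(j)}$. Then the product-limit estimator of the survival function is
$$\hat{S}(t) = \prod_{j:\, t_{(j)} \le t} \left(1 - \frac{d_j}{n_j}\right). \qquad (6)$$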
An interesting representation is given in [3] by using the following table
where is the initial population.
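The product-limit estimator (6) can be computed with a short Python sketch. It assumes each student is recorded as a pair (last observed semester, event indicator), with event = 1 for dropout and 0 for a censored observation. It also assumes that a student censored at $t$ is still counted in the risk set at $t$. The toy data in the usage note are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

def kaplan_meier(times: List[int], events: List[int]) -> List[Tuple[int, int, int, float]]:
    """Product-limit estimator (6).

    times[i]:  semester in which student i was last observed (dropout or censoring)
    events[i]: 1 if student i dropped out at times[i], 0 if censored
    Returns rows (t_j, n_j, d_j, S_hat(t_j)) for each distinct failure time t_j.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    rows = []
    s_hat = 1.0
    for t in sorted(deaths):
        n_at_risk = sum(1 for u in times if u >= t)  # still under observation just before t
        d = deaths[t]
        s_hat *= 1.0 - d / n_at_risk
        rows.append((t, n_at_risk, d, s_hat))
    return rows
```

With the hypothetical data `times = [1, 1, 2, 2, 3, 4, 4, 5]` and `events = [1, 0, 1, 1, 0, 1, 0, 0]`, the rows have $(t_j, n_j, d_j)$ equal to (1, 8, 1), (2, 6, 2) and (4, 3, 1). The final estimate is $\hat{S}(4) = 7/18 \approx 0.389$.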
2.2. The Semiparametric Cox Proportional Hazards Model
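The Cox proportional hazards model is a semiparametric method for estimating the hazard function at time $t$, in which a baseline hazard is modified multiplicatively by a set of covariates:
$$h(t \mid \mathbf{X}) = h_0(t) \exp\Bigl(\sum_{i=1}^{p} \beta_i X_i\Bigr), \qquad (7)$$
where $h_0(t)$ is the nonparametric baseline hazard function and $\mathbf{X} = (X_1, \dots, X_p)$ is a set of explanatory variables with regression coefficients $\beta_1, \dots, \beta_p$.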
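Model (7) implies that the ratio of the hazards of two individuals does not depend on $t$, because $h_0(t)$ cancels. This is the "proportional hazards" property. The Python sketch below evaluates (7) directly and checks that property. The step baseline hazard and the coefficient values are hypothetical, chosen only for illustration.

```python
import math
from typing import Callable, Sequence

def cox_hazard(t: float, x: Sequence[float], beta: Sequence[float],
               baseline: Callable[[float], float]) -> float:
    """Hazard h(t | X) = h0(t) * exp(sum_i beta_i * X_i), as in (7)."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return baseline(t) * math.exp(linear_predictor)

def hazard_ratio(x_a: Sequence[float], x_b: Sequence[float], beta: Sequence[float]) -> float:
    """h(t | x_a) / h(t | x_b) = exp(beta'(x_a - x_b)); h0(t) cancels, so the ratio is free of t."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, x_a, x_b)))

def step_baseline(t: float) -> float:
    """Hypothetical baseline hazard: higher in the first four semesters (illustrative only)."""
    return 0.10 if t <= 4 else 0.03
```

For two covariate vectors that differ only in the first component by one unit, the hazard ratio is $e^{\beta_1}$ at every semester $t$.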
3. Data and Descriptive Analysis
In this section we define the main explanatory variables and present some descriptive aspects of them. We consider a cohort of students who began their studies in the first semester of 2010 at the Fundación Universidad Autónoma de Colombia. To distinguish between students, we consider the following groups:
・ Group 1, Graduated Students: students who successfully finished their studies before completing 12 semesters.
・ Group 2, Active Students: students still enrolled (present in the dataset) in the second semester of 2015.
・ Group 3, Inactive Students: students who did not register for more than three consecutive semesters.
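The group definitions can be turned into the (duration, event) pairs that a survival analysis needs. The Python sketch below shows one way to do this, under several assumptions that are not stated in the paper:

- Semesters are indexed 1 (2010-I) to 12 (2015-II).
- The input is a hypothetical list of semesters in which the student registered, plus a graduation flag.
- Only the gap after the last registration is checked for the "more than three consecutive semesters" rule.
- Precedence is graduated, then active, then inactive.
- Groups 1 and 2 are treated as censored for the dropout event. Graduation could instead be modelled as a competing risk, as discussed in [4].

```python
from typing import Optional, Sequence, Tuple

LAST_SEMESTER = 12  # assumption: semester 1 = 2010-I, ..., semester 12 = 2015-II
MAX_GAP = 3         # Group 3: more than three consecutive semesters without registering

def classify(registered: Sequence[int], graduated: bool) -> Optional[int]:
    """Return 1 (graduated), 2 (active), 3 (inactive) or None if no rule applies."""
    if not registered:
        return None
    last = max(registered)
    if graduated:
        return 1
    if last == LAST_SEMESTER:
        return 2
    # Assumption of this sketch: only the gap after the last registration is checked.
    if LAST_SEMESTER - last > MAX_GAP:
        return 3
    return None

def survival_record(registered: Sequence[int], graduated: bool) -> Optional[Tuple[int, int]]:
    """(duration, event): event = 1 for dropout (Group 3), 0 = censored (Groups 1 and 2)."""
    group = classify(registered, graduated)
    if group is None:
        return None
    return max(registered), int(group == 3)
```

For instance, a student registered only in semesters 1 to 3 who never returned becomes `(3, 1)`, a dropout observed at semester 3. A student registered through semester 9 who has not yet been absent for more than three semesters is left unclassified.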
The covariates collected for our analysis are grouped into individual and academic variables. We consider the following individual variables:
A breakdown by program and group is given in Figure 1. Figure 2 shows the percentage of students, by program, among those who began their studies in the first semester of 2010.
The student population considered in this study initially comprised 1018 students; owing to missing information on the explanatory variables, we retained a total of 991 students. The percentage of students who dropped out between the first semester of 2010 and the second semester of 2015 was 37.54%; Figure 3 shows the distribution by group. The Fundación Universidad Autónoma de Colombia is organized into four large faculties: the Faculty of Law, the Faculty of Engineering, the Faculty of Management and Accounting Sciences, and the Faculty of Human Sciences. Figure 1 (left panel) shows that the largest percentage of students who dropped out was in the Faculty of Law (8.6% in Group 3).
4. Duration Analysis
In this section we examine the relationship between a student’s decision to complete or abandon their studies, as opposed to the decision to prolong their stay at university.
Figure 1. Breakdown by program and group.
Figure 2. Distribution of students by program.
Figure 3. Distribution of students by group.
Figure 4. Kaplan-Meier estimate of the survival function.
We first used the nonparametric Kaplan-Meier estimator (6); the results are given in Table 1 (see Appendix).
Figure 4 shows that the highest dropout rate occurs during the first four semesters. Figure 5 shows the survival dynamics for all programs offered by the university.
To study the effect of the covariates we use the Cox proportional hazards model. To select the significant variables we use the likelihood ratio test; the final results are given in Table 2 (see Appendix).
Figure 5. KM estimate by program.
Figure 6. Baseline cumulative hazard and survival rate.
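The likelihood ratio test used for variable selection compares two nested fitted models. The statistic is $LR = 2(\ell_{\text{full}} - \ell_{\text{reduced}})$, which under the null hypothesis is approximately $\chi^2_q$, where $q$ is the number of coefficients dropped. Below is a minimal pure-Python sketch that avoids SciPy only to stay self-contained. The chi-square tail is computed from the series of the regularized incomplete gamma function. The log-likelihood values in the usage note are hypothetical; in practice they would come from the fitted Cox partial likelihoods.

```python
import math
from typing import Tuple

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi2_df > x).

    Uses the power series of the regularized lower incomplete gamma function P(df/2, x/2);
    adequate for moderate x, not tuned for extreme tails.
    """
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))  # first series term
    if term == 0.0:  # underflow: the tail is ~0 if z >> a and ~1 if z << a
        return 0.0 if z > a else 1.0
    total, n = term, 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return max(0.0, 1.0 - total)

def likelihood_ratio_test(loglik_full: float, loglik_reduced: float, df: int) -> Tuple[float, float]:
    """Return (LR statistic, p-value) for nested models; df = number of coefficients dropped."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)
```

For example, hypothetical log-likelihoods of -1200 (full) and -1205 (two coefficients dropped) give LR = 10 and a p-value of $e^{-5} \approx 0.0067$. At the 5% level, the dropped covariates would therefore be retained.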
The baseline cumulative hazard is shown in Figure 6. Note its rapid increase on the left side, which indicates that the hazard is highest during the first four semesters.
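The paper does not state how the baseline cumulative hazard in Figure 6 was computed. One common choice is the Breslow estimator, $\hat{H}_0(t) = \sum_{t_{(j)} \le t} d_j / \sum_{i \in R(t_{(j)})} \exp(\boldsymbol\beta^\top \mathbf{x}_i)$, together with the baseline survival $\hat{S}_0(t) = \exp(-\hat{H}_0(t))$. The following Python sketch assumes that the coefficients $\boldsymbol\beta$ have already been estimated and are passed in. The data in the usage note are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def breslow_cumulative_hazard(times: Sequence[int], events: Sequence[int],
                              covariates: Sequence[Sequence[float]],
                              beta: Sequence[float]) -> List[Tuple[int, float, float]]:
    """Return (t_j, H0(t_j), S0(t_j)) at each distinct event time.

    H0(t) = sum_{t_j <= t} d_j / sum_{i in R(t_j)} exp(beta' x_i)   (Breslow estimator)
    S0(t) = exp(-H0(t))
    """
    risk_scores = [math.exp(sum(b * v for b, v in zip(beta, x))) for x in covariates]
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    out = []
    cum_haz = 0.0
    for tj in event_times:
        d = sum(1 for t, e in zip(times, events) if e == 1 and t == tj)
        denom = sum(r for t, r in zip(times, risk_scores) if t >= tj)  # risk set R(t_j)
        cum_haz += d / denom
        out.append((tj, cum_haz, math.exp(-cum_haz)))
    return out
```

With all covariates equal to zero, the estimator reduces to the Nelson-Aalen estimator $\sum_{t_{(j)} \le t} d_j / n_j$. A steep rise of $\hat{H}_0$ over an interval corresponds to a high baseline hazard in that interval.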
5. Conclusion
In this work, we use survival models (the Kaplan-Meier estimator and the Cox proportional hazards model) to identify risk factors for university dropout. Grade point average in the first semester, gender and location are the most significant factors in our study. Recall that a positive coefficient estimate indicates an increased hazard and hence a shorter expected survival time, that is, earlier dropout. By gender, male students have a higher dropout hazard than female students. Finally, after accounting for age, sex, grade point average and location, there are no statistically significant associations between Icfes score or social status and all-cause dropout.
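The reasoning behind "a positive coefficient means shorter expected survival time" follows from (7) in two steps, sketched below. The sketch uses the continuous-time form $S(t \mid \mathbf{X}) = \exp\bigl(-\int_0^t h(u \mid \mathbf{X})\,du\bigr)$, where $\mathbf{x}_{-j}$ denotes the other covariates held fixed (for example, $x_j$ could be a binary gender indicator):

```latex
\begin{aligned}
\frac{h(t \mid x_j = 1, \mathbf{x}_{-j})}{h(t \mid x_j = 0, \mathbf{x}_{-j})}
  &= \frac{h_0(t)\exp\bigl(\beta_j + \sum_{i \ne j} \beta_i x_i\bigr)}{h_0(t)\exp\bigl(\sum_{i \ne j} \beta_i x_i\bigr)} = e^{\beta_j}, \\
S(t \mid \mathbf{X}) &= \exp\Bigl(-\int_0^t h_0(u)\,du \; e^{\boldsymbol\beta^\top \mathbf{X}}\Bigr) = S_0(t)^{\exp(\boldsymbol\beta^\top \mathbf{X})}, \\
\beta_j > 0 &\;\Rightarrow\; e^{\beta_j} > 1 \;\Rightarrow\; S(t \mid x_j = 1, \mathbf{x}_{-j}) < S(t \mid x_j = 0, \mathbf{x}_{-j}) \quad \text{whenever } 0 < S_0(t) < 1.
\end{aligned}
```

The last step holds because raising a number in $(0, 1)$ to a larger positive power makes it smaller.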
Acknowledgements
This research was supported by SUI: Sistema Universitario de Investigación, Fundación Universidad Autónoma de Colombia.
Conflict of Interest
The author declares that there is no conflict of interest regarding the publication of this paper.
Cite this paper
Juajibioy, J.C. (2016) Study of University Dropout Reason Based on Survival Model. Open Journal of Statistics, 6, 908-916. http://dx.doi.org/10.4236/ojs.2016.65075
References
- 1. Montoya Diaz, M. (1999) Extended Stay at University: An Application of Multinomial Logit and Duration Models. Applied Economics, 31, 1411-1422. http://dx.doi.org/10.1080/000368499323292
- 2. Giovagnoli, P. (2005) Determinants in University Desertion and Graduation: An Application Using Duration Models. Económica LI, No. 1, 60-90.
- 3. Kleinbaum, D. and Klein, M. (2005) Survival Analysis: A Self-Learning Text. Springer.
- 4. Pintilie, M. (2006) Competing Risks: A Practical Perspective. Wiley. http://dx.doi.org/10.1002/9780470870709
Appendix
Table 1. KM estimate of the survival function.
Table 2. Hazard ratios.
NOTES
1Sistema para Prevención de la Deserción de la Educación Superior (System for the Prevention of Dropout in Higher Education)