Predicting general academic performance and identifying the differential contribution of participating variables using artificial neural networks
Mariel F. Musso (a, b), Eva Kyndt (a, c), Eduardo C. Cascallar (a, d), Filip Dochy (a)
(a) Katholieke Universiteit Leuven, Belgium
(b) Universidad Argentina de La Empresa, Argentina
(c) University of Antwerp, Belgium
(d) Assessment Group International, USA / Belgium
Article received 8 March 2013 / revised 2 July 2013 / accepted 16 July 2013 / available online 27 August 2013
Abstract
Many studies have explored the contribution of different factors from diverse theoretical perspectives to the explanation of academic performance. These factors have been identified as having important implications not only for the study of learning processes, but also as tools for improving curriculum designs, tutorial systems, and students’ outcomes. Some authors have suggested that traditional statistical methods do not always yield accurate predictions and/or classifications (Everson, 1995; Garson, 1998). This paper explores a methodological approach that is relatively new to the field of learning and education but is widely used in other areas, such as computational sciences, engineering and economics. This study uses cognitive and non-cognitive measures of students, together with background information, in order to design predictive models of student performance using artificial neural networks (ANN). These predictions of performance constitute a true predictive classification of academic performance over time, a year in advance of the actual observed measure of academic performance. A total sample of 864 university students of both genders, with ages ranging between 18 and 25, was used. Three neural network models were developed. Two of the models (identifying the top 33% and the lowest 33% groups, respectively) were able to reach 100% correct identification of all students in each of the two groups. The third model (identifying low, mid and high performance levels) reached precisions from 87% to 100% for the three groups. Analyses also explored the predicted outcomes at an individual level, and their correlations with the observed results, as a continuous variable for the whole group of students. Results demonstrate the greater accuracy of the ANN compared to traditional methods such as discriminant analyses. In addition, the ANN provided information on those predictors that best explained the different levels of expected performance.
Thus, results have allowed the identification of the specific influence of each pattern of variables on different levels of academic performance, providing a better understanding of the variables with the greatest impact on individual learning processes, and of those factors that best explain these processes for different academic levels.
Keywords: Predictive systems; Academic performance; Artificial neural networks
http://dx.doi.org/10.14786/flr.v1i1.13
1. Introduction
Many studies have
explored how different variables, approached from diverse
theoretical perspectives, contribute to the explanation of
academic performance
(e.g., Bekele
& McPherson, 2011; Fenollar, Roman, & Cuestas, 2007;
Kuncel, Hezlett,
& Ones, 2004; Miñano, Gilar, & Castejón, 2008). Many
factors have been
identified as having important implications not only for the
study of learning
processes, but also as tools for improving curriculum
designs, tutorial
systems, and students’ academic results (Miñano et al., 2008;
Musso &
Cascallar, 2009a; Zeegers, 2004). From this previous body of
research, it has
become apparent that the accurate prediction of student
performance could have
many useful applications for positive outcomes of the learning
process and lead
to advances in learning theory. For example, it could be
helpful to identify
students at risk of low academic achievement (Musso &
Cascallar, 2009a;
Ramaswami & Bhaskaran, 2010). This prediction could serve
as an early warning
of future low academic performance and guide interventions
that could prove
beneficial for such students. Similarly, understanding
the role of the
different intervening variables that influence performance,
both overall and within each
category of performance level, would contribute
significantly to improving
teaching approaches and to a better understanding of learning
processes. Many
previous studies have focused on the prediction of academic
performance (e.g., Hailikari,
Nevgi, & Komulainen, 2008; Krumm, Ziegler, & Buehner,
2008; Turner,
Chandler, & Heffer, 2009).
Many
of the studies about academic
performance have considered Grade Point Average (GPA) as the
best summary of
student learning, not only because of its strong prediction of
performance for
other levels of education (e.g., Kuncel et al., 2004, 2005),
but also for other
life outcomes such as salary (Roth & Clarke, 1998) and job
performance (Roth,
BeVier, Switzer, & Schippman, 1996).
The
prediction of academic performance has
been carried out with different methodological approaches. The
first and most
common approach found in the educational literature, has to do
with the use of
traditional statistical methods, such as discriminant analysis
and multiple
linear regressions (Braten & Stromso, 2006; Vandamme,
Meskens &
Superby, 2007). A second approach can be found in various
studies which have
used Structural Equation Modelling (SEM) to compare
theoretical models to data
sets and/or to test different models of academic performance
(Fenollar et al.,
2007; Miñano et al., 2008; Ruban & McCoach, 2005). These
traditional
approaches, which are widely used to predict GPA and to
orient selection,
placement, and/or classification in the academic process,
have failed to
consistently reach predictions
or classifications as accurate as those obtained with
artificial intelligence computing methods
(Everson, Chance,
& Lykins, 1994; Kyndt, Musso, Cascallar, & Dochy,
2012, submitted; Lykins
& Chance, 1992; Maucieri, 2003; Weiss & Kulikowski,
1991). Therefore, a
third approach to the “prediction of academic performance”
that we can find in
recent literature involves machine learning techniques, such
as methods using
Artificial Neural Networks (ANN). This method has been used
and proven useful
in several other fields, such as business, engineering,
meteorology, and
economics. It is considered an important method to classify
potential outcomes
and is well regarded as an excellent pattern-recognizer
(Detienne, Detienne,
& Joshi, 2003; Neal & Wurst, 2001; White & Racine,
2001).
Recent work in the
field of computer sciences has started to apply this
methodology to large data
banks of nation-wide educational outcomes (Abu Naser, 2012;
Croy, Barnes, &
Stamper, 2008; Fong,
Si, & Biuk-Aghai, 2009; Kanakana
& Olanrewaju, 2011; Maucieri, 2003; Mukta & Usha,
2009; Pinninghoff
Junemann, Salcedo Lagos, & Contreras Arriagada, 2007;
Ramaswami &
Bhaskaran, 2010; Zambrano Matamala, Rojas Díaz,
Carvajal Cuello,
& Acuña Leiva, 2011; Walczak, 1994). This methodology has
also recently
been used with various applications in educational
measurement, in conjunction
with other theoretical models of different constructs such as
self-regulation
of learning (Cascallar, Boekaerts, &
Costigan, 2006; Everson et al., 1994; Gorr, 1994; Hardgrave,
Wilson, &
Walstrom, 1994), reading readiness (Musso & Cascallar,
2009a), and
performance in mathematics (Musso & Cascallar, 2009b;
Musso, Kyndt,
Cascallar, & Dochy, 2012). The application of predictive
systems, with the
emergence of new methodologies and technologies, has made it
possible to
assess a wide range of data and student performances in order
to evaluate their
current and future performance without the need for
traditional testing (Boekaerts
& Cascallar, 2006; Cascallar et al., 2006). This
methodological approach
using ANN can lead to the possible implementation of
continuous assessment in
the context of intelligent classrooms (Birenbaum et al.,
2006).
Existing databases
together with
the constant monitoring of student performance could provide a
continuous
evaluation in real time of the students’ progress.
The
interrelationships
between many of the variables participating in the complex and
multi-faceted
problem of academic performance are not clearly understood,
and they are often
related in nonlinear ways. ANN have proven to be a very
effective
approach to address situations with these characteristics and
to be able to
classify and predict outcomes under those conditions with a
high level of
accuracy, especially when large data sets are available. This
approach also
allows the researcher to consider a large number of variables
simultaneously
and make use of their interrelationships without the usual
parametric
constraints. These advantages would allow researchers in the
learning sciences
to better understand the complex patterns of interactions
between the variables
at different levels of academic performance, not just for the
prediction of
performance but also to understand the participating factors
that could be
related to these outcomes. Several previous studies using ANN
have addressed
the classification of outcomes into different levels of
performance, for
different academic purposes: a) diagnostic purposes in order
to identify those
students most in need of support at the beginning of their
primary school,
regarding their readiness for learning to read (Musso &
Cascallar, 2009a),
and b)
identifying students with low
expected writing performance at the vocational secondary
school level in order
to provide support prior to their first year, and thus
avoiding possible
failure (Boekaerts & Cascallar, 2011).
In these and other possible applications, the early
detection of future
low performance, and more targeted interventions, would
decrease the negative
experience of failure, and it would provide an important
diagnostic tool for
effective interventions. This approach would improve the
chances of achieving
successful outcomes, particularly for students identified as
being “at-risk”.
Detecting and understanding the most significant variables
that are the best
indicators of the future low performers would be an important
tool for
management of school resources and planning remediation
programs at all levels
of an educational system.
Similarly, knowing
the
best indicators of future high performers would, first of all,
allow the
understanding of many of the factors leading to these positive
outcomes. It
would also allow an accurate selection of
those students who could be assigned to advanced programs,
fellowships and/or be
the object of talent searches. The accurate placement of
students in different
courses or programs according to how they are expected to
perform would prevent
possible failure, as well as providing the opportunity to
offer challenging
tasks for students expected to be among the high performers. In addition, a
better understanding of the
interrelationships between the variables leading to different
levels of
performance would allow the fine-tuning of instructional
approaches to the
individual and/or group needs using the information provided
by an ANN
approach.
Some authors have
shown that
traditional statistical methods do not always yield accurate
predictions and/or
classifications (Bansal, Kauffman & Weitz, 1993; Everson,
1995; Duliba,
1991). Preliminary research using ANN for prediction,
selection, and
classification purposes suggests that this method may improve
the validity and
accuracy of the classifications, as well as increase the
predictive validity of
educational outcomes (Everson et al., 1994; Hardgrave et al.,
1994; Perkins,
Gupta, & Tammana, 1995; Weiss & Kulikowski, 1991).
This
paper explores
this new methodological approach using a large amount of data
collected from
the students (including both cognitive and non-cognitive
measures) in order to
design predictive models using artificial neural networks
(ANN). The ANN models
in this research study can identify those predictors that
could best explain
different levels of academic performance in three different
performance groups
which cover all the range of performances, as well as making
accurate
classifications of the expected level of performance for each
subject. Data
about individual differences in basic cognitive variables were
collected, since
they are strongly related to the student’s achievement (Colom,
Escorial, Chin
Shih, & Privado, 2007; Grimley & Banner, 2008).
Although it has been
argued that considering students’ cognitive ability can lead
to a relatively
strong prediction of academic performance (Colom et al.,
2007), this prediction
could be strengthened by including background and
non-cognitive predictors. As
Chamorro-Premuzic & Arteche (2008) discuss, combining both
cognitive
ability and non-cognitive measures can provide a broader
understanding of an
individual’s likelihood to succeed in academic settings, with
models that
predict such performance
at least one academic year in advance of the actual measure
being obtained
(grade-point average, GPA). In addition, discriminant analysis
(DA) was used to
analyse the same data in order to compare the predictive
classificatory power
of both methodologies. To better understand the rationale for
this research, it
is useful to review some of the main constructs included as
predictors in this
study, and to explain the quite novel methodology introduced
from the family of
predictive systems, that is, the machine learning modelling
technique of
Artificial Neural Networks (ANN).
2. Theoretical considerations
2.1 Working memory and academic performance
Intelligence
and the g-factor are the most frequently studied factors in
relation to
academic achievement and the prediction of performance (Miñano
et al., 2012).
There is a large body of research that shows a strong positive
correlation
between g and educational success (e.g., Kuncel, Hezlett,
& Ones, 2001; Linn
& Hastings, 1984). The g-factor is defined, in part, as an
ability to
acquire new knowledge (e.g., Cattell, 1971; Schmidt, 2002;
Snyderman &
Rothman, 1987). Although the g-factor is not the same
construct as Working
Memory (WM), several studies have demonstrated a high
correlation between these
measures (Heitz et al., 2006; Unsworth, Heitz, Schrock, &
Engle, 2005).
Following the early study of Daneman and Carpenter (1980) on
individual
differences in working memory capacity (WMC) and reading
comprehension, further
research has shown the importance of WMC as a domain-general
construct (Conway, Cowan,
Bunting, Therriault, &
Minkoff, 2002; Conway
& Engle, 1996; Engle &
Kane, 2004; Feldman Barrett, Tugade,
& Engle, 2004; Kane et al., 2004), including
the prediction of average scores
over several academic areas (Colom et al., 2007).
Similarly,
a large body of literature shows WMC as a very important
construct in several
areas and several studies have shown its importance in a wide
range of complex
cognitive behaviours such as comprehension (e.g., Daneman
& Carpenter,
1980), reasoning (e.g., Kyllonen & Christal, 1990),
problem solving (Welsh,
Satterlee-Cartmell, & Stine, 1999) and complex learning
(Kyllonen &
Stephens, 1990; Kyndt,
Cascallar, & Dochy, 2012; St
Clair-Thompson & Gathercole, 2006).
WMC is an important predictive variable of intellectual
ability and academic
performance, consistent over time (e.g. Engle, 2002; Musso &
Cascallar,
2009a; Passolunghi
& Pazzaglia, 2004;
Pickering, 2006). Working memory is a paradigmatic form of
cognitive control and helps to explain
how such control occurs; it involves the
active maintenance
and executive processing of information available to the
cognitive system,
combining the ability to both maintain and effectively process
information with
minimal loss (Jarrold & Towse, 2006). It is crucial for
the processing of
information within the cognitive system, has a limited
capacity, and
differs between individuals (Conway et al., 2005). The
literature seems to
indicate two fundamental approaches to the
interpretation of working
memory and executive control. Traditional perspectives
represent working memory
and executive control as separate modules (e.g., Baddeley,
1986). The
perspective taken in this research coincides with another view
that understands
working memory and executive control as constituting two sides
of the same
phenomenon, an emergent property from the neuro-cognitive
architecture
(Anderson, 1983, 1993, 2002, 2007; Anderson et al., 2004;
Hazy, Frank, &
O’Reilly, 2006).
2.2 Attention and academic performance
Attention
as
a cognitive construct has been studied from different
theoretical and
methodological approaches (e.g., Posner &
Rothbart, 1998; Redick &
Engle, 2006; Rueda, Posner, & Rothbart, 2004). It is evident
that our cognitive system is
constantly receiving a variety of inputs from the environment.
All these inputs
are competing for the limited resources of the cognitive
system, and requiring
our “attention”.
However, because human
cognitive capacities are limited in their ability to process
information
simultaneously (Gazzaniga, Ivry, & Mangun, 2002), it is
the shifting of the
processing capacity and selection of stimuli to attend to,
which constitute the
basic aspects of our attentional system (Redick & Engle,
2006). This
shifting and selection of incoming information is the function
of the attentional
system, which allows us to redirect our attention to the
relevant aspects of
the environmental information for the task or goals at hand.
This study adopts
the framework of Posner and Petersen (1990) who described
three different and
semi-independent attentional networks: orientation, alertness
and executive
attention. The orienting network allows the selection of
information from
sensory input, the alerting network refers to a system that
achieves and
maintains an alert state, and executive attention or executive
control is
responsible for resolving conflict among responses (Fan,
McCandliss, Sommer,
Raz, & Posner, 2002). The efficiency of these three
attentional networks
can be quantified by reaction time measures (Fan et al.,
2002). Redick and Engle
(2006) and Unsworth et al. (2005) have found that individual
differences in
working memory capacity are related to those in attentional
control, thus
establishing that the executive control mechanism is closely
related to working
memory capacity.
Several studies have shown
the importance of attention as a predictor of general academic
performance
(Gsanger, Homack, Siekierski, & Riccio, 2002; Kyndt et
al., 2012, submitted;
Riccio, Lee, Romine, Cash, & Davis, 2002), reading (Landerl,
2010; Lovett,
1979), mathematical performance (Fernandez-Castillo &
Gutiérrez-Rojas,
2009; Fletcher, 2005; Musso et al., 2012), and written
expression (Reid, 2006).
The research on learning disorders has found that attentional
problems are
negatively associated with academic achievement (Jimmerson,
Dubrow, Adam, Gunnar,
& Bozoky, 2006).
2.3 Learning strategies and academic performance
The estimated level of contribution of
basic cognitive processes to
the determination of academic achievement has shown
considerable variation, which
ranges from a moderate to a medium-high effect (Castejón &
Navas, 1992;
Navas, Sampascual, & Santed, 2003). Consequently, the
studies focusing on
the prediction of academic performance have increasingly
included the so-called
non-cognitive variables such as motivation, attributions,
self-concept, effort,
goal orientation, etc. (e.g., Fenollar et al., 2007; Pintrich,
2000). Learning
strategies (LS) have been defined as students’ actual
behaviours, in a specific
context, to engage in a task (Biggs, 1987). Other researchers
describe LS as
any thoughts or behaviours that help the students to acquire
new information
and integrate this new information with their existing
knowledge (Weinstein
& Mayer, 1986; Weinstein, Palmer, & Schulte, 1987; Weinstein, Schulte &
Cascallar, 1982). LS also help
students retrieve stored
information. Examples of LS include summarizing, paraphrasing,
imaging,
creating analogies, note-taking, and outlining (Weinstein et
al., 1987).
Previous research has provided support
for the mediating role of
learning strategies (Dupeyrat & Marine, 2005; Fenollar et
al., 2007; Simons, Dewitte,
& Lens, 2004). Fenollar et al. (2007) have
compared a theoretical model,
where achievement goals and self-efficacy were hypothesised to
have direct
effects on academic performance, to a mediating model where
such effects were
mediated through study strategies. Results from the study
showed that
achievement goals and self-efficacy have no direct effects on
performance, and
they suggest that the mediating model provides a better fit to
the data
(Fenollar et al., 2007).
2.4 Artificial neural networks and performance
Conceptually,
a neural network is a
computational structure consisting of several highly
interconnected
computational elements, known as neurons, perceptrons, or
nodes. Each “neuron”
or unit carries out a very simple operation on its inputs and
transfers the
output to a subsequent node or nodes in the network topology
(Specht, 1991).
Neural networks exhibit polymorphism in structure and
parallelism in
computation (Mavrovouniotis & Chang, 1992), and they can be
represented as a
highly interconnected structure of processing elements with
parallel
computation capabilities (Grossberg, 1980, 1982; Rumelhart,
Hinton, & Williams,
1986; Rumelhart, McClelland, & the PDP research group,
1986). In general,
an ANN consists of an input layer (which can be considered the
independent
variables), one or more hidden layers, and an output layer
that is comparable
to a categorical dependent variable (Cascallar et al., 2006;
Garson, 1998). All
ANN process data through multiple processing entities which
learn and adapt
according to patterns of inputs presented to them, by
constructing a unique
mathematical relationship for a given pattern of input data
sets on the basis
of the match of the explanatory variables to the outcomes for
each case
(Marshall & English, 2000).
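The layered structure just described can be illustrated with a minimal sketch of a single forward pass. This is not the architecture used in the present study: the layer sizes, the weight values, the sigmoid activation, and the function names `layer_forward` and `network_forward` are all assumptions chosen only for illustration.

```python
import math

def sigmoid(x):
    """Squash a weighted sum into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Each unit takes a weighted sum of its inputs plus a bias, then applies the sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def network_forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> one hidden layer -> output layer."""
    hidden = layer_forward(inputs, hidden_w, hidden_b)
    return layer_forward(hidden, out_w, out_b)

# Hypothetical network: 3 predictors, 2 hidden units, 1 output unit.
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_b = [0.0, 0.1]
out_w = [[1.2, -0.7]]
out_b = [0.05]

# One case with three (already scaled) predictor values.
scores = network_forward([0.6, 0.1, 0.9], hidden_w, hidden_b, out_w, out_b)
print(scores)  # a single value between 0 and 1 for the output category
```

In a trained network the weights would be learned from the data rather than set by hand, as in this sketch.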
Thus,
neural networks construct a mathematical relationship by
“learning” the
patterns of all inputs from each of the individual cases used
in training the
network, while more traditional approaches assume a particular
form of
relationship between explanatory and outcome variables and
then use a variety
of fitting procedures to adjust the values of the parameters
in the model.
During the training phase,
ANNs generate a predicted outcome for each case, and when this
prediction is
incorrect the network makes adjustments to the weights of the
mathematical
relationships among the predictors and with the expected
outcome, weights that
are represented in the hidden layers of the network. The
predicted output is a
continuous variable with a specific value for each case (or
subject) which
includes information on the probability of belonging to each
of the categorical
classifications requested by the developer of the ANN.
According to this
architecture, the ANN finally recognizes patterns and
classifies the cases
presented into the requested outcome categories, depending on
the target
question, and given the individual probability values for each
case. This information
is generated by the network through many iterations, gradually
changing and
adjusting the weights for all the interrelationships between
the units after
each incorrect prediction. During this training process, the
network becomes
increasingly accurate in replicating the known outcomes from
the training cases.
The neural network continues to improve its predictions until
one or more of
the pre-determined stopping criteria have been met. These
stopping criteria can
be, for example, a minimum level of accuracy, learning rate,
persistency,
number of iterations, amount of time, etc.
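The training cycle described above can be illustrated with a deliberately reduced sketch. It trains a single logistic unit by stochastic gradient descent, not the multilayer networks developed in this study, and it adjusts the weights in proportion to the prediction error on every case. The toy data, the learning rate, and the names `train`, `target_accuracy` and `max_epochs` are assumptions made for illustration; only two of the stopping criteria mentioned in the text (a minimum accuracy and a maximum number of iterations) are shown.

```python
import math
import random

def predict_proba(weights, bias, x):
    """Probability that case x belongs to the target category."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def accuracy(weights, bias, cases):
    """Share of cases whose predicted category matches the known outcome."""
    correct = sum((predict_proba(weights, bias, x) >= 0.5) == bool(y) for x, y in cases)
    return correct / len(cases)

def train(cases, learning_rate=0.5, target_accuracy=1.0, max_epochs=1000, seed=0):
    """Adjust the weights case by case until a stopping criterion is met."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in cases[0][0]]
    bias = 0.0
    for epoch in range(1, max_epochs + 1):          # criterion 2: number of iterations
        for x, y in cases:
            error = y - predict_proba(weights, bias, x)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
        if accuracy(weights, bias, cases) >= target_accuracy:  # criterion 1: accuracy
            break
    return weights, bias, epoch

# Hypothetical training cases: two binary predictors, outcome 1 if either is present.
cases = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias, epochs_used = train(cases)
print(epochs_used, accuracy(weights, bias, cases))
```

Other stopping criteria, such as elapsed time or persistence without improvement, could be added as further conditions inside the same loop.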
Once trained, the
network is tested with the
remaining cases in the dataset, which is considered a form of
validation of the
network (testing phase), by observing how the weights in the
model, now fixed
to those obtained in the training phase, predict classes of
outcomes in a new
set of data of which outcomes are known to the experimenter
but not to the ANN
system. Afterwards it can also be applied to predict future
cases where the
outcome is still unknown (Cascallar et al., 2006). In
addition, with
complementary techniques in predictive stream analysis, the
neural network
approach allows us to determine the predictive power of each
of the variables
involved in the study, providing information about the
importance of each input
variable (Cascallar et al., 2006; Garson, 1998).
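The separation between a training phase and a testing phase can be sketched as follows. The model here is only a stand-in (a threshold on one predictor) rather than an ANN, and the toy data, the 70/30 split, and the names `split_cases`, `fit_threshold` and `holdout_accuracy` are assumptions made for illustration. The point of the sketch is that the parameters are fixed after training and then evaluated on cases whose outcomes were not used to fit them.

```python
import random

def split_cases(cases, train_fraction=0.7, seed=0):
    """Randomly divide the cases into a training set and a testing (holdout) set."""
    shuffled = cases[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_threshold(train_cases):
    """Stand-in 'model': the midpoint between the two class means of one predictor."""
    pos = [x for x, y in train_cases if y == 1]
    neg = [x for x, y in train_cases if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def holdout_accuracy(threshold, test_cases):
    """Apply the fixed model to cases that were not seen during training."""
    hits = sum((x >= threshold) == bool(y) for x, y in test_cases)
    return hits / len(test_cases)

# Hypothetical data: one predictor score per case; label 1 marks the target group.
cases = [(score / 10, int(score >= 50)) for score in range(0, 100, 5)]
train_cases, test_cases = split_cases(cases)
threshold = fit_threshold(train_cases)  # parameters are fixed from this point on
print(holdout_accuracy(threshold, test_cases))
```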
Predictive stream
analyses (Cascallar &
Musso, 2008), based in this case on neural network (ANN)
models, have several
strengths: (a) because these are machine learning algorithms,
the assumptions
required for traditional statistical predictive models (e.g.,
ordinary least
squares regression) are not necessary. As such, this technique
is able to model
nonlinear and complex relationships among variables. ANN aim
to maximize
classification accuracy and work through the data in an
iterative process
until maximum accuracy is achieved, automatically modelling
all interactions
among variables; (b) ANNs are robust, general function
estimators. They usually
perform prediction tasks at least as well as other techniques
and most often
perform significantly better (Marquez, Hill, Worthley, &
Remus, 1991); (c)
ANN can handle data of all levels of measurement, continuous
or categorical, as
inputs and outputs. Because of the speed of microprocessors in
even basic
computers, ANNs are more accessible today than when they were
originally
developed. Current research has shown that neural network
analysis
substantially improves the validity of the classifications and
increases the accuracy
and predictive validity of the models, in education and other
fields (Kyndt et
al., 2012, submitted; Musso & Cascallar, 2009b; Perkins et
al., 1995).
The
ANN learns by examining individual training
cases (subjects/students), then generating a prediction for
each student, and
making adjustments to the weights whenever it makes an
incorrect prediction.
Information is passed back through the network in iterations,
gradually
changing the weights. As training progresses, the network
becomes increasingly
accurate in replicating the known outcomes. This process is
repeated many
times, and the network continues to improve its predictions
until one or more
of the stopping criteria have been met. A minimum level of
accuracy can be set
as the stopping criterion, although additional
stopping criteria may be used as well (e.g., number of
iterations, amount of
processing time). Once trained, the network can be applied,
with its structure
and parameters, to future cases (validation or holdout sample)
for further
validation studies and programme implementation (Lippman,
1987). As long as the
basic characteristics of the population of persons or events
used to train the ANN
remain constant, or vary only slightly and/or gradually, the
network can adapt and
improve its pattern recognition the more data it is
exposed to during
implementation.
The class of ANN
models used in this research
can be compared with the more traditional discriminant
analysis approach. Both
of these methods derive classification rules from samples of
classified objects
based on known predictors. This general approach is called
‘supervised
learning’ since the outcomes are known and relationships are
modelled or
‘supervised’ according to these outcomes (Kohavi &
Provost, 1998). However,
there are significant differences in the algorithms and
procedures for both
analyses, such as the fact that while discriminant analysis
assumes linear
relationships, neural network analysis does not. In terms of
comparisons with
another common statistical method used in educational
research, linear
regression, it is important to note that although neural
networks can address
some of the same research issues as regression, they constitute an
inherently different
mathematical approach (Detienne et al., 2003). There is
another family of
predictive systems which are “unsupervised” (e.g., Kohonen
networks), in
which the patterns presented to the
network are not associated with specific outcomes; it is the
neural network
itself that derives the commonalities between the predictors,
grouping cases
into classes on the basis of these similarities. Thus, these
analyses can be
used to explore the data from a different perspective and
learn the grouping of
cases based on these predictor commonalities instead of being
focused on
predictions or individual outcomes (Cascallar et al., 2006;
Kyndt et al., 2012,
submitted).
Neural networks
excel in the classification and
prediction of outcomes, especially when large data sets are
available in which the variables are
related in nonlinear ways and the intercorrelations
between variables are
not clearly understood. These properties of ANNs clearly make
them particularly
suitable for social science data where they can simultaneously
consider all
variables in a study (Garson, 1998). Moreover, the assumptions
of normality,
linearity and completeness that are made by methods such as
multiple linear
regression (Kent, 2009), and that are often very difficult to
establish for
social science data, are not made in neural network analysis.
Neural networks
can work with noisy, incomplete, overlapping, highly nonlinear
and
non-continuous data because the processing is spread over a
large number of
processing entities (Garson, 1998; Kent, 2009). In this regard
it can be said
that neural networks are robust and have wide non-parametric
application. There
is also evidence that neural models are robust in the
statistical sense, and
also robust when faced with a small number of data points
(Garson, 1998).
Very few studies
within the educational literature
have used neural network analysis or any other type of
predictive system (e.g.,
Cascallar et al., 2006; Cascallar & Musso, 2008; Musso
& Cascallar,
2009a; Pinninghoff Junemann et al., 2007; Wilson &
Hardgrave, 1995).
2.5 ANN processing and measures to evaluate the neural network system performance
In order to
evaluate the performance of the
neural network system, there are a number of measures used
which provide a
means of determining the quality of the solutions offered by
the various network
models tried. The traditional measures
include the
determination of actual numbers and rates for True Positive
(TP), True Negative
(TN), False Positive (FP), and False Negative (FN) outcomes,
as products of the
ANN analysis. In addition, certain summative evaluative
algorithms have been
developed in this field of work, to assess overall quality of
the predictive
system.
These overall
measures are: Recall, which
represents the proportion of correctly identified targets out
of all targets
presented in the set, expressed as Recall = TP/(TP + FN); and
Precision, which represents the proportion of correctly
identified targets out
of all cases identified as targets by the system, expressed
as Precision = TP/(TP + FP). Two other measures, derived from
signal-detection theory (ROC
analysis), have also been used to report the characteristics
of the detection
sensitivity of the system. One of them is Sensitivity, which
is equivalent to Recall (the
proportion of correctly identified targets out of all targets
presented
in the set) and is expressed as Sensitivity = TP/(TP + FN).
The other is
Specificity, defined as the proportion of correctly rejected
non-targets out of all
the cases that should have been rejected by the system,
expressed as Specificity = TN/(TN + FP). All the traditional
measures are
typically represented in what is called a “confusion matrix”,
which displays all
four outcomes.
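The measures defined above can be computed directly from the four cells of the confusion matrix. The following sketch uses a small set of invented outcomes (not data from this study), and the function names are assumptions chosen for illustration.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP and FN for binary labels (1 = target, 0 = non-target)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    return tp, tn, fp, fn

def recall(tp, fn):
    """Recall (identical to Sensitivity): TP / (TP + FN)."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Precision: TP / (TP + FP)."""
    return tp / (tp + fp)

def specificity(tn, fp):
    """Specificity: TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical outcomes for ten cases.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)       # 3 5 1 1
print(recall(tp, fn))       # 0.75
print(precision(tp, fp))    # 0.75
print(specificity(tn, fp))  # 5/6, approximately 0.833
```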
In addition, the
evaluation of ANN performance
is also carried out with another summative measure, which is
used to account
for the somewhat complementary relationship between Precision
and Recall. This
measure, known as F1, is defined as F1 = (2 *
Precision * Recall)/(Precision + Recall). This definition of
F1
assumes equal weights for Precision and Recall. This
assumption can be modified
to favour either Precision or Recall, according to the utility
and cost/benefit
ratio of outcomes favouring either Precision or Recall for any
given predictive
circumstance.
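One common way of expressing such a re-weighting is the F-beta measure, which reduces to F1 when beta equals 1. The text does not say which weighting scheme, if any, the authors had in mind, so the sketch below is an assumption for illustration; the function name is hypothetical.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: beta > 1 favours Recall, beta < 1 favours Precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical precision and recall values, for illustration only.
f1 = f_beta(0.75, 0.6)                         # equal weights (F1)
f2 = f_beta(0.75, 0.6, beta=2.0)               # recall weighted more heavily
f_half = f_beta(0.75, 0.6, beta=0.5)           # precision weighted more heavily
```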
2.6
Objectives and research questions
The objective of
this study is to identify
patterns of variables that will allow a correct predictive
classification of
three levels of General Academic Performance (GAP) into: Low,
Middle and High
GAP, measured by the grade-point-average (GPA). This
was achieved by taking into consideration basic cognitive
processes (working
memory capacity; alerting, orienting and executive
attention), learning
strategies, and family-social background factors. The
idea behind this
paper is to explore new approaches to obtain predictive
classifications of
learning outcomes, without the use of one specific test, using
a large number
of variables (cognitive and non-cognitive) that could better
capture the true
complex composite of influences participating in the actual
observed outcomes
from individual students. In addition, it is another objective
of the research
to explore the differences in the patterns predicting each
level of performance
(low, middle and high performance) to inform future research
into the causal
factors generating and participating in those sets of
identified variables and
that could explain different levels of performance using
artificial neural
networks. Of course, previous academic performance could have
been taken into
account to facilitate the predictive classification, but this
was purposely
avoided for two reasons: as a proof-of-concept that other
variables are
sufficient to predict academic performance, and to highlight
more clearly the
weight that each of these other variables has in the
determination of a
student’s academic performance.
In order to
explore the differences in the
patterns predicting each level of performance, three
artificial neural network
(ANN) models were developed. Two of them were developed to predict the
students who would be
in each of the extreme performance levels (low 33% and high
33% of GPA), in
order to analyse the differences between the patterns of
variables having the
most predictive weight for each group, thus providing
information on the
potentially different processes involved in those low and high
performance
outcomes. A third ANN was developed, capable of accurately
producing a
predictive classification for the three levels of performance
simultaneously
(low 33%, middle 33%, and high 33%). This final ANN model was
capable of
finding the common patterns
that could predict
simultaneously all performance groups. The relative
importance of the
predictors for each network was also analysed. The predictive
capability of
each ANN was systematically improved by modifying the
parameters that determine
the rate of learning, the persistence, momentum, and stopping
criteria, and the
type of functions used for weight adjustments. Precision,
sensitivity,
specificity and accuracy of the three networks were obtained.
In addition, the
correlation between the individual prediction for each student
and the actual
observed GPA was established, and proved to be very high.
The
main
research questions of this study are: How accurately can
different levels of
academic performance in higher education be predicted by working memory
capacity, attentional networks,
learning strategies and background variables when used as
inputs in a neural
network model? What is the relative importance of the
predictor variables and
the observed differences for each performance level category?
3.
Method
3.1
Participants
The
total sample included 864 university students, of both genders
(male 45.4%;
female 54.6%), ages between 18 and 25 (Mage
= 20.38, SD =
3.78), recently
enrolled in the first year in several different disciplines
(psychology,
engineering, medicine, law, social communication, business and
marketing), in three
private universities in Argentina, during the 2009-2011
academic years. In all,
67.8% of the sample was 17 to 20 years old, 24.7% was 21-25
years old, and 7.5%
was older than 25 years. The students in the sample came from
private religious
secondary schools (48.5%), private non-religious schools (19%), private
bilingual schools (15.4%), public secondary schools (15%), and
2.1% from
international community schools. All student data (predictors)
was collected at
the beginning of the corresponding academic year, and the
dependent variable
(GPA) was collected at the end of the same academic year. An
80% math accuracy
criterion was imposed for all participants in the Automated
Operation Span (Unsworth et al.,
2005). Therefore, they were encouraged to keep
their math accuracy at or
above 80% at all times (to ensure that the interfering task
was actually being
performed). As a consequence of
this criterion, 78 participants were excluded from the
analyses. The final
sample consisted of 786 students.
3.2
Instruments
3.2.1 Attention Network Test (ANT)
(Fan et al., 2002)
This computerized
task provides a measure for
each of the three anatomically defined attentional networks:
alerting,
orienting, and executive. The ANT is a combination of the cued
reaction time (Posner,
1980) and the flanker test (Eriksen & Eriksen, 1974). The
participant saw
an arrow on the screen that, on some trials, was flanked by
two arrows to the
left and two arrows to the right. Participants were asked to
indicate whether the
central arrow pointed left or right, using the two mouse buttons
(left/right). They
were instructed to focus on a centrally located fixation cross
throughout the
task, and to respond as quickly and accurately as possible.
During the practice
trials, but not during the experimental trials, subjects
received feedback from
the computer on their speed and accuracy. The practice trials
took
approximately 2 minutes and each of the three experimental
blocks took
approximately 5 minutes. The whole experiment took about
twenty minutes. The
measure for (general) attention is the average response time
regardless of the
cues or flankers. To analyse the effect of the three
attentional networks, a
set of cognitive subtractions described by Fan et al. (2002)
were used. The
efficiency of the three attentional networks is assessed by
measuring how
response times are influenced by alerting cues, spatial cues,
and flankers (Fan
et al., 2002). The alerting effect was calculated by
subtracting the mean
response time of the double-cue conditions from the mean response time of the no-cue
conditions. For the orienting
effect, the mean response time of the spatial cue conditions
(up and down) was
subtracted from the mean response time of the center cue
condition. Finally,
the effect of the executive control (conflict effect) was
calculated by
subtracting the mean response time of all congruent flanking
conditions, summed
across cue types, from the mean response time of incongruent
flanking
conditions (Fan et al. 2002). The test-retest reliability of
the general
response times (in this study used as a measurement of general
attention),
calculated by Fan et al. (2002) equaled .87. The test-retest
reliability of the
subtractions is lower. The executive control network is the most
reliable (r = .77),
followed by the orienting
network (r = .61). The
alerting network
proved to be the least reliable (r = .52)
(Fan et al., 2002).
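The three subtractions described above can be sketched in Python as follows. The dictionary keys and the response times are hypothetical illustrations; only the direction of each subtraction follows the text.

```python
def ant_effects(mean_rt):
    """Compute the three attentional network effects from mean RTs (ms).

    `mean_rt` maps a condition label to its mean response time.
    The labels used here are illustrative, not the test's own variable names.
    """
    return {
        # No-cue minus double-cue conditions.
        "alerting": mean_rt["no_cue"] - mean_rt["double_cue"],
        # Center-cue minus spatial-cue conditions.
        "orienting": mean_rt["center_cue"] - mean_rt["spatial_cue"],
        # Incongruent minus congruent flanker conditions.
        "conflict": mean_rt["incongruent"] - mean_rt["congruent"],
    }


# Hypothetical mean response times in milliseconds.
effects = ant_effects({
    "no_cue": 560.0, "double_cue": 525.0,
    "center_cue": 535.0, "spatial_cue": 490.0,
    "incongruent": 600.0, "congruent": 500.0,
})
```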
3.2.2 Automated Operation
Span (Unsworth et al., 2005)
This
is a computer-administered version of the Ospan instrument
(Unsworth et al.,
2005) that measures working memory capacity. The responses
were collected via
mouse click. Participants first complete practice sessions
and then
perform the actual experiment. The practice sessions are
further broken down
into three sections. The first practice is a simple letter
span task. They see
letters appear on the screen one at a time. In all
experimental conditions,
letters remain on-screen for 800 milliseconds (ms). Then,
participants must
recall these letters in the same order they saw them from a 4
x 3 matrix
of letters (F, H, J, K, L, N, P, Q, R, S, T, and Y)
presented to them.
Recall consists of clicking the box next to the appropriate
letters; the recall
phase is untimed. After each recall, the computer provides
feedback about the
number of letters correctly recalled. Next, participants
practice the math
portion of the experiment. Participants first see a math
operation (e.g. (1*2)
+ 1 = ?). Once the participant knows the answer they click the
mouse to advance
to the next screen. Participants then see a number (e.g. “3”)
and are required
to indicate whether the number is the correct solution by clicking on
“True” or
“False.” After each operation participants are given feedback.
The math
practice serves to familiarize participants with the math
portion of the
experiment, as well as to calculate how long it takes a given
person to solve
the math problems, establishing an individual baseline. Thus,
it attempts to
account for individual differences in the time it takes to
solve math problems.
This is then used as an individualized time limit for the math
portion of the
experimental session. The final practice session has
participants perform both
the letter recall and math portions together, just as they
will do in the
experimental block. The participants first are presented with
a math operation,
and after they click the mouse button indicating that they
have solved it, they
see the letter to be recalled. If the participants take more
time to solve the
math operations than their average time plus 2.5 SD, the
program automatically
moves on and counts that trial as an error. This serves to
prevent participants
from rehearsing the letters when they should be solving the
operations. Participants complete
three practice trials, each of set size 2.
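The individualized time limit described above (average practice time plus 2.5 SD) can be sketched as follows. The text does not specify whether the sample or the population standard deviation is used; the sample version is assumed here, and the function names and practice times are hypothetical.

```python
from statistics import mean, stdev


def math_time_limit(practice_times_ms, n_sd=2.5):
    """Individual time limit: mean practice time plus n_sd standard deviations.

    Assumption: sample standard deviation (statistics.stdev).
    """
    return mean(practice_times_ms) + n_sd * stdev(practice_times_ms)


def is_timed_out(solution_time_ms, limit_ms):
    """A trial exceeding the limit is skipped and counted as an error."""
    return solution_time_ms > limit_ms


# Hypothetical practice solution times in milliseconds.
limit = math_time_limit([2000, 2200, 1800, 2400, 1600])
```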
After the
participant completes all of the practice sessions, the
program moves them on
to the real trials. The real trials consist of 3 sets of each
set-size, with
the set-sizes ranging from 3 to 7 letters. This makes for a
total of 75 letters
and 75 math problems. Subjects are instructed to keep their
math accuracy at or
above 85% at all times. During recall, a percentage in red is
presented in the
upper right-hand corner. Subjects are instructed to keep a
careful watch on the
percentage in order to keep it above 85%. This
study reports the Absolute Ospan score (the sum of all
perfectly recalled sets)
that is interpreted as the measure of overall working memory
capacity, and one
Reaction Time score (operations). The task takes approximately
20–25 minutes to
complete (Unsworth et al., 2005). This measure of working
memory capacity has a
high correlation with other measures of working memory and
general
intelligence, such as Ospan and Raven's Progressive Matrices. In
addition, AOSPAN has
good test-retest reliability (r =
.83) and adequate internal consistency (α = .78) (Unsworth et
al., 2005).
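As a rough illustration of the scoring, the sketch below computes an absolute score as the total number of letters in perfectly recalled sets, and checks the trial count stated above (3 sets for each set size from 3 to 7). Reading "sum of all perfectly recalled sets" as a sum of set sizes is an interpretation, and the function name and example trials are hypothetical.

```python
def absolute_ospan(trials):
    """Sum the sizes of all sets recalled perfectly and in the correct order.

    `trials` is a list of (presented, recalled) letter sequences.
    """
    return sum(
        len(presented)
        for presented, recalled in trials
        if list(presented) == list(recalled)
    )


# 3 sets of each set size from 3 to 7 letters.
total_letters = 3 * sum(range(3, 8))

# Hypothetical trials: the third set has two letters transposed.
score = absolute_ospan([
    ("FHJ", "FHJ"),
    ("KLNP", "KLNP"),
    ("QRSTY", "QRSYT"),
])
```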
3.2.3 Learning
Strategies Questionnaire (LASSI;
Weinstein et al., 1987; Weinstein & Palmer, 2002;
Weinstein et al., 1982).
The original version is a
77-item questionnaire with 10 scales that assesses the
students' awareness
about, and use of, learning and study strategies related to
skill, will, and
self-regulation components of strategic learning. These scales
and their
corresponding internal consistency coefficients reported in
the Users’ Manual
(Weinstein & Palmer, 2002), are as follows: Attitude Scale
(α = .77),
Motivation Scale (α = .84), Time Management Scale (α = .85),
Anxiety Scale (α =
.87), Concentration Scale (α = .86), Information Processing
Scale (α = .84),
Selecting Main Ideas scale
(α = .89),
Study Aids Scale (α= .73), Self-Testing Scale (α = .84), and
Test Strategies
Scale (α = .80). The present study used a Spanish-version
(Strucchi, 1991),
which was slightly modified in some semantic and grammatical
aspects for the
local sample. The exploratory factor analysis determined a
matrix with five
factors that explained 37.52% of the variance. Factor 1
related to “cognitive
resources/cognitive processing” (α = .871; 13 items; R² =
18.03%);
Factor 2, related to “time management” (α = .807; 10 items; R²
= 8.404%); Factor 3, dealing with “processing of information and
generalization”
(α = .783; 8 items; R² = 4.567%); Factor 4, which
is related to
“anxiety management” (α = .60; 5 items; R² =
3.431%); and Factor 5,
which involves the construct of “study
3.2.4
Background information
Basic background
information of each student used in the analyses was: gender,
highest level of
education of mother and father (not completed primary school-
primary school-
secondary school- graduated university- post-graduate),
occupation of parents,
and secondary school from which the student graduated (public
- private
religious school - private non-religious school - bilingual
school - foreign
community).
3.2.5
Academic
performance
Academic performance was
measured by the Grade Point Average (GPA) of all courses
(different subjects
depending on the discipline) at the end of each of the
academic years. All
course grades which are used by the universities to calculate
the overall GPA
are obtained using university-wide criteria for the
interpretation and
assignment of final scores in each course, from which the GPA
was calculated.
The GPA information was collected from official records at the
end of the first
academic year for each student, at each of the participating
universities, and
they all are in a scale from 0 to 10 (with 10 indicating best
performance).
3.3
Analyses procedure
The ANN model
used was a backpropagation
multilayer perceptron neural network, that is, a multilayer
network composed of
nonlinear units, each of which computes its activation level by
summing all the
weighted activations it receives and then transforms that
activation into
a response via a nonlinear transfer function, which
establishes a relationship
between the inputs and the weights they are assigned. During
the training
phase, these systems evaluate the effect of the weight
patterns on the
precision of their classification of outputs, and then,
through
backpropagation, they adjust those weights in a recursive
fashion until they
maximize the precision of the resulting classifications.
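The computation carried out by a single unit, as described above, can be sketched in a few lines of Python. The hyperbolic tangent is used as the transfer function because it is the hidden-layer function reported for the networks in this study; the bias term, the function names and the numeric values are illustrative assumptions.

```python
import math


def unit_response(inputs, weights, bias=0.0):
    """Weighted sum of incoming activations, passed through tanh."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(activation)


def layer_response(inputs, weight_rows, biases):
    """Responses of all units in one layer (one weight row per unit)."""
    return [unit_response(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical inputs and weights, for illustration only.
single = unit_response([1.0, 2.0], [0.5, -0.25])
hidden = layer_response([1.0, 2.0], [[0.5, -0.25], [0.1, 0.2]], [0.0, 0.0])
```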
ANN parameters
and variable groupings, as well
as all other network architecture parameters, were adjusted to
maximize
predictive precision and total accuracy. Confusion matrices
have been
determined for each ANN, as well as ROC analyses for the
evaluation of
sensitivity and specificity parameters. Parameters such as
learning rate (the
rate at which the ANN “learns” by controlling the size of
weight and bias
changes during learning), momentum (adds a fraction of the
previous weight
update to the current one, and is used to prevent the system
from converging to
a local minimum), number of hidden layers, stopping rules
(when the network
should stop “learning” to avoid over-fitting the current
sample), activation
functions (which define the output of a node given an input or
set of inputs to
that node or unit), and number of nodes were specified and
varied in the model
construction phase in order to maximize the overall
performance of the network
model.
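To make the roles of the learning rate, the momentum term and a stopping rule concrete, the following sketch shows a textbook form of each. It is not the update rule of the software used in the study, whose internal details are not given in the text; the function names are hypothetical, and the default values are simply those reported for the networks in this study.

```python
def momentum_step(weight, gradient, previous_update,
                  learning_rate=0.4, momentum=0.9):
    """One weight update: a gradient step plus a fraction of the last update."""
    update = -learning_rate * gradient + momentum * previous_update
    return weight + update, update


def should_stop(previous_error, current_error, min_relative_change=0.0001):
    """Stop when the relative change in training error becomes very small."""
    if previous_error == 0:
        return True
    change = abs(previous_error - current_error) / abs(previous_error)
    return change < min_relative_change


# Two consecutive updates with a hypothetical constant gradient of 0.5.
w1, u1 = momentum_step(1.0, 0.5, 0.0)
w2, u2 = momentum_step(w1, 0.5, u1)
```

With a constant gradient, the second update is larger than the first because the momentum term carries part of the previous step forward.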
3.4
Architecture of the neural networks
According
to the objectives of this research,
three different neural networks (ANN) were developed as
predictive systems for
the GPA of the students in this study. ANN1 was
developed to
maximize the predictive classification of the lowest 33% of
students, which
would be scoring the lowest average GPA at the end of the
academic year. ANN2
was developed to maximize the predictive classification of the
highest 33% of
students, which would be scoring the highest GPA. ANN3
was developed
to predict the classification of students into the three
levels of expected GPA
at the same time. The data set was partitioned into a training
set and a testing
set for each ANN, and for each network, training and testing
samples were
chosen at random by the software, from the available set of
cases. One
suggested criterion is that the number of training
inputs (cases) should be at least 10 times the number of input
and middle layer
neurons in the network (Garson, 1998). Similarly, it is
suggested that about
2/3 (or 3/4) of the cases in the available data set be used
for the training
phase in order to include a set of cases representing most of
the patterns
expected to be present in the data (patterns represented by
the vector for each
case). The remaining 1/3 or 1/4 of the data is used for the
testing phase of
the network. The specific architecture of each of the three
neural networks
developed is as follows:
ANN1 -
(Maximizing the prediction
for the Low 33% performance group): All cognitive variables,
learning
strategies, and background variables were introduced in the
analysis. They were
used for the development of the vector-matrix containing all
predictor
variables for each student. The resulting network contained
all the input
predictors, with a total of 18 input units (Reaction
Time Operation, Reaction Time Math, Reaction Time Problem,
Orienting Attention,
Alerting Attention, Executive Control, Absolute Aospan, Processing of
information/ Generalization, Study Techniques and use of help,
Anxiety
Management, Time Management, Cognitive resources/Cognitive
processing, Gender,
Mother's occupation, Father's occupation, Secondary school
from which the
student graduated, Highest level of education completed by
father, and Highest
level of education completed by mother). The model built
contained one hidden layer,
with 15 units. The output layer contained a dependent variable
with two units
(categories corresponding to “belongs to lowest 33%” or
“belongs to highest 67
%”). In terms of the architecture of the network, a
standardized method for the
rescaling of the scale dependent variables was used. The
hidden layer had a
hyperbolic tangent activation function which is the most
common activation
function used for neural networks because of its greater
numeric range (from -1
to 1) and the shape of its graph. The output layer utilized a
softmax
activation function that is useful predominantly in the output
layer of a clustering
system, converting a raw value into a posterior probability.
The output layer
used the cross-entropy error function in which the error
signal associated with
the output layer is directly proportional to the difference
between the desired
and actual output values. This function accelerates the
backpropagation
algorithm and it provides good overall network performance
with relatively
short stagnation periods (Nasr, Badr, & Joun, 2002). The
training was
carried out with the ‘online’ methodology (one case per
cycle), with an initial
learning rate of 0.4, and momentum equal to 0.9. The
optimization algorithm was
gradient descent (which takes steps proportional to the
negative of the
approximate gradient of the function at the current point),
and the minimum
relative change in training error was 0.0001.
ANN2 -
(Maximizing the prediction
for the High 33% performance group): All cognitive, learning
strategies, and
background variables were introduced in the analysis. They
were used for the
development of the vector-matrix containing all predictor
variables for each
student. The resulting network contained all the input
predictors, with a total
of 18 units (Reaction Time
Operation, Reaction
Time Math, Reaction Time Problem, Orienting Attention,
Alerting Attention,
Executive Control, Absolute Aospan, Processing
of
information/Generalization, Study Techniques and use of help,
Anxiety
Management, Time Management, Cognitive resources/Cognitive
processing, Gender,
Mother's occupation, Father's occupation, Secondary school
from which the
student graduated, Highest level of education completed by
father, and Highest
level of education completed by mother). The model built
contained one hidden layer,
with nine units, and an output layer with two units
(categories corresponding
to “belongs to highest 33%” or “belongs to lowest 67%”). In
terms of the
architecture of the network, a standardized method for the
rescaling of scale
dependent variables was used. The hidden layer had a
hyperbolic tangent
activation function. The output layer utilized a softmax
activation function.
Cross-entropy was chosen as the error function. The dataset
was partitioned
into training set and testing set. The training was carried
out with the
‘online’ methodology, with an initial learning rate of 0.5,
and momentum equal
to 0.7. The optimization algorithm was gradient descent, and
the minimum
relative change in training error was 0.0001.
ANN3
- (Maximizing the simultaneous
prediction for all the performance groups: Low 33% - Middle
33% - High 33%): All cognitive, learning strategies and
background variables
were introduced in the analysis. They were used for the
development of the
vector-matrix containing all predictor variables
for each student. The resulting network contained all the
input predictors,
with a total of 19 input units (Reaction Time Operation,
Reaction Time Math,
Reaction Time Problem, Orienting Attention, Alerting
Attention, Executive
Control, Absolute Aospan, Processing
of information/ Generalization, Study Techniques and use of
help, Anxiety
Management, Time Management, Cognitive resources/Cognitive
processing, Gender,
Mother's occupation, Father's occupation, Secondary school,
Highest level of
education completed by father, and Highest level of education
completed by
mother, Ln of Attention Total RT). The model built contained one hidden
layer, with 20 units, and one
output layer with three units (categories corresponding to
“belongs to low
33%”, “belongs to middle 33%” or “belongs to high 33%” of the
performance
groups). In terms of the architecture of the network, a
standardized method for
the rescaling of scale dependent variables was used. The
hidden layer and the
output layer both had a hyperbolic tangent activation
function. A standardized
method for the rescaling of covariates was used. Sum of
squares was chosen as
error function. The dataset was partitioned into training set
and testing set.
The training was carried out with the ‘online’ methodology,
with an initial
learning rate of 0.4, and momentum equal to 0.8. The
optimization algorithm was
gradient descent, and the minimum relative change in training
error was 0.0001.
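For ANN1 and ANN2, the combination of a softmax output layer with a cross-entropy error function can be sketched as follows. The sketch illustrates the property mentioned above, namely that the output error signal reduces to the difference between actual and desired outputs. The function names and raw output values are hypothetical, and this is not the implementation used by the software.

```python
import math


def softmax(raw_values):
    """Convert raw output values into probabilities that sum to 1."""
    peak = max(raw_values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in raw_values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probabilities):
    """Cross-entropy error for a one-hot target vector."""
    return -sum(t * math.log(p) for t, p in zip(target, probabilities) if t > 0)


def output_error_signal(target, probabilities):
    """With softmax and cross-entropy, the error signal is (actual - desired)."""
    return [p - t for t, p in zip(target, probabilities)]


# Hypothetical raw outputs for the two categories of a binary network.
probs = softmax([2.0, 0.0])
error = cross_entropy([1, 0], probs)
signal = output_error_signal([1, 0], probs)
```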
The software used
was SPSS v.19 – Neural
Network Module, for the development and analysis of all
predictive models in
this study. Two development phases of the predictive system
were carried out:
training of the network and testing of the network developed.
During the
training phase several models were attempted, and several
modifications of the
neural network parameters were explored, such as: learning
persistence,
learning rate, momentum, and other criteria. These tests
continued until
achieving desired levels of classification, maximizing the
benefits of the
model chosen. In these analyses both precision and recall, as
outcome measures
of the network, were given equal weight. There was no need to
trim the number
of predictor inputs in the three models. The validation
procedure used was the
leave-one-out methodology.
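The random partition into training and testing sets, and the rule of thumb attributed to Garson (1998), can be sketched as follows. Reading "10 times the number of input and middle layer neurons" as ten times their sum is an interpretation; the function names, the 3/4 training fraction and the seed are illustrative assumptions and do not reproduce the partition made by the software.

```python
import random


def split_cases(case_ids, train_fraction=0.75, seed=0):
    """Randomly partition cases into a training set and a testing set."""
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def min_training_cases(n_input_units, n_hidden_units, factor=10):
    """Rule-of-thumb minimum number of training cases (assumed reading)."""
    return factor * (n_input_units + n_hidden_units)


# Final sample size and the ANN1 layer sizes reported in the text.
train_ids, test_ids = split_cases(range(786))
needed_ann1 = min_training_cases(18, 15)
```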
3.5 Discriminant
analyses
Discriminant
Analyses (DA) were carried out
using the same data and the same categories of GPA used in the
Neural Networks
Analyses. DA1 was performed to discriminate between
the students
belonging to the lowest 33% of GPA and contrasting them
against those not in
that category. DA2 was focused on identifying
students in the
highest 33% of academic performance versus those not in that
group, and DA3
was calculated to discriminate the students belonging to each
one of the three
levels of GPA performance. In order to give every variable the
opportunity to
contribute significantly to the prediction, a stepwise
discriminant analysis
was calculated for each category including all independent
variables. In
addition, we calculated three discriminant analyses, one for
each category
including the independent variables of the maximised neural
networks of each
category.
4.
Results
4.1 Descriptive
data
The final sample
included 786 university
students from several disciplines (Psychology, Engineering,
Medicine, Law,
Social Communication, Business and Marketing), in three
private universities,
during the 2009-2011 academic years.
Descriptive
statistics of the cognitive
variables and learning strategies are presented in Table 1
(cognitive
variables) and Table 2 (learning strategies).
Table 1
Descriptive Statistics for
Attentional Networks, General Reaction Time, Working Memory
Capacity (Absolute
Aospan) and Reaction Time Operation
| | Alerting Attention | Orienting Attention | Executive Control | Ln of Attention Total RT | Absolute Aospan (Sum of perfectly recalled sets) | Ln RT Operation |
| N | 786 | 786 | 786 | 786 | 786 | 786 |
| Mean | 34.40 | 44.01 | 102.54 | 6.20 | 27.88 | 7.01 |
| SD | 22.14 | 22.90 | 41.68 | .11 | 14.83 | .20 |
| Skewness | .25 | .24 | 3.31 | .67 | .25 | .46 |
| Kurtosis | 1.96 | 5.01 | 26.14 | .98 | -.510 | .45 |
| Minimum | -78.00 | -77.67 | 19.00 | 5.92 | 0 | 6.50 |
| Maximum | 123.83 | 213.83 | 558.00 | 6.74 | 68 | 7.75 |
Note: Ln of Attention Total RT: Logarithm of Attention Total Reaction Time (measure of Attention Network Test).
Ln RT Operation: Logarithm of Reaction Time Operation (measure of AOSPAN).
Table 2
Descriptive
Statistics
for Each Factor of Learning Strategies (LASSI)
| | Cognitive resources/Cognitive processing | Time Management | Processing of information/Generalization | Anxiety Management | Study Techniques and use of help |
| N | 756 | 756 | 756 | 756 | 756 |
| Mean | -.02 | .00 | .01 | .00 | -.01 |
| SD | 1.09 | 1.12 | 1.11 | 1.15 | 1.14 |
| Skewness | .24 | .18 | -.37 | .35 | -.67 |
| Kurtosis | -.16 | -.21 | -.07 | -.41 | -.03 |
| Minimum | -2.87 | -2.86 | -4.61 | -2.53 | -4.24 |
| Maximum | 3.85 | 3.30 | 2.56 | 3.57 | 2.22 |
4.2 Neural
network
analyses
ANN1
was designed to predict the
performance group corresponding to the lowest 33% of predicted
GPA. It included
82.4% of the participants (n = 632) in the training phase and
17.6% (n = 111)
in the testing phase. After training, ANN1
(predicting the group with
the lowest 33% of academic performance) was able to reach 100%
correct
identification of the students that belong to the target group
(Lowest 33%)
(see Figure 1). The precision of ANN1 equalled 1 out of
a maximum of 1.
The sensitivity of the network equalled 1, and the specificity
(defined as the
proportion of correctly rejected non-targets out of all the cases
that should have
been rejected by the system) was equal to 1. The area under
the curve equalled
.877.
| | Prediction of academic performance: 33% Lowest (target group) | Prediction of academic performance: Others |
| Observed academic performance: 33% Lowest (target group) | 100% | 0% |
| Observed academic performance: Others | 0% | 100% |
Figure 1. Testing Phase of the Neural Network Predicting the Lowest 33% of Academic Performance Scores. (see pdf file)
Tables 3-5 show the
predictive weights of the variables that the ANNs used in the
prediction of
future academic performance for each of the groups (Low 33%,
High 33% and the
whole sample). The
“Importance” column
can be interpreted as the actual predictive weight of each
variable, and the
“Normalized Importance” column represents the percent of
predictive weight for
each variable (in each group’s analysis) with respect to the
variable with the
greatest predictive weight for the group in question, which is
assigned 100%. Table
6 summarizes the actual predictive
weights of the variables, grouped by construct: Background
variables (i.e.,
parents’ education, parents’ occupation, type of secondary
school), Basic
Cognitive variables (i.e., working memory capacity,
attentional networks),
Reaction time variables (i.e., operations, attentional), and
Learning
Strategies/Motivation variables (i.e., study techniques, time
management,
anxiety management). It allows an easier comparison of the
sources of
predictive weights by area between the various student groups
and also for the
total sample.
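The relationship between the two columns can be sketched as follows: each importance value is divided by the largest importance value in the same analysis and expressed as a percentage. The three values are the top importance values reported for the Low 33% network; because the published values are rounded to three decimals, recomputing from them reproduces the reported percentages only approximately. The function name is hypothetical.

```python
def normalized_importance(importance):
    """Express each importance as a percentage of the largest importance."""
    top = max(importance.values())
    return {name: 100.0 * value / top for name, value in importance.items()}


# Rounded importance values for the three top predictors of the Low 33% network.
low_group = {
    "Cognitive resources/Cognitive processing": 0.092,
    "Ln Reaction Time Math": 0.083,
    "Time Management": 0.080,
}
norm = normalized_importance(low_group)
```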
Table 3 shows the
actual predictive weight of
each input, and the normalised importance of the different
variables for the
ANN1 predictive classification. These results
indicate that the learning
strategies regarding cognitive processes, reaction time (RT),
and time
management were the most important predictors. All reaction
times are converted
to natural logarithms (Ln) of the actual RT.
Table 3
Relative
Importance of
the Most Predictive Variables included in the Model for the
Predictive
Classification of the Lowest 33% of Scores in Academic
Performance
Low 33% Group: Independent Variable Importance
| Variables | Importance | Normalized Importance |
| Cognitive resources/Cognitive processing | 0.092 | 100.00% |
| Ln Reaction Time Math | 0.083 | 90.80% |
| Time Management | 0.080 | 87.30% |
| Secondary school from which the student graduated | 0.066 | 71.50% |
| Father's occupation | 0.065 | 70.90% |
| Executive Control | 0.062 | 67.60% |
| Mother's occupation | 0.058 | 63.70% |
| Ln Reaction Time Problem | 0.058 | 62.80% |
| Absolute Aospan (Sum of perfectly recalled sets) | 0.055 | 60.50% |
| Anxiety Management | 0.051 | 55.40% |
| Alerting Attention | 0.050 | 54.40% |
| Ln Reaction Time Operation | 0.048 | 52.40% |
| Orienting Attention | 0.048 | 52.10% |
| Study Techniques and use of help | 0.046 | 51.70% |
| Processing of information/Generalization | 0.043 | 46.50% |
| Gender | 0.040 | 43.70% |
| Highest level of education completed by mother | 0.030 | 32.60% |
| Highest level of education completed by father | 0.025 | 27.10% |
ANN2
was designed to predict the
performance group corresponding to the highest 33% of predicted
GPA. It included
77.9% of the students in the training phase (n = 614) and 22.1% in the testing phase (n = 136). After training, ANN2 reached
an accuracy of 100%
(see Figure 2). The precision of ANN2 equalled 1
out of a maximum of
1. The sensitivity of the network equalled 1, and the
specificity amounted to
1. The area under the curve equalled .788.
| | Prediction of academic performance: 33% Highest (target group) | Prediction of academic performance: Others |
| Observed academic performance: 33% Highest (target group) | 100% | 0% |
| Observed academic performance: Others | 0% | 100% |
Figure 2. Testing Phase of the Neural Network Predicting the Highest 33% of Academic Performance Scores. (see pdf file)
The most important variables
for the prediction of ANN2 (High 33%) were reaction
time, mother’s
occupation, type of secondary school, father’s occupation and
executive control
(executive attention measure) (see Table 4).
Table 4
Relative Importance of the
Most Predictive Variables included in the Model for the
Predictive
Classification of the Highest 33% of Scores in Academic
Performance
High 33% Group: Independent Variable Importance
| Variables | Importance | Normalized Importance |
| Ln of Reaction Time Operation | 0.084 | 100.00% |
| Mother's occupation | 0.081 | 97.10% |
| Secondary school from which the student graduated | 0.081 | 96.10% |
| Father's occupation | 0.076 | 90.10% |
| Executive Control | 0.072 | 86.40% |
| Alerting Attention | 0.062 | 73.90% |
| Processing of information/Generalization | 0.055 | 65.10% |
| Orienting Attention | 0.054 | 64.10% |
| Study Techniques and use of help | 0.053 | 62.30% |
| Highest level of education completed by father | 0.051 | 60.70% |
| Ln of Reaction Time Math | 0.049 | 58.50% |
| Anxiety Management | 0.047 | 55.60% |
| Highest level of education completed by mother | 0.044 | 52.80% |
| Absolute Aospan (Sum of perfectly recalled sets) | 0.044 | 52.70% |
| Time Management | 0.044 | 52.20% |
| Cognitive resources/Cognitive processing | 0.037 | 44.70% |
| Ln of Reaction Time Problem | 0.033 | 39.90% |
| Gender | 0.033 | 39.60% |
Both networks showed
interesting differences in the pattern of relative normalized
importance of
those variables with the highest participation in the
predictive model. For the
low performing group in terms of general GPA (those predicted
to be in the
lowest 33% of scores), several learning strategies related to
cognitive
processes, reaction time (WMC and attentional networks
functioning), and time
management were most important in providing predictive weights
for a correct
classification. On the other hand, in the predictive model for those
students expected to be in the highest 33% of the general GPA scores, the top
three predictors with the most significant participation were background
variables involving mother’s and father’s occupation, type of secondary school,
and overall reaction time of the cognitive and attentional processes.
ANN3,
which was designed to predict
the three GPA performance groups simultaneously, used 82.8% of
the students (n=710)
for the training phase, and 17.2%
(n=122) for the
testing phase. After maximizing
the training procedures, the accuracy in the testing phase
reached 87.5% for
the Lowest 33%, 100% for the Middle 33%, and 100% for the
Highest 33% (see
Figure 3). The precision of ANN3 equalled .875 on a
maximum
of 1. The sensitivity of the network equalled 1, and the
specificity amounted
to .50. The areas under the curve were .658 for the Low 33%,
.583 for the
Middle 33%, and .637 for the High 33%.
| Observed academic performance | Predicted: 33% Lowest | Predicted: Middle 33% | Predicted: 33% Highest |
|---|---|---|---|
| Low 33% | 87.5% | 10% | 2.5% |
| Middle 33% | 0% | 100% | 0% |
| High 33% | 0% | 0% | 100% |
The most
important variables for the prediction
of ANN3 were orienting attention, learning
strategies related to the
cognitive resources and information processing, time
management, and executive
control (executive attentional network) (see Table 5).
Table
5
Relative Importance of the Most
Predictive Variables included in the Model for the
Predictive Classification of
the Three Levels of Academic Performance
All 3 Groups - GPA (Low 33% - Mid 33% - High 33%): Independent Variable Importance

| Variables | Importance | Normalized Importance |
|---|---|---|
| Orienting Attention | 0.087 | 100.00% |
| Cognitive resources/Cognitive processing | 0.076 | 86.86% |
| Time Management | 0.074 | 84.92% |
| Executive Control | 0.073 | 83.30% |
| Father's occupation | 0.071 | 81.80% |
| Mother's occupation | 0.070 | 79.91% |
| Ln of Attention Total Reaction Time | 0.067 | 77.25% |
| Alerting Attention | 0.067 | 76.63% |
| Ln of Reaction Time Math | 0.061 | 70.14% |
| Processing of information/Generalization | 0.050 | 57.20% |
| Ln of Reaction Time Operation | 0.043 | 49.64% |
| Study Techniques and use of help | 0.041 | 46.55% |
| Ln of Reaction Time Problem | 0.040 | 46.13% |
| Anxiety Management | 0.038 | 43.89% |
| Gender | 0.032 | 36.67% |
| Highest level of education completed by father | 0.031 | 35.73% |
| Absolute Aospan (Sum of perfectly recalled sets) | 0.031 | 35.09% |
| Highest level of education completed by mother | 0.026 | 29.88% |
| Secondary school | 0.024 | 27.29% |
4.3 Maximizing the ANN models
All ANN models were developed
so as to maximize the accuracy of the classification. The
number of units in
the hidden layers was determined by optimizing the ability of
the hidden nodes
to store the necessary weight information, while avoiding the
over-determination that would result from an excessive number
of units. While a greater number of units would have given the model greater
flexibility, it would have increased complexity at the cost of decreasing
generalizability to the testing sample. Similarly, too few units would not have
produced a proper fit with the data and would have reduced the power of
the model. Therefore,
various models were developed in order to find the proper
balance and maximize
the predictive power for each model.
In
all models, the training and testing samples were
selected at random from the existing data and the proportions
were adjusted in
order to maximize the training sample while preserving the
appearance of all
detected patterns in the testing sample, so as to be able to
appropriately test
the model. Other
parameters that were
varied in order to maximize the performance of the networks
were learning rate
and momentum. The variations in the learning rate parameter
allowed the control
of the amount of weight and bias change during the training of
the network.
Different problem conditions find better solutions with different sizes of
changes in the architecture of the network.
Momentum was used to prevent the network from converging
too early to a local minimum and, conversely, to avoid
overshooting the global minimum of the function; thus, it is important to avoid
a momentum value that is too large (it can overshoot) or too low (the network
can get stuck in a local minimum). Balancing these parameters maximizes the
quality of the solution and, if they are correctly identified, provides stable
and reliable solutions such as the ones found in this study.
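The roles of the learning rate and momentum can be illustrated on a toy problem. The sketch below is not the network used in this study: it applies gradient descent with classical momentum to a hypothetical one-parameter error function, E(w) = (w - 3)², and all names and parameter values are assumptions chosen for illustration.

```python
def minimize(grad, w0, learning_rate, momentum, steps=200):
    """Gradient descent with classical momentum on a one-parameter function."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        # The velocity blends the previous update (momentum) with the
        # current gradient step (scaled by the learning rate).
        velocity = momentum * velocity - learning_rate * grad(w)
        w += velocity
    return w


def grad(w):
    """Gradient of the hypothetical error surface E(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


w_plain = minimize(grad, w0=0.0, learning_rate=0.1, momentum=0.0)
w_momentum = minimize(grad, w0=0.0, learning_rate=0.1, momentum=0.5)
# A learning rate that is too large makes every step overshoot the minimum
w_overshoot = minimize(grad, w0=0.0, learning_rate=1.1, momentum=0.0)

print(w_plain, w_momentum, w_overshoot)
```

With moderate settings both runs settle at the minimum (w = 3), whereas the run with the oversized learning rate moves further away at every step. On a surface with several minima, which this toy function does not have, the momentum term is what helps the search carry past shallow local minima.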
4.4 Predictive contribution by categories of variables
Besides studying the
contribution of each variable individually for each neural
network developed to
classify the various expected performance levels (low
performers, high
performers, and three performance groups simultaneously), the
contribution of
each category or set of variables (background, basic cognitive
processes, total
reaction times for WMC operations and attentional networks,
and learning
strategies/motivation) was analysed for each ANN developed,
and the total
predictive weight for each category of variables, as well as
their average, was
determined. Table 6 and Figure 4 show that, in terms of predictive weight, the
most important variables when estimating the levels of predicted GPA
performance for all three groups simultaneously are the background factors
(e.g., socio-economic status proxy data, type of secondary school, occupation
and education of parents, etc.). When comparing the two extreme predicted
performance groups, however, specific patterns involving different variables
are evident for low and high expected academic performance:
learning strategies/motivation had a stronger predictive weight for students
expected to be in the lowest 33% of GPA performance; on the other hand, for
students predicted to belong to the highest 33% of GPA performance, background
variables and some of the cognitive processing variables were those carrying
the most predictive weight.
Table 6
Comparative
Predictive
Weight Contribution for the Three Levels of Academic
Performance by
each of the Categories of Predictor Variables
| Category | Low 33% | Mid 33% | High 33% | Mean Predictive Weight of Each Area |
|---|---|---|---|---|
| Background | 28.40% | 25.40% | 36.60% | 30.13% |
| Basic Cognitive | 21.50% | 25.70% | 23.20% | 23.47% |
| Reaction Time total | 18.90% | 21.10% | 16.60% | 18.87% |
| Learning Strategies/Motivation | 31.20% | 27.80% | 23.60% | 27.53% |
| Total | 100% | 100% | 100% | |
Figure
4. Comparison of
Predictive Weight Levels for the
Three Levels of Academic Performance by Categories of
Predictor Variables. (see pdf file)
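The mean predictive weight in the last column of Table 6 is the simple average of the three preceding columns, and each column sums to 100%. A minimal sketch that recomputes both from the values printed in Table 6 (variable names are illustrative):

```python
# Predictive weights (%) as printed in Table 6, in column order Low, Mid, High
weights = {
    "Background": (28.4, 25.4, 36.6),
    "Basic Cognitive": (21.5, 25.7, 23.2),
    "Reaction Time total": (18.9, 21.1, 16.6),
    "Learning Strategies/Motivation": (31.2, 27.8, 23.6),
}

# Mean weight of each category across the three columns
mean_weight = {cat: round(sum(vals) / len(vals), 2) for cat, vals in weights.items()}

# Total weight per column (each should account for 100% of the prediction)
column_totals = [round(sum(col), 1) for col in zip(*weights.values())]

print(mean_weight)
print(column_totals)
```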
4.5 Initial analysis of individual continuous estimates of future academic performance
While most of this study has been
centered around the successful development of models to
categorize expected
levels of performance (which can be varied according to the
problem situation),
it is also important and useful to demonstrate that this
machine learning
approach can be used to predict individual specific outcomes
(not just
relatively broad performance categories). Although these performance categories
can be very useful, as has been indicated for the identification of and possible
intervention in specific groups of high achievers or low achievers (e.g.,
learning disabilities, non-readiness for some specific task such as reading),
and they can be used very effectively for targeted interventions in learning
situations, it is also important to be able to understand the underlying
phenomenon at the individual level, considering performance as a continuous
variable.
For this reason, the predicted
GPA-category (low-middle-high) probability values assigned by
the network to
each individual student were used to analyze their correlation
with the
observed GPA, as compared to the predicted value, in the
context of the ANN3
model, in which the whole sample of students was
simultaneously
classified in the three levels of expected performance. That
is, the probability
value for each student of belonging to a given category (all
students received
a certain probability of belonging to each of the outcome
groups, as determined
by the ANN), was correlated with the GPA actually obtained by
each student.
Results were indicative of a high degree of correlation
between those measures.
The
three predicted groups of Low, Mid, and High performance had
an actual observed
GPA mean of 3.88 (SD = 1.21, n = 327), 5.67 (SD = .33, n =
243), and 7.28 (SD =
.78, n = 294), respectively. All these GPA means were significantly
different from each other (p < .001). Within each one of
the performance
levels, the correlation of the ANN individual predicted value
with the actual
GPA was: Low 33%, r = .78; High 33%, r = .73, and for the
whole sample of
students, at all three levels, the correlation of the ANN
predicted values with
the observed GPA was r = .86. Further studies will continue to
explore these
individual relationships, but as they are, they confirm a high
level of
correlation between the actual GPA and the expected values
assigned by the ANN.
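The analysis described above amounts to a Pearson correlation between the probability the network assigns to each student and the GPA that student later obtained. The sketch below shows the computation on six hypothetical students; the values are invented for illustration and are not the study's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical network probabilities of belonging to the high-performance group
prob_high = [0.05, 0.10, 0.35, 0.50, 0.80, 0.95]
# Hypothetical GPA values observed one year later for the same students
observed_gpa = [3.2, 4.1, 5.5, 5.8, 7.0, 7.9]

r = pearson_r(prob_high, observed_gpa)
print(f"r = {r:.2f}")
```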
4.6 Discriminant analyses (DA)
DA1 focused on the
attempted predictive classification of students expected to be
in the lowest
33% of GPA average, compared to the rest of the students. One
of the
restrictions of this analysis has to do with the assumption of
equality of
covariance matrices, which in this case is not violated (Box’s
M = 5.253, F = .871, p = .515).
Gender, WMC and cognitive resources/learning strategies were able to
discriminate between the two groups of students, but the rest of the
variables that were included in ANN1 were not. The squared canonical
correlation (CR²) gives the amount of variation between the groups that is
explained by the discriminating variables, which in this case was quite low
(Wilks’ λ = .896, χ² = 84.786, df = 3, p = .001, CR² = .323).
DA2 was carried out
to attempt to discriminate between students expected to be in the highest 33%
of GPA average and the remaining 67% of the students. The same
independent variables that were used in ANN2 were entered in
this analysis. Results show that the independent variables were not able to
discriminate between the two groups of students. The Box’s M statistic is not
significant (Box’s M = 11.813, F = .781, p = .700), meaning that the assumption
of equality of covariance matrices is not violated. In this analysis the
squared canonical correlation indicated that the strength of the function is
very low (Wilks’ λ = .926, χ² = 58.694, df = 5, p = .001, CR² = .271).
Only gender, highest level of education of the father, WMC, and cognitive
resources and time management among the learning strategies set were
variables that entered significantly in this model.
DA3 was carried out with
the same variables as those used to develop ANN3,
in order to
predict the expected GPA performance level of the three groups
of academic
performance simultaneously. The
assumption of equality of covariance matrices was not violated
(Box’s M = 7.522, F = .623, p
= .824). In
this case, only gender, cognitive resources within the
learning strategies set
and WMC were significant for the model, and participated in
the discrimination
between the students in the three groups. However, the model explained a very
low and non-significant proportion of the variance (Wilks’ λ =
.998, χ² = 1.791, df = 2, p = .408, CR² = .048).
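As a reading aid for these statistics: in a two-group discriminant analysis there is a single discriminant function, and Wilks’ λ equals one minus the squared canonical correlation. The sketch below applies that standard relation to the rounded λ values reported for DA1 and DA2; the variable names are illustrative.

```python
from math import sqrt

# Wilks' lambda values as reported for the two-group analyses
wilks_lambda = {"DA1": 0.896, "DA2": 0.926}

results = {}
for name, lam in wilks_lambda.items():
    r_squared = 1.0 - lam   # squared canonical correlation (variation explained)
    r = sqrt(r_squared)     # canonical correlation
    results[name] = (round(r_squared, 3), round(r, 3))
    print(f"{name}: squared canonical correlation = {r_squared:.3f}, "
          f"canonical correlation = {r:.3f}")
```

Under this relation, the between-group variation explained would be roughly .10 for DA1 and .07 for DA2. The values printed as CR² (.323 and .271) are numerically close to the unsquared canonical correlation. On either reading, the discriminant functions are weak, which is consistent with the conclusion drawn in the text.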
5. Discussion and conclusions
The purpose of this
study was to show the
applicability and the effectiveness of the ANN approach to the
predictive
classification of students in the full range of academic
performance (GPA), as
well as to identify and understand the importance of the
variables for each
level (low, middle and high) of expected GPA. This methodology,
using a predictive system, was chosen because it is very effective with
large amounts of very complex data, in which a large number of
variables interact in various complex and not very well understood patterns.
The results
attained in this study have allowed
the identification of the specific influence of each input set
of variables on
different levels of academic performance (high and low
performance), on one
hand, and common processes across all students, on the other
hand. One
important contribution of this predictive approach is the
finding that the same
variables have different effects in each group of students,
defining specific
patterns for each performance level. Although the contribution
of each variable
in a particular pattern carries a relatively small predictive
weight, it is the
combined effect of the pattern of variables which explains a
lower or higher
academic performance model.
Among the student
group with the lowest 33% of
academic performance, two main predictors are learning
strategies components
(cognitive resources/cognitive processing and time management).
The importance
of learning strategies as a mediating factor in a model
predicting academic
performance has been shown in different studies (Dupeyrat &
Marine, 2005; Fenollar et al., 2007; Simons et al., 2004; Weinstein & Mayer, 1986;
Weinstein et
al., 1987; Weinstein et al., 1982). However, this study added
the contribution
of a complex pattern of variables for a particular group of
students,
identifying specific learning strategies that help the
classification of
students in a low performance group (i.e., thoughts or
behaviours that help to
use imagery, verbal elaboration, organization strategies, and
reasoning
skills). Included in this set are learning strategies that help
build bridges
between what they already know, and what they are trying to
learn and remember
(i.e., knowledge acquisition, retention, and future
application). In addition,
variables related to speed of processing involved in WMC
functioning have an
important predictive weight for the determination and modelling
of the low
performance group. Other studies that have used ANN have also
found that basic
cognitive processing variables such as WMC and Executive
Attention carried the
most predictive weight in the low performance group of students
(Kyndt et al.,
2012, submitted; Musso & Cascallar, 2009a; Musso et al.,
2012). Moreover,
the literature has indicated the positive association between
WMC and academic
achievement (Gathercole, Pickering,
Knight, & Stegmann, 2004; Riding, Grimley,
Dahraei, & Banner, 2003). Regarding the relative importance
of each
variable, if we compare the relative role of WMC and other
cognitive resources
between the low and high performance groups, WMC and cognitive
resources were
far more important for lower GPA students. The fact that their importance for
the prediction is much greater for the lower performing group is largely due to
the fact that all members of the high group had higher levels of WMC and
cognitive resources, so these variables did not provide discriminating
information to the network. On the other hand, consistently lower values of WMC
and cognitive resources were an identifying characteristic of the low
performing group. Remediation programmes, tutorial systems and
instruction methods
should consider these specific learning strategies, cognitive
processing
characteristics and WMC resources, in order to provide basic
support to
students at risk. Such informed interventions would improve the
possibilities
of successful academic achievement for the at-risk groups,
including those with
particular learning difficulties.
Background
variables together with reaction
time measures and attentional executive control are the most
important
predictors for the highest academic performance group, as
indicators of both
efficiency in the processing and of adequate selection of
information. Social
background variables, such as educational level of the parents,
have been found
to be significant in a previous ANN study (Pinninghoff, Junemann
et al., 2007),
and these results have been replicated in this study. The
executive control
mechanism is responsible for resolving conflicts among responses
(Fan et al.,
2002). This attentional system has been closely related to
working memory
capacity (Redick & Engle, 2006), and was found to mediate
and compensate
WMC deficits for certain tasks (Musso et al., 2012). Other
attentional networks
seem to be much less discriminating among students who reach
certain threshold
levels needed for high academic performance. These findings have
significant
implications in the way that the learning process can be
addressed for students
identified as potential high achievers.
For this group, promoting learning through the use of
metacognitive
strategies, complex processing, and targeted teacher feedback
would be an
important way of maximizing their potential performance.
Regarding
methodological implications, these
results demonstrate the greater accuracy of the ANN approach
compared to other
traditional methods such as DA. Other studies have also made use
of multilayer
perceptron artificial neural networks, with positive results for
the analysis
of educational data (Abu Naser, 2012; Croy et al., 2008; Fong et al., 2009;
Kanakana & Olanrewaju, 2011; Mukta & Usha, 2009; Ramaswami &
Bhaskaran, 2010; Zambrano Matamala et al., 2011). However, the
present study has been able to
maximize the precision obtained in the predictive classification
of overall
academic performance through the careful adjustment of network
parameters and
algorithms, producing highly accurate results with minimal
misclassifications.
Similarly, the
initial study of the correlation
between the ANN probabilities of performance level assigned to
each individual
student, with the actual GPA observed, shows a significant
degree of
correlation between the two measures (r = .86 for the whole
sample), with
performance as a continuous variable.
Further studies will refine the technique to maximize
these individual
results.
The results of the
DA confirm the lack of
significant linear relationships between the independent
variables analysed in
this study and academic performance. Neural network models have
an important
advantage in this respect, as they are able to model nonlinear
and complex
relationships among variables with greater precision and
accuracy. Even though
the assumptions required for traditional statistical predictive
models (e.g.
equality of covariance matrices) were not violated for the three
stepwise
discriminant analyses that were performed, the amount of
variance explained was
low in all three DA analyses. None of these analyses were able
to discriminate
with sufficient accuracy between the different levels of
expected academic
performance. When we compare these results with the ANNs
modelled in this
study, it can be concluded that ANNs are much more robust, and
perform
significantly better than other classical techniques, as prior
studies have
also indicated (Everson et al., 1994; Marquez et al., 1991).
This study has shown
the power of this predictive approach using ANNs to model future
overall academic performance in
higher education, specifically in academic admissions and/or
placement. To put
the current results in perspective, if we consider one of the
best known and
most reliable tests currently in use, the SAT from The College
Board, it has
been found (Kobrin, Patterson, Shaw, Mattern, & Barbuti,
2008) that all
sections of the SAT taken together, even with the more recent
addition of a
writing score, can predict at best 28% of the variance of the
first-year
college GPA for the average population of students. If we add to
the SAT
results the information of the GPA obtained in secondary
education, the overall
prediction is only 38% of the variance of first-year college
GPA (Kobrin et al., 2008). With the current ANN models, it has been possible to
correctly classify 100% of student performance in the categories examined,
that is, 100% of the students were correctly classified. Our research
currently continues into the development of new predictive models, with much
larger data sets, to classify students in much narrower bands of expected
performance, having already attained 98-99% accuracy in models for quintiles
of student performance
distributions. In addition, work will also continue for the
prediction of
specific expected GPA results for each individual student.
In
conclusion, the current predictive systems
approach facilitates and maximizes the identification of those
factors (or
predictors) of the learning processes which participate in
varying degrees in
the modelling of different levels of performance in academic
outcomes in higher
education. If we can identify specific profiles of students,
focusing on the
most important variables, this opens major possibilities for the
improvement of
assessment procedures and the planning of pre-emptive
interventions. Given that
this methodology allows for the accurate prediction of actual
academic
performance at least one academic year in advance of it actually
being measured
(GPA), it has implications for the application of these methods
in educational
research and in the implementation of diagnostic “early-warning”
programmes in
educational settings. These results also inform cognitive theory
and help in
the development of improved automated tutoring and learning
systems. Although
some of the variables involved, such as educational level of the
parents,
are impossible to alter in their effects on academic performance
at the time of
the assessment, they do inform policy and indicate the weight
that many social and environmental factors carry in influencing future
academic performance.
This
methodological and conceptual approach allows us to consider a
large number of
variables simultaneously and select those which are most
relevant and allow a
greater degree of intervention to improve student performance,
including early
intervention programmes for students in need of special support.
The capacity to
very accurately classify
expected student performance, which is also what tests attempt
to do, without
the performance sampling issues of traditional testing, and
using a much
broader spectrum of all factors influencing a student’s overall
performance, is
a major advantage of the ANN methodology. In fact, it also
represents a more
valid approach to educational assessment due to its overall
accuracy and the
breadth of the constructs considered to classify the expected
performance.
Traditional assessments are not sufficient for more complex
assessments or for
assessment systems that intend to serve multiple direct and
indirect purposes,
in complex educational situations (Mislevy, 2013; Mislevy,
Steinberg, & Almond, 2003). In this respect, this new
approach allows for the conceptualization and development of new
modes of
assessment which could facilitate breaking away from traditional
forms of
testing while at the same time improving the quality of the
assessment process
(Segers, Dochy & Cascallar, 2003).
Finally, the use of ANN together with other
methods such as cluster analyses and Kohonen networks could
contribute to the study of the specific patterns of those variables which
influence the learning process for each level of performance. In fact, a major
observation resulting from the data in this study is that variables contribute
to the prediction in relatively small proportions, and it is the joint effect
of many contributing variables that could cause significant changes in
performance. In other words, there is no “magic bullet”; rather, it is the
accumulation of effects from all these
various sources that produces significant changes in outcomes.
These results
provide an insight into learning questions from a different
perspective and one
that has important implications for educational policy and
education at large.
References
Abu
Naser, S. S. (2012). Predicting
learners
performance using artificial neural networks in linear
programming
intelligent tutoring system. International
Journal of Artificial Intelligence & Applications (IJAIA),
3(2), 65-73
Anderson,
J. R. (1983). The
architecture of
cognition. Cambridge, MA: Harvard University Press.
Anderson,
J. R. (1993). Rules of the mind. Hillsdale, NJ:
Lawrence Erlbaum
Associates.
Anderson,
J. R. (2002). Spanning seven orders of magnitude: A challenge
for cognitive
modeling. Cognitive
Science, 26,
85–112.
Anderson,
J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., & Qin, Y.
(2004). An integrated theory of the mind. Psychological
Review, 111(4), 1036–1060.
Baddeley,
A. D. (1986). Working Memory. Oxford: Clarendon Press.
Bansal,
A., Kauffman, R. J., & Weitz, R. R. (1993). Comparing the Modeling
Performance of Regression and Neural Networks as Data Quality Varies: a
Business Value Approach. Journal of
Management Information Systems, 10(1), 11-32.
Bekele,
R., & McPherson, M. (2011). A
Bayesian performance prediction model for mathematics education: A prototypical
approach for effective group composition. British Journal of
Educational Technology, 42(3), 395–416.
Biggs,
J. (1987). Study Process
Questionnaire
manual. Melbourne, Australia: Australian Council for
Educational Research.
Birenbaum, M., Breuer, K.,
Cascallar, E., Dochy, F., Dori, Y., Ridgway, J., Wiesemes, R., & Nickmans,
G. (Ed.) (2006). A
learning Integrated Assessment System. Educational Research
Review, 1, 61-67.
Boekaerts, M., &
Cascallar, E. (2006). How far
have we moved toward the integration of theory and
practice in Self-regulation? Educational Psychology Review,
18(3),
199-210.
Boekaerts, M. &
Cascallar, E. C. (2011). Predicting and Explaining Writing
Outcomes: Neural
Network Methodology at work. Symposium:
Predicting academic performance with the use of predictive
systems analysis. Proceedings
of the Biennial Conference of
the European Association for Research on Learning and
Instruction (Earli).
Exeter, UK, 30 August – 3 September 2011.
Braten,
I. & Stromso, H. (2006). Epistemological beliefs, interest,
and gender as
predictors of Internet-based learning activities. Computers
in Human Behavior,
22, 1027-1042.
Cascallar, E. C., Boekaerts,
M., & Costigan, T. E.
(2006). Assessment in the Evaluation of Self-Regulation as a
Process. Educational
Psychology Review, 18(3), 297-306.
Cascallar, E. C., &
Musso, M. F. (2008). Classificatory stream analysis in the
prediction of
expected reading readiness: Understanding student performance. International Journal of
Psychology,
Proceedings of the XXIX International Congress of Psychology
ICP 2008, 43(3/4),
231.
Castejón,
J. L., & Navas, L. (1992). Determinantes del rendimiento
académico en la
educación secundaria. Un modelo causal.
[Determinants of academic achievement in secondary
education. A
causal model]. Análisis y Modificación
de Conducta, 18(61),
697-728.
Cattell,
R. B. (1971). Abilities: Structure,
growth and action. Boston: Houghton
Mifflin.
Chamorro-Premuzic,
T., & Arteche, A. (2008). Intellectual
competence
and academic performance: preliminary validation of a model. Intelligence, 36, 564-573.
Colom,
R., Escorial, S., Chun Shih, P., & Privado, J. (2007). Fluid intelligence, memory
span, and temperament difficulties predict academic performance
of young
adolescents. Personality and
Individual Differences, 42, 1503-1514.
Conway,
A. R. A., Cowan, N., Bunting, M. F., Therriault, D., &
Minkoff, S. (2002).
A latent variable analysis of working memory capacity, short
term memory
capacity, processing speed, and general fluid intelligence. Intelligence, 30, 163-183.
Conway,
A. R. A., & Engle, R.W. (1996). Individual differences in
working memory
capacity: More evidence for a general capacity theory. Memory, 4, 577-590.
Conway, A. R. A., Kane, M.
J., Bunting, M. F., Hambrick, D. Z., Wilhelm, O., & Engle,
R. W.
(2005). Working memory span tasks: A methodological review and
user’s guide. Psychonomic Bulletin
& Review, 12(5), 769-786.
Croy, M., Barnes, T., &
Stamper, J. (2008). Towards an intelligent tutoring system for
propositional
proof construction. In A. Briggle, K. Waelbers, and
P. Brey (Eds.), Computing and
Philosophy (pp.
145-215). Amsterdam, The Netherlands: IOS Press.
Daneman,
M., & Carpenter, P. A. (1980). Individual differences in
working memory and
reading. Journal of
Verbal Learning and
Verbal Behavior, 19, 450-466.
Duliba, K. A. (1991).
Contrasting Neural Nets with Regression in Predicting
Performance in the
Transportation Industry. Proceedings
of the
Twenty-Fourth Annual Hawaii International Conference on System
Sciences,
4.
Dupeyrat,
C., & Marine, C. (2005). Implicit
theories of intelligence, goal orientation,
cognitive engagement, and achievement: A test of Dweck's
model with returning
to school adults. Contemporary
Educational Psychology, 30(1),
43-59.
Engle,
R.W. (2002). Working memory capacity as executive attention. Current Directions in
Psychological Science,
11, 19-23.
Engle, R.W., & Kane, M.
J. (2004). Executive attention, working memory capacity, and a
two-factor theory
of cognitive control. In B. Ross (Ed.), The
Psychology of Learning and Motivation (pp. 145-199).
New York, NY: Elsevier.
Eriksen,
B. A., & Eriksen, C.W. (1974). Effects of noise letters upon the
identification of a target letter
in a non search task. Perception
and
Psychophysics, 16, 143-149.
Everson, H. T. (1995).
Modelling the student in intelligent tutoring systems: The
promise of a new
psychometrics. Instructional
Science, 23(5-6),
433-452.
Everson,
H. T., Chance, D., & Lykins, S. (1994). Exploring the use of
artificial
neural networks in educational research. Paper
presented at the annual meeting of the American Educational
Research
Association, New York.
Fan, J., McCandliss, B. D.,
Sommer, T., Raz, A., & Posner, M.I. (2002). Testing the
efficiency and
independence of attentional networks. Journal
of Cognitive Neuroscience, 14(3), 340-347.
Feldman
Barrett, L., Tugade, M. M., & Engle, R. W. (2004).
Individual differences
in working memory capacity and dual-process theories of mind. Psychological Bulletin,
130, 553-573.
Fenollar,
P., Roman, S., & Cuestas, P. J. (2007). University
students’ academic performance: An
integrative conceptual framework and empirical analysis. British Journal of Educational Psychology, 77,
873-891.
Fernandez-Castillo, A.,
& Gutiérrez-Rojas, M. E.
(2009). Selective attention, anxiety, depressive symptomatology
and academic
performance in adolescents. Electronic
Journal of Research in Educational Psychology, 7(1), 49-76.
Fletcher, J. M. (2005).
Predicting math outcomes: Reading predictors and comorbidity. Journal of Learning
Disabilities, 38(4),
308-312.
Fong,
S., Si, Y.-W., & Biuk-Aghai, R. P. (2009). Applying a Hybrid
Model of Neural Network and
Decision Tree Classifier for Predicting University Admission. Proceedings of the 7th
International
Conference on Information, Communication, and Signal
Processing (ICICS2009),
pp. 1-5, Macau, China, IEEE Press.
Garson, G. D. (1998). Neural Networks. An
Introductory Guide for
Social Scientists. London: Sage Publications Ltd.
Gathercole,
S. E., Pickering, S. J., Knight, C., & Stegmann, Z.
(2004). Working memory
skills and educational attainment: Evidence from national
curriculum
assessments at 7 and 14 years of age. Applied
Cognitive Psychology, 18, 1-16.
Gazzaniga,
M., Ivry, R., & Mangun, G. (2002). Cognitive
neuroscience:
The biology of the mind (2nd ed.). New York, NY: W.W.
Norton.
Grimley,
M., & Banner, G. (2008). Working memory, cognitive style, and
behavioural
predictors of GCSE exam success. Educational
Psychology, 28(3),
341-351.
Grossberg,
S. (1980). How does the brain build a cognitive code? Psychological Review, 87, 1-51.
Grossberg, S. (1982). Studies of mind and brain: Neural principles of
learning, perception,
development, cognition and motor control. Boston: Reidel
Press.
Gsanger, K. W., Homack,
S., Siekierski, B., &
Riccio, C. (2002). The relation of memory and attention to
academic achievement
in children. Archives of
Clinical
Neuropsychology, 17(8), 790.
Hailikari,
T., Nevgi, A., & Komulainen, E. (2008). Academic
self-beliefs and prior
knowledge as predictors of student achievement in Mathematics: a
structural
model. Educational
Psychology, 28(1),
59-71.
Hazy,
T. E., Frank, M. J., & O’Reilly, R. C. (2006). Banishing
the homunculus:
Making working memory work. Neuroscience,
139, 105–118.
Heitz, R. P.,
Redick, T. S.,
Hambrick, D. Z., Kane, M. J., Conway, A. R. A., & Engle,
R. W. (2006). Working
memory, executive
function, and general fluid intelligence are not the
same. Behavioral and Brain
Sciences, 29, 135-136.
Jarrold, C., & Towse, J.
N. (2006). Individual differences in working memory. Neuroscience, 139, 39-50.
Jimerson,
S. R., Durbrow, E. H., Adam, E., Gunnar, M., & Bozoky, I. K.
(2006). Associations among academic achievement, attention, and
adrenocortical
reactivity in Caribbean village children. Canadian
Journal of School Psychology, 21, 120-138.
Kanakana, G., &
Olanrewaju, A. (2011). Predicting student performance in
engineering education
using an artificial neural network at Tshwane University of
Technology. Proceedings
of the ISEM, Stellenbosch,
South Africa.
Kane, M. J., Hambrick, D. Z.,
Tuholski, S.W., Wilhelm, O., Payne, T.W., & Engle, R.W.
(2004). The
generality of working memory capacity: A latent variable
approach to verbal and
visuospatial memory span and reasoning. Journal
of Experimental Psychology: General, 133, 189-217.
Kent,
R. (2009). Rethinking data analysis – part two. Some
alternatives to
frequentist approaches. International Journal of Market
Research, 51,
181-202.
Kobrin,
J. L., Patterson, B. F., Shaw,
E.
J., Mattern, K. D., & Barbuti, S. M. (2008). Validity of the
SAT for
predicting first-year college grade point average. College Board Research Report 2008-5. New York: The
College Board.
Retrieved from http://research.collegeboard.org/rr2008-5.pdf.
Kohavi,
R., & Provost, F. (1998). Glossary of terms. Machine Learning, 30(2–3), 271–274.
Kuncel, N. R., Crede, M.,
Thomas, L. L., Klieger, D.M., Seiler, S.N., & Woo, S.E.
(2004). A
meta-analysis of the Pharmacy College Admission Test (PCAT) and
grade
predictors of pharmacy student success. Annual
conference of the American Psychological Society, Chicago,
IL.
Krumm, S.,
Ziegler, M., & Buehner, M. (2008). Reasoning and working memory as
predictors of
school grades. Learning and Individual Differences,
18(2), 248-257.
Kyllonen,
P. C., & Christal, R. E. (1990). Reasoning ability is
(little more than)
working-memory capacity?! Intelligence,
14, 389-433.
Kyllonen,
P. C., & Stephens, D. L. (1990). Cognitive abilities as
determinants of
success in acquiring logic skill. Learning
and Individual Differences, 2, 129-160.
Kyndt, E.,
Cascallar, E., & Dochy, F.
(2012). Individual differences in working memory capacity and
attention, and
their relationship with students’ approaches to learning. Higher Education, 64(3),
285-297.
Kyndt, E., Musso,
M., Cascallar, E., &
Dochy, F. (2012, submitted). Predicting
academic performance: The
role of cognition, motivation and learning approaches. A neural
network
analysis. Journal of Further and
Higher Education.
Landerl, K. (2010). Temporal
processing, attention, and learning disorders. Learning & Individual Differences, 20(5), 393-401.
Linn,
R. L., & Hastings, C. N. (1984). A meta-analysis of the
validity of
predictors of performance in law school. Journal
of Educational Measurement, 21, 245-259.
Lippman, R. (1987). An
introduction to computing with
neural nets. IEEE ASSP
Magazine, 3(4),
4-22.
Lovett, M. W. (1979). The
selective encoding of sentential information in normal reading
development. Child
Development, 50(3),
897.
Lykins,
S., & Chance, D. (1992). Comparing artificial neural
networks and multiple
regression for predictive application. Proceedings of the
Eighth Annual
Conference on Applied Mathematics, Edmond, OK, 155-169.
Marquez, L., Hill, T.,
Worthley, R., & Remus, W.
(1991). Neural network models as an alternative to regression. Proceedings of the IEEE
24th Annual Hawaii
International Conference on Systems Sciences, 4, 129-135.
Marshall, D. B., &
English, D. J. (2000). Neural network modelling of risk
assessment in child
protective services. Psychological
Methods,
5(1), 102-124.
Maucieri, L. P. (2003).
Predicting behavior with an artificial neural network: A
comparison with linear
models of prediction (January 1, 2003). ETD Collection for
Fordham
University, NY, USA. Retrieved
from
http://fordham.bepress.com/dissertations/AAI3098134.
Mavrovouniotis, M. L.
& Chang, S.
(1992). Hierarchical neural networks. Computers
& Chemical Engineering, 16(4), 347-369.
Miñano, P., Gilar, R., &
Castejón, J. L. (2012). A structural model of
cognitive-motivational variables
as explanatory factors of academic achievement in Spanish
Language and
Mathematics. Anales de Psicología,
28(1), 45-54.
Mislevy, R. J. (2013).
Measurement is a necessary but
not sufficient frame for assessment. Measurement,
11, 47–49.
Mislevy, R. J., Steinberg,
L. S., & Almond, R. A.
(2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and
Perspectives, 1,
3–67.
Mukta,
P., & Usha, A. (2009). A study
of academic performance of business school graduates using
neural network and
statistical techniques. Expert
Systems
with Applications, 36(4), 7865-7872.
Musso, M. F., &
Cascallar, E. C. (2009a). New approaches for improved quality in
educational assessments:
Using automated predictive systems in reading and mathematics. Journal
of
Problems of Education in the 21st Century, 17,
134-151.
Musso, M. F., &
Cascallar, E. C. (2009b). Predictive systems using artificial
neural networks:
An introduction to concepts and applications in education and
social sciences.
In M. C. Richaud & J. E. Moreno (Eds.), Research
in Behavioural Sciences (Volume I, pp. 433-459). Argentina: CIIPME/CONICET.
Musso, M. F., Kyndt,
E., Cascallar, E. C., & Dochy, F. (2012). Predicting
mathematical
performance: The effect of cognitive processes and
self-regulation factors. Education
Research International, Vol.
12.
Nasr, G. E., Badr, E. A.,
& Joun, C. (2002). Cross Entropy Error Function In Neural
Networks:
Forecasting Gasoline Demand. FLAIRS-02
Proceedings of the AAAI. Retrieved from
http://www.aaai.org/Papers/FLAIRS/2002/FLAIRS02-075.pdf
Navas,
L., Sampascual, G., & Santed, M. A. (2003). Predicción de
las
calificaciones de los estudiantes: la capacidad explicativa de
la inteligencia
general y de la motivación. [Prediction
of students’ performance scores:
the role of the general
intelligence and motivation. Journal of
General and Applied Psychology], 56(2), 225-237.
Neal,
W., & Wurst, J. (2001). Advances in market segmentation. Marketing Research, 13(1),
14-18.
Passolunghi,
M. C., & Pazzaglia, F. (2004). Individual differences in
memory updating in
relation to arithmetic problem solving. Learning
and Individual Differences, 14(4), 219-230.
Perkins,
K., Gupta, L., & Tammana (1995).
Predicting item difficulty in a reading comprehension test
with an
artificial neural network. Language
Testing, 12(1), 34-53.
Pickering,
S. J. (2006). Working
memory and
education. USA: Academic Press.
Pinninghoff
Junemann, M. A., Salcedo Lagos, P. A., & Contreras
Arriagada, R.
(2007). Neural networks to predict schooling failure/success. In
J. Mira &
J. R. Álvarez (Eds.), IWINAC
2007, Part
II, LNCS 4528 (pp. 571–579). Berlin / Heidelberg:
Springer-Verlag.
Pintrich,
P. R. (2000). The role of goal orientation in self-regulated
learning. In M.
Boekaerts, P.R. Pintrich, & M. Zeidner (Eds.), Handbook of self-regulation (pp. 452–502). San
Diego, CA: Academic
Press.
Posner, M. I. (1980).
Orienting of attention. Quarterly
Journal
of Experimental Psychology, 41A, 19-45.
Posner,
M. I., & Petersen, S. E. (1990). The attention
system of the human brain. Annual Review of Neuroscience,
13, 25-42.
Posner, M. I., &
Rothbart, M. K.
(1998). Attention, self-regulation and
consciousness. Philosophical
Transactions
of the Royal Society of London. Series B, Biological Sciences,
353,
1915–1927.
Ramaswami, M.
M., & Bhaskaran,
R. R. (2010). A CHAID based performance prediction model in
educational data
mining. International Journal of Computer Science Issues,
7(1),
10-18.
Redick,
T. S., & Engle, R. W. (2006). Working memory capacity and
attention network
test performance. Applied
Cognitive
Psychology, 20, 713-721.
Reid, R. (2006).
Self-regulated strategy development for written expression with
students with
attention deficit/hyperactivity disorder. Exceptional
Children, 73(1),
53-67.
Riccio,
C. A., Lee, D., Romine, C., Cash, D., & Davis, B.
(2002). Relation of memory
and attention to academic achievement in adults. Archives of Clinical Neuropsychology, 18(7), 755-756.
Riding,
R. J., Grimley, M., Dahraei, H., & Banner, G.
(2003). Cognitive style,
working memory and learning behaviour and attainment in school
subjects. British Journal
of Educational Psychology,
73, 749-769.
Roth, P. L., Be Vier, C. A., Switzer, F. S., &
Schippmann, J. S.
(1996). Meta-analyzing the relationship between grades and job
performance. Journal of
Applied Psychology, 81,
548-556.
Roth,
P. L., & Clarke, R. L. (1998). Meta-analyzing the relation
between grades
and salary. Journal of
Vocational
Behavior, 53, 386-400.
Ruban, L. M., & McCoach, D.
B. (2005). Gender differences in explaining
grades using structural
equation modeling. The Review of Higher Education, 28, 475-502.
Rueda,
M. R., Posner, M. I., & Rothbart, M. K. (2004). Attentional
control and
self regulation. In R. F. Baumeister & K. D. Vohs (Eds.), Handbook of Self Regulation: Research, Theory, and
Applications, New
York: Guilford Press, 14: 283-300.
Rumelhart, D., Hinton, G.
& Williams, R. (1986).
Learning representations by back-propagating errors. Nature, 323, 533-536.
Rumelhart, D. E.,
McClelland, J. L., & the PDP
research group. (1986). Parallel distributed
processing: Explorations
in the microstructure of cognition. Volume I.
Cambridge, MA: MIT
Press.
Schmidt, F. L. (2002). The
role of general cognitive
ability and job performance: Why there cannot be a debate. Human Performance, 15, 187–210.
Segers, M., Dochy, F.,
& Cascallar, E. (2003). Optimizing
new modes of assessment: In
search of qualities and standards. The Netherlands: Kluwer
Academic
Publishers.
Simons,
J., Dewitte, S., & Lens, W. (2004). The role of different
types of
instrumentality in motivation, study strategies, and
performance: Know why you
learn, so you'll know what you learn! British
Journal of Educational Psychology, 74, 343-360.
Snyderman,
M., & Rothman, S. (1987). Survey of expert opinion on
intelligence and
aptitude testing. American
Psychologist,
42(2), 137-144.
Specht, D. (1991). A
general regression neural
network. IEEE
transactions on neural
networks, 2(6), 568-576.
St Clair-Thompson, H. L.,
& Gathercole, S. E. (2006). Executive functions and
achievements in school:
Shifting, updating, inhibition, and working memory. The Quarterly Journal of Experimental Psychology,
59(4), 745-759.
Strucchi, E. (1991). Inventario de Estrategias
de Aprendizaje y
de Estudio. [Inventory of Learning and
Study Strategies]. Buenos Aires: Psicoteca.
Turner,
E. A., Chandler, M., & Heffer, R. W. (2009). Influence of
parenting styles,
achievement motivation, and self-efficacy on academic
performance in college
students. Journal of
College Student
Development, 50(3), 337-346.
Unsworth,
N., Heitz, R. P., Schrock, J. C., & Engle, R. W. (2005). An
automated
version of the operation span task. Behavior
Research Methods, 37(3), 498-505.
Vandamme, J. P., Meskens, N., & Superby, J. F.
(2007). Predicting academic performance by data mining
methods. Education
Economics, 15(4),
405-41.
Weinstein,
C. E., & Mayer, R.E. (1986). The teaching of learning
strategies. In M.C.
Wittrock (Ed.), Handbook
of research on
teaching (3rd ed.). New York: Macmillan.
Weinstein,
C. E. & Palmer, D. R. (2002). LASSI: User’s Manual (2nd
Edition). Clearwater,
FL:
H&H Publishing Company, Inc.
Weinstein,
C. E., Palmer, D. R., & Schulte, A. C. (1987). Learning and study strategies inventory.
Clearwater, FL: H&H
Publishing Company, Inc.
Weinstein,
C. E., Schulte, A. C., & Cascallar, E. C. (1982). The
learning and studies
strategies inventory (LASSI): Initial design and development.
Technical Report, US
Army Research Institute
for the Social and Behavioural Sciences, Alexandria,
VA.
Weiss, S. M. & Kulikowski,
C. A. (1991). Computer systems that
learn. San
Mateo, CA: Morgan Kaufmann Publishers.
Welsh, M.C.,
Satterlee-Cartmell, T., & Stine, M.
(1999). Towers of Hanoi and London: Contribution of working
memory and
inhibition to performance. Brain and
Cognition, 41(2), 231-242.
White, H., & Racine, J.
(2001). Statistical
inference, the bootstrap, and neural network modelling with
application to
foreign exchange rates. IEEE Transactions on Neural
Networks: Special Issue
on Neural Networks in Financial Engineering, 12,
657-673.
Wilson, R. L. &
Hardgrave, B. C. (1995).
Predicting graduate student success in an MBA program: Regression
vs.
classification. Educational and
Psychological Measurement, 55, 186-195.
Zambrano Matamala, C.,
Rojas Díaz, D., Carvajal
Cuello, K., & Acuña Leiva, G. (2011). Análisis de
rendimiento académico
estudiantil usando data warehouse y redes neuronales. [Analysis of students’
academic performance using data warehouse and neural networks] Ingeniare. Revista
Chilena de Ingeniería, 19(3),
369-381.
Zeegers,
P.
(2004). Student learning in higher education: A path analysis of
academic
achievement in science. Higher Education Research &
Development, 23(1),
35-56.