Caffeine’s Effect on Appraisal and Mental Arithmetic Performance: A Cognitive Modeling Approach Tells Us More
Sue E. Kase, Frank E. Ritter
College of Information Sciences and Technology, Pennsylvania State University
Michael Schoelles
Cognitive Science Department, Rensselaer Polytechnic Institute
Abstract
A human subject experiment was conducted to investigate caffeine’s effect on appraisal and performance of a mental serial subtraction task. Serial subtraction performance data were collected from three treatment groups: placebo, 200 mg caffeine, and 400 mg caffeine. Data were analyzed by average across treatment group and by challenge and threat task appraisal conditions. A cognitive model of the serial subtraction task was developed and fit to the human performance data. How the model’s parameters change to fit the data suggests how cognition changes across treatments and due to appraisal. Overall, the cognitive modeling and optimization results suggest that the speed of vocalization is changed the most, along with some changes to declarative memory. This approach promises to offer fine-grained knowledge about the effects of moderators on task performance.
Keywords: Caffeine, stress, task appraisal, cognitive arithmetic
Introduction
Caffeine is widely consumed throughout the world in beverages, foods, and as a drug for a variety of reasons, including its stimulant-like effects on mood and cognitive performance (for review see Fredholm et al., 1999). Its positive effects on performance, notably sustained vigilance and related cognitive functions, are well documented when administered to rested volunteers in doses equivalent to single servings of beverages (Amendola et al., 1998; Smith et al., 1999). Additionally, its consumption in moderate doses is associated with few, if any, adverse effects (Nawrot et al., 2003). Therefore, caffeine has been a strategy examined for its usefulness to military personnel (Lieberman & Tharion, 2002).
The majority of caffeine research is conducted through human experimentation with analysis of the collected performance data. Few studies have attempted to model the effects of caffeine. One such study by Benitez et al. (2009) presented a biomathematical model for describing performance during extended wakefulness with the effect of [...]
Likewise, this study takes a modeling approach employing cognitive modeling and optimization techniques to investigate the effects of caffeine on cognitive performance. In particular, we examined the effects of caffeine and task appraisal during the arithmetic portion of the Trier Social Stress Test (TSST), a mental serial subtraction task. Based on human subject observations, self-reported appraisal, and performance data, we then developed a cognitive model in the ACT-R cognitive architecture of the serial subtraction task. Parametric solution sets resulting from optimizing the serial subtraction cognitive model to data from three treatment groups (placebo, 200 mg, 400 mg) and two task appraisal conditions (challenge and threat) provided the first cognitive modeling-derived insights [...]
This section begins with an overview of the human subject experiment where performance and task appraisal data were collected and later utilized in the development and optimization of a cognitive model. A detailed description of the cognitive task follows, as well as the formulation of the self-reported appraisal conditions. Lastly, results and interpretations of the human performance data are presented.
As part of a larger project, human subject data were collected to study the effects of stress and caffeine on cardiovascular health. The authors collaborated with Dr. Laura Klein and her lab in the Biobehavioral Health Department at Penn State University. A mixed experimental design was conducted with 45 healthy men 18-30 years of age (Klein, Whetzel, Bennett, Ritter, & Granger, 2006). (Men are typically used in these types of studies because we also took additional physiological measures and their systems are [...]
All subjects were asked to perform a series of three cognitive tasks. Subjects individually performed a simple reaction time (RT) and a working memory (WM) task taking 15 minutes to complete. Then subjects were administered one of three doses of caffeine: none (placebo), 200 mg caffeine (equivalent to one to two 8 oz cups of coffee), or 400 mg caffeine (equivalent to three to four 8 oz cups of coffee). After allowing absorption time, a 20-minute stress session of the mental arithmetic portion of the TSST was performed. Following completion of this stressor, subjects again were asked to complete the RT and WM tasks. Cognitive performance was determined by calculating accuracy and response time scores.
This paper focuses on one portion of the experiment: the TSST. The TSST protocol has been used for investigating psychobiological stress responses in a laboratory setting since the 1960s (Kirschbaum, Pirke, & Hellhammer, 1993). The TSST traditionally consists of an anticipation period and a test period in which subjects have to deliver a free speech and
perform mental arithmetic in front of an audience. The mental arithmetic portion of the TSST is a mental serial subtraction [...]
Serial Subtraction Task
The serial subtraction task utilized in the experiment consisted of four 4-minute blocks of mentally subtracting by 7s and 13s from 4-digit starting numbers. Figure 1 illustrates the serial subtraction task. These were the four starting numbers used to begin the four blocks of subtraction during [...]
Figure 1: An illustration of the four blocks of the serial [...]
Before the task begins the experimenter explains that the subject’s performance is going to be voice recorded and reviewed by a panel of psychologists for comparison with the other subjects participating in the experiment. The task is performed mentally with no visual or paper clues. After the task is explained to the subject, a task appraisal questionnaire is completed, and the subject begins performing the task. It is thought that this anticipation period, for some subjects, increases anxiety and worry about poor performance on the [...]
Subjects sit in a chair directly in front of and near the experimenter, who is holding a timekeeping device and a clipboard of the correct subtraction answers that she checks off as the subject performs the task. Before the task begins the experimenter emphasizes that the task should be performed as quickly and as accurately as possible. An experimenter tells the subject the starting number; from then on, the subject speaks the answer to each subtraction problem. When an incorrect answer was given, the subject was told to “Start over at <the last correct number>”. At two minutes into each 4-minute session, subjects were told that “two minutes remain, you need to go faster”. This prompt enhances the [...]
Task Appraisal
Before and after the serial subtraction stress session, subjects completed pre- and post-task appraisals based on Lazarus and Folkman’s (1984) theory of stress and coping. Each subject was asked five questions orally: two focused on the subject’s resources or reserves to deal with the serial subtraction task and three focused on the subject’s perception as to how [...]
For all questions the scale was from 1 to 5 with a value of 3 indicating that the subject is neither challenged nor threatened by the task. After correcting for the imbalance in questions, a ratio of perceived stress to perceived coping resources was created. For example, if a subject’s total appraisal score was 1.5 or less, their perceived stress was less than or equal to their perceived ability to cope, which equated to a challenge condition. If a subject’s appraisal score was greater than 1.5, their perceived stress was greater than their perceived ability to cope, which equated to a threat condition.
Each treatment group was composed of 15 subjects. The placebo group had approximately the same number of subjects in each appraisal condition (7 challenge, 8 threat). The 200 mg caffeine group had twice as many challenged subjects as threatened subjects (10 challenge, 5 threat). The 400 mg caffeine group contained only 2 challenged subjects, with the remaining 13 subjects reporting a threatening appraisal.
Results and Discussion
For this investigation, the serial subtraction performance data from the placebo group (PLAC), the 200 mg caffeine group (LoCAF), and the 400 mg caffeine group (HiCAF) were analyzed by average across treatment group and by appraisal condition. The performance statistics of primary interest were number of attempted subtraction problems and a percentage correct score. The data are shown in Table 1 where each pair of values represents number of attempts and percent correct. The results discussed in this paper apply to data from the first [...]
Table 1: Human performance (average number of attempts and percent correct) by treatment group (each N=15) and appraisal condition (challenge, threat).
For all treatment groups the challenge condition showed the best performance in both number of attempts and percent correct over the average across treatment and the threat condition. The threat condition showed the worst performance. Performance differences between the challenge and threat conditions were most pronounced in the LoCAF group, with nearly 25 more attempted subtraction problems and a 13.5% increase in subtraction accuracy by challenged subjects over threatened subjects. For the HiCAF group the challenge and threat condition differences were less than LoCAF but still substantial: 13 more attempted problems and a 7.7% increase in subtraction accuracy. Differences between the challenge and threat condition were least visible in the PLAC group, 10
more attempted problems and only a 5.4% increase in [...]
Figure 2 better illustrates these performance differences with the treatment groups labeled along the x-axis and the plot subdivided into three sections: averages across treatment groups (not by appraisal condition) in the leftmost section, and averages across treatment groups subdivided by appraisal condition in the center (challenge) and rightmost sections [...]
The plot visualizes several interesting trends, some supported by existing caffeine and cognition research and others not. In the average across treatments plot (leftmost section), the performance of the HiCAF group drops below that of PLAC for both performance statistics. This supports findings that large doses of caffeine are occasionally associated with anxiety and disrupt performance (Heishman & Henningfield, 1992; Wesensten, Belenky, & Kautz, 2002). Whether a 400 mg dose is considered ‘large’ may be in question as some studies administered up to 800 mg doses (McLellan et al., 2007). Generally, 100 to 300 mg doses are categorized as ‘low’ dosages because 50-300 mg of caffeine is available in a number of forms including tablets, chewing gum, a wide variety of beverages and some food products.
In the challenge condition (middle section), HiCAF performance does not drop below PLAC, but is approximately equivalent or slightly higher. In both the average across treatments and the challenge condition, LoCAF performance is well above that of PLAC. This is also supported by previous research showing that low doses of caffeine tend to increase performance (Amendola et al., 1998; Smith et al., 1999). In both these cases, the across treatments and challenge plots, the effects of caffeine take on characteristics related to level of arousal studies (e.g., Anderson & Revelle, 1982) and appear to follow the Yerkes-Dodson (1908) law, which postulates that the relationship between arousal and performance follows an inverted U-shape curve.
There is no supporting research for the performance trends visible under the threat condition (right section). Threatened subjects self-reported stress and lack of coping skills to adequately perform the serial subtraction task. The threat plot shows performance decreases from PLAC to LoCAF (instead of increases as observed in the other sections of the plot) with HiCAF only very slightly higher than LoCAF (+1.4 attempts, and +0.3% correct). In this case, the U-shape is not inverted, [...]
Figure 2: Comparing human performance differences in number of attempts and percent correct by treatment group (x-axis)
and appraisal condition: treatment groups not accounting for appraisal (leftmost section), and averages across treatment groups
divided by appraisal condition, challenge (middle section) and threat (rightmost section).
More can be discussed about the human performance data by way of analysis and interpretation of caffeine’s effect on appraisal and serial subtraction. However, a more important question remains: Can these effects be modeled using a cognitive architecture and what might be learned from the parameters and values generating best fits during optimization of the model?
Modeling Serial Subtraction
Theory about how mental arithmetic is performed combined with observations gathered during the human subjects’ performance of serial subtraction laid the foundation for the development of a cognitive model of the serial subtraction task. The ACT-R cognitive architecture (Anderson, 2007) was chosen to model the serial subtraction task for several reasons: it provides a parameter-driven subsymbolic level of processing; it permits the parallel execution of the verbal system with the control and memory systems; and it has been used for other models of addition and subtraction [...]
The serial subtraction model performs a block of subtracting by 7s or 13s in a similar manner to that of the human subjects. The model’s declarative knowledge consists of arithmetic facts and goal-related information. The model’s procedural knowledge is production rules that allow for retrieval of subtraction and comparison facts necessary to produce an appropriate answer. The model
performs subtractions column by column.
The model runs under ACT-R 6.0 and utilizes the imaginal module and buffer. The imaginal buffer implements a problem representation capability. In the serial subtraction model the imaginal buffer holds the current 4-digit number being operated on (the minuend) and the number being subtracted (the subtrahend). The goal module and buffer implement control of task execution by manipulation of a state slot. ACT-R’s vocal module and buffer verbalize the answer to each subtraction problem as [...]
The model starts with the main goal to perform a subtraction and a borrow goal to perform the borrow operation when needed. Both types of goal chunks contain a state slot, the current column indicator, and the current subtrahend. The current problem is maintained in the imaginal buffer. This buffer is updated as the subtraction problem is being solved. The model begins with an integer minuend of 4 digits. All numbers in the model are chunks of type integer with a slot that holds the number. The model also contains subtraction and addition fact chunks whose slots are the integer chunks described above. This representation of the integers and arithmetic facts has been [...]
The model determines if a borrow operation is required by trying to retrieve a comparison fact that has two slots, a greater slot containing the minuend and a lesser slot containing the subtrahend. If the fact is successfully retrieved then no borrow is necessary; otherwise a borrow subgoal is created and executed. Borrowing is performed by retrieving the addition fact that represents adding ten to the minuend. The subtraction fact with the larger minuend is retrieved. The model then moves right one column by retrieving a next-column fact using the current column value as a cue. If this retrieval fails, there are no more columns, so the borrow subgoal returns to the main task goal. If there is a next column and its value is not 0, then 1 is subtracted from it by retrieval of a subtraction fact. If the value is 0 then the problem is rewritten in the imaginal buffer with a 9 and the model moves to the next column and repeats the steps discussed above, returning to the main task [...]
The model outputs the answer by speaking the 4-digit result. The model has two output strategies. For this paper the data reported are for the calc-and-speak strategy where the model speaks the answer in parallel with the calculation described above. If the answer is incorrect, the problem is reset to the last correct answer. If the answer is correct, the main problem task is rewritten in the imaginal buffer.
After the model has performed a block of subtractions the number of attempted subtraction problems and percent correct are recorded. The model’s performance can be adjusted by varying the values of architectural parameters associated with specific modules and buffers, and subsymbolic processes within the architecture.
Optimizing to Human Data
How does cognition change under stress and caffeine? We can explore this question by adjusting theoretically motivated parameters in the architecture. The parameters that lead to better correspondences suggest how cognition changes. This section begins by discussing the architectural parameters selected for adjusting the model’s performance to simulate the human data. This process of fitting the cognitive model to human data is a form of optimization. The optimization approach to fit the model is briefly described in the second part of the section. The optimization results, accompanied by interpretations of best fitting parameter values, are discussed at the end of the section.
Architectural Parameters
Three ACT-R architectural parameters appeared important in performing serial subtraction and were selected for adjusting the model’s performance: seconds-per-syllable, base level constant, and activation noise. The rate at which the model speaks is controlled by the seconds-per-syllable parameter (SYL). The ACT-R default timing for speech is 0.15 seconds per assumed syllable based on the length of the text string to speak. There is a default of three characters per syllable controlled by the characters-per-syllable parameter. The seconds-per-syllable and characters-per-syllable parameters control subsymbolic processes in ACT-R’s vocal module. The vocal module gives ACT-R a rudimentary ability to speak. It is not designed to provide a sophisticated simulation of human speech production, but to allow ACT-R to speak words and short phrases for simulating verbal responses in experiments such as the answers to the [...]
The other two parameters affect declarative knowledge access: the base level constant (BLC) and the activation noise parameter (ANS). The BLC parameter and a decay parameter affect declarative memory retrieval and retrieval time. The ANS value affects variance in retrieving declarative information and error rate for retrievals in the model. This instantaneous noise value can also represent variance from trial to trial. Other parameters, such as base level learning, decay, and the characters-per-syllable parameters, were built into the model as modifiable but were left fixed at their default values for this study. The search space for the model optimization was defined by the parameter value boundaries: ANS and SYL 0.1 to 0.9, and [...]
Optimization Approach
Because the search space was large and assumed to be rather complex, a departure from the cognitive modeling community’s traditional manual optimization technique was initiated (Kase, 2008). A new front-end function for the cognitive model was developed for execution in a parallel processing environment and the ACT-R parameter values
(ANS, BLC, and SYL) were passed to multiple instances of running models from a parallel genetic algorithm (PGA). The SYL parameter was chosen for optimization because vocalization of the answer is the most time consuming aspect of this task. The BLC and ANS parameters were chosen because the task is memory intensive. Other memory parameters could have been chosen and ongoing work is exploring the fitting of other parameters. Normally, the parameter values are set within the model code before runtime. Using the PGA to search the parameter space for promising parameter value sets generating best fits between the model and human data saved a substantial amount of modeler time and computational resources. Model-to-data fit was determined by an objective function, or fitness function, defined as the discrepancy between model performance (number of attempts and percent correct) and the corresponding human performance (e.g., 47.3 – 48.1). The fitness is in terms of error (or cost) with a fitness value of 0 representing perfect correspondence between the model [...]
Employing this type of ‘automated’ optimization approach allowed for 20,000 different sets of parameter values to be tested in a directed manner each time the PGA was executed. Using the approach, the model was optimized to nine sets of human performance data (see Table 2).
Results and Discussion
Table 2 shows the resulting model performance compared to the human performance data using parameter value solution sets identified by the PGA that produced the best fits (fitness values less than 1.0) to the human performance, and suggests how cognition changed. Several trends can be observed within the parameter values producing best fits. The parameter values shown in the table are averaged; the number of sets averaged is denoted by the numeric value in parentheses after the parameter set values (e.g., ‘(3)’ in the first row means that the PGA found 3 parameter sets producing fitness less than 1.0, and that these values were averaged). Each parameter set included in the average was run 200 times (i.e., 200 [...]
Table 2: Optimization results for three treatment groups (PLAC, LoCAF, HiCAF) and appraisal conditions (CH=challenge, TH=threat) comparing human performance and model predictions in number of attempts and percent correct (both rounded), and fitness value associated with average (over N) of best fitting (less than 1.0) ACT-R [...]
Beginning with the seconds per syllable parameter, SYL is shown in the last column and last value in the triple of Table 2. The model predictions indicate that challenged subjects speak a syllable more quickly than threatened subjects. This is true for all treatment groups. LoCAF shows the greatest difference in speech rate with challenge SYL at 0.31 (also lowest SYL overall) and threat SYL at nearly two times slower (0.61). HiCAF differences in SYL are less: challenge 0.40 compared to threat 0.57, a difference of 0.17. PLAC shows a slightly smaller SYL difference of 0.14. Challenge subjects self-report less stress and are generally confident that they can perform the serial subtraction task well. With less stress and a low dose of caffeine more fluid speech appears to result, or possibly the speech rate acts as a window into the cognitive processes required to complete the subtractions (i.e., fact retrieval, working memory and place-keeping operations, and concatenation of [...]
Overall across treatments, the activation noise parameter values (ANS, first value in triple) are high as compared to what would be manually assigned to the model in the ACT-R modeling community. This could be because the nature of the task is stressful (i.e., purposively used to elicit a stress response). The ANS value range in Table 2 is narrow, from the lowest ANS of 0.67 to the highest ANS of 0.78, a difference of only 0.11. This hints that caffeine may not affect this parameter’s role in the model’s performance of serial subtraction. ANS values are basically equivalent for the PLAC and LoCAF groups for challenge (0.68) and threat (0.71). In this case, the slightly higher ANS in predicting threatened subjects corresponds to the lower performance (fewer attempts and lower accuracy), and the self-reports where subjects do not believe they will perform well. Worrying or embarrassment about their poor performance is a distraction and may interfere with working memory processes and verbalizing solutions. The greatest variability in ANS values is found in HiCAF. Surprisingly, the trend reverses with HiCAF challenge predictions yielding a higher ANS value (0.75) than threat predictions [...]
The base level constant parameter values (BLC, middle value in triple) show a trend of nearly equivalent higher values for LoCAF and HiCAF challenge conditions (2.65 and 2.69) than threat conditions (2.48 and 2.35), and also for all BLC values under PLAC (2.49, 2.48 and 2.53). In this case, caffeine may be causing a ‘boost’ in the base level activation value of facts in declarative memory, promoting higher probability of selection in response to a retrieval request and quicker fact retrieval time.
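As a rough illustration of how the three fitted parameters and the fitness measure in this section act, the following Python sketch may help. It is not the authors' ACT-R model or PGA code. It assumes the standard ACT-R retrieval equations (logistic retrieval probability with noise scale ANS, exponential retrieval latency), treats the base level constant as the whole activation (ignoring spreading activation and learning), assumes a retrieval threshold of 0.0 and a latency factor of 1.0, assumes speech time is simply proportional to string length, and assumes the fitness is a sum of absolute differences. The function names and the spoken answer "8186" are hypothetical.

```python
import math


def speech_time(text: str, seconds_per_syllable: float = 0.15,
                chars_per_syllable: int = 3) -> float:
    """Time to speak `text`, assuming syllables = len(text) / chars_per_syllable."""
    syllables = len(text) / chars_per_syllable
    return seconds_per_syllable * syllables


def retrieval_probability(activation: float, ans: float,
                          threshold: float = 0.0) -> float:
    """Logistic retrieval probability; `ans` is the activation noise scale.

    The threshold of 0.0 is an assumption made for this sketch.
    """
    return 1.0 / (1.0 + math.exp((threshold - activation) / ans))


def retrieval_latency(activation: float, latency_factor: float = 1.0) -> float:
    """Retrieval time falls exponentially as activation rises (factor assumed 1.0)."""
    return latency_factor * math.exp(-activation)


def fitness(model: tuple[float, float], human: tuple[float, float]) -> float:
    """Assumed discrepancy: sum of absolute differences in (attempts, percent correct).

    A value of 0 means the model matches the human data exactly.
    """
    return sum(abs(m - h) for m, h in zip(model, human))


if __name__ == "__main__":
    # Averaged best-fit values reported in the text for the LoCAF group.
    conditions = {
        "LoCAF challenge": {"ans": 0.68, "blc": 2.65, "syl": 0.31},
        "LoCAF threat": {"ans": 0.71, "blc": 2.48, "syl": 0.61},
    }
    answer = "8186"  # hypothetical 4-digit spoken answer
    for name, p in conditions.items():
        print(
            f"{name}: speak {speech_time(answer, p['syl']):.3f} s, "
            f"P(retrieve) {retrieval_probability(p['blc'], p['ans']):.3f}, "
            f"latency {retrieval_latency(p['blc']):.3f} s"
        )
```

Under these assumptions, a lower SYL shortens every spoken answer, a higher BLC raises retrieval probability and shortens retrieval time, and a higher ANS flattens the retrieval probability curve, which matches the direction of the interpretations given above.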
Conclusion
A cognitive model of the serial subtraction task was developed and fit to the human performance data from three caffeine treatments and by challenge and threat appraisal. This fit suggests that there are systematic changes in cognition due to caffeine and appraisal. Most notable is the speaking rate, but declarative memory retrievals are also [...]
These results show that using a cognitive model and parametric optimization approach can further our understanding of caffeine beyond a human experimentation approach. Overall, the cognitive modeling and optimization approach was successful. The preliminary modeling results and interpretations offer insight into the effects of caffeine on task appraisal and subsequent performance of the task, and promise an improved methodology for the study of other behavioral moderators and other cognitive tasks. At this point in our investigation more analysis is needed and additional parameter sets should be examined, along with continued refinement of the serial subtraction model for predicting the effects of caffeine on cognition.
Acknowledgments
This project is partially supported by ONR grant N000140310248. Computational resources were provided by TeraGrid DAC TG-IRI070000T and run on the NCSA clusters. The authors would like to thank Laura Klein and her lab and Jeanette Bennett at the Department of Biobehavioral Health, Penn State University, for collection of the human performance data and data analysis assistance.
References
Amendola, C. A., Gabrieli, J. D. E., & Lieberman, H. R. (1998). Caffeine’s effects on performance and mood are independent of age and gender. Nutritional Neuroscience, [...]
Anderson, J. R. (2007). How can the human mind occur in the physical universe? New York, NY: Oxford University Press.
Anderson, K. J., & Revelle, W. (1982). Impulsivity, caffeine, and proofreading: A test of the Easterbrook hypothesis. Journal of Experimental Psychology: Human Perception and Performance, 8, 614-624.
Benitez, P. L., Kamimori, G. H., Balkin, T. J., Greene, A., & Johnson, M. L. (2009). Modeling fatigue over sleep deprivation, circadian rhythm, and caffeine with a minimal performance inhibitor model. Methods in Enzymology, 454, 405-419.
Fredholm, B. B., Bättig, K., Holmén, J., Nehlig, A., & Zvartau, E. E. (1999). Actions of caffeine in the brain with special reference to factors that contribute to its widespread use. Pharmacological Reviews, 51, 83-133.
Heishman, S. J., & Henningfield, J. E. (1992). Stimulus functions of caffeine in humans: Relation to dependence potential. Neuroscience & Biobehavioral Reviews, 16, 273-287.
Kase, S. E. (2008). HPC and PGA optimization of a cognitive model: Investigating performance on a math stressor task. Unpublished PhD thesis, College of IST, Penn State University, University Park, PA.
Kirschbaum, C., Pirke, K. M., & Hellhammer, D. H. (1993). The Trier Social Stress Test – A tool for investigating psychobiological stress responses in a laboratory setting. Neuropsychobiology, 28, 76-81.
Klein, L. C., Whetzel, C. A., Bennett, J. M., Ritter, F. E., & Granger, D. A. (2006). Effects of caffeine and stress on salivary alpha-amylase in young men: A salivary biomarker of sympathetic activity. Psychosomatic [...]
Lazarus, R. S., & Folkman, S. (1984). Stress, appraisal and coping. New York, NY: Springer Publishing.
Lieberman, H. R., & Tharion, W. J. (2002). Effects of caffeine, sleep loss, and stress on cognitive performance and mood during U.S. Navy SEAL training. Psychopharmacology, 164, 250-261.
McLellan, T. M., Kamimori, G. H., Voss, D. M., Tate, C., & Smith, S. J. R. (2007). Caffeine effects on physical and cognitive performance during sustained operations. Aviation, Space, and Environmental Medicine, 78(9), [...]
Nawrot, P., Jordan, S., & Eastwood, J. (2003). Effects of caffeine on human health. Food Additives and [...]
Smith, A. P., Clark, R., & Gallagher, J. (1999). Breakfast cereal and caffeinated coffee: Effects on working memory, attention, mood and cardiovascular function. Physiology & Behavior, 67, 9-17.
Wesensten, N. J., Belenky, G., & Kautz, M. (2002). Maintaining alertness and performance during sleep [...] Psychopharmacology, 159, 238-47.
Yerkes, R. M., & Dodson, J. D. (1908). The relationship of strength of stimulus to rapidity of habit-formation. Journal of Comparative Neurology and Psychology, 18, [...]