Attention guidance in learning from a complex animation: Seeing is understanding?

https://doi.org/10.1016/j.learninstruc.2009.02.010

Abstract

To examine how visual attentional resources are allocated when learning from a complex animation about the cardiovascular system, eye movements were registered in the absence and presence of visual cues. Cognitive processing was assessed using cued retrospective reporting, whereas comprehension and transfer tests measured the quality of the constructed representation. Within the framework of Cognitive Load Theory, visual cues highlighting the subsystems of the heart were hypothesized to guide attention, reduce visual search and extraneous cognitive load, and enhance learning. As predicted, learners looked more often and longer at cued parts. However, we found no effects of cueing on visual search and cognitive load. With respect to cognitive processing, performance differences were found on the number of statements in the learners’ verbal reports. These findings suggest that visual cueing can guide attention in an animation, but other factors are also important in determining the effectiveness of visual cues on learning.

Introduction

In complex instructional animations learners are challenged to extract relevant information from a visual display, select corresponding parts of information, and integrate all of these elements into a coherent representation (Mayer & Moreno, 2003). This is a difficult task, as important information is briefly presented in successive frames and needs to be kept active in working memory to integrate it with earlier presented information, imposing a high cognitive load on the learners’ cognitive system (Paas, Renkl, & Sweller, 2003). Empirical findings as well as theoretical considerations have led to various design guidelines that take into account the processing limitations of working memory to manage this high cognitive load and foster learning from animations (Mayer, 2001, Paas et al., 2003).

Most of the current knowledge regarding animations and learning, however, is based on product-related measures (e.g., comprehension and transfer tasks), from which attentional and cognitive task demands are inferred. Much less is known about how learners actually attend to instructional animations, that is, the real-time perceptual and cognitive processes involved. It is argued that the use of more direct process-related measures could advance research on animations by testing specific claims about the perceptual and cognitive characteristics of an animation or by directly investigating the psychological basis for instructional design guidelines. The present study was designed to evaluate how attention guidance affects processing of an animation by applying the process-related methods of eye tracking and cued retrospective reporting (Van Gog, Paas, Van Merriënboer, & Witte, 2005).

To derive meaning from an animation, learners have to construct a mental representation that accurately represents the content depicted in the visualization. Animations are supposed to be superior to static graphics, especially when learning concerns a chain of events in dynamic systems. Animations not only depict objects but also provide information about how those objects and their positions change over time (Rieber, 1990). However, as Tversky, Morrison, and Bétrancourt (2002) have pointed out, learners often fail to process animations effectively, resulting in no advantage compared with static visualizations (but see Höffler & Leutner, 2007). Learning from animations often fails because complex perceptual and cognitive processing overwhelms the learner's limited processing capacities (Lowe, 1999).

Ayres and Paas (2007b) have argued that most animations are not designed with the limited capacity of working memory (WM) in mind and therefore may interfere with the learning process. Cognitive Load Theory (CLT; Paas et al., 2003; Sweller, 1999) provides a theoretical framework that may offer a way of dealing with these WM limitations by instructionally controlling the demands of complex instructions. Three categories of cognitive load can be identified when learning from complex tasks: intrinsic, extraneous, and germane cognitive load. Intrinsic cognitive load depends on the number of information elements and their interactions that must be processed simultaneously in WM to understand the learning material. Extraneous cognitive load is determined by the activities required from learners that do not contribute to learning, but instead reduce WM capacity available for learning activities. Finally, germane cognitive load is generated by mental activities required for the construction and automation of schemata in long-term memory (for a more detailed discussion of the different forms of cognitive load, see Paas, Tuovinen, Tabbers, & Van Gerven, 2003).

According to CLT, animations often create a high extraneous load because learners must split their visual attention between the visualization and accompanying text (inter-representational) and/or within the visualization (intra-representational; Lowe, 1999). In eye movement research, an increasing number of studies focus on the perceptual and cognitive processes of mentally integrating different representations (i.e., text and picture; Graesser et al., 2005; Hegarty, 1992; Holsanova et al., 2009; Schmidt-Weigand et al., 2010; Schwonke et al., 2009). However, very little research has focused on how learners attend to instructional animations without text.

Animations depicting the functioning of technical or biological systems typically show several elements simultaneously that might change with respect to position, color, and orientation. The high degree of visual complexity due to the high information load and the distributed nature of the presentation may be perceptually and cognitively overwhelming (Lowe, 2003). The main processing of the visual system is limited to information that falls within foveal vision, so learners cannot attend to all information in a complex instructional animation in a single fixation. Consequently, only a subset of the presented information will receive attention and serve as the foundation for subsequent cognitive processing.

Constructing a mental representation of the content depicted in a complex animation initially requires effective search processes, that is, success in selecting and extracting task-relevant information. However, by requiring visuo-spatial resources to control the execution of eye movements, the process of locating task-relevant information creates extraneous load because WM resources may be diverted away from main learning activities (Jeung, Chandler, & Sweller, 1997). This especially holds for novices who lack relevant knowledge to guide their attention (see Canham & Hegarty, 2010).

Moreover, the possibility of missing key information increases when the most salient elements do not correspond to the thematically relevant elements (Boucheix et al., 2006; Lowe, 1999). Limiting the search space of the display by directing learners’ attention to specific parts of an animation inevitably differentiates between relevant and irrelevant parts. This provides an opportunity to reduce visual complexity, making it more likely that visual search, and hence extraneous load, is reduced and that learners engage in essential processing activities.

Attention guidance techniques, such as cueing, have been used successfully to improve the learners’ understanding of specific aspects of the display (De Koning et al., 2007; Mayer and Moreno, 2003; see also Boucheix & Lowe, 2010). For example, De Koning et al. (2007) showed that increasing the visual salience of task-relevant information in an instructional animation through a spotlight-cue (i.e., luminance contrast) improved comprehension and transfer performance. Further evidence that attention-directing perceptual cues in a visualization can affect cognition comes from a study by Grant and Spivey (2003). They showed that solving an insight problem was facilitated when learners viewed a static diagram where critical information was made visually more salient. It is important to note that in these studies it could only be inferred that learners were focusing on the correct elements in the visualization. In the study of De Koning et al. (2007) only performance measures were used, and in the Grant and Spivey (2003) study eye tracking was used but only as a means to identify critical features in the display. They did not, however, examine the actual viewing pattern during the problem-solving process when a perceptual cue was present to direct learners’ attention to critical elements.

A first attempt to study real-time viewing patterns while visually cueing a complex instructional animation was made by Kriz and Hegarty (2007). In this study, eye movements were measured while learners studied an interactive animation depicting the mechanics of a flushing cistern that did or did not contain visual cues (i.e., arrows pointing to relevant information). Interestingly, results revealed that although these cues directed more attention to task-relevant information, this did not result in a better understanding of the presented information. On the other hand, the De Koning et al. (2007) study showed that learners benefited from visual cueing, suggesting that it did improve processing. However, direct evidence regarding the perceptual and cognitive processes that underlie this positive effect of attention cueing is lacking. Therefore, our main question is whether spotlight-cueing enables learners to focus on specific parts of an animation, which may help learners in reducing their visual search and extraneous cognitive load.

In the present study, we replicated the methodology previously used by De Koning et al. (2007), now adding process-related measures to examine how spotlight-cueing influences perceptual and cognitive processing when learning from an animation. Learners viewed an animation of the cardiovascular system with or without a spotlight-cue on the valves of the heart. However, in the De Koning et al. (2007) study only a single visual cue was used. One might argue that the effect of cueing would be larger when multiple visual cues are presented that highlight all parts of the cardiovascular system. That is, some subsystems are dependent on the functioning of another subsystem and, therefore, cueing all subsystems might further improve understanding. For example, by cueing the transportation of oxygen into the lungs (circulatory system) before oxygenation occurs in the alveoli (pulmonary circulation), learners may establish a link between these subsystems. Therefore, we also included a condition with multiple visual cues.

To investigate overt attention allocation, we registered the eye movements of learners while they viewed the animation. Furthermore, to get more information about which cognitive processes occur during learning, learners retrospectively reported the thoughts they had while studying the animation using a record of their own eye movements as a retrieval cue (i.e., cued retrospective reporting; Van Gog et al., 2005). Combining measures of eye tracking and verbal reporting allows for making inferences about what information is attended to in a cued and a non-cued animation and how this information is interpreted by the learner. Furthermore, this combination can reveal relatively small differences in knowledge acquisition (Van Gog et al., 2005), which may be especially helpful in examining the underlying processes that are responsible for the small but positive effects of visual cueing.

For all hypotheses, it is important to note that the main comparisons are between the cued conditions (i.e., single-cueing and multiple-cueing condition) and the no-cueing condition.

It was hypothesized that, over the time course of the animation, cueing leads to an overall shift of attention distribution over the different subsystems, measured by the proportion of the number and duration of fixations on each of the subsystems (Hypothesis 1a). More specifically, it was hypothesized that, in line with Kriz and Hegarty (2007), visual cueing directs attention to the cued part yielding, compared to the no-cueing condition, proportionally more and longer fixations on the valves system (i.e., cued subsystem) in the single-cueing condition (Hypothesis 1b) and proportionally more and longer fixations on each of the five cued subsystems in the multiple-cueing condition when they are cued (Hypothesis 1c). Further, it was hypothesized that limiting the scope of the display that is searched by using cueing to direct attention to a region will reduce the competition for attention between simultaneously presented elements and its associated search for task-relevant information, yielding overall a lower fixation frequency and a longer average fixation duration in the animation in the cued conditions as compared to the no-cueing condition (Hypothesis 2).
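The fixation measures named in Hypotheses 1 and 2 can be illustrated with a minimal sketch. It assumes each fixation has already been assigned to one area of interest (AOI) and that "fixation frequency" means fixations per second of viewing time; the excerpt does not give the exact operationalization, and the AOI names, function names, and data values below are hypothetical rather than taken from the study.

```python
from collections import defaultdict

# Hypothetical fixation records: (area of interest, duration in ms).
# AOI names and values are illustrative only, not data from the study.
fixations = [
    ("valves", 300),
    ("valves", 500),
    ("lungs", 200),
    ("circulatory", 400),
    ("valves", 600),
]


def aoi_proportions(fixations):
    """Return {aoi: (share of fixation count, share of fixation time)}.

    Assumes a non-empty list of fixations with positive durations.
    """
    counts = defaultdict(int)
    durations = defaultdict(float)
    for aoi, duration_ms in fixations:
        counts[aoi] += 1
        durations[aoi] += duration_ms
    total_count = len(fixations)
    total_duration = sum(durations.values())
    return {
        aoi: (counts[aoi] / total_count, durations[aoi] / total_duration)
        for aoi in counts
    }


def search_measures(fixations, viewing_time_s):
    """Return (fixations per second, mean fixation duration in ms).

    Treating frequency as fixations per second is an assumption here.
    """
    n = len(fixations)
    mean_duration_ms = sum(duration for _, duration in fixations) / n
    return n / viewing_time_s, mean_duration_ms


proportions = aoi_proportions(fixations)
frequency, mean_duration = search_measures(fixations, viewing_time_s=4.0)
```

Under this reading, Hypotheses 1b and 1c predict larger proportions for a subsystem when it is cued, and Hypothesis 2 predicts a lower frequency together with a longer mean duration in the cued conditions.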

In line with CLT it was expected that reducing the requirement to conduct searches to find task-relevant information will reduce extraneous cognitive load, resulting in lower mental effort while studying the animation (Hypothesis 3).

Consequently, the increase in available working memory resources can be allocated to learning (i.e., germane load). Therefore, it was expected that in accordance with the results of De Koning et al. (2007) spotlight-cueing will result in a better understanding of parts of the animation that are cued as compared to when these parts are not cued. This should be reflected in the verbal protocols that are hypothesized to contain more explanatory statements about an element when it is cued than when it is not. Hence, the number of explanatory statements in the verbal protocols was expected to be significantly higher in the multiple-cueing condition than in the single-cueing condition and the no-cueing condition, while the single-cueing condition should report a significantly higher number of explanatory statements at least on the valves system than the no-cueing condition (Hypothesis 4).

Although our main interest was in the effects of cueing on the process-related measures, we also examined its influence on learning outcomes, keeping in mind that they may be influenced by the verbal protocols in the cued retrospective reporting that preceded the measures of learning outcomes. It was expected that better understanding of cued elements, as reflected in verbal protocols, will also be reflected in the results of comprehension and transfer tests, yielding better learning outcomes on questions about cued elements on both of the latter measures. Because in the multiple-cueing condition all elements of the cardiovascular system were cued, it was expected that the multiple-cueing condition will have significantly higher learning outcomes than the single-cueing condition and the no-cueing condition (Hypothesis 5a). In addition, because in the single-cueing condition only the valves system was cued, it was expected that the single-cueing condition will have significantly higher scores on questions about the valves system than the no-cueing condition (Hypothesis 5b). Alternatively, there may be equal test performance but less mental effort involved in answering the comprehension and transfer tests because of the reduced cognitive load (Hypothesis 5c).

Participants – design

The participants were 40 psychology undergraduates (13 males and 27 females) from the Erasmus University of Rotterdam. Their mean age was 21.43 years (SD = 2.27). All were native Dutch speakers and received partial course credit or a small monetary reward for their participation. Participants had normal or corrected to normal vision. None of the participants had taken college level biology classes, but all had taken introductory courses on biology in high school that included the cardiovascular

Results

Table 1 shows the mean scores on the dependent measures for all three conditions. The no-cueing condition served as a baseline condition with which both cueing conditions could be compared.

Discussion

The present study examined how visual spotlight-cues influence attention allocation and cognitive processing when learning from a complex instructional animation. As expected, learners looked more often and for longer periods of time at cued than at non-cued content (Hypothesis 1b and 1c). This difference in fixation patterns between the cued and non-cued conditions is taken as evidence that cueing guides learners’ attention to specific regions in an instructional animation. The fixation

References (32)

  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences.
  • De Koning, B.B., et al. (2007). Attention cueing as a means to enhance learning from an animation. Applied Cognitive Psychology.
  • Ericsson, K.A., Simon, H.A. (1993). Protocol analysis: verbal reports as data (Rev. ed.). Cambridge, MA: MIT...
  • Graesser, A.C., et al. (2005). Question asking and eye tracking during cognitive disequilibrium: comprehending illustrated texts on devices when the devices break down. Memory and Cognition.
  • Grant, E.R., et al. (2003). Eye movements and problem solving: guiding attention guides thought. Psychological Science.
  • Hegarty, M. (1992). Mental animation: inferring motion from static displays of mechanical systems. Journal of Experimental Psychology: Learning, Memory, and Cognition.