|Year : 2021 | Volume : 8 | Issue : 1 | Page : 7-13|
Comparison of mental workload with N-Back test: A new design for NASA-task load index questionnaire
Mahdi Malakoutikhah1, Reza Kazemi2, Hadiseh Rabiei3, Moslem Alimohammadlou4, Asma Zare5, Soheil Hassanipour6
1 Department of Occupational Health, Kashan University of Medical Sciences, Kashan, Iran
2 Department of Ergonomics, School of Health, Shiraz University of Medical Sciences, Shiraz, Iran
3 Department of Occupational Health Engineering, School of Health, Shahid Beheshti University of Medical Sciences, Tehran, Iran
4 Department of Industrial Management, Faculty of Economic, Management and Social Science, Shiraz University, Shiraz, Iran
5 Department of Industrial Management, Faculty of Economic, Management and Social Science, Shiraz, Iran
6 Cardiovascular Diseases Research Center, Department of Cardiology, Heshmat Hospital, School of Medicine, Guilan University of Medical Sciences, Rasht, Iran
|Date of Submission||05-Nov-2020|
|Date of Decision||05-Nov-2020|
|Date of Acceptance||07-Dec-2020|
|Date of Web Publication||31-Mar-2021|
Dr. Moslem Alimohammadlou
Department of Industrial Management, Faculty of Economic, Management and Social Science, Shiraz University, Shiraz
Source of Support: None, Conflict of Interest: None
Aims: One of the most widely used tools for measuring workload is the NASA-task load index (TLX) questionnaire, for which various studies have reported numerous problems. The present study aimed to improve the NASA-TLX mental workload questionnaire by using fuzzy linguistic variables instead of the visual rating scale, and the multicriteria decision-making Fuzzy Best-Worst method (FBWM) instead of pair-wise comparison. Materials and Methods: This cross-sectional study was carried out among students of Shiraz University of Medical Sciences. To compare the traditional NASA-TLX and the FBWM-NASA-TLX questionnaires, participants performed a standard N-Back task at three workload levels (low, medium, and high) and then completed the two questionnaires. Finally, the results were examined using researcher-made software and SPSS 16. Results: On the N-Back test, the mean number of correct responses was 107.43 at level 1 and 85.86 at level 3. The mean subscale scores and final scores of the two questionnaires at the different levels of the N-Back test were compared with the independent t-test: the two questionnaires differed significantly on mental demand at level 3, with a mean (standard deviation [SD]) of 18.09 (6.39) in the FBWM-NASA-TLX questionnaire and 22.64 (8.15) in the NASA-TLX questionnaire (P < 0.05). Conclusion: In this study, the FBWM-NASA-TLX questionnaire was designed and studied in response to the problems and limitations of the NASA-TLX questionnaire. The results showed that the FBWM-NASA-TLX questionnaire can give more realistic workload scores and decisions for the studied task.
Keywords: Ergonomics, fuzzy best-worst method, mental workload, NASA-task load index
|How to cite this article:|
Malakoutikhah M, Kazemi R, Rabiei H, Alimohammadlou M, Zare A, Hassanipour S. Comparison of mental workload with N-Back test: A new design for NASA-task load index questionnaire. Int Arch Health Sci 2021;8:7-13
|How to cite this URL:|
Malakoutikhah M, Kazemi R, Rabiei H, Alimohammadlou M, Zare A, Hassanipour S. Comparison of mental workload with N-Back test: A new design for NASA-task load index questionnaire. Int Arch Health Sci [serial online] 2021 [cited 2023 Mar 21];8:7-13. Available from: http://www.iahs.kaums.ac.ir/text.asp?2021/8/1/7/312702
Introduction
Workload is a term used to describe the amount of cognitive and physical resources needed to perform a task. According to Hart and Staveland, workload is a hypothetical construct representing the cost incurred by the operator to achieve a certain level of performance. Interest in mental workload (MWL) among scholars has increased as the nature of tasks has shifted from physical to cognitive demands. As a result, assessing MWL is as essential as assessing physical workload. MWL is especially important in studying and developing human-machine interactions, where the aim is to achieve appropriate levels of satisfaction, comfort, safety, and efficiency at the workplace, one of the main goals of ergonomics. Hence, MWL has become one of the most commonly used concepts in ergonomics research and practice.
Although there are many subjective and objective methods for evaluating MWL, the NASA-task load index (TLX) questionnaire, first presented by the United States National Aeronautics and Space Administration, has high validity and acceptance because of its multidimensional nature and has been used in previous studies. The questionnaire consists of two sections: (1) rating the intensity of each subscale on a range between 0 and 100 and (2) comparing the six subscales two by two. Different studies have reported problems with the results obtained from the NASA-TLX index. The first section of the questionnaire uses a visual rating scale on which the participant must mark specific lines. Experience shows that people tend to mark between the lines; hence, the researcher cannot read off an exact number.
The most important problem in this questionnaire is its pair-wise weighting. Pair-wise weights are easy to calculate, but methodological and practical problems can arise, especially in actual working environments. First, the weighted mean is based on mathematical assumptions that are not usually verified. Second, it cannot capture interactions among workload variables and so cannot correctly represent the integration or effectiveness of a subscale. Third, the association of pair-wise weighting with raw NASA-TLX has also been questioned by others. Furthermore, in the pair-wise weighting of this index, the respondent must choose only one of the two options, while both subscales might be equally important in a task. In addition, the level of importance of each subscale cannot be specified by choosing only one option.
Given the problems of the NASA-TLX questionnaire, the aim of this study was to improve the NASA-TLX questionnaire using the Fuzzy Best-Worst method (FBWM). This study therefore presents a new instrument for measuring workload through a general change to the NASA-TLX questionnaire.
Methods
This cross-sectional study was carried out at Shiraz University of Medical Sciences. To evaluate the conventional NASA-TLX and the proposed FBWM-NASA-TLX questionnaires, the participants performed a standard N-Back task at three workload levels (low, moderate, and high). After each level, the perceived MWL was assessed using the conventional and proposed questionnaires. First, the participants were trained for 5 min to perform the test; they then randomly selected one of the three levels of the N-Back task and performed it. Immediately after each task, the questionnaires were given to the participants, who were asked to express accurately the MWL they perceived for the task just performed. After completing each task and questionnaire, participants rested for 15 min so that mental exhaustion would not affect performance on the next task. Written informed consent was obtained.
The ethics of this study were approved by Shiraz University of Medical Sciences (SUMS) under ethics code No. IR.SUMS.REC.1397.942.
Selection and description of participants
Participants were randomly selected from the students of SUMS. Inclusion criteria were good mental and physical health, no drug use, no use of nerve-stimulant drugs, and adequate sleep before the study. The demographic characteristics of participants are presented in [Table 1].
The N-Back test is one of the most widely used instruments for measuring working memory and is a cognitive function assessment task related to executive function. The participant must decide whether the current stimulus is the same as the stimulus presented n steps earlier. The difficulty of the test depends on n: in a 2-back task, for example, the participant compares the current stimulus with the stimulus two steps back.
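The matching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `n_back_targets` and `count_correct`, the letter stimuli, and the yes/no response format are hypothetical and are not taken from the software used in the study.

```python
from typing import List, Sequence


def n_back_targets(stimuli: Sequence[str], n: int) -> List[bool]:
    """For each position, report whether the stimulus equals the one n steps earlier."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # The first n positions have no stimulus n steps back, so they are never targets.
    return [i >= n and stimuli[i] == stimuli[i - n] for i in range(len(stimuli))]


def count_correct(stimuli: Sequence[str], responses: Sequence[bool], n: int) -> int:
    """Count responses that agree with the true match/non-match status."""
    targets = n_back_targets(stimuli, n)
    return sum(t == r for t, r in zip(targets, responses))


# Hypothetical stimulus sequence, for illustration only.
sequence = ["A", "B", "A", "C", "C", "B", "C"]
print(n_back_targets(sequence, 1))
print(n_back_targets(sequence, 2))
```

Raising `n` forces the participant to hold more items in memory at once, which is why the level of the test serves as the workload manipulation.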
NASA-task load index questionnaire
It is a six-dimensional scale for estimating workload. The index originally consisted of two sections. The total workload of an activity is divided into six subscales, including Mental demand (MD), Performance (PE), Effort (EF), Frustration (FR), Temporal demand (TD), and Physical demand (PD).
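As a rough sketch of the conventional two-section scoring, the following Python assumes the standard procedure in which each subscale's weight is the number of times it is chosen across the 15 pair-wise comparisons and the overall score is the weighted mean of the 0 to 100 ratings. The ratings and the choice rule below are hypothetical and serve only to illustrate the arithmetic.

```python
from itertools import combinations

SUBSCALES = ["MD", "PD", "TD", "PE", "EF", "FR"]


def tally_weights(choices):
    """Count how often each subscale is chosen over the 15 pair-wise comparisons.

    `choices` maps each pair (a, b) from combinations(SUBSCALES, 2) to the chosen subscale.
    """
    weights = {s: 0 for s in SUBSCALES}
    for pair in combinations(SUBSCALES, 2):
        winner = choices[pair]
        if winner not in pair:
            raise ValueError(f"choice for {pair} must be one of the pair")
        weights[winner] += 1
    return weights


def weighted_tlx(ratings, weights):
    """Weighted mean of the ratings; the weights sum to 15 for six subscales."""
    total_weight = sum(weights.values())
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / total_weight


# Hypothetical ratings (0-100) for a mostly mental task.
ratings = {"MD": 80, "PD": 10, "TD": 60, "PE": 70, "EF": 50, "FR": 40}
# Hypothetical respondent who always picks the subscale rated higher.
choices = {
    pair: max(pair, key=lambda s: ratings[s])
    for pair in combinations(SUBSCALES, 2)
}

weights = tally_weights(choices)
print(weights)
print(round(weighted_tlx(ratings, weights), 2))
```

In this toy case physical demand is never chosen, so its weight is 0 and its rating drops out of the total entirely, which is the forced-choice behaviour the proposed questionnaire is meant to avoid.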
FBWM-NASA-TLX questionnaire: Like the NASA-TLX, the questionnaire designed for this study consists of two parts: estimating the intensity of each subscale with a fuzzy function, and weighting each subscale using the FBWM.
The present study uses fuzzy linguistic variables (membership functions) instead of the numerical visual rating scale. In other words, the participants first chose the intensity of each subscale using a linguistic term, each mapped to a triangular fuzzy number, such as very low (0, 0, 25), moderate (0, 25, 50), high (25, 50, 75), and very high (75, 100, 100).
FBWM was proposed by Guo and Zhao in 2017. In the FBWM, instead of pair-wise comparisons of all the variables, the comparisons and the conclusion are carried out in four steps: (a) selecting the best and worst criteria; (b) comparing the degree and intensity of importance of the best criterion over the other criteria with linguistic variables; (c) comparing the degree and intensity of importance of all criteria over the worst criterion with linguistic variables [Table 2]; and (d) calculating the final weights of the criteria. In this way, a large number of repeated comparisons is avoided, and the decision maker can make a better and more accurate decision more simply. Eqs. (1) to (4) are then used to weight the criteria.
|Table 2: Transformation rules of linguistic variables of decision-makers|
Eq. (1): The function of comparing the best criterion with other criteria
where Ã_B is the function comparing the best criterion with the other criteria, and ã_Bj is the linguistic variable chosen to represent the degree of importance of the best criterion over criterion j.
Eq. (2): The function of comparing other criteria toward the worst criterion.
where Ã_W is the function comparing the other criteria with the worst criterion, and ã_iW is the linguistic variable representing the degree of importance of criterion i over the worst criterion.
Eq.(3) is the final equation for weighting the criteria in FBWM.
By placing triangular fuzzy numbers in the equation above, Eq. (4) is formed.
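The equations themselves did not survive in this text version. As a hedged reconstruction, and assuming the authors followed the formulation of Guo and Zhao (2017) cited above, Eqs. (1) to (4) would take roughly the following form. The symbols $\tilde{w}_j = (l_j^w, m_j^w, u_j^w)$ for the fuzzy weight of criterion $j$, $\tilde{\xi}^* = (k^*, k^*, k^*)$ for the consistency bound, and $R(\cdot)$ for the defuzzified value are assumptions taken from that formulation, not from the present text; the index $i$ in Eq. (2) plays the same role as $j$ in Eqs. (3) and (4).

```latex
% Eq. (1): fuzzy Best-to-Others vector
\tilde{A}_B = \left(\tilde{a}_{B1}, \tilde{a}_{B2}, \ldots, \tilde{a}_{Bn}\right)

% Eq. (2): fuzzy Others-to-Worst vector
\tilde{A}_W = \left(\tilde{a}_{1W}, \tilde{a}_{2W}, \ldots, \tilde{a}_{nW}\right)

% Eq. (3): weighting model
\min \max_{j} \left\{
  \left| \frac{\tilde{w}_B}{\tilde{w}_j} - \tilde{a}_{Bj} \right|,\;
  \left| \frac{\tilde{w}_j}{\tilde{w}_W} - \tilde{a}_{jW} \right|
\right\}
\quad \text{s.t.} \quad
\sum_{j=1}^{n} R(\tilde{w}_j) = 1,\;\;
l_j^w \le m_j^w \le u_j^w,\;\;
l_j^w \ge 0,\;\; j = 1, \ldots, n

% Eq. (4): the same model written with triangular fuzzy numbers
\min \tilde{\xi}^*
\quad \text{s.t.} \quad
\begin{cases}
\left| \dfrac{(l_B^w, m_B^w, u_B^w)}{(l_j^w, m_j^w, u_j^w)} - (l_{Bj}, m_{Bj}, u_{Bj}) \right| \le (k^*, k^*, k^*) \\[2ex]
\left| \dfrac{(l_j^w, m_j^w, u_j^w)}{(l_W^w, m_W^w, u_W^w)} - (l_{jW}, m_{jW}, u_{jW}) \right| \le (k^*, k^*, k^*) \\[2ex]
\sum_{j=1}^{n} R(\tilde{w}_j) = 1,\;\;
l_j^w \le m_j^w \le u_j^w,\;\;
l_j^w \ge 0,\;\; j = 1, \ldots, n
\end{cases}
```

Solving Eq. (4) yields one triangular fuzzy weight per subscale, so that equal or near-equal importance between two subscales can be expressed rather than forced into a single choice.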
Section 3: Calculating the final score of each subscale and the total score.
After both sections of the questionnaire were completed, the fuzzy rating from the first section was multiplied by the fuzzy weight from the second section (Eq. 5). The final score of each subscale was then obtained by defuzzifying the resulting fuzzy product using Eq. (6).
where A and B are the first and second fuzzy numbers, respectively, each with lower, middle, and upper bounds.
where R(ã_i) is the defuzzified number obtained from the product of the two sections of the FBWM-NASA-TLX questionnaire. This is the final score of each subscale. The total score of the questionnaire is the algebraic sum of the final scores of the subscales.
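Eqs. (5) and (6) are likewise missing from this text version. A plausible reconstruction, offered as an assumption rather than as the authors' exact notation, is the component-wise product of two triangular fuzzy numbers followed by the graded mean integration representation used in Guo and Zhao's method:

```latex
% Eq. (5): approximate product of two triangular fuzzy numbers
\tilde{A} \otimes \tilde{B} = (l_A, m_A, u_A) \otimes (l_B, m_B, u_B)
  = (l_A l_B,\; m_A m_B,\; u_A u_B)

% Eq. (6): defuzzification by graded mean integration
R(\tilde{a}_i) = \frac{l_i + 4 m_i + u_i}{6}
```

This form of Eq. (6) reproduces the values reported in the Results: for the performance subscale, $(20.35 + 4 \times 27.13 + 30.26)/6 \approx 26.52$.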
Microsoft Excel was used to calculate the FBWM-NASA-TLX scores based on the above-mentioned equations, and the independent t-test was performed in SPSS (IBM, North Castle, New York, U.S.) at a significance level of P < 0.05.
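For readers without SPSS, the independent t statistic can be sketched from summary statistics alone. The function name `pooled_t` is hypothetical, the equal-variance (pooled) form is an assumption about which variant was run, and the group size used in the usage lines is a placeholder because the sample size is not stated in this text.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with pooled variance, plus degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Means and SDs for mental demand at level 3 as reported in the abstract.
# n_per_group is a hypothetical placeholder, NOT the study's sample size.
n_per_group = 30
t_value, df = pooled_t(18.09, 6.39, n_per_group, 22.64, 8.15, n_per_group)
print(f"t = {t_value:.2f}, df = {df}")
```

The resulting t value would then be compared against the critical value for the given degrees of freedom at the 0.05 level.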
Results
Calculating the scores for subscales of the FBWM-NASA-TLX questionnaire (an example).
One of the questionnaires completed by the participants is used here as an example. After the N-Back test at level three, in the first part of the questionnaire the participant selected the linguistic variable high for the mental demand subscale, very high for performance, medium for effort, high for frustration, high for temporal demand, and very low for physical demand.
Solving Eq. (4) gave the weight of each subscale. In this example, the fuzzy weights were (0.23, 0.23, 0.29) for mental demand, (0.27, 0.27, 0.30) for performance, (0.06, 0.06, 0.07) for effort, (0.13, 0.13, 0.17) for frustration, (0.20, 0.21, 0.37) for temporal demand, and (0.06, 0.06, 0.07) for physical demand. The fuzzy score and final defuzzified score were (17.22, 22.96, 29.26) and 23.04 for mental demand, (20.35, 27.13, 30.26) and 26.52 for performance, (1.57, 3.13, 5.37) and 3.24 for effort, (6.26, 9.39, 17) and 10.09 for frustration, (9.90, 15.65, 37) and 18.33 for temporal demand, and (0, 0, 1.79) and 0.30 for physical demand, respectively. Summing the defuzzified scores, the workload obtained for this person was 81.52.
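The arithmetic of this example can be checked with a short Python sketch. It assumes that the fuzzy product is taken component-wise and that defuzzification uses the graded mean (l + 4m + u)/6; both are assumptions about Eqs. (5) and (6), and the function names are hypothetical. Because the reported weights are rounded to two decimals, recomputed values may differ from the reported ones in the second decimal place.

```python
def fuzzy_multiply(a, b):
    """Component-wise product of two triangular fuzzy numbers (assumed form of Eq. 5)."""
    return tuple(x * y for x, y in zip(a, b))


def defuzzify(tfn):
    """Graded mean integration of a triangular fuzzy number (assumed form of Eq. 6)."""
    lower, middle, upper = tfn
    return (lower + 4 * middle + upper) / 6


# Rating "very low" times the rounded physical-demand weight from the example.
physical = fuzzy_multiply((0, 0, 25), (0.06, 0.06, 0.07))
print(physical)

# Fuzzy scores and defuzzified scores exactly as reported in the text.
reported = {
    "mental demand": ((17.22, 22.96, 29.26), 23.04),
    "performance": ((20.35, 27.13, 30.26), 26.52),
    "effort": ((1.57, 3.13, 5.37), 3.24),
    "frustration": ((6.26, 9.39, 17.0), 10.09),
    "temporal demand": ((9.90, 15.65, 37.0), 18.33),
    "physical demand": ((0.0, 0.0, 1.79), 0.30),
}

for name, (fuzzy_score, score) in reported.items():
    print(f"{name}: recomputed {defuzzify(fuzzy_score):.2f}, reported {score:.2f}")

# The total workload is the sum of the reported defuzzified subscale scores.
total = sum(score for _, score in reported.values())
print(f"total workload: {total:.2f}")
```

The six reported subscale scores add up to the reported total of 81.52.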
Comparing the NASA-task load index and Fuzzy Best-Worst method-NASA-task load index questionnaires
The mean subscale scores and the final scores of the two questionnaires were compared separately at each level of the N-Back test [Table 3]. There was a significant difference between the two questionnaires in mental demand at level three and in physical demand at levels two and three (P < 0.05).
|Table 3: Comparison of total scores of subscales between NASA-task load index and fuzzy best-worst method-NASA-task load index among N-Back levels|
Each questionnaire was also compared across the different levels of the N-Back test using the t-test [Table 4] and [Table 5]. In the NASA-TLX questionnaire, there was a significant difference among the levels of the N-Back test in the mental demand, effort, and frustration subscales and in the final score. In the FBWM-NASA-TLX questionnaire, there was a significant difference in the mental demand, performance, and effort subscales and in the final score (P < 0.05).
|Table 4: The scores of subscales and total scores in NASA-task load index questionnaire by the different levels of the N-Back test|
|Table 5: The scores of subscales and total scores in fuzzy best-worst method-NASA-task load index questionnaire by the different levels of the N-Back test|
Discussion
The present study aimed to provide a novel instrument for measuring MWL through a fundamental change to the NASA-TLX questionnaire. The results showed that the two questionnaires differed significantly in the mental demand subscale at the most difficult level of the N-Back test (level 3) and in the physical demand subscale at levels 2 and 3.
The N-Back test used in this study is a purely mental task, and as the level and difficulty of the test increase, the mental demand on the participant increases. Hence, the most important subscale in this study is mental demand, and the significant difference between the two questionnaires in this subscale shows the superiority of the developed instrument. The mean scores of this subscale were lower in the FBWM-NASA-TLX questionnaire, which is due to the different way of rating in the first part, the different weighting in the second part, and the final calculations of the questionnaire.
Furthermore, in pair-wise comparison the respondent has to select one option and cannot express that two variables are equally important, which can lead to an unwanted option receiving a higher score than the other options. Hence, with a method in which the importance of the two variables is determined and equal importance can be assigned to two subscales, the final score obtained for the top-ranked option is more realistic and lower than with pair-wise choice. Considering the problems and limitations of the NASA-TLX questionnaire (a numerical visual rating scale, no fuzziness, and two-by-two selection), the present study changed both the calculations and the way subscales are chosen compared with the conventional version. Different results were obtained, indicating that the new tool is more powerful and more realistic. This is consistent with the study of Amady et al., who used fuzzy logic in the NASA-TLX questionnaire.
As mentioned, the task performed in this study was entirely mental, had a low physical load, and was performed while seated; hence, the physical demand subscale was expected to have the lowest value. This expected result was obtained in both questionnaires. In ergonomics, physical load always has some intensity and is never entirely absent: the concept of “low physical load” is used for seated work, and every seated task also imposes some physical load.
With regard to the main problem of the second part of the NASA-TLX questionnaire, in most cases participants did not choose physical demand in any comparison, so the weight of this subscale in the second part was 0. Consequently, the mean score of this subscale was <1, which differs substantially from reality and from the way decisions are actually made. In the weighting section of the FBWM-NASA-TLX questionnaire, the participant determines the weight of each subscale by expressing its degree of importance, so that even the lowest intensity is identified and applied.
Conclusions
The present study considered the problems and limitations of the NASA-TLX questionnaire and, in response, designed and evaluated the FBWM-NASA-TLX questionnaire. The results showed that the FBWM-NASA-TLX questionnaire gave more realistic workload scores and decisions for the task under study. The NASA-TLX questionnaire was designed to be easy to apply, but because of the aforementioned problems it needs to be redesigned.
Our work has some limitations. The most important is that the questionnaire was not examined in an actual job task and was not evaluated against objective methods such as electroencephalography (EEG) and event-related potentials (ERP), owing to insufficient financial support. Another limitation was the type of task, which involves almost only mental load and very little physical load, so the combined effect of physical and mental load could not be studied. Given these limitations, field studies are suggested in the future to better assess the current methodology and compare it with other methods, such as SWOT (Strengths, Weaknesses, Opportunities, and Threats).
This study was financially supported by Shiraz University of Medical Sciences (grant No. IR.SUMS.REC.1397.942).
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
References
Backs RW, Ryan AM, Wilson GF. Psychophysiological measures of workload during continuous manual performance. Hum Factors 1994;36:514-31.
Hart SG, Staveland LE. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Adv Psychol 1988;52:139-83.
Hart SG. NASA-task load index (NASA-TLX); 20 years later. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting. Los Angeles, CA: SAGE Publications; 2006.
Delice EK, Can GF. An integrated mental workload assessment approach based on NASA-TLX and SMAA-2: A case study. Eskişehir Osmangazi Üniversitesi Mühendislik ve Mimarlık Fakültesi Dergisi 2018;26:88-99.
Rubio S, Díaz E, Martín J, Puente JM. Evaluation of subjective mental workload: A comparison of SWAT, NASA-TLX, and workload profile methods. Appl Psychol 2004;53:61-86.
Parasuraman R, Hancock PA. Adaptive Control of Mental Workload. Stress, Workload, and Fatigue; 2001.
Flemisch FO, Onken R. Open a window to the cognitive work process! Pointillist analysis of man-machine interaction. Cogn Technol Work 2002;4:160-70.
Loft S, Sanderson P, Neal A, Mooij M. Modeling and predicting mental workload in en route air traffic control: Critical review and broader implications. Hum Factors 2007;49:376-99.
Vidulich MA, Tsang PS. Methodological and theoretical concerns in multitask performance: A critique of Boles, Bursk, Phillips, and Perdelwitz. Hum Factors 2007;49:46-9.
Wickens CD. Multiple resources and mental workload. Hum Factors 2008;50:449-55.
Young MS, Brookhuis KA, Wickens CD, Hancock PA. State of science: Mental workload in ergonomics. Ergonomics 2015;58:1-17.
Yurko YY, Scerbo MW, Prabhu AS, Acker CE, Stefanidis D. Higher mental workload is associated with poorer laparoscopic performance as measured by the NASA-TLX tool. Simul Healthc 2010;5:267-71.
Zheng B, Jiang X, Tien G, Meneghetti A, Panton ON, Atkins MS. Workload assessment of surgeons: Correlation between NASA TLX and blinks. Surg Endosc 2012;26:2746-50.
Wiebe EN, Roberts E, Behrend TS. An examination of two mental workload measurement approaches to understanding multimedia learning. Comput Hum Behav 2010;26:474-81.
Malekpour F, Malekpour A, Mohammadian Y, Mohammadpour Y, Shakarami A, Sheikh Ahmadi A. Assessment of mental workload in nursing by using NASA-TLX. J Urmia Nurs Midwif Facul 2014;11:892-9.
Amady MM, Raufaste E, Prade H, Meyer JP. Fuzzy-TLX: Using fuzzy integrals for evaluating human mental workload with NASA-Task Load indeX in laboratory and field studies. Ergonomics 2013;56:752-63.
Bridger RS, Brasher K. Cognitive task demands, self-control demands and the mental well-being of office workers. Ergonomics 2011;54:830-9.
Nygren TE. Psychometric properties of subjective workload measurement techniques: Implications for their use in the assessment of perceived mental workload. Hum Factors 1991;33:17-33.
Hart SG. NASA Task load Index (TLX). Volume 1.0; Paper and pencil package. National Aeronautics and Space Administration 1986;2:10-5.
Ruiz-Rabelo JF, Rodriguez NE, Di-Stasi LL, Jimenez ND, Bermon CJ, Iglesias DC, et al. Validation of the NASA-TLX score in ongoing assessment of mental workload during a laparoscopic learning curve in bariatric surgery. Obes Surg 2015;25:2451-6.
Kane MJ, Conway AR, Miura TK, Colflesh GJ. Working memory, attention control, and the N-back task: A question of construct validity. J Exp Psychol Learn Mem Cogn 2007;33:615-22.
Jaeggi SM, Buschkuehl M, Perrig WJ, Meier B. The concurrent validity of the N-back task as a working memory measure. Memory 2010;18:394-412.
Li RJ. Fuzzy method in group decision making. Comput Math Appl 1999;38:91-101.
Guo S, Zhao H. Fuzzy best-worst multi-criteria decision-making method and its applications. Knowl-Based Syst 2017;121:23-31.
Rezaei J. Best-worst multi-criteria decision-making method. Omega 2015;53:49-57.
Verhaeghen P, Basak C. Ageing and switching of the focus of attention in working memory: Results from a modified N-Back task. Q J Exp Psychol 2005;58:134-54.
Straker L, Mathiassen SE. Increased physical work loads in modern work–a necessity for better health and performance? Ergonomics 2009;52:1215-25.
Choi B, Schnall PL, Yang H, Dobson M, Landsbergis P, Israel L, et al. Sedentary work, low physical job demand, and obesity in US workers. Am J Ind Med 2010;53:1088-101.