Materials and Equipment

Demet Soyyılmaz, Laura M. Griffin, Miguel H. Martín, Šimon Kucharský, Ekaterina D. Peycheva, Nina Vaupotič, Peter A. Edelsbrunner

For every construct that we aim to assess, we conducted a literature search in the PsycInfo and Scopus databases to identify available measures. The choice of instruments was based on psychometric quality, appropriateness for the university context, administration time, translation feasibility, and meaningfulness of use across a variety of international universities. Regarding psychometric quality, we ensured that basic analyses such as factor analysis and estimation of reliability or internal consistency had been conducted and had yielded at least moderate results. Appropriateness for the 1st year of university was taken into account insofar as we tried to estimate at which level Psychology students develop during their 1st year. For example, scientific reasoning is a broad construct, and we chose an instrument that assesses skills that we think are critical for students’ further development and likely to show at least some development already during their first university year. The chosen instrument assesses principles of experimental design that we deem relevant for understanding the critical quality characteristics of any research the students learn about (Drummond and Fischhoff, 2015).

Students’ demographic characteristics will include their age, gender, former university education, career aspirations, grades in high school, the grade of their first university examination, and family socioeconomic status (see Appendix A). For the latter, we will ask students about their parents’ highest achieved education, bedroom availability, and the number of books at home during their adolescence (Evans et al., 2010). Socioeconomic status is assessed to examine its influence on the main study variables and to estimate other variables’ influence while controlling for it. We assess family rather than personal socioeconomic status because university students are still in education, which constrains their own educational level as well as their working situation, the most common indicators of personal socioeconomic status. Family socioeconomic status is thus commonly assessed in research in academic contexts (Caro and Cortes, 2012). Students’ estimated score on the first principal component of the four variables will be used as an indicator of their family socioeconomic status. Finally, we will assess the quantity of formal education relevant for developing scientific thinking (number of methodology- and statistics-related courses, number of philosophy of science- and epistemology-related courses).
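A minimal sketch of how the family socioeconomic status indicator could be computed is given below, assuming the four indicators are stored as columns of a pandas DataFrame; the column names are hypothetical placeholders, not the actual survey variable names.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical names for the four family-SES indicators described above.
SES_ITEMS = ["mother_education", "father_education", "own_bedroom", "books_at_home"]

def family_ses(df: pd.DataFrame) -> pd.Series:
    """Score each student on the first principal component of the SES items."""
    z = StandardScaler().fit_transform(df[SES_ITEMS])   # standardize indicators
    pc1 = PCA(n_components=1).fit_transform(z)[:, 0]    # first principal component
    return pd.Series(pc1, index=df.index, name="family_ses")
```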

As a measure of scientific reasoning, the Scientific Reasoning Scale (SRS) developed and validated by Drummond and Fischhoff (2015) will be used. It contains eleven true-or-false items in which hypothetical research scenarios are described and the participant has to decide whether the scenario can lead to the proposed inference. Each of the items relates to a specific concept crucial for the ability to draw valid scientific conclusions. The concepts include understanding the importance of control groups and random assignment, identifying confounding variables, and distinguishing between correlation and causation. Scores on the SRS show adequate internal consistency (Cronbach’s α = 0.70) and correlate positively with cognitive reflection, numeracy, open-minded thinking, and the ability to analyze scientific information (Drummond and Fischhoff, 2015). Following this scale, we added one additional item assessing students’ understanding of sample representativeness (Appendix B). Students’ mean score on the scale will be used in descriptive analyses as an indicator of their scientific reasoning. Whether the item on sample representativeness can be added to the scale will be decided based on a confirmatory factor analysis: it will be added if its factor loading falls within the range of the other items’ loadings.
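The loading check could be implemented roughly as follows; this is a hedged sketch using the semopy package with hypothetical item names (srs_1 to srs_11 and sample_repr), and it treats the binary items as continuous for simplicity, whereas a categorical estimator would be more appropriate for true-or-false data.

```python
import pandas as pd
import semopy

SRS_ITEMS = [f"srs_{i}" for i in range(1, 12)]   # the 11 original SRS items
NEW_ITEM = "sample_repr"                         # added representativeness item
MODEL_DESC = "reasoning =~ " + " + ".join(SRS_ITEMS + [NEW_ITEM])

def loading_within_range(data: pd.DataFrame) -> bool:
    """Fit a one-factor CFA and check whether the new item's loading falls
    within the range of the original items' loadings."""
    model = semopy.Model(MODEL_DESC)
    model.fit(data)
    est = model.inspect()
    loadings = est.loc[(est["op"] == "~") & (est["rval"] == "reasoning")]
    loadings = loadings.set_index("lval")["Estimate"]
    original = loadings[SRS_ITEMS]
    return original.min() <= loadings[NEW_ITEM] <= original.max()
```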

We developed a questionnaire encompassing five questions that deal with common statistical misconceptions (Appendix B). Items dealing with p-value and confidence interval misinterpretations were taken directly from Gigerenzer (2004) and Morey et al. (2015). From each article, we chose the item with the highest prevalence of wrong answers among university students to achieve high variance in our sample of 1st-year students. We further developed items of a similar structure dealing with the interpretation of non-significant results, the equivalence of significant and non-significant results (Gelman and Stern, 2006; Nieuwenhuis et al., 2011), and sample representativeness. The items share their structure and answer format with the Scientific Reasoning Scale by Drummond and Fischhoff (2015), and we placed them after the end of that scale. Participants are also asked whether they have ever learned about p-values, confidence intervals, and sample representativeness. If they check “no,” their answers on the respective questions will be treated as missing values. Students’ mean value across the four questions dealing with p-values and confidence intervals will be used as an indicator of their statistics misconceptions. The question on sample representativeness, as described above, will be used as an additional item of the Scientific Reasoning Scale.
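One way the missing-value rule and the mean score could be implemented is sketched below; the item and prerequisite-question column names are hypothetical, and answers are assumed to be coded 1 (correct) and 0 (wrong).

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from each misconception item to the question asking
# whether the student has ever learned about the respective topic.
ITEM_TO_PREREQ = {
    "pvalue_misint": "learned_pvalues",
    "ci_misint": "learned_cis",
    "nonsig_interpretation": "learned_pvalues",
    "sig_nonsig_equivalence": "learned_pvalues",
}

def misconceptions_score(df: pd.DataFrame) -> pd.Series:
    """Mean over the four p-value/CI items, treating answers of students who
    never learned about the respective topic as missing."""
    scored = df[list(ITEM_TO_PREREQ)].astype(float).copy()
    for item, prereq in ITEM_TO_PREREQ.items():
        scored.loc[df[prereq] == "no", item] = np.nan   # mark as missing
    return scored.mean(axis=1).rename("stat_misconceptions")
```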

For the Scientific Reasoning Scale and the added statistics misconceptions items, we will add one open-answer validation question. Each student will receive the following question for one randomly selected item out of the 16 items that the two scales comprise: “Why did you choose this answer? Please provide an explanation.”, followed by two lines on which the students are supposed to provide a short rationale for their multiple-choice answer. The item to which this additional open-answer question is attached will differ randomly between students, so that a random subsample of students will serve as a validation sample for each item. We implement this validation measure because, to the best of our knowledge, the SRS has not yet been translated into our sampled languages or been used in the sampled countries. It is therefore necessary to examine whether 1st-year Psychology students’ answers on these questions reflect the target construct. The statistics misconceptions items have, to the best of our knowledge, not been thoroughly validated either, but have rather been used to assess the prevalence of wrong answers among students and academics, and we developed three of the questions ourselves; we therefore include them in this validation procedure.
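For illustration, the random assignment of the open-ended validation question could be generated as below when preparing individualized questionnaires; the function and variable names are ours and the seed is arbitrary.

```python
import random

N_ITEMS = 16  # 12 scientific reasoning items (incl. representativeness) + 4 misconception items

def assign_validation_items(student_ids, seed=1):
    """Pick, for each student, the one item that receives the open-ended
    'Why did you choose this answer?' follow-up."""
    rng = random.Random(seed)
    return {sid: rng.randrange(1, N_ITEMS + 1) for sid in student_ids}

# Example: assign_validation_items(["s001", "s002", "s003"]) -> {"s001": 5, ...}
```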

To assess epistemic cognition we will administer the Epistemic and Ontological Cognition Questionnaire (EOCQ; Greene et al., 2010). It contains 13 items with a 6-point response scale ranging from 1 (completely disagree) to 6 (completely agree). The instrument takes the contextuality of epistemic cognition into account by providing the opportunity to insert a domain into the item stems (Greene et al., 2008). We insert Psychology and psychological science as the domain about which the students rate the items. Five items represent simple and certain knowledge (example: “in psychological science, what is a fact today will be a fact tomorrow”), four items represent justification by authority (“I believe everything I learn in psychology class”), and four items represent personal justification (“in psychological science, what’s a fact depends upon a person’s view”). Higher ratings on ten of the items indicate stronger beliefs, and higher ratings on the remaining three items indicate weaker beliefs. Reliability estimates (H coefficient) range from 0.45 to 0.90 depending on facet and context (Greene et al., 2010). Mean scores on all three subscales will undergo mixture modeling, which will yield an epistemic cognition profile for each student that will be used for further analysis (Greene et al., 2010).
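A hedged sketch of this scoring and profiling step follows, using Gaussian mixture models as one common way to implement mixture modeling; the item names, the assignment of items to subscales, and in particular which three items are reverse-keyed are placeholders, since the protocol text does not specify them.

```python
import pandas as pd
from sklearn.mixture import GaussianMixture

# Placeholder item names and subscale assignment.
SUBSCALES = {
    "simple_certain": [f"eocq_{i}" for i in range(1, 6)],
    "authority_justification": [f"eocq_{i}" for i in range(6, 10)],
    "personal_justification": [f"eocq_{i}" for i in range(10, 14)],
}
REVERSED = ["eocq_3", "eocq_7", "eocq_11"]   # illustrative choice of reverse-keyed items

def epistemic_profiles(df: pd.DataFrame, max_profiles: int = 5):
    items = df.copy()
    items[REVERSED] = 7 - items[REVERSED]            # reverse-score 6-point ratings
    scores = pd.DataFrame({name: items[cols].mean(axis=1)
                           for name, cols in SUBSCALES.items()})
    # Fit mixtures with 1..max_profiles profiles and keep the best model by BIC.
    fits = [GaussianMixture(n_components=k, random_state=0).fit(scores)
            for k in range(1, max_profiles + 1)]
    best = min(fits, key=lambda m: m.bic(scores))
    return best.predict(scores), scores              # profile label per student
```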

We will use the Need for Cognition Short Scale (NFC-K; Beißert et al., 2014) to measure the tendency to engage in and enjoy thinking. The short scale is a modified 4-item version of the 18-item Need for Cognition Scale created by Cacioppo and Petty (1982). On a 7-point scale, students are asked to rate the extent to which they agree with four simple statements. An example item is “I would prefer complex to simple problems.” Mean scores from this scale will be used for descriptive analyses, with higher scores indicating that students are more motivated to apply their thinking skills. Test-retest reliability is r = 0.78 and Cronbach’s α = 0.86 (Beißert et al., 2014). The score will be used to predict students’ development of scientific thinking, and also as a control variable to examine which variables predict students’ development beyond need for cognition.

The Science Self-Efficacy (SSE) scale, a 10-item scale used by Moss (2012), will be administered (Cronbach’s α > 0.80). It is a modified version of a vocational self-efficacy survey designed by Riggs et al. (1994) and specifically aims to measure confidence in the skills needed to engage in scientific inquiry. The items are rated on a scale from 1 to 10 (1 = not able or not true at all, 10 = completely able or completely true). An example item is “I have all the skills needed to perform science tasks very well.” Students’ mean score on the scale will be used for statistical modeling. The score will be used to predict students’ development of scientific thinking, and also as a control variable to examine which variables predict students’ development beyond science self-efficacy.

We developed a survey to assess students’ engagement in learning experiences that we presume to be relevant for the development of scientific thinking (Appendix C). The selection of experiences is based on the literature discussed above, and it will be further informed and adapted based on the pilot study interviews (Appendix D). Our definitions of formal and informal learning imply a continuum of formality within and across learning activities. For example, a frequent formal learning activity is studying a text that is mandatory reading for a research methods course. When students become interested in the text’s contents, they might initiate further voluntary reading to inform themselves beyond the course requirements, which in our definition is then an informal learning experience. Our assessment method encompasses a wide variety of prescribed and non-prescribed scientific learning experiences: for each of the assessed activities that can be either formal or informal, we ask students how often they engaged in it as part of mandatory course activities, or for reasons going beyond these. Specifically, for experiences where this applies, we let students rate subjectively how much they engaged in them because it was obligatory for course requirements (formal engagement), because it was obligatory but they were also interested (formal and informal engagement), or purely out of their own interest (informal engagement).

In the second part of the survey, we ask students about the three most relevant courses they took that were related to research methods, statistics, science, history of science, or similar topics. We ask for up to three courses because we studied the official bachelor curricula of the targeted universities, and most students will not have more highly relevant courses during their first and second semester. Reporting on further courses would make it strongly subjective which courses students deem relevant to this question, and it might take rather long and be exhausting to report details on every relevant course they could think of. To check that they did not have many more relevant courses, however, we ask in the demographics section for the absolute number of relevant courses. For the up to three most relevant courses, students first list the names of the courses and whether the courses were mandatory or elective. Then, we ask students about their general engagement in these courses (presence, working time devoted) and about course quality (ratings of overall course quality, teaching quality, and the frequency of inquiry and reflective course elements). Finally, reflecting informal engagement, they rate how much they engaged in each of these courses out of their own motivation or interest, beyond the course requirements. Estimating principal components, we will weight general course engagement across courses by course content ratings to yield an indicator of formal engagement, and engagement out of own motivation or interest by course quality ratings to yield an indicator of informal engagement.
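One possible reading of this weighting step is sketched below: per course, the engagement rating is multiplied by the corresponding content or quality rating, and students are then scored on the first principal component across the (up to three) courses. The column names are hypothetical, missing courses would need more careful handling than the crude mean imputation shown, and the protocol's actual weighting scheme may differ.

```python
import pandas as pd
from sklearn.decomposition import PCA

def engagement_indicator(df: pd.DataFrame, engagement_cols, rating_cols) -> pd.Series:
    """First principal component of per-course products of an engagement
    rating and a weighting rating (course content or quality)."""
    weighted = pd.DataFrame(
        {f"course_{i + 1}": df[e] * df[r]
         for i, (e, r) in enumerate(zip(engagement_cols, rating_cols))},
        index=df.index,
    )
    weighted = weighted.fillna(weighted.mean())      # crude handling of missing courses
    pc1 = PCA(n_components=1).fit_transform(weighted)[:, 0]
    return pd.Series(pc1, index=df.index)

# Illustrative usage with hypothetical column names:
# formal = engagement_indicator(df, ["eng_c1", "eng_c2", "eng_c3"],
#                               ["content_c1", "content_c2", "content_c3"])
# informal = engagement_indicator(df, ["interest_c1", "interest_c2", "interest_c3"],
#                                 ["quality_c1", "quality_c2", "quality_c3"])
```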
