# Chapter 3 (Study 2). Language and vision in conceptual processing: Multilevel analysis and statistical power

\bigskip \bigskip \bigskip

```{r}
# This chapter is rendered by 'thesis-core.Rmd'. Thus, it inherits all the
# parameters set in 'thesis-core.Rmd' and the objects loaded therein.
```

\begin{adjustwidth}{1.5cm}{1.5cm}

Research has suggested that conceptual processing depends on both language-based and vision-based information. We tested this interplay at three levels of the experimental structure: individuals, words and tasks. To this end, we drew on three existing, large data sets that implemented the paradigms of semantic priming, semantic decision and lexical decision. We extended these data sets with measures of language-based and vision-based information, and analysed how the latter variables interacted with participants' vocabulary size and gender, and also with presentation speed in the semantic priming study. We performed the analysis using mixed-effects models that included a comprehensive array of fixed effects---including covariates---and random effects. First, we found that language-based information was more important than vision-based information. Second, in the semantic priming study---whose task required distinguishing between words and nonwords---, both language-based and vision-based information were more influential when words were presented faster. Third, a 'task-relevance advantage' was identified in higher-vocabulary participants. Specifically, in lexical decision, higher-vocabulary participants were more sensitive to language-based information than lower-vocabulary participants. In contrast, in semantic decision, higher-vocabulary participants were more sensitive to word concreteness. Fourth, we demonstrated the influence of the analytical method on the results. These findings support the interplay between language and vision in conceptual processing, and underscore the influence of measurement instruments on the results. Last, we estimated the sample size required to reliably investigate various effects. We found that 300 participants were sufficient to examine the effect of the language-based information contained in words, whereas more than 1,000 participants were necessary to examine the effect of vision-based information and the interactions of language-based and vision-based information with vocabulary size, gender and presentation speed. In conclusion, this power analysis suggests that larger sample sizes are necessary to investigate perceptual simulation and individual differences in conceptual processing.

\end{adjustwidth}

\bigskip \bigskip \bigskip

Over the last two decades, research in the cognitive sciences has suggested that conceptual processing depends on both language and embodiment systems. That is, understanding words involves---on the one hand---lexical and semantic associations of an amodal kind, and---on the other hand---modality-specific associations within perceptual, motor, affective and social domains [@barsalouLanguageSimulationConceptual2008; @connell2019a; @davisBuildingSemanticMemory2021; @khatin-zadehStrongVersionsEmbodied2021; @viglioccoTheorySemanticRepresentation2009]. Studies addressing these systems have found that the language system is overall more prevalent in word processing, producing larger effect sizes [@banksLinguisticDistributionalKnowledge2021; @kiela2014a; @louwerse2015a; @lam2015a; @pecherDoesPizzaPrime1998; @petilli2021a].
More intricately, the roles of language and embodiment are modulated by the characteristics of individuals, words and tasks. For instance, people's individual experience with language is associated with differential effects relating to phonological, lexical and semantic features of words [@jaredSkilledAdultReaders2017; @pexman2018a; @yap2012a; @yap2017a; @yapIndividualDifferencesJoint2009]. Similarly, physical expertise and perceptual biases are associated with differences in the mental simulation of meaning [@beilockSportsExperienceChanges2008; @calvo-merinoActionObservationAcquired2005; @vukovic2015a]. Furthermore, the embodiment system is especially suited for the processing of concrete concepts---e.g., *red*, *building* [@jonesDistrubutionalSemanticsStill2022; @kousta2011a; @ponari2018a; cf. @borghiAbstractConceptsExternal2022]. Embodied information also becomes more important in the following conditions: (I) later in the time courses of word recognition [@bernabeu2017a; @louwerseNeurologicalEvidenceLinguistic2012; cf. @petilli2021a] and property generation [@santosPropertyGenerationReflects2011; @simmonsFMRIEvidenceWord2008], (II) when participants produce slower responses [@louwerseTasteWordsLinguistic2011], and (III) in tasks that elicit deeper semantic processing---e.g., semantic decision, as opposed to lexical decision [@ostarekTaskdependentCausalRole2017; @petilli2021a]. Last, research in computational linguistics has provided further support for the complementarity of language and embodied information, by revealing increased predictive performance when models are provided with perceptual information on top of text-based information [@NIPS2013_7cce53cf; @roadsLearningUnsupervisedAlignment2020].

In spite of the amount of evidence demonstrating the interplay between language and embodiment, there are four reasons to continue testing the interplay theory. First, the coexistence of several systems in a scientific theory must be thoroughly justified due to the value of simplicity [@galleseBrainConceptsRole2005; @tillmanHowSharpOccam2015]. This scrutiny is particularly necessary because the language system has consistently produced larger effect sizes than the embodiment system [@banksLinguisticDistributionalKnowledge2021; @kiela2014a; @louwerse2015a; @lam2015a; @pecherDoesPizzaPrime1998; @petilli2021a]. Therefore, it should be ruled out that the language system can suffice in all contexts. Second, it is important to examine both language and embodiment across various levels of the experimental structure---namely, individuals (i.e., due to individual differences such as vocabulary size), words (i.e., lexical and semantic variables) and tasks (i.e., experimental conditions affecting, for instance, processing speed). Some studies have approached this comprehensive structure, but there is still room to widen the scope. One of the findings revealed by cross-level analyses is the influence of word processing tasks on the importance of modality-specific information. For instance, @connellSeeHearWhat2014 found that the vision-based information in words is important both for word identification (i.e., lexical decision) and for reading aloud (i.e., naming). In contrast, the auditory information in words is important for reading aloud but not so much for word identification. Another finding from cross-level research is a 'task-relevance advantage' for individuals who have greater linguistic experience.
Specifically, @pexman2018a found that higher-vocabulary individuals were more sensitive to task-relevant information, such as word concreteness in the semantic decision task. Furthermore, regarding embodiment, research has revealed that individuals who are briefly exposed to a certain sport develop neural activity that allows them to mentally simulate sport-specific actions during language processing [@beilockSportsExperienceChanges2008]. While these works have covered a large swathe of the present topic, one question remains unanswered: how does an individual's linguistic experience relate to their sensitivity to both linguistic and embodied information in words?

Third, there is some inconclusive evidence: (a) some findings have suggested that higher-vocabulary participants are more sensitive to language-based information---as reflected in greater semantic priming [@yap2017a]---, whereas other findings have suggested the opposite [@yapIndividualDifferencesJoint2009]; (b) some studies have suggested that the language system is activated before the embodiment system [@lam2015a; @louwerseTasteWordsLinguistic2011], whereas a recent study suggested that this pattern does not hold in the lexical decision task [@petilli2021a]; and (c) some evidence has suggested that female participants draw on the language system more prominently than males [@burman2008a; @hutchinson2013a; @jung2019a; @ullman2008a], whereas other research has suggested that this difference is negligible in the general population [@wallentinChapterGenderDifferences2020].

Fourth, some of the previous studies could have been affected by the scarcity of statistical power that has been identified in cognitive psychology and neuroscience [@marekReproducibleBrainwideAssociation2022; @lynottReplicationExperiencingPhysical2014; @montero-melisNoEvidenceEmbodiment2022]. Problematically, low-powered studies present more errors in the estimation of effect sizes and $p$ values [@lokenMeasurementErrorReplication2017; @heymanReliabilityItemlevelSemantic2018; @vasishthStatisticalSignificanceFilter2018]. The current studies address these four key issues.

# The present studies {#present-studies}

We revisit three larger-than-average studies to investigate the interplay between language and embodiment in conceptual processing. We devote a study to each of the three original studies. Thus, Study 2.1 is centred on @hutchison2013a and uses the semantic priming paradigm. Study 2.2 is centred on @pexman2017a and uses the semantic decision paradigm. Study 2.3 is centred on @balota2007a and uses the lexical decision paradigm. Each of these central studies contained measures of participants' vocabulary size and gender. Furthermore, the core data sets were expanded by adding variables that captured the language-based information in words [@mandera2017a; @wingfieldUnderstandingRoleLinguistic2022] and the vision-based information in words [@lynott2020a; @petilli2021a]---the latter being used to represent the embodiment system. One of the key questions we investigated using this array of variables was whether individual differences in vocabulary and gender modulated participants' sensitivity to the language-based and vision-based information in words. Alongside the effects of interest, several covariates were included in the models to allow a rigorous analysis [@sassenhagenCommonMisapplicationStatistical2016]. These covariates comprised measures of general cognition and lexical characteristics of the stimulus words.
Last, in each study, we performed a statistical power analysis to help estimate the sample size needed to investigate a variety of effects in future studies. Below, we delve into the language and the embodiment components of these studies.

## Language {#language}

Studies have operationalised the language system at the word level using measures that capture the relationships among words without explicitly drawing on any sensory or affective modalities. Two main types of linguistic measures exist: those based on text corpora---dubbed *word co-occurrence* measures [@bullinariaExtractingSemanticRepresentations2007; @petilli2021a; @wingfieldUnderstandingRoleLinguistic2022]---and those based on associations collected from human participants---dubbed *word association* measures [@de2016a; @de2019a]. Notwithstanding the interrelation between word co-occurrence and word association [@planchueloNatureWordAssociations2022], co-occurrence is more purely linguistic, whereas association indirectly captures more of the sensory and affective meaning of words [@dedeyneVisualAffectiveMultimodal2021].

### Operationalisation and hypotheses

In Study 2.1 (semantic priming) and Study 2.2 (semantic decision), co-occurrence measures were used to represent the language system at the word level. Specifically, in Study 2.1, this measure was called `language-based similarity`, and it was based on the degree of text-based co-occurrence between the prime word and the target word in each trial [@mandera2017a]. In Study 2.2, the measure was called `word co-occurrence`, and it was based on the degree of text-based co-occurrence between each stimulus word and the words 'abstract' and 'concrete' [@wingfieldUnderstandingRoleLinguistic2022]. In Study 2.3 (lexical decision), a co-occurrence measure could not be used, as the co-occurrence of words in consecutive trials could not be calculated due to the high frequency of nonword trials throughout the lexical decision task. Therefore, a single-word measure had to be used instead. Word frequency was used because, among the five lexical variables considered, it had the largest effect (see [\underline{Appendix A}](#appendix-A-lexical-covariates)).

At the individual level, language was represented by participants' vocabulary size in Studies 2.1 and 2.2, and by participants' vocabulary age in Study 2.3. Vocabulary *size* and *age* did not differ in any consequential way. Both measures captured the amount of vocabulary knowledge of each participant by testing their knowledge of a small sample of pre-normed words, from which their overall knowledge was inferred. We hypothesised that word co-occurrence, word frequency and vocabulary size would all have facilitatory effects on participants' performance, with higher values leading to shorter response times (RTs) [@pexman2018a; @yapIndividualDifferencesJoint2009; @wingfieldUnderstandingRoleLinguistic2022].

## Embodiment represented by vision-based information

In previous studies, the embodiment system has been represented at the word level by perceptual, motor, affective or social variables [@fernandinoDecodingInformationStructure2022; @viglioccoTheorySemanticRepresentation2009; @wangSocialEmotionDimensional2021]. For instance, the perceptual modalities have often corresponded to the five Aristotelian senses---vision, hearing, touch, taste and smell [@bernabeu2017a; @bernabeu2021a; @louwerseTasteWordsLinguistic2011]---and, less often, to interoception [@connellInteroceptionForgottenModality2018].
Yet, out of all these domains, vision has been most frequently used in research [e.g., @bottiniConcretenessAdvantageLexical2021; @dedeyneVisualAffectiveMultimodal2021; @pearsonHeterogeneityMentalRepresentation2015; @petilli2021a; @yeeColorlessGreenIdeas2012]. The hegemony of vision is likely due to the central position that vision occupies in the human brain [@reillyEnglishLexiconMirrors2020] as well as in several languages [@bernabeuDutchModalityExclusivity2018; @chenMandarinChineseModality2019; @miceliPerceptualInteroceptiveStrength2021; @morucciAugmentedModalityExclusivity2019; @roqueVisionVerbsDominate2015; @speedDutchSensoryModality2021; @speedGroundingLanguageNeglected2020; @lynott2020a; @vergallitoPerceptualModalityNorms2020; @winterVisionDominatesPerceptual2018; @zhongSensorimotorNormsChinese2022]. In the present study, we focussed on vision alone for three reasons. First, we wanted to use a single variable to represent sensorimotor information, just as a single variable would be used to represent linguistic information. Using a single variable for each system facilitates the analysis of interactions with other variables. Second, vision is very prominent in cognition, as we just reviewed. Third, we had planned to use the present research to determine the sample size of a subsequent study that focusses on vision (indeed, the present study grew out of a statistical power analysis).

### Operationalisation and hypotheses

At the word level, we operationalised visual information using the visual strength variable from the Lancaster Sensorimotor Norms [@lynott2020a]. This variable measures the degree of visual experience associated with concepts. In Study 2.1, we created the variable `visual-strength difference` by subtracting the visual strength of the prime word from that of the target word, in each trial. Thus, visual-strength difference measured---in each trial---how much the prime word and the target word differed in their degrees of vision-based information. Even though we could not find any previous studies that reported the effect of visual strength (or visual-strength difference) on RT, we hypothesised a priming effect underpinned by this variable, consistent with related research [@petilli2021a]. Specifically, we hypothesised that visual-strength difference would have an inhibitory effect on participants' performance, with higher values leading to longer RTs. In Studies 2.2 and 2.3, we used the `visual strength` score per stimulus word. We hypothesised that this variable would have a facilitatory effect on participants' performance---i.e., higher values leading to shorter RTs---, consistent with related research [@petilli2021a].

Unlike language, vision was not examined at the individual level because the available variables were based on one self-reported value per participant [@hutchison2013a; @balota2007a], contrasting with the greater precision of the vocabulary measures, which consisted of multiple trials. Nonetheless, we recognise the need to investigate the role of perceptual experience [@murakiSimulatingSemanticsAre2021; @plautIndividualDevelopmentalDifferences2000] alongside that of linguistic experience in the future.

## Levels of analysis

Experimental data in psycholinguistics can be divided into various levels, such as individuals, words and tasks.
The simultaneous examination of a theory across several levels is expected to enhance our understanding of the theory [@ostarekStrongInferenceResearch2021]---for instance, by revealing the distribution of explanatory power (that is, effect size) within and across these levels. Several studies have probed more than one level---for instance, word level and individual level [@aujlaLanguageExperiencePredicts2021; @lim2020a; @pexman2018a; @yapIndividualDifferencesJoint2009], or word level and task level [@al-azaryCanYouTouch2022; @connell2013a; @connellSeeHearWhat2014; @ostarekSixChallengesEmbodiment2019; @petilli2021a]. This multilevel approach is complementary to a different line of research that aims to test the causality of various sources of information in conceptual processing, such as language [@ponari2018a], perception [@stasenkoWhenConceptsLose2014] and action [@speedImpairedComprehensionSpeed2017]. The three levels considered in this study---individual, word and task---are described below.

### Individual level

The individual level is concerned with the role of individual differences in domains such as language, perception, mental imagery and physical experience [e.g., @daidoneVocabularySizeKey2021; @davies2017a; @dils2010a; @fettermanFeelingWarmBeing2018; @holt2006a; @mak2019a; @miceliDifferencesRelatedAging2022; @pexman2018a; @vukovic2015a; @yap2012a; @yap2017a; @yapIndividualDifferencesJoint2009].^[According to @lamiellStatisticalThinkingPsychology2019, 'individual differences' is a misnomer in that the analyses used to examine those (e.g., regression) are not participant-specific. While this may partly hold for the current study too, the use of by-participant random effects increases the role of individuals in the analysis.] Recent studies have revealed important roles of participant-specific variables in topics where these variables have not traditionally been considered [@delucaRedefiningBilingualismSpectrum2019; @kosIndividualVariationLate2012; @montero-melisConsistencyMotionEvent2021]. Vocabulary size is used to represent the language system at the individual level. It measures the number of words a person can recognise out of a sample. Furthermore, covariates akin to general cognition---where available---were included in the models (see the [\underline{Covariates}](#covariates) section below).

### Word level

The word level is concerned with the lexical and semantic information in words [e.g., @dedeyneVisualAffectiveMultimodal2021; @lam2015a; @lund1995a; @lund1996a; @lynott2020a; @mandera2017a; @petilli2021a; @pexman2017a; @santosPropertyGenerationReflects2011; @wingfieldUnderstandingRoleLinguistic2022]. The word-level variables of interest in this study are language-based and vision-based information (both described above). The covariates are lexical variables and word concreteness. The lexical covariates were selected in each study out of the same five variables (see the [\underline{Covariates}](#covariates) section below).

### Task level

The task level is concerned with experimental conditions affecting, for instance, processing speed.
In Study 2.1 (semantic priming), there is one task-level factor, namely, stimulus onset asynchrony (SOA), which measures the temporal interval between the onset of the prime word and the onset of the target word.^[The names of all variables used in the analyses were slightly adjusted for this text to facilitate their understanding---for instance, by replacing underscores with spaces (conversions reflected in the scripts available at [http://doi.org/10.17605/OSF.IO/UERYQ](http://doi.org/10.17605/OSF.IO/UERYQ)). One specific case deserves further comment. We use the formula of the SOA in this paper, instead of the 'interstimulus interval' (ISI)---which we used in the analysis---, as the SOA has been more commonly used in previous papers [e.g., @hutchison2013a; @pecherDoesPizzaPrime1998; @petilli2021a; @yap2017a]. In our analysis, we used the ISI formula as it was the one present in the data set of @hutchison2013a---retrieved from [https://www.montana.edu/attmemlab/documents/all%20ldt%20subs_all%20trials3.xlsx](https://www.montana.edu/attmemlab/documents/all%20ldt%20subs_all%20trials3.xlsx). The only difference between these formulas is that the ISI does not count the presentation of the prime word. In the current study [@hutchison2013a], the presentation of the prime word lasted 150 ms. Therefore, the 50 ms ISI is equivalent to a 200 ms SOA, and the 1,050 ms ISI is equivalent to a 1,200 ms SOA. The use of either formula in the analysis would not affect our results, as the ISI conditions were recoded as -0.5 and 0.5 [@brauer2018a].] In Studies 2.2 and 2.3, there are no task-level variables. Beyond task-level variables, there is an additional source of task-related information across the three studies---namely, the experimental paradigm used in each study (i.e., semantic priming, semantic decision and lexical decision). Indeed, it is possible to examine how the effects vary across these paradigms [see @wingfieldUnderstandingRoleLinguistic2022]. This comparison, however, must be considered cautiously due to the existence of other non-trivial differences across these studies, such as the numbers of observations. With this caveat noted, the tasks used across these studies likely elicit varying degrees of semantic depth, as ordered below [see @balotaDepthAutomaticSpreading1986; @barsalouLanguageSimulationConceptual2008; @beckerLongtermSemanticPriming1997; @dewitMaskedSemanticPriming2015; @joordensLongShortSemantic1997; @lam2015a; @murakiSimulatingSemanticsAre2021; @ostarekTaskdependentCausalRole2017; @versaceImpactEmbodiedSimulation2021; @wingfieldUnderstandingRoleLinguistic2022]. 1. *Semantic decision* (Study 2.2) likely elicits the deepest semantic processing, as the instructions of this task ask for a concreteness judgement. In this task, participants are asked to classify words as abstract or concrete, which elicits deeper semantic processing than the task of identifying word forms---i.e., lexical decision [@dewitMaskedSemanticPriming2015]. 2. *Semantic priming* (Study 2.1). The task administered to participants in semantic priming studies is often lexical decision, as in Study 2.1 below. The fundamental characteristic of semantic priming is that, in each trial, a prime word is briefly presented before the target word. The prime word is not directly relevant to the task, as participants respond to the target word. 
Nonetheless, participants normally process both the prime word and the target word in each trial, and this combination allows researchers to analyse responses based on the prime--target relationship. In this regard, this paradigm could be considered more deeply semantic than lexical decision. Indeed, slower responses in semantic priming studies---reflecting difficult lexical decisions---have been linked to larger priming effects [@balotaMeanResponseLatency2008; @hoedemakerItTakesTime2014; @yapAdditiveInteractiveEffects2013], revealing a degree of semantic association that has not been identified in the lexical decision task.

3. *Lexical decision* (Study 2.3) is likely the semantically shallowest task of these three, as it focusses solely on the identification of word forms.

## Hypotheses {#hypotheses}

The central objective of the present studies is the simultaneous investigation of language-based and vision-based information, along with the interactions between each of those and vocabulary size, gender and presentation speed (i.e., SOA). Previous studies have examined subsets of these effects using the same data sets we are using [@balota2007a; @petilli2021a; @pexman2017a; @pexman2018a; @yap2012a; @yap2017a; @wingfieldUnderstandingRoleLinguistic2022]. Out of these studies, only @petilli2021a investigated *both* language and vision. However, in contrast to our present study, Petilli et al. did not examine the role of vocabulary size or any other individual differences, instead collapsing the data across participants. In addition to main effects of the aforementioned variables, our three studies have four interactions in common: (1a) language-based information $\times$ vocabulary size, (1b) vision-based information $\times$ vocabulary size, (2a) language-based information $\times$ participants' gender, and (2b) vision-based information $\times$ participants' gender. In addition, Study 2.1 contained two further interactions: (3a) language-based information $\times$ SOA, (3b) vision-based information $\times$ SOA (note that the names of some predictors vary across studies, as detailed in the [\underline{Language}](#language) section above). Each interaction and the corresponding hypotheses are addressed below.

### 1a. Language-based information $\times$ vocabulary size

We outline three hypotheses, supported by the literature, regarding the interaction between language-based information and participants' vocabulary size.

- *Larger vocabulary, larger effects.* Higher-vocabulary participants might be more sensitive to linguistic features than lower-vocabulary participants, thanks to a larger number of semantic associations [@connell2019a; @landauerIntroductionLatentSemantic1998; @louwerse2015a; @paivioMentalRepresentationsDual1990; @pylyshynWhatMindEye1973]. For instance, @yap2017a revisited the semantic priming study of @hutchison2013a and observed a larger semantic priming effect in higher-vocabulary participants.

- *Larger vocabulary, smaller effects.* Higher-vocabulary participants might be less sensitive to linguistic features, thanks to more automated language processing [@perfettiLexicalQualityHypothesis2002]. Some of the evidence aligned with this hypothesis was obtained by @yapIndividualDifferencesJoint2009, who observed a smaller semantic priming effect in higher-vocabulary participants.
Similarly, @yap2012a found that higher-vocabulary participants in a lexical decision task [@balota2007a] were less sensitive to a cluster of lexical and semantic features (i.e., word frequency, semantic neighborhood density and number of senses).

- *Larger vocabulary, more task-relevant effects.* Higher-vocabulary participants might present a greater sensitivity to task-relevant variables, borne out of their greater linguistic experience, relative to lower-vocabulary participants. This would be consistent with the findings of @pexman2018a, who revisited the semantic decision study of @pexman2017a. The semantic decision task of the latter study consisted of classifying words as abstract or concrete. Pexman and Yap found that word concreteness---a very relevant source of information for this task---was more influential in higher-vocabulary participants than in lower-vocabulary ones. In contrast, word frequency and age of acquisition---not as relevant to the task---were more influential in lower-vocabulary participants [also see @lim2020a].

In our present studies, we set our hypotheses regarding the 'task-relevance advantage' by working under the assumption that the language-based information in words---represented by one variable in each study---is important for the three tasks, given the large effects of language across tasks [@banksLinguisticDistributionalKnowledge2021; @kiela2014a; @louwerse2015a; @lam2015a; @pecherDoesPizzaPrime1998; @petilli2021a]. Therefore, the relevance hypothesis predicts that higher-vocabulary participants---compared to lower-vocabulary ones---will be more sensitive to language-based information (as represented by `language-based similarity` in Study 2.1, `word co-occurrence` in Study 2.2, and `word frequency` in Study 2.3).

### 1b. Vision-based information $\times$ vocabulary size

To our knowledge, no previous studies have investigated the interaction between vision-based information and participants' vocabulary size. We entertained two hypotheses. First, lower-vocabulary participants might be more sensitive to visual strength than higher-vocabulary participants. In this way, lower-vocabulary participants might *compensate* for the disadvantage on the language side. Second, we considered the possibility that there was no interaction effect.

### 2a. Language-based information $\times$ gender

We entertained two hypotheses regarding the interaction between language-based information and participants' gender: (a) that the language system would be more important in female participants than in males [@burman2008a; @hutchinson2013a; @jung2019a; @ullman2008a], and (b) that this interaction effect would be absent, as a recent review suggested that gender differences are negligible in the general population [@wallentinChapterGenderDifferences2020].

### 2b. Vision-based information $\times$ gender

To our knowledge, no previous studies have investigated the interaction between vision-based information and participants' gender. We entertained two hypotheses. Our first hypothesis was that this interaction would stand opposite to the interaction between language and gender. That is, if female participants were to present a greater role of language-based information, male participants would present a greater role of vision-based information, thereby compensating for the disadvantage on the language side. Our second hypothesis was the absence of this interaction effect [see @wallentinChapterGenderDifferences2020].

### 3a. Language-based information $\times$ SOA

Previous research predicts that language-based information will have a larger effect with the short SOA than with the long one [@lam2015a; @petilli2021a], which also aligns with research demonstrating the fast activation of language-based information [@louwerseTasteWordsLinguistic2011; @santosPropertyGenerationReflects2011; @simmonsFMRIEvidenceWord2008].

### 3b. Vision-based information $\times$ SOA

The interaction between vision-based information and SOA allows three hypotheses. First, some previous research predicts that the role of vision-based information will be more prevalent with the long SOA than with the short one [@louwerseTasteWordsLinguistic2011; @santosPropertyGenerationReflects2011; @simmonsFMRIEvidenceWord2008; also see @barsalouLanguageSimulationConceptual2008]. Second, in contrast, other research [@petilli2021a] based on the same data that we are analysing [@hutchison2013a] predicts vision-based priming only with the short SOA (200 ms), and not with the long one (1,200 ms). Third, other research does not predict any vision-based priming effect [@hutchisonSemanticPrimingDue2003; @ostarekTaskdependentCausalRole2017; @pecherDoesPizzaPrime1998; @yeeColorlessGreenIdeas2012]. In this regard, some studies have observed vision-based priming when the task was preceded by another task that required attention to visual features of concepts [@pecherDoesPizzaPrime1998; @yeeColorlessGreenIdeas2012], but the present data [@hutchison2013a] does not contain such a prior task.

### Language and vision across studies

Next, we consider our hypotheses regarding the role of language and vision across studies. Yet, before addressing those, we reiterate that caution is required due to the existence of other differences across these studies, such as the number of observations. First, we hypothesise that language-based information will be relevant in the three studies due to the consistent importance of language observed in past studies [@banksLinguisticDistributionalKnowledge2021; @kiela2014a; @louwerse2015a; @lam2015a; @pecherDoesPizzaPrime1998; @petilli2021a]. Second, the extant evidence regarding vision-based information is less conclusive. Some studies have observed effects of vision-based information [@floresdarcaisSemanticActivationRecognition1985; @schreuderEffectsPerceptualConceptual1984; @connellSeeHearWhat2014; @petilli2021a], whereas others have not [@hutchisonSemanticPrimingDue2003; @ostarekTaskdependentCausalRole2017], and a third set of studies have only observed them when the critical task was preceded by a task that required attention to visual features of concepts [@pecherDoesPizzaPrime1998; @yeeColorlessGreenIdeas2012]. Based on these precedents, we hypothesise that vision-based information will be relevant in semantic decision, whereas it might or might not be relevant in semantic priming and in lexical decision.

## Statistical power analysis

Statistical power depends on the following factors: (1) sample size---comprising the number of participants, items, trials, etc.---, (2) effect size, (3) measurement variability and (4) number of comparisons being performed. Out of these, sample size is the factor that can best be controlled by researchers [@kumleEstimatingPowerGeneralized2021]. The three studies we present below, containing larger-than-average sample sizes, offer an opportunity to perform an a-priori power analysis to help determine the sample size of future studies [@albersWhenPowerAnalyses2018].
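As a toy illustration of this simulation-based logic, using a hypothetical one-sample design rather than the mixed-effects models analysed below, statistical power can be estimated as the proportion of simulations in which an effect reaches significance:

```{r, eval = FALSE}
# Toy illustration only: estimating power as the proportion of significant
# results across simulations. A simple one-sample design is assumed here,
# not the mixed-effects models used in the actual studies.

set.seed(123)

n_simulations <- 1000

p_values <- replicate(n_simulations, {
  # Simulate 100 observations with a true effect of 0.3 SD
  simulated_sample <- rnorm(100, mean = 0.3, sd = 1)
  t.test(simulated_sample)$p.value
})

# Power = proportion of simulations yielding p < .05
mean(p_values < .05)
```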
### Motivations

Insufficient statistical power lowers the reliability of effect sizes, and increases the likelihood of false positive results---i.e., Type I errors---as well as the likelihood of false negative results---i.e., Type II errors [@gelmanPowerCalculationsAssessing2014; @lokenMeasurementErrorReplication2017; @tverskyBeliefLawSmall1971; @vondermalsburgFalsePositivesOther2017]. For instance, @vasishthHowEmbraceVariation2021 illustrate how, in low-powered studies, effect sizes associated with significant results tend to be overestimated [also see @vasishthStatisticalSignificanceFilter2018]. Over the past decade, replication studies and power analyses have uncovered insufficient sample sizes in psychology [@brysbaertHowManyParticipants2019; @heymanReliabilityItemlevelSemantic2018; @lynottReplicationExperiencingPhysical2014; @montero-melisSatelliteVsVerbframing2017; @montero-melisNoEvidenceEmbodiment2022; @rodriguez-ferreiroSemanticPrimingSchizotypal2020; @vasishthStatisticalSignificanceFilter2018]. In one of these studies, @heymanReliabilityItemlevelSemantic2018 demonstrated that increasing the sample size resulted in an increase of the reliability of the estimates, which in turn lowered the Type I error rate and the Type II error rate---i.e., false positive and false negative results, respectively. Calls for larger sample sizes have also been voiced in the field of neuroscience. For instance, @marekReproducibleBrainwideAssociation2022 estimated the sample size that would be required to reliably study the mapping between individual differences---such as general cognition---and brain structures. The authors found that the current median of 25 participants in each of these studies contrasted with the thousands of participants---around 10,000---that would be needed for a well-powered study [also see @buttonPowerFailureWhy2013].

More topic-specific power analyses are necessary for several reasons. First, power analyses provide greater certainty about the reasons behind non-replications [e.g., @opensciencecollaborationEstimatingReproducibilityPsychological2015], and behind non-significant results at large. Non-replications are not solely explained by methodological differences across studies, questionable research practices and publication bias [@andersonResponseCommentEstimating2016; @barsalouEstablishingGeneralizableMechanisms2019; @corkerHighQualityDirect2014; @gilbertCommentEstimatingReproducibility2016; @williamsImprovingPsychologicalScience2014; @zwaanReplicationsShouldBe2014; also see @tiokhinCompetitionPriorityHarms2021]. In addition to these factors, a lack of statistical power can cause non-replications and non-significant results [see @lokenMeasurementErrorReplication2017; @vasishthHowEmbraceVariation2021]. Regarding non-significant results, it is worthwhile to consider some examples from research on individual differences. In this literature, there is a body of non-significant results, both in behavioural studies [@daidoneVocabularySizeKey2021; @hedge2018a; @murakiSimulatingSemanticsAre2021; @ponari2018a; @rodriguez-ferreiroSemanticPrimingSchizotypal2020; for a Bayes factor analysis, see @rouderPsychometricsIndividualDifferences2019] and in neuroscientific studies [@diazNeuralSensitivityPhonological2021]. A greater availability of power analyses within this topic area and others will at least shed light on the influence of statistical power on the results. Furthermore, power analyses facilitate the identification of sensible sample sizes for future studies.
Last, it should be noted that although increasing the statistical power comes at a cost in the short term, power analyses will help maximise the use of research funding in the long term by fostering more replicable research [see @vasishthHowEmbraceVariation2021; remember @opensciencecollaborationEstimatingReproducibilityPsychological2015].

# General methods

The analytical method was broadly similar across the three studies. Below, we present the commonalities in the statistical analysis and in the power analysis. Several R packages from the 'tidyverse' [@R-tidyverse] were used.

## Covariates {#covariates}

Several covariates---or nuisance variables---were included in each study to allow a rigorous analysis of the effects of interest [@sassenhagenCommonMisapplicationStatistical2016]. Unlike the effects of interest, these covariates were not critical to our research question (i.e., the interplay between language-based and vision-based information). They comprised participant-specific variables (e.g., attentional control), lexical variables (e.g., word frequency) and word concreteness. The covariates are distinguished from the effects of interest in the results table(s) in each study. The three kinds of covariates included were as follows.

Participant-specific covariates were measures akin to general cognition, and were included because some studies have found that the effect of vocabulary size was moderated by general cognition variables such as processing speed [@ratcliff2010a; @yap2012a]. Similarly, research has evidenced the role of attentional control [@hutchisonAttentionalControlAsymmetric2014; @yap2017a], and authors have expressed the desirability of including such covariates in models [@james2018a; @pexman2018a]. Therefore, we included in the analyses an individual measure of 'general cognition', where available. These measures were available in the first two studies, and they indexed task performance abilities that were different from vocabulary knowledge. We refer to them by their more specific names in each study.^[The general cognition measures could also be dubbed general or fluid intelligence, but we think that cognition is more appropriate in our present context.] In Study 2.1, the measure used was `attentional control` [@hutchison2013a]. In Study 2.2, it was `information uptake` [@pexman2018a]. In Study 2.3, such a covariate was not used as it was not available in the data set of @balota2007a.

Lexical covariates were selected in every study out of the same five variables, which had been used as covariates in @wingfieldUnderstandingRoleLinguistic2022 [also see @petilli2021a]. They comprised number of letters (i.e., orthographic length), word frequency, number of syllables [the latter two from @balota2007a], orthographic Levenshtein distance [@yarkoniMovingColtheartNew2008] and phonological Levenshtein distance [@suarezObservingNeighborhoodEffects2011; @yapVisualWordRecognition2009]. A selection among these candidates was necessary because some of them were highly intercorrelated---i.e., $r$ > .70 [@dormannCollinearityReviewMethods2013; @harrison2018a]. The correlations and the selection models are available in [\underline{Appendix A}](#appendix-A-lexical-covariates).
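As a schematic illustration of this screening, the sketch below computes the pairwise correlations among candidate covariates and flags the pairs exceeding the threshold; the data are simulated and the variable names are illustrative, not the actual norms used in the studies.

```{r, eval = FALSE}
# Schematic collinearity screening: simulated stand-in data and
# illustrative variable names, not the actual norms used in the studies.

set.seed(123)

candidate_covariates <- data.frame(
  number_letters           = rpois(100, 7),
  word_frequency           = rnorm(100),
  number_syllables         = rpois(100, 2),
  orthographic_Levenshtein = rnorm(100),
  phonological_Levenshtein = rnorm(100)
)

# Pairwise Pearson correlations among the five candidates
correlation_matrix <- cor(candidate_covariates, use = 'pairwise.complete.obs')

# Flag pairs exceeding the r > .70 threshold
which(abs(correlation_matrix) > .70 & upper.tri(correlation_matrix),
      arr.ind = TRUE)
```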
Word concreteness was included due to the pervasive effect of this variable across lexical and semantic tasks [@brysbaert2014a; @connellStrengthPerceptualExperience2012; @pexman2018a], and due to the sizable correlations ($r$ > .30) between word concreteness and some other predictors, such as visual strength (see correlation figures in each study). Furthermore, the role of word concreteness has been contested, with some research suggesting that its effect stems from perceptual simulation [@connellStrengthPerceptualExperience2012] versus other research suggesting that the effect is amodal [@bottiniConcretenessAdvantageLexical2021]. In passing, we will bring our results to bear on the role of word concreteness.

## Data preprocessing and statistical analysis

In the three studies, the statistical analysis was designed to investigate the contribution of each effect of interest. The following preprocessing steps were applied. First, incorrect responses were removed. Second, nonword trials were removed (only necessary in Studies 2.1 and 2.3). Third, too fast and too slow responses were removed. For the latter purpose, we applied the same thresholds that had been applied in each of the original studies. That is, in Study 2.1, we removed responses faster than 200 ms or slower than 3,000 ms [@hutchison2013a]. In Study 2.2, we removed responses faster than 250 ms or slower than 3,000 ms [@pexman2018a]. In Study 2.3, we removed responses faster than 200 ms or slower than 4,000 ms [@balota2007a]. Next, the dependent variable---response time (RT)---was $z$-scored around each participant's mean to curb the influence of each participant's baseline speed [@balota2007a; @lim2020a; @kumarDistantConnectivityMultiplestep2020; @pexman2017a; @pexman2018a; @yap2012a; @yap2017a]. This was important because the size of experimental effects is known to increase with longer RTs [@faust1999a]. Next, categorical predictors were recoded as numeric variables [@brauer2018a]. Specifically, participants' gender was recoded as follows: Female = 0.5, X = 0, Male = -0.5. The SOAs in Study 2.1 were recoded as follows: 200 ms = -0.5; 1,200 ms = 0.5. Next, the data sets were trimmed by removing rows that lacked values on any variable, and by also removing RTs that were more than 3 standard deviations ($SD$) away from the mean ($M$). The nesting factors applied in the trimming are specified in each study. Finally, all predictors were $z$-scored, resulting in $M \approx$ 0 and $SD \approx$ 1 (the values were not exact because some predictors were $z$-scored within participants, as described next). More specifically, between-item predictors---i.e., word-level variables (e.g., language-based information) and task-level variables (e.g., SOA)---were $z$-scored around each participant's own mean [@brauer2018a].

### Random effects

With regard to random effects, participants and stimuli were *crossed* in the three studies. That is, each participant was presented with a subset of the stimuli. Conversely, each word was presented to a subset of participants. Therefore, linear mixed-effects models were implemented. These models included a maximal random-effects structure, with by-participant and by-item random intercepts, and the appropriate random slopes for all effects of interest [@barrRandomEffectsStructure2013; @brauer2018a; @singmann2019a]. Random effects---especially random slopes---constrain the analytical space by *claiming* their share of variance. As a result, that variance cannot be taken by the fixed effects.
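As a schematic illustration of such a maximal specification (with hypothetical variable and data names standing in for each study's actual predictors), a model of this kind could be expressed in 'lmerTest' as follows:

```{r, eval = FALSE}
# Schematic maximal mixed-effects model: hypothetical variable and data
# names, standing in for each study's actual predictors.

library(lmerTest)  # wraps lme4 and provides p values

model <- lmer(
  z_RT ~
    # Fixed effects: effects of interest, covariates and interactions
    language_based_information * vocabulary_size +
    vision_based_information * vocabulary_size +
    word_concreteness + word_frequency +
    # By-participant random intercepts, and random slopes for
    # within-participant (i.e., word-level) variables
    (1 + language_based_information + vision_based_information +
       word_concreteness + word_frequency | participant) +
    # By-word random intercepts, and random slopes for
    # within-word (i.e., participant-level) variables
    (1 + vocabulary_size | word),
  data = study_data,
  # Raise the iteration limit to facilitate convergence
  control = lmerControl(optimizer = 'bobyqa',
                        optCtrl = list(maxfun = 1e6))
)

# P values based on the Kenward-Roger approximation for degrees of freedom
summary(model, ddf = 'Kenward-Roger')
```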
In the semantic priming study, the items were prime--target pairs, whereas in the semantic decision and lexical decision studies, the items were individual words. In the case of interactions, random slopes were included only when the interacting variables varied within the same unit [@brauer2018a]---e.g., an interaction of two variables varying within participants (only present in Study 2.1). Where required due to convergence warnings, random slopes for covariates were removed, as inspired by Remedy 11 from @brauer2018a. In this regard, whereas @brauer2018a contemplate the removal of random slopes for covariates only when the covariates are not interacting with any effects of interest, we removed random slopes for covariates even if they interacted with effects of interest because these interactions were covariates themselves. To avoid an inflation of the Type I error rate---i.e., false positives---, the random slopes for the effects of interest (as indicated in each study) were never removed [see Table 17 in @brauer2018a; for an example of this approach, see @diazNeuralSensitivityPhonological2021]. This approach arguably provides a better protection against false positives [@barrRandomEffectsStructure2013; @brauer2018a; @singmann2019a] than the practice of removing random slopes when they do not significantly improve the fit [e.g., @bernabeu2017a; @pexman2018a; @batesFittingLinearMixedeffects2015; @baayenMixedeffectsModelingCrossed2008; but also see @matuschekBalancingTypeError2017]. ### Frequentist analysis $P$ values were calculated using the Kenward-Roger approximation for degrees of freedom [@luke2017a] in the R package 'lmerTest', Version 3.1-3 [@R-lmerTest]. The latter package in turn used 'lme4', Version 1.1-26 [@batesPackageLme42021; @batesFittingLinearMixedeffects2015]. To facilitate the convergence of the models, the maximum number of iterations was set to 1 million. Diagnostics regarding convergence and normality are provided in [\underline{Appendix B}](#appendix-B-frequentist-analysis-diagnostics). Those effects that are non-significant or very small are best interpreted by considering the confidence intervals and the credible intervals [@cummingNewStatisticsWhy2014]. The R package 'GGally' [@R-GGally] was used to create correlation plots, whereas the package 'sjPlot' [@R-sjPlot] was used for interaction plots. ### Bayesian analysis A Bayesian analysis was performed to complement the estimates that had been obtained in the frequentist analysis. Whereas the goal of the frequentist analysis had been hypothesis testing, for which $p$ values were used, the goal of the Bayesian analysis was parameter estimation. Accordingly, we estimated the posterior distribution of every effect, without calculating Bayes factors [for other examples of the same *estimation approach*, see @milekEavesdroppingHappinessRevisited2018; @preglaVariabilitySentenceComprehension2021; @rodriguez-ferreiroSemanticPrimingSchizotypal2020; for comparisons between estimation and hypothesis testing, see @cummingNewStatisticsWhy2014; @kruschkeBayesianNewStatistics2018; @schmalzWhatBayesFactor2021; @tendeiroReviewIssuesNull2019; @tendeiroOnTheWhite2022; @rouderBayesianInferencePsychology2018; @vanravenzwaaijAdvantagesMasqueradingIssues2021]. In the estimation approach, the estimates are interpreted by considering the position of their credible intervals in relation to the expected effect size. That is, the closer an interval is to an effect size of 0, the smaller the effect of that predictor. 
For instance, an interval that is symmetrically centred on 0 indicates a very small effect, whereas---in comparison---an interval that does not include 0 at all indicates a far larger effect. This analysis served two purposes: first, to ascertain the interpretation of the smaller effects---which were identified as unreliable in the power analyses---, and second, to complement the estimates obtained in the frequentist analysis. The latter purpose was pertinent because the frequentist models presented convergence warnings---even though it must be noted that a previous study found that frequentist and Bayesian estimates were similar despite convergence warnings appearing in the frequentist analysis [@rodriguez-ferreiroSemanticPrimingSchizotypal2020]. Furthermore, the complementary analysis was pertinent because the frequentist models presented residual errors that deviated from normality---even though mixed-effects models are fairly robust to such a deviation [@kniefViolatingNormalityAssumption2021; @schielzethRobustnessLinearMixed2020]. Owing to these precedents, we expected to find broadly similar estimates in the frequentist analyses and in the Bayesian ones. Across studies, each frequentist model has a Bayesian counterpart, with the exception of the secondary analysis performed in Study 2.1 (semantic priming) that included `vision-based similarity` as a predictor. The R package 'brms', Version 2.17.0, was used for the Bayesian analysis [@burknerPackageBrms2022; @burknerAdvancedBayesianMultilevel2018].

#### Priors

The priors were established by inspecting the effect sizes obtained in previous studies as well as the effect sizes obtained in our frequentist analyses of the present data (reported in Studies 2.1, 2.2 and 2.3 below). Regarding the former, the previous studies were selected because the experimental paradigms, variables and analytical procedures they had used were similar to those used in our current studies. Specifically, regarding paradigms, we sought studies that implemented: (I) semantic priming with a lexical decision task---as in Study 2.1---, (II) semantic decision---as in Study 2.2---, or (III) lexical decision---as in Study 2.3. Regarding analytical procedures, we sought studies in which both the dependent and the independent variables were $z$-scored. We found two studies that broadly matched these criteria: @lim2020a (see Table 5 therein) and @pexman2018a (see Tables 6 and 7 therein). Out of these studies, @pexman2018a contained the variables that were most similar to ours, which included vocabulary size (labelled 'NAART') and word frequency. Based on both these studies and on the frequentist analyses reported below, a range of effect sizes was identified that spanned between $\upbeta$ = -0.30 and $\upbeta$ = 0.30. This range was centred around 0 as the variables were $z$-scored. The bounds of this range were determined by the largest effects, which appeared in @pexman2018a. Pexman et al. conducted a semantic decision study, and split the data set into abstract and concrete words. The two largest effects they found were---first---a word concreteness effect in the concrete-words analysis of $\upbeta$ = -0.41, and---second---a word concreteness effect in the abstract-words analysis of $\upbeta$ = 0.20. Unlike Pexman et al., we did not split the data set into abstract and concrete words, but analysed these sets together. Therefore, we averaged the absolute values of the two aforementioned estimates---(0.41 + 0.20) / 2 $\approx$ 0.30---and set a symmetric range spanning between $\upbeta$ = -0.30 and $\upbeta$ = 0.30.
In the results of @lim2020a and @pexman2018a, and in our frequentist results, some effects consistently presented a negative polarity (i.e., leading to shorter response times), whereas some other effects were consistently positive. We incorporated the direction of effects into the priors only in cases of large effects that had presented a consistent direction (either positive or negative) in previous studies and in our frequentist analyses in the present studies. These criteria were met by the following variables: word frequency---with a negative direction, as higher word frequency leads to shorter RTs [@brysbaertImpactWordPrevalence2016; @brysbaertWordFrequencyEffect2018a; @lim2020a; @mendesPervasiveEffectWord2021; @pexman2018a]---, number of letters and number of syllables---both with positive directions [@bartonWordlengthEffectReading2014; @beyersmannEvidenceEmbeddedWord2020; @pexman2018a]---, and orthographic Levenshtein distance---with a positive direction [@cerniMotorExpertiseTyping2016; @dijkstraMultilinkComputationalModel2019; @kimEffectsLexicalFeatures2018; @yarkoniMovingColtheartNew2008]. We did not incorporate information about the direction of the word concreteness effect, as this effect can follow different directions in abstract and concrete words [@brysbaert2014a; @pexman2018a], and we analysed both sets of words together. In conclusion, the four predictors that had directional priors were covariates. All the other predictors had priors centred on 0. Last, as a methodological matter, it is noteworthy that most of the psycholinguistic studies applying Bayesian analysis have not incorporated any directional information in priors [e.g., @preglaVariabilitySentenceComprehension2021; @rodriguez-ferreiroSemanticPrimingSchizotypal2020; @stoneEffectDecayLexical2020; cf. @stoneInteractionGrammaticallyDistinct2021].

##### Prior distributions and prior predictive checks

The choice of priors can influence the results in consequential ways. To assess the extent of this influence, *prior sensitivity analyses* have been recommended. These analyses are performed by comparing the effect of more and less strict priors---or, in other words, priors varying in their degree of informativeness. The degree of variation is adjusted through the standard deviation, while the means are not varied [@leeBayesianCognitiveModeling2014; @schootBayesianStatisticsModelling2021; @stoneEffectDecayLexical2020]. In this way, we compared the results obtained using 'informative' priors ($SD$ = 0.1), 'weakly-informative' priors ($SD$ = 0.2) and 'diffuse' priors ($SD$ = 0.3). These standard deviations were chosen so that around 95% of values in the informative priors would fall within our initial range of effect sizes that spanned from -0.30 to 0.30. All priors are illustrated in Figure \@ref(fig:bayesian-priors). These priors resembled others from previous psycholinguistic studies [@preglaVariabilitySentenceComprehension2021; @stoneEffectDecayLexical2020; @stoneInteractionGrammaticallyDistinct2021]. For instance, @stoneEffectDecayLexical2020 used the following priors: $Normal$(0, 0.1), $Normal$(0, 0.3) and $Normal$(0, 1). The range of standard deviations we used---i.e., 0.1, 0.2 and 0.3---was narrower than those of previous studies because our dependent variable and our predictors were $z$-scored, resulting in small estimates and small $SD$s [see @lim2020a; @pexman2018a]. These priors were used on the fixed effects and on the standard deviation parameters of the fixed effects.
For the correlations among the random effects, an $LKJ$(2) prior was used [@lewandowskiGeneratingRandomCorrelation2009]. This is a 'regularising' prior, as it assumes that high correlations among random effects are rare [also used in @rodriguez-ferreiroSemanticPrimingSchizotypal2020; @stoneEffectDecayLexical2020; @stoneInteractionGrammaticallyDistinct2021; @vasishthBayesianDataAnalysis2018].

```{r bayesian-priors, fig.cap = 'Priors used in the three studies. The green vertical rectangle shows the range of plausible effect sizes based on previous studies and on our frequentist analyses. In the informative priors, around 95\\% of the values fall within the range.'}

source('bayesian_priors/bayesian_priors.R', local = TRUE)

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/bayesian_priors/plots/bayesian_priors.pdf'
  ))
```

The adequacy of each of these priors was assessed by performing prior predictive checks, in which we compared the observed data to the predictions of the model [@schootBayesianStatisticsModelling2021]. Furthermore, in these checks we also tested the adequacy of two model-wide distributions: the traditional Gaussian distribution (default in most analyses) and an exponentially modified Gaussian---dubbed 'ex-Gaussian'---distribution [@matzkePsychologicalInterpretationExGaussian2009]. The ex-Gaussian distribution was considered because the residual errors of the frequentist models were not normally distributed [@loTransformNotTransform2015], and because this distribution was found to be more appropriate than the Gaussian one in a previous, related study [see supplementary materials of @rodriguez-ferreiroSemanticPrimingSchizotypal2020]. The ex-Gaussian distribution had an identity link function, which preserves the interpretability of the coefficients, as opposed to a transformation applied directly to the dependent variable [@loTransformNotTransform2015]. The results of these prior predictive checks revealed that the priors were adequate, and that the ex-Gaussian distribution was more appropriate than the Gaussian one (see [\underline{Appendix C}](#appendix-C-Bayesian-analysis-diagnostics)), converging with @rodriguez-ferreiroSemanticPrimingSchizotypal2020. Therefore, the ex-Gaussian distribution was used in the final models.

##### Prior sensitivity analysis

In the main analysis, the informative, weakly-informative and diffuse priors were used in separate models. In other words, in each model, all priors had the same degree of informativeness [as done in @preglaVariabilitySentenceComprehension2021; @rodriguez-ferreiroSemanticPrimingSchizotypal2020; @stoneEffectDecayLexical2020; @stoneInteractionGrammaticallyDistinct2021]. In this way, a prior sensitivity analysis was performed to acknowledge the likely influence of the priors on the posterior distributions---that is, on the results [@leeBayesianCognitiveModeling2014; @schootBayesianStatisticsModelling2021; @stoneEffectDecayLexical2020].

#### Posterior distributions

Posterior predictive checks were performed to assess the consistency between the observed data and new data predicted by the posterior distributions [@schootBayesianStatisticsModelling2021]. These checks are available in [\underline{Appendix C}](#appendix-C-Bayesian-analysis-diagnostics).
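To make the preceding specification concrete, the display-only sketch below shows how the priors, the $LKJ$(2) prior, the ex-Gaussian distribution and the predictive checks could be declared in 'brms'; the variable names are hypothetical and the prior values are illustrative (the informative priors, $SD$ = 0.1, are shown).

```{r, eval = FALSE}
# Schematic Bayesian model: hypothetical variable names and illustrative
# prior values (informative priors, SD = 0.1, are shown).

library(brms)

model_priors <- c(
  # Zero-centred priors on the fixed effects...
  set_prior('normal(0, 0.1)', class = 'b'),
  # ...except for covariates with a consistent direction, e.g., word
  # frequency (negative direction; the mean shown here is illustrative)
  set_prior('normal(-0.1, 0.1)', class = 'b', coef = 'word_frequency'),
  # Regularising prior on the correlations among random effects
  set_prior('lkj(2)', class = 'cor')
)

bayesian_model <- brm(
  z_RT ~ language_based_information * vocabulary_size + word_frequency +
    (1 + language_based_information | participant) +
    (1 + vocabulary_size | word),
  data = study_data,
  family = exgaussian(link = 'identity'),  # ex-Gaussian, identity link
  prior = model_priors,
  chains = 4, cores = 4, iter = 4000,
  sample_prior = 'only'  # prior predictive check; remove this argument
)                        # to sample from the posterior instead

# Compare the observed data to the model's predictions
pp_check(bayesian_model, ndraws = 100)
```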
#### Convergence

When convergence was not reached in a model, as indicated by $\widehat R$ > 1.01 [@schootBayesianStatisticsModelling2021; @vehtariRanknormalizationFoldingLocalization2021], the number of iterations was increased and the random slopes for covariates were removed [@brauer2018a]. The resulting random effects in these models were largely the same as those in the frequentist models. The only exception concerned the models of the lexical decision study. In the frequentist model for that study, the random slopes for covariates were removed owing to convergence warnings, whereas in the Bayesian analysis these random slopes did not have to be removed, as the models converged thanks to the large number of iterations that were run. In the lexical decision study, it was possible to run more iterations than in the other two studies because the lexical decision data set had fewer observations, and the models therefore ran faster. The Bayesian models in the semantic decision study could not be made to converge, and their final results were thus not valid. Therefore, those estimates are not shown in the main text, but are available in [\underline{Appendix E}](#appendix-E-Bayesian-analysis-results).

## Statistical power analysis

Power curves based on Monte Carlo simulations were computed for most of the effects of interest using the R package 'simr', Version 1.0.5 [@greenSIMRPackagePower2016]. Obtaining power curves for a range of effects in each study allows for a comprehensive assessment of the plausibility of the power estimated for each effect. In each study, the item-level sample size---i.e., the number of words---was not modified. Therefore, to plan the sample size for future studies, these results must be considered under the assumptions that the future study would apply a statistical method similar to ours---namely, a mixed-effects model with random intercepts and slopes---and that the analysis would encompass at least as many stimuli as the corresponding study (numbers detailed in each study below). $P$ values were calculated using the Satterthwaite approximation for degrees of freedom [@luke2017a].

Monte Carlo simulations consist of running the statistical model a large number of times, under slight, random variations of the dependent variable [@greenSIMRPackagePower2016; for a comparable approach, see @lokenMeasurementErrorReplication2017]. The power to detect each effect of interest is calculated by dividing the number of simulations in which the effect is significant by the total number of simulations run. For instance, if an effect is significant in 85 out of 100 simulations, the power for that effect is 85% [@kumleEstimatingPowerGeneralized2021]. The sample sizes tested in the semantic priming study ranged from 50 to 800 participants, whereas those tested in the semantic decision and lexical decision studies ranged from 50 to 2,000 participants. These sample sizes were unequally spaced to limit the computational requirements. They comprised the following: 50, 100, 200, 300, 400, 500, 600, 700, 800, 1,200, 1,600 and 2,000 participants.^[For the semantic priming study, the remaining sample sizes up to 2,000 participants have not finished running yet. Upon finishing, they will be reported in this manuscript.] The variance of the results decreases as more simulations are run. In each of our three studies, 200 simulations [as in @brysbaertPowerAnalysisEffect2018] were run for each effect of interest and for each sample size under consideration.
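As an outline of this procedure, the sketch below computes a power curve for a single effect in the semantic priming study. The sample-size breaks are abridged, the 20% effect-size reduction applied in the code is motivated in the next paragraph, and the chunk is not evaluated.

```{r power-sketch, eval = FALSE}
library(simr)

# Extend the fitted mixed-effects model to the largest sample size tested
extended_model = extend(semanticpriming_lmerTest, along = 'Participant', n = 800)

# Reduce the effect size of interest by 20% (see the rationale below)
fixef(extended_model)['z_cosine_similarity'] =
  fixef(extended_model)['z_cosine_similarity'] * 0.8

# Power curve: 200 simulations per sample size, testing the effect
# through Satterthwaite-based p values
semanticpriming_powercurve = powerCurve(
  extended_model,
  test = fixed('z_cosine_similarity', method = 'sa'),
  along = 'Participant',
  breaks = c(50, 100, 200, 400, 800),
  nsim = 200
)
print(semanticpriming_powercurve)
```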
Thus, for a power curve examining an effect across 12 sample sizes, 2,400 simulations were run.

Power analyses require setting an effect size for each effect. Often, it is difficult to determine the effect size, as the amount and the scope of the relevant research are usually finite and biased [@albersWhenPowerAnalyses2018; @gelmanPowerCalculationsAssessing2014; @kumleEstimatingPowerGeneralized2021]. In some power analyses, the original effect sizes from previous studies have been adopted without any modification [e.g., @paciniExocentricCodingMapping2021; @villalongaPerceptualTimingPrecision2021]. In contrast, other authors have opted to reduce the previous effect sizes to account for two intervening factors. First, publication bias and insufficient statistical power cause published effect sizes to be inflated [@brysbaertHowManyParticipants2019; @lokenMeasurementErrorReplication2017; @vasishthStatisticalSignificanceFilter2018; @vasishthHowEmbraceVariation2021; @opensciencecollaborationEstimatingReproducibilityPsychological2015]. Second, over the course of the research, a variety of circumstances can create differences between the planned study and the studies used in the power analysis. Some of these differences may be foreseeable---for instance, if they are due to a limitation in the literature available for the power analysis---whereas others might be unforeseeable and could go unnoticed [@barsalouEstablishingGeneralizableMechanisms2019; @noahWhenBothOriginal2018]. Reducing the effect size in the power analysis increases the sample size estimated for the planned study [@brysbaertPowerAnalysisEffect2018; @greenSIMRPackagePower2016; @hoenigAbusePower2001]. The reduced effect size---sometimes dubbed the smallest effect size of interest---is often set with a degree of arbitrariness. In previous studies, @fleurDefinitelySawIt2020 applied a reduction of 1/8 (i.e., 12.5%), whereas @kumleEstimatingPowerGeneralized2021 applied a 15% reduction. In the present study, a reduction of 20% was applied to every effect in the power analysis. Compared with the power analyses reviewed in this paragraph, this reduction leads to a more conservative estimate of the required sample sizes. However, considering the precedents of small samples and publication bias reviewed above, a 20% reduction is arguably a reasonable safeguard. Indeed, a posteriori, the results of our power analyses suggested that the 20% reduction had not been excessive, as some of the effects examined were detectable with small sample sizes.

Both the primary analysis and the power analysis were performed in R [@R-base]. Version 4.0.2 was used for the frequentist analysis, Version 4.1.0 was used for the Bayesian analysis, and Version 4.1.2 was used for fast operations such as data preprocessing and plotting. Given the complexity of these analyses, all the statistical and power analyses were run on the High-End Computing facility at Lancaster University.^[Information about this facility is available at [https://answers.lancaster.ac.uk/display/ISS/High+End+Computing+%28HEC%29+help](https://answers.lancaster.ac.uk/display/ISS/High+End+Computing+%28HEC%29+help).
Even though the analysis jobs were run in parallel, some of the statistical analyses took four months to complete (including one month for the final model to run), owing to three factors: the limited availability of machines, occasional cancellations of jobs to allow maintenance work on the machines, and the lack of convergence of some models. Furthermore, the power analysis for the semantic priming study took six months (including two months of actual running time), with delays due to the limited availability of machines and occasional cancellations of jobs.]

# Study 2.1: Semantic priming

The core data set in this study was that of the Semantic Priming Project [@hutchison2013a; also see @yap2017a]. The study of @hutchison2013a comprised two tasks: lexical decision and naming. We limited our analysis to the lexical decision task because it was more relevant to a subsequent study that we were planning. In the lexical decision task, participants judged whether strings of letters constituted real words (e.g., *building*) or nonwords (e.g., *gop*). Importantly, in each trial, the target word that participants assessed was preceded by a prime word. Participants were only required to respond to the *target* word. The characteristic feature of the semantic priming paradigm is the analysis of responses to the targets as a function of the semantic relationship between the primes and the targets [@hoedemakerItTakesTime2014; @brunelliereCooccurrenceFrequencyEvaluated2017; @dewitMaskedSemanticPriming2015]. In some studies, the association between prime and target words has been investigated in terms of related versus unrelated pairs [@lam2015a; @pecherDoesPizzaPrime1998; @trumppMaskedPrimingConceptual2013], and in other studies in terms of first- and second-order relationships [@hutchison2013a]. In contrast to these categorical associations, a third set of studies has measured the association between the prime and the target words using continuous estimates of text-based similarity [@guntherLatentSemanticAnalysis2016; @guenther2016a; @hutchisonPredictingSemanticPriming2008; @jones2006a; @lund1995a; @lund1996a; @mandera2017a; @mcdonald2002a; @pad2007a; @petilli2021a; @wingfieldUnderstandingRoleLinguistic2022]. In one of these studies, @mandera2017a found that computational measures of similarity outperformed human-based associations at explaining language-based priming.

## Language, vision and SOA

Priming associations beyond the linguistic realm have also been investigated, with early studies observing perceptual priming effects [@floresdarcaisSemanticActivationRecognition1985; @schreuderEffectsPerceptualConceptual1984]. However, those early findings were soon reframed by @pecherDoesPizzaPrime1998, who conducted a follow-up with an improved design, and observed vision-based priming only when the task was preceded by another task that required attention to the visual features of concepts [also see @yeeColorlessGreenIdeas2012; @ostarekTaskdependentCausalRole2017]. Furthermore, two studies have failed to observe vision-based priming [@hutchisonSemanticPrimingDue2003; @ostarekTaskdependentCausalRole2017]. Nonetheless, a considerable number of studies have observed perceptual priming, even in the absence of a pretask.
A set of these studies used the Conceptual Modality Switch paradigm, in which the primes and the targets are presented in separate, consecutive trials---e.g., *Loud Welcome* → *Fine Selection* [@pecherVerifyingDifferentmodalityProperties2003; @lynottModalityExclusivityNorms2009; @louwerseTasteWordsLinguistic2011; @collinsModalitySwitchingProperty2011; @hald2011a; @hald2013a; @trumppMaskedPrimingConceptual2013; @bernabeu2017a]. The other set of studies implemented the more classic priming manipulation, whereby a prime word is briefly presented before the target word in each trial---e.g., *Welcome* → *Selection*. This design is more relevant to our present study, as it was used in the study we are revisiting [@hutchison2013a]. Below, we review studies that have used the *prime* → *target* design.

@lam2015a conducted a semantic priming experiment containing a lexical decision task, in which participants were instructed to assess whether the prime word and the target word in each trial were both real words. The semantic priming manipulation consisted of the following types of associations between the prime and the target words: (1) semantic association (e.g., bolt → screwdriver), (2) action association (e.g., housekey → screwdriver), (3) visual association (e.g., soldering iron → screwdriver), and (4) no association (e.g., charger → screwdriver). In addition, the following SOAs were compared: 500, 650, 800 and 1,400 ms. First, Lam et al. observed semantic priming with all SOAs. Second, the authors observed action-based priming with the SOAs of 500, 650 and 1,400 ms. Last, they observed vision-based priming only with the SOA of 1,400 ms. Overall, semantic---i.e., language-based---priming was more prevalent than visual and action priming. The greater role of language-based information converges with other semantic priming studies [@bottiniNatureSemanticPriming2016; @lam2015a; @pecherDoesPizzaPrime1998; @petilli2021a], as well as with studies that used other paradigms [@banksLinguisticDistributionalKnowledge2021; @kiela2014a; @louwerse2015a].

Similarly, the results of @lam2015a regarding the time course of language-based and vision-based priming were consistent with a wealth of literature observing that the influence of perceptual systems, such as vision, peaks later than the influence of the language system [@barsalouLanguageSimulationConceptual2008; @louwerseTasteWordsLinguistic2011; @santosPropertyGenerationReflects2011]. For instance, studies using electroencephalography have observed perceptual priming effects within 300 ms of word onset. Thereafter, the perceptual priming effect increased [@bernabeu2017a; @amselEmpiricallyGroundingGrounded2014], stabilised [@kieferDifferentialTemporospatialPattern2022] or fluctuated [@amselTrackingRealtimeNeural2011]. Overall, these patterns reveal a gradual accumulation of information throughout word processing [also see @haukOnlyTimeWill2016], which is consistent with the integration of contextual information [see @haldEEGThetaGamma2006].

In a more recent study, @petilli2021a revisited the data of @hutchison2013a using new variables that indexed language-based and vision-based associations between the prime and the target words. These variables had two important characteristics: (1) they were continuous rather than categorical [see @cohenCostDichotomization1983; @guntherLatentSemanticAnalysis2016; @mandera2017a], and (2) they were not dependent on human ratings [cf.
@hutchisonPredictingSemanticPriming2008; @hutchison2013a; @lam2015a; @pecherDoesPizzaPrime1998]. By this means, Petilli et al. avoided the circularity---rarely addressed in the literature---that can arise when human-based ratings are used to explain human behaviour. @petilli2021a operationalised word co-occurrence using text-based similarity [@mandera2017a]. Next, to operationalise vision-based similarity, the authors obtained images from ImageNet corresponding to each word (a minimum of 100 images per word), and trained vector representations on those images using neural networks [for related work, see @roadsLearningUnsupervisedAlignment2020]. The resulting computational measure of vision-based similarity was then validated against human-based ratings [@pecherDoesPizzaPrime1998], with a satisfactory result. In a concrete demonstration, Petilli et al. showed that vision-based similarity correctly identified drills as more visually similar to pistols than to screwdrivers, indicating that the measure was not misled by functional similarity. Using `language-based similarity` and `vision-based similarity`, Petilli et al. then investigated language-based and vision-based priming in two tasks---lexical decision and naming---with both a short and a long SOA.

In lexical decision, the largest effect observed by @petilli2021a was that of language-based priming with the short SOA (200 ms). The second largest effect was that of language-based priming with the long SOA (1,200 ms). Next, the weakest significant effect was that of vision-based priming with the short SOA. Last, there was no effect of vision-based priming with the long SOA. Petilli et al. explained the absence of vision-based priming with the long SOA by contending that visual activation had likely decayed before participants processed the target words [also see @yeeFunctionFollowsForm2011], owing to the limited semantic processing required for lexical decision [also see @balotaDepthAutomaticSpreading1986; @beckerLongtermSemanticPriming1997; @dewitMaskedSemanticPriming2015; @joordensLongShortSemantic1997; @ostarekTaskdependentCausalRole2017]. Therefore, the authors suggested that perceptual simulation does *not* peak before language-based processing in lexical decision, contrasting with the results of @lam2015a and with the results found in other tasks [@louwerseTasteWordsLinguistic2011; @santosPropertyGenerationReflects2011; @simmonsFMRIEvidenceWord2008; also see @barsalouLanguageSimulationConceptual2008].

In the naming task, the largest effect observed by @petilli2021a was that of language-based priming with the long SOA. The second largest effect was that of language-based priming with the short SOA. Last, there was no effect of vision-based priming with either SOA. This finding contrasts with @connellSeeHearWhat2014, who found facilitatory effects of visual strength in both lexical decision and naming. Petilli et al. explained the lack of vision-based priming in the naming task by alluding to the lower semantic depth of this task compared to lexical decision, and to the mixture of visual and auditory processing involved in it [also see @connellSeeHearWhat2014].

In conclusion, there is mixed evidence regarding the time course of language-based and vision-based information in conceptual processing, and particularly in semantic priming. First, regarding language, previous research predicts that language-based priming will have a larger effect with the short SOA than with the long one [@lam2015a; @petilli2021a].
Second, regarding vision, three hypotheses are available: (a) more vision-based priming with the long SOA [@louwerseTasteWordsLinguistic2011; @santosPropertyGenerationReflects2011; @simmonsFMRIEvidenceWord2008; also see @barsalouLanguageSimulationConceptual2008], (b) vision-based priming only with the short SOA [@petilli2021a], and (c) no vision-based priming [@hutchisonSemanticPrimingDue2003; @ostarekTaskdependentCausalRole2017; @pecherDoesPizzaPrime1998; @yeeColorlessGreenIdeas2012].

## Language, vision and vocabulary size

Next, we consider the role of participants' vocabulary size with respect to language-based and vision-based information (this recaps the general [\underline{Hypotheses}](#hypotheses) section). First, three hypotheses exist regarding the interaction with language. On the one hand, some research predicts a larger effect of language-based priming in higher-vocabulary participants [@yap2017a; also see @connell2019a; @landauerIntroductionLatentSemantic1998; @louwerse2015a; @paivioMentalRepresentationsDual1990; @pylyshynWhatMindEye1973]. On the other hand, other research has found the opposite pattern [@yapIndividualDifferencesJoint2009; also see @yap2012a]. A third possibility stems from the notion that vocabulary knowledge is associated with increased attention to task-relevant variables [@pexman2018a]. We hypothesised that language-based information---represented by `language-based similarity` in this study---was indeed important for the present task, given its importance across the board [@banksLinguisticDistributionalKnowledge2021; @kiela2014a; @louwerse2015a; @lam2015a; @pecherDoesPizzaPrime1998; @petilli2021a]. Accordingly, this relevance hypothesis predicted that higher-vocabulary participants would present a larger priming effect.

To our knowledge, no previous studies have investigated the interaction between vision-based information and participants' vocabulary size. We entertained two hypotheses: (a) that lower-vocabulary participants would be more sensitive to visual strength than higher-vocabulary participants, thereby compensating for their disadvantage on the language side, and (b) that this interaction effect would be absent.

## The present study

In the present study, we expanded on @petilli2021a by examining the role of participants' vocabulary size. In other regards, we used the same primary data set [@hutchison2013a] and a language-based similarity measure very similar to that used by Petilli et al. [also created by @mandera2017a]. In contrast, our vision-based predictors differed. Whereas Petilli et al. used a human-independent measure trained on images (see description above), we calculated the difference in visual strength [@lynott2020a] between the prime and the target word in each trial.^[These measures are compared at [\underline{the end of the Results section}](#results-human-based-and-computational-measures-of-visual-information).]

## Methods

### Data set {#semanticpriming-dataset}

```{r}
# Calculate some of the sample sizes to be reported in the paragraph below

# Number of prime--target pairs per participant.
# Save mean as integer and SD rounded while keeping trailing zeros
semanticpriming_mean_primetarget_pairs_per_participant =
  semanticpriming %>%
  group_by(Participant) %>%
  summarise(length(unique(primeword_targetword))) %>%
  select(2) %>% unlist %>% mean %>% round(0)

semanticpriming_SD_primetarget_pairs_per_participant =
  semanticpriming %>%
  group_by(Participant) %>%
  summarise(length(unique(primeword_targetword))) %>%
  select(2) %>% unlist %>% sd %>% sprintf('%.2f', .)

# Number of participants per prime--target pair.
# Save mean as integer and SD rounded while keeping trailing zeros
semanticpriming_mean_participants_per_primetarget_pair =
  semanticpriming %>%
  group_by(primeword_targetword) %>%
  summarise(length(unique(Participant))) %>%
  select(2) %>% unlist %>% mean %>% round(0)

semanticpriming_SD_participants_per_primetarget_pair =
  semanticpriming %>%
  group_by(primeword_targetword) %>%
  summarise(length(unique(Participant))) %>%
  select(2) %>% unlist %>% sd %>% sprintf('%.2f', .)
```

The data set was trimmed by removing rows that lacked values on any variable, and by removing RTs that were more than 3 standard deviations away from the mean. The standard deviation trimming was performed within participants, within sessions and within SOA conditions, as done in the Semantic Priming Project [@hutchison2013a]. The resulting data set contained `r length(unique(semanticpriming$Participant))` participants, `r length(unique(semanticpriming$primeword_targetword)) %>% formattable::comma(digits = 0)` prime--target pairs and `r length(semanticpriming$z_target.RT) %>% formattable::comma(digits = 0)` RTs. On average, there were `r semanticpriming_mean_primetarget_pairs_per_participant` prime--target pairs per participant ($SD$ = `r semanticpriming_SD_primetarget_pairs_per_participant`), and conversely, `r semanticpriming_mean_participants_per_primetarget_pair` participants per prime--target pair ($SD$ = `r semanticpriming_SD_participants_per_primetarget_pair`).

### Variables

While the variables are outlined in the [\underline{general introduction}](#present-studies), a few further details are provided below regarding some of them.

- `Vocabulary size`. The measure used by @hutchison2013a comprised a synonym test, an antonym test and an analogy test, all three extracted from the Woodcock--Johnson III diagnostic reading battery [@woodcockWoodcockJohnsonIII2001]. We operationalised vocabulary size as each participant's mean score across the three tests.

- `Language-based similarity`. This measure was calculated using a semantic space from @mandera2017a, which the authors found to be the second-best predictor ($R$^2^ = .465) of the semantic priming effect in the lexical decision task of @hutchison2013a (we could not use the best semantic space, $R$^2^ = .471, owing to computational limitations). The second-best semantic space [see first row of Table 5 in @mandera2017a] was based on lemmas from a subtitle corpus, and was processed using a Continuous Bag of Words (CBOW) model. It had 300 dimensions and a window size of six words. The R package 'LSAfun' [@R-LSAfun] was used to import this variable.^[Despite the name of the package, the measure we used was not based on Latent Semantic Analysis.]

- `Stimulus onset asynchrony (SOA)`. Following @brauer2018a, the categories of this factor were recoded as follows: 200 ms = -0.5; 1,200 ms = 0.5.

A few further details regarding the covariates follow, after a brief illustration of the language-based similarity measure.
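As a minimal sketch of the similarity computation, the following chunk uses the demo semantic space `wonderland` that is bundled with 'LSAfun'; our analysis used the subtitle-based CBOW space described above, imported in the same way. The words are arbitrary examples, and the chunk is not evaluated.

```{r lsafun-sketch, eval = FALSE}
library(LSAfun)

# Load the demo semantic space bundled with 'LSAfun'
data(wonderland)

# Cosine similarity between a hypothetical prime--target pair
Cosine('alice', 'rabbit', tvectors = wonderland)
```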
- `Attentional control` [@hutchison2013a] was included as a measure akin to general cognition, and specifically as a covariate of vocabulary size [@ratcliff2010a]. The role of attentional control in semantic priming was evidenced by @yap2017a. Attentional control was measured through three attention-demanding tasks, namely, operation span, Stroop and antisaccade [@hutchison2013a].

- Lexical covariates (see [\underline{Appendix A}](#appendix-A-lexical-covariates)): `word frequency` and `orthographic Levenshtein distance` [@balota2007a].

- `Word concreteness` [@brysbaert2014a], used as a covariate of visual strength.

Figure \@ref(fig:semanticpriming-correlations) shows the correlations among the predictors and the dependent variable.

```{r semanticpriming-correlations, fig.cap = 'Zero-order correlations in the semantic priming study.', fig.width = 7.5, fig.height = 4.5, out.width = '65%'}

# Using the following variables...
semanticpriming[, c('z_target.RT', 'z_vocabulary_size', 'z_attentional_control',
                    'z_cosine_similarity', 'z_visual_rating_diff',
                    'z_word_concreteness_diff', 'z_target_word_frequency',
                    'z_target_number_syllables')] %>%

  # renamed for the sake of clarity
  rename('RT' = z_target.RT,
         'Vocabulary size' = z_vocabulary_size,
         'Attentional control' = z_attentional_control,
         'Language-based similarity' = z_cosine_similarity,
         'Visual-strength difference' = z_visual_rating_diff,
         'Word-concreteness difference' = z_word_concreteness_diff,
         'Word frequency' = z_target_word_frequency,
         'Number of syllables' = z_target_number_syllables) %>%

  # make correlation matrix (custom function from the 'R_functions' folder)
  correlation_matrix() +
  theme(plot.margin = unit(c(0, 0, 0.1, -3.1), 'in'))
```

### Diagnostics for the frequentist analysis

The model presented convergence warnings. To avoid removing important random slopes, which could increase the Type I error rate---i.e., false positives [@brauer2018a; @singmann2019a]---, we examined the model after refitting it with seven optimisation algorithms, through the 'allFit' function of the R package 'lme4' [@batesPackageLme42021]. All optimisers produced virtually identical means for all effects, suggesting that the convergence warnings were not consequential [@batesPackageLme42021] (see [\underline{Appendix B}](#appendix-B-frequentist-analysis-diagnostics)).

```{r}
# Calculate VIF for every predictor and return only the maximum VIF rounded up
maxVIF_semanticpriming = car::vif(semanticpriming_lmerTest) %>% max %>% ceiling
```

The residual errors were not normally distributed, and attempts to mitigate this deviation proved unsuccessful (see [\underline{Appendix B}](#appendix-B-frequentist-analysis-diagnostics)). However, this is unlikely to have posed a major problem, as mixed-effects models are fairly robust to deviations from normality [@kniefViolatingNormalityAssumption2021; @schielzethRobustnessLinearMixed2020]. Last, the model did not present multicollinearity problems, with all variance inflation factors (VIF) below `r maxVIF_semanticpriming` [see @dormannCollinearityReviewMethods2013; @harrison2018a].

### Diagnostics for the Bayesian analysis

```{r}
# Calculate the number of post-warmup draws (as in 'brms' version 2.17.0).
# The informative prior model is used, but the numbers are identical in the
# three models.
semanticpriming_post_warmup_draws =
  (semanticpriming_summary_informativepriors_exgaussian$iter -
     semanticpriming_summary_informativepriors_exgaussian$warmup) *
  semanticpriming_summary_informativepriors_exgaussian$chains

# As a convergence diagnostic, find the maximum R-hat value for the
# fixed effects across the three models.
semanticpriming_fixedeffects_max_Rhat =
  max(semanticpriming_summary_informativepriors_exgaussian$fixed$Rhat,
      semanticpriming_summary_weaklyinformativepriors_exgaussian$fixed$Rhat,
      semanticpriming_summary_diffusepriors_exgaussian$fixed$Rhat) %>%
  # Round
  sprintf('%.2f', .)

# Next, find the maximum R-hat value for the random effects across the three models
semanticpriming_randomeffects_max_Rhat =
  max(semanticpriming_summary_informativepriors_exgaussian$random[['Participant']]$Rhat,
      semanticpriming_summary_weaklyinformativepriors_exgaussian$random[['Participant']]$Rhat,
      semanticpriming_summary_diffusepriors_exgaussian$random[['Participant']]$Rhat,
      semanticpriming_summary_informativepriors_exgaussian$random[['primeword_targetword']]$Rhat,
      semanticpriming_summary_weaklyinformativepriors_exgaussian$random[['primeword_targetword']]$Rhat,
      semanticpriming_summary_diffusepriors_exgaussian$random[['primeword_targetword']]$Rhat) %>%
  # Round
  sprintf('%.2f', .)
```

Three Bayesian models were run, characterised respectively by informative, weakly-informative and diffuse priors. In each model, `r semanticpriming_summary_informativepriors_exgaussian$chains` chains were used. In each chain, `r semanticpriming_summary_informativepriors_exgaussian$warmup %>% formattable::comma(digits = 0)` warmup iterations were run, followed by `r (semanticpriming_summary_informativepriors_exgaussian$iter - semanticpriming_summary_informativepriors_exgaussian$warmup) %>% formattable::comma(digits = 0)` post-warmup iterations. Thus, a total of `r semanticpriming_post_warmup_draws %>% formattable::comma(digits = 0)` post-warmup draws were produced across all the chains. The maximum $\widehat R$ value for the fixed effects across the three models was `r semanticpriming_fixedeffects_max_Rhat`, suggesting that these parameters had converged [@schootBayesianStatisticsModelling2021; @vehtariRanknormalizationFoldingLocalization2021]. In contrast, the maximum $\widehat R$ value for the random effects was `r semanticpriming_randomeffects_max_Rhat`, slightly exceeding the 1.01 threshold [@vehtariRanknormalizationFoldingLocalization2021]. Since the interest of the present research lies in the fixed effects, and the random effects were very close to convergence, the models were considered valid. The results of the posterior predictive checks were sound (see [\underline{Appendix C}](#appendix-C-Bayesian-analysis-diagnostics)), indicating that the posterior distributions were sufficiently consistent with the observed data. Furthermore, in the prior sensitivity analysis, the results were virtually identical across the three priors considered (refer to the priors in Figure \@ref(fig:bayesian-priors) above; to view the results in detail, see [\underline{Appendix E}](#appendix-E-Bayesian-analysis-results)).

## Results of Study 2.1 {#semanticpriming-results}

```{r}
# Calculate R^2. This coefficient must be interpreted with caution
# (Nakagawa et al., 2017; https://doi.org/10.1098/rsif.2017.0213).
# Also, transform the coefficient to a rounded percentage.
Nakagawa2017_fixedeffects_R2_semanticpriming_lmerTest =
  paste0(
    (MuMIn::r.squaredGLMM(semanticpriming_lmerTest)[1, 'R2m'][[1]] * 100) %>%
      sprintf('%.2f', .),
    '%'
  )

Nakagawa2017_randomeffects_R2_semanticpriming_lmerTest =
  paste0(
    (MuMIn::r.squaredGLMM(semanticpriming_lmerTest)[1, 'R2c'][[1]] * 100) %>%
      sprintf('%.2f', .),
    '%'
  )
```

Table \@ref(tab:semanticpriming-frequentist-model) presents the results. The fixed effects explained `r Nakagawa2017_fixedeffects_R2_semanticpriming_lmerTest` of the variance (marginal $R$^2^), whereas the fixed and random effects jointly explained `r Nakagawa2017_randomeffects_R2_semanticpriming_lmerTest` (conditional $R$^2^) [@nakagawaCoefficientDeterminationR22017]. It is reasonable that the random effects accounted for a far greater share of the variance, as they involve a far larger number of estimates for each effect. That is, whereas each fixed effect is formed of one estimate, the by-item random slopes for an individual-difference variable---such as vocabulary size---comprise as many estimates as there are stimulus items (in this study, the stimuli are the prime--target pairs).^[For future reference, it should be noted that, in Studies 2.2 and 2.3, the *stimuli* are the stimulus words, as there are no prime words in those studies.] Conversely, the by-participant random slopes for an item-level variable---such as language-based similarity---comprise as many estimates as there are participants.

```{r semanticpriming-frequentist-model, results = 'asis'}

# Rename effects in plain language and specify the random slopes
# (if any) for each effect, in the footnote. For this purpose,
# superscripts are added to the names of the appropriate effects.
#
# In the interactions below, word-level variables are presented
# first for the sake of consistency (the order does not affect
# the results in any way). Also in the interactions, double
# colons are used to inform the 'frequentist_model_table'
# function that the two terms in the interaction must be split
# into two lines.
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_attentional_control'] = 'Attentional control'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_vocabulary_size'] = 'Vocabulary size $^{\\text{a}}$'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_recoded_participant_gender'] = 'Gender $^{\\text{a}}$'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_target_word_frequency'] = 'Word frequency'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_target_number_syllables'] = 'Number of syllables'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_word_concreteness_diff'] = 'Word-concreteness difference'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_cosine_similarity'] = 'Language-based similarity $^{\\text{b}}$'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_visual_rating_diff'] = 'Visual-strength difference $^{\\text{b}}$'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_recoded_interstimulus_interval'] = 'Stimulus onset asynchrony (SOA) $^{\\text{b}}$'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_word_concreteness_diff:z_vocabulary_size'] =
  'Word-concreteness difference :: Vocabulary size'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_word_concreteness_diff:z_recoded_interstimulus_interval'] =
  'Word-concreteness difference : SOA'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_word_concreteness_diff:z_recoded_participant_gender'] =
  'Word-concreteness difference : Gender'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_attentional_control:z_cosine_similarity'] =
  'Language-based similarity :: Attentional control'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_attentional_control:z_visual_rating_diff'] =
  'Visual-strength difference :: Attentional control'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_vocabulary_size:z_cosine_similarity'] =
  'Language-based similarity :: Vocabulary size'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_vocabulary_size:z_visual_rating_diff'] =
  'Visual-strength difference :: Vocabulary size'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_recoded_participant_gender:z_cosine_similarity'] =
  'Language-based similarity : Gender'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_recoded_participant_gender:z_visual_rating_diff'] =
  'Visual-strength difference : Gender'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_recoded_interstimulus_interval:z_cosine_similarity'] =
  'Language-based similarity : SOA $^{\\text{b}}$'
rownames(KR_summary_semanticpriming_lmerTest$coefficients)[
  rownames(KR_summary_semanticpriming_lmerTest$coefficients) ==
    'z_recoded_interstimulus_interval:z_visual_rating_diff'] =
  'Visual-strength difference : SOA $^{\\text{b}}$'

# Next, change the names in the confidence intervals object
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_attentional_control'] = 'Attentional control'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_vocabulary_size'] = 'Vocabulary size $^{\\text{a}}$'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_recoded_participant_gender'] = 'Gender $^{\\text{a}}$'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_target_word_frequency'] = 'Word frequency'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_target_number_syllables'] = 'Number of syllables'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_word_concreteness_diff'] = 'Word-concreteness difference'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_cosine_similarity'] = 'Language-based similarity $^{\\text{b}}$'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_visual_rating_diff'] = 'Visual-strength difference $^{\\text{b}}$'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_recoded_interstimulus_interval'] = 'Stimulus onset asynchrony (SOA) $^{\\text{b}}$'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_word_concreteness_diff:z_vocabulary_size'] =
  'Word-concreteness difference :: Vocabulary size'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_word_concreteness_diff:z_recoded_interstimulus_interval'] =
  'Word-concreteness difference : SOA'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_word_concreteness_diff:z_recoded_participant_gender'] =
  'Word-concreteness difference : Gender'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_attentional_control:z_cosine_similarity'] =
  'Language-based similarity :: Attentional control'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_attentional_control:z_visual_rating_diff'] =
  'Visual-strength difference :: Attentional control'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_vocabulary_size:z_cosine_similarity'] =
  'Language-based similarity :: Vocabulary size'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_vocabulary_size:z_visual_rating_diff'] =
  'Visual-strength difference :: Vocabulary size'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_recoded_participant_gender:z_cosine_similarity'] =
  'Language-based similarity : Gender'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_recoded_participant_gender:z_visual_rating_diff'] =
  'Visual-strength difference : Gender'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_recoded_interstimulus_interval:z_cosine_similarity'] =
  'Language-based similarity : SOA $^{\\text{b}}$'
rownames(confint_semanticpriming_lmerTest)[
  rownames(confint_semanticpriming_lmerTest) ==
    'z_recoded_interstimulus_interval:z_visual_rating_diff'] =
  'Visual-strength difference : SOA $^{\\text{b}}$'

# Create table (using custom function from the 'R_functions' folder)
frequentist_model_table(
  KR_summary_semanticpriming_lmerTest,
  confint_semanticpriming_lmerTest,
  order_effects = c('(Intercept)',
                    'Attentional control',
                    'Vocabulary size $^{\\text{a}}$',
                    'Gender $^{\\text{a}}$',
                    'Word frequency',
                    'Number of syllables',
                    'Word-concreteness difference',
                    'Language-based similarity $^{\\text{b}}$',
                    'Visual-strength difference $^{\\text{b}}$',
                    'Stimulus onset asynchrony (SOA) $^{\\text{b}}$',
                    'Word-concreteness difference :: Vocabulary size',
                    'Word-concreteness difference : SOA',
                    'Word-concreteness difference : Gender',
                    'Language-based similarity :: Attentional control',
                    'Visual-strength difference :: Attentional control',
                    'Language-based similarity :: Vocabulary size',
                    'Visual-strength difference :: Vocabulary size',
                    'Language-based similarity : Gender',
                    'Visual-strength difference : Gender',
                    'Language-based similarity : SOA $^{\\text{b}}$',
                    'Visual-strength difference : SOA $^{\\text{b}}$'),
  interaction_symbol_x = TRUE,
  caption = 'Frequentist model for the semantic priming study.') %>%
  # kable_styling(latex_options = 'scale_down') %>%

  # Group predictors under headings
  pack_rows('Individual differences', 2, 4) %>%
  pack_rows('Target-word lexical covariates', 5, 6) %>%
  pack_rows('Prime--target relationship', 7, 9) %>%
  pack_rows('Task condition', 10, 10) %>%
  pack_rows('Interactions', 11, 21) %>%

  # Place table close to designated position and highlight covariates
  kable_styling(latex_options = c('hold_position', 'striped'),
                stripe_index = c(2, 5:7, 11:15)) %>%

  # Footnote describing abbreviations, random slopes, etc.
  # LaTeX code used to format the text.
  footnote(escape = FALSE, threeparttable = TRUE,
           general_title = '\\\\linebreak',
           general = paste(
             '\\\\textit{Note}. $\\\\upbeta$ = Estimate based on $z$-scored predictors; \\\\textit{SE} = standard error;',
             'CI = confidence interval. Shaded rows contain covariates. Some interactions are',
             'split over two lines, with the second line indented. \\\\linebreak',
             '$^{\\\\text{a}}$ By-word random slopes were included for this effect.',
             '$^{\\\\text{b}}$ By-participant random slopes were included for this effect.',
             # After the first line of the footnote, begin subsequent lines
             # with a dot-sized indent to correct a default alignment error.
             sep = ' \\\\linebreak \\\\phantom{.}'))
```

Both language-based similarity and visual-strength difference produced significant main effects. As expected, their effects had opposite directions. On the one hand, higher values of language-based similarity facilitated participants' performance, as reflected in shorter RTs. On the other hand, higher values of visual-strength difference led to longer RTs. Furthermore, language-based similarity interacted with vocabulary size and with SOA. There were no effects of participants' gender (see interaction figures below). The effect sizes of language-based similarity and its interactions were larger than those of visual-strength difference.
Figure \@ref(fig:semanticpriming-frequentist-bayesian-plot-weaklyinformativepriors-exgaussian) displays the frequentist and the Bayesian estimates, which are broadly similar. The Bayesian estimates shown are from the weakly-informative prior model; the estimates of the two other models, based on informative and diffuse priors, were virtually identical (see [\underline{Appendix E}](#appendix-E-Bayesian-analysis-results)).

\FloatBarrier

```{r semanticpriming-frequentist-bayesian-plot-weaklyinformativepriors-exgaussian, fig.cap = 'Estimates for the semantic priming study. The frequentist means (represented by red points) are flanked by 95\\% confidence intervals. The Bayesian means (represented by blue vertical lines) are flanked by 95\\% credible intervals in light blue.'}

# Run plot through source() rather than directly in this R Markdown document
# to preserve the format.
source('semanticpriming/frequentist_bayesian_plots/semanticpriming_frequentist_bayesian_plots.R',
       local = TRUE)

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/semanticpriming/frequentist_bayesian_plots/plots/semanticpriming_frequentist_bayesian_plot_weaklyinformativepriors_exgaussian.pdf'
  ))
```

Figure \@ref(fig:semanticpriming-interactions-with-vocabulary-size)-a shows the significant interaction between language-based similarity and vocabulary size, whereby higher-vocabulary participants benefited more from the language-based similarity between prime and target words. This interaction replicates the results of @yap2017a, who analysed the same data set using a categorical measure of similarity instead. The replication is noteworthy, as it holds in spite of some methodological differences between the studies. First, @yap2017a operationalised the priming effect as a categorical difference between related and unrelated prime--target pairs, based on association ratings produced by people [@nelsonUniversitySouthFlorida2004]. In contrast, the present study applied a continuous measure of relatedness---i.e., cosine similarity---, which is more precise and may thus afford more statistical power [@mandera2017a; @petilli2021a]. Therefore, this interaction demonstrates the consistency between human ratings and computational approximations to meaning [@charbonnierPredictingWordConcreteness2019; @charbonnierPredictingConcretenessGerman2020; @guenther2016a; @louwerse2015a; @mandera2017a; @petilli2021a; @solovyevConcretenessAbstractnessConcept2021; @wingfieldUnderstandingRoleLinguistic2022]. Second, whereas @yap2017a performed a correlational analysis, the present analysis used maximal mixed-effects models that included several covariates, to measure the effects of interest as rigorously as possible.

Figure \@ref(fig:semanticpriming-interactions-with-vocabulary-size)-b presents the non-significant interaction between visual-strength difference and vocabulary size.^[All interaction plots across the three studies are based on the frequentist models. Further interaction plots are available in [\underline{Appendix D}](#appendix-D-interaction-plots).] Although this interaction was not significant, the effect of visual-strength difference was larger in lower-vocabulary participants.

(ref:semanticpriming-interactions-with-vocabulary-size) Interactions of vocabulary size with language-based similarity (panel a) and with visual-strength difference (panel b).
Vocabulary size is constrained to deciles (10 sections) in this plot, whereas in the statistical analysis it contained more values within the current range. $n$ = number of participants contained between deciles.

```{r semanticpriming-interactions-with-vocabulary-size, fig.cap = '(ref:semanticpriming-interactions-with-vocabulary-size)', out.width = '85%'}

# Run plot through source() rather than directly in this R Markdown document
# to preserve the italicised text.
source('semanticpriming/frequentist_analysis/semanticpriming-interactions-with-vocabulary-size.R',
       local = TRUE)

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/semanticpriming/frequentist_analysis/plots/semanticpriming-interactions-with-vocabulary-size.pdf'
  ))
```

Figure \@ref(fig:semanticpriming-interactions-with-SOA) shows that the effects of language-based similarity and visual-strength difference were both larger with the short SOA. However, whereas the effect of language-based similarity was present with both SOAs (i.e., 200 ms and 1,200 ms), the effect of visual-strength difference was almost exclusive to the short SOA. These results are consistent with @petilli2021a, whereas they contrast with previous findings regarding the slower pace of the visual system in semantic priming [@lam2015a] and in other paradigms [@louwerseTasteWordsLinguistic2011].

```{r semanticpriming-interactions-with-SOA, fig.cap = 'Interactions of stimulus onset asynchrony (SOA) with language-based similarity (panel a) and with visual-strength difference (panel b) in the semantic priming study. SOA was analysed using $z$-scores, but for clarity, the basic labels are used in the legend.', out.width = '80%'}

# Run plot through source() rather than directly in this R Markdown document
# to preserve the italicised text.
source('semanticpriming/frequentist_analysis/semanticpriming-interactions-with-SOA.R',
       local = TRUE)

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/semanticpriming/frequentist_analysis/plots/semanticpriming-interactions-with-SOA.pdf'
  ))
```

Figure \@ref(fig:semanticpriming-interactions-with-gender) shows the non-significant interactions of gender with language-based similarity and with visual-strength difference.

```{r semanticpriming-interactions-with-gender, fig.cap = 'Interactions of gender with language-based similarity (panel a) and with visual-strength difference (panel b) in the semantic priming study. Gender was analysed using $z$-scores, but for clarity, the basic labels are used in the legend.', out.width = '80%'}

# Run plot through source() rather than directly in this R Markdown document
# to preserve the italicised text.
source('semanticpriming/frequentist_analysis/semanticpriming-interactions-with-gender.R',
       local = TRUE)

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/semanticpriming/frequentist_analysis/plots/semanticpriming-interactions-with-gender.pdf'
  ))
```

### Human-based and computational measures of visual information {#results-human-based-and-computational-measures-of-visual-information}

Next, we assessed the adequacy of visual-strength difference as a measurement instrument, as---to our knowledge---it had never been used before in the study of semantic priming.
Even though the effect of this variable on task performance was---as expected---inhibitory (i.e., higher values of this variable leading to longer RTs), we were concerned about the low correlation between visual-strength difference and language-based similarity ($r$ = `r cor(semanticpriming$z_cosine_similarity, semanticpriming$z_visual_rating_diff) %>% sprintf('%.2f', .) %>% sub('^(-)?0[.]', '\\1.', .)`). First, the negligible size of this correlation was concerning because we had expected a larger, negative correlation. Second, @petilli2021a had found a correlation of $r$ = .50 between vision-based similarity and language-based similarity. This prompted us to compare the performance of our measure---i.e., `visual-strength difference`---with that of Petilli et al.---i.e., `vision-based similarity`.

```{r}
# Calculate some of the sample sizes to be reported in the paragraph below

# Number of prime--target pairs per participant.
# Save mean as integer and SD rounded while keeping trailing zeros
semanticpriming_with_visualsimilarity_mean_primetarget_pairs_per_participant =
  semanticpriming_with_visualsimilarity %>%
  group_by(Participant) %>%
  summarise(length(unique(primeword_targetword))) %>%
  select(2) %>% unlist %>% mean %>% round(0)

semanticpriming_with_visualsimilarity_SD_primetarget_pairs_per_participant =
  semanticpriming_with_visualsimilarity %>%
  group_by(Participant) %>%
  summarise(length(unique(primeword_targetword))) %>%
  select(2) %>% unlist %>% sd %>% sprintf('%.2f', .)

# Number of participants per prime--target pair.
# Save mean as integer and SD rounded while keeping trailing zeros
semanticpriming_with_visualsimilarity_mean_participants_per_primetarget_pair =
  semanticpriming_with_visualsimilarity %>%
  group_by(primeword_targetword) %>%
  summarise(length(unique(Participant))) %>%
  select(2) %>% unlist %>% mean %>% round(0)

semanticpriming_with_visualsimilarity_SD_participants_per_primetarget_pair =
  semanticpriming_with_visualsimilarity %>%
  group_by(primeword_targetword) %>%
  summarise(length(unique(Participant))) %>%
  select(2) %>% unlist %>% sd %>% sprintf('%.2f', .)
```

For this purpose, we first subsetted our previous data set to ensure that all trials contained data on all relevant variables---i.e., all the existing variables and the newly added `vision-based similarity` of @petilli2021a. This process resulted in the loss of 83% of trials, owing to the strict selection criteria that Petilli et al. had applied in the creation of their variable---for instance, both the target and the prime word had to be associated with at least 100 pictures in ImageNet. The rest of the preprocessing involved the same steps as in the main analysis (detailed in [\underline{Methods}](#semanticpriming-dataset)). The resulting data set contained `r length(unique(semanticpriming_with_visualsimilarity$Participant))` participants, `r length(unique(semanticpriming_with_visualsimilarity$primeword_targetword)) %>% formattable::comma(digits = 0)` prime--target pairs and `r length(semanticpriming_with_visualsimilarity$z_target.RT) %>% formattable::comma(digits = 0)` RTs.
On average, there were `r semanticpriming_with_visualsimilarity_mean_primetarget_pairs_per_participant` prime--target pairs per participant ($SD$ = `r semanticpriming_with_visualsimilarity_SD_primetarget_pairs_per_participant`), and conversely, `r semanticpriming_with_visualsimilarity_mean_participants_per_primetarget_pair` participants per prime--target pair ($SD$ = `r semanticpriming_with_visualsimilarity_SD_participants_per_primetarget_pair`). Figure \@ref(fig:semanticpriming-with-visualsimilarity-correlations) shows the correlations among the predictors and the dependent variable.

```{r semanticpriming-with-visualsimilarity-correlations, fig.cap = 'Zero-order correlations in the semantic priming data set that included vision-based similarity.', fig.width = 8.3, fig.height = 5, out.width = '73%'}

# Using the following variables...
semanticpriming_with_visualsimilarity %>%
  select(z_target.RT, z_vocabulary_size, z_attentional_control,
         z_cosine_similarity, z_visual_similarity,
         z_visual_rating_diff, z_word_concreteness_diff,
         z_target_word_frequency, z_target_number_syllables) %>%

  # Use plain names
  rename('RT' = z_target.RT,
         'Vocabulary size' = z_vocabulary_size,
         'Attentional control' = z_attentional_control,
         'Language-based similarity' = z_cosine_similarity,
         'Visual-strength difference' = z_visual_rating_diff,
         'Vision-based similarity' = z_visual_similarity,
         'Word-concreteness difference' = z_word_concreteness_diff,
         'Target-word frequency' = z_target_word_frequency,
         'Number of target-word syllables' = z_target_number_syllables) %>%

  # make correlation matrix (custom function from the 'R_functions' folder)
  correlation_matrix() +
  theme(plot.margin = unit(c(0, 0, 0.1, -3.1), 'in'))
```

### Diagnostics for the frequentist analysis

The model presented convergence warnings. To avoid removing important random slopes, which could increase the Type I error rate---i.e., false positives [@brauer2018a; @singmann2019a]---, we examined the model after refitting it with seven optimisation algorithms, through the 'allFit' function of the 'lme4' package [@batesPackageLme42021]. All optimisers produced virtually identical means for all effects, suggesting that the convergence warnings were not consequential [@batesPackageLme42021] (see [\underline{Appendix B}](#appendix-B-frequentist-analysis-diagnostics)).

```{r}
# Calculate VIF for every predictor and return only the maximum VIF rounded up
maxVIF_semanticpriming_with_visualsimilarity =
  car::vif(semanticpriming_with_visualsimilarity_lmerTest) %>% max %>% ceiling
```

The residual errors were not normally distributed, and attempts to mitigate this deviation proved unsuccessful (see [\underline{Appendix B}](#appendix-B-frequentist-analysis-diagnostics)). However, this is unlikely to have posed a major problem, as mixed-effects models are fairly robust to deviations from normality [@kniefViolatingNormalityAssumption2021; @schielzethRobustnessLinearMixed2020]. Last, the model did not present multicollinearity problems, with all VIFs below `r maxVIF_semanticpriming_with_visualsimilarity` [see @dormannCollinearityReviewMethods2013; @harrison2018a].

#### Results

```{r}
# Calculate R^2. This coefficient must be interpreted with caution
# (Nakagawa et al., 2017; https://doi.org/10.1098/rsif.2017.0213).
# Also, transform the coefficient to a rounded percentage.
Nakagawa2017_fixedeffects_R2_semanticpriming_with_visualsimilarity_lmerTest =
  paste0(
    (MuMIn::r.squaredGLMM(semanticpriming_with_visualsimilarity_lmerTest)[1, 'R2m'][[1]] * 100) %>%
      sprintf('%.2f', .),
    '%'
  )

Nakagawa2017_randomeffects_R2_semanticpriming_with_visualsimilarity_lmerTest =
  paste0(
    (MuMIn::r.squaredGLMM(semanticpriming_with_visualsimilarity_lmerTest)[1, 'R2c'][[1]] * 100) %>%
      sprintf('%.2f', .),
    '%'
  )
```

Table \@ref(tab:semanticpriming-with-visualsimilarity-frequentist-model) presents the results. Owing to space constraints, the covariates are shown in Table \@ref(tab:semanticpriming-with-visualsimilarity-frequentist-model-covariates). The fixed effects explained `r Nakagawa2017_fixedeffects_R2_semanticpriming_with_visualsimilarity_lmerTest` of the variance (marginal $R$^2^), whereas the fixed and random effects jointly explained `r Nakagawa2017_randomeffects_R2_semanticpriming_with_visualsimilarity_lmerTest` (conditional $R$^2^) [@nakagawaCoefficientDeterminationR22017]. For an explanation of this difference, see [\underline{Results of Study 2.1}](#semanticpriming-results). Figure \@ref(fig:semanticpriming-with-visualsimilarity-confidence-intervals-plot) displays the frequentist estimates of the effects of interest (Bayesian estimates were not computed owing to time constraints).

```{r semanticpriming-with-visualsimilarity-frequentist-model, results = 'asis', out.width = '90%'}

# Rename effects in plain language and specify the random slopes
# (if any) for each effect, in the footnote. For this purpose,
# superscripts are added to the names of the appropriate effects.
#
# In the interactions below, word-level variables are presented
# first for the sake of consistency (the order does not affect
# the results in any way). Also in the interactions, double
# colons are used to inform the 'frequentist_model_table'
# function that the two terms in the interaction must be split
# into two lines.
rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_attentional_control'] = 'Attentional control' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_vocabulary_size'] = 'Vocabulary size $^{\\text{a}}$' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_recoded_participant_gender'] = 'Gender $^{\\text{a}}$' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_target_word_frequency'] = 'Word frequency' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_target_number_syllables'] = 'Number of syllables' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_word_concreteness_diff'] = 'Word-concreteness difference' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_cosine_similarity'] = 'Language-based similarity $^{\\text{b}}$' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_visual_rating_diff'] = 'Visual-strength difference $^{\\text{b}}$' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_visual_similarity'] = 'Vision-based similarity $^{\\text{b}}$' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_recoded_interstimulus_interval'] = 'Stimulus onset asynchrony (SOA) $^{\\text{b}}$' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_word_concreteness_diff:z_vocabulary_size'] = 'Word-concreteness difference :: Vocabulary size' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_word_concreteness_diff:z_recoded_interstimulus_interval'] = 'Word-concreteness difference : SOA' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_word_concreteness_diff:z_recoded_participant_gender'] = 'Word-concreteness difference : Gender' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_attentional_control:z_cosine_similarity'] = 'Language-based similarity :: Attentional control' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_attentional_control:z_visual_rating_diff'] = 
'Visual-strength difference :: Attentional control' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_attentional_control:z_visual_similarity'] = 'Vision-based similarity :: Attentional control' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_vocabulary_size:z_cosine_similarity'] = 'Language-based similarity :: Vocabulary size' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_vocabulary_size:z_visual_rating_diff'] = 'Visual-strength difference :: Vocabulary size' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_vocabulary_size:z_visual_similarity'] = 'Vision-based similarity :: Vocabulary size' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_recoded_participant_gender:z_cosine_similarity'] = 'Language-based similarity : Gender' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_recoded_participant_gender:z_visual_rating_diff'] = 'Visual-strength difference : Gender' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_recoded_participant_gender:z_visual_similarity'] = 'Vision-based similarity : Gender' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_recoded_interstimulus_interval:z_cosine_similarity'] = 'Language-based similarity : SOA $^{\\text{b}}$' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_recoded_interstimulus_interval:z_visual_rating_diff'] = 'Visual-strength difference : SOA $^{\\text{b}}$' rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients)[ rownames(KR_summary_semanticpriming_with_visualsimilarity_lmerTest$coefficients) == 'z_recoded_interstimulus_interval:z_visual_similarity'] = 'Vision-based similarity : SOA $^{\\text{b}}$' # Next, change the names in the confidence intervals object rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_attentional_control'] = 'Attentional control' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_vocabulary_size'] = 'Vocabulary size $^{\\text{a}}$' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_recoded_participant_gender'] = 'Gender $^{\\text{a}}$' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_target_word_frequency'] = 'Word frequency' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ 
rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_target_number_syllables'] = 'Number of syllables' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_word_concreteness_diff'] = 'Word-concreteness difference' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_cosine_similarity'] = 'Language-based similarity $^{\\text{b}}$' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_visual_rating_diff'] = 'Visual-strength difference $^{\\text{b}}$' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_visual_similarity'] = 'Vision-based similarity $^{\\text{b}}$' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_recoded_interstimulus_interval'] = 'Stimulus onset asynchrony (SOA) $^{\\text{b}}$' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_word_concreteness_diff:z_vocabulary_size'] = 'Word-concreteness difference :: Vocabulary size' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_word_concreteness_diff:z_recoded_interstimulus_interval'] = 'Word-concreteness difference : SOA' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_word_concreteness_diff:z_recoded_participant_gender'] = 'Word-concreteness difference : Gender' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_attentional_control:z_cosine_similarity'] = 'Language-based similarity :: Attentional control' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_attentional_control:z_visual_rating_diff'] = 'Visual-strength difference :: Attentional control' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_attentional_control:z_visual_similarity'] = 'Vision-based similarity :: Attentional control' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_vocabulary_size:z_cosine_similarity'] = 'Language-based similarity :: Vocabulary size' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_vocabulary_size:z_visual_rating_diff'] = 'Visual-strength difference :: Vocabulary size' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_vocabulary_size:z_visual_similarity'] = 'Vision-based similarity :: Vocabulary size' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_recoded_participant_gender:z_cosine_similarity'] = 'Language-based similarity : Gender' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ 
rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_recoded_participant_gender:z_visual_rating_diff'] = 'Visual-strength difference : Gender' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_recoded_participant_gender:z_visual_similarity'] = 'Vision-based similarity : Gender' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_recoded_interstimulus_interval:z_cosine_similarity'] = 'Language-based similarity : SOA $^{\\text{b}}$' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_recoded_interstimulus_interval:z_visual_rating_diff'] = 'Visual-strength difference : SOA $^{\\text{b}}$' rownames(confint_semanticpriming_with_visualsimilarity_lmerTest)[ rownames(confint_semanticpriming_with_visualsimilarity_lmerTest) == 'z_recoded_interstimulus_interval:z_visual_similarity'] = 'Vision-based similarity : SOA $^{\\text{b}}$' # Create table (using custom function from the 'R_functions' folder) # Covariates are commented out as they do not fit in the table. # They are instead shown in the subsequent table. frequentist_model_table( KR_summary_semanticpriming_with_visualsimilarity_lmerTest, confint_semanticpriming_with_visualsimilarity_lmerTest, select_effects = c('(Intercept)', # 'Attentional control', 'Vocabulary size $^{\\text{a}}$', 'Gender $^{\\text{a}}$', # 'Word frequency', # 'Number of syllables', # 'Word-concreteness difference', 'Language-based similarity $^{\\text{b}}$', 'Visual-strength difference $^{\\text{b}}$', 'Vision-based similarity $^{\\text{b}}$', 'Stimulus onset asynchrony (SOA) $^{\\text{b}}$', # 'Word-concreteness difference :: Vocabulary size', # 'Word-concreteness difference : SOA', # 'Word-concreteness difference : Gender', # 'Language-based similarity :: Attentional control', # 'Visual-strength difference :: Attentional control', # 'Vision-based similarity :: Attentional control', 'Language-based similarity :: Vocabulary size', 'Visual-strength difference :: Vocabulary size', 'Vision-based similarity :: Vocabulary size', 'Language-based similarity : Gender', 'Visual-strength difference : Gender', 'Vision-based similarity : Gender', 'Language-based similarity : SOA $^{\\text{b}}$', 'Visual-strength difference : SOA $^{\\text{b}}$', 'Vision-based similarity : SOA $^{\\text{b}}$'), interaction_symbol_x = TRUE, caption = 'Effects of interest in the semantic priming model that included vision-based similarity.') %>% # kable_styling(latex_options = 'scale_down') %>% # Group predictors under headings pack_rows('Individual differences', 2, 3) %>% pack_rows('Prime--target relationship', 4, 6) %>% pack_rows('Task condition', 7, 7) %>% pack_rows('Interactions', 8, 16) %>% # Place table close to designated position kable_styling(latex_options = c('hold_position')) %>% # Footnote describing abbreviations, random slopes, etc. # LaTeX code used to format the text. footnote(escape = FALSE, threeparttable = TRUE, general_title = '\\\\linebreak', general = paste('\\\\textit{Note}. $\\\\upbeta$ = Estimate based on $z$-scored predictors; \\\\textit{SE} = standard error;', 'CI = confidence interval. Covariates shown in next table due to space. Some ', 'interactions are split over two lines, with the second line indented. 
\\\\linebreak', '$^{\\\\text{a}}$ By-word random slopes were included for this effect.', '$^{\\\\text{b}}$ By-participant random slopes were included for this effect.', # After first line in the footnote, begin next lines with a dot-sized indent to correct default error. sep = ' \\\\linebreak \\\\phantom{.}')) ``` ```{r semanticpriming-with-visualsimilarity-frequentist-model-covariates, results = 'asis'} # Create table (using custom function from the 'R_functions' folder) # Only the covariates are shown, and the effects of interest are # commented out as they were shown in the table above. frequentist_model_table( KR_summary_semanticpriming_with_visualsimilarity_lmerTest, confint_semanticpriming_with_visualsimilarity_lmerTest, select_effects = c('Attentional control', # 'Vocabulary size $^{\\text{a}}$', # 'Gender $^{\\text{a}}$', 'Word frequency', 'Number of syllables', 'Word-concreteness difference', # 'Language-based similarity $^{\\text{b}}$', # 'Visual-strength difference $^{\\text{b}}$', # 'Vision-based similarity $^{\\text{b}}$', # 'Stimulus onset asynchrony (SOA) $^{\\text{b}}$', 'Word-concreteness difference :: Vocabulary size', 'Word-concreteness difference : SOA', 'Word-concreteness difference : Gender', 'Language-based similarity :: Attentional control', 'Visual-strength difference :: Attentional control', 'Vision-based similarity :: Attentional control' # comma deleted # 'Language-based similarity :: Vocabulary size', # 'Visual-strength difference :: Vocabulary size', # 'Vision-based similarity :: Vocabulary size', # 'Language-based similarity : Gender', # 'Visual-strength difference : Gender', # 'Language-based similarity : SOA $^{\\text{b}}$', # 'Visual-strength difference : SOA $^{\\text{b}}$', # 'Vision-based similarity : SOA $^{\\text{b}}$' ), interaction_symbol_x = TRUE, caption = 'Covariates in the semantic priming model that included vision-based similarity.') %>% # kable_styling(latex_options = 'scale_down') %>% # Group predictors under headings pack_rows('Individual difference covariate', 1, 1) %>% pack_rows('Target-word lexical covariates', 2, 3) %>% pack_rows('Prime--target covariate', 4, 4) %>% pack_rows('Covariate interactions', 5, 10) %>% # Place table close to designated position kable_styling(latex_options = c('hold_position')) %>% # Footnote describing abbreviations, random slopes, etc. # LaTeX code used to format the text. footnote(escape = FALSE, threeparttable = TRUE, general_title = '\\\\linebreak', general = paste('\\\\textit{Note}. $\\\\upbeta$ = Estimate based on $z$-scored predictors; \\\\textit{SE} = standard error;', 'CI = confidence interval. Some interactions are split over two lines, with the', 'second line indented. \\\\linebreak', # '$^{\\\\text{a}}$ By-word random slopes were included for this effect.', # '$^{\\\\text{b}}$ By-participant random slopes were included for this effect.', # After first line in the footnote, begin next lines with a dot-sized indent to correct default error. sep = ' \\\\linebreak \\\\phantom{.}')) ``` \FloatBarrier ```{r semanticpriming-with-visualsimilarity-confidence-intervals-plot, fig.cap = 'Means and 95\\% confidence intervals for the effects of interest in the semantic priming model that included vision-based similarity.', out.width = '89%'} # Run plot through source() rather than directly in this R Markdown document # to preserve the format. 
source('semanticpriming/analysis_with_visualsimilarity/semanticpriming_with_visualsimilarity_confidence_intervals_plot.R',
       local = TRUE)

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/semanticpriming/analysis_with_visualsimilarity/plots/semanticpriming_with_visualsimilarity_confidence_intervals_plot.pdf'
  ))
```

The results revealed an effect of the human-based measure, `visual-strength difference` (as in the main analysis above), along with a smaller effect of the computational measure, `vision-based similarity`. There was an important difference between these measures regarding the interaction with SOA. Whereas visual-strength difference had a larger effect with the short SOA, vision-based similarity did not interact with SOA, contrary to the results of @petilli2021a. This difference was not due to collinearity between these measures ($r$ = `r cor(semanticpriming_with_visualsimilarity$z_visual_rating_diff, semanticpriming_with_visualsimilarity$z_visual_similarity) %>% sprintf('%.2f', .) %>% sub('^(-)?0[.]', '\\1.', .)`). Also importantly, both measures appeared to be valid based on their correlations with language-based similarity and with word concreteness (Figure \@ref(fig:semanticpriming-with-visualsimilarity-correlations)). We reflect on this result in the discussion.

### Statistical power analysis

Power curves were computed for most effects of interest. These curves were based on the main model, not on the follow-up model that included vision-based similarity. Figures \@ref(fig:semanticpriming-powercurve-plots-1-2-3) and \@ref(fig:semanticpriming-powercurve-plots-4-5-6-7-8-9) show the estimated power for some main effects and interactions of interest as a function of the number of participants. To plan the sample size for future studies, these results must be considered under the assumptions that the future study would apply a statistical method similar to ours---namely, a mixed-effects model with random intercepts and slopes---, and that the analysis would encompass at least as many prime--target pairs as the current study, namely, `r length(unique(semanticpriming$primeword_targetword)) %>% formattable::comma(digits = 0)` (distributed in various blocks across participants, not all being presented to every participant). Furthermore, it is necessary to consider each figure in detail. Here, we provide a summary. First, detecting the main effect of language-based similarity---which had a strong effect on RTs---would require 50 participants. Second, detecting the interaction between language-based similarity and SOA---which was a considerably weaker effect---would require 600 participants. Last, the other effects would require more than 1,000 participants---or, in the case of gender differences, many more than that.

```{r semanticpriming-powercurve-plots-1-2-3, fig.cap = 'Power curves for some main effects in the semantic priming study.'}

# Run plot through source() rather than directly in this R Markdown document
# to preserve the italicised text.
source('semanticpriming/power_analysis/semanticpriming_all_powercurves.R',
       local = TRUE)

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/semanticpriming/power_analysis/plots/semanticpriming_powercurve_plots_1_2_3.pdf'
  ))
```

```{r semanticpriming-powercurve-plots-4-5-6-7-8-9, fig.cap = 'Power curves for some interactions in the semantic priming study.', out.width = '92%'}

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/semanticpriming/power_analysis/plots/semanticpriming_powercurve_plots_4_5_6_7_8_9.pdf'
  ))
```

## Discussion of Study 2.1

The results revealed a significant, facilitatory effect of `language-based similarity` and a smaller but significant, inhibitory effect of `visual-strength difference`. That is, a greater language-based similarity resulted in shorter RTs, whereas a greater visual-strength difference resulted in longer RTs. There was also a sizable effect of stimulus onset asynchrony (SOA), with shorter RTs in the short SOA condition (200 ms) than in the long SOA condition (1,200 ms). Furthermore, there were significant interactions. First, language-based priming was larger in higher-vocabulary participants than in lower-vocabulary ones. Second, both language-based priming and vision-based priming were larger with the short SOA than with the long one. Thus far, these results broadly replicated those of @petilli2021a. It is especially noteworthy that vision-based information had a significant effect, consistent with some of the previous research [@floresdarcaisSemanticActivationRecognition1985; @schreuderEffectsPerceptualConceptual1984; @connellSeeHearWhat2014; @petilli2021a], and contrasting with other research that did not find such an effect [@ostarekTaskdependentCausalRole2017] or only observed it after visually-focussed tasks [@pecherDoesPizzaPrime1998; @yeeColorlessGreenIdeas2012]. Last, no effect of gender was found. Below, we delve into some other aspects of these results.

### The importance of outliers

The interaction between language-based similarity and vocabulary size (Figure \@ref(fig:semanticpriming-interactions-with-vocabulary-size)-a) was present in all deciles of vocabulary size, but it was clearest among those participants who were more than one standard deviation away from the mean. Outliers in individual differences have played important roles in other areas of cognition as well, such as in the study of aphantasia and hyperphantasia---traits characterised, respectively, by a diminished and an extraordinary ability to mentally visualise objects [@milton2021a; @zeman2020a]. Such an influence of outliers provides a reason to study more varied samples of participants when possible. Furthermore, a greater interindividual variation might help detect the effects of individual differences that have been elusive [e.g., @hedge2018a; @murakiSimulatingSemanticsAre2021; @ponari2018a; @rodriguez-ferreiroSemanticPrimingSchizotypal2020; @rouderPsychometricsIndividualDifferences2019].

### Human-based and computational measures of vision-based information

Next, in a secondary analysis, we compared the roles of two measures of vision-based priming. The first measure---`visual-strength difference`---was operationalised as the difference in visual strength between the prime word and the target word in each trial. This difference score was thus based on modality-specific ratings provided by human participants [@lynott2020a].
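To illustrate how such a difference score can be constructed, a minimal sketch follows. The data frame and its column names are hypothetical, and the use of an absolute difference is an assumption of this sketch, not a detail of the actual analysis scripts.

```{r, eval = FALSE}
library(dplyr)

# Hypothetical prime--target trials containing the visual strength of each
# word (0--5 scale in the norms of Lynott et al., 2020)
trials = tibble(prime_visual_strength  = c(4.8, 2.1, 3.5),
                target_visual_strength = c(4.5, 4.0, 1.2))

trials = trials %>%
  mutate(
    # Difference in visual strength between the prime and the target
    visual_rating_diff = abs(prime_visual_strength - target_visual_strength),
    # z-score the predictor, as done with the predictors in these studies
    z_visual_rating_diff = as.numeric(scale(visual_rating_diff))
  )
```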
The second measure---`vision-based similarity`---, created by @petilli2021a, was based on vector representations trained on labelled images from ImageNet. This variable is therefore computational. The effect of visual-strength difference was slightly larger than that of vision-based similarity. This result is consistent with some previous findings suggesting that human-based measures explained more variance than computational measures [@de2016a; @de2019a; @gagneProcessingEnglishCompounds2016; @schmidtke2018a; cf. @michaelovClozeFarN4002022; @snefjella2020a].

If the differing degrees of human dependence of these two variables indeed underpinned their respective effect sizes, a related issue would need to be considered: circularity. Petilli et al. addressed this problem, arguing that using human-based predictors---such as ratings---to investigate human behaviour is less valid than using predictors that are more independent of human behaviour---such as computational measures. On the one hand, we identify two reasons for scepticism regarding the circularity hypothesis. First, the underlying basis of all computational measures [e.g., @mandera2017a; @petilli2021a] is indeed human behaviour, notwithstanding the degree to which this human basis is filtered by computational methods. Second, we have not found sufficient research addressing the validity question. On the other hand, the circularity hypothesis is important enough to warrant dedicated research. Specifically, future studies could systematically compare the theoretical insights provided by human-based measures and by computational ones, as well as the effect sizes achieved by both types.

It is noteworthy that both visual-strength difference and vision-based similarity have *independently* proven to be relevant, and arguably valid, considering their correlations with other measures---especially `word-concreteness difference` and `language-based similarity`---and considering the effects of each measure in semantic priming [see @petilli2021a]. However, the differences between these measures are worthy of attention. Visual-strength difference was barely correlated with language-based similarity. In contrast, vision-based similarity was barely correlated with word-concreteness difference (refer to Figure \@ref(fig:semanticpriming-with-visualsimilarity-correlations)). These results call for an investigation into the underlying composition of visual-strength difference and vision-based similarity.

Furthermore, whereas visual-strength difference retained its significant interaction with SOA---also observed in the main analysis presented above---, vision-based similarity did not present such an interaction. The lack of an interaction between vision-based similarity and SOA contrasts with the results of @petilli2021a, who found that vision-based similarity was only significant in the short SOA condition. There are several possible reasons for this difference, including: (I) a more conservative method in our current analysis---i.e., a maximal mixed-effects model containing more predictors than the hierarchical regression performed by Petilli et al.---, and (II) the presence of individual differences in the present study (i.e., vocabulary size, attentional control and gender), versus the aggregation performed by Petilli et al. Last, the interaction between language-based similarity and SOA became non-significant in this sub-analysis.
This difference from the original analysis may have been caused by the sizable correlation between language-based similarity and vision-based similarity ($r$ = `r cor(semanticpriming_with_visualsimilarity$z_cosine_similarity, semanticpriming_with_visualsimilarity$z_visual_similarity) %>% sprintf('%.2f', .) %>% sub('^(-)?0[.]', '\\1.', .)`). In this regard, it is worth noting the large influence that the addition of a single variable (along with its interactions) can exert on a model.

### The influence of the analytical method

Taken together, the results of the sub-analysis that included vision-based similarity offered a glimpse into the crucial role of analytical choices in the present topic. A previous example of this influence appeared in a set of studies that used Latent Semantic Analysis (LSA) as a predictor of semantic priming. @hutchisonPredictingSemanticPriming2008 operationalised LSA as a difference score, and did not find an effect of this variable. In contrast, later studies, which did not use a difference score, observed a significant effect [@guntherLatentSemanticAnalysis2016; @mandera2017a].

We can extrapolate this issue to a very important comparison that is often made---namely, that between language-based information and embodied simulation. The pervasive superiority of language over the other systems (perception, action, emotion and sociality)---found in the three current studies and in previous ones [@banksLinguisticDistributionalKnowledge2021; @kiela2014a; @louwerse2015a; @lam2015a; @pecherDoesPizzaPrime1998; @petilli2021a]---would be less trustworthy if the instruments that were used to measure the language system had been far more precise than the instruments used to measure the embodiment system. In this sense, it is relevant to consider how variables are improved in research: this is done iteratively, by comparing the performance of different variables. Critically, the literature contains many comparisons of text-based variables, some dating back to the 1990s [@jones2006a; @lund1996a; @mandera2017a; @mikolovEfficientEstimationWord2013; @dedeyneBetterExplanationsLexical2013; @de2016a; @guenther2016a; @guntherLatentSemanticAnalysis2016; @wingfieldUnderstandingRoleLinguistic2022]. In contrast, the work on embodiment variables began more than a decade afterwards, and it has been less concerned with benchmarking the explanatory power of variables [but see @vergallitoPerceptualModalityNorms2020]. Instead, this literature contains more comparisons of different *modalities*---e.g., visual strength, auditory strength, valence, etc. [@lynott2020a; @lynottModalityExclusivityNorms2009; @newcombeEffectsEmotionalSensorimotor2012]. Thus, if linguistic measures are more precise than embodiment measures due to the greater work on these variables, such a difference could account for a certain portion of the superiority of linguistic information over embodied information [see @banksLinguisticDistributionalKnowledge2021; @kiela2014a; @louwerse2015a; @lam2015a; @pecherDoesPizzaPrime1998; @petilli2021a].

Analytical choices such as the operationalisation of variables and the complexity of statistical models can greatly influence the conclusions of research. Indeed, our current results and previous ones suggest that research conclusions are inextricable from the methods used in each study [see @barsalouEstablishingGeneralizableMechanisms2019; @botvinik-nezerVariabilityAnalysisSingle2020; @perretWhichVariablesShould2019; @wagenmakersOneStatisticalAnalysis2022].
Therefore, in the medium term, it may pay dividends to continue examining the influence of analytical choices. Unfortunately, in many research fields, reflecting on the sensitivity of our analyses might conflict with the incentives of the system, which may penalise nuanced conclusions in favour of simplified stories. To overcome such a bias, it may be necessary to give greater prominence to the methodology of scientific papers---for instance, by commenting on the method in the abstract and by extending the methods section in the body of the paper. By the same token, our current results call into question some decisions made by scientific publishers, such as rendering the methods section in a smaller font than the results section, or placing the methods section at the end of the paper. In a nutshell, it may be useful to ensure that scientists are aware that research findings are fundamentally dependent on research methods.

### Statistical power analysis

We analysed the statistical power associated with several effects of interest, across various sample sizes. The results of this power analysis can help determine the number of participants required to reliably examine each of these effects in a future study. Importantly, the results assume two conditions. First, the future study would apply a statistical method similar to ours---namely, a mixed-effects model with random intercepts and slopes. Second, the analysis of the future study would encompass at least `r length(unique(semanticpriming$primeword_targetword)) %>% formattable::comma(digits = 0)` prime--target pairs (distributed in various blocks across participants, not all being presented to every participant).

First, the results revealed that detecting the main effect of language-based similarity would require 50 participants. Next, detecting the interaction between language-based similarity and SOA would require 600 participants. Last, the other effects would require more than 1,000 participants---or, in the case of gender differences, many more than that.

\clearpage

# Study 2.2: Semantic decision

The semantic decision task probes the role of concreteness in conceptual processing. Specifically, this task requires participants to classify words as abstract or concrete, which elicits deeper semantic processing than the task of identifying word forms (i.e., lexical decision). Researchers then analyse whether the responses can be explained by the sensory experientiality of the referents---that is, the degree to which they can be experienced through our senses---and by other variables, such as word frequency.

The core data set in this study was that of the Calgary Semantic Decision Project [@pexman2017a; @pexman2018a], in which participants judged whether words were primarily abstract (e.g., *thought*) or concrete (e.g., *building*). Research has found that the processing of relatively concrete words relies considerably on sensorimotor information [@hultenNeuralRepresentationAbstract2021; @kousta2011a; @vigliocco2014a]. In contrast, the processing of relatively abstract words seems to draw more heavily on information from language [@barca2020a; @dunabeitiaQualitativeDifferencesRepresentation2009; @snefjella2020a], emotion [@kousta2011a; @ponari2018a; @ponari2018b; @ponariRoleEmotionalValence2020; @vigliocco2014a], interoception [@connellInteroceptionForgottenModality2018] and social information [@borghiWordsSocialTools2019; @borghiAbstractConceptsExternal2022; @diveicaQuantifyingSocialSemantics2022].
## Methods

### Data set

```{r}
# Calculate some of the sample sizes to be reported in the paragraph below

# Number of words per participant.
# Save mean as integer and SD rounded while keeping trailing zeros
semanticdecision_mean_words_per_participant =
  semanticdecision %>%
  group_by(Participant) %>%
  summarise(length(unique(Word))) %>%
  select(2) %>% unlist %>% mean %>% round(0)

semanticdecision_SD_words_per_participant =
  semanticdecision %>%
  group_by(Participant) %>%
  summarise(length(unique(Word))) %>%
  select(2) %>% unlist %>% sd %>% sprintf('%.2f', .)

# Number of participants per word.
# Save mean as integer and SD rounded while keeping trailing zeros
semanticdecision_mean_participants_per_word =
  semanticdecision %>%
  group_by(Word) %>%
  summarise(length(unique(Participant))) %>%
  select(2) %>% unlist %>% mean %>% round(0)

semanticdecision_SD_participants_per_word =
  semanticdecision %>%
  group_by(Word) %>%
  summarise(length(unique(Participant))) %>%
  select(2) %>% unlist %>% sd %>% sprintf('%.2f', .)
```

The data set was trimmed by removing rows that lacked values on any variable, and by removing RTs that were more than 3 standard deviations away from the mean.^[In the removal of missing values, six participants whose gender appeared as 'NA' were inadvertently removed from the data set.] The standard deviation trimming was performed within participants and within trial blocks, as done in the Calgary Semantic Decision Project [@pexman2017a]. The resulting data set contained `r length(unique(semanticdecision$Participant))` participants, `r length(unique(semanticdecision$Word)) %>% formattable::comma(digits = 0)` words and `r length(semanticdecision$z_RTclean) %>% formattable::comma(digits = 0)` RTs. On average, there were `r semanticdecision_mean_words_per_participant` words per participant ($SD$ = `r semanticdecision_SD_words_per_participant`), and conversely, `r semanticdecision_mean_participants_per_word` participants per word ($SD$ = `r semanticdecision_SD_participants_per_word`).

### Variables

While the variables are outlined in the [\underline{general introduction}](#present-studies), a few further details are provided below regarding some of them.

#### Vocabulary size

In the vocabulary test used by @pexman2017a, participants were presented with 35 rare words with irregular pronunciations (e.g., *gaoled*, *ennui*), and they were asked to read the words aloud [also see @pexman2018a]. When they pronounced a word correctly, it was inferred that they knew the word. This test was based on NAART35, a short version of the North American Adult Reading Test [@uttlNorthAmericanAdult2002].

#### Word co-occurrence

@wingfieldUnderstandingRoleLinguistic2022 reanalysed the data from @pexman2017a using language-based variables that are more related to the language system than to the visual system. Wingfield and Connell found that the variables that best explained RTs in the semantic decision task were word co-occurrence measures. Specifically, one of these variables was the corpus distance between each stimulus word and the word 'abstract'. The other variable was the corpus distance between each stimulus word and the word 'concrete'. Wingfield and Connell studied these distance measures in various forms, and found that cosine and correlation distance yielded the best results. We used the correlation distances following the advice of @kiela2014a.
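To make the notion of a correlation distance concrete, a minimal sketch follows. The vectors below are random stand-ins for the real co-occurrence vectors, which Wingfield and Connell (2022) derived from the British National Corpus; the function name is ours.

```{r, eval = FALSE}
# Illustrative sketch only: random vectors stand in for the real
# co-occurrence vectors of a stimulus word and of the anchor words
# 'abstract' and 'concrete'.
set.seed(123)
stimulus_vector = rnorm(100)
abstract_vector = rnorm(100)
concrete_vector = rnorm(100)

# Correlation distance: 1 minus the Pearson correlation of two vectors
correlation_distance = function(x, y) 1 - cor(x, y)

abstract_distance = correlation_distance(stimulus_vector, abstract_vector)
concrete_distance = correlation_distance(stimulus_vector, concrete_vector)
```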
The zero-order correlation between Wingfield and Connell's (2022) distance to 'abstract' and distance to 'concrete' was $r$ = `r cor(semanticdecision$Conditional_probability_BNC_r5_correlation_abstract_distance, semanticdecision$Conditional_probability_BNC_r5_correlation_concrete_distance) %>% sprintf('%.2f', .) %>% sub('^(-)?0[.]', '\\1.', .)`. To avoid the collinearity between these variables in the model [@dormannCollinearityReviewMethods2013; @harrison2018a], and to facilitate the analysis of interactions with other variables, we created a difference score by subtracting the distance to 'abstract' from the distance to 'concrete'. This new variable was named 'word co-occurrence'. As shown in Figure \@ref(fig:semanticdecision-cooccurrence-correlations), the correlation between word co-occurrence and word concreteness was twice as large as the correlation between either individual distance and word concreteness. This suggested that the difference score had successfully encapsulated the information of both distances.

```{r semanticdecision-cooccurrence-correlations, fig.cap = "Zero-order correlations among Wingfield and Connell's (2022) distances, the difference score (word co-occurrence) and word concreteness (Brysbaert et al., 2014).", fig.width = 7, fig.height = 2.2, out.width = '63%'}

# Using the following variables...
semanticdecision[, c('word_concreteness', 'word_cooccurrence',
                     'Conditional_probability_BNC_r5_correlation_concrete_distance',
                     'Conditional_probability_BNC_r5_correlation_abstract_distance')] %>%

  # renamed for the sake of clarity
  rename('Word concreteness' = word_concreteness,
         'Word co-occurrence' = word_cooccurrence,
         "Distance to 'concrete'" = Conditional_probability_BNC_r5_correlation_concrete_distance,
         "Distance to 'abstract'" = Conditional_probability_BNC_r5_correlation_abstract_distance) %>%

  # make correlation matrix (custom function from the 'R_functions' folder)
  correlation_matrix() +
  theme(plot.margin = unit(c(0, -0.5, 0.05, -3.78), 'in'))
```

A few details regarding the covariates follow.

- `Information uptake` was included as a measure akin to general cognition, and specifically as a covariate of vocabulary size [@ratcliff2010a; also see @james2018a; @pexman2018a]. Information uptake was effectively the drift rate per participant in @pexman2018a. This drift rate measured participants' ability to perform the semantic decision task correctly and quickly [for graphical illustrations, see @lercheDiffusionModelingIntelligence2020; @vanravenzwaaijOptimalDecisionMaking2012]. In other words, drift rate indexes an individual's ability [@lercheDiffusionModelingIntelligence2020; @pexman2018a].

- Lexical covariates (see [\underline{Appendix A}](#appendix-A-lexical-covariates)): `word frequency` and `orthographic Levenshtein distance` [@balota2007a].

- `Word concreteness` [@brysbaert2014a]: a fundamental variable in the semantic decision task [for further considerations, see @bottiniConcretenessAdvantageLexical2021]. Indeed, owing to the instructions of the task, word concreteness is likely to be more relevant to the participants' task than our effects of interest.

Figure \@ref(fig:semanticdecision-correlations) shows the correlations among the predictors and the dependent variable.
```{r semanticdecision-correlations, fig.cap = 'Zero-order correlations in the semantic decision study.', fig.width = 7, fig.height = 4.5, out.width = '61%'}

# Using the following variables...
semanticdecision[, c('z_RTclean', 'z_vocabulary_size', 'z_information_uptake',
                     'z_word_cooccurrence', 'z_visual_rating',
                     'z_word_concreteness', 'z_word_frequency',
                     'z_orthographic_Levenshtein_distance')] %>%

  # renamed for the sake of clarity
  rename('RT' = z_RTclean,
         'Vocabulary size' = z_vocabulary_size,
         'Information uptake' = z_information_uptake,
         'Word co-occurrence' = z_word_cooccurrence,
         'Visual strength' = z_visual_rating,
         'Word concreteness' = z_word_concreteness,
         'Word frequency' = z_word_frequency,
         'Orthographic Levenshtein distance' = z_orthographic_Levenshtein_distance) %>%

  # make correlation matrix (custom function from the 'R_functions' folder)
  correlation_matrix() +
  theme(plot.margin = unit(c(0, 0, 0.1, -3.1), 'in'))
```

### Diagnostics for the frequentist analysis

The model presented convergence warnings. To avoid removing important random slopes, which could increase the Type I error rate [i.e., false positives; @brauer2018a; @singmann2019a], we examined the model after refitting it using seven optimisation algorithms through the 'allFit' function of the 'lme4' package [@batesPackageLme42021]. The results showed that all optimisers produced virtually identical means for all effects, suggesting that the convergence warnings were not consequential (Bates et al., 2021; see [\underline{Appendix B}](#appendix-B-frequentist-analysis-diagnostics)).

```{r}
# Calculate VIF for every predictor and return only the maximum VIF rounded up
maxVIF_semanticdecision = car::vif(semanticdecision_lmerTest) %>% max %>% ceiling
```

The residual errors were not normally distributed, and attempts to mitigate this deviation proved unsuccessful (see [\underline{Appendix B}](#appendix-B-frequentist-analysis-diagnostics)). However, this is not likely to have posed a major problem, as mixed-effects models are fairly robust to deviations from normality [@kniefViolatingNormalityAssumption2021; @schielzethRobustnessLinearMixed2020]. Last, the model did not present multicollinearity problems, with all VIFs below `r maxVIF_semanticdecision` [see @dormannCollinearityReviewMethods2013; @harrison2018a].

### Diagnostics for the Bayesian analysis

```{r}
# Calculate number of post-warmup draws (as in 'brms' version 2.17.0).
# Informative prior model used but numbers are identical in the three models.
semanticdecision_post_warmup_draws =
  (semanticdecision_summary_informativepriors_exgaussian$iter -
     semanticdecision_summary_informativepriors_exgaussian$warmup) *
  semanticdecision_summary_informativepriors_exgaussian$chains

# As a convergence diagnostic, find the maximum R-hat value for the
# fixed effects across the three models.
semanticdecision_fixedeffects_max_Rhat =
  max(semanticdecision_summary_informativepriors_exgaussian$fixed$Rhat,
      semanticdecision_summary_weaklyinformativepriors_exgaussian$fixed$Rhat,
      semanticdecision_summary_diffusepriors_exgaussian$fixed$Rhat) %>%
  # Round
  sprintf('%.2f', .)
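# R-hat values close to 1.00 indicate that the MCMC chains converged,
# whereas values above 1.01 are considered problematic (Vehtari et al., 2021).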
# Next, find the maximum R-hat value for the random effects across the three models
semanticdecision_randomeffects_max_Rhat =
  max(semanticdecision_summary_informativepriors_exgaussian$random[['Participant']]$Rhat,
      semanticdecision_summary_weaklyinformativepriors_exgaussian$random[['Participant']]$Rhat,
      semanticdecision_summary_diffusepriors_exgaussian$random[['Participant']]$Rhat,
      semanticdecision_summary_informativepriors_exgaussian$random[['Word']]$Rhat,
      semanticdecision_summary_weaklyinformativepriors_exgaussian$random[['Word']]$Rhat,
      semanticdecision_summary_diffusepriors_exgaussian$random[['Word']]$Rhat) %>%
  # Round
  sprintf('%.2f', .)
```

Three Bayesian models were run, respectively characterised by informative, weakly-informative and diffuse priors. In each model, `r semanticdecision_summary_informativepriors_exgaussian$chains` chains were used. In each chain, `r semanticdecision_summary_informativepriors_exgaussian$warmup %>% formattable::comma(digits = 0)` warmup iterations were run, followed by `r (semanticdecision_summary_informativepriors_exgaussian$iter - semanticdecision_summary_informativepriors_exgaussian$warmup) %>% formattable::comma(digits = 0)` post-warmup iterations. Thus, a total of `r semanticdecision_post_warmup_draws %>% formattable::comma(digits = 0)` post-warmup draws were produced over all the chains. The maximum $\widehat R$ value for the fixed effects across the three models was `r semanticdecision_fixedeffects_max_Rhat`, which far exceeded the 1.01 threshold [@vehtariRanknormalizationFoldingLocalization2021; also see @schootBayesianStatisticsModelling2021], indicating a lack of convergence. Similarly, the maximum $\widehat R$ value for the random effects was `r semanticdecision_randomeffects_max_Rhat`. Furthermore, the posterior predictive checks revealed major divergences between the observed data and the posterior distributions (see [\underline{Appendix C}](#appendix-C-Bayesian-analysis-diagnostics)). In conclusion, since the Bayesian results were not valid, they are not shown in the main text, but are available in [\underline{Appendix E}](#appendix-E-Bayesian-analysis-results).

## Results of Study 2.2

```{r}
# Calculate R^2. This coefficient must be interpreted with caution
# (Nakagawa et al., 2017; https://doi.org/10.1098/rsif.2017.0213).
# Also, transform coefficient to rounded percentage.

Nakagawa2017_fixedeffects_R2_semanticdecision_lmerTest =
  paste0(
    (MuMIn::r.squaredGLMM(semanticdecision_lmerTest)[1, 'R2m'][[1]] * 100) %>%
      sprintf('%.2f', .),
    '%'
  )

Nakagawa2017_randomeffects_R2_semanticdecision_lmerTest =
  paste0(
    (MuMIn::r.squaredGLMM(semanticdecision_lmerTest)[1, 'R2c'][[1]] * 100) %>%
      sprintf('%.2f', .),
    '%'
  )
```

Table \@ref(tab:semanticdecision-frequentist-model) presents the results. The fixed effects explained `r Nakagawa2017_fixedeffects_R2_semanticdecision_lmerTest` of the variance, and the fixed and random effects jointly explained `r Nakagawa2017_randomeffects_R2_semanticdecision_lmerTest` (Nakagawa et al., 2017; for an explanation of this difference, see [\underline{Results of Study 2.1}](#semanticpriming-results)). Both word co-occurrence and visual strength produced significant main effects. Higher values of these variables facilitated participants' performance, as reflected in shorter RTs. Furthermore, visual strength interacted with vocabulary size. There were no effects of participants' gender (see interaction figures below). The effect sizes of word co-occurrence and its interactions were larger than those of visual strength.
Figure \@ref(fig:semanticdecision-confidence-intervals-plot) displays these estimates.^[Only frequentist estimates shown, as Bayesian ones were not valid (see [\underline{Appendix E}](#appendix-E-Bayesian-analysis-results)).] ```{r semanticdecision-frequentist-model, results = 'asis'} # Rename effects in plain language and specify the random slopes # (if any) for each effect, in the footnote. For this purpose, # superscripts are added to the names of the appropriate effects. # # In the interactions below, word-level variables are presented # first for the sake of consistency (the order does not affect # the results in any way). Also in the interactions, double # colons are used to inform the 'frequentist_model_table' # function that the two terms in the interaction must be split # into two lines. rownames(KR_summary_semanticdecision_lmerTest$coefficients)[ rownames(KR_summary_semanticdecision_lmerTest$coefficients) == 'z_information_uptake'] = 'Information uptake' rownames(KR_summary_semanticdecision_lmerTest$coefficients)[ rownames(KR_summary_semanticdecision_lmerTest$coefficients) == 'z_vocabulary_size'] = 'Vocabulary size $^{\\text{a}}$' rownames(KR_summary_semanticdecision_lmerTest$coefficients)[ rownames(KR_summary_semanticdecision_lmerTest$coefficients) == 'z_recoded_participant_gender'] = 'Gender $^{\\text{a}}$' rownames(KR_summary_semanticdecision_lmerTest$coefficients)[ rownames(KR_summary_semanticdecision_lmerTest$coefficients) == 'z_word_frequency'] = 'Word frequency' rownames(KR_summary_semanticdecision_lmerTest$coefficients)[ rownames(KR_summary_semanticdecision_lmerTest$coefficients) == 'z_orthographic_Levenshtein_distance'] = 'Orthographic Levenshtein distance' rownames(KR_summary_semanticdecision_lmerTest$coefficients)[ rownames(KR_summary_semanticdecision_lmerTest$coefficients) == 'z_word_concreteness'] = 'Word concreteness' rownames(KR_summary_semanticdecision_lmerTest$coefficients)[ rownames(KR_summary_semanticdecision_lmerTest$coefficients) == 'z_word_cooccurrence'] = "Word co-occurrence $^{\\text{b}}$" rownames(KR_summary_semanticdecision_lmerTest$coefficients)[ rownames(KR_summary_semanticdecision_lmerTest$coefficients) == 'z_visual_rating'] = 'Visual strength $^{\\text{b}}$' rownames(KR_summary_semanticdecision_lmerTest$coefficients)[ rownames(KR_summary_semanticdecision_lmerTest$coefficients) == 'z_word_concreteness:z_vocabulary_size'] = 'Word concreteness : Vocabulary size' rownames(KR_summary_semanticdecision_lmerTest$coefficients)[ rownames(KR_summary_semanticdecision_lmerTest$coefficients) == 'z_word_concreteness:z_recoded_participant_gender'] = 'Word concreteness : Gender' rownames(KR_summary_semanticdecision_lmerTest$coefficients)[ rownames(KR_summary_semanticdecision_lmerTest$coefficients) == 'z_information_uptake:z_word_cooccurrence'] = "Word co-occurrence : Information uptake" rownames(KR_summary_semanticdecision_lmerTest$coefficients)[ rownames(KR_summary_semanticdecision_lmerTest$coefficients) == 'z_information_uptake:z_visual_rating'] = 'Visual strength : Information uptake' rownames(KR_summary_semanticdecision_lmerTest$coefficients)[ rownames(KR_summary_semanticdecision_lmerTest$coefficients) == 'z_vocabulary_size:z_word_cooccurrence'] = "Word co-occurrence : Vocabulary size" rownames(KR_summary_semanticdecision_lmerTest$coefficients)[ rownames(KR_summary_semanticdecision_lmerTest$coefficients) == 'z_vocabulary_size:z_visual_rating'] = 'Visual strength : Vocabulary size' rownames(KR_summary_semanticdecision_lmerTest$coefficients)[ 
rownames(KR_summary_semanticdecision_lmerTest$coefficients) == 'z_recoded_participant_gender:z_word_cooccurrence'] = "Word co-occurrence : Gender" rownames(KR_summary_semanticdecision_lmerTest$coefficients)[ rownames(KR_summary_semanticdecision_lmerTest$coefficients) == 'z_recoded_participant_gender:z_visual_rating'] = 'Visual strength : Gender' # Next, change the names in the confidence intervals object rownames(confint_semanticdecision_lmerTest)[ rownames(confint_semanticdecision_lmerTest) == 'z_information_uptake'] = 'Information uptake' rownames(confint_semanticdecision_lmerTest)[ rownames(confint_semanticdecision_lmerTest) == 'z_vocabulary_size'] = 'Vocabulary size $^{\\text{a}}$' rownames(confint_semanticdecision_lmerTest)[ rownames(confint_semanticdecision_lmerTest) == 'z_recoded_participant_gender'] = 'Gender $^{\\text{a}}$' rownames(confint_semanticdecision_lmerTest)[ rownames(confint_semanticdecision_lmerTest) == 'z_word_frequency'] = 'Word frequency' rownames(confint_semanticdecision_lmerTest)[ rownames(confint_semanticdecision_lmerTest) == 'z_orthographic_Levenshtein_distance'] = 'Orthographic Levenshtein distance' rownames(confint_semanticdecision_lmerTest)[ rownames(confint_semanticdecision_lmerTest) == 'z_word_concreteness'] = 'Word concreteness' rownames(confint_semanticdecision_lmerTest)[ rownames(confint_semanticdecision_lmerTest) == 'z_word_cooccurrence'] = "Word co-occurrence $^{\\text{b}}$" rownames(confint_semanticdecision_lmerTest)[ rownames(confint_semanticdecision_lmerTest) == 'z_visual_rating'] = 'Visual strength $^{\\text{b}}$' rownames(confint_semanticdecision_lmerTest)[ rownames(confint_semanticdecision_lmerTest) == 'z_word_concreteness:z_vocabulary_size'] = 'Word concreteness : Vocabulary size' rownames(confint_semanticdecision_lmerTest)[ rownames(confint_semanticdecision_lmerTest) == 'z_word_concreteness:z_recoded_participant_gender'] = 'Word concreteness : Gender' rownames(confint_semanticdecision_lmerTest)[ rownames(confint_semanticdecision_lmerTest) == 'z_information_uptake:z_word_cooccurrence'] = "Word co-occurrence : Information uptake" rownames(confint_semanticdecision_lmerTest)[ rownames(confint_semanticdecision_lmerTest) == 'z_information_uptake:z_visual_rating'] = 'Visual strength : Information uptake' rownames(confint_semanticdecision_lmerTest)[ rownames(confint_semanticdecision_lmerTest) == 'z_vocabulary_size:z_word_cooccurrence'] = "Word co-occurrence : Vocabulary size" rownames(confint_semanticdecision_lmerTest)[ rownames(confint_semanticdecision_lmerTest) == 'z_vocabulary_size:z_visual_rating'] = 'Visual strength : Vocabulary size' rownames(confint_semanticdecision_lmerTest)[ rownames(confint_semanticdecision_lmerTest) == 'z_recoded_participant_gender:z_word_cooccurrence'] = "Word co-occurrence : Gender" rownames(confint_semanticdecision_lmerTest)[ rownames(confint_semanticdecision_lmerTest) == 'z_recoded_participant_gender:z_visual_rating'] = 'Visual strength : Gender' # Create table (using custom function from the 'R_functions' folder) frequentist_model_table( KR_summary_semanticdecision_lmerTest, confint_semanticdecision_lmerTest, order_effects = c('(Intercept)', 'Information uptake', 'Vocabulary size $^{\\text{a}}$', 'Gender $^{\\text{a}}$', 'Word frequency', 'Orthographic Levenshtein distance', 'Word concreteness', "Word co-occurrence $^{\\text{b}}$", 'Visual strength $^{\\text{b}}$', 'Word concreteness : Vocabulary size', 'Word concreteness : Gender', "Word co-occurrence : Information uptake", 'Visual strength : Information uptake', "Word 
co-occurrence : Vocabulary size", 'Visual strength : Vocabulary size', "Word co-occurrence : Gender", 'Visual strength : Gender'),
  interaction_symbol_x = TRUE,
  caption = 'Frequentist model for the semantic decision study.') %>%

  # kable_styling(latex_options = 'scale_down') %>%

  # Group predictors under headings
  pack_rows('Individual differences', 2, 4) %>%
  pack_rows('Lexicosemantic covariates', 5, 7) %>%
  pack_rows('Semantic variables', 8, 9) %>%
  pack_rows('Interactions', 10, 17) %>%

  # Place table close to designated position and highlight covariates
  kable_styling(latex_options = c('hold_position', 'striped'),
                stripe_index = c(2, 5:7, 10:13)) %>%

  # Footnote describing abbreviations, random slopes, etc.
  # LaTeX code used to format the text.
  footnote(escape = FALSE, threeparttable = TRUE,
           general_title = '\\\\linebreak',
           general = paste('\\\\textit{Note}. $\\\\upbeta$ = Estimate based on $z$-scored predictors; \\\\textit{SE} = standard error;',
                           'CI = confidence interval. Shaded rows contain covariates. \\\\linebreak',
                           '$^{\\\\text{a}}$ By-word random slopes were included for this effect.',
                           '$^{\\\\text{b}}$ By-participant random slopes were included for this effect.',
                           # After first line in the footnote, begin next lines with a dot-sized indent to correct default error.
                           sep = ' \\\\linebreak \\\\phantom{.}'))
```

\FloatBarrier

```{r semanticdecision-confidence-intervals-plot, fig.cap = 'Means and 95\\% confidence intervals for the effects of interest in the semantic decision study.'}

# Run plot through source() rather than directly in this R Markdown document
# to preserve the format.
source('semanticdecision/frequentist_analysis/semanticdecision_confidence_intervals_plot.R',
       local = TRUE)

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/semanticdecision/frequentist_analysis/plots/semanticdecision_confidence_intervals_plot.pdf'
  ))
```

Figure \@ref(fig:semanticdecision-interactions-with-vocabulary-size)-a shows the non-significant interaction between word co-occurrence and vocabulary size, whereby lower-vocabulary participants tended to be more sensitive to word co-occurrence than higher-vocabulary participants. Next, Figure \@ref(fig:semanticdecision-interactions-with-vocabulary-size)-b shows the significant interaction between visual strength and vocabulary size, demonstrating that lower-vocabulary participants were also more sensitive to visual strength. Last, Figure \@ref(fig:semanticdecision-interactions-with-vocabulary-size)-c shows the significant interaction between word concreteness and vocabulary size, whereby higher-vocabulary participants were more sensitive to word concreteness than lower-vocabulary participants. Word concreteness is likely the most relevant variable for the semantic decision task, in which participants classify words as abstract or concrete. In conclusion, these interactions suggest that higher-vocabulary participants were better able to focus on the most relevant information, whereas lower-vocabulary participants were sensitive to a greater breadth of information [see @lim2020a; @pexman2018a; @yap2012a; @yap2017a; @yapIndividualDifferencesJoint2009].

(ref:semanticdecision-interactions-with-vocabulary-size) Interactions of vocabulary size with word co-occurrence (panel a), with visual strength (panel b) and with word concreteness (panel c) in the semantic decision study. Vocabulary size is constrained to deciles in this plot, whereas in the statistical analysis it was a continuous variable spanning the same range.
$n$ = number of participants contained between deciles.

```{r semanticdecision-interactions-with-vocabulary-size, fig.cap = '(ref:semanticdecision-interactions-with-vocabulary-size)', out.width = '98%'}

# Run plot through source() rather than directly in this R Markdown document
# to preserve the italicised text.
source('semanticdecision/frequentist_analysis/semanticdecision-interactions-with-vocabulary-size.R',
       local = TRUE)

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/semanticdecision/frequentist_analysis/plots/semanticdecision-interactions-with-vocabulary-size.pdf'
  ))
```

A continuous measure of word concreteness was used in the present study. In contrast, @pexman2018a split the data set into a subset of abstract words and a subset of concrete words, and analysed these subsets separately. Pexman and Yap found that high-vocabulary participants were more sensitive to the relative abstractness of words. Specifically, these participants were faster to classify very abstract words than mid-abstract ones, thus presenting a reverse concreteness effect [also see @bonnerReversalConcretenessEffect2009]. Such a reverse effect might stem from the bimodal distributions that have appeared in concreteness ratings [@brysbaert2014a] and in semantic decisions [@pexman2018a], or it might be due to confounding variables [@hoffmanReverseConcretenessEffects2011]. Notwithstanding the bimodal distributions, @trocheDefiningConceptualTopography2017 suggested that a continuous analysis remained necessary to study word concreteness [also see @cohenCostDichotomization1983]. Consistent with this, our present findings demonstrated that a continuous word concreteness variable is sensitive to patterns such as the greater role of task-relevant variables in high-vocabulary participants. In conclusion, the literature and our findings suggest that the split-data approach and the continuous approach to word concreteness are both useful. Where feasible, applying both approaches would be the most informative option.
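To illustrate the cost of dichotomisation [@cohenCostDichotomization1983], the simulation below analyses one and the same synthetic effect with a continuous predictor and with a median-split version of it. The simulation is purely illustrative, and all names and values are hypothetical.

```{r, eval = FALSE}
# Hypothetical simulation: the same effect analysed with a continuous
# predictor and with a median-split version of it.
set.seed(123)
concreteness = rnorm(1000)
RT = 600 - 20 * concreteness + rnorm(1000, sd = 100)

# Continuous analysis
summary(lm(RT ~ concreteness))

# Median-split analysis: the test statistic shrinks, illustrating the
# statistical power lost by dichotomising the predictor.
split_concreteness = ifelse(concreteness > median(concreteness),
                            'concrete', 'abstract')
summary(lm(RT ~ split_concreteness))
```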
Figure \@ref(fig:semanticdecision-interactions-with-gender) shows the interactions with gender. The interactions of interest, in panels a and b, were non-significant.^[Further interaction plots available in [\underline{Appendix D}](#appendix-D-interaction-plots).]

```{r semanticdecision-interactions-with-gender, fig.cap = 'Interactions of gender with word co-occurrence (panel a), with visual strength (panel b) and with word concreteness (panel c) in the semantic decision study. Gender was analysed using $z$-scores, but for clarity, the basic labels are used in the legend.', out.width = '98%'}

# Run plot through source() rather than directly in this R Markdown document
# to preserve the italicised text.
source('semanticdecision/frequentist_analysis/semanticdecision-interactions-with-gender.R',
       local = TRUE)

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/semanticdecision/frequentist_analysis/plots/semanticdecision-interactions-with-gender.pdf'
  ))
```

### Statistical power analysis

Figures \@ref(fig:semanticdecision-powercurve-plots-1-2-3) and \@ref(fig:semanticdecision-powercurve-plots-4-5-6-7) show the estimated power for some main effects and interactions of interest as a function of the number of participants. To plan the sample size for future studies, these results must be considered under the assumptions that the future study would apply a statistical method similar to ours---namely, a mixed-effects model with random intercepts and slopes---, and that the analysis would encompass at least as many words as the current study, namely, `r length(unique(semanticdecision$Word)) %>% formattable::comma(digits = 0)` (distributed in various blocks across participants, not all being presented to every participant). Furthermore, it is necessary to consider each figure in detail. Here, we provide a summary. First, detecting the main effect of word co-occurrence would require 300 participants. Second, detecting the main effect of visual strength would require 1,200 participants. Third, detecting the interactions of word co-occurrence and visual strength with vocabulary size would require 1,500 participants. Last, detecting the other effects would require more than 2,000 participants---or, in the case of gender differences, many more than that.

```{r semanticdecision-powercurve-plots-1-2-3, fig.cap = 'Power curves for some main effects in the semantic decision study.'}

# Run plot through source() rather than directly in this R Markdown document
# to preserve the italicised text.
source('semanticdecision/power_analysis/semanticdecision_all_powercurves.R', local = TRUE)

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/semanticdecision/power_analysis/plots/semanticdecision_powercurve_plots_1_2_3.pdf'
  ))
```

```{r semanticdecision-powercurve-plots-4-5-6-7, fig.cap = 'Power curves for some interactions in the semantic decision study.'}

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/semanticdecision/power_analysis/plots/semanticdecision_powercurve_plots_4_5_6_7.pdf'
  ))
```

## Discussion of Study 2.2

The results revealed a significant, facilitatory effect of `word co-occurrence` and a smaller but significant, facilitatory effect of `visual strength`. That is, higher values of these variables resulted in shorter RTs. Furthermore, there were significant interactions with vocabulary size. First, lower-vocabulary participants were more sensitive to visual strength than higher-vocabulary participants. Second, higher-vocabulary participants were more sensitive to word concreteness, which is likely the most task-relevant variable in semantic decision. As in Study 2.1, vision-based information had a significant effect. This was to be expected, as semantic decision is likely to engage deeper semantic processing. Last, no effect of gender was found. Below, we delve into some other aspects of these results.

### Statistical power analysis

We analysed the statistical power associated with several effects of interest, across various sample sizes. The results of this power analysis can help determine the number of participants required to reliably examine each of these effects in a future study. Importantly, the results assume two conditions. First, the future study would apply a statistical method similar to ours---namely, a mixed-effects model with random intercepts and slopes. Second, the analysis of the future study would encompass at least `r length(unique(semanticdecision$Word)) %>% formattable::comma(digits = 0)` stimulus words (distributed in various blocks across participants, not all being presented to every participant). First, the results revealed that detecting the main effect of word co-occurrence would require 300 participants. Second, detecting the main effect of visual strength would require 1,200 participants.
Third, detecting the interactions of word co-occurrence and visual strength with vocabulary size would require 1,500 participants. Last, detecting the other effects would require more than 2,000 participants---or, in the case of gender differences, many more than that.

\clearpage

# Study 2.3: Lexical decision {#lexicaldecision}

The core data set in this study was the lexical decision subset of the English Lexicon Project [ELP; @balota2007a]. As in Study 2.1, we limited our analysis to the lexical decision task because it was more relevant to a subsequent study that we were planning. The lexical decision task differs from semantic priming and semantic decision in two important aspects. First, lexical decision is likely to involve less semantic processing than the other paradigms [@balotaDepthAutomaticSpreading1986; @beckerLongtermSemanticPriming1997; @dewitMaskedSemanticPriming2015; @joordensLongShortSemantic1997; @murakiSimulatingSemanticsAre2021; @ostarekTaskdependentCausalRole2017]. Second, it is more difficult in the lexical decision task to create word-to-word distance measures that capture language-based and vision-based information. The possibility of calculating the distance between words in consecutive trials is hindered by the need to skip trials, owing to the high prevalence of nonword trials throughout the lexical decision task. Therefore, the measures must be based on each word alone. Accordingly, vision-based information can be operationalised as the visual strength of each word. Language-based information could be operationalised as one of several lexical variables. In the present study, word frequency was chosen as it had the largest effect size out of five candidates---the other candidates being number of letters, number of syllables, orthographic Levenshtein distance and phonological Levenshtein distance (see [\underline{Appendix A}](#appendix-A-lexical-covariates)). It should also be noted that word frequency has been found to be more closely related to semantic variables than to lexical ones, such as word length, orthography and phonology [see Table 4 in @yap2012a]. Another noteworthy feature of word frequency is how it relates to vocabulary size across different paradigms. In lexical decision, the effect of word frequency has been stronger in higher-vocabulary participants than in lower-vocabulary ones [@lim2020a; @yap2012a]. In contrast, the opposite pattern has emerged in deeper semantic tasks, such as semantic priming [@yap2017a] and semantic decision [@pexman2018a].

## Methods

### Data set

```{r}

# Calculate some of the sample sizes to be reported in the paragraph below

# Number of words per participant.
# Save mean as integer and SD rounded while keeping trailing zeros
lexicaldecision_mean_words_per_participant = lexicaldecision %>%
  group_by(Participant) %>%
  summarise(length(unique(word))) %>%
  select(2) %>% unlist %>% mean %>% round(0)

lexicaldecision_SD_words_per_participant = lexicaldecision %>%
  group_by(Participant) %>%
  summarise(length(unique(word))) %>%
  select(2) %>% unlist %>% sd %>% sprintf('%.2f', .)

# Number of participants per word.
# Save mean as integer and SD rounded while keeping trailing zeros
lexicaldecision_mean_participants_per_word = lexicaldecision %>%
  group_by(word) %>%
  summarise(length(unique(Participant))) %>%
  select(2) %>% unlist %>% mean %>% round(0)

lexicaldecision_SD_participants_per_word = lexicaldecision %>%
  group_by(word) %>%
  summarise(length(unique(Participant))) %>%
  select(2) %>% unlist %>% sd %>% sprintf('%.2f', .)
```

The data set was trimmed by removing rows that lacked values on any variable, and by removing RTs that were more than 3 standard deviations away from the mean. The standard deviation trimming was performed within participants, as done in the English Lexicon Project [@balota2007a]. The resulting data set contained `r length(unique(lexicaldecision$Participant))` participants, `r length(unique(lexicaldecision$word)) %>% formattable::comma(digits = 0)` words and `r length(lexicaldecision$z_RT) %>% formattable::comma(digits = 0)` RTs. On average, there were `r lexicaldecision_mean_words_per_participant` words per participant ($SD$ = `r lexicaldecision_SD_words_per_participant`), and conversely, `r lexicaldecision_mean_participants_per_word` participants per word ($SD$ = `r lexicaldecision_SD_participants_per_word`). Figure \@ref(fig:lexicaldecision-correlations) shows the correlations among the predictors and the dependent variable.

```{r lexicaldecision-correlations, fig.cap = 'Zero-order correlations in the lexical decision study.', fig.width = 6, fig.height = 3.5, out.width = '52%'}

# Using the following variables...
lexicaldecision[, c('z_RT', 'z_vocabulary_age', 'z_word_frequency',
                    'z_visual_rating', 'z_word_concreteness',
                    'z_orthographic_Levenshtein_distance')] %>%

  # renamed for the sake of clarity
  rename('RT' = z_RT,
         'Vocabulary age' = z_vocabulary_age,
         'Word frequency' = z_word_frequency,
         'Visual strength' = z_visual_rating,
         'Word concreteness' = z_word_concreteness,
         'Orthographic Levenshtein distance' = z_orthographic_Levenshtein_distance) %>%

  # make correlation matrix (custom function from the 'R_functions' folder)
  correlation_matrix() +
  theme(plot.margin = unit(c(0, 0, 0.1, -2), 'in'))
```

### Variables

While the variables are outlined in the [\underline{general introduction}](#present-studies), a few further details are provided below regarding some of them.

- `Vocabulary age`: the present study uses the name vocabulary *age*, following @balota2007a. The variable measures the same linguistic experience as vocabulary size.

A few details regarding the covariates follow.

- General cognition covariate: unlike in the two previous studies, the present study did not include a general cognition covariate, as such a variable was not available in the data set of @balota2007a.

- Lexical covariates (see preselection in [\underline{Appendix A}](#appendix-A-lexical-covariates)): `orthographic Levenshtein distance` [@balota2007a].

- `Word concreteness` [@brysbaert2014a], used as a covariate of visual strength.

### Diagnostics for the frequentist analysis

The model presented convergence warnings. To avoid removing important random slopes, which could increase the Type I error rate---i.e., false positives [@brauer2018a; @singmann2019a]---, we examined the model after refitting it using seven optimisation algorithms through the 'allFit' function of the 'lme4' package [@batesPackageLme42021]. The results showed that all optimisers produced virtually identical means for all effects, suggesting that the convergence warnings were not consequential (Bates et al., 2021; see [\underline{Appendix B}](#appendix-B-frequentist-analysis-diagnostics)).
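A minimal sketch of this check follows. It assumes the fitted model object `lexicaldecision_lmerTest` and is shown for illustration only, hence it is not evaluated here.

```{r, eval = FALSE}
# Refit the model with all available optimisers to check whether the
# convergence warnings are consequential (Bates et al., 2021)
all_fits = lme4::allFit(lexicaldecision_lmerTest)

# Compare the fixed effects across optimisers; virtually identical
# estimates suggest that the warnings can be disregarded
summary(all_fits)$fixef
```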
```{r}

# Calculate VIF for every predictor and return only the maximum VIF rounded up
maxVIF_lexicaldecision = car::vif(lexicaldecision_lmerTest) %>% max %>% ceiling
```

The residual errors were not normally distributed, and attempts to mitigate this deviation proved unsuccessful (see [\underline{Appendix B}](#appendix-B-frequentist-analysis-diagnostics)). However, this is not likely to have posed a major problem, as mixed-effects models are fairly robust to deviations from normality [@kniefViolatingNormalityAssumption2021; @schielzethRobustnessLinearMixed2020]. Last, the model did not present multicollinearity problems, with all VIFs below `r maxVIF_lexicaldecision` [see @dormannCollinearityReviewMethods2013; @harrison2018a].

### Diagnostics for the Bayesian analysis

```{r}

# Calculate number of post-warmup draws (as in 'brms' version 2.17.0).
# Informative prior model used but numbers are identical in the three models.
lexicaldecision_post_warmup_draws =
  (lexicaldecision_summary_informativepriors_exgaussian$iter -
     lexicaldecision_summary_informativepriors_exgaussian$warmup) *
  lexicaldecision_summary_informativepriors_exgaussian$chains

# As a convergence diagnostic, find maximum R-hat value for the
# fixed effects across the three models.
lexicaldecision_fixedeffects_max_Rhat =
  max(lexicaldecision_summary_informativepriors_exgaussian$fixed$Rhat,
      lexicaldecision_summary_weaklyinformativepriors_exgaussian$fixed$Rhat,
      lexicaldecision_summary_diffusepriors_exgaussian$fixed$Rhat) %>%
  # Round
  sprintf('%.2f', .)

# Next, find maximum R-hat value for the random effects across the three models
lexicaldecision_randomeffects_max_Rhat =
  max(lexicaldecision_summary_informativepriors_exgaussian$random[['Participant']]$Rhat,
      lexicaldecision_summary_weaklyinformativepriors_exgaussian$random[['Participant']]$Rhat,
      lexicaldecision_summary_diffusepriors_exgaussian$random[['Participant']]$Rhat,
      lexicaldecision_summary_informativepriors_exgaussian$random[['word']]$Rhat,
      lexicaldecision_summary_weaklyinformativepriors_exgaussian$random[['word']]$Rhat,
      lexicaldecision_summary_diffusepriors_exgaussian$random[['word']]$Rhat) %>%
  # Round
  sprintf('%.2f', .)
```

Three Bayesian models were run that were respectively characterised by informative, weakly-informative and diffuse priors. In each model, `r lexicaldecision_summary_informativepriors_exgaussian$chains` chains were used. In each chain, `r lexicaldecision_summary_informativepriors_exgaussian$warmup %>% formattable::comma(digits = 0)` warmup iterations were run, followed by `r (lexicaldecision_summary_informativepriors_exgaussian$iter - lexicaldecision_summary_informativepriors_exgaussian$warmup) %>% formattable::comma(digits = 0)` post-warmup iterations. Thus, a total of `r lexicaldecision_post_warmup_draws %>% formattable::comma(digits = 0)` post-warmup draws were produced over all the chains. The maximum $\widehat R$ value for the fixed effects across the three models was `r lexicaldecision_fixedeffects_max_Rhat`, suggesting that these effects had converged [@schootBayesianStatisticsModelling2021; @vehtariRanknormalizationFoldingLocalization2021]. For the random effects, the maximum $\widehat R$ value was `r lexicaldecision_randomeffects_max_Rhat`, barely exceeding the 1.01 threshold [@vehtariRanknormalizationFoldingLocalization2021]. The results of the posterior predictive checks were sound (see [\underline{Appendix C}](#appendix-C-Bayesian-analysis-diagnostics)), indicating that the posterior distributions were sufficiently consistent with the observed data. Furthermore, in the prior sensitivity analysis, the results were virtually identical with the three priors that were considered (refer to the priors in Figure \@ref(fig:bayesian-priors) above; to view the results in detail, see [\underline{Appendix E}](#appendix-E-Bayesian-analysis-results)).
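For illustration, the sketch below shows the general shape of an ex-Gaussian model specification in 'brms'. The formula, the prior and the iteration settings are simplified placeholders rather than the exact specification used in the present study.

```{r, eval = FALSE}
library(brms)

# Simplified ex-Gaussian mixed-effects model. 'RT' denotes raw response
# times; the prior scale is merely illustrative, assuming RTs in
# milliseconds. Column names are assumptions.
lexicaldecision_exgaussian_sketch = brm(
  RT ~ z_word_frequency + z_visual_rating +
    (1 | Participant) + (1 | word),
  data = lexicaldecision, family = exgaussian(),
  prior = set_prior('normal(0, 100)', class = 'b'),
  chains = 4, warmup = 1000, iter = 2000
)

# The summary includes the R-hat convergence diagnostics
summary(lexicaldecision_exgaussian_sketch)
```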
## Results of Study 2.3

```{r}

# Calculate R^2. This coefficient must be interpreted with caution
# (Nakagawa et al., 2017; https://doi.org/10.1098/rsif.2017.0213).
# Also, transform coefficient to rounded percentage.

Nakagawa2017_fixedeffects_R2_lexicaldecision_lmerTest = paste0(
  (MuMIn::r.squaredGLMM(lexicaldecision_lmerTest)[1, 'R2m'][[1]] * 100) %>%
    sprintf('%.2f', .), '%'
)

Nakagawa2017_randomeffects_R2_lexicaldecision_lmerTest = paste0(
  (MuMIn::r.squaredGLMM(lexicaldecision_lmerTest)[1, 'R2c'][[1]] * 100) %>%
    sprintf('%.2f', .), '%'
)
```

Table \@ref(tab:lexicaldecision-frequentist-model) presents the results. The fixed effects explained `r Nakagawa2017_fixedeffects_R2_lexicaldecision_lmerTest` of the variance, and the random effects, combined with the fixed ones, explained `r Nakagawa2017_randomeffects_R2_lexicaldecision_lmerTest` (Nakagawa et al., 2017; for an explanation of this difference, see [\underline{Results of Study 2.1}](#semanticpriming-results)). Word frequency produced a significant main effect, with higher values of this variable facilitating participants' performance, as reflected in shorter RTs. None of the other effects of interest were significant. The effect size of word frequency was far larger than that of visual strength. Figure \@ref(fig:lexicaldecision-frequentist-bayesian-plot-weaklyinformativepriors-exgaussian) displays the frequentist and the Bayesian estimates, which are broadly similar. The Bayesian estimates are from the weakly-informative prior model. The estimates of the two other models, based on informative and diffuse priors, were virtually identical to these (see [\underline{Appendix E}](#appendix-E-Bayesian-analysis-results)).

\FloatBarrier

```{r lexicaldecision-frequentist-model, results = 'asis'}

# Rename effects in plain language and specify the random slopes
# (if any) for each effect, in the footnote. For this purpose,
# superscripts are added to the names of the appropriate effects.
#
# In the interactions below, word-level variables are presented
# first for the sake of consistency (the order does not affect
# the results in any way). Also in the interactions, double
# colons are used to inform the 'frequentist_model_table'
# function that the two terms in the interaction must be split
# into two lines.
rownames(KR_summary_lexicaldecision_lmerTest$coefficients)[ rownames(KR_summary_lexicaldecision_lmerTest$coefficients) == 'z_vocabulary_age'] = 'Vocabulary age $^{\\text{a}}$' rownames(KR_summary_lexicaldecision_lmerTest$coefficients)[ rownames(KR_summary_lexicaldecision_lmerTest$coefficients) == 'z_recoded_participant_gender'] = 'Gender $^{\\text{a}}$' rownames(KR_summary_lexicaldecision_lmerTest$coefficients)[ rownames(KR_summary_lexicaldecision_lmerTest$coefficients) == 'z_orthographic_Levenshtein_distance'] = 'Orthographic Levenshtein distance' rownames(KR_summary_lexicaldecision_lmerTest$coefficients)[ rownames(KR_summary_lexicaldecision_lmerTest$coefficients) == 'z_word_concreteness'] = 'Word concreteness' rownames(KR_summary_lexicaldecision_lmerTest$coefficients)[ rownames(KR_summary_lexicaldecision_lmerTest$coefficients) == 'z_word_frequency'] = 'Word frequency $^{\\text{b}}$' rownames(KR_summary_lexicaldecision_lmerTest$coefficients)[ rownames(KR_summary_lexicaldecision_lmerTest$coefficients) == 'z_visual_rating'] = 'Visual strength $^{\\text{b}}$' rownames(KR_summary_lexicaldecision_lmerTest$coefficients)[ rownames(KR_summary_lexicaldecision_lmerTest$coefficients) == 'z_word_concreteness:z_vocabulary_age'] = 'Word concreteness : Vocabulary age' rownames(KR_summary_lexicaldecision_lmerTest$coefficients)[ rownames(KR_summary_lexicaldecision_lmerTest$coefficients) == 'z_word_concreteness:z_recoded_participant_gender'] = 'Word concreteness : Gender' rownames(KR_summary_lexicaldecision_lmerTest$coefficients)[ rownames(KR_summary_lexicaldecision_lmerTest$coefficients) == 'z_vocabulary_age:z_word_frequency'] = 'Word frequency : Vocabulary age' rownames(KR_summary_lexicaldecision_lmerTest$coefficients)[ rownames(KR_summary_lexicaldecision_lmerTest$coefficients) == 'z_vocabulary_age:z_visual_rating'] = 'Visual strength : Vocabulary age' rownames(KR_summary_lexicaldecision_lmerTest$coefficients)[ rownames(KR_summary_lexicaldecision_lmerTest$coefficients) == 'z_recoded_participant_gender:z_word_frequency'] = 'Word frequency : Gender' rownames(KR_summary_lexicaldecision_lmerTest$coefficients)[ rownames(KR_summary_lexicaldecision_lmerTest$coefficients) == 'z_recoded_participant_gender:z_visual_rating'] = 'Visual strength : Gender' # Next, change the names in the confidence intervals object rownames(confint_lexicaldecision_lmerTest)[ rownames(confint_lexicaldecision_lmerTest) == 'z_vocabulary_age'] = 'Vocabulary age $^{\\text{a}}$' rownames(confint_lexicaldecision_lmerTest)[ rownames(confint_lexicaldecision_lmerTest) == 'z_recoded_participant_gender'] = 'Gender $^{\\text{a}}$' rownames(confint_lexicaldecision_lmerTest)[ rownames(confint_lexicaldecision_lmerTest) == 'z_orthographic_Levenshtein_distance'] = 'Orthographic Levenshtein distance' rownames(confint_lexicaldecision_lmerTest)[ rownames(confint_lexicaldecision_lmerTest) == 'z_word_concreteness'] = 'Word concreteness' rownames(confint_lexicaldecision_lmerTest)[ rownames(confint_lexicaldecision_lmerTest) == 'z_word_frequency'] = 'Word frequency $^{\\text{b}}$' rownames(confint_lexicaldecision_lmerTest)[ rownames(confint_lexicaldecision_lmerTest) == 'z_visual_rating'] = 'Visual strength $^{\\text{b}}$' rownames(confint_lexicaldecision_lmerTest)[ rownames(confint_lexicaldecision_lmerTest) == 'z_word_concreteness:z_vocabulary_age'] = 'Word concreteness : Vocabulary age' rownames(confint_lexicaldecision_lmerTest)[ rownames(confint_lexicaldecision_lmerTest) == 'z_word_concreteness:z_recoded_participant_gender'] = 'Word concreteness : Gender' 
rownames(confint_lexicaldecision_lmerTest)[ rownames(confint_lexicaldecision_lmerTest) == 'z_vocabulary_age:z_word_frequency'] = 'Word frequency : Vocabulary age' rownames(confint_lexicaldecision_lmerTest)[ rownames(confint_lexicaldecision_lmerTest) == 'z_vocabulary_age:z_visual_rating'] = 'Visual strength : Vocabulary age' rownames(confint_lexicaldecision_lmerTest)[ rownames(confint_lexicaldecision_lmerTest) == 'z_recoded_participant_gender:z_word_frequency'] = 'Word frequency : Gender' rownames(confint_lexicaldecision_lmerTest)[ rownames(confint_lexicaldecision_lmerTest) == 'z_recoded_participant_gender:z_visual_rating'] = 'Visual strength : Gender' # Create table (using custom function from the 'R_functions' folder) frequentist_model_table( KR_summary_lexicaldecision_lmerTest, confint_lexicaldecision_lmerTest, order_effects = c('(Intercept)', 'Vocabulary age $^{\\text{a}}$', 'Gender $^{\\text{a}}$', 'Orthographic Levenshtein distance', 'Word concreteness', 'Word frequency $^{\\text{b}}$', 'Visual strength $^{\\text{b}}$', 'Word concreteness : Vocabulary age', 'Word concreteness : Gender', 'Word frequency : Vocabulary age', 'Visual strength : Vocabulary age', 'Word frequency : Gender', 'Visual strength : Gender'), interaction_symbol_x = TRUE, caption = 'Frequentist model for the lexical decision study.') %>% # kable_styling(latex_options = 'scale_down') %>% # Group predictors under headings pack_rows('Individual differences', 2, 3) %>% pack_rows('Lexicosemantic covariates', 4, 5) %>% pack_rows('Semantic variables', 6, 7) %>% pack_rows('Interactions', 8, 13) %>% # Place table close to designated position and highlight covariates kable_styling(latex_options = c('hold_position', 'striped'), stripe_index = c(4:5, 8:9)) %>% # Footnote describing abbreviations, random slopes, etc. # LaTeX code used to format the text. footnote(escape = FALSE, threeparttable = TRUE, general_title = '\\\\linebreak', general = paste('\\\\textit{Note}. $\\\\upbeta$ = Estimate based on $z$-scored predictors; \\\\textit{SE} = standard error;', 'CI = confidence interval. Shaded rows contain covariates. \\\\linebreak', '$^{\\\\text{a}}$ By-word random slopes were included for this effect.', '$^{\\\\text{b}}$ By-participant random slopes were included for this effect.', # After first line in the footnote, begin next lines with a dot-sized indent to correct default error. sep = ' \\\\linebreak \\\\phantom{.}')) ``` ```{r lexicaldecision-frequentist-bayesian-plot-weaklyinformativepriors-exgaussian, fig.cap = 'Estimates for the lexical decision study. The frequentist means (represented by red points) are flanked by 95\\% confidence intervals. The Bayesian means (represented by blue vertical lines) are flanked by 95\\% credible intervals in light blue.'} # Run plot through source() rather than directly in this R Markdown document # to preserve the format. source('lexicaldecision/frequentist_bayesian_plots/lexicaldecision_frequentist_bayesian_plots.R', local = TRUE) include_graphics( paste0( getwd(), # Circumvent illegal characters in file path '/lexicaldecision/frequentist_bayesian_plots/plots/lexicaldecision_frequentist_bayesian_plot_weaklyinformativepriors_exgaussian.pdf' )) ``` Figure \@ref(fig:lexicaldecision-interactions-with-vocabulary-age) presents the interactions of vocabulary age with word frequency and with visual strength, both non-significant. 
Figure \@ref(fig:lexicaldecision-interactions-with-gender) shows the interactions with gender, which were also non-significant.^[Further interaction plots available in [\underline{Appendix D}](#appendix-D-interaction-plots).]

(ref:lexicaldecision-interactions-with-vocabulary-age) Interactions of vocabulary age with word frequency (panel a) and with visual strength (panel b) in the lexical decision study. Vocabulary age is constrained to sextiles (6 sections) in this plot, whereas in the statistical analysis it contained more values within the current range. Sextiles were used because there was not enough data for deciles or octiles. $n$ = number of participants contained between sextiles.

```{r lexicaldecision-interactions-with-vocabulary-age, fig.cap = '(ref:lexicaldecision-interactions-with-vocabulary-age)', out.width = '80%'}

# Run plot through source() rather than directly in this R Markdown document
# to preserve the italicised text.
source('lexicaldecision/frequentist_analysis/lexicaldecision-interactions-with-vocabulary-age.R',
       local = TRUE)

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/lexicaldecision/frequentist_analysis/plots/lexicaldecision-interactions-with-vocabulary-age.pdf'
  ))
```

```{r lexicaldecision-interactions-with-gender, fig.cap = 'Interactions of gender with word frequency (panel a) and with visual strength (panel b) in the lexical decision study. Gender was analysed using $z$-scores, but for clarity, the basic labels are used in the legend.', out.width = '80%'}

# Run plot through source() rather than directly in this R Markdown document
# to preserve the italicised text.
source('lexicaldecision/frequentist_analysis/lexicaldecision-interactions-with-gender.R',
       local = TRUE)

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/lexicaldecision/frequentist_analysis/plots/lexicaldecision-interactions-with-gender.pdf'
  ))
```

### Statistical power analysis

Figures \@ref(fig:lexicaldecision-powercurve-plots-1-2-3) and \@ref(fig:lexicaldecision-powercurve-plots-4-5-6-7) show the estimated power for some main effects and interactions of interest as a function of the number of participants. To plan the sample size for future studies, these results must be considered under the assumptions that the future study would apply a statistical method similar to ours---namely, a mixed-effects model with random intercepts and slopes---, and that the analysis would encompass at least as many words as the current study, namely, `r length(unique(lexicaldecision$word)) %>% formattable::comma(digits = 0)` (distributed in various blocks across participants, not all being presented to every participant). Furthermore, it is necessary to consider each figure in detail. Here, we provide a summary. First, detecting the main effect of word frequency would require 100 participants. Second, detecting the interactions of word frequency and visual strength with vocabulary age would require 1,500 participants. Third, detecting the other effects would require more than 2,000 participants.

```{r lexicaldecision-powercurve-plots-1-2-3, fig.cap = 'Power curves for some main effects in the lexical decision study.'}

# Run plot through source() rather than directly in this R Markdown document
# to preserve the italicised text.
source('lexicaldecision/power_analysis/lexicaldecision_all_powercurves.R', local = TRUE)

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/lexicaldecision/power_analysis/plots/lexicaldecision_powercurve_plots_1_2_3.pdf'
  ))
```

```{r lexicaldecision-powercurve-plots-4-5-6-7, fig.cap = 'Power curves for some interactions in the lexical decision study.'}

include_graphics(
  paste0(
    getwd(),  # Circumvent illegal characters in file path
    '/lexicaldecision/power_analysis/plots/lexicaldecision_powercurve_plots_4_5_6_7.pdf'
  ))
```

## Discussion of Study 2.3

In the present study, we have delved into a task that is likely to elicit a shallower level of semantic processing than the tasks from the previous studies. Furthermore, the data set used in this study was considerably smaller (`r length(lexicaldecision$z_RT) %>% formattable::comma(digits = 0)` RTs, compared to `r length(semanticpriming$z_target.RT) %>% formattable::comma(digits = 0)` RTs in Study 2.1 and `r length(semanticdecision$z_RTclean) %>% formattable::comma(digits = 0)` in Study 2.2). The relatively small size of the data set of Study 2.3 was due to the small number of words per participant ($M$ = `r lexicaldecision_mean_words_per_participant`) and participants per word ($M$ = `r lexicaldecision_mean_participants_per_word`). In this regard, the English Lexicon Project [@balota2007a] prioritised the total number of words included in their archive. While word frequency and the covariates presented large effects, none of the other effects of interest turned out to be significant or noteworthy. Furthermore, the comparison with the two previous tasks is hindered by the major difference in the size of the data sets. Therefore, while it is reasonable to find smaller semantic effects in the lexical decision task than in the other two, we cannot reliably attribute this difference to the nature of the task.

As a minor suggestion, future studies could operationalise language using a measure of orthographic neighbourhood size (e.g., orthographic Levenshtein distance), instead of using word frequency as in the present study. While we used word frequency guided by a data-driven selection (see [\underline{Appendix A}](#appendix-A-lexical-covariates)), neighbourhood size is a measure created for the purpose of indexing word co-occurrence where only one word is directly available to the researcher [@suarezObservingNeighborhoodEffects2011; @yapVisualWordRecognition2009].

### Statistical power analysis

We analysed the statistical power associated with several effects of interest, across various sample sizes. The results of this power analysis can help determine the number of participants required to reliably examine each of these effects in a future study. Importantly, the results assume two conditions. First, the future study would apply a statistical method similar to ours---namely, a mixed-effects model with random intercepts and slopes. Second, the analysis of the future study would encompass at least `r length(unique(lexicaldecision$word)) %>% formattable::comma(digits = 0)` stimulus words (distributed in various blocks across participants, not all being presented to every participant). The results revealed that detecting the main effect of word frequency would require 100 participants. Next, detecting the interactions of word frequency and visual strength with vocabulary age would require 1,500 participants. In contrast, detecting the other effects would require more than 2,000 participants.

# General discussion of Study 2

In the present study, we have revisited three existing data sets in conceptual processing to investigate the interplay between language-based and vision-based information.
Specifically, we have investigated how this interplay is modulated by individual differences in vocabulary size, by the linguistic and visual information contained in words, and by contextual demands such as semantic depth and presentation speed. Although both language and vision played significant roles in some contexts (detailed below), the main effects and the interactions of language-based information were larger than those of vision-based information, consistent with previous research [@banksLinguisticDistributionalKnowledge2021; @kiela2014a; @louwerse2015a; @lam2015a; @pecherDoesPizzaPrime1998; @petilli2021a].

In our current approach, the sensorimotor domain was represented by a single variable in each study, just as the language domain was represented by a single variable. In the sensorimotor domain, we focussed on vision owing to its hegemonic role in the human brain [@reillyEnglishLexiconMirrors2020] as well as in several languages [@bernabeuDutchModalityExclusivity2018; @chenMandarinChineseModality2019; @miceliPerceptualInteroceptiveStrength2021; @morucciAugmentedModalityExclusivity2019; @roqueVisionVerbsDominate2015; @speedDutchSensoryModality2021; @speedGroundingLanguageNeglected2020; @lynott2020a; @vergallitoPerceptualModalityNorms2020; @winterVisionDominatesPerceptual2018; @zhongSensorimotorNormsChinese2022]. Notably, vision was also the domain chosen in a recent study that strongly influenced the present study [@petilli2021a], as well as in previous studies [@bottiniConcretenessAdvantageLexical2021; @dedeyneVisualAffectiveMultimodal2021; @pearsonHeterogeneityMentalRepresentation2015; @yeeColorlessGreenIdeas2012]. In contrast to this parsimonious approach, more comprehensive alternatives could be used in future research to consider more sensorimotor domains. The first of these approaches is the preselection approach, which incorporates a step prior to the main analysis. In this prior step, a selection is performed among a large variety of word-level information, including visual, auditory and motor information [@bernabeu2021a]. Selecting a single variable provides a convenient way to compare the role of sensorimotor information to that of linguistic information, if the latter is also represented by a single variable. The second approach is using a variable that aggregates sensorimotor information [@wingfieldSensorimotorDistanceGrounded2022]. Last, the third approach would be using more than one variable to represent sensorimotor information in the main analysis. This would complicate the analysis of interactions with other variables, as the overall number of terms in the model could quickly exceed the maximum normally encountered in mixed-effects models---that is, around 15. If random slopes are included for all those effects of interest [see @brauer2018a; @singmann2019a], the model would most likely present convergence warnings (for a schematic illustration, see the sketch below). In the face of this challenge, authors could either probe into those warnings (see [\underline{Appendix B}](#appendix-B-frequentist-analysis-diagnostics)), or could opt for a different method, such as linear regression or machine learning. Ultimately, in any selection of variables, there is a trade-off between parsimony and comprehensiveness, and negotiating this trade-off often involves a certain degree of arbitrariness. A time-consuming, stepwise selection can help reduce this arbitrariness (for an example, see [\underline{Appendix A}](#appendix-A-lexical-covariates)).
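As a schematic illustration of the convergence challenge mentioned above, the hypothetical model below includes several sensorimotor variables alongside a language-based variable, their interactions with vocabulary size, and the corresponding random slopes. All variable and data set names are illustrative, not those of the present studies.

```{r, eval = FALSE}
# Hypothetical maximal model containing several sensorimotor predictors.
# All variable and data set names are illustrative.
lmerTest::lmer(
  z_RT ~ (z_word_cooccurrence + z_visual_strength + z_auditory_strength +
            z_motor_strength) * z_vocabulary_size +

    # By-participant random slopes for the word-level effects of interest
    (1 + z_word_cooccurrence + z_visual_strength + z_auditory_strength +
       z_motor_strength | Participant) +

    # By-word random slopes for the participant-level effect of interest
    (1 + z_vocabulary_size | Word),

  data = hypothetical_data_set
)
```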
Insofar as both 'language' and 'vision' were present in the models, it is (arguably) valid to draw conclusions based on them [see @louwerseSymbolInterdependencySymbolic2011; @louwerseTasteWordsLinguistic2011; @santosPropertyGenerationReflects2011; @simmonsFMRIEvidenceWord2008]. In contrast, when only one of these variables is analysed, it may contain information from the other variable. If the superiority of language is genuine---rather than due to a spurious reflection of sensorimotor information---, the present results suggest that language is the main source of information in conceptual processing, whereas sensorimotor information provides extra help, especially for higher-vocabulary individuals (see Study 2.2, Semantic decision) and in deeper semantic tasks (refer to task-relevance advantage above). Ultimately, should sensorimotor simulation be considered smaller but nonetheless important---especially for some individuals and in some contexts---, or should it be considered a negligible by-product of conceptual processing [@mahonCriticalLookEmbodied2008]? Although the jury is still out, the present results provide support for the tenet that sensorimotor simulation is smaller yet important, especially for some individuals and in some contexts, whereas language is important across the board.

Furthermore, it is necessary to acknowledge a longstanding caveat in the present topic, which also affects the present study. That is, it is extremely difficult to ascertain whether our variables encode what we intend for them to encode. Specifically, it is possible that the variables for language-based information encode some sensorimotor information, and vice versa. To address this caveat, future research could combine the use of continuous word-level variables---as used in the present study---with the use of brain-level measurements [see @borghesaniWordMeaningVentral2016]. Specifically, such an investigation should examine whether language-based information is primarily circumscribed to the brain regions in charge of semantic retrieval---such as the posterior left inferior frontal gyrus, the right posterior inferior frontal gyrus, the left anterior superior temporal gyrus and sulcus, and the left middle and posterior middle temporal gyrus [@hagoortCoreLanguagereadyBrain2017; @skeideOntogenyCorticalLanguage2016]. Conversely, this investigation should also examine whether vision-based information is primarily circumscribed to the brain regions in charge of visual semantic information---such as Brodmann area 17, in the occipital lobe, corresponding to primary visual cortex [@borghesaniWordMeaningVentral2016]. Due to the importance of the time course, a method that provides both spatial and temporal resolution, such as magnetoencephalography, would be ideally suited for this research. If both sources of information are largely circumscribed to their regions of interest in the brain, we could conclude that the variables are valid. In contrast, if there are *drifts* in the processing---whereby language-based information is consistently associated with activation in primary visual cortex, or whereby vision-based information is associated with activation in the language regions of interest---, we would need to question the validity of the variables. As an alternative to the above design, a thriftier method would be to use two clusters of covariates.
One of these clusters would be primarily associated with language-based information, whereas the other cluster would be primarily associated with vision-based information.^[Thank you to Prof. Max Louwerse for suggesting this idea.] This research should examine whether the variables in each cluster all behave similarly, or whether---instead---there are any drifts between language and vision. As in the above design, the absence of drifts would validate the operationalisation of the two sides in the dichotomy, whereas the presence of drifts would question the validity.

The present analysis controlled for important sources of variance in the fixed effects and in the random effects. First, in the fixed effects, covariates such as word concreteness and individual differences in general cognition were included in the models. It was important to include these covariates as they were substantially correlated with some of our variables of interest, and research has suggested that these covariates may represent fundamentally different processes from those of our variables of interest. For instance, word concreteness and visual strength were highly correlated. However, whereas visual strength indexes a perceptual component of semantic information, word concreteness might be circumscribed to the lexical level, which does not require the processing of meaning [@bottiniConcretenessAdvantageLexical2021; cf. @connellStrengthPerceptualExperience2012; @pexman2018a]. Similarly, it was important to control for individual differences in general cognition measures as covariates of vocabulary size [@ratcliff2010a; also see @james2018a; @pexman2018a]. We contend that controlling (or, in other words, statistically adjusting) for important covariates is a valuable asset of our present research. Furthermore, we think that the number of covariates we selected was sufficient but not excessive. We did not find any signs of overfitting in the models, as the variables that have been consistently influential in the literature were also influential in our current models. To further delve into the role of covariates in conceptual processing, we think that it would be valuable to investigate how the presence and the absence of several covariates in a model can affect the effect sizes and the significance results.^[Thank you to Prof. Max Louwerse for this idea.] Indeed, the differences between the results of Study 2.1 (semantic priming) and the results of @petilli2021a suggested that the influence of covariates can be very important. However, because these analyses differed in other aspects of the models, a study focussed on covariates would be insightful [see @botvinik-nezerVariabilityAnalysisSingle2020; @perretWhichVariablesShould2019; @wagenmakersOneStatisticalAnalysis2022]. Second, in the random effects, the models contained a maximal structure that accounted for far more variance than the fixed effects, thus providing for a conservative analysis. Indeed, the maximal random-effects structure served to prevent a violation of the independence of observations [@barrRandomEffectsStructure2013; @brauer2018a; @singmann2019a]. Specifically, random intercepts and slopes ensured that sources of dependence such as participants and stimuli were kept outside of the fixed effects, which are the relevant effects for the conclusions of this (and most other) research in conceptual processing.

The RTs of higher-vocabulary participants were influenced by a smaller number of variables than those of lower-vocabulary participants.
This converges with previous findings suggesting that higher- and lower-vocabulary participants are affected by different variables. In this regard, some research has suggested that the variables affecting higher-vocabulary participants most are especially relevant to the task [@lim2020a; @pexman2018a; @yap2012a; @yap2017a]. Our results were consistent with the 'task-relevance advantage' associated with greater vocabulary knowledge. Specifically, in lexical decision, higher-vocabulary participants were more sensitive than lower-vocabulary participants to language-based information. In contrast, in semantic decision, higher-vocabulary participants were more sensitive to word concreteness. In summary, the present findings suggest that greater linguistic experience may be associated with greater task adaptability during cognitive performance, with better comprehenders being more able than poorer comprehenders to selectively attend to task-relevant features [@lim2020a; @pexman2018a].

In addition, the semantic priming paradigm analysed in Study 2.1 revealed that both language and vision were more important with the short SOA (200 ms) than with the long SOA (1,200 ms). This finding replicates some of the previous literature [@petilli2021a] while highlighting the importance of the time course and the level of semantic processing. That is, although the finding seems to be at odds with the theory that perceptual simulation peaks after language-based associations [@barsalouLanguageSimulationConceptual2008; @louwerseTasteWordsLinguistic2011], the long SOA may have been too long for perceptual simulation to be maintained in the lexical decision task that was performed by participants, which is semantically shallow [@petilli2021a].

## Operationalisation of variables and other analytical choices

We compared two measures of vision-based priming. The first measure---`visual-strength difference`---was operationalised as the difference in visual strength [@lynott2020a] between the prime word and the target word in each trial. The second measure---`vision-based similarity`---, created by @petilli2021a, was based on vector representations trained on images.
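As a minimal sketch of the first operationalisation, the code below attaches the visual strength of the prime and the target to each trial and computes their difference. The trial data frame and all column names are hypothetical assumptions, not the exact objects used in Study 2.1.

```{r, eval = FALSE}
library(dplyr)

# Hypothetical sketch: 'priming_trials' holds one row per trial, with
# 'prime' and 'target' columns; 'norms' holds one row per word, with
# 'Word' and 'Visual.mean' columns (all names are assumptions).
priming_trials %>%
  left_join(norms %>% select(prime = Word, prime_visual = Visual.mean),
            by = 'prime') %>%
  left_join(norms %>% select(target = Word, target_visual = Visual.mean),
            by = 'target') %>%
  # The signed difference is illustrative; an absolute difference
  # could be used instead.
  mutate(visual_strength_difference = target_visual - prime_visual)
```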
The results revealed that both measures---including their interactions with other variables---produced similar effect sizes. This underscores the consistency that exists between human ratings and computational approximations to meaning [e.g., @charbonnierPredictingWordConcreteness2019; @charbonnierPredictingConcretenessGerman2020; @guenther2016a; @louwerse2015a; @mandera2017a; @petilli2021a; @solovyevConcretenessAbstractnessConcept2021; @wingfieldUnderstandingRoleLinguistic2022]. However, the effect of the human-based variable was slightly larger, which is consistent with previous comparisons of human-based and computational measures [@de2016a; @de2019a; @gagneProcessingEnglishCompounds2016; @schmidtke2018a; cf. @michaelovClozeFarN4002022; @snefjella2020a].

In contrast to the results of @petilli2021a, vision-based similarity did not significantly interact with SOA. Furthermore, in contrast to the main analysis, this sub-analysis did not present a significant interaction between language-based similarity and SOA. These two differences demonstrate how the results of our analyses can be critically influenced by analytical choices such as the operationalisation of variables and the degree of complexity of statistical models.

In this regard, we must draw attention to an often-overlooked difference between the variables used to operationalise the language system---usually, text-based measures derived from large corpora---and the variables used to operationalise the embodiment system---usually, human-based measures derived from ratings. Critically, the literature contains many comparisons of text-based variables [@jones2006a; @lund1996a; @mandera2017a; @mikolovEfficientEstimationWord2013; @dedeyneBetterExplanationsLexical2013; @de2016a; @guenther2016a; @guntherLatentSemanticAnalysis2016; @wingfieldUnderstandingRoleLinguistic2022], whereas the work on embodiment variables is sparser and tends to compare different *modalities*---e.g., valence, visual strength, auditory strength, etc. [@lynott2020a; @lynottModalityExclusivityNorms2009; @newcombeEffectsEmotionalSensorimotor2012; for an exception, see @vergallitoPerceptualModalityNorms2020]. This accident of history might in part account for the superiority of linguistic information over embodied information [see @banksLinguisticDistributionalKnowledge2021; @kiela2014a; @louwerse2015a; @lam2015a; @pecherDoesPizzaPrime1998; @petilli2021a]. Therefore, it may be important to consider whether *engineering work* should be devoted to the betterment of embodiment variables. As a more general conclusion, the present results suggest that research findings are fundamentally dependent on research methods.

## Statistical power

Power analyses were performed to estimate the sample sizes required to reliably investigate a range of effects. The results suggested that 300 participants were sufficient to examine the effect of language-based information contained in words, whereas more than 1,000 participants were necessary for the effect of vision-based information and for the interactions of both former variables with vocabulary size, gender and presentation speed. The large sample sizes required to investigate some of these effects, particularly the interactions, which are relevant to embodied cognition and individual differences, are not easily attainable with the usual organisation of funding in Psychology and Neuroscience.
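Given these requirements, researchers planning such studies may find it useful to simulate power curves in advance. The sketch below illustrates one way to do so with the 'simr' package; it is an illustration under assumed settings (the model, the effect tested, the sample sizes and the number of simulations are placeholders), not necessarily the procedure implemented in our analyses.

```{r, eval = FALSE}
library(simr)

# Extend a fitted mixed-effects model to larger hypothetical samples of
# participants, and estimate power at several sample sizes.
extended_model = extend(lexicaldecision_lmerTest, along = 'Participant',
                        n = 2000)

power_curve = powerCurve(extended_model,
                         test = fixed('z_word_frequency', 'kr'),
                         along = 'Participant',
                         breaks = c(100, 500, 1000, 2000),
                         nsim = 200)

plot(power_curve)
```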