---
name: questionnaire-design-guide
description: "Questionnaire and survey design with Likert scales and coding"
metadata:
  openclaw:
    emoji: "📋"
    category: "analysis"
    subcategory: "wrangling"
    keywords: ["questionnaire design", "survey design", "Likert scale", "data transformation"]
    source: "wentor-research-plugins"
---

# Questionnaire Design Guide

Design valid and reliable survey instruments with proper question types, Likert scale construction, response coding, and data preparation for analysis.

## Survey Design Principles

### Question Types

| Type | Example | Best For | Analysis |
|------|---------|----------|----------|
| **Likert scale** | "Rate your agreement: 1-5" | Attitudes, perceptions | Ordinal/interval statistics |
| **Multiple choice** | "Select your field" | Demographics, categories | Frequencies, chi-square |
| **Ranking** | "Rank these 5 options" | Preferences, priorities | Rank correlations |
| **Open-ended** | "Describe your experience" | Exploratory, rich data | Qualitative coding |
| **Matrix/grid** | Multiple items, same scale | Efficient battery of items | Factor analysis, reliability |
| **Slider/VAS** | 0-100 visual analog scale | Continuous measures | Parametric statistics |
| **Semantic differential** | "Easy __ __ __ __ __ Difficult" | Bipolar attitudes | Factor analysis |

### The Four C's of Good Questions

1. **Clear**: Avoid jargon, double-barreled questions, and ambiguity
2. **Concise**: Keep questions short (ideally under 20 words)
3. **Complete**: Include all relevant response options
4. **Consistent**: Use the same scale direction and format throughout
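
Some of these checks can be roughed out in code before expert review. The sketch below is a first-pass wording check and assumes a hypothetical helper `lint_question` with hand-picked word lists. The heuristics are illustrative only and will produce false positives (for example, "and" inside a legitimate single idea), so treat the output as prompts for human review rather than verdicts.

```python
import re

# Hypothetical heuristics for a first-pass wording check; not a validated tool.
ABSOLUTE_TERMS = {"always", "never", "all", "none", "every"}
LEADING_OPENERS = ("don't you", "wouldn't you", "isn't it")
MAX_WORDS = 20  # the "under 20 words" guideline

def lint_question(text):
    """Return heuristic warnings for a single question stem."""
    lowered = text.lower().strip()
    words = re.findall(r"[a-z']+", lowered)
    warnings = []
    if len(words) >= MAX_WORDS:
        warnings.append("too long")
    if "and" in words or "or" in words:
        warnings.append("possibly double-barreled")
    if ABSOLUTE_TERMS.intersection(words):
        warnings.append("absolute term")
    if lowered.startswith(LEADING_OPENERS):
        warnings.append("leading")
    return warnings

# Example stems taken from the Common Design Mistakes table in this guide
for question in [
    "This course is interesting and useful",
    "Do you always check citations?",
    "Don't you agree that X is important?",
    "How important is X to you?",
]:
    print(question, "->", lint_question(question))
```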

## Likert Scale Design

### Scale Points

| Points | Scale Example | Recommended Use |
|--------|---------------|-----------------|
| 4-point | Strongly Disagree to Strongly Agree | Forces choice (no neutral), less discriminating |
| 5-point | SD, D, Neutral, A, SA | Most common, good balance of simplicity and discrimination |
| 7-point | SD, D, Somewhat D, Neutral, Somewhat A, A, SA | More discriminating, better for experienced respondents |
| 11-point (0-10) | Not at all to Completely | NPS, continuous-like measures |

### Anchoring Labels

```
5-Point Agreement Scale:
1 = Strongly Disagree
2 = Disagree
3 = Neither Agree nor Disagree
4 = Agree
5 = Strongly Agree

5-Point Frequency Scale:
1 = Never
2 = Rarely
3 = Sometimes
4 = Often
5 = Always

5-Point Satisfaction Scale:
1 = Very Dissatisfied
2 = Dissatisfied
3 = Neutral
4 = Satisfied
5 = Very Satisfied
```

### Reverse-Coded Items

Include 2-3 reverse-coded items per construct to detect acquiescence bias:

```
Regular:  "I find research methods interesting."        (1-5: SD to SA)
Reversed: "I find research methods tedious and dull."   (1-5: SD to SA)

# Recode reversed items before analysis:
# reversed_score = (max_scale + 1) - raw_score
# For a 5-point scale: reversed_score = 6 - raw_score
```
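
One way to use such a pair for detection is to flag respondents who agree with both the regular and the reversed statement. The sketch below uses made-up responses, hypothetical column names, and an assumed agreement cutoff of 4 on a 5-point scale. It is a rough screen, not a formal acquiescence index.

```python
import pandas as pd

# Hypothetical raw responses (1-5) to the regular/reversed item pair above
responses = pd.DataFrame({
    "interesting": [5, 4, 2, 5],
    "tedious_R":   [1, 4, 4, 5],  # raw scores, not yet recoded
})

AGREE = 4      # assumed cutoff: 4 or 5 counts as agreement
MAX_SCALE = 5

# Agreeing with a statement and with its opposite suggests acquiescence
responses["acquiescent"] = (
    (responses["interesting"] >= AGREE) & (responses["tedious_R"] >= AGREE)
)

# Recode the reversed item only after the raw-score check
responses["tedious_recoded"] = (MAX_SCALE + 1) - responses["tedious_R"]

print(responses)
```

The check runs on raw scores because recoding first would hide the pattern it is looking for.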

## Constructing a Multi-Item Scale

### Step-by-Step Process

1. **Define the construct**: Write a clear conceptual definition
2. **Generate items**: Write 1.5-2x the number of items you plan to keep (e.g., write 15 items for an 8-item scale)
3. **Expert review**: Have 3-5 experts rate each item for relevance (Content Validity Index)
4. **Pilot test**: Administer to 30-50 respondents
5. **Item analysis**: Calculate item-total correlations, check reliability
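6. **Exploratory Factor Analysis (EFA)**: Examine dimensionality (EFA explores the factor structure; confirming it requires a confirmatory factor analysis on a separate sample)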
7. **Finalize scale**: Remove weak items, re-test reliability

### Example: Research Self-Efficacy Scale

```
Construct: Belief in one's ability to conduct academic research

Items (5-point Likert, Strongly Disagree to Strongly Agree):
RSE1: I can formulate clear research questions.
RSE2: I can design an appropriate research methodology.
RSE3: I can analyze data using statistical software.
RSE4: I can write a publishable research paper.
RSE5: I can critically evaluate published research.
RSE6: I can present research findings at a conference.
RSE7R: I struggle to interpret statistical results. [REVERSED]
RSE8R: I find it difficult to synthesize literature. [REVERSED]
```

## Data Coding and Preparation

### Coding Scheme

```python
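import pandas as pd

# Assumes `df` already holds the survey responses: text labels in "Q1_raw"
# and numeric 1-5 codes in the RSE item columns (e.g. loaded with pd.read_csv).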

# Define coding scheme
likert_coding = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neither Agree nor Disagree": 3,
    "Agree": 4,
    "Strongly Agree": 5
}

# Apply coding (labels missing from the mapping become NaN)
df["Q1_coded"] = df["Q1_raw"].map(likert_coding)

# Reverse code specific items
reverse_items = ["RSE7R", "RSE8R"]
max_scale = 5
for item in reverse_items:
    df[f"{item}_recoded"] = (max_scale + 1) - df[item]

# Calculate composite score (mean of items)
scale_items = ["RSE1", "RSE2", "RSE3", "RSE4", "RSE5", "RSE6",
               "RSE7R_recoded", "RSE8R_recoded"]
df["RSE_mean"] = df[scale_items].mean(axis=1)
```
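
Because `Series.map` returns `NaN` for any label that is not in the dictionary, stray whitespace or inconsistent capitalization in exported data can silently turn valid answers into missing values. The sketch below normalizes labels before mapping and lists anything still unmapped. The sample responses and the variable names `lookup`, `cleaned`, and `unmapped` are illustrative assumptions.

```python
import pandas as pd

likert_coding = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neither Agree nor Disagree": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

# Hypothetical raw export with messy labels and a genuine missing value
raw = pd.Series(["Agree", " strongly agree ", "Disagree", "N/A", None], name="Q1_raw")

# Normalize both sides: trim whitespace and compare in lower case
lookup = {label.lower(): code for label, code in likert_coding.items()}
cleaned = raw.str.strip().str.lower()
coded = cleaned.map(lookup)

# Responses that were present but did not match any label
unmapped = raw[coded.isna() & raw.notna()]

print(coded)
print("Unmapped labels:", unmapped.tolist())
```

Reviewing `unmapped` before analysis separates true nonresponse from coding failures.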

### Missing Data Handling

```python
# Check missing data patterns
print(df[scale_items].isnull().sum())
print(f"Complete cases: {df[scale_items].dropna().shape[0]} / {df.shape[0]}")

# Common strategies (alternatives: pick one, do not chain them):
# 1. Listwise deletion (if < 5% missing)
df_complete = df.dropna(subset=scale_items)

# 2. Mean imputation per item (simple but biased: shrinks variance)
df_item_mean = df.copy()
df_item_mean[scale_items] = df[scale_items].fillna(df[scale_items].mean())

# 3. Person-mean imputation (if < 20% of items missing per person;
#    for this 8-item scale that means at most 1 missing item)
def person_mean_impute(row, items, max_missing=1):
    values = row[items]
    if values.isnull().sum() <= max_missing:
        return values.fillna(values.mean())
    return values  # leave as NaN if too many missing

df_person_mean = df.copy()
df_person_mean[scale_items] = df[scale_items].apply(
    lambda r: person_mean_impute(r, scale_items), axis=1
)
```

## Reliability Analysis

### Cronbach's Alpha

```python
import pingouin as pg

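# Calculate Cronbach's alpha (returns the estimate and its 95% confidence interval)
alpha, ci = pg.cronbach_alpha(data=df[scale_items])
print(f"Cronbach's alpha: {alpha:.3f} (95% CI {ci[0]:.3f} to {ci[1]:.3f})")
# Interpretation: >= 0.70 acceptable, >= 0.80 good, >= 0.90 excellent
# (very high values, e.g. above 0.95, can indicate redundant items)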
```

```r
library(psych)

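# Cronbach's alpha with item-level diagnostics
# (psych::alpha avoids a name clash with ggplot2::alpha)
alpha_result <- psych::alpha(data[, scale_items])
print(alpha_result)
# Check the "Reliability if an item is dropped" table (alpha_result$alpha.drop)
# to identify weak items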
```
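
To see what these functions compute, alpha can also be derived directly from the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score), where k is the number of items. The sketch below assumes complete-case data and a hypothetical helper named `cronbach_alpha`. The two-item demo data are made up.

```python
import pandas as pd

def cronbach_alpha(items):
    """Cronbach's alpha from item variances and total-score variance."""
    items = items.dropna()                      # complete cases only
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # sample variance per item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical two-item example
demo = pd.DataFrame({"item_a": [1, 2, 3, 4], "item_b": [2, 1, 4, 3]})
print(f"alpha = {cronbach_alpha(demo):.2f}")  # prints "alpha = 0.75"
```

Reverse-coded items must be recoded before this calculation, otherwise their negative covariances pull alpha down.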

### Item-Total Correlations

```r
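# Corrected item-total correlations (should be > 0.30)
print(alpha_result$item.stats[, "r.drop", drop = FALSE])
# Alpha if each item is dropped (stored in alpha.drop, not item.stats)
print(alpha_result$alpha.drop[, "raw_alpha", drop = FALSE])
# r.drop < 0.30: consider removing the item
# raw_alpha (if dropped) exceeds the overall alpha: item is weakening the scale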
```
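
The same diagnostic can be sketched in Python with pandas alone. "Corrected" means each item is correlated with the sum of the other items, so the item does not inflate its own correlation. The helper name `corrected_item_total` and the three-item demo data are assumptions for illustration.

```python
import pandas as pd

def corrected_item_total(items):
    """Correlate each item with the sum of the remaining items."""
    items = items.dropna()
    total = items.sum(axis=1)
    return pd.Series(
        {col: items[col].corr(total - items[col]) for col in items.columns},
        name="r_drop",
    )

# Hypothetical responses; item_c runs against the other two items
demo = pd.DataFrame({
    "item_a": [1, 2, 3, 4, 5],
    "item_b": [2, 2, 3, 4, 5],
    "item_c": [3, 1, 2, 2, 1],
})

r_drop = corrected_item_total(demo)
weak_items = r_drop[r_drop < 0.30].index.tolist()
print(r_drop.round(2))
print("Candidates for removal:", weak_items)
```

A negative value, as for `item_c` here, often means the item is reverse-worded and was not recoded, which is worth checking before dropping it.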

## Validity Assessment

| Evidence Type | Method | Criterion |
|--------------|--------|-----------|
| **Content validity** | Expert panel rating (CVI) | I-CVI >= 0.78, S-CVI/Ave >= 0.90 |
| **Construct validity** | Exploratory Factor Analysis (EFA) | Eigenvalue > 1, loadings > 0.40 |
| **Convergent validity** | Correlation with related construct | r > 0.30 |
| **Discriminant validity** | Correlation with unrelated construct | r < 0.30 |
| **Criterion validity** | Correlation with external criterion | Significant correlation |
| **Test-retest reliability** | ICC or Pearson r over 2-4 weeks | ICC > 0.70 |
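
The content validity criteria above can be computed from an expert rating matrix. The sketch below assumes the conventional setup in which each expert rates each item on a 4-point relevance scale and ratings of 3 or 4 count as "relevant". That convention, the ratings, and the variable names are assumptions, not part of this guide's data.

```python
import pandas as pd

# Hypothetical ratings: rows = items, columns = experts (1-4 relevance scale)
ratings = pd.DataFrame(
    {
        "expert1": [4, 3, 2, 4],
        "expert2": [4, 4, 2, 3],
        "expert3": [3, 4, 3, 4],
        "expert4": [4, 3, 1, 4],
        "expert5": [4, 4, 2, 4],
    },
    index=["RSE1", "RSE2", "RSE3", "RSE4"],
)

relevant = ratings >= 3                 # assumed cutoff: 3 or 4 = relevant
i_cvi = relevant.mean(axis=1)           # I-CVI: share of experts rating the item relevant
s_cvi_ave = i_cvi.mean()                # S-CVI/Ave: mean of the I-CVIs
flagged = i_cvi[i_cvi < 0.78].index.tolist()

print(i_cvi)
print(f"S-CVI/Ave = {s_cvi_ave:.2f}")
print("Items below the I-CVI criterion:", flagged)
```

In this made-up panel, one weak item is enough to pull S-CVI/Ave below 0.90, which illustrates why flagged items are usually revised or removed before the scale-level index is reported.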

## Common Design Mistakes

| Mistake | Example | Fix |
|---------|---------|-----|
| Double-barreled question | "This course is interesting and useful" | Split into two separate items |
| Leading question | "Don't you agree that X is important?" | "How important is X to you?" |
| Absolute terms | "Do you always check citations?" | "How often do you check citations?" |
| Missing option | No "Not Applicable" when needed | Add N/A option or filter logic |
| Inconsistent scale direction | Some items 1=good, others 1=bad | Standardize direction; clearly mark reversed items |
| Too many items | 100-item survey | Aim for 5-8 items per construct, 15-30 min total |
| No pilot test | Skip straight to full deployment | Always pilot with 30-50 respondents |

## Survey Platform Comparison

| Platform | Cost | Features | Best For |
|----------|------|----------|----------|
| Qualtrics | Institutional | Advanced logic, panels, API | Large academic studies |
| SurveyMonkey | Freemium | Easy to use, basic analysis | Quick surveys |
| Google Forms | Free | Simple, integrates with Sheets | Classroom, pilot testing |
| LimeSurvey | Free/self-hosted | Open source, full control | Privacy-sensitive research |
| REDCap | Free (non-profit institutions) | Clinical data capture, supports HIPAA-compliant deployment | Medical/clinical research |
| Prolific | Per-response | Participant recruitment | Online experiments |