---
name: nlm-index
description: Index documentation sites or GitHub repos to NotebookLM
---

# NotebookLM Index

A workflow for scraping documentation sites or repositories and uploading the results to NotebookLM for AI-powered research.

## Use Cases

- Index an entire documentation site (React, Next.js, etc.)
- Index a GitHub repo (README, docs, source files)
- Bulk-add YouTube videos so their transcripts are indexed

## Workflow

### 1. Identify Target

```
User provides:
- Docs URL: "https://react.dev/reference/react"
- GitHub repo: "vercel/ai" or "https://github.com/vercel/ai"
- YouTube playlist/channel
```

### 2. Create or Select Notebook

```
notebook_create({ title: "React Docs" })
# or
notebook_list()  # select existing
```

### 3. Discover URLs

#### Option A: Documentation Site

```
# Use webfetch to get sitemap or crawl links
webfetch({ url: "https://react.dev/sitemap.xml", format: "text" })

# Or scrape navigation links from docs page
webfetch({ url: "https://react.dev/reference/react", format: "markdown" })
# Extract all internal links from the page
```

#### Option B: GitHub Repo

```bash
# Use gh CLI to list files (quote URL to prevent shell glob expansion)
gh api 'repos/vercel/ai/git/trees/main?recursive=1' --jq '.tree[].path'

# Filter for docs/README
# Common patterns: README.md, docs/**, *.md, src/**/*.ts
```
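The filtering step can be sketched in a few lines of Python. This is an illustration under assumptions: `select_doc_paths` and `blob_url` are hypothetical helpers, the sample paths are invented, and the default branch is assumed to be `main` as in the command above.

```python
from fnmatch import fnmatchcase
from urllib.parse import quote

# Patterns based on the list above. fnmatch's "*" also matches "/",
# so "docs/*" covers nested files and "*.md" matches at any depth.
DOC_PATTERNS = ["README.md", "docs/*", "*.md"]


def select_doc_paths(paths: list[str], patterns: list[str] = DOC_PATTERNS) -> list[str]:
    """Keep paths that match at least one pattern, preserving input order."""
    return [p for p in paths if any(fnmatchcase(p, pat) for pat in patterns)]


def blob_url(repo: str, path: str, branch: str = "main") -> str:
    """Build the github.com page URL for a file in the repo."""
    return f"https://github.com/{repo}/blob/{branch}/{quote(path)}"


# Invented sample; real input is the output of the gh command, one path per line.
paths = [
    "README.md",
    "docs/guide/intro.mdx",
    "packages/core/README.md",
    "packages/core/src/index.ts",
    "assets/logo.png",
]

for path in select_doc_paths(paths):
    print(blob_url("vercel/ai", path))
```

To include source files as well, add a pattern such as `src/*.ts` to the list.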

#### Option C: YouTube

```
# Collect video URLs from playlist or channel
# Each video URL can be added directly
```

### 4. Filter & Prioritize

**Keep:**
- Documentation pages (guides, API refs, tutorials)
- README files
- Source code with good comments
- YouTube videos with transcripts

**Skip:**
- Asset files (.png, .css, .js bundles)
- Generated/minified code
- node_modules, dist, build
- Paid/private content

**Limits:**
- Max 50 sources per notebook (NotebookLM limit)
- If >50, split into multiple notebooks: "React Docs (Part 1)", "(Part 2)"
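
The skip rules and the 50-source split can be combined into one planning step. The sketch below is only one way to do it: `keep_url` and `plan_notebooks` are hypothetical names, and the extension check is a simplification that drops every `.js` URL, not just bundles.

```python
from urllib.parse import urlparse

SKIP_EXTENSIONS = (".png", ".css", ".js")
SKIP_DIRS = {"node_modules", "dist", "build"}
MAX_SOURCES = 50  # per-notebook limit stated above


def keep_url(url: str) -> bool:
    """Return False for asset files and for anything under a skipped directory."""
    path = urlparse(url).path.lower()
    if path.endswith(SKIP_EXTENSIONS):
        return False
    return not SKIP_DIRS.intersection(path.split("/"))


def plan_notebooks(title: str, urls: list[str], limit: int = MAX_SOURCES) -> dict[str, list[str]]:
    """Map notebook titles to URL lists, splitting into parts above the limit."""
    kept = [u for u in dict.fromkeys(urls) if keep_url(u)]  # de-duplicate, keep order
    if len(kept) <= limit:
        return {title: kept}
    parts = [kept[i:i + limit] for i in range(0, len(kept), limit)]
    return {f"{title} (Part {n})": part for n, part in enumerate(parts, start=1)}


# Invented URLs for illustration.
urls = [f"https://example.com/docs/page-{i}" for i in range(120)]
urls += ["https://example.com/logo.png", "https://example.com/node_modules/x/readme"]

for name, group in plan_notebooks("React Docs", urls).items():
    print(name, len(group))
```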

### 5. Batch Upload

```
# Collect URLs (space or newline separated)
source_add({
  urls: """
    https://react.dev/reference/react/useState
    https://react.dev/reference/react/useEffect
    https://react.dev/reference/react/useContext
    https://react.dev/learn/thinking-in-react
  """,
  notebook_id: "..."
})
```

**Rate Limiting:**
- NotebookLM processes URLs asynchronously, so new sources may not appear immediately
- For large batches (20+ URLs), split into chunks of 10-15
- Wait a few seconds between batches
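
The batching rule above can be sketched as a small loop. Everything here is an assumption for illustration: `add_sources` stands in for whatever actually calls `source_add`, and the batch size of 10 and 3-second pause are just values within the ranges suggested above.

```python
import time
from typing import Callable, Iterator


def chunked(urls: list[str], size: int = 10) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def upload_in_batches(
    urls: list[str],
    add_sources: Callable[[list[str]], None],  # hypothetical wrapper around source_add
    size: int = 10,
    pause_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send URLs batch by batch, pausing between batches. Returns the batch count."""
    batches = list(chunked(urls, size))
    for index, batch in enumerate(batches):
        add_sources(batch)
        if index < len(batches) - 1:  # no pause needed after the final batch
            sleep(pause_seconds)
    return len(batches)


# Dry run that records batches instead of uploading and skips the real sleep.
sent: list[list[str]] = []
count = upload_in_batches(
    [f"https://example.com/page-{i}" for i in range(25)],
    add_sources=sent.append,
    sleep=lambda seconds: None,
)
print(count, [len(batch) for batch in sent])
```

Passing `sleep` as a parameter keeps the helper easy to dry-run without waiting.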

### 6. Verify & Report

```
notebook_get({ notebook_id: "...", include_summary: true })
```

Report:
- Total sources added
- Any failed URLs (paid content, 404s, etc.)
- Suggest next steps (query, generate audio, etc.)
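
A report along these lines could be assembled as follows. This is a sketch under assumptions: the response shape of `notebook_get` is not specified here, so `build_report` (a hypothetical helper) simply takes the list of source URLs that were found in the notebook, however they were extracted.

```python
def build_report(requested: list[str], added: list[str]) -> str:
    """Summarize which requested URLs ended up as sources in the notebook."""
    added_set = set(added)
    failed = [url for url in requested if url not in added_set]
    lines = [f"Sources added: {len(requested) - len(failed)}/{len(requested)}"]
    if failed:
        lines.append("Failed URLs:")
        lines.extend(f"- {url}" for url in failed)
    lines.append("Next steps: query the notebook or generate an audio overview.")
    return "\n".join(lines)


# Invented data for illustration.
requested = [
    "https://react.dev/reference/react/useState",
    "https://react.dev/reference/react/useEffect",
]
added = ["https://react.dev/reference/react/useState"]

print(build_report(requested, added))
```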

## Examples

### Index React Hooks Docs

```
1. notebook_create({ title: "React Hooks Reference" })

2. Scrape https://react.dev/reference/react/hooks
   Extract: useState, useEffect, useContext, useReducer, etc.

3. source_add({
     urls: "https://react.dev/reference/react/useState https://react.dev/reference/react/useEffect ..."
   })

4. notebook_query({ query: "Summarize all hooks and their use cases" })
```

### Index GitHub Repo

```
1. notebook_create({ title: "Vercel AI SDK" })

2. gh api 'repos/vercel/ai/git/trees/main?recursive=1'
   Filter: README.md, docs/**, packages/**/README.md

3. For each doc file:
   - If URL accessible: source_add({ urls: "https://github.com/vercel/ai/blob/main/README.md" })
   - If raw content needed: webfetch + source_add({ text: content, title: filename })

4. notebook_query({ query: "How do I use the AI SDK with Next.js?" })
```

### Index YouTube Playlist

```
1. notebook_create({ title: "React Conf 2024" })

2. Collect video URLs from playlist

3. source_add({
     urls: """
       https://youtube.com/watch?v=xxx
       https://youtube.com/watch?v=yyy
       https://youtube.com/watch?v=zzz
     """
   })

4. studio_create({ type: "audio", focus_prompt: "Key announcements" })
```

## Tips

- **Sitemap first**: Many doc sites publish a `/sitemap.xml`; parse it to get the full URL list
- **GitHub raw URLs**: Use `raw.githubusercontent.com` for direct file content
- **YouTube limits**: Only public videos with captions work
- **Chunking**: For very large sets (100+ URLs), group sources into multiple notebooks by topic rather than numbered parts
- **Verification**: Always check `notebook_get` after bulk upload to confirm sources added
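
For the raw-URL tip, a `github.com` blob link can be rewritten mechanically. The helper below is a hypothetical sketch; it assumes the ref (branch or tag) contains no `/`, since a blob URL alone cannot tell where the ref ends and the file path begins.

```python
from urllib.parse import urlparse


def blob_to_raw(url: str) -> str:
    """Convert https://github.com/<owner>/<repo>/blob/<ref>/<path> to its raw URL."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _, ref, *path = parts
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{'/'.join(path)}"


print(blob_to_raw("https://github.com/vercel/ai/blob/main/README.md"))
```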

## Constraints

| Constraint | Limit |
|------------|-------|
| Sources per notebook | 50 |
| URL types | Public websites, YouTube |
| Content | Visible text only (no JS-rendered content) |
| YouTube | Public videos with transcripts |