---
name: url-dump
description: Quick capture URLs with automatic content extraction, insights, and categorization into knowledge booklets
roles: [all]
integrations: [web-fetch]
---

# COG URL Dump Skill

## Purpose
Transform raw URLs into structured, insightful knowledge entries through intelligent content extraction, categorization, and integration with the user's knowledge base. Quick capture with automatic insight generation.

## When to Invoke
- User shares a URL they want to save
- User says "save this link", "bookmark this", "url dump", or "save for later"
- User pastes a URL and wants to capture it
- User wants to organize web resources into their knowledge base

## Agent Mode Awareness

**Check `agent_mode` in `00-inbox/MY-PROFILE.md` frontmatter:**
- If `agent_mode: team` — delegate content extraction, analysis, and categorization to a sub-agent while handling user interaction directly. The sub-agent fetches URL content, generates insights, and returns structured results for filing.
- If `agent_mode: solo` (default) — handle everything directly in the conversation. No delegation.

## Pre-Flight Check

**Before executing, check for user profile:**

1. Look for `00-inbox/MY-PROFILE.md` in the vault
2. If NOT found:
   ```
   Welcome to COG! It looks like this is your first time.

   Before we start, let's quickly set up your profile (takes 2 minutes).

   Would you like to run onboarding first, or should I proceed with default settings?
   ```
3. If found:
   - Read the profile to get user's interests and projects
   - Use interests to help with auto-categorization
   - Check for existing booklet categories in `05-knowledge/booklets/`

## Process Flow

### 1. User Interaction & Input Collection
- Accept URL(s) from the user (single URL or batch)
- Optionally accept user's quick note about why they're saving this
- Accept any format: bare URL, markdown link, or with notes

**Prompt:**
```
What URL(s) would you like to save?
(You can paste one or more URLs, optionally with a note about why you're saving it)
```

### 2. URL Validation & Fetch
- Validate URL format
- Check if URL is accessible
- Detect duplicate URLs in existing knowledge base
- Fetch the web page content

#### Content Extraction
Extract from the page:
- **Page Title:** [extracted-title]
- **Meta Description:** [if available]
- **Author:** [if detected]
- **Published Date:** [if detected]
- **Word Count:** [estimated]
- **Read Time:** [X minutes]
- **Main Content:** [extracted body text]
- **Key Headings:** [list of H1/H2s]

### 3. Category Selection

**Default Categories:**
- **Articles & Blogs:** Long-form content, tutorials, opinion pieces
- **Tools & Resources:** Software, utilities, services, APIs
- **Reference:** Documentation, specs, standards
- **Research:** Papers, studies, academic content
- **Inspiration:** Design, ideas, creative references
- **Videos & Media:** YouTube, podcasts, multimedia
- **News & Updates:** Industry news, announcements
- **Project-Specific:** Related to a specific project (offer project list from MY-PROFILE.md)
- **To Review:** Unsure, save for later categorization

**Custom Categories:**
- Check `05-knowledge/booklets/` for existing custom categories
- Offer to create new category if needed

**Auto-suggestion:** Based on content analysis, suggest the most likely category but let user confirm or change.

### 4. Content Analysis and Processing

#### Phase 1: Content Classification
Determine:
- **Content Category:** [article|tool|reference|research|video|news|etc]
- **Primary Topics:** [topic1, topic2, topic3]
- **Tone:** [informative|opinion|tutorial|news|etc]
- **Quality Assessment:** [high|medium|low]
- **Credibility Indicators:** [author credentials, citations, etc]

#### Phase 2: Insight Extraction
Generate:
- **Executive Summary:** [2-3 sentences]
- **Key Insights:**
  1. [Insight 1 with context]
  2. [Insight 2 with context]
  3. [Insight 3 with context]
- **Notable Quotes:** [if any stand out]
- **Action Items:** [practical takeaways]

#### Phase 3: Relevance Assessment
Analyze:
- **User Interest Match:** [high|medium|low] - [which interests from profile]
- **Project Relevance:** [project-name] - [why relevant]
- **Knowledge Gap:** [yes|no] - [what gap it fills]
- **Timeliness:** [evergreen|current|dated]
- **Uniqueness:** [novel|common|duplicate-adjacent]

#### Phase 4: Cross-Reference
Identify connections to:
- **Related Bookmarks:** [existing similar saves]
- **Related Braindumps:** [if content connects]
- **Related Projects:** [if applicable]
- **Suggested Tags:** [tag1, tag2, tag3]

### 5. Generate Structured Output

Create bookmark file with this structure:

```markdown
---
type: "url-bookmark"
category: "[category-name]"
domain: "[source-domain.com]"
date_saved: "YYYY-MM-DD"
date_accessed: "YYYY-MM-DD HH:MM"
url: "[original-url]"
title: "[page-title]"
author: "[author-if-available]"
published: "[publish-date-if-available]"
tags: ["#bookmark", "#category-tag", "#topic-tags"]
relevance: "[high|medium|low]"
status: "unread"
related_projects: ["project1", "project2"]
confidence: "[high|medium|low]"
---

# [Title]

## Quick Summary
[2-3 sentence summary of the content]

## Key Insights
- **Insight 1:** [description with context]
- **Insight 2:** [description with context]
- **Insight 3:** [description with context]

## Why This Matters
[Connection to user's interests/projects. What makes this worth saving?]

## User Note
[Original user note if provided, otherwise omit section]

## Content Highlights
[Key excerpts or quotes from the content - 200-400 words max]

## Practical Takeaways
- [ ] [Action item 1 if applicable] 📅 [YYYY-MM-DD = date +1 week from today]
- [ ] [Action item 2 if applicable] 📅 [YYYY-MM-DD = date +1 week from today]

## Related Knowledge
- **Similar Bookmarks:** [[bookmark1]], [[bookmark2]]
- **Connected Projects:** [[project1]]
- **Related Notes:** [[note1]], [[note2]]

## Source Details
| Field | Value |
|-------|-------|
| Domain | [domain] |
| Author | [author or "Unknown"] |
| Published | [date or "Unknown"] |
| Word Count | [~X words] |
| Read Time | [~X minutes] |

## Processing Notes
- **Extracted:** [timestamp]
- **Category Confidence:** [percentage]
- **Review Needed:** [yes|no] - [reason if yes]

---

*Processed by COG URL Curator*
```

Save to appropriate location:
- **Standard:** `05-knowledge/booklets/[category-slug]/[title-slug]-YYYY-MM-DD.md`
- **Project-specific:** `04-projects/[project-slug]/resources/[title-slug]-YYYY-MM-DD.md`
- **Mixed/Unclear:** `00-inbox/url-[title-slug]-YYYY-MM-DD.md`

### 6. Tool/Resource Special Handling

For tools and software, use enhanced template:

```markdown
---
type: "url-tool"
category: "tools"
domain: "[domain]"
url: "[url]"
title: "[tool-name]"
date_saved: "YYYY-MM-DD"
pricing: "[free|freemium|paid|enterprise]"
tags: ["#tool", "#category-tags"]
status: "to-evaluate"
---

# [Tool Name]

## What It Does
[1-2 sentence description]

## Key Features
- Feature 1
- Feature 2
- Feature 3

## Use Cases
- Use case 1
- Use case 2

## Pricing
[Pricing details if available]

## Why It's Relevant
[Connection to user's work/interests]

## Evaluation Status
- [ ] Sign up / try demo 📅 [YYYY-MM-DD = date +3 days from today]
- [ ] Test key features 📅 [YYYY-MM-DD = date +1 week from today]
- [ ] Compare with alternatives 📅 [YYYY-MM-DD = date +1 week from today]
- [ ] Decision: [use|pass|revisit] 📅 [YYYY-MM-DD = date +2 weeks from today]

## Notes
[Space for user's evaluation notes]

---

*Processed by COG URL Curator*
```

### 7. Batch Processing

For multiple URLs:

```
Processing [X] URLs...

1. [URL 1] → [category] → Saved to [path]
2. [URL 2] → [category] → Saved to [path]
3. [URL 3] → [category] → Saved to [path]

Summary:
- Articles: 2 saved
- Tools: 1 saved
- Total: 3 URLs processed
```

### 8. Confirm Completion
- Confirm file(s) created
- Show user: "URL saved to [file path]"
- Show quick summary: title, category, key insight preview
- Ask if they want to:
  - Add another URL
  - Deep-dive into the content
  - Connect to specific project or braindump

## Booklet Structure

URLs are organized into "booklets" (category folders):

```
05-knowledge/
└── booklets/
    ├── articles/
    │   ├── _index.md (category overview - auto-created)
    │   └── [article-entries].md
    ├── tools/
    │   ├── _index.md
    │   └── [tool-entries].md
    ├── reference/
    │   ├── _index.md
    │   └── [reference-entries].md
    ├── research/
    │   ├── _index.md
    │   └── [research-entries].md
    ├── inspiration/
    │   ├── _index.md
    │   └── [inspiration-entries].md
    ├── videos/
    │   ├── _index.md
    │   └── [video-entries].md
    └── [custom-category]/
        ├── _index.md
        └── [entries].md
```

### Category Index Template

When creating a new category, also create an index file:

```markdown
---
type: "booklet-index"
category: "[category-name]"
created: "YYYY-MM-DD"
last_updated: "YYYY-MM-DD"
entry_count: 0
---

# [Category Name] Booklet

## Description
[What this category contains]

## Recent Additions
[Auto-updated list - most recent 10 entries]

## Top Entries
[Manually curated or most-accessed entries]

## Tags in This Category
[List of common tags used]

## Related Categories
- [[other-category-1]]
- [[other-category-2]]
```

## YAML Formatting Requirements

**CRITICAL:** All YAML frontmatter must use proper Obsidian-compatible formatting:
- All string values MUST be quoted with double quotes
- Arrays MUST use quoted strings: `["item1", "item2", "item3"]`
- URLs MUST be quoted to handle special characters
- Boolean values should NOT be quoted: `true` or `false`
- Ensure proper YAML syntax to prevent parsing errors in Obsidian

**Examples:**
```yaml
# CORRECT
type: "url-bookmark"
url: "https://example.com/path?query=value"
tags: ["#bookmark", "#article", "#ai"]
relevance: "high"
reviewed: false

# INCORRECT
type: url-bookmark
url: https://example.com/path?query=value
tags: [#bookmark, #article, #ai]
relevance: high
reviewed: "false"
```

## Verification Protocols

### Content Accuracy
- **Title Verification:** Ensure extracted title matches page
- **Author Attribution:** Verify author if stated
- **Date Accuracy:** Confirm publication date if shown
- **Summary Fidelity:** Ensure summary accurately represents content

### Categorization Verification
- **Category Fit:** Confirm content matches selected category
- **Tag Relevance:** Verify tags accurately describe content
- **Interest Alignment:** Confirm relevance assessment is accurate
- **Project Connection:** Verify project relevance if claimed

### Quality Checks
- **Completeness:** All required fields populated
- **Formatting:** Proper markdown and YAML syntax
- **Links:** All internal links valid
- **Metadata:** Frontmatter properly formatted

## Uncertainty Handling

### When Content is Unclear
- **Paywalled Content:** Note limitation, extract available preview
- **Dynamic Content:** Note if content may change
- **Complex Content:** Flag for manual review if needed
- **Non-English:** Note language, provide translation if possible

### Confidence Indicators
- **High Confidence (90%+):** Clear content with obvious categorization
- **Medium Confidence (70-89%):** Generally clear with some ambiguity
- **Low Confidence (50-69%):** Significant ambiguity requiring user input
- **Very Low Confidence (<50%):** Major uncertainty, save to inbox

Always explicitly state confidence levels and reasoning in processing notes.

## Integration with Other Skills

### Immediate Follow-up
After URL capture, suggest:
- `/braindump` - Capture thoughts about the URL
- `/knowledge-consolidation` - Integrate into knowledge frameworks
- Daily brief will surface relevant saved URLs

### Cross-Referencing
Automatically check for connections to:
- Active projects (from MY-PROFILE.md)
- Recent braindumps
- Competitive watchlist companies (if exists)
- User interests

## Success Metrics
- Speed of capture (< 30 seconds for single URL)
- Accurate categorization with user confirmation
- Useful insight extraction
- Proper integration with existing knowledge
- Easy retrieval and discovery later
- High confidence in extractions

## Learning and Adaptation

### Pattern Learning
- Track which bookmarks get revisited
- Learn user's categorization preferences
- Improve relevance scoring based on engagement
- Refine insight extraction based on what user finds useful

### Continuous Improvement
- Monitor categorization accuracy over time
- Adapt to user's preferred tag taxonomy
- Learn domain-specific terminology
- Improve cross-referencing accuracy