---
name: site-architecture
description: Technical SEO - robots.txt, sitemap, meta tags, Core Web Vitals
---

# Site Architecture Skill

*Load with: base.md + web-content.md*

For technical website structure that enables discovery by search engines AND AI crawlers (GPTBot, ClaudeBot, PerplexityBot).

---

## Philosophy

**Content is king. Architecture is the kingdom.**

Great content buried in poor architecture won't be discovered. This skill covers the technical foundation that makes your content findable by:

- Google, Bing (traditional search)
- GPTBot (ChatGPT), ClaudeBot, PerplexityBot (AI assistants)
- Social platforms (Open Graph, Twitter Cards)

---

## robots.txt

### Basic Template

```txt
# robots.txt

# Allow all crawlers by default
User-agent: *
Allow: /
Disallow: /api/
Disallow: /admin/
Disallow: /private/
# Caution: /_next/static/ serves the JS and CSS that crawlers need to render
# Next.js pages, so blocking all of /_next/ can hurt rendering
Disallow: /_next/
Disallow: /cdn-cgi/

# Sitemap location
Sitemap: https://yoursite.com/sitemap.xml

# Crawl delay (optional - be careful, not all bots respect this)
# Crawl-delay: 1
```

### AI Bot Configuration

```txt
# robots.txt with AI bot rules
# Note: a crawler follows only the most specific group that matches it,
# so the bots named below do NOT inherit the Disallow rules in the DEFAULT group.

# === SEARCH ENGINES ===
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# === AI ASSISTANTS (Allow for discovery) ===
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: anthropic-ai
Allow: /

# Google-Extended controls use of content for Google's AI models, not Search indexing
User-agent: Google-Extended
Allow: /

# === BLOCK AI TRAINING (Optional - block training, allow chat) ===
# Uncomment these if you want to be cited but not used for training,
# and remove the matching Allow group above.
# User-agent: CCBot
# Disallow: /
# GPTBot is OpenAI's training crawler; user-initiated fetches use ChatGPT-User
# User-agent: GPTBot
# Disallow: /

# === BLOCK SCRAPERS ===
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: MJ12bot
Disallow: /

# === DEFAULT ===
User-agent: *
Allow: /
Disallow: /api/
Disallow: /admin/
Disallow: /auth/
Disallow: /private/
Disallow: /*.json$
Disallow: /*?*

Sitemap: https://yoursite.com/sitemap.xml
```

### Next.js robots.txt

```typescript
// app/robots.ts
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  const baseUrl = process.env.NEXT_PUBLIC_URL || 'https://yoursite.com';

  return {
    rules: [
      {
        userAgent: '*',
        allow: '/',
        disallow: ['/api/', '/admin/', '/private/', '/_next/'],
      },
      {
        userAgent: 'GPTBot',
        allow: '/',
      },
      {
        userAgent: 'ClaudeBot',
        allow: '/',
      },
      {
        userAgent: 'PerplexityBot',
        allow: '/',
      },
    ],
    sitemap: `${baseUrl}/sitemap.xml`,
  };
}
```

---

## Sitemap

### XML Sitemap Template

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://yoursite.com/</loc>
    <lastmod>2025-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://yoursite.com/pricing</loc>
    <lastmod>2025-01-10</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.9</priority>
  </url>
  <url>
    <loc>https://yoursite.com/blog/article-slug</loc>
    <lastmod>2025-01-12</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
    <image:image>
      <image:loc>https://yoursite.com/images/article-image.jpg</image:loc>
    </image:image>
  </url>
</urlset>
```

### Next.js Dynamic Sitemap

```typescript
// app/sitemap.ts
import type { MetadataRoute } from 'next';

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const baseUrl = process.env.NEXT_PUBLIC_URL || 'https://yoursite.com';

  // Static pages
  const staticPages = [
    { url: '/', priority: 1.0, changeFrequency: 'weekly' as const },
    { url: '/pricing', priority: 0.9, changeFrequency: 'monthly' as const },
    { url: '/about', priority: 0.8, changeFrequency: 'monthly' as const },
    { url: '/contact', priority: 0.7, changeFrequency: 'yearly' as const },
  ];

  // Dynamic pages (e.g., blog posts)
  const posts = await getBlogPosts(); // Your data fetching function

  const blogPages = posts.map((post) => ({
    url: `/blog/${post.slug}`,
    lastModified: new Date(post.updatedAt),
    changeFrequency: 'monthly' as const,
    priority: 0.8,
  }));

  return [
    ...staticPages.map((page) => ({
      url: `${baseUrl}${page.url}`,
      lastModified: new Date(),
      changeFrequency: page.changeFrequency,
      priority: page.priority,
    })),
    ...blogPages.map((page) => ({
      url: `${baseUrl}${page.url}`,
      lastModified: page.lastModified,
      changeFrequency: page.changeFrequency,
      priority: page.priority,
    })),
  ];
}
```

### Sitemap Index (Large Sites)

```xml
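<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yoursite.com/sitemap-pages.xml</loc>
    <lastmod>2025-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/sitemap-blog.xml</loc>
    <lastmod>2025-01-14</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/sitemap-products.xml</loc>
    <lastmod>2025-01-13</lastmod>
  </sitemap>
</sitemapindex>
```

---

## Meta Tags

### Essential Meta Tags

```html
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Page Title | Brand Name</title>
<meta name="description" content="Your page description.">
<meta name="robots" content="index, follow">
```

### Open Graph (Social Sharing)

```html
<meta property="og:type" content="website">
<meta property="og:locale" content="en_US">
<meta property="og:url" content="https://yoursite.com">
<meta property="og:site_name" content="Brand Name">
<meta property="og:title" content="Brand Name">
<meta property="og:description" content="Your site description.">
<meta property="og:image" content="https://yoursite.com/og-image.jpg">
<meta property="og:image:width" content="1200">
<meta property="og:image:height" content="630">
<meta property="og:image:alt" content="Brand Name">
```

### Twitter Cards

```html
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:site" content="@yourbrand">
<meta name="twitter:creator" content="@yourbrand">
<meta name="twitter:title" content="Brand Name">
<meta name="twitter:description" content="Your site description.">
<meta name="twitter:image" content="https://yoursite.com/og-image.jpg">
```

### Next.js Metadata

```typescript
// app/layout.tsx
import type { Metadata } from 'next';

export const metadata: Metadata = {
  metadataBase: new URL('https://yoursite.com'),
  title: {
    default: 'Brand Name',
    template: '%s | Brand Name',
  },
  description: 'Your default site description.',
  keywords: ['keyword1', 'keyword2', 'keyword3'],
  authors: [{ name: 'Brand Name', url: 'https://yoursite.com' }],
  creator: 'Brand Name',
  publisher: 'Brand Name',
  robots: {
    index: true,
    follow: true,
    googleBot: {
      index: true,
      follow: true,
      'max-video-preview': -1,
      'max-image-preview': 'large',
      'max-snippet': -1,
    },
  },
  openGraph: {
    type: 'website',
    locale: 'en_US',
    url: 'https://yoursite.com',
    siteName: 'Brand Name',
    title: 'Brand Name',
    description: 'Your site description.',
    images: [
      {
        url: '/og-image.jpg',
        width: 1200,
        height: 630,
        alt: 'Brand Name',
      },
    ],
  },
  twitter: {
    card: 'summary_large_image',
    site: '@yourbrand',
    creator: '@yourbrand',
  },
  verification: {
    google: 'google-verification-code',
    yandex: 'yandex-verification-code',
  },
};

// app/blog/[slug]/page.tsx
export async function generateMetadata(
  { params }: { params: { slug: string } }
): Promise<Metadata> {
  // In Next.js 15+, params is a Promise: type it as Promise<{ slug: string }> and await it
  const post = await getPost(params.slug); // Your data fetching function

  return {
    title: post.title,
    description: post.excerpt,
    openGraph: {
      title: post.title,
      description: post.excerpt,
      type: 'article',
      publishedTime: post.publishedAt,
      modifiedTime: post.updatedAt,
      authors: [post.author.name],
      images: [post.coverImage],
    },
  };
}
```

---

## URL Structure

### Best Practices

```markdown
✅ GOOD URLs:
/blog/ai-seo-best-practices
/products/pro-plan
/pricing
/about/team

❌ BAD URLs:
/blog?id=123
/p/12345
/index.php?page=about
/Products/Pro_Plan (inconsistent casing)
```

### URL Guidelines

| Rule | Example |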
|------|---------|
| Lowercase only | `/blog/my-post` not `/Blog/My-Post` |
| Hyphens not underscores | `/my-page` not `/my_page` |
| No trailing slashes | `/about` not `/about/` |
| Descriptive slugs | `/pricing` not `/p` |
| No query params for content | `/blog/post-title` not `/blog?id=123` |
| Max 3-4 levels deep | `/blog/category/post` |

### Redirect Configuration

```typescript
// next.config.js
module.exports = {
  async redirects() {
    return [
      // Redirect old URLs to new
      {
        source: '/old-page',
        destination: '/new-page',
        permanent: true, // Permanent redirect (Next.js sends 308, which search engines treat like 301)
      },
      // Redirect with wildcard
      {
        source: '/blog/old/:slug',
        destination: '/articles/:slug',
        permanent: true,
      },
      // Trailing slash redirect
      // (Next.js already does this by default when trailingSlash is false,
      // so this rule is usually only needed on other setups)
      {
        source: '/:path+/',
        destination: '/:path+',
        permanent: true,
      },
    ];
  },
};
```

---

## Canonical URLs

### Implementation

```html
<link rel="canonical" href="https://yoursite.com/current-page">
```

### When to Use

```markdown
✅ USE CANONICAL:
- Every page (even if only one version exists)
- Paginated content (give each page its own self-referencing canonical; Google no longer uses rel=prev/next)
- URL parameters that don't change content (?utm_source=...)
- HTTP vs HTTPS (canonical to HTTPS)
- www vs non-www (pick one, canonical to it)

Example: /products?sort=price should canonicalize to /products
```

### Next.js Canonical

```typescript
// Set per page through the Metadata API; a relative path resolves against metadataBase
import type { Metadata } from 'next';

export const metadata: Metadata = {
  alternates: {
    canonical: '/current-page',
  },
};
```

---

## Security Headers

### Essential Headers

```typescript
// next.config.js
const securityHeaders = [
  {
    key: 'X-DNS-Prefetch-Control',
    value: 'on',
  },
  {
    key: 'Strict-Transport-Security',
    value: 'max-age=63072000; includeSubDomains; preload',
  },
  {
    key: 'X-Frame-Options',
    value: 'SAMEORIGIN',
  },
  {
    key: 'X-Content-Type-Options',
    value: 'nosniff',
  },
  {
    key: 'Referrer-Policy',
    value: 'strict-origin-when-cross-origin',
  },
  {
    key: 'Permissions-Policy',
    value: 'camera=(), microphone=(), geolocation=()',
  },
];

module.exports = {
  async headers() {
    return [
      {
        source: '/:path*',
        headers: securityHeaders,
      },
    ];
  },
};
```

---

## Core Web Vitals

### Target Metrics

| Metric | Good | Needs Improvement | Poor |
|--------|------|-------------------|------|
| LCP (Largest Contentful Paint) | ≤2.5s | >2.5s to 4.0s | >4.0s |
| INP (Interaction to Next Paint) | ≤200ms | >200ms to 500ms | >500ms |
| CLS (Cumulative Layout Shift) | ≤0.1 | >0.1 to 0.25 | >0.25 |

### Optimization Checklist

```markdown
## LCP (Loading)
- [ ] Optimize largest image (WebP, proper sizing)
- [ ] Preload critical assets
- [ ] Use CDN for static assets
- [ ] Enable compression (gzip/brotli)
- [ ] Minimize render-blocking resources

## INP (Interactivity)
- [ ] Minimize JavaScript execution time
- [ ] Break up long tasks
- [ ] Use web workers for heavy computation
- [ ] Optimize event handlers
- [ ] Lazy load non-critical JS

## CLS (Visual Stability)
- [ ] Set dimensions on images/videos
- [ ] Reserve space for dynamic content
- [ ] Avoid inserting content above existing content
- [ ] Use transform for animations
- [ ] Preload fonts
```

### Next.js Performance

```typescript
// Image optimization
import Image from 'next/image';

// src and dimensions are placeholders; explicit width/height reserve space and avoid CLS
<Image src="/hero.jpg" alt="Hero image" width={1200} height={630} />

// Font optimization
import { Inter } from 'next/font/google';

const inter = Inter({
  subsets: ['latin'],
  display: 'swap', // Prevent FOIT
});

// Dynamic imports
import dynamic from 'next/dynamic';

const HeavyComponent = dynamic(() => import('./HeavyComponent'), {
  loading: () => <p>Loading...</p>, // Placeholder fallback
  ssr: false, // Client-only if needed
});
```

---

## Internal Linking

### Structure

```markdown
## Link Architecture

Homepage
├── /pricing (1 click)
├── /features (1 click)
├── /blog (1 click)
│   ├── /blog/category-1 (2 clicks)
│   │   └── /blog/category-1/post (3 clicks)
│   └── /blog/category-2 (2 clicks)
└── /about (1 click)

Rule: Every page within 3 clicks of homepage
```

### Best Practices

```markdown
✅ DO:
- Use descriptive anchor text
- Link contextually within content
- Create hub pages for topics
- Link to related content at end of posts
- Use breadcrumbs for navigation

❌ AVOID:
- "Click here" as anchor text
- Orphan pages (no internal links)
- Too many links per page (>100)
- Broken internal links
- Redirect chains
```

### Breadcrumbs

```typescript
// components/Breadcrumbs.tsx
import Link from 'next/link';

interface BreadcrumbItem {
  name: string;
  href: string;
}

export function Breadcrumbs({ items }: { items: BreadcrumbItem[] }) {
  // BreadcrumbList structured data, one ListItem per crumb
  const jsonLd = {
    '@context': 'https://schema.org',
    '@type': 'BreadcrumbList',
    itemListElement: items.map((item, index) => ({
      '@type': 'ListItem',
      position: index + 1,
      name: item.name,
      item: `https://yoursite.com${item.href}`,
    })),
  };

  return (
    <>
      <script
        type="application/ld+json"
        dangerouslySetInnerHTML={{ __html: JSON.stringify(jsonLd) }}
      />
      <nav aria-label="Breadcrumb">
        <ol>
          {items.map((item) => (
            <li key={item.href}>
              <Link href={item.href}>{item.name}</Link>
            </li>
          ))}
        </ol>
      </nav>
    </>
  );
}
```

---

## Structured Data

### Multiple Schema Per Page

```html
<!-- One JSON-LD script per schema type; values are placeholders -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Brand Name",
  "url": "https://yoursite.com"
}
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://yoursite.com/" }
  ]
}
</script>
```

---

## Project Structure

```
project/
├── public/
│   ├── robots.txt              # Or generate dynamically
│   ├── sitemap.xml             # Or generate dynamically
│   ├── favicon.ico
│   ├── icon.svg
│   ├── apple-touch-icon.png
│   ├── og-image.jpg            # Default OG image (1200x630)
│   └── manifest.webmanifest
├── app/
│   ├── layout.tsx              # Global metadata
│   ├── robots.ts               # Dynamic robots.txt
│   ├── sitemap.ts              # Dynamic sitemap
│   └── [page]/
│       └── page.tsx            # Page-specific metadata
├── components/
│   ├── SchemaMarkup.tsx
│   ├── Breadcrumbs.tsx
│   └── MetaTags.tsx
└── lib/
    ├── schema.ts               # Schema generators
    └── seo.ts                  # SEO utilities
```

---

## Verification & Submission

### Search Console Setup

```markdown
Ownership verification methods:
1. HTML file upload (google*.html to public/)
2. Meta tag (add to <head>)
3. DNS TXT record
4. Google Analytics (if already installed)
```

### Submit Sitemap

```markdown
1. Google Search Console
   - Sitemaps → Add new sitemap → yoursite.com/sitemap.xml

2. Bing Webmaster Tools
   - Sitemaps → Submit sitemap

3. Yandex Webmaster (if relevant)
   - Indexing → Sitemap files
```

---

## Checklist

```markdown
## Technical SEO Checklist

### robots.txt
- [ ] Allow search engines
- [ ] Allow AI bots (GPTBot, ClaudeBot, PerplexityBot)
- [ ] Block admin/private areas
- [ ] Include sitemap reference
- [ ] Check with the robots.txt report in Google Search Console

### Sitemap
- [ ] Include all indexable pages
- [ ] Exclude noindex pages
- [ ] Include lastmod dates
- [ ] Submit to Search Console
- [ ] Auto-update on content changes

### Meta Tags
- [ ] Unique title per page (50-60 chars)
- [ ] Unique description per page (150-160 chars)
- [ ] Canonical URL on every page
- [ ] Open Graph tags
- [ ] Twitter Card tags

### URL Structure
- [ ] Lowercase, hyphenated
- [ ] Descriptive slugs
- [ ] No query params for content
- [ ] 301 redirects for moved content
- [ ] No broken links

### Performance
- [ ] LCP ≤ 2.5s
- [ ] INP ≤ 200ms
- [ ] CLS ≤ 0.1
- [ ] HTTPS enabled
- [ ] Security headers configured

### Structured Data
- [ ] Organization schema (homepage)
- [ ] BreadcrumbList (all pages)
- [ ] Article schema (blog posts)
- [ ] FAQ schema (FAQ sections)
- [ ] Validate with Rich Results Test
```

---

## Quick Reference

### File Checklist

```
public/
├── robots.txt              ✓ Required
├── sitemap.xml             ✓ Required
├── favicon.ico             ✓ Required
├── og-image.jpg            ✓ Required (1200x630)
└── manifest.webmanifest    ○ Recommended
```

### Meta Tag Lengths

| Tag | Length |
|-----|--------|
| Title | 50-60 characters |
| Description | 150-160 characters |
| OG Title | 60-90 characters |
| OG Description | 200 characters |
| Twitter Description | 200 characters |

### Image Sizes

| Image | Dimensions |
|-------|------------|
| OG Image | 1200 x 630 |
| Twitter Image | 1200 x 628 |
| Favicon | 32 x 32 |
| Apple Touch Icon | 180 x 180 |
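
### Checking Meta Tag Lengths

The ranges in the Meta Tag Lengths table can be enforced at build or review time. The sketch below is one possible way to do that, not part of any library: `checkTagLength`, `auditMetaTags`, and the `TagName` keys are hypothetical names, and the single "200 characters" figures for OG and Twitter descriptions are assumed to be upper bounds.

```typescript
// Hypothetical helper: compares tag text with the ranges from the Meta Tag Lengths table.
type TagName =
  | 'title'
  | 'description'
  | 'ogTitle'
  | 'ogDescription'
  | 'twitterDescription';

interface LengthRange {
  min: number;
  max: number;
}

const TAG_LENGTHS: Record<TagName, LengthRange> = {
  title: { min: 50, max: 60 },
  description: { min: 150, max: 160 },
  ogTitle: { min: 60, max: 90 },
  // Assumption: the table gives one number here, treated as a maximum
  ogDescription: { min: 0, max: 200 },
  twitterDescription: { min: 0, max: 200 },
};

type LengthStatus = 'ok' | 'too-short' | 'too-long';

// Classify one tag. Length is counted in UTF-16 code units after trimming.
function checkTagLength(tag: TagName, text: string): LengthStatus {
  const { min, max } = TAG_LENGTHS[tag];
  const length = text.trim().length;
  if (length < min) return 'too-short';
  if (length > max) return 'too-long';
  return 'ok';
}

// Classify every tag that is present on a page.
function auditMetaTags(
  tags: Partial<Record<TagName, string>>
): Partial<Record<TagName, LengthStatus>> {
  const report: Partial<Record<TagName, LengthStatus>> = {};
  for (const tag of Object.keys(tags) as TagName[]) {
    const text = tags[tag];
    if (text !== undefined) {
      report[tag] = checkTagLength(tag, text);
    }
  }
  return report;
}
```

A call such as `auditMetaTags({ title: post.title, description: post.excerpt })` could run inside a content lint step, flagging pages whose titles or descriptions fall outside the recommended ranges before they are published.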