--- name: github-review-report description: Use when generating Chinese format Excel reports from GitHub code reviews with AI analysis and filtering by date, branch, or reviewers --- # GitHub Code Review Report (Chinese Format with AI Analysis) ## Overview Generate structured Chinese-format Excel reports from GitHub pull requests with AI-powered review analysis. Uses GitHub API via `gh` CLI to fetch PR data, filter by date range, branch, and specific reviewers, then leverages AI models to intelligently analyze review content and generate structured problem descriptions and impact analysis. **Key innovation:** Two-phase workflow separates data collection from AI analysis, enabling true AI-powered review summarization instead of simple text truncation. ## When to Use Use when you need to: - Generate code review reports for GitHub repositories in Chinese format - Track review activity over time periods (this month, last week, last 15 days) - Export review data to Chinese-format Excel with 13 specific columns - Leverage AI for intelligent review analysis (problem description, impact analysis) instead of simple truncation - Use Excel template file for consistent formatting **DO NOT use for:** - Local git repository analysis (use `git log` instead) - General PR statistics without review summaries - Issues or other GitHub artifacts - Simple keyword-based classification without AI understanding ## Core Pattern **Before (without skill):** ```bash # No AI integration - simple truncation to 50 chars # No template usage - create new Excel every time # Inconsistent problem classification based on keyword matching # No two-phase workflow (data + AI analysis mixed together) ``` **After (with skill):** ```bash # Two-phase workflow: Data collection → AI analysis # AI-powered intelligent summarization # Template-based Excel generation (consistent formatting) # Structured problem classification (15 types, severity levels) ``` ## Decision Flow ```dot digraph github_review_flow { rankdir=LR; node [shape=box, style="rounded,filled", fillcolor=lightgray]; start [label="Start: Generate review report", fillcolor=lightgreen]; parse_time [label="Parse time range\n(this month, last week, etc.)"]; fetch_prs [label="Fetch PRs via\ngh pr list API"]; any_prs [label="Any PRs found?", shape=diamond, fillcolor=lightyellow]; no_prs [label="Report empty\nsuggest alternatives", fillcolor=lightcoral]; filter_reviewer [label="Filter by reviewer patterns\n(include/exclude)"]; fetch_details [label="Fetch PR details via\ngh pr view"]; extract_reviews [label="Extract original review\ncontent (not summarization)"]; ai_analyze [label="AI Analysis:\nAnalyze reviews and generate\nproblem description & impact", shape=diamond, fillcolor=lightblue]; use_template [label="Use Excel template\n(读取模板文件)", shape=diamond, fillcolor=lightblue]; generate_excel [label="Generate Excel with AI-processed data\n基于模板填充数据", fillcolor=lightblue]; done [label="✅ Deliver report", fillcolor=lightgreen]; start -> parse_time -> fetch_prs -> any_prs; any_prs -> no_prs [label="No"]; any_prs -> filter_reviewer [label="Yes"]; filter_reviewer -> fetch_details -> extract_reviews; extract_reviews -> ai_analyze; ai_analyze -> use_template [label="Yes"]; ai_analyze -> generate_excel [label="No"]; use_template -> generate_excel -> done; } ``` ## Key decisions: **AI Integration:** - **When to use AI:** Only for review analysis (problem description, impact analysis), NOT for data collection - **Data collection phase:** Only use GitHub API, preserve original review content - **AI analysis phase:** Each review → AI prompt → generate structured fields - **Template usage:** Read template first, then append data (preserve formatting/styling) **Template file:** - Location: `resources/模块名-代码走查报告-template.xlsx` - Contains: Empty rows with proper columns and formatting - Purpose: Preserve formatting (width, colors, fonts, borders, header styles) - Read before generating, write only new data rows **Font specification:** - **Required font:** "CESI宋体-GB18030" (CESI Songti GB18030) - Template must use this font for Chinese characters to ensure proper rendering - When creating template, apply this font to all cells containing Chinese text - If template doesn't exist, programmatically set font when creating Excel **Two-phase workflow:** 1. Collect data (GitHub API) → Store original reviews 2. AI analyze each review → Generate problem description, impact analysis 3. Load template → Append AI-processed data → Save Excel **Template fallback:** If template file doesn't exist, create basic template structure programmatically (columns only, no formatting) ## Implementation ### Step 1: Parse Time Range Convert relative time expressions to absolute date ranges. Use `parse_time_range()` in `generator.py`. | Input | Conversion | |-------|------------| | "这个月" | First day of current month to today | | "上周" | Last Monday to last Sunday | | "这半个月" | 15 days ago to today | | "这个星期" | Last Monday to today | | "YYYY-MM-DD to YYYY-MM-DD" | Use exact dates | ### Step 2: Fetch PRs (Data Collection Phase) Use `gh pr list` with search query: ```bash gh pr list \ --repo linuxdeepin/dde-cooperation \ --state merged \ --search "merged:${start_date}..${end_date} base:master" \ --json number,title,author,createdAt,mergedAt,mergedBy,baseRefName,url,reviews,comments ``` Filter by branch: append `base:master` to search query. **Important:** This is DATA COLLECTION only. Do NOT summarize or process reviews yet. Store original content. **Filter Version Update Commits:** Skip PRs that are version updates only. Use `is_version_update_pr()` to filter out: - PR titles matching: `Update version to X.Y.Z`, `Bump version to X.Y.Z`, `Version X.Y.Z` - PRs with only version changes in commits - Commits by `deepin-ci-robot` (considered invalid, see Step 5) ```python def is_version_update_pr(pr): """Check if PR is only about version update""" title = pr.get('title', '').lower() patterns = [ r'^update version to \d+\.\d+\.\d+', r'^bump version to \d+\.\d+\.\d+', r'^version \d+\.\d+\.\d+$', ] for pattern in patterns: if re.search(pattern, title): return True return False # Filter out version update PRs if is_version_update_pr(pr): continue ``` ### Step 3: Filter Reviewers Use `should_include_reviewer()` in `generator.py` with fnmatch patterns: | Pattern | Matches | Does NOT Match | |---------|----------|----------------| | `sourcery-*` | sourcery-ai, sourcery-bot | liuzheng | | `*-bot` | any name ending with "-bot" | sourcery-ai, liuzheng | **Rule:** Exclude checked first, then include. Include requires matching an include pattern AND not matching any exclude pattern. **Important:** Only validate reviewer identity, NOT review content quality. Keep all valid reviews for AI analysis. **Reviewer Definitions:** - **提出人 (Proposed by):** The person who submitted the review comments on the PR - **解决人 (Resolved by):** The person who submitted/merged the PR (mergedBy field) These fields are distinct - one provides review feedback, the other implements the changes. **Priority: Human over AI:** When both human and AI reviews exist on the same PR: - If a human reviewer (e.g., zhangs, liuzheng) added review comments: **ONLY include human reviews, ignore AI reviews** - If only AI reviews exist (e.g., sourcery-ai): Include those - **AI reviewers to exclude:** sourcery-ai, sourcery-bot, deepin-ci-robot - **Example:** If PR has reviews from zhangs (human) and sourcery-ai (AI), only include zhangs's review ```python def should_include_reviewer(author): """Check if reviewer should be included in report""" # Always exclude invalid automated accounts EXCLUDE_REVIEWERS = [ 'deepin-ci-robot', 'sourcery-ai', 'sourcery-bot', ] if author in EXCLUDE_REVIEWERS: return False # Include by default for other users return True def prioritize_human_reviews(reviews): """Filter reviews to prioritize human over AI""" # Check if any human review exists has_human = any( not should_exclude_as_ai(review['author']['login']) for review in reviews ) # If human reviews exist, exclude all AI reviews if has_human: return [ review for review in reviews if not should_exclude_as_ai(review['author']['login']) ] else: # If only AI reviews, include all return reviews ``` **Reviewer Definitions:** - **提出人 (Proposed by):** The person who submitted the review comments on the PR - **解决人 (Resolved by):** The person who submitted/merged the PR (mergedBy field) These fields are distinct - one provides review feedback, the other implements the changes. ### Step 4: Fetch PR Details (Data Collection Phase) For each PR, use `gh pr view` to get full data: ```bash gh pr view ${pr_number} --repo linuxdeepin/dde-cooperation \ --json reviews,comments ``` **Important:** Collect ALL reviews (valid and invalid), but only send VALID reviews to AI. This allows AI to judge validity. **Ignore Invalid Commits:** - **deepin-ci-robot:** All commits by this user are considered invalid and should be excluded - **Version updates:** PRs with titles matching version update patterns (see Step 2) - These should be filtered out before review extraction to avoid cluttering the report ### Step 5: Extract Original Reviews Extract review data WITHOUT summarization. Store original content for AI analysis. ```python # In extract_review_suggestions(): for review in pr_data.get('reviews', []): author = review.get('author', {}).get('login', '') body = review.get('body', '') # Original content, NOT summarized state = review.get('state', '') # Apply reviewer filter if not should_include_reviewer(author): continue # Store for AI analysis valid_reviews.append({ 'original_content': body, # Original text 'reviewer': author, 'review_time': review.get('submittedAt', ''), 'pr_number': pr.get('number', ''), }) ``` ### Step 6: AI Analysis (AI Integration Phase) For each valid review, generate AI-powered analysis: **AI Prompt Template:** ``` 你是代码review专家。请分析以下GitHub review内容,并提供结构化分析: **原始Review内容:** ${original_review_content} **Review上下文:** - PR标题: ${pr_title} - Review者: ${reviewer} - Review时间: ${review_time} **任务:** 1. 判断这是否为有效的代码review(不是简单的批准或自动评论) 2. 提取核心问题点 3. 分类问题类型(从15种类型中选择) 4. 评估严重程度(严重或一般) 5. 判断问题来源(commit log=1, 代码=2, 注释=3) - 默认为"代码"(2) - 只有当review明确提到注释问题才选"注释"(3) - 只有当review明确提到commit信息/格式问题才选"commit log"(1) 6. 生成中文问题描述(≤50字) 7. 生成影响分析(≤20字) 8. 如果无明显影响,填"无" **输出格式(JSON):** ```json { "is_valid_review": true/false, "problem_type": 1-15, "problem_source": 1/2/3, "problem_description": "中文问题描述≤50字", "impact_analysis": "中文影响分析≤20字(或"无")", "confidence": high/medium/low, "reasoning": "简要说明判断依据" } ``` **Implementation:** ```python # In generate_review_report(): for review in valid_reviews: ai_prompt = build_ai_prompt(review, pr_info) # Call AI model via skill_mcp or subprocess ai_result = call_ai_analysis(ai_prompt) # Parse JSON result result = json.loads(ai_result) # Add to processed list processed_reviews.append({ '序号': serial_number, '包名': module_name, '仓库地址': f"https://github.com/{repo}", '代码提交地址': pr.get('url', ''), '问题来源': PROBLEM_SOURCES.get(result.get('problem_source', 2), '代码'), '问题描述': result['problem_description'], '严重程度': result.get('severity', '一般'), '影响分析': result['impact_analysis'], '问题类型': PROBLEM_TYPES.get(result['problem_type'], '其他'), '提出人': review['reviewer'], # Reviewer who submitted the review comment '提出时间': review_date_only, '解决人': merged_by, # Person who submitted/merged the PR '计划解决时间': merged_date_only, '实际解决时间': merged_date_only, '提出人确认是否验收通过': "是", '问题状态': problem_status, # "Close" if merged, "Open" otherwise }) ``` **AI Models Available:** - Claude: Use direct reasoning (for small datasets) - OpenAI: Use `openai` MCP or API - DeepSeek: Use deepseek API or MCP - Local LLM: Use local model APIs **Fallback:** If AI unavailable, use original content as-is (no analysis) ### Step 7: Load Template ```python # Read template file template_path = 'resources/模块名-代码走查报告-template.xlsx' df_template = pd.read_excel(template_path, engine='openpyxl') # Verify template structure required_columns = ['序号', '包名', '仓库地址', '代码提交地址', '问题来源', '问题描述', '严重程度', '影响分析', '问题类型', '提出人', '提出时间', '解决人', '计划解决时间', '实际解决时间', '提出人确认是否验收通过', '问题状态'] if set(required_columns) != set(df_template.columns): print("⚠️ Template column mismatch. Creating basic template.") df_template = pd.DataFrame(columns=required_columns) # If creating template, set required font from openpyxl.styles import Font # Load workbook to set styles wb = load_workbook(template_path) if os.path.exists(template_path) else Workbook() ws = wb.active # Set font for header and data rows chinese_font = Font(name='CESI宋体-GB18030', size=11) for row in ws.iter_rows(): for cell in row: cell.font = chinese_font wb.save(template_path) ``` **Template preservation rules:** - Read template with formatting - Append data using `pd.concat()` or direct indexing - Save with `to_excel(index=False, engine='openpyxl')` - Template styling (widths, borders, colors, header styles) will be preserved - **Font requirement:** Use "CESI宋体-GB18030" for all Chinese text ### Step 8: Generate Excel Report ```python # Combine template data with AI-processed reviews df_final = pd.concat([df_template, df_processed], ignore_index=True) # Save to Excel output_file = f"{module_name}-代码走查报告-{yyyymmdd}.xlsx" df_final.to_excel(output_file, index=False, engine='openpyxl') ``` ## Quick Reference ### Two-Phase Workflow | Phase | Purpose | Tools | |-------|---------|-------| | 数据收集 | Fetch PRs via `gh pr list/view`, extract original reviews | GitHub CLI, generator.py | | AI分析 | Analyze reviews, generate problem description & impact analysis | AI models (Claude, OpenAI, DeepSeek) | | 报表生成 | Load template, append data, preserve formatting | pandas, openpyxl | ### AI Prompt Template Structure ```python AI_PROMPT_TEMPLATE = """ 你是代码review专家。请分析以下GitHub review内容,并提供结构化分析: **原始Review内容:** {original_content} **Review上下文:** - PR标题: {pr_title} - Review者: {reviewer} - Review时间: {review_time} **任务:** 1. 判断这是否为有效的代码review(不是简单的批准或自动评论) 2. 提取核心问题点 3. 分类问题类型(从15种类型中选择) 4. 评估严重程度(严重或一般) 5. 判断问题来源(commit log=1, 代码=2, 注释=3) - 默认为"代码"(2) - 只有当review明确提到注释问题才选"注释"(3) - 只有当review明确提到commit信息/格式问题才选"commit log"(1) 6. 生成中文问题描述(≤50字) 7. 生成影响分析(≤20字) 8. 如果无明显影响,填"无" **输出格式(JSON):** {{ "is_valid_review": true/false, "problem_type": 1-15, "problem_source": 1/2/3, "problem_description": "中文问题描述≤50字", "impact_analysis": "中文影响分析≤20字(或"无")", "confidence": high/medium/low, "reasoning": "简要说明判断依据" } """ ``` ### Excel Report Columns (13 total) | 列名 | 说明 | 来源 | |--------|------|------| | 序号 | 自增序号(1,2,3...) | Auto-increment | | 包名 | --module-name 参数 | 用户指定 | | 仓库地址 | `https://github.com/{repo}` | GitHub URL | | 代码提交地址 | `pr['url']` | PR链接 | | 问题来源 | 映射表(1-3,默认为代码2) | commit log(1) / 代码(2) / 注释(3) | | 问题描述 | AI生成的中文描述(≤50字) | AI分析结果 | | 严重程度 | 问题类型8(安全)、12(内存)="严重",其他="一般" | AI判断 | | 影响分析 | AI生成的影响分析(≤20字或"无") | AI分析结果 | | 问题类型 | 映射表(1-15) | AI分类结果 | | 提出人 | reviewer登录名(提出review备注的人) | GitHub API | | 提出时间 | YYYY-MM-DD格式 | review['submittedAt'] | | 解决人 | mergedBy登录名(提交PR的人) | GitHub API | | 计划解决时间 | mergedAt YYYY-MM-DD | GitHub API | | 实际解决时间 | mergedAt YYYY-MM-DD | GitHub API | | 提出人确认是否验收通过 | 固定值"是" | 硬编码 | | 问题状态 | merged="Close", not merged="Open" | 判断逻辑 | ### Time Range Examples | Expression | Meaning | |------------|---------| | `this month` | 1st of current month to today | | `last month` | 1st of last month to last day of last month | | `last week` | Last Monday to last Sunday | | `this week` | Last Monday to today | | `last 15 days` | 15 days ago to today | ### gh PR Commands ```bash # Basic query gh pr list --repo linuxdeepin/dde-cooperation --state merged # With date and branch filters gh pr list --repo linuxdeepin/dde-cooperation --state merged \ --search "merged:2026-01-01..2026-01-31 base:master" # With JSON output gh pr list --repo linuxdeepin/dde-cooperation --state merged \ --json number,title,author,createdAt,mergedAt,mergedBy,baseRefName,url,reviews,comments # Fetch single PR details gh pr view 123 --repo linuxdeepin/dde-cooperation \ --json reviews,comments ``` ## Common Mistakes | Mistake | Why It's Wrong | Fix | |---------|----------------|-----| | Not using AI for review analysis | AI analysis is the core feature, not optional | Use AI to analyze each review and generate structured fields | | Creating Excel without template | Loses formatting consistency | Read template first, then append data | | Summarizing reviews manually | Misses AI's intelligent analysis | Let AI do the analysis | | Not preserving template formatting | Template has styling for a reason | Use pd.concat() to append, preserve formatting | | Mixing data collection with AI analysis | Two separate phases | Separate concerns: first collect all data, then analyze with AI | | Not validating reviews before AI | AI can judge validity itself | Send all to AI, let it determine validity | | Processing reviews individually | Slows down AI | Batch process: collect all, then send to AI as array | | Missing error handling | AI may fail | Try-catch and provide meaningful error messages | | Wrong default problem source value | Default should be "代码"(2), not "注释"(3) | Use result.get('problem_source', 2) and map correctly | | **Using Chinese for "问题状态"** | Must be English "Close" or "Open" | Change merged="关闭" to merged="Close" | | **Not filtering version update PRs** | Clutters report with non-code changes | Filter out "Update version to X.Y.Z" PRs | | **Including deepin-ci-robot commits** | Automated commits are not real reviews | Exclude deepin-ci-robot entirely | | **Not prioritizing human reviews** | AI reviews are lower quality than humans | Filter out AI reviews when human reviews exist | | **Not using correct Chinese font** | Text rendering may break | Use "CESI宋体-GB18030" for all Chinese text | ## Rationalization Table | Excuse | Reality | |--------|---------| | "AI integration is complex" | Two-phase design is cleaner and more powerful | AI is meant for intelligent tasks, not complex | | "Template file is unused" | It preserves formatting and ensures consistency | Read template, append data | | "AI might be slow" | AI analysis is worth the quality improvement | Better quality summaries save review time | | "Can just use keyword matching" | Simple matching misses nuance | AI understands context and semantics | | "Need to implement AI model" | Use available AI tools or services | Not implementing from scratch | | "JSON parsing is error-prone" | AI returns structured JSON | Use proper error handling | | "Chinese text works with default font" | Chinese characters may not render correctly | Use "CESI宋体-GB18030" for proper rendering | | "deepin-ci-robot commits should be included" | They are automated, not human reviews | Exclude automated accounts entirely | | "AI reviews are useful too" | Human reviews have higher quality and priority | Filter out AI when human reviews exist | | "Version updates need review too" | They are routine changes, not code review | Filter out version update PRs | ## Red Flags - STOP and Start Over - Creating Excel without reading template - Processing reviews before AI analysis (should be two-phase) - Generating "问题描述" and "影响分析" without AI - Not preserving template formatting when using it - Validating reviews in data collection phase (AI should do it) - Not batching AI requests (slow and inefficient) - Using simple truncation instead of AI analysis - Changing reviewer/column specifications without skill update - Hardcoding problem source default value to anything other than "代码"(2) - Not letting AI determine problem source (use AI's judgment, don't assume) - **Using Chinese for "问题状态" (must be "Close" or "Open")** - **Not ignoring deepin-ci-robot commits** - **Including AI reviews when human reviews exist** - **Not using "CESI宋体-GB18030" font for Chinese text** **All of these mean: STOP. Re-read skill. Start over with correct two-phase approach.**