---
name: research-agent-optimization
description: Optimize the research agent for rate limit handling, API call efficiency, web search integration fixes, and improved streaming UX with granular progress updates and source attribution.
---

# Research Agent Optimization

## Scope

- Project root: `/home/bender/classwork/Thesis`
- Backend: `backend/news_research_agent.py`, `backend/app/api/routes/research.py`, `backend/app/services/news_research.py`
- Frontend: `frontend/app/search/page.tsx`, `frontend/lib/api.ts`
- Configuration: `backend/app/core/config.py`

## Problem Statement

1. **Rate Limiting**: Gemini API hits 429 quota exceeded errors during research and article analysis
2. **Web Search**: DuckDuckGo tool integration has naming issues (not properly initialized)
3. **Unclear Progress**: Research streaming shows generic "Still working..." instead of specific tool calls
4. **JSON in Response**: Results show raw JSON blocks instead of formatted source cards
5. **Redundant API Calls**: Multiple internal search calls without caching/deduplication

## Required Outcomes

- Graceful rate limit handling with exponential backoff and quota monitoring
- Working web search tool with proper DuckDuckGo initialization
- Verbose streaming events showing real tool execution (web_search, news_search, internal_news_search)
- Research results rendered with inline source cards (not JSON blocks)
- Optimized API calls: batch searches, cache semantic results, reuse internal knowledge base
- Clear error messages when quota is exceeded

## Workflow

### 1. API Call Optimization

- Implement request batching in `search_internal_news` tool
- Add caching layer for semantic search results (avoid duplicate queries within a 5min window)
- Combine web_search + news_search into a single result set
- Track API call counts per session and warn before quota exhaustion
- Add exponential backoff retry logic (1s, 2s, 4s, 8s max)

**Files:**

- `backend/news_research_agent.py` - tools and caching
- `backend/app/services/news_research.py` - request batching helpers

### 2. Rate Limit & Quota Handling

- Add try/except wrapper around Gemini calls
- Detect 429 errors and return user-friendly message ("API Rate Limit: ...please wait a moment...")
- Add optional `--skip-gemini-analysis` mode for article analysis when quota is low
- Log quota usage and remaining tokens
- Set model to `gemini-2.0-flash` (faster, lower token cost) instead of `gemini-2.0-flash-exp`

**Files:**

- `backend/app/core/config.py` - error handling wrapper, model selection
- `backend/app/api/routes/research.py` - HTTP error responses
- `backend/news_research_agent.py` - LLM call error handling

### 3. Web Search Tool Fix

- Verify DuckDuckGo import: `from duckduckgo_search import DDGS` (not `ddgs` or `DuckDuckGo`)
- Ensure `web_search` and `news_search` tools are properly bound to LLM
- Add fallback to internal search if web search fails
- Log tool execution with query and result count

**Files:**

- `backend/news_research_agent.py` - tool definitions and error handling
- Use `exa-code` to verify current DuckDuckGo API patterns

### 4. Streaming Progress Clarity

- Expand SSE event types: `tool_start` includes tool name + query parameters
- Map tool events to user-friendly messages:
  - `web_search("climate change")` → "Searching web for: climate change..."
  - `news_search(keywords="COP30")` → "Searching news for: COP30..."
  - `search_internal_news(query)` → "Searching internal knowledge base..."
  - `fetch_article_content(url)` → "Reading article: [title/domain]..."
- Add timestamps and tool execution duration
- Emit status updates every 3-5 seconds if no tool activity

**Files:**

- `backend/news_research_agent.py` - streaming generator
- `backend/app/api/routes/research.py` - SSE formatting

### 5. Frontend Result Rendering

- Remove JSON blocks from response text
- Render referenced articles in a "Sources" section below the answer
- Use article cards: title, source, date, image thumbnail
- Make cards clickable to open article detail modal
- Group sources by retrieval method (semantic, web search, internal)

**Files:**

- `frontend/app/search/page.tsx` - message rendering and sources grid
- `frontend/lib/api.ts` - response parsing

### 6. Error Handling & User Feedback

- Detect and handle:
  - 429 quota exceeded → "API Rate Limit: The AI service has reached its rate limit. Please wait a moment and try again."
  - Connection timeout → "Request Timeout: The research took too long. Try a simpler query."
  - Tool execution failure → "Tool [name] failed: [reason]. Continuing with alternative search..."
- Add retry prompt on error (not automatic, user chooses)
- Log all errors with request ID for debugging

**Files:**

- `backend/app/api/routes/research.py` - error formatting
- `frontend/app/search/page.tsx` - error UI and retry logic

## Checks

### API Optimization

- Verify semantic search results are cached (no duplicate calls)
- Check web_search and news_search return results (not empty)
- Confirm tool execution logs show cache hits for repeated queries

### Rate Limit Handling

- Trigger 429 error and verify graceful fallback message displays
- Confirm no stack traces are shown to the user
- Check logs show quota status and retry timing

### Web Search

- Query "climate change" and verify web_search returns 5+ results
- Confirm DuckDuckGo DDGS class is properly instantiated
- Check news_search returns recent news articles

### Streaming Clarity

- Monitor SSE events for tool_start with query details
- Verify timestamps increment correctly
- Confirm "Still working..." message only shows after 30s inactivity

### Frontend Rendering

- Verify research answer is plain text (no JSON)
- Check "Sources" section appears with article cards
- Confirm card click opens article detail modal
- Verify no duplicate sources (de-duplication working)

### Error Scenarios

- Submit an invalid query and verify the app does not crash
- Test with network disconnect and check timeout message
- Simulate quota exceeded (429) and verify user sees rate limit message

## Implementation Checklist

- [ ] Add retry decorator with exponential backoff to Gemini client
- [ ] Implement request cache in `search_internal_news` with 5min TTL
- [ ] Fix DuckDuckGo tool initialization (verify DDGS import)
- [ ] Update `research_stream()` to emit granular tool start/result events
- [ ] Map tool events to human-readable status messages in API endpoint
- [ ] Remove JSON block from final answer text
- [ ] Add "Sources" section with article cards to frontend
- [ ] Update error handling for 429 quota exceeded
- [ ] Add streaming status animation to UI
- [ ] Write tests for quota handling and web search integration
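
The retry decorator and the 5min request cache from the checklist could look roughly like the sketch below. It is a minimal illustration, not the project's code: `RateLimitError`, `retry_with_backoff`, and `TTLCache` are hypothetical names, and the real Gemini client will raise its own exception type for 429 responses, which would need to be passed in via `retry_on`.

```python
import functools
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever the Gemini client raises on HTTP 429."""


def retry_with_backoff(delays=(1, 2, 4, 8), retry_on=(RateLimitError,), sleep=time.sleep):
    """Retry the wrapped call, waiting 1s, 2s, 4s, then 8s between attempts."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for delay in delays:
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    sleep(delay)
            # Final attempt: let the exception propagate so the route handler
            # can turn it into the user-facing rate limit message.
            return func(*args, **kwargs)

        return wrapper

    return decorator


class TTLCache:
    """Small in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for `key`, calling `compute()` on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = compute()
        self._entries[key] = (now, value)
        return value
```

In this sketch a call such as `cache.get_or_compute(query, lambda: run_search(query))` (with `run_search` standing in for the real semantic search) would skip the second identical query inside the 5min window. The `sleep` and `clock` parameters exist only so tests can run without real waiting.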