# Long-Running & Life Augmentation Agents

> Deep research into persistent, cloud-hosted, proactive AI agents that autonomously manage tasks, delegate to sub-agents, and augment daily life.

## Table of Contents

1. [Vision & Definition](#1-vision--definition)
2. [What Has Been Tried](#2-what-has-been-tried)
3. [Architectural Patterns](#3-architectural-patterns)
4. [Proactive vs. Reactive Agents](#4-proactive-vs-reactive-agents)
5. [Memory & Personalization for Long-Running Agents](#5-memory--personalization-for-long-running-agents)
6. [User Interaction Patterns](#6-user-interaction-patterns)
7. [What Works & What Fails](#7-what-works--what-fails)
8. [Creative Use Cases for a Life Augmentation Agent](#8-creative-use-cases-for-a-life-augmentation-agent)
9. [Privacy, Security & Trust](#9-privacy-security--trust)
10. [Recommended Architecture](#10-recommended-architecture)
11. [Key Papers & References](#11-key-papers--references)
12. [User Sentiment & Real-World Validation](#12-user-sentiment--real-world-validation)
13. [Audio Lifelogging & Continuous Transcription](#13-audio-lifelogging--continuous-transcription)

---

## 1. Vision & Definition

A **life augmentation agent** is a persistent, cloud-hosted AI system that runs continuously (or is awakened by triggers), maintains a model of its user's life, and proactively delegates tasks to specialized sub-agents to improve the user's daily experience.

Unlike a reactive chatbot that waits for prompts, this agent:

- **Runs indefinitely** in the background (cloud-native, always-on)
- **Proactively initiates** actions without explicit user commands
- **Delegates work** to specialized sub-agents (research, scheduling, monitoring, communication)
- **Maintains long-term memory** of user preferences, history, and context
- **Learns and adapts** from user feedback over time
- **Interacts minimally** — surfacing only what matters, when it matters

This is the "user-centric agent" vision articulated by Zhang et al.
(2026): *"The future of digital services should shift from a platform-centric to a user-centric agent... prioritizing privacy, aligning with user-defined goals, and granting users control over their preferences and actions."* (arXiv:2602.15682)

---

## 2. What Has Been Tried

### 2.1 Academic Research (2023–2026)

The idea of proactive, autonomous agents has exploded in academic research, particularly from late 2024 onward:

**Foundational Work:**

- **Generative Agents** (Park et al., 2023) — Simulated 25 agents with day-long autonomy in a Smallville sandbox. Demonstrated memory retrieval, reflection, and planning over extended timeframes. Showed that believable long-running behavior is possible but requires careful memory architecture.
- **Voyager** (Wang et al., 2023) — Lifelong learning agent in Minecraft. Demonstrated skill accumulation over 100+ hours, with a persistent skill library. Key insight: agents can self-improve if they maintain a growing capability repertoire.

**Proactive Agent Research (2024–2026):**

- **Proactive Agent** (Lu et al., 2024) — First systematic approach to shifting LLMs from reactive to proactive. Created ProactiveBench (6,790 events). Found that fine-tuned models achieve 66.47% F1 in proactive assistance — *meaning even the best models fail to anticipate user needs ~1/3 of the time*.
- **Ask-before-Plan** (Zhang et al., 2024, EMNLP) — Showed that proactive clarification before action dramatically improves planning quality. The Clarification-Execution-Planning (CEP) framework with 3 specialized agents outperforms monolithic approaches.
- **ContextAgent** (Yang et al., 2025, NeurIPS) — First context-aware proactive agent using wearable sensory data. 8.5% higher accuracy in proactive predictions vs baselines. Key: multi-dimensional context extraction from video/audio on AR glasses.
- **ProPerSim** (Kim et al., 2025, ICLR 2026) — Proactive AND personalized assistant simulation.
  ProPerAssistant uses retrieval-augmented preference-aligned learning that continually adapts via user ratings. Key finding: combining proactivity + personalization is more than additive.
- **BAO** (Yao et al., 2026) — Behavioral Agentic Optimization via RL. Addresses the critical autonomy-satisfaction tradeoff: overly proactive agents annoy users. BAO's behavior regularization suppresses redundant interactions.
- **IntentRL** (Luo et al., 2026) — Trains proactive agents to clarify latent user intents before long-running research tasks. Addresses the "autonomy-interaction dilemma": high autonomy on ambiguous queries leads to prolonged execution with unsatisfactory outcomes.
- **ProAgentBench** (Tang et al., 2026) — 28,000+ events from 500+ hours of real user sessions. Key finding: *real-world data shows bursty interaction patterns (B=0.787) completely absent in synthetic data*. Models trained on synthetic data fail in production.

### 2.2 Industry & Open-Source Projects

**AutoGPT / BabyAGI (2023):** The original "run an LLM in a loop forever" experiments. AutoGPT set a high-level goal ("grow a business") and looped GPT-4 with tools. Key learnings:

- **Rapid context degradation**: After 10-20 steps, the agent would lose track of its original goal, hallucinate sub-tasks, or loop endlessly between the same actions
- **No principled stopping criteria**: Agents couldn't determine when they were "done" or when to yield to humans
- **Cost explosion**: Continuous LLM invocation without intelligent routing consumed enormous API budgets
- **Compounding errors**: Each LLM call had some error rate; over many steps, errors compounded catastrophically
- As Latent.Space summarized: *"AutoGPTs have continuous modes which are fully autonomous but very likely to go wrong and therefore have to be closely monitored"*

**Lindy.ai (2024–present):** A production personal AI assistant that manages inbox, meetings, and calendar.
Focuses on narrow, well-defined tasks rather than open-ended autonomy. Their approach: tightly scoped agents for specific workflows (email triage, meeting prep, calendar optimization). Reports 400K+ users. Key insight: *narrow scope + high reliability > broad ambitions + frequent failures*.

**Replit Agent (2024–2025):** Long-running coding agent for persistent development sessions. LangGraph-backed with checkpointing. Michele Catasta (VP AI): *"It's easy to build the prototype of a coding agent, but deceptively hard to improve its reliability."* They invested heavily in fine-grained control flow rather than pure LLM autonomy.

**Rabbit R1 / Humane AI Pin (2024):** Hardware-based "life augmentation" devices. Both launched to enormous hype and largely failed. Key lessons:

- **Latency kills**: Users expect sub-second responses for frequent interactions. Cloud-hosted LLM inference is too slow for "always-on" use
- **Battery life**: Continuous sensing + cloud communication drains batteries within hours
- **UI/UX mismatch**: Ambient AI assistants need fundamentally different interaction patterns than chat interfaces
- **Overpromise, underdeliver**: Marketing implied general intelligence; reality was narrow, unreliable capabilities

**12-Factor Agents (HumanLayer, 2025):** Practical engineering guide from Dex Horthy, who talked to 100+ SaaS builders.
Key insights for long-running agents:

- **Factor 5: Unify execution state and business state** — Agent state must be durable and inspectable
- **Factor 6: Launch/Pause/Resume** — Agents must be pausable and resumable (critical for long-running tasks)
- **Factor 7: Contact humans with tool calls** — Human interaction is just another tool, not a special case
- **Factor 10: Small, focused agents** — Monolithic agents fail; specialized agents composed together succeed
- **Factor 11: Trigger from anywhere** — Agents should respond to cron jobs, webhooks, user messages alike
- **Factor 12: Stateless reducer** — Each agent invocation should be a pure function of its context

**LangGraph Cloud (LangChain, 2024):** Purpose-built infrastructure for deploying long-running agents. Provides:

- Horizontally-scaling task queues with Postgres checkpointer
- Background jobs for long-running tasks (polling or webhook completion)
- Cron jobs for scheduled tasks
- Double-texting handling (managing new user input on running threads)
- Time-travel debugging (inspect, edit, resume past states)

---

## 3. Architectural Patterns

### 3.1 Event Sourcing for Agents (ESAA)

The ESAA architecture (dos Santos Filho, 2026, arXiv:2602.23193) separates **cognitive intention** from **state mutation**:

```
Agent emits structured intentions (JSON)
        ↓
Deterministic orchestrator validates
        ↓
Persists events in append-only log (activity.jsonl)
        ↓
Applies effects (file writes, API calls)
        ↓
Projects verifiable materialized view (roadmap.json)
```

Key properties:

- **Immutability**: Completed task events are immutable and hash-verified
- **Replay**: Full trajectory can be replayed from the event log
- **Forensic traceability**: Every decision has a provenance chain
- **Multi-agent**: Demonstrated with 4 concurrent agents, different LLMs, 50 tasks
- **Separation of concerns**: LLM does thinking, deterministic code does execution

This maps almost directly to Temporal.io-style durable workflows and is enormously relevant for a life augmentation agent that needs reliability over days/weeks.

### 3.2 Orchestrator-Workers with Task Queues

From the Anthropic "Building Effective Agents" guide (2024):

- Central orchestrator LLM breaks down tasks and delegates to worker LLMs
- Workers are specialized for specific domains (research, scheduling, communication)
- Results synthesized by orchestrator
- Key: this pattern is "well-suited for complex tasks where you can't predict the subtasks needed"

For a life agent, the orchestrator maintains a persistent task queue:

```
User's life context → Orchestrator → Priority queue → Worker agents → Results → User
                      ↑                                              |
                      └────────── feedback loop ─────────────────────┘
```

### 3.3 Declarative Workflow Orchestration

Daunis (2025, arXiv:2512.19769) showed that most agent workflows can be expressed through a unified DSL rather than imperative code. At PayPal scale: 60% reduction in development time, 3x deployment velocity. Complex workflows expressed in <50 lines of DSL vs 500+ lines of imperative code.
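
As a rough illustration of the declarative style (not the DSL from Daunis, 2025), the Python sketch below describes workflows as plain data and leaves scheduling and retries to a small generic runtime. The `WORKFLOWS` schema, its field names, and the `due_workflows` and `run_workflow` helpers are all hypothetical.

```python
# Hypothetical declarative specs: each workflow says what to run and when,
# but contains no control flow of its own.
WORKFLOWS = [
    {
        "name": "morning_briefing",
        "trigger": {"cron_hour": 7},
        "steps": ["calendar", "email", "weather"],
        "max_retries": 2,
    },
    {
        "name": "expense_tracking",
        "trigger": {"cron_hour": 20},
        "steps": ["fetch_transactions", "categorize"],
        "max_retries": 1,
    },
]


def due_workflows(hour, workflows=WORKFLOWS):
    """Return the names of workflows whose trigger matches the given hour."""
    return [w["name"] for w in workflows if w["trigger"]["cron_hour"] == hour]


def run_workflow(spec, handlers):
    """Run each step in order, retrying a failing step up to max_retries times."""
    results = {}
    for step in spec["steps"]:
        for attempt in range(spec["max_retries"] + 1):
            try:
                results[step] = handlers[step]()
                break
            except Exception as exc:
                # Record the failure only once the retry budget is exhausted.
                if attempt == spec["max_retries"]:
                    results[step] = f"failed: {exc}"
    return results


# Example handlers (hypothetical); the weather step fails once, then succeeds.
calls = {"weather": 0}


def flaky_weather():
    calls["weather"] += 1
    if calls["weather"] == 1:
        raise RuntimeError("timeout")
    return "sunny"


handlers = {
    "calendar": lambda: "2 meetings",
    "email": lambda: "3 unread",
    "weather": flaky_weather,
}
briefing = run_workflow(WORKFLOWS[0], handlers)
```

With this shape, adding a recurring workflow means appending one entry to `WORKFLOWS`; the retry logic lives in one place and is not repeated per workflow.
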
For life augmentation: define recurring agent workflows declaratively (morning briefing, email triage, expense tracking) and let the runtime handle scheduling, retry, and monitoring.

### 3.4 Internet of Agentic AI

Yang & Zhu (2026, arXiv:2602.03145) propose a framework where autonomous, heterogeneous agents across cloud and edge dynamically form coalitions for task-driven workflows. Key concepts:

- **Minimum-effort coalition selection**: Find the smallest group of agents to accomplish a task
- **Network locality**: Prefer agents close to the data/user
- **Economic implementability**: Incentive-compatible coordination
- **MCP as coordination layer**: Uses Model Context Protocol above

### 3.5 Peak-Aware Long-Horizon Orchestration

APEMO (Shi & DiFranzo, 2026, arXiv:2602.17910) introduces temporal-affective scheduling:

- Detects trajectory instability through behavioral proxies
- Targets computational resources at **peak moments** (critical decisions) and **endings** (final user-facing output)
- Under fixed compute budgets, APEMO consistently enhances trajectory-level quality
- Reframes alignment as a **temporal control problem** — critical for agents running over hours/days

---

## 4. Proactive vs. Reactive Agents

This is perhaps the most critical dimension for a life augmentation agent. A reactive agent waits for "Hey, do X." A proactive agent notices you need X before you ask.

### 4.1 The Proactivity Spectrum

From the research, three levels emerge:

| Level | Description | Example |
|-------|-------------|---------|
| **Reactive** | Responds only to explicit requests | "Schedule a meeting with Bob" |
| **Suggestion-based** | Notices patterns and suggests actions | "You usually meet Bob on Tuesdays — should I schedule?" |
| **Fully proactive** | Acts autonomously based on context | Notices Bob emailed about Q2 planning, cross-references your calendar, schedules meeting, drafts agenda |

### 4.2 The Autonomy-Satisfaction Tradeoff

BAO (2026) identifies a critical Pareto frontier: **more proactive ≠ better**. Overly proactive agents that constantly interrupt with suggestions are *worse* than reactive ones. The sweet spot requires:

- **Timing prediction**: When to intervene (ProAgentBench found bursty patterns matter)
- **Relevance filtering**: Not every observed intent deserves action
- **Cost-of-interruption modeling**: ProMemAssist (2025, UIST) models the user's working memory to balance assistance value vs. interruption cost

### 4.3 Knowledge Gap Navigation

PROPER (Kaur et al., 2026) introduces a two-agent architecture:

1. **Dimension Generating Agent (DGA)**: Identifies implicit dimensions the user hasn't considered
2. **Response Generating Agent (RGA)**: Balances explicit needs with discovered implicit needs

The insight: proactive agents should address needs the user *doesn't know they have*, not just automate what they've already expressed. This achieved 84% quality gains in single-turn evaluations.

### 4.4 The Advisor/Coach/Delegate Paradigm

Zhu et al. (2026, arXiv:2602.12089) ran a 243-person behavioral experiment comparing:

- **Advisor**: Proactive recommendations
- **Coach**: Reactive feedback on user's actions
- **Delegate**: Autonomous execution

Key finding: **users preferred the Advisor but achieved the highest gains with the Delegate**. This "preference-performance misalignment" is critical: users *want* to feel in control but *benefit* from delegation. Moreover, delegation created positive externalities — even non-adopters in delegate groups received higher-quality offers.

Implication for life agents: default to delegation for high-confidence tasks, with transparent reporting. Allow users to override, but don't make them drive every decision.

---

## 5. Memory & Personalization for Long-Running Agents

### 5.1 O-Mem: Active User Profiling

O-Mem (Wang et al., 2025, arXiv:2511.13593) is a memory framework based on **active user profiling**:

- Dynamically extracts and updates user characteristics from interactions
- Hierarchical retrieval of persona attributes and topic-related context
- Achieves SOTA on LoCoMo (51.67%) and PERSONAMEM (62.99%) benchmarks
- Key: avoids semantic-grouping-before-retrieval pitfall that loses critical but semantically distant user info

### 5.2 AMemGym: Interactive Memory Benchmarking

AMemGym (Cheng et al., 2026, ICLR) reveals that existing memory systems (RAG, long-context LLMs, agentic memory) all have significant gaps in long-horizon conversations. Key insight: **on-policy evaluation** (interactive) reveals failures invisible in off-policy (static) evaluation.

### 5.3 Working Memory Modeling for Proactive Timing

ProMemAssist (Pu et al., 2025, UIST) models the user's **cognitive working memory** in real-time:

- Represents perceived information as memory items and episodes
- Uses encoding mechanisms (displacement, interference) from cognitive psychology
- Timing predictor balances assistance value vs. interruption cost
- In user studies: more selective assistance, higher engagement vs. LLM-only baseline

### 5.4 The Personalized Agent Survey

The comprehensive survey by Xu et al. (2026, arXiv:2602.22680) identifies four interdependent personalization components:

1. **Profile modeling**: Explicit (stated preferences) + implicit (behavioral signals)
2. **Memory**: Short-term working memory + long-term episodic/semantic memory
3. **Planning**: User-adapted goal decomposition and strategy selection
4. **Action execution**: Personalized tool selection and output formatting

Key trade-off: **deeper personalization requires more user data**, creating tension with privacy.

---

## 6. User Interaction Patterns

### 6.1 Notification Fatigue

The biggest failure mode for proactive agents.
From the research:

- ProAgentBench shows real user sessions have bursty patterns (B=0.787) — users interact in concentrated bursts, then go silent
- During silent periods, proactive interruptions are maximally disruptive
- BAO's behavior regularization specifically penalizes "redundant interactions"
- Solution: **preference-aligned timing** — learn when the user is receptive

### 6.2 Trust Calibration

From "Choose Your Agent" (2026):

- Users systematically under-trust delegation despite its superior outcomes
- Trust builds through **transparency** (showing reasoning) and **track record** (successful completions)
- "Adoption-compatible interaction rules are a prerequisite to improving human welfare"
- Start with low-stakes delegation (weather reminders, meeting summaries), build to high-stakes (financial decisions, communication on behalf)

### 6.3 The Right Interface

From Egocentric Co-Pilot (2026, WWW):

- Smart glasses + multimodal input (speech + gaze) enables "always-on assistive layer over daily life"
- But introduces latency requirements: cloud offloading adds ~200ms+ vs. on-device inference
- Hierarchical Context Compression supports long-horizon QA over continuous first-person video

From "The Next Paradigm" (2026):

- Desktop/mobile app with proactive notifications + on-demand deep interaction
- Device-cloud pipeline: lightweight on-device model for urgency triage, cloud model for complex reasoning
- User should control preference dials (proactivity level, domains to manage, notification frequency)

### 6.4 Human-in-the-Loop as a Tool

From 12-Factor Agents: "Contact humans with tool calls" — the agent should model human interaction exactly like any other tool:

```
contact_human(
    channel="push_notification",
    message="I found a cheaper flight for your trip to NYC next week. Should I rebook?",
    urgency="medium",
    timeout="4h",
    fallback="keep_current_booking"
)
```

If the human doesn't respond within the timeout, the agent takes the fallback action.
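
The timeout-and-fallback behavior can be sketched in a few lines of Python. This is a minimal illustration, not the 12-Factor Agents or HumanLayer API: the `responses` queue is a hypothetical stand-in for whatever channel delivers the user's reply, and the timeout is given in seconds rather than as a string like `"4h"`.

```python
import queue


def contact_human(message, responses, timeout_s, fallback):
    """Ask the human and wait at most timeout_s seconds for a reply.

    `responses` is a hypothetical stand-in for the reply channel
    (push notification, chat, email). Returns the reply, or `fallback`
    if nothing arrives in time.
    """
    print(f"[notify] {message}")  # placeholder for a real delivery call
    try:
        return responses.get(timeout=timeout_s)
    except queue.Empty:
        return fallback


# No reply arrives: the agent proceeds with the fallback action.
silent_channel = queue.Queue()
decision_without_reply = contact_human(
    "I found a cheaper flight. Should I rebook?",
    silent_channel,
    timeout_s=0.05,
    fallback="keep_current_booking",
)

# A reply is already waiting: it takes precedence over the fallback.
answered_channel = queue.Queue()
answered_channel.put("rebook")
decision_with_reply = contact_human(
    "I found a cheaper flight. Should I rebook?",
    answered_channel,
    timeout_s=0.05,
    fallback="keep_current_booking",
)
```
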
This makes human-in-the-loop a first-class, timeout-aware capability rather than a blocking dependency.

---

## 7. What Works & What Fails

### 7.1 What Works

| Category | Examples | Why It Works |
|----------|----------|-------------|
| **Narrow, well-defined tasks** | Email triage, meeting prep, expense categorization | Clear success criteria, bounded scope, verifiable output |
| **Information synthesis** | Morning briefings, research summaries, news digests | LLMs excel at summarization; failures are low-cost |
| **Schedule optimization** | Calendar management, reminder timing, travel planning | Structured data, clear constraints, measurable outcomes |
| **Monitoring & alerting** | Price tracking, deadline monitoring, health metric trends | Event-driven triggers, simple decision logic |
| **Routine automation** | Form filling, template generation, data entry | Repetitive tasks with clear patterns |

### 7.2 What Fails

| Category | Why It Fails | Reference |
|----------|-----------|-----------|
| **Open-ended goal pursuit** | Compounding errors over long horizons; context degradation | AutoGPT postmortems |
| **High-stakes autonomy** | Users don't trust AI for financial/legal/medical decisions | Choose Your Agent (2026) |
| **Social communication** | Nuance, tone, relationship context are hard to model | EchoGuard (2026) |
| **Real-time continuous sensing** | Battery, latency, cost constraints for always-on processing | Rabbit R1, Humane AI Pin failures |
| **Unbounded task decomposition** | LLMs hallucinate sub-tasks, struggle with novel problem structures | ProAgentBench real-world data |
| **Cross-modal understanding** | Combining calendar + email + location + health data coherently | ContextAgent (2025) improvements still limited |

### 7.3 Critical Failure Patterns

1. **The Infinite Loop**: Agent creates task → fails → creates retry task → fails → ... Without proper stopping conditions, agents burn through budgets. Solution: circuit breakers, max iteration limits, exponential backoff.
2. **Context Window Exhaustion**: Long-running agents accumulate history. Eventually the context window fills and the agent "forgets" its goals. Solution: hierarchical memory with compression (Active Context Compression, 2026), event sourcing with materialized views (ESAA).
3. **Preference Drift**: User preferences change over time. An agent trained on January behavior may be wrong by March. Solution: continuous adaptation with recency weighting (O-Mem, ProPerSim).
4. **The 80% Problem**: From 12-Factor Agents — most agent frameworks get you to 80% quality, but the last 20% requires abandoning the framework and building custom. Solution: own your prompts, own your control flow, build on primitives not frameworks.
5. **Notification Fatigue Spiral**: Proactive agent sends too many suggestions → user ignores them → agent interprets silence as neutral → continues sending → user disables agent entirely. Solution: explicit feedback channels, decreasing proactivity in response to non-engagement.

---

## 8. Creative Use Cases for a Life Augmentation Agent

### 8.1 The "Morning Protocol" Agent

Every morning, the agent:

1. Scans today's calendar, identifies preparation needs
2. Checks email for anything requiring pre-meeting context
3. Reviews weather + commute for schedule impacts
4. Surfaces any overnight deadline changes or urgent messages
5. Generates a personalized briefing pushed to phone/watch

Delegate agents: CalendarAgent, EmailTriageAgent, WeatherAgent, CommutePlanner

### 8.2 The "Background Research" Agent

User mentions interest in a topic (via conversation, browsing, or explicit request):

1. Agent queues a deep research task
2. ResearchAgent searches academic papers, blogs, news
3. Runs over hours/days, building a knowledge graph
4. Synthesizes findings into a briefing document
5. Proactively surfaces when user next has free time

This is exactly the IntentRL (2026) pattern — clarify intent first, then execute autonomously.

### 8.3 The "Financial Health" Agent

Continuous monitoring:

- Tracks spending against budget categories
- Monitors bills for price increases or better alternatives
- Watches subscriptions for unused services
- Alerts on unusual charges
- Monthly financial health summary

Requires: bank API integration, category classification, anomaly detection

### 8.4 The "Social Maintenance" Agent

The hardest and most nuanced use case:

- Tracks last contact date with important people
- Suggests reaching out when it's been too long
- Notices birthdays, life events from social media
- Drafts (but never sends without approval) personal messages
- Manages RSVP tracking

Key constraint: **never send on behalf without explicit approval**. This is the "advisor" pattern from Choose Your Agent — suggest, don't act.

### 8.5 The "Health & Wellness" Agent

- Monitors sleep patterns (via wearable data)
- Tracks medication schedules
- Suggests exercise based on calendar gaps
- Monitors mood patterns from interaction data (very carefully, with explicit consent)
- Connects health data to productivity patterns

ProMemAssist (2025) provides the cognitive model: intervene when the user's working memory is overloaded, not when they're in flow state.
### 8.6 The "Travel Optimization" Agent

For frequent travelers:

- Monitors prices for upcoming trips
- Manages loyalty programs and point optimization
- Handles rebooking during disruptions
- Creates location-aware travel guides
- Manages expense reports post-trip

### 8.7 The "Knowledge Worker Copilot" Agent

Always-on background support:

- Monitors project deadlines across tools (Jira, GitHub, email)
- Identifies tasks that are blocked or at risk
- Pre-fetches context before meetings (related documents, recent changes)
- Suggests task prioritization based on deadlines, impact, and dependencies
- Creates end-of-week summaries

### 8.8 The "Home Automation Intelligence Layer"

Layer on top of smart home:

- Learns household patterns (wake times, preferences, routines)
- Proactively adjusts based on calendar (early meeting → earlier alarm)
- Coordinates grocery lists with meal planning
- Manages maintenance reminders (HVAC filters, car service)
- Integrates weather forecasts with heating/cooling optimization

---

## 9. Privacy, Security & Trust

### 9.1 The Privacy Paradox

AgentScope (2026, arXiv:2603.04902) evaluates contextual privacy across agentic workflows accessing calendars, email, and personal files. Life augmentation agents need deep access to personal data, but:

- Users want personalization (requires data) but fear surveillance
- Cross-domain data aggregation (calendar + health + finance) creates sensitive profiles
- Current LLM providers process data on external servers

### 9.2 User-Centric Architecture

From "The Next Paradigm" (2026):

- **On-device processing** for sensitive data (triage, classification)
- **Cloud processing** only for complex reasoning, with minimal data transmission
- **User-controlled data policies**: what data the agent can access, retain, and share
- **Audit logs**: every data access and decision is traceable (cf.
  ESAA event sourcing)

### 9.3 Trust Hierarchy

For a life agent, different actions require different trust levels:

| Trust Level | Actions | Control |
|------------|---------|---------|
| **Full autonomy** | Reading calendar, checking weather, monitoring prices | No approval needed |
| **Notify & act** | Sending standard replies, rescheduling non-critical meetings | Notification + undo window |
| **Ask & act** | Sending personal messages, making purchases, changing plans | Explicit approval required |
| **Never autonomous** | Financial transactions above threshold, medical decisions, legal actions | Human must initiate |

---

## 10. Recommended Architecture

Based on all the research, here's a synthesized architecture for a life augmentation agent:

### 10.1 Core Components

```
┌─────────────────────────────────────────────────────┐
│                   Life Agent Core                   │
│                                                     │
│  ┌───────────┐   ┌────────────┐   ┌───────────────┐ │
│  │ Event Bus │──→│Orchestrator│──→│  Task Queue   │ │
│  │(triggers) │   │ (planner)  │   │(priority heap)│ │
│  └───────────┘   └────────────┘   └───────────────┘ │
│        ↑               │                  │         │
│        │               ↓                  ↓         │
│  ┌───────────┐   ┌────────────┐   ┌───────────────┐ │
│  │   User    │   │   Memory   │   │  Worker Pool  │ │
│  │ Interface │←──│  (O-Mem +  │   │ (sub-agents)  │ │
│  │ (multi-ch)│   │  profiles) │   │               │ │
│  └───────────┘   └────────────┘   └───────────────┘ │
│                        │                  │         │
│                        ↓                  ↓         │
│                ┌────────────────────────────┐       │
│                │ Event Store (append-only)  │       │
│                │ + State Snapshots          │       │
│                └────────────────────────────┘       │
└─────────────────────────────────────────────────────┘
```

### 10.2 Key Design Decisions

1. **Event-sourced state** (ESAA pattern): All agent actions recorded in append-only log. State reconstructable from events. Enables debugging, audit, and replay.
2. **Stateless orchestrator** (12-Factor #12): Each orchestrator invocation is a pure function of current state + new event. No hidden state. Enables horizontal scaling and crash recovery.
3. **Priority-based task queue**: Tasks have urgency, importance, and deadline. Orchestrator continuously re-prioritizes. Inspired by APEMO peak-aware scheduling.
4. **Multi-channel input/output**: Triggers from cron (morning briefing), webhooks (email arrival), user messages (chat), sensors (location change). Output via push notifications, email, chat, or ambient display.
5. **Specialized worker agents**: CalendarAgent, ResearchAgent, EmailAgent, FinanceAgent, HealthAgent. Each has narrow scope, specific tools, and measurable outcomes.
6. **Adaptive proactivity**: Use ProPerSim-style user modeling to learn intervention timing. Start conservative (suggestions only), build trust, increase autonomy over time.
7. **O-Mem-style memory**: Active user profiling with hierarchical retrieval. Short-term working memory (today's context), medium-term episodic memory (recent interactions), long-term semantic memory (stable preferences).
8. **Human-as-a-tool**: Human interaction modeled as a tool call with timeout and fallback (12-Factor #7). Never block on human response for time-sensitive tasks.

### 10.3 Technology Stack Recommendations

- **Orchestration**: Durable workflow engine (Temporal.io, Azure Durable Functions, or custom event-sourced loop)
- **Agent Framework**: Microsoft Agent Framework (for .NET) or LangGraph (for Python) — both support checkpointing and persistence
- **Memory Store**: Vector DB (Qdrant/Pinecone) for semantic retrieval + PostgreSQL for structured user profiles
- **Event Store**: Append-only log (PostgreSQL with immutable rows, or EventStoreDB)
- **Task Queue**: Redis-backed priority queue or cloud-native queue (Azure Service Bus, SQS)
- **Notifications**: Multi-channel push (FCM, APNs, email, Slack/Teams webhook)
- **LLM**: Tiered — fast/cheap model for triage/classification, powerful model for reasoning/planning
- **Monitoring**: OpenTelemetry for distributed tracing across agent invocations

### 10.4 Critical Engineering Principles

1. **Circuit breakers**: Max iterations per task, budget limits per day, error thresholds
2. **Graceful degradation**: If a worker agent fails, others continue operating
3. **Idempotent operations**: Every action can be safely retried
4. **Transparent reasoning**: Every decision should have a human-readable explanation
5. **Progressive trust**: Start with low-autonomy defaults, increase based on track record
6. **Cost accounting**: Track per-task LLM costs, optimize routing to smaller models where possible
7. **Offline-first**: Agent should handle network interruptions gracefully with local queuing

---

## 11. Key Papers & References

### 11.1 Papers Downloaded & Converted (Batch 6)

| Paper | arXiv | Key Contribution |
|-------|-------|-----------------|
| Proactive Agent | 2410.12361 | First systematic reactive→proactive shift; ProactiveBench |
| Ask-before-Plan | 2406.12639 | Proactive clarification + CEP framework (EMNLP 2024) |
| ContextAgent | 2505.14668 | Wearable sensory context for proactive agents (NeurIPS 2025) |
| ProAgent Sensory | 2512.06721 | End-to-end proactive system on AR glasses |
| ProPerSim | 2509.21730 | Proactive + personalized assistants (ICLR 2026) |
| BAO | 2602.11351 | Behavioral agentic optimization; autonomy-satisfaction Pareto |
| IntentRL | 2602.03468 | RL for proactive intent clarification in deep research |
| ProAgentBench | 2602.04482 | Real-world proactive assistance benchmark (28K+ events) |
| PROPER | 2601.09926 | Knowledge gap navigation for proactivity |
| O-Mem | 2511.13593 | Omni memory for personalized long-horizon agents |
| Personalized LLM Agents Survey | 2602.22680 | Comprehensive survey: profile, memory, planning, action |
| AMemGym | 2603.01966 | Interactive memory benchmarking for long-horizon (ICLR 2026) |
| ProMemAssist | 2507.21378 | Working memory modeling for proactive timing (UIST'25) |
| Egocentric Co-Pilot | 2603.01104 | Always-on smart glasses agent (WWW 2026) |
| Next Paradigm: User-Centric | 2602.15682 | User-centric agent vs platform-centric service |
| Choose Your Agent | 2602.12089 | Advisor vs Coach vs Delegate tradeoffs (N=243) |
| ESAA | 2602.23193 | Event sourcing architecture for autonomous agents |
| Alignment in Time | 2602.17910 | Peak-aware orchestration for long-horizon systems |
| Internet of Agentic AI | 2602.03145 | Distributed agent coalition formation |
| Declarative Agent Workflows | 2512.19769 | DSL for agent workflow orchestration (PayPal scale) |

### 11.2 Blog Posts & Practical Resources

| Resource | Key Takeaway |
|----------|-------------|
| [Anthropic: Building Effective Agents](https://www.anthropic.com/engineering/building-effective-agents) | Start simple, use orchestrator-workers for complex tasks, agents for open-ended problems |
| [12-Factor Agents](https://github.com/humanlayer/12-factor-agents) | Stateless reducer pattern, pause/resume, human-as-tool, small focused agents |
| [LangGraph Cloud](https://blog.langchain.dev/langgraph-cloud/) | Task queues, cron, background jobs, time-travel debugging for long-running agents |
| [Latent.Space: Anatomy of Autonomy](https://www.latent.space/p/agents) | AutoGPT/BabyAGI dissection, 5 capability levels, self-driving car analogy |
| Lindy.ai | Production personal assistant — narrow scope + high reliability wins |

### 11.3 Papers Already in Collection (Relevant)

- `generative-agents-stanford-2023` — Long-running agent simulation patterns
- `voyager-lifelong-learning-2023` — Lifelong skill accumulation
- `memgpt-llm-operating-system-2023` — OS-like memory management for agents
- `caveagent-stateful-runtime-2026` — Stateful agent runtime with persistent containers
- `ariadnemem-lifelong-memory-2026` — Lifelong episodic-semantic memory
- `anatomy-agentic-memory-2026` — Comprehensive memory architecture taxonomy
- `active-context-compression-2026` — Autonomous memory management
- `agentic-ai-frameworks-protocols-2025` — Architecture & protocol landscape

---

## 12. User Sentiment & Real-World Validation

> Added March 2026.
Synthesized from 11 papers with real user studies plus web sources (Reddit, product reviews). Full analysis: `blueprints/life-agent/docs/user-sentiment-research.md`.

### 12.1 The Preference-Performance Paradox

The single most important finding across all research: **users systematically prefer the interaction mode that gives them worse outcomes.**

Choose Your Agent (2026, N=243, 6,561 trading decisions):

- 44% preferred Advisor ("show me options, I'll decide")
- 19.3% preferred Delegate ("just do it for me")
- 21.4% preferred **no AI at all**
- Yet Delegate produced statistically better economic outcomes (β=0.084, p=.034)
- Users who preferred AI reported 20% higher mental effort than autonomy-seekers (p<.01); they willingly accepted cognitive load for perceived control

This means life agent designers face a fundamental tension: the optimal interaction pattern (delegation) is the one users resist most. The solution is progressive trust: start with advisory, then earn the right to delegate through demonstrated reliability.

### 12.2 The "Less Is More" Principle (Empirically Proven)

ProMemAssist (UIST 2025, N=12): delivered **60% fewer messages** while achieving **2.6× higher positive engagement** and lower frustration (NASA-TLX: 2.32 vs 3.14, p<.05). The mechanism: modeling the user's working memory load in real time and deferring messages when cognitive capacity is low.

ProPerSim (ICLR 2026, 32 personas): the system started at 24 recommendations/hour and **naturally converged to ~6/hour** through preference learning. This emergent "learning to be quiet" behavior improved satisfaction from 2.2/4 to 3.3/4 over 14 days.

BAO (CMU/Salesforce/MIT): without behavior regularization, the agent's User Involvement Rate (UR) shoots from 0.21 to **0.91**, meaning agents pester users 91% of the time. With regularization (information-seeking + over-thinking penalties), UR drops to 0.2148.

**Design principle**: Max 6 proactive notifications per hour. Decrease on non-engagement.
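
The notification cap in this design principle can be sketched as a sliding one-hour budget that shrinks when the user ignores a notification. This is a minimal illustration, not a method from the cited papers: the class name `NotificationBudget`, the floor of one notification per hour, and the halve-on-ignore / add-one-on-engage policy are all assumptions made for the example. Only the ceiling of 6 per hour comes from the text.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class NotificationBudget:
    """Sliding one-hour cap on proactive notifications that shrinks when ignored."""

    max_per_hour: int = 6  # ceiling taken from the design principle
    min_per_hour: int = 1  # hypothetical floor so the agent is never fully muted
    allowance: int = field(default=0, init=False)
    sent: deque = field(default_factory=deque, init=False)  # send times, in seconds

    def __post_init__(self) -> None:
        # Start at the ceiling; feedback moves the allowance down or back up.
        self.allowance = self.max_per_hour

    def may_send(self, now: float) -> bool:
        # Forget sends that are at least an hour old, then check the allowance.
        while self.sent and now - self.sent[0] >= 3600:
            self.sent.popleft()
        return len(self.sent) < self.allowance

    def record_send(self, now: float) -> None:
        self.sent.append(now)

    def record_feedback(self, engaged: bool) -> None:
        # Hypothetical policy: halve on non-engagement, regain one slot on engagement.
        if engaged:
            self.allowance = min(self.max_per_hour, self.allowance + 1)
        else:
            self.allowance = max(self.min_per_hour, self.allowance // 2)


budget = NotificationBudget()
budget.record_feedback(engaged=False)  # the user ignored a notification
print(budget.allowance)                # 3
```

Backing off multiplicatively and recovering additively is one way to make the agent quiet down quickly and regain its voice slowly; other schedules would fit the principle equally well.
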
The cost of a false positive (unnecessary interruption) exceeds the cost of a false negative (missed opportunity to assist).

### 12.3 Trust Fragility

Trust is asymmetric, with one failure outweighing many successes:

- ProMemAssist participant P5 indicated that a single unhelpful suggestion could **break trust with the entire system**
- Trust was influenced more by quality of assistance than timing of delivery
- Algorithm aversion (the tendency to abandon AI after observing even small errors) significantly reduces adoption
- "Poorly calibrated proactivity can undermine user trust, agency, and interaction quality" (PROPER, 2026)

**Design principle**: Optimize all proactive features for precision over recall. Don't ship a feature that fails >20% of the time.

### 12.4 Real Product Outcomes

| Product | Result | Lesson |
|---------|--------|--------|
| **Lindy.ai** | 400K+ users | Narrow scope + high reliability beats broad ambitions |
| **Rabbit R1** | Failed basic tasks (wrong location for weather, CAPTCHA loops for restaurant booking) | Hardware + breadth of capability doesn't compensate for unreliable execution |
| **AutoGPT/BabyAGI** | Rapid context degradation after 10-20 steps | Unbounded autonomy without grounding fails catastrophically |
| **Replit Agent** | "Easy to build the prototype, deceptively hard to improve reliability" | The demo-to-production gap is enormous |
| **Smart glasses** (various) | RayNeo X2 Pro: ~20 min battery in continuous mode | Always-on hardware is bottlenecked by power, not by AI capability |

### 12.5 What Real Users Want (Direct Evidence)

On Reddit (r/artificial, 2025), a neurodivergent parent posted:

> *"I struggle with staying on top of daily tasks. I'd love to find an AI assistant that can: Talk to me, not just respond when I ask. Give me reminders and nudges, even when I'm distracted. Help manage tasks, routines, and my health needs (I'm autistic/ADHD). Adapt to my life as a parent."*

No commenter had a real solution.
Other Reddit threads (r/singularity) reported teams being cut from 50→30 people using AI agents, editorial staff of 17 laid off, software engineers shifting to "supervising AI", but all in workplace automation, not personal life augmentation.

**The gap**: workplace AI agent adoption is accelerating; personal life agent adoption has no viable product yet.

### 12.6 Surprising Empirical Findings

1. **Delegation benefits non-users**: non-users in Delegate groups had 21.6% higher surplus than baseline groups; AI improved outcomes for people who didn't even use it (market spillover)
2. **Introverts benefit most** from personalized AI (larger improvement than extraverts in ProPerSim)
3. **Chain-of-thought hurts proactive timing**: CoT prompting degrades performance on proactive prediction; recall dropped from 94.4% to 17.1% in one model (ProAgentBench)
4. **Humans are bursty, LLMs aren't**: real interaction burstiness B=0.787 vs LLM-simulated B=0.166 (ProAgentBench), so any LLM-generated training data misses this entirely
5. **Knowing when > knowing what**: timing prediction 64.4% accurate, content prediction maxes at 30.5% semantic similarity; solve timing first, content second
6. **Explicit feedback required**: ProPerSim showed implicit signals alone offer "limited benefit"; the agent must ask for ratings, not just observe behavior

---

## 13. Audio Lifelogging & Continuous Transcription

> Full research document: `knowledge-base/audio-lifelogging-research.md` (14 sections, 17 academic papers, extensive product/community analysis)

Audio lifelogging (always-on recording with live transcription, speaker attribution, and knowledge extraction) is the richest possible input for a life augmentation agent. It captures commitments, decisions, context, and social interactions at the moment they happen, with zero user effort.
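
As a rough illustration of what "capturing commitments" from a speaker-attributed transcript might involve, the sketch below scans utterances for a handful of cue phrases and turns each match into a candidate task. Everything here is an assumption made for the example: the `Utterance` record, the `COMMITMENT_CUE` phrase list, and the `extract_commitments` helper are hypothetical, and plain keyword matching is a deliberately naive stand-in for a real intent detector.

```python
import re
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str    # label from diarization, e.g. "me" or "Alex"
    start_s: float  # offset into the recording, in seconds
    text: str       # transcribed speech


# Hypothetical cue phrases; a real system would need a far richer detector.
COMMITMENT_CUE = re.compile(
    r"\b(?:i need to|i'll|i will|let's plan to|remind me to)\s+(.+)",
    re.IGNORECASE,
)


def extract_commitments(utterances: list[Utterance]) -> list[dict]:
    """Return one candidate task per utterance that contains a cue phrase."""
    tasks = []
    for u in utterances:
        match = COMMITMENT_CUE.search(u.text)
        if match:
            tasks.append({
                "speaker": u.speaker,
                "at_s": u.start_s,
                # Strip trailing punctuation left over from the transcript.
                "task": match.group(1).rstrip(" .!?"),
            })
    return tasks
```

Keeping the speaker and timestamp on each candidate lets a later step ask the user to confirm it or jump back to the surrounding conversation, which matters because keyword matching alone will flag many utterances that are not real commitments.
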
### 13.1 State of the Art (2024–2025)

The field has matured from academic concept to consumer product:

| Product/Tool | Form Factor | Status | Key Capability |
|---|---|---|---|
| **say** (u/8ta4) | macOS app + Electron | Open source, 2+ years daily use | Deepgram streaming → LLM structuring → queryable archive |
| **Omi** | BLE pendant (nRF/ESP32, ~$24) | Open source (MIT), 7.8k GitHub stars | Hardware + Flutter app + cloud pipeline |
| **Limitless** | Pendant | Acquired by Meta (2025) | Meeting transcription + speaker ID |
| **Plaud.ai** | Credit-card form factor | Shipping product | Business meeting transcription |
| **Bee** (Shopify) | Mobile app | Shipping product | Personal AI diary from conversations |

**Key technical components**: Deepgram nova-3 streaming ASR (~$0.0043/min), ECAPA-TDNN speaker embeddings (0.8% EER), Pyannote diarization (open source), Silero VAD (on-device filtering eliminates ~60% silence).

### 13.2 Critical Constraints

1. **iOS is not viable for always-on recording**: No persistent background audio entitlement, orange indicator dot, unpredictable app killing. Apple Watch battery cannot sustain 24/7 mic. A dedicated BLE pendant is the only reliable architecture.
2. **Noisy environments**: Transcription accuracy degrades significantly in crowds, showers, wind. No viable waterproof mic solution exists (IP67 housings kill mic sensitivity).
3. **Speaker attribution in the wild**: ECAPA-TDNN's 0.8% EER is measured in controlled conditions. Ambient audio with overlapping speakers, distance, and background noise reduces accuracy. Enrollment UX (10-second samples per contact) adds onboarding friction.
4. **Privacy/legal**: Recording laws vary by jurisdiction (one-party vs two-party consent). GDPR right to erasure conflicts with append-only architectures. Illinois BIPA applies to voiceprint biometrics. Privacy-by-design (transcribe → discard audio) is the minimum viable approach.
5. **Cost at scale**: ~$1/day for Deepgram with heavy use.
VAD pre-filtering is essential. Deepgram offers $200 free credit (~4 months runway).

### 13.3 Implications for Life Agents

Audio lifelogging transforms a life agent from reactive (user must type/speak commands) to truly ambient (agent passively absorbs context from all conversations):

- **Episodic memory becomes conversational memory**: The agent doesn't just remember tasks; it remembers what was *said*, by whom, in what context. "What did Alex say about the deadline?" becomes answerable.
- **Commitment capture**: Spoken "I need to..." or "Let's plan to..." automatically become tracked tasks (see scenario S60).
- **Social context**: The agent builds a knowledge graph of what different people care about, what they told you, and when. This enables social scenarios (gift suggestions, relationship maintenance) that were previously impossible without manual data entry.
- **Cognitive offloading**: The most consistent finding from daily users is that recording *reduces* cognitive load: people speak more freely when they know everything is captured and searchable.

### 13.4 Open Research Problems

1. **Waterproof/harsh-environment recording**: no solution exists
2. **Real-time commitment/intent detection** from ambient speech (high false-positive rate)
3. **Long-term conversational memory retrieval** at scale (15M+ words/year)
4. **Multi-language speaker diarization** (enrollment per language needed?)
5. **Emotional tone tracking** over time (Pierce & Mann, 2021): feasible but not yet integrated into any lifelogging product
6. **Consent UX**: how to gracefully handle two-party consent jurisdictions in always-on recording

### 13.5 Key Academic References

- Vemuri et al. (2006). iRemember: proactive information retrieval from personal audio
- Harvey et al. (2016). Audio lifelogging for event summarization
- Hodges et al.
(2006). SenseCam visual lifelogs for memory augmentation
- Lee & Dey (2008). Lifelogging memory appliance for automatic recording
- Pierce & Mann (2021). Affective lifelogging with multimodal sensing
- Desplanques et al. (2020). ECAPA-TDNN speaker verification

*See `knowledge-base/audio-lifelogging-research.md` §4 for full paper list (17 papers).*

---

*Document created: March 2026*
*Papers downloaded: 20 new papers (Batch 6)*
*Total papers in collection: 126*
*User sentiment research added: March 2026 (11 papers + web sources)*
*Audio lifelogging research added: March 2026 (17 papers + product/community analysis)*