<aside> About PAPER


This channel is dedicated to building and expanding a comprehensive paper database covering web agents and the broader field of GUI agents. Let’s collaborate to enrich this database and advance research in the exciting world of web agents!

</aside>

🔥 Newly updated (Sep)

New~Aug 29 |UItron: Foundational GUI Agent with Advanced Perception and Planning

New~Aug 29 |Morae: Proactively Pausing UI Agents for User Choices

New~Aug 27 |CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning

New~Aug 26 |Reliable Weak-to-Strong Monitoring of LLM Agents

Aug 27 |InquireMobile: Teaching VLM-based Mobile Agent to Request Human Assistance via Reinforcement Fine-Tuning

Aug 27 |SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control

Aug 26 |AppAgent-Pro: A Proactive GUI Agent System for Multidomain Information Integration and User Assistance

Aug 25 |PerPilot: Personalizing VLM-based Mobile Agents via Memory and Exploration

Aug 24 |DashboardQA: Benchmarking Multimodal Agents for Question Answering on Interactive Dashboards

Aug 23 |WebSight: A Vision-First Architecture for Robust Web Agents

Aug 22 |Structuring GUI Elements through Vision Language Models: Towards Action Space Generation

Aug 21 |Cybernaut: Towards Reliable Web Automation

Aug 18 |A Functionality-Grounded Benchmark for Evaluating Web Agents in E-commerce Domains

Aug 21 |Mobile-Agent-v3: Fundamental Agents for GUI Automation

Aug 20 |MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers

Aug 20 |Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Aug 19 |V2P: From Background Suppression to Center Peaking for Robust GUI Grounding Task

Aug 19 |ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents

Aug 18 |WebMall – A Multi-Shop Benchmark for Evaluating Web Agents

Aug 17 |You Don’t Know Until You Click: Automated GUI Testing for Production-Ready Software Evaluation

Aug 15 |UI-Venus Technical Report: Building High-performance UI Agents with RFT

Aug 15 |CRAFT-GUI: Curriculum-Reinforced Agent For GUI Tasks

Aug 14 |MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Aug 12 |FineState-Bench: A Comprehensive Benchmark for Fine-Grained State Control in GUI Agents

Aug 12 |BrowseMaster: Towards Scalable Web Browsing via Tool-Augmented Programmatic Agent Pair

Aug 12 |Quick on the Uptake: Eliciting Implicit Intents from Human Demonstrations for Personalized Mobile-Use Agents

Aug 12 |OpenCUA: Open Foundations for Computer-Use Agents

Aug 11 |WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Aug 11 |Browsing Like Human: A Multimodal Web Agent with Experiential Fast-and-Slow Thinking

Aug 8 |BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent

Aug 7 |Cognitive Duality for Adaptive Web Agents

Aug 7 |Test-Time Reinforcement Learning for GUI Grounding via Region Consistency

Aug 7 |InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

Aug 7 |Assimilation and Accommodation: Task-Adaptive Hierarchical Abstraction for Solving Web Tasks

Aug 7 |WinSpot: GUI Grounding Benchmark with Multimodal Large Language Models

Aug 6 |Beyond Pixels: Exploring DOM Downsampling for LLM-Based Web Agents

Aug 6 |Uncertainty-Aware GUI Agent: Adaptive Perception through Component Recommendation and Human-in-the-Loop Refinement

Aug 6 |GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning

Aug 6 |SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

Aug 6 |HarmonyGuard: Toward Safety and Utility in Web Agents via Adaptive Policy Enhancement and Dual-Objective Optimization

Aug 5 |CoAct-1: Computer-using Agents with Coding as Actions

Aug 4 |NaviMaster: Learning a Unified Policy for GUI and Embodied Navigation Tasks

Aug 3 |Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents

Aug 3 |A Survey of LLM-based Deep Search Agents: Paradigm, Optimization, Evaluation, and Challenges

Aug 2 |NatureGAIA: Pushing the Frontiers of GUI Agents with a Challenging Benchmark and High-Quality Trajectory Dataset

Aug 1 |Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training
