<aside> <img src="/icons/verified_green.svg" alt="/icons/verified_green.svg" width="40px" /> About PAPER
This channel is dedicated to building and expanding a comprehensive paper database covering web agents and the broader field of GUI agents. Let’s collaborate to enrich this database and advance research in the exciting world of web agents!
</aside>
🔥 Newly updated (Sep)
New~Aug 29 |UItron: Foundational GUI Agent with Advanced Perception and Planning
New~Aug 29 |Morae: Proactively Pausing UI Agents for User Choices
New~Aug 26 |Reliable Weak-to-Strong Monitoring of LLM Agents
Aug 27 |SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control
Aug 25 |PerPilot: Personalizing VLM-based Mobile Agents via Memory and Exploration
Aug 24 |DashboardQA: Benchmarking Multimodal Agents for Question Answering on Interactive Dashboards
Aug 23 |WebSight: A Vision-First Architecture for Robust Web Agents
Aug 22 |Structuring GUI Elements through Vision Language Models: Towards Action Space Generation
Aug 21 |Cybernaut: Towards Reliable Web Automation
Aug 18 |A Functionality-Grounded Benchmark for Evaluating Web Agents in E-commerce Domains
Aug 21 |Mobile-Agent-v3: Foundamental Agents for GUI Automation
Aug 20 |MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers
Aug 20 |Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Aug 19 |V2P: From Background Suppression to Center Peaking for Robust GUI Grounding Task
Aug 19 |ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
Aug 18 |WebMall – A Multi-Shop Benchmark for Evaluating Web Agents
Aug 17 |You Don’t Know Until You Click: Automated GUI Testing for Production-Ready Software Evaluation
Aug 15 |UI-Venus Technical Report: Building High-performance UI Agents with RFT
Aug 15 |CRAFT-GUI: Curriculum-Reinforced Agent For GUI Tasks
Aug 14 |MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Aug 12 |FineState-Bench: A Comprehensive Benchmark for Fine-Grained State Control in GUI Agents
Aug 12 |BrowseMaster: Towards Scalable Web Browsing via Tool-Augmented Programmatic Agent Pair
Aug 12 |OpenCUA: Open Foundations for Computer-Use Agents
Aug 11 |WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
Aug 11 |Browsing Like Human: A Multimodal Web Agent with Experiential Fast-and-Slow Thinking
Aug 8 |BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent
Aug 7 |Cognitive Duality for Adaptive Web Agents
Aug 7 |Test-Time Reinforcement Learning for GUI Grounding via Region Consistency
Aug 7 |InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization
Aug 7 |Assimilation and Accommodation: Task-Adaptive Hierarchical Abstraction for Solving Web Tasks
Aug 7 |WinSpot: GUI Grounding Benchmark with Multimodal Large Language Models
Aug 6 |Beyond Pixels: Exploring DOM Downsampling for LLM-Based Web Agents
Aug 6 |GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning
Aug 6 |SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Aug 5 |CoAct-1: Computer-using Agents with Coding as Actions
Aug 4 |NaviMaster: Learning a Unified Policy for GUI and Embodied Navigation Tasks
Aug 3 |Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents
Aug 3 |A Survey of LLM-based Deep Search Agents: Paradigm, Optimization, Evaluation, and Challenges
Aug 1 |Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training