<aside> <img src="/icons/verified_green.svg" alt="/icons/verified_green.svg" width="40px" /> About PAPER


This channel is dedicated to building and expanding a comprehensive paper database focused on the Web Agent field and the broader GUI agent field. Let’s collaborate to enrich this database and advance research in the exciting world of web agents!

</aside>

🔥 Newly updated (Oct)

New~Oct 28 |MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools

New~Oct 28 |OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents

New~Oct 28 |OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows

New~Oct 28 |[MGA: Memory-Driven GUI Agent for Observation-Centric Interaction](https://webagentlab.notion.site/MGA-Memory-Driven-GUI-Agent-for-Observation-Centric-Interaction-29bcc62ec9f180f5bfbdf9bdec6140dc)

New~Oct 27 |BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents

New~Oct 27 |Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents

New~Oct 27 |Code Aesthetics with Agentic Reward Feedback

New~Oct 26 |How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations

New~Oct 24 |LightAgent: Mobile Agentic Foundation Models

New~Oct 24 |DeepAgent: A General Reasoning Agent with Scalable Toolsets

New~Oct 24 |[ColorEcosystem: Powering Personalized, Standardized, and Trustworthy Agentic Service in massive-agent Ecosystem](https://webagentlab.notion.site/ColorEcosystem-Powering-Personalized-Standardized-and-Trustworthy-Agentic-Service-in-massive-agen-299cc62ec9f180e3aaeae00467e51ce7)

New~Oct 23 |UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning

New~Oct 22 |SCoPE VLM: Selective Context Processing for Efficient Document Navigation in Vision-Language Models

Oct 23 |GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments?

Oct 23 |Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models

Oct 22 |Surfer 2: The Next Generation of Cross-Platform Computer Use Agents

Oct 22 |WebGraphEval: Multi-Turn Trajectory Evaluation for Web Agents using Graph Representation

Oct 22 |ColorAgent: Building A Robust, Personalized, and Interactive OS Agent

Oct 22 |DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents

Oct 22 |See, Think, Act: Online Shopper Behavior Simulation with VLM Agents

Oct 22 |VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos

Oct 21 |WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality

Oct 20 |UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action

Oct 18 |Branch-and-Browse: Efficient and Controllable Web Exploration with Tree-Structured Reasoning and Action Memory

Oct 17 |CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMs

Oct 17 |Experience-Driven Exploration for Efficient API-Free AI Agents

Oct 17 |WEBSERV: A Browser-Server Environment for Efficient Training of Reinforcement Learning-based Web Agents at Scale

Oct 16 |GUIrilla: A Scalable Framework for Automated Desktop UI Exploration

Oct 16 |LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training

Oct 16 |Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents

Oct 16 |Agentic Entropy-Balanced Policy Optimization

Oct 15 |Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms

Oct 16 |ColorBench: Benchmarking Mobile Agents with Graph-Structured Framework for Complex Long-Horizon Tasks

Oct 16 |Hi-Agent: Hierarchical Vision-Language Agents for Mobile Device Control

Oct 14 |HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application Vulnerabilities

Oct 13 |AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model

Oct 13 |SusBench: An Online Benchmark for Evaluating Dark Pattern Susceptibility of Computer-Use Agents

Oct 11 |SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents

Oct 10 |Auto-scaling Continuous Memory for GUI Agent

Oct 10 |WARC-Bench: Web Archive Based Benchmark for GUI Subtask Executions

Oct 9 |Agent Learning via Early Experience

Oct 9 |Training-Free Group Relative Policy Optimization

Oct 9 |ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation

Oct 9 |Effective and Stealthy One-Shot Jailbreaks on Deployed Mobile Vision-Language Agents

Oct 9 |Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window

Oct 8 |MacroBench: A Novel Testbed for Web Automation Scripts via Large Language Models

Oct 7 |BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks

Oct 7 |A Survey on Agentic Security: Applications, Threats and Defenses

Oct 6 |ChartAgent: A Multimodal Agent for Visually Grounded Reasoning in Complex Chart Question Answering

Oct 6 |3Dify: a Framework for Procedural 3D-CG Generation Assisted by LLMs Using MCP and RAG

Oct 6 |Watch and Learn: Learning to Use Computers from Online Videos

Oct 6 |A Case for Declarative LLM-friendly Interfaces for Improved Efficiency of Computer-Use Agents

Oct 5 |GUI-Spotlight: Adaptive Iterative Focus Refinement for Enhanced GUI Visual Grounding

Oct 4 |PAL-UI: Planning with Active Look-back for Vision-Based GUI Agents

Oct 3 |WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents

Oct 3 |FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents

Oct 3 |Improving GUI Grounding with Explicit Position-to-Coordinate Mapping

Oct 3 |Agent-ScanKit: Unraveling Memory and Reasoning of Multimodal Agents via Sensitivity Perturbations

Oct 2 |Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

Oct 2 |Say One Thing, Do Another? Diagnosing Reasoning-Execution Gaps in VLM-Powered Mobile-Use Agents

Oct 1 |GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness
