<aside> <img src="/icons/verified_green.svg" alt="/icons/verified_green.svg" width="40px" /> About PAPER
This channel is dedicated to building and expanding a comprehensive paper database focused on the Web Agent field and the broader GUI agent field. Let’s collaborate to enrich this database and advance research in the exciting world of web agents!
</aside>
🔥 Newly updated (Oct)
New~Oct 28 |MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools
New~Oct 28 |OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents
New~Oct 28 |OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows
New~Oct 28 |[MGA: Memory-Driven GUI Agent for Observation-Centric Interaction](https://webagentlab.notion.site/MGA-Memory-Driven-GUI-Agent-for-Observation-Centric-Interaction-29bcc62ec9f180f5bfbdf9bdec6140dc)
New~Oct 27 |BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents
New~Oct 27 |Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents
New~Oct 27 |Code Aesthetics with Agentic Reward Feedback
New~Oct 26 |How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations
New~Oct 24 |LightAgent: Mobile Agentic Foundation Models
New~Oct 24 |DeepAgent: A General Reasoning Agent with Scalable Toolsets
New~Oct 24 |[ColorEcosystem: Powering Personalized, Standardized, and Trustworthy Agentic Service in massive-agent Ecosystem](https://webagentlab.notion.site/ColorEcosystem-Powering-Personalized-Standardized-and-Trustworthy-Agentic-Service-in-massive-agen-299cc62ec9f180e3aaeae00467e51ce7)
New~Oct 23 |UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning
New~Oct 22 |SCoPE VLM: Selective Context Processing for Efficient Document Navigation in Vision-Language Models
Oct 22 |Surfer 2: The Next Generation of Cross-Platform Computer Use Agents
Oct 22 |WebGraphEval: Multi-Turn Trajectory Evaluation for Web Agents using Graph Representation
Oct 22 |ColorAgent: Building A Robust, Personalized, and Interactive OS Agent
Oct 22 |DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents
Oct 22 |See, Think, Act: Online Shopper Behavior Simulation with VLM Agents
Oct 22 |VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos
Oct 21 |WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality
Oct 20 |UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action
Oct 17 |CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMs
Oct 17 |Experience-Driven Exploration for Efficient API-Free AI Agents
Oct 16 |GUIrilla: A Scalable Framework for Automated Desktop UI Exploration
Oct 16 |LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training
Oct 16 |Agentic Entropy-Balanced Policy Optimization
Oct 16 |Hi-Agent: Hierarchical Vision-Language Agents for Mobile Device Control
Oct 15 |Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms
Oct 14 |HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application Vulnerabilities
Oct 13 |AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model
Oct 13 |SusBench: An Online Benchmark for Evaluating Dark Pattern Susceptibility of Computer-Use Agents
Oct 11 |SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents
Oct 10 |Auto-scaling Continuous Memory for GUI Agent
Oct 10 |WARC-Bench: Web Archive Based Benchmark for GUI Subtask Executions
Oct 9 |Agent Learning via Early Experience
Oct 9 |Training-Free Group Relative Policy Optimization
Oct 9 |ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation
Oct 9 |Effective and Stealthy One-Shot Jailbreaks on Deployed Mobile Vision-Language Agents
Oct 9 |Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window
Oct 8 |MacroBench: A Novel Testbed for Web Automation Scripts via Large Language Models
Oct 7 |BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks
Oct 7 |A Survey on Agentic Security: Applications, Threats and Defenses
Oct 6 |ChartAgent: A Multimodal Agent for Visually Grounded Reasoning in Complex Chart Question Answering
Oct 6 |3Dify: a Framework for Procedural 3D-CG Generation Assisted by LLMs Using MCP and RAG
Oct 6 |Watch and Learn: Learning to Use Computers from Online Videos
Oct 6 |A Case for Declarative LLM-friendly Interfaces for Improved Efficiency of Computer-Use Agents
Oct 5 |GUI-Spotlight: Adaptive Iterative Focus Refinement for Enhanced GUI Visual Grounding
Oct 4 |PAL-UI: Planning with Active Look-back for Vision-Based GUI Agents
Oct 3 |WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents
Oct 3 |FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents
Oct 3 |Improving GUI Grounding with Explicit Position-to-Coordinate Mapping
Oct 3 |Agent-ScanKit: Unraveling Memory and Reasoning of Multimodal Agents via Sensitivity Perturbations
Oct 2 |Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness
Oct 2 |Say One Thing, Do Another? Diagnosing Reasoning-Execution Gaps in VLM-Powered Mobile-Use Agents
Oct 1 |GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness