Paper | Notion


<aside> <img src="/icons/verified_green.svg" alt="/icons/verified_green.svg" width="40px" /> About PAPER

This channel is dedicated to building and expanding a comprehensive database of papers on web agents and the broader field of GUI agents. Let’s collaborate to enrich this database and advance research in the exciting world of web agents!

</aside>

🔥 Newly updated (Apr)


New～Apr 14｜OpeFlo: Automated UX Evaluation via Simulated Human Web Interaction with GUI Grounding

New～Apr 14｜MobiFlow: Real-World Mobile Agent Benchmarking through Trajectory Fusion

New～Apr 14｜Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization

New～Apr 13｜Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization

New～Apr 13｜WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark

New～Apr 13｜SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context

New～Apr 13｜ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

New～Apr 12｜The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

New～Apr 10｜BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning

New～Apr 10｜EE-MCP: Self-Evolving MCP-GUI Agents via Automated Environment Generation and Experience Learning

New～Apr 10｜HealthAdminBench: Evaluating Computer-Use Agents on Healthcare Administration Tasks

New～Apr 09｜What’s Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI Reasoning

New～Apr 09｜Preference Redirection via Attention Concentration: An Attack on Computer Use Agents

New～Apr 09｜MolmoWeb: Open Visual Web Agent and Open Data for the Open Web

New～Apr 09｜Are GUI Agents Focused Enough? Automated Distraction via Semantic-level UI Element Injection

New～Apr 09｜Same Outcomes, Different Journeys: A Trace-Level Framework for Comparing Human and GUI-Agent Behavior in Production Search Systems

New～Apr 10｜CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation

New～Apr 09｜Structured Distillation of Web Agent Capabilities Enables Generalization

New～Apr 09｜ClawBench: Can AI Agents Complete Everyday Online Tasks?

New～Apr 09｜KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation

New～Apr 09｜Learning to Retrieve from Agent Trajectories

New～Apr 09｜VenusBench-Mobile: A Challenging and User-Centric Benchmark for Mobile GUI Agents with Capability Diagnostics

Apr 08｜SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills

New～Apr 07｜WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks

Apr 04｜ClawSafety: “Safe” LLMs, Unsafe Agents

Apr 07｜Gym-Anything: Turn any Software into an Agent Environment

Apr 07｜Poison Once, Exploit Forever: Environment-Injected Memory Poisoning Attacks on Web Agents

Apr 07｜Don’t Act Blindly: Robust GUI Automation via Action-Effect Verification and Self-Correction

Apr 06｜IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents

Apr 06｜ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces

Apr 06｜UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics

Apr 06｜GUIDE: Interpretable GUI Agent Evaluation via Hierarchical Diagnosis

New～Apr 04｜The Art of Building Verifiers for Computer Use Agents

Apr 04｜GPA: Learning GUI Process Automation from Demonstrations

Apr 03｜The Tool Illusion: Rethinking Tool Use in Web Agents

Apr 01｜When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation

Apr 01｜Internal APIs Are All You Need: Shadow APIs, Shared Discovery, and the Case Against Browser-First Agent Architectures
