<aside> <img src="/icons/verified_green.svg" alt="/icons/verified_green.svg" width="40px" /> About PAPER


This channel is dedicated to building and expanding a comprehensive paper database focused on the Web Agent field and the broader GUI agent field. Let’s collaborate to enrich this database and advance research in the exciting world of web agents!

</aside>

🔥 Newly updated (Mar)

New~Mar 16|

New~Mar 20|MedSPOT: A Workflow-Aware Sequential Grounding Benchmark for Clinical GUI

New~Mar 19|ClawTrap: A MITM-Based Red-Teaming Framework for Real-World OpenClaw Security Evaluation

New~Mar 19|AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents

New~Mar 19|OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards

New~Mar 18|WebPII: Benchmarking Visual PII Detection for Computer-Use Agents

New~Mar 18|AdaZoom-GUI: Adaptive Zoom-based GUI Grounding with Instruction Refinement

New~Mar 17|Anticipatory Planning for Multimodal AI Agents

New~Mar 16|Visual Confused Deputy: Exploiting and Defending Perception Failures in Computer-Using Agents

New~Mar 16|

New~Mar 16|

New~Mar 16|

New~Mar 16|OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

New~Mar 16|GUI-CEval: A Hierarchical and Comprehensive Chinese Benchmark for Mobile GUI Agents

New~Mar 15|Zoom to Essence: Trainless GUI Grounding by Inferring upon Interface Elements

New~Mar 15|Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective

Mar 13|AI Planning Framework for LLM-Based Web Agents

Mar 13|Adaptive Vision-Language Model Routing for Computer Use Agents

Mar 12|CUAAudit: Meta-Evaluation of Vision-Language Models as Auditors of Autonomous Computer-Use Agents

Mar 12|MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning

Mar 12|HATS: Hardness-Aware Trajectory Synthesis for GUI Agents

Mar 11|Hybrid Self-evolving Structured Memory for GUI Agents

Mar 10|OpenClaw-RL: Train Any Agent Simply by Talking

Mar 10|Video-Based Reward Modeling for Computer-Use Agents

Mar 09|SecAgent: Efficient Mobile GUI Agent with Semantic Context

Mar 09|SlowBA: An efficiency backdoor attack towards VLM-based GUI agents

Mar 09|PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents

Mar 09|OSExpert: Computer-Use Agents Learning Professional Skills via Exploration

Mar 08|Generalization in Online Reinforcement Learning for Mobile Agents

Mar 07|Enhancing Web Agents with a Hierarchical Memory Tree

Mar 05|TimeWarp: Evaluating Web Agents by Revisiting the Past

Mar 05|STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks

Mar 05|WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents

Mar 05|WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces

Mar 04|Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks

Mar 03|Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation

Mar 03|CGL: Advancing Continual GUI Learning via Reinforcement Fine-Tuning

Mar 03|See and Remember: A Multimodal Agent for Web Traversal
