<aside> <img src="/icons/verified_green.svg" alt="/icons/verified_green.svg" width="40px" /> About PAPER
This channel is dedicated to building and expanding a comprehensive paper database focused on the web agent field and the broader GUI agent field. Let’s collaborate to enrich this database and advance research in the exciting world of web agents!
</aside>
🔥 Newly updated (Mar)
New~Mar 20|MedSPOT: A Workflow-Aware Sequential Grounding Benchmark for Clinical GUI
New~Mar 19|ClawTrap: A MITM-Based Red-Teaming Framework for Real-World OpenClaw Security Evaluation
New~Mar 19|AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents
New~Mar 19|OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards
New~Mar 18|WebPII: Benchmarking Visual PII Detection for Computer-Use Agents
New~Mar 18|AdaZoom-GUI: Adaptive Zoom-based GUI Grounding with Instruction Refinement
New~Mar 17|Anticipatory Planning for Multimodal AI Agents
New~Mar 16|Visual Confused Deputy: Exploiting and Defending Perception Failures in Computer-Using Agents
New~Mar 16|OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data
New~Mar 16|GUI-CEval: A Hierarchical and Comprehensive Chinese Benchmark for Mobile GUI Agents
New~Mar 15|Zoom to Essence: Trainless GUI Grounding by Inferring upon Interface Elements
New~Mar 15|Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective
Mar 13|AI Planning Framework for LLM-Based Web Agents
Mar 13|Adaptive Vision-Language Model Routing for Computer Use Agents
Mar 12|CUAAudit: Meta-Evaluation of Vision-Language Models as Auditors of Autonomous Computer-Use Agents
Mar 12|HATS: Hardness-Aware Trajectory Synthesis for GUI Agents
Mar 11|Hybrid Self-evolving Structured Memory for GUI Agents
Mar 10|OpenClaw-RL: Train Any Agent Simply by Talking
Mar 10|Video-Based Reward Modeling for Computer-Use Agents
Mar 09|SecAgent: Efficient Mobile GUI Agent with Semantic Context
Mar 09|SlowBA: An efficiency backdoor attack towards VLM-based GUI agents
Mar 09|OSExpert: Computer-Use Agents Learning Professional Skills via Exploration
Mar 08|Generalization in Online Reinforcement Learning for Mobile Agents
Mar 07|Enhancing Web Agents with a Hierarchical Memory Tree
Mar 05|TimeWarp: Evaluating Web Agents by Revisiting the Past
Mar 05|STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks
Mar 05|WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents
Mar 05|WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces
Mar 03|Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation
Mar 03|CGL: Advancing Continual GUI Learning via Reinforcement Fine-Tuning
Mar 03|See and Remember: A Multimodal Agent for Web Traversal