<aside> <img src="/icons/alien-pixel_green.svg" alt="/icons/alien-pixel_green.svg" width="40px" /> Home


Mission & Vision

Paper

Researcher

Company & Product

Event

Github

Opportunities

WebAgent Review

Contributor

</aside>

<aside> <img src="/icons/verified_green.svg" alt="/icons/verified_green.svg" width="40px" /> About PAPER


This channel is dedicated to building and expanding a comprehensive paper database focused on the Web Agent field and the broader GUI agent field. Let’s collaborate to enrich this database and advance research in the exciting world of web agents!

</aside>

🔥 Newly updated (Jul)

New · Jul 15 | Let’s Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification

New · Jul 14 | NeuralOS: Towards Simulating Operating Systems via Neural Generative Models

New · Jul 13 | LaSM: Layer-wise Scaling Mechanism for Defending Pop-up Attack on GUI Agents

Jul 09 | VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation

Jul 08 | MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment

Jul 08 | GTA1: GUI Test-time Scaling Agent

Jul 08 | R-VLM: Region-Aware Vision Language Model for Precise GUI Grounding

Jul 06 | Hijacking JARVIS: Benchmarking Mobile GUI Agents against Unprivileged Third Parties

Jul 06 | WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis

Jul 05 | Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Jul 05 | How to Train Your LLM Web Agent: A Statistical Diagnosis

Jul 04 | Less is More: Empowering GUI Agent with Context-Aware Simplification

Jul 04 | WebSailor: Navigating Super-human Reasoning for Web Agent

Jul 03 | Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks

Jul 03 | WebArXiv: Evaluating Multimodal Agents on Time-Invariant arXiv Tasks

Jul 01 | SafeMobile: Chain-level Jailbreak Detection and Automated Evaluation for Multimodal Mobile Agents

Jul 01 | GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Jul 01 | Qwen-GUI-3B: A Lightweight Vision-Language Model for Cross-Resolution GUI Grounding

Jul 01 | From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows

Jul 01 | LineRetriever: Planning-Aware Observation Reduction for Web Agents
