Hybrid Self-evolving Structured Memory for GUI Agents

Sibo Zhu*, Wenyi Wu*, Kun Zhou, Stephen Wang, Biwei Huang
*Equal contribution    Corresponding author

Framework Overview

Hybrid memory graph with discrete strategy nodes and continuous trajectory embeddings.

[Figure: HyMEM main figure]

Abstract

The remarkable progress of vision-language models (VLMs) has enabled GUI agents to interact with computers in a human-like manner. Yet real-world computer-use tasks remain difficult due to long-horizon workflows, diverse interfaces, and frequent intermediate errors. HyMEM introduces a graph-based hybrid memory that couples discrete high-level symbolic strategies with continuous trajectory embeddings, supporting multi-hop retrieval, self-evolution via structured updates, and on-the-fly working-memory refresh during inference.
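The hybrid memory described above can be pictured with a small toy sketch. This is not the HyMEM implementation: the class and method names (`MemoryNode`, `HybridMemory`, `add`, `link`, `retrieve`), the cosine-similarity ranking, the merge threshold, and the three-dimensional vectors are all assumptions made for illustration. It only shows how a graph node might pair a discrete strategy string with a continuous embedding, how retrieval might expand along edges for extra hops, and how a structured update might merge near-duplicates.

```python
from dataclasses import dataclass, field
import math


@dataclass
class MemoryNode:
    strategy: str                 # discrete, symbolic strategy text
    embedding: list[float]        # continuous trajectory embedding (toy vector)
    neighbors: set[int] = field(default_factory=set)  # graph edges by node index


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class HybridMemory:
    """Hypothetical graph memory; every design choice here is an assumption."""

    def __init__(self):
        self.nodes: list[MemoryNode] = []

    def add(self, strategy, embedding, merge_threshold=0.95):
        """Structured update: overwrite a near-duplicate node, else append a new one."""
        for i, node in enumerate(self.nodes):
            if cosine(node.embedding, embedding) >= merge_threshold:
                node.strategy = strategy  # assumed 'replace' style update
                return i
        self.nodes.append(MemoryNode(strategy, list(embedding)))
        return len(self.nodes) - 1

    def link(self, i, j):
        """Add an undirected edge between two nodes."""
        self.nodes[i].neighbors.add(j)
        self.nodes[j].neighbors.add(i)

    def retrieve(self, query, k=1, hops=1):
        """Take the top-k nodes by similarity, then expand `hops` steps along edges."""
        ranked = sorted(
            range(len(self.nodes)),
            key=lambda i: cosine(self.nodes[i].embedding, query),
            reverse=True,
        )
        selected = ranked[:k]
        seen = set(selected)
        frontier = list(selected)
        for _ in range(hops):
            next_frontier = []
            for i in frontier:
                for j in sorted(self.nodes[i].neighbors):
                    if j not in seen:
                        seen.add(j)
                        selected.append(j)
                        next_frontier.append(j)
            frontier = next_frontier
        return [self.nodes[i].strategy for i in selected]


# Toy usage with made-up strategies and embeddings.
memory = HybridMemory()
a = memory.add("Search for the product by name", [1.0, 0.0, 0.0])
b = memory.add("Apply price and rating filters", [0.0, 1.0, 0.0])
c = memory.add("Open the recipe page and read the ingredients", [0.0, 0.0, 1.0])
memory.link(a, b)
results = memory.retrieve([0.9, 0.1, 0.0], k=1, hops=1)
print(results)
```

In this toy run the query is closest to the search strategy, and the one-hop expansion also pulls in the linked filtering strategy, which is the kind of multi-hop behavior the abstract refers to. A real system would use learned embeddings and richer update rules.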

Case Studies

[Figure: HyMEM case study success example]
Amazon: HyMEM Success
  • HyMEM retrieves diverse strategy memories rather than repeating the same actions.
  • The agent adapts from an initial search to constraint-focused filtering.
  • The task is completed within the step budget, with stable long-horizon planning.

[Figure: Baseline failure case study example]
Amazon: Baseline Failure
  • The single-agent baseline over-relies on generic search and misses key filters.
  • It enters repetitive browsing loops without satisfying the constraints.
  • Execution terminates at the max-step limit before a valid result is found.

Performance Comparison Leaderboard

Task success rates (%) across WebVoyager, Mind2Web, and MMInA.

Backbone Model / Method                  WebVoyager                   Mind2Web            MMInA  Overall
                                    Amz   Cour   Recp    Map   Info    Svc    Ent   Trav   Wiki

Closed-Source
GPT-4o                             24.4    7.1   13.3   36.6    7.8   14.1    3.4   19.4   51.0     19.7
Gemini-Pro-Vision                  41.7   42.9   20.0   53.7   16.7   22.4    0.0   19.4   50.0     29.6
Claude-4                           63.4   28.6   33.3   70.0   25.6   40.0    6.9    9.7   52.0     36.6

Open-Source
Qwen2.5-VL-32B                     46.3   26.2    6.7   29.3   14.1   20.0    6.9    9.7   43.0     22.5
CogAgent                           12.2    9.5   26.7    9.8   24.4    8.2   13.8   16.1   21.0     15.7
Websight                           24.4    4.8   13.3   29.3   10.3    3.5    3.4    0.0   12.0     11.2
UI-TARS-1.5-7B Baseline            31.7   16.7   20.0   31.7    6.4    4.7    6.9    0.0   36.0     17.1
UI-TARS-1.5-7B + Reasoning Bank     0.0   11.9    6.7   12.2    7.7    8.2    3.7    6.7   44.0     11.2
UI-TARS-1.5-7B + AWM                4.9    7.1    4.4   14.6   10.3    3.5    3.7    3.3   26.0      8.6
UI-TARS-1.5-7B + Discrete          19.5   16.7    8.9   17.1   11.5    8.2    7.4    3.4   46.0     15.4
UI-TARS-1.5-7B + Continuous        43.9   33.3   24.4   43.9   16.7   28.2   10.3    3.2   54.0     28.7
UI-TARS-1.5-7B + HyMEM             58.5   28.6   26.7   53.7   16.7   25.9   10.3    6.5   54.0     31.2
Qwen2.5-VL-7B Baseline             14.6    2.4   15.9   16.7    9.0   11.8    0.0    4.4   38.0     12.5
Qwen2.5-VL-7B + Reasoning Bank     29.3    9.5    6.7   29.3    9.0   20.0    3.4    6.5   44.0     17.5
Qwen2.5-VL-7B + AWM                17.1    4.8   11.1   29.3    7.7   10.6    0.0    6.5   31.0     13.1
Qwen2.5-VL-7B + Discrete           22.0   21.4    8.9   31.7    9.0   12.9   10.3    3.2   34.0     17.0
Qwen2.5-VL-7B + Continuous         24.4   17.1    8.9   34.1   16.7   23.5   10.3   12.9   47.0     21.7
Qwen2.5-VL-7B + HyMEM              63.4   54.8   20.0   53.7   17.9   23.5    3.4   22.6   56.0     35.0
Qwen3-VL-8B Baseline               36.6   16.7   13.3   43.9    9.0   20.0   10.3    9.7   38.0     21.9
Qwen3-VL-8B + Reasoning Bank       36.6   11.9   17.8   31.7   10.3   24.7   17.2    6.5   48.0     22.7
Qwen3-VL-8B + AWM                  39.0   19.0    8.9   31.7   12.5   16.5    6.9   16.1   48.0     22.1
Qwen3-VL-8B + Discrete             31.7   16.7   15.6   31.7   11.5   16.5   13.8   16.1   38.0     21.3
Qwen3-VL-8B + Continuous           43.9   19.0   17.8   34.1   12.5   17.9    3.4   19.4   42.0     23.3
Qwen3-VL-8B + HyMEM                46.3   19.0   26.7   39.0   17.9   20.0    6.9   25.8   47.0     27.6
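
As a reading aid for the leaderboard, the short snippet below computes the absolute gain in Overall success rate of HyMEM over the plain baseline for each 7B/8B backbone. The numbers are copied from the Overall column; the dictionary layout and the `absolute_gain` helper are illustrative assumptions and not part of the released code.

```python
# Overall success rates (%) copied from the leaderboard's Overall column.
overall = {
    "UI-TARS-1.5-7B": {"Baseline": 17.1, "HyMEM": 31.2},
    "Qwen2.5-VL-7B": {"Baseline": 12.5, "HyMEM": 35.0},
    "Qwen3-VL-8B": {"Baseline": 21.9, "HyMEM": 27.6},
}


def absolute_gain(scores):
    """Percentage-point difference between HyMEM and the baseline, to one decimal."""
    return round(scores["HyMEM"] - scores["Baseline"], 1)


gains = {backbone: absolute_gain(scores) for backbone, scores in overall.items()}
for backbone, gain in gains.items():
    print(f"{backbone}: +{gain} points")
```

With these values the gains come out to 14.1, 22.5, and 5.7 percentage points for UI-TARS-1.5-7B, Qwen2.5-VL-7B, and Qwen3-VL-8B respectively.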

BibTeX

@misc{zhu2026hymem,
  title={Hybrid Self-evolving Structured Memory for GUI Agents},
  author={Sibo Zhu and Wenyi Wu and Kun Zhou and Stephen Wang and Biwei Huang},
  year={2026},
  note={Manuscript}
}