VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation
Kevin QH. Lin†, Yuhao Zheng†, Hangyu Ran†, Dantong Zhu, Dongxing Mao, Linjie Li, Philip Torr, Alex JP. Wang.
Preprint, 2025
[project] [paper] [code] [demo]
Paper2Video: Automatic Video Generation from Scientific Papers
Zeyu Zhu†, Kevin QH. Lin†, Mike Z. Shou.
Preprint, 2025
[project] [paper] [code] [dataset]
1.4K GitHub stars.
Code2Video: A Code-centric Paradigm for Educational Video Generation
Yanzhe Chen†, Kevin QH. Lin†, Mike Z. Shou.
Preprint, 2025
[project] [paper] [code] [dataset] [twitter]
800+ GitHub stars.
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
Wei Pang†, Kevin QH. Lin†, Xiangru Jian†, Xi He, Philip Torr.
NeurIPS D&B, 2025
ICML MAS workshop, 2025. Oral
[paper] [code] [project] [datasets] [twitter] [demo]
2.8K GitHub stars.
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
Jiaqi Wang†, Kevin QH. Lin†, James Cheng, Mike Z. Shou.
NeurIPS, 2025
[paper] [code] [huggingface]
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Ye Liu†, Kevin QH. Lin†, Chang Wen Chen, Mike Z. Shou.
Preprint, 2025
[paper] [code] [dataset] [project] [demo]
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Kevin QH. Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Stan WX. Lei, Lijuan Wang, Mike Z. Shou.
CVPR, 2025
NeurIPS OWA workshop, 2024. Oral
[paper] [code] [huggingface] [dataset] [demo]
Outstanding Paper Award, NeurIPS Open-World Agents Workshop 2024.
The model has been downloaded more than 240,000 times. 1.5K GitHub stars.
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
Kevin QH. Lin, Mike Z. Shou.
CVPR, 2025
[paper] [code]
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Kevin QH. Lin, Linjie Li, Difei Gao, Qinchen Wu, Mingyi Yan, Zhengyuan Yang, Lijuan Wang, Mike Z. Shou.
NeurIPS D&B, 2024. Spotlight
[paper] [code] [project]
Learning Video Context as Interleaved Multimodal Sequences
Kevin QH. Lin, Pengchuan Zhang, Difei Gao, Xide Xia, Joya Chen, Ziteng Gao, Jinheng Xie, Xuhong Xiao, Mike Z. Shou.
ECCV, 2024
[paper] [code]
UniVTG: Towards Unified Video-Language Temporal Grounding
Kevin QH. Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex JP. Wang, Rui Yan, Mike Z. Shou.
ICCV, 2023
[paper] [code] [demo]
Egocentric Video-Language Pretraining
Kevin QH. Lin, Alex JP. Wang, M. Soldan, M. Wray, R. Yan, Eric ZC. Xu, D. Gao, R. Tu, W. Zhao, W. Kong, C. Cai, H. Wang, D. Damen, B. Ghanem, W. Liu, Mike Z. Shou.
NeurIPS, 2022. Spotlight (1.7%)
[paper] [code] [project] [poster] [media]
EgoVis Distinguished Paper Award & PREMIA Best Student Paper Award, 2023.
Double champion of the Ego4D & Epic-Kitchens challenges at CVPR 2022.