wzk's homepage
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo, Xue Yang, Wenhan Dou, Zhaokai Wang, Jifeng Dai, Yu Qiao, Xizhou Zhu
Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Composite Spatial Reasoning
Yihong Tang, Ao Qu, Zhaokai Wang, Dingyi Zhuang, Zhaofeng Wu, Wei Ma, Shenhao Wang, Yunhan Zheng, Zhan Zhao, Jinhua Zhao