Letian Zhang
Hi, I am Letian Zhang, a Computer Science Ph.D. student at the University of California, Santa Cruz (UCSC), where I am fortunate to be advised by Professor Cihang Xie. I received my B.S. degree in Computer Science from Tongji University.
My research focuses on vision-language learning and multimodal learning.
Email: zhanglt.gm@gmail.com
CV | Scholar | GitHub
[Jan. 2026] Our latest unified tokenizer, OpenVision 3, is released! 🌟
[Aug. 2025] Joined UC Santa Cruz as a Ph.D. student, supervised by Prof. Cihang Xie!
[Jun. 2025] Oasis was accepted to ICCV 2025! 🎉
[Feb. 2024] C-VQA was accepted to CVPR 2024! 🎉
OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
Letian Zhang*, Sucheng Ren*, Yanqing Liu, Xianhang Li, Zeyu Wang, Yuyin Zhou, Huaxiu Yao, Zeyu Zheng, Weili Nie, Guilin Liu, Zhiding Yu, Cihang Xie
arXiv, 2026
[page]
[paper]
[code]
[bibtex]
We develop an advanced unified tokenizer that learns a single visual representation for both understanding and generation.
Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis
Letian Zhang, Quan Cui, Bingchen Zhao, Cheng Yang
International Conference on Computer Vision (ICCV), 2025
[paper]
[code]
[bibtex]
We generate diverse, high-quality multimodal instruction-response data from images alone, without any prior prompt.
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
Letian Zhang, Xiaotong Zhai, Zhongkai Zhao, Yongshuo Zong, Xin Wen, Bingchen Zhao
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[paper]
[code]
[bibtex]
We develop a benchmark (C-VQA) composed of counterfactual visual questions to evaluate the compositional reasoning ability of VQA models.