Letian Zhang

Hi, I am Letian Zhang, a Computer Science Ph.D. student at the University of California, Santa Cruz (UCSC), where I am fortunate to be advised by Professor Cihang Xie. I received my B.S. in Computer Science from Tongji University.

My research focuses on vision-language learning and multimodal learning.

Email: zhanglt.gm@gmail.com

CV  |  Scholar  |  GitHub

News
  • [Jan. 2026] Our latest unified tokenizer, OpenVision 3, has been released!🌟
  • [Aug. 2025] Joined UC Santa Cruz as a Ph.D. student, supervised by Prof. Cihang Xie!
  • [Jun. 2025] Oasis was accepted to ICCV 2025!🎉
  • [Feb. 2024] C-VQA was accepted to CVPR 2024!🎉
Publications
OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Letian Zhang*, Sucheng Ren*, Yanqing Liu, Xianhang Li, Zeyu Wang, Yuyin Zhou, Huaxiu Yao, Zeyu Zheng, Weili Nie, Guilin Liu, Zhiding Yu, Cihang Xie
arXiv, 2026
[page] [paper] [code] [bibtex]

We develop an advanced unified tokenizer that learns a single visual representation for both understanding and generation.

Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis

Letian Zhang, Quan Cui, Bingchen Zhao, Cheng Yang
International Conference on Computer Vision (ICCV), 2025
[paper] [code] [bibtex]

We generate diverse, high-quality multimodal instruction-response data from images alone, without any prior prompts.

What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models

Letian Zhang, Xiaotong Zhai, Zhongkai Zhao, Yongshuo Zong, Xin Wen, Bingchen Zhao
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[paper] [code] [bibtex]

We develop a benchmark (C-VQA) of counterfactual visual questions to evaluate the counterfactual reasoning abilities of multimodal language models.