LLaMA-VID

Paper: LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
GitHub Link
Publisher: arXiv
Author Affiliation: CUHK
Functional Division Understanding Generation ...

Nov 28, 2023 arXiv

MMMU

Paper: MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Project Link
Publisher: arXiv
Author Affiliation: IN.AI Research

Nov 27, 2023 arXiv

GeoChat

Paper: GeoChat: Grounded Large Vision-Language Model for Remote Sensing
Project Link
Publisher: CVPR 2024
Author Affiliation: Mohamed bin Zayed University of AI

Nov 24, 2023 CVPR 2024

Related Survey 2

Paper: Multimodal Large Language Models: A Survey
Project Link: None
Publisher: BigData 2023
Author Affiliation: Jinan University

Nov 22, 2023 BigData 2023

Related Survey 5

Paper: A Survey on Multimodal Large Language Models for Autonomous Driving
Project Link
Publisher: WACV 2024
Author Affiliation: Purdue University

Nov 21, 2023 WACV 2024

ShareGPT4V

Paper: ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
GitHub Link
Publisher: arXiv
Author Affiliation: University of Science and Technology of China
Functional Divis...

Nov 21, 2023 arXiv

ShareGPT4V’s IT

Paper: ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
GitHub Link
Publisher: arXiv
Author Affiliation: University of Science and Technology of China
Type ...

Nov 21, 2023 arXiv

LION

Paper: LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
GitHub Link
Publisher: arXiv
Author Affiliation: Harbin Institute of Technology
Functional Divis...

Nov 20, 2023 arXiv

DocPedia

Paper: DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding
GitHub Link: None
Publisher: arXiv
Author Affiliation: Universi...