Paper: MoE-LLaVA: Mixture of Experts for Large Vision-Language Models · GitHub Link · Publisher: arXiv · Author Affiliation: Peking University · Functional Division: Understanding ...
Paper: LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs · GitHub Link: None · Publisher: arXiv · Author Affiliation: Meituan Inc. · Functional Division ...
Paper: InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Models · GitHub Link · Publisher: arXiv · Author Affiliation: Shanghai Artificial Intelligence Laboratory ...
Paper: WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models · GitHub Link · Publisher: arXiv · Author Affiliation: Zhejiang University · Functional Division: Understanding ...
Paper: Yi-VL · GitHub Link · Publisher: Website · Author Affiliation: 01-ai · Functional Division: Understanding, Generation · Design Division: Tool-using ...
Paper: Small Language Model Meets with Reinforced Vision Vocabulary · GitHub Link · Publisher: arXiv · Author Affiliation: MEGVII · Functional Division: Understanding, Generation ...
Paper: M3IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning · GitHub Link · Publisher: arXiv · Author Affiliation: The University of Hong Kong · Type: SFT ...
Paper: KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning · GitHub Link: None · Publisher: AAAI 2024 · Author Affiliation: Samsung R&D Institute · Functional Division ...
Paper: Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs · GitHub Link · Publisher: arXiv · Author Affiliation: Peking University · Functional Division ...
Paper: CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark · Project Link · Publisher: arXiv · Author Affiliation: Hong Kong University of Science and Technology