BRAVE
Paper: BRAVE: Broadening the visual encoding of vision-language models Project Link Publisher: ECCV 2024 Author Affiliation: Google Functional Division Understanding Gen...
Paper: InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD Project Link Publisher: NeurIPS 2024 Author Affiliation: Shanghai Ar...
Paper: Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference Project Link Publisher: arXiv Author Affiliation: Westlake University Functional Division ...
Paper: VL-Mamba: Exploring State Space Models for Multimodal Learning GitHub Link Publisher: arXiv Author Affiliation: The University of Adelaide Functional Division Understan...
Paper: KEBench: A Benchmark on Knowledge Editing for Large Vision-Language Models Publisher: arXiv Author Affiliation: University of Chinese Academy of Sciences
Paper: DeepSeek-VL: Towards Real-World Vision-Language Understanding GitHub Link Publisher: arXiv Author Affiliation: DeepSeek-AI Functional Division Understanding Gener...
Paper: The All-Seeing Project V2: Towards General Relation Comprehension of the Open World GitHub Link Publisher: arXiv Author Affiliation: Shanghai AI Laboratory Functional Division ...
Paper: A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision Language Models Project Link Publisher: arXiv Author Affiliation: Shanghai Jiao Tong University
Paper: AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling GitHub Link Publisher: arXiv Author Affiliation: Fudan University Functional Division Understanding ...
Paper: AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling GitHub Link Publisher: arXiv Author Affiliation: Fudan University Multi-turn ✔ ✖ Input Mo...