InternLM-XComposer
Paper: InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition (GitHub Link)
Publisher: Arxiv
Author Affiliation: Shanghai Artificial Intellig...

Paper: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning (GitHub Link)
Publisher: Arxiv
Author Affiliation: FAIR
Functional Division: Understanding ...

Paper: PointLLM: Empowering Large Language Models to Understand Point Clouds (GitHub Link)
Publisher: Arxiv
Author Affiliation: The Chinese University of Hong Kong
Functional Division: ...

Paper: Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond (GitHub Link)
Publisher: Arxiv
Author Affiliation: Alibaba Group
Functional Division: ...

Paper: Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages (GitHub Link)
Publisher: Arxiv
Author Affiliation: Tsinghua University
Functional Division: ...

Paper: Introducing IDEFICS: An Open Reproduction of State-of-the-Art Visual Language Model (GitHub Link)
Publisher: Website
Functional Division: Understanding, Generation ...

Paper: StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data (GitHub Link)
Publisher: Arxiv
Author Affiliation: University of Technology Sydney
Type: ...

Paper: BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions (GitHub Link)
Publisher: AAAI 2024
Author Affiliation: UC San Diego
Functional Division: U...

Paper: Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes (GitHub Link)
Publisher: Arxiv
Author Affiliation: Zhejiang University
Functional Division: ...

Paper: MMBench: Is Your Multi-modal Model an All-around Player? (Project Link)
Publisher: Arxiv
Author Affiliation: Shanghai AI Laboratory