TinyGPT-V
Paper: TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones GitHub Link Publisher: Arxiv Author Affiliation: Nanyang Technological University Functional Division ...
Paper: MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices GitHub Link Publisher: Arxiv Author Affiliation: Meituan Inc. Functional Division Unders...
Paper: Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey Project Link: None Publisher: Arxiv Author Affiliation: Nanyang Technological University,
Paper: V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs GitHub Link Publisher: Arxiv Author Affiliation: UC San Diego Functional Division Understanding Ge...
Paper: InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks GitHub Link Publisher: Arxiv Author Affiliation: Shanghai AI Laboratory Functional D...
Paper: Generative Multimodal Models are In-Context Learners GitHub Link Publisher: Arxiv Author Affiliation: Beijing Academy of Artificial Intelligence Functional Division Und...
Paper: Gemini: A Family of Highly Capable Multimodal Models GitHub Link: None Publisher: Arxiv Author Affiliation: Google Functional Division Understanding Generation ...
Paper: CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update GitHub Link Publisher: Arxiv Author Affiliation: Peking University Functional Division Understanding ...
Paper: https://arxiv.org/abs/2312.10032 GitHub Link Publisher: Arxiv Author Affiliation: Zhejiang University Functional Division Understanding Generation Desig...
Paper: https://arxiv.org/abs/2312.10032 GitHub Link Publisher: Arxiv Author Affiliation: Zhejiang University Type SFT RLHF Multi-turn ✔ ✖ ...