BuboGPT
Paper: BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
GitHub Link
Publisher: Arxiv
Author Affiliation: ByteDance
Functional Division: Understanding Generation ...
Type: SFT RLHF Multi-turn ✔ ...
Paper: Planting a SEED of Vision in Large Language Model
GitHub Link | Project Link
Publisher: Arxiv
Author Affiliation: Tencent AI Lab
Functional Division: Understanding Generation ...
Input Modalities $\rightarrow$ Output Modalities (I: Image, V: V...
Paper: Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
GitHub Link
Publisher: NeurIPS 2023
Author Affiliation: Dartmouth College
Functional Division: ...
Paper: Generative Pretraining in Multimodality
GitHub Link
Publisher: ICLR 2024
Author Affiliation: Beijing Academy of Artificial Intelligence
Functional Division: Understandin...
Paper: SVIT: Scaling up Visual Instruction Tuning
GitHub Link
Publisher: Arxiv
Author Affiliation: Beijing Academy of Artificial Intelligence
Type: SFT RLHF Mul...
Paper: GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
GitHub Link
Publisher: Arxiv
Author Affiliation: The University of Hong Kong
Functional Division: ...
Paper: What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
GitHub Link
Publisher: Arxiv
Author Affiliation: ByteDance Research
Functional Division: Und...
Paper: mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
GitHub Link
Publisher: Arxiv
Author Affiliation: Alibaba Group
Functional Division: ...