mPLUG-DocOwl’s IT
Paper: mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding GitHub Link Publisher: arXiv Author Affiliation: Alibaba Group Type SFT RLHF ...
Paper: LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding GitHub Link Publisher: arXiv Author Affiliation: Georgia Tech Type SFT RLHF ...
Paper: LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding GitHub Link Publisher: arXiv Author Affiliation: Georgia Tech Functional Division Understan...
Paper: Shikra: Unleashing Multimodal LLM’s Referential Dialogue Magic GitHub Link Publisher: arXiv Author Affiliation: SenseTime Research Functional Division Understanding ...
Paper: Kosmos-2: Grounding Multimodal Large Language Models to the World GitHub Link Publisher: arXiv Author Affiliation: Microsoft Research Functional Division Understanding ...
Paper: A Survey on Multimodal Large Language Models Project Link Publisher: arXiv Author Affiliation: Tencent YouTu Lab
Paper: AudioPaLM: A Large Language Model That Can Speak and Listen GitHub Link Publisher: arXiv Author Affiliation: Google Functional Division Understanding Generation ...
Paper: LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark GitHub Link Publisher: NeurIPS 2023 Author Affiliation: Shanghai Artificial Intelligence Lab...
Paper: Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models GitHub Link Publisher: arXiv Author Affiliation: Mohamed bin Zayed University of AI Functi...
Paper: Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models GitHub Link Publisher: arXiv Author Affiliation: Mohamed bin Zayed University of AI Type ...