Paper: MoE-LLaVA: Mixture of Experts for Large Vision-Language Models · GitHub Link · Publisher: arXiv · Author Affiliation: Peking University · Functional Division: Understanding ...
Paper: LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs · GitHub Link: None · Publisher: arXiv · Author Affiliation: Meituan Inc. · Functional Division ...
Paper: InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Models · GitHub Link · Publisher: arXiv · Author Affiliation: Shanghai Artificial Intelligence Laboratory ...
Paper: WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models · GitHub Link · Publisher: arXiv · Author Affiliation: Zhejiang University · Functional Division: Understanding ...
Paper: Yi-VL · GitHub Link · Publisher: Website · Author Affiliation: 01-ai · Functional Division: Understanding, Generation · Design Division: Tool-using ...
Paper: Small Language Model Meets with Reinforced Vision Vocabulary · GitHub Link · Publisher: arXiv · Author Affiliation: MEGVII · Functional Division: Understanding, Generation ...
Paper: M3IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning · GitHub Link · Publisher: arXiv · Author Affiliation: The University of Hong Kong · Type: SFT ...
Paper: KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning · GitHub Link: None · Publisher: AAAI 2024 · Author Affiliation: Samsung R&D Institute · Functional Division ...
Paper: Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs · GitHub Link · Publisher: arXiv · Author Affiliation: Peking University · Functional Division ...
Paper: CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark · Project Link · Publisher: arXiv · Author Affiliation: Hong Kong University of Science and Technology