Qwen-Audio
Paper: Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models GitHub Link Publisher: arXiv Author Affiliation: Alibaba Group Functional Divisio...
Paper: Volcano: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision GitHub Link Publisher: arXiv Author Affiliation: Korea University Functional Division ...
Paper: Image Resolution and Text Label Are Important Things for Large Multi-modal Models GitHub Link Publisher: arXiv Author Affiliation: Huazhong University of Science and Technology Fu...
Paper: How to Bridge the Gap between Modalities: A Comprehensive Survey on Multimodal Large Language Model Project Link: None Publisher: arXiv Author Affiliation: Hefei University of Techn...
Paper: LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents GitHub Link Publisher: arXiv Author Affiliation: Tsinghua University Functional Division Understanding ...
Paper: TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models GitHub Link: None Publisher: arXiv Author Affiliation: Tencent Functional Division Understanding ...
Paper: mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration GitHub Link Publisher: arXiv Author Affiliation: Alibaba Group Functional Division ...
Paper: GLaMM: Pixel Grounding Large Multimodal Model GitHub Link Publisher: arXiv Author Affiliation: Mohamed bin Zayed University of AI Functional Division Understanding ...
Paper: CogVLM: Visual Expert for Pretrained Language Models GitHub Link Publisher: arXiv Author Affiliation: Tsinghua University Functional Division Understanding Genera...
Paper: Evaluating Object Hallucination in Large Vision-Language Models Project Link Publisher: EMNLP 2023 Author Affiliation: Renmin University of China