VL-GPT
Paper: VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation GitHub Link Publisher: Arxiv Author Affiliation: Tencent AI Lab Functional Divisi...
Paper: CogAgent: A Visual Language Model for GUI Agents GitHub Link Publisher: Arxiv Author Affiliation: Tsinghua University Functional Division Understanding Generation...
Paper: VILA: On Pre-training for Visual Language Models GitHub Link: None Publisher: Arxiv Author Affiliation: NVIDIA Functional Division Understanding Generation ...
Paper: MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception GitHub Link Publisher: Arxiv Author Affiliation: Shanghai Artificial Intelligence Laboratory Funct...
Paper: Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects GitHub Link: None Publisher: Arxiv Author Affiliation: International Digi...
Paper: Silkie: Preference Distillation for Large Visual Language Models GitHub Link Publisher: Arxiv Author Affiliation: The University of Hong Kong Type SFT RLHF ...
Paper: Silkie: Preference Distillation for Large Visual Language Models GitHub Link Publisher: Arxiv Author Affiliation: The University of Hong Kong Functional Division Unders...
Paper: MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models Project Link Publisher: Arxiv Author Affiliation: Tencent Youtu Lab
Paper: BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models Project Link Publisher: Arxiv Author Affiliation: Nanyang Technological University