Mobile-Agent

Paper: Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception; GitHub Link; Publisher: arXiv; Author Affiliation: Beijing Jiaotong University; Functional Division ...

Jan 29, 2024, arXiv

MoE-LLaVA

Paper: MoE-LLaVA: Mixture of Experts for Large Vision-Language Models; GitHub Link; Publisher: arXiv; Author Affiliation: Peking University; Functional Division Understanding ...

Jan 29, 2024, arXiv

LLaVA-MoLE

Paper: LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs; GitHub Link: None; Publisher: arXiv; Author Affiliation: Meituan Inc; Functio...

Jan 29, 2024, arXiv

InternLM-XComposer2

Paper: InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Models; GitHub Link; Publisher: arXiv; Author Affiliation: Shanghai Artificia...

Jan 29, 2024, arXiv

WebVoyager

Paper: WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models; GitHub Link; Publisher: arXiv; Author Affiliation: Zhejiang University; Functional Division Under...

Jan 25, 2024, arXiv

Yi-VL

Paper: Yi-VL; GitHub Link; Publisher: Website; Author Affiliation: 01-ai; Functional Division Understanding Generation; Design Division Tool-using ...

Jan 23, 2024, Website

Vary-toy

Paper: Small Language Model Meets with Reinforced Vision Vocabulary; GitHub Link; Publisher: arXiv; Author Affiliation: MEGVII; Functional Division Understanding Generation ...

Jan 23, 2024, arXiv

RTVLM

Paper: Red Teaming Visual Language Models; GitHub Link; Publisher: arXiv; Author Affiliation: The University of Hong Kong; Type SFT ...

Jan 23, 2024, arXiv

KAM-CoT

Paper: KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning; GitHub Link: None; Publisher: AAAI 2024; Author Affiliation: Samsung R&D Institute; Functional Division ...

Jan 23, 2024, AAAI 2024

RPG

Paper: Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs; GitHub Link; Publisher: arXiv; Author Affiliation: Peking University; Functional Divisi...

Jan 22, 2024, arXiv
