MLLM-Tool
Paper: MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning | GitHub Link | Publisher: arXiv | Author Affiliation: ShanghaiTech University | Functional Division Unders...
Paper: SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model | GitHub Link | Publisher: arXiv | Author Affiliation: Northwestern Polytechnical...
Paper: MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer | GitHub Link | Publisher: arXiv | Author Affiliation: Shanghai AI Laboratory | Functional ...
Paper: DiffusionGPT: LLM-Driven Text-to-Image Generation System | GitHub Link | Publisher: arXiv | Author Affiliation: ByteDance Inc | Functional Division Understanding Generati...
Paper: Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | GitHub Link | Publisher: arXiv | Author Affiliation: Sun Yat-sen University | Functional Division Understanding Gen...
Paper: ModaVerse: Efficiently Transforming Modalities with LLMs | GitHub Link: None | Publisher: arXiv | Author Affiliation: University of Adelaide | Functional Division Understanding...
Paper: GroundingGPT: Language Enhanced Multi-modal Grounding Model | GitHub Link | Publisher: arXiv | Author Affiliation: ByteDance | Functional Division Understanding Generation...
Paper: 3DMIT: 3D Multi-modal Instruction Tuning for Scene Understanding | GitHub Link | Publisher: arXiv | Author Affiliation: Beijing University of Posts and Telecommunications | Functional Div...
Paper: GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse | Project Link | Publisher: arXiv | Author Affiliation: Hong Kong Baptist University
Paper: DocLLM: A layout-aware generative language model for multimodal document understanding | GitHub Link: None | Publisher: arXiv | Author Affiliation: JPMorgan AI Research | Functional Divis...