ViGoR
Paper: ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling GitHub Link: None Publisher: Arxiv Author Affiliation: The University of Texas at...
Paper: SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models GitHub Link Publisher: Arxiv Author Affiliation: Shanghai AI Laboratory Functional Division...
Paper: MobileVLM V2: Faster and Stronger Baseline for Vision Language Model GitHub Link Publisher: Arxiv Author Affiliation: Meituan Inc. Functional Division Understanding ...
Paper: CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations GitHub Link Publisher: Arxiv Author Affiliation: Tsinghua University Functional Divis...
Paper: Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization GitHub Link Publisher: Arxiv Author Affiliation: Kuaishou Technology Functional Divisi...
Paper: Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models GitHub Link Publisher: Arxiv Author Affiliation: University of Edinburgh Functional Division ...
Paper: Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models GitHub Link Publisher: Arxiv Author Affiliation: University of Edinburgh Type SFT ...
Paper: LLaVA-NeXT: Improved reasoning, OCR, and world knowledge GitHub Link: None Publisher: Website Author Affiliation: University of Wisconsin-Madison Functional Division Un...
Paper: Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception GitHub Link Publisher: Arxiv Author Affiliation: Beijing Jiaotong University Functional Division ...