✍️Quick Start
MM-LLMs We built a website tracking the latest advances in MM-LLMs. We aim to support researchers in MM-LLMs with our work, and we encourage everyone to contribute to the website by adding the latest...
Paper: DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding Project Link Publisher: Arxiv Author Affiliation: DeepSeek-AI Functional Division ...
Paper: Apollo: An Exploration of Video Understanding in Large Multimodal Models Project Link Publisher: Arxiv Author Affiliation: Meta GenAI Functional Division Understanding ...
Paper: StreamChat: Chatting with Streaming Video Project Link Publisher: Arxiv Author Affiliation: CUHK MMLab Functional Division Understanding Generation Desi...
Paper: LinVT: Empower Your Image-level Large Language Model to Understand Videos Project Link Publisher: Arxiv Author Affiliation: Meituan Functional Division Understanding ...
Paper: CompCap: Improving Multimodal Large Language Models with Composite Captions Publisher: Arxiv Author Affiliation: Meta Functional Division Understanding Generation ...
Paper: T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs Project Link Publisher: Arxiv Author Affiliation: USTC Functional Division Understanding ...
Paper: LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding Project Link Publisher: Arxiv Author Affiliation: Meta AI Functional Division Understa...
Paper: AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark Project Link Publisher: Arxiv Author Affiliation: University of Washington Functional Division ...
Paper: OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding Project Link Publisher: NeurIPS 2024 Author Affiliation: Wuhan University Functional Division...