VITRON
Paper: VITRON: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing Project Link Publisher: NeurIPS 2024 Author Affiliation: National University of Singapore...
Paper: TroL: Traversal of Layers for Large Language and Vision Models Project Link Publisher: EMNLP 2024 Author Affiliation: KAIST Functional Division Understanding Gene...
Paper: VITA: Towards Open-Source Interactive Omni Multimodal LLM Project Link Publisher: arXiv Author Affiliation: Tencent Youtu Lab Functional Division Understanding Ge...
Paper: EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Project Link Publisher: arXiv Author Affiliation: NVIDIA Functional Division Understandin...
Paper: mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models Project Link Publisher: arXiv Author Affiliation: Alibaba Group Functional Division ...
Paper: Parrot: Multilingual Visual Instruction Tuning Project Link Publisher: arXiv Author Affiliation: Nanjing University Functional Division Understanding Generation ...
Paper: video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models Project Link Publisher: ICML 2024 Author Affiliation: Tsinghua University Functional Division Understa...
Paper: VideoLLM-online: Online Video Large Language Model for Streaming Video Project Link Publisher: CVPR 2024 Author Affiliation: National University of Singapore Functional Division ...
Paper: Libra: Building Decoupled Vision System on Large Language Models Project Link Publisher: ICML 2024 Author Affiliation: Chinese Academy of Sciences Functional Division [...
Paper: CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts Project Link Publisher: arXiv Author Affiliation: Georgia Tech & UIUC Functional Division Understan...