
MosIT (Modality-switching Instruction Tuning)

  • Paper: NExT-GPT: Any-to-Any Multimodal LLM
  • GitHub Link
  • Publisher: ICML 2024
  • Author Affiliation: National University of Singapore
  • Type
    • SFT
    • RLHF
  • Multi-turn
  • Input Modalities $\rightarrow$ Output Modalities
    (I: Image, V: Video, A: Audio, 3D: Point Cloud, T: Text, B: Bounding box, Tab: Table, Web: Web page)
    • I+V+A+T $\rightarrow$ I+V+A+T (an illustrative record layout is sketched after this list)
  • Source: YouTube, Google, Flickr30k, Midjourney, etc.
  • Method: Automatic + Manual
  • I/V/A Scale: 4K / 4K / 4K
  • Dialog Turn: 4.8 (avg.)
  • Instance Scale: 5K
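
The entries above describe multi-turn dialogues in which both sides of a turn can mix text with images, video, and audio (the I+V+A+T $\rightarrow$ I+V+A+T mapping, roughly 4.8 turns per dialogue). To make that concrete, below is a minimal sketch of what a single record could look like. It is an assumption for illustration only: the field names (`dialog`, `role`, `text`, `media`) and file paths are hypothetical, not the format actually released with NExT-GPT.

```python
# A minimal, hypothetical sketch of one MosIT-style record, assuming a
# simple JSON-like layout. Field names ("dialog", "role", "text", "media")
# and paths are illustrative; they are not the schema released with NExT-GPT.
from typing import List, TypedDict


class Media(TypedDict):
    kind: str   # "image" | "video" | "audio"
    path: str   # reference to the media file


class Turn(TypedDict):
    role: str           # "human" or "assistant"
    text: str           # textual part of the turn
    media: List[Media]  # zero or more attached modalities


class Instance(TypedDict):
    dialog: List[Turn]  # multi-turn; ~4.8 turns on average per the stats above


example: Instance = {
    "dialog": [
        {
            "role": "human",
            "text": "Here is a clip of my dog at the beach. Can you make "
                    "a picture of him wearing sunglasses?",
            "media": [{"kind": "video", "path": "videos/0001.mp4"}],
        },
        {
            "role": "assistant",
            "text": "Sure, here is your dog wearing sunglasses.",
            "media": [{"kind": "image", "path": "images/0001.png"}],
        },
    ]
}

print(len(example["dialog"]), "turns")
```

The typed dictionaries only make the assumed structure explicit; the actual dataset may store media as URLs, IDs, or inline captions rather than local paths.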