Any-to-any Multimodal Instruction Dataset
- Paper: AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
- GitHub: Link
- Publisher: arXiv
- Author Affiliation: Fudan University
- Multi-turn: ✔
- Input Modalities $\rightarrow$ Output Modalities
  (I: Image, V: Video, A: Audio, 3D: Point Cloud, T: Text, B: Bounding box, Tab: Table, Web: Web page)
  - I + A (Speech + Music) + T $\rightarrow$ I + A (Speech + Music) + T (see the instance sketch after this list)
- I/V/A Scale
  - I: Not reported
  - V: Not reported
  - A: Not reported
- Dialog Turn: Not reported
- Instance Scale: 180K
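
To make the I + A (Speech + Music) + T $\rightarrow$ I + A (Speech + Music) + T entry concrete, below is a minimal, hypothetical sketch of how one any-to-any, multi-turn instruction instance could be represented. The field names, roles, and file paths are illustrative assumptions for this post, not the actual AnyInstruct schema or file format released with AnyGPT.

```python
# Hypothetical sketch of a single any-to-any instruction instance.
# Field names ("turns", "modality", "value", ...) are assumptions for
# illustration, not the AnyGPT/AnyInstruct release format.
from typing import Dict, List

# One multi-turn dialogue: each turn interleaves text with references to
# image / speech / music files, and responses may also mix modalities.
example_instance: Dict[str, List] = {
    "turns": [
        {
            "role": "user",
            "content": [
                {"modality": "text",
                 "value": "Describe this photo, then compose a short tune that fits its mood."},
                {"modality": "image", "value": "images/0001.jpg"},
            ],
        },
        {
            "role": "assistant",
            "content": [
                {"modality": "text",
                 "value": "A quiet beach at sunset; here is a matching melody."},
                {"modality": "music", "value": "music/0001.wav"},
            ],
        },
    ],
}


def count_modalities(instance: Dict[str, List]) -> Dict[str, int]:
    """Count how often each modality appears across all turns."""
    counts: Dict[str, int] = {}
    for turn in instance["turns"]:
        for segment in turn["content"]:
            counts[segment["modality"]] = counts.get(segment["modality"], 0) + 1
    return counts


if __name__ == "__main__":
    # Expected output for the sketch above: {'text': 2, 'image': 1, 'music': 1}
    print(count_modalities(example_instance))
```

Any schema along these lines keeps every modality as a reference inside an ordered turn list, which is what lets both the input and output sides of an instance mix image, speech, music, and text freely.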