Any-to-any Multimodal Instruction Dataset

  • Paper: AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
  • GitHub Link
  • Publisher: arXiv
  • Author Affiliation: Fudan University
  • Multi-turn
  • Input Modalities $\rightarrow$ Output Modalities
    (I: Image, V: Video, A: Audio, 3D: Point Cloud, T: Text, B: Bounding box, Tab: Table, Web: Web page)
    • I+A(Speech+Music)+T $\rightarrow$ I+A(Speech+Music)+T
  • I/V/A Scale
    • I
      • Not reported
    • V
      • Not reported
    • A
      • Not reported
  • Dialog Turn
    • Not reported
  • Instance Scale
    • 180K
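To make the modality notation above concrete, here is a minimal sketch of what one multi-turn, any-to-any instance might look like as a JSON-style record. The field names (`turns`, `content`, `type`, etc.) are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical sketch of one any-to-any instruction instance.
# Field names are assumptions for illustration, not the released schema.
instance = {
    "turns": [
        {
            "role": "user",
            # Input side: I + A(Speech) + T
            "content": [
                {"type": "image", "value": "<image placeholder>"},
                {"type": "speech", "value": "<speech tokens>"},
                {"type": "text", "value": "Describe this scene aloud."},
            ],
        },
        {
            "role": "assistant",
            # Output side: A(Speech) + T
            "content": [
                {"type": "text", "value": "<text response>"},
                {"type": "speech", "value": "<speech tokens>"},
            ],
        },
    ],
}

def modalities_used(inst):
    """Collect the set of modalities appearing across all turns."""
    mods = set()
    for turn in inst["turns"]:
        for segment in turn["content"]:
            mods.add(segment["type"])
    return mods
```

A helper like `modalities_used` would let you verify that an instance stays within the I+A(Speech+Music)+T modality set on both sides.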
This post is licensed under CC BY 4.0 by the author.