Any-to-any Multimodal Instruction Dataset

  • Paper: AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
  • GitHub Link
  • Publisher: arXiv
  • Author Affiliation: Fudan University
  • Multi-turn
  • Input Modalities $\rightarrow$ Output Modalities
    (I: Image, V: Video, A: Audio, 3D: Point Cloud, T: Text, B: Bounding box, Tab: Table, Web: Web page)
    • I+A(Speech+Music)+T $\rightarrow$ I+A(Speech+Music)+T
  • I/V/A Scale
    • I
      • Not reported
    • V
      • Not reported
    • A
      • Not reported
  • Dialog Turn
    • Not reported
  • Instance Scale
    • 180K
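To make the modality notation above concrete, here is a minimal sketch of what one multi-turn, any-to-any instance might look like as a JSON-style record. The field names (`turns`, `content`, `type`, etc.) are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical sketch of one any-to-any instruction instance.
# Field names are assumptions for illustration, not the released schema.
instance = {
    "turns": [
        {
            "role": "user",
            # Input side: I + A(Speech) + T
            "content": [
                {"type": "image", "value": "<image placeholder>"},
                {"type": "speech", "value": "<speech tokens>"},
                {"type": "text", "value": "Describe this scene aloud."},
            ],
        },
        {
            "role": "assistant",
            # Output side: A(Speech) + T
            "content": [
                {"type": "text", "value": "<text response>"},
                {"type": "speech", "value": "<speech tokens>"},
            ],
        },
    ],
}

def modalities_used(inst):
    """Collect the set of modalities appearing across all turns."""
    mods = set()
    for turn in inst["turns"]:
        for segment in turn["content"]:
            mods.add(segment["type"])
    return mods
```

A helper like `modalities_used` would let you verify that an instance stays within the I+A(Speech+Music)+T modality set on both sides.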
This post is licensed under CC BY 4.0 by the author.