Chat with a vision language model

Chat using text only, text plus an image, or text plus a video.
