Xavier Niel's AI (really) sees what you show it, so be careful!

One of MoshiVis's strengths is that it reduces the amount of speech data needed for training. Rather than relying on large volumes of audio recordings, Kyutai has set up a textual "inner monologue" system that simulates spoken dialogue in text form, allowing the AI to be trained more economically. This approach enables the model to gain skills quickly while cutting the resources required, as the illustrative sketch below suggests.
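Kyutai does not detail its exact training recipe in this article; the toy sketch below only illustrates the general idea behind an "inner monologue", namely that each conversational step carries a text token alongside its audio tokens, so plain transcripts can supply much of the supervision even when aligned speech is scarce. All names, tokens, and helper functions here are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Step:
    """One time step of the conversation stream (illustrative structure only)."""
    text_token: str            # the textual "inner monologue" token
    audio_tokens: List[int]    # codec tokens for the same step (may be empty)

def steps_from_text(transcript: str) -> List[Step]:
    """Build training steps from plain text: the audio side stays empty,
    yet the text stream still provides a supervision signal."""
    return [Step(text_token=w, audio_tokens=[]) for w in transcript.split()]

def steps_from_speech(transcript: str, codec_frames: List[List[int]]) -> List[Step]:
    """Build fully multimodal steps when aligned speech is available."""
    return [Step(w, f) for w, f in zip(transcript.split(), codec_frames)]

# Text-only data is cheap and abundant...
text_only = steps_from_text("the image shows a red bicycle leaning against a wall")

# ...while audio-aligned data is scarcer and only covers part of the training mix.
speech = steps_from_speech(
    "what is in the picture",
    [[17, 42], [8, 3], [55, 9], [2, 71], [30, 6]],
)

print(len(text_only), "text-only steps;", len(speech), "speech-aligned steps")
```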

The first results are promising. On standard benchmarks such as OCR-VQA, VQAv2 and COCO, MoshiVis achieves scores comparable to those of specialized vision models. Cross-attention is also cached during inference, which further improves performance by reducing the compute load.
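Kyutai has not published its implementation in this article; the sketch below only illustrates the general idea of caching cross-attention over image tokens, where the keys and values are computed once and reused at every decoding step so that only the query changes. The dimensions, weights, and function names are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 64        # hidden size (illustrative)
n_img = 196   # number of image tokens, e.g. a 14x14 patch grid (illustrative)

# Random projections standing in for learned weight matrices.
W_q, W_k, W_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

image_tokens = rng.standard_normal((n_img, d))

# The cache: keys and values over the image tokens are computed once, up front.
K_cache = image_tokens @ W_k
V_cache = image_tokens @ W_v

def cross_attention_step(token_state):
    """One decoding step: only the query is recomputed; K/V come from the cache."""
    q = token_state @ W_q
    scores = (K_cache @ q) / np.sqrt(d)   # similarity to each image token
    weights = softmax(scores)
    return weights @ V_cache              # attended visual context, shape (d,)

# Simulate a few decoding steps that all reuse the same image cache.
for step in range(3):
    hidden = rng.standard_normal(d)
    context = cross_attention_step(hidden)
    print(step, context.shape)
```

The saving comes from the fact that the image does not change during a spoken exchange, so the per-step cost drops to a single query projection and one attention read over the cached keys and values.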

MoshiVis is now available online, free of charge, through a dedicated platform. Trying it out only requires providing an e-mail address to receive an access link. For the moment, interaction with the AI is available exclusively in English, but Kyutai suggests that other languages could be supported in the future. The initiative confirms the lab's ambition to establish a solid European alternative in multimodal conversational AI.

