Reconstructing human vision from brain activity

2023-06-12 09:33:15

Decoding visual stimuli from brain recordings can provide a better understanding of the human visual system, thereby improving knowledge of the brain and cognitive processing. Researchers from the National University of Singapore and the Chinese University of Hong Kong have developed MinD-Video, an AI capable of creating high-quality videos from brain activity recorded with functional magnetic resonance imaging (fMRI), using an augmented Stable Diffusion model.

The idea of using fMRI to decode and reconstruct video is not new: in 2011, researchers from Jack Gallant's laboratory at the University of California, Berkeley, including Shinji Nishimoto, then a postdoctoral researcher at the lab, combined fMRI recordings with computational modeling to reconstruct, roughly, trailers of Hollywood films viewed by the researchers.

Since then, progress in the field of deep learning has made it possible to develop generative AI to explore new approaches.

Thus, Shinji Nishimoto, now a professor at Osaka University in Japan, and one of his colleagues, Yu Takagi, used Stable Diffusion, the text-to-image generator released by Stability AI in August 2022, to transform brain activity measured by fMRI into still images.

Reconstructing vision from fMRI data

Our vision is a continuous and diverse stream of scenes, movements and objects; reconstructing it from fMRI data is a much more complex task than reconstructing still images.

Although fMRI can map brain activity at a specific location with high spatial resolution, the blood-oxygen-level-dependent (BOLD) signal it measures is notoriously slow: a pulse of neural activity causes an increase and then a decrease in BOLD over about 10 seconds. Each fMRI scan therefore represents an average of brain activity over that window.

In contrast, a typical video runs at around 30 frames per second (FPS). If an fMRI scan takes 2 seconds, 60 video frames, potentially containing various objects, movements and scene changes, will have been presented as visual stimuli during that time. Decoding fMRI and recovering videos at a much higher FPS than the temporal resolution of fMRI is therefore a very complex task.
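To make the mismatch concrete, here is a minimal back-of-the-envelope sketch in Python; the 30 FPS frame rate and 2-second scan duration are simply the figures from the example above.

```python
# Rough arithmetic for the temporal mismatch between fMRI and video,
# using the frame rate and scan duration cited in the article.
video_fps = 30          # typical video frame rate
fmri_scan_seconds = 2   # duration of one fMRI scan in the example

frames_per_scan = video_fps * fmri_scan_seconds
print(f"One fMRI scan spans roughly {frames_per_scan} video frames")  # -> 60
```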

MinD-Video: high-quality video reconstruction from brain activity

Jiaxin Qing, a researcher at the Chinese University of Hong Kong, together with Zijiao Chen and Juan Helen Zhou from the National University of Singapore, decided to tackle it: they propose a progressive learning approach to recover continuous visual experience from fMRI, which allowed them to reconstruct high-quality videos with accurate semantics and motion.

Figure: Brain decoding and video reconstruction. A progressive learning approach recovers continuous visual experience from fMRI and reconstructs high-quality videos with accurate semantics and motion.

They developed MinD-Video, a two-module pipeline designed to bridge the gap between decoding brain activity into images and decoding it into videos, which they present as an extension of their previous fMRI-to-image reconstruction work, MinD-Vis (CVPR 2023).


The two modules are trained separately and then refined together.

In the first module, the fMRI encoder is trained with large-scale unsupervised learning using masked brain modeling to learn general visual features from fMRI. The researchers then use the multimodality of an annotated dataset to train the fMRI encoder in the Contrastive Language-Image Pre-training (CLIP) space with contrastive learning.
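The following is a minimal sketch of such a CLIP-style contrastive step, written in PyTorch; the encoder architecture, voxel count and embedding size are illustrative assumptions rather than the authors' actual MinD-Video implementation.

```python
# Minimal sketch of CLIP-style contrastive alignment for an fMRI encoder.
# Architecture and dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FmriEncoder(nn.Module):
    def __init__(self, n_voxels=4000, embed_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_voxels, 1024), nn.GELU(), nn.Linear(1024, embed_dim)
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

def contrastive_loss(fmri_emb, clip_emb, temperature=0.07):
    """Symmetric InfoNCE loss pulling matched fMRI/CLIP pairs together."""
    logits = fmri_emb @ clip_emb.t() / temperature
    targets = torch.arange(len(fmri_emb))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Toy batch: 8 fMRI samples paired with 8 precomputed CLIP embeddings.
encoder = FmriEncoder()
fmri = torch.randn(8, 4000)
clip_targets = F.normalize(torch.randn(8, 512), dim=-1)
loss = contrastive_loss(encoder(fmri), clip_targets)
loss.backward()
```

The objective treats matched fMRI/CLIP pairs within a batch as positives and every other pairing as a negative, which is the standard CLIP-style contrastive formulation.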

In the second module, the learned features are refined through co-training with an augmented stable diffusion model, specially adapted for fMRI-guided video generation.
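As a rough illustration of what fMRI-guided conditioning can look like, the sketch below feeds tokens projected from an fMRI embedding into a cross-attention layer of the kind used in Stable Diffusion's U-Net, in place of text embeddings. All shapes, module names and the 77-token sequence length are assumptions for illustration, not the MinD-Video code.

```python
# Conceptual sketch: replacing text-embedding conditioning with fMRI-derived
# tokens in a cross-attention layer. Shapes and names are illustrative.
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    def __init__(self, dim=320, cond_dim=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, kdim=cond_dim, vdim=cond_dim,
                                          batch_first=True)

    def forward(self, latents, cond):
        # latents: (B, N, dim) U-Net feature tokens; cond: (B, S, cond_dim) conditioning
        out, _ = self.attn(latents, cond, cond)
        return out

# Project a single fMRI embedding into a short "token" sequence used as conditioning.
fmri_to_cond = nn.Linear(512, 768)
fmri_embedding = torch.randn(1, 512)                 # output of a trained fMRI encoder
cond_tokens = fmri_to_cond(fmri_embedding).unsqueeze(1).repeat(1, 77, 1)  # mimic text length

latents = torch.randn(1, 64, 320)                    # toy U-Net feature tokens
attended = CrossAttention()(latents, cond_tokens)
print(attended.shape)  # torch.Size([1, 64, 320])
```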

The researchers compared their results with those of previous work on fMRI-to-video reconstruction: according to them, their method generates samples that are more semantically meaningful and better match the ground truth.

According to the team, the reconstructed videos are of high quality, with accurate semantics such as motions and scene dynamics. They evaluated their results with semantic and pixel-level metrics at both the video and frame levels: the method achieves 85% accuracy on semantic metrics and 0.19 in SSIM, outperforming previous leading approaches by 45%.
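As an illustration of the pixel-level part of such an evaluation, here is a small SSIM sketch using scikit-image on placeholder grayscale frames; the semantic-accuracy protocol (classifying reconstructed videos) is not reproduced here and the data are synthetic.

```python
# Frame-level SSIM between a ground-truth frame and a (noisy) reconstruction.
# The frames are synthetic placeholders, not real reconstructions.
import numpy as np
from skimage.metrics import structural_similarity as ssim

rng = np.random.default_rng(0)
ground_truth = rng.random((256, 256))        # placeholder grayscale frame in [0, 1]
reconstruction = np.clip(ground_truth + 0.1 * rng.standard_normal((256, 256)), 0, 1)

score = ssim(ground_truth, reconstruction, data_range=1.0)
print(f"SSIM: {score:.3f}")
```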

For the team:

“Our method is still at the within-subject level, and the ability for between-subject generalization remains unexplored due to individual variations. Moreover, our method only uses less than 10% of cortical voxels for reconstructions, while the use of whole brain data remains unexploited.”

The researchers conclude:

“We believe this area has promising applications as large models develop, from neuroscience to brain-computer interfaces. But government regulations and efforts by research communities are needed to ensure the confidentiality of a person’s biological data and prevent any misuse of this technology.”

Article reference: High-quality Video Reconstruction from Brain Activity

Authors: Jiaxin Qing, Chinese University of Hong Kong

Zijiao Chen and Juan Helen Zhou, National University of Singapore

