Researchers have made significant strides in reconstructing images from brain activity measured with functional magnetic resonance imaging (fMRI). Previous methods relied solely on visual information decoded from brain signals, which limited the accuracy and quality of the reconstructions. To address this, a new approach combines decoded semantic information with visual details during reconstruction.
The method employs two modules: visual reconstruction and semantic reconstruction. In the visual reconstruction module, a deep generator network (DGN) produces an image from visual information decoded from brain data. A VGG19 network then extracts visual features from the generated image, and the image is optimized iteratively to minimize the error between the features decoded from brain activity and the features extracted from the image.
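The iterative optimization in the visual reconstruction module can be illustrated with a toy sketch. It assumes small fixed linear maps in place of the DGN (`GENERATOR`) and VGG19 (`EXTRACTOR`), and plain gradient descent with a hand-derived gradient. All names, sizes, and the step size are illustrative assumptions, not details of the method itself; a real pipeline would rely on a deep-learning framework with automatic differentiation.

```python
# Toy feature-matching loop: adjust a latent vector until the features of the
# generated image match the features decoded from brain activity.
# GENERATOR and EXTRACTOR are hypothetical linear stand-ins for the DGN and VGG19.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def transpose(matrix):
    """Swap the rows and columns of a matrix."""
    return [list(column) for column in zip(*matrix)]

# latent (2 values) -> image (4 "pixels")
GENERATOR = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
# image (4 "pixels") -> features (3 values)
EXTRACTOR = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]

def reconstruct(decoded_features, steps=200, lr=0.1):
    """Return the optimized image and the squared feature error at each step."""
    latent = [0.0] * len(GENERATOR[0])
    losses = []
    for _ in range(steps):
        image = matvec(GENERATOR, latent)
        features = matvec(EXTRACTOR, image)
        residual = [f - t for f, t in zip(features, decoded_features)]
        losses.append(sum(r * r for r in residual))
        # Chain rule for the squared error: d(loss)/d(latent) = 2 * G^T F^T residual
        grad_image = matvec(transpose(EXTRACTOR), residual)
        grad_latent = matvec(transpose(GENERATOR), grad_image)
        latent = [z - lr * 2.0 * g for z, g in zip(latent, grad_latent)]
    return matvec(GENERATOR, latent), losses

# Stand-in for features decoded from fMRI: the features of a known toy image.
true_image = matvec(GENERATOR, [0.5, -1.0])
decoded_features = matvec(EXTRACTOR, true_image)

image, losses = reconstruct(decoded_features)
print(f"feature error: {losses[0]:.3f} -> {losses[-1]:.2e}")
```

In this toy setting the feature error shrinks at every step because the step size is small relative to the scale of the two linear maps; with real networks the learning rate and any regularization would need tuning.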
The semantic reconstruction module uses two models: BLIP, an image-captioning model, and a latent diffusion model (LDM). BLIP generates a caption for each training image, and semantic features are extracted from these captions. These features, together with the brain data from the training sessions, are used to train a decoder, which then decodes semantic features from human brain activity. Finally, the image reconstructed by the visual reconstruction module is given to the LDM as input, with the decoded semantic features provided as the conditioning input for semantic reconstruction.
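The decoder-training step can be sketched as a regression from brain activity to semantic features. The text does not say which kind of decoder is used, so the example below assumes a ridge-regularized linear map fitted by gradient descent. The toy arrays stand in for voxel responses and caption-derived features, and every name (`fit_linear_decoder`, `decode`, `TRAIN_BRAIN`, `TRAIN_SEMANTIC`) is hypothetical.

```python
# Assumed decoder: a ridge-regularized linear map from voxels to semantic features.
# The data below are toy values, not real fMRI responses or caption embeddings.

def decode(weights, brain_sample):
    """Predict semantic features for one brain sample."""
    return [sum(w * v for w, v in zip(row, brain_sample)) for row in weights]

def fit_linear_decoder(brain, semantic, alpha=1e-3, lr=0.05, steps=500):
    """Minimize the summed squared error plus alpha * ||W||^2 by gradient descent."""
    n_voxels, n_features = len(brain[0]), len(semantic[0])
    weights = [[0.0] * n_voxels for _ in range(n_features)]
    for _ in range(steps):
        # Start from the gradient of the ridge penalty, then add the data term.
        grad = [[2.0 * alpha * w for w in row] for row in weights]
        for x, y in zip(brain, semantic):
            prediction = decode(weights, x)
            for j in range(n_features):
                error = prediction[j] - y[j]
                for k in range(n_voxels):
                    grad[j][k] += 2.0 * error * x[k]
        weights = [[w - lr * g for w, g in zip(w_row, g_row)]
                   for w_row, g_row in zip(weights, grad)]
    return weights

# Four training samples: 3 "voxels" each, paired with 2 semantic features.
TRAIN_BRAIN = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
TRAIN_SEMANTIC = [[1.0, 0.0], [2.0, -1.0], [0.0, 1.0], [3.0, 0.0]]

decoder = fit_linear_decoder(TRAIN_BRAIN, TRAIN_SEMANTIC)

# Decode semantic features from a held-out brain sample.
decoded = decode(decoder, [2.0, 0.0, 1.0])
print([round(value, 2) for value in decoded])
```

In the pipeline described above, the decoded vector would then be passed to the LDM as its conditioning input, alongside the image from the visual reconstruction module.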
Including the decoded semantic features significantly improves reconstruction quality. The approach outperforms previous methods, achieving accuracies of 0.812 and 0.815 on the Inception and contrastive language-image pre-training (CLIP) metrics, respectively. It also achieves a structural similarity index measure (SSIM) score of 0.328, indicating superior performance on this low-level metric as well.
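Feature-based accuracies such as the Inception and CLIP figures are commonly computed in this literature by two-way identification: a reconstruction counts as correct when its features correlate more strongly with those of its own ground-truth image than with those of another image. The text does not state the exact protocol, so the sketch below is an assumption, and the short feature vectors are toy values rather than real network embeddings.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length feature vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def two_way_identification(recon_features, true_features):
    """Fraction of (reconstruction, distractor) pairs where the true image wins."""
    correct = 0
    total = 0
    for i, recon in enumerate(recon_features):
        own_score = pearson(recon, true_features[i])
        for j, distractor in enumerate(true_features):
            if j == i:
                continue
            total += 1
            if own_score > pearson(recon, distractor):
                correct += 1
    return correct / total

# Toy features: the third reconstruction resembles the first image more than its own.
TRUE_FEATURES = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
RECON_FEATURES = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.6, 0.1, 0.3]]

accuracy = two_way_identification(RECON_FEATURES, TRUE_FEATURES)
print(f"two-way identification accuracy: {accuracy:.3f}")
```

Under this protocol chance level is 0.5, which gives a reference point for reading scores such as 0.812 and 0.815.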
This research demonstrates the potential of combining visual and semantic information to reconstruct images from brain activity. It opens new avenues for understanding human cognition and, potentially, for creating new forms of communication.