Carnegie Mellon University
December 12, 2022

A Better Way to Clean Neuroscience Data

By Stacy Kish

New technologies are helping scientific disciplines gather large volumes of data, but sifting through and cleaning this data is laborious. Neuroimaging is not immune to this modern research complication.

Functional Magnetic Resonance Imaging (fMRI) produces high-resolution brain images for neuroscience research. Several hours of scans can result in thousands of distinct images. The data can also be very noisy.

A multi-institutional team of researchers developed GLMsingle, a scalable, user-friendly toolbox that cleans fMRI data and reduces its variability. The toolbox works in MATLAB and Python. GLMsingle requires only two inputs: fMRI time-series data and a design matrix. The work is published in the online journal eLife.
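For readers unfamiliar with the term, a design matrix records which experimental condition was presented at which point in a scan. The sketch below is only an illustration in NumPy, not the toolbox's own code. It assumes a layout of one row per fMRI volume and one column per condition, with a 1 marking the volume where a trial begins. The function name, the trial timings, and the synthetic "time-series" array are all hypothetical.

```python
import numpy as np


def build_design_matrix(n_volumes, onsets_sec, condition_ids, n_conditions, tr_sec):
    """Return an (n_volumes x n_conditions) binary matrix.

    A 1 at [v, c] means a trial of condition c begins at volume v.
    This layout is an assumption made for illustration.
    """
    design = np.zeros((n_volumes, n_conditions), dtype=int)
    for onset, cond in zip(onsets_sec, condition_ids):
        volume = int(round(onset / tr_sec))  # convert seconds to a volume index
        if 0 <= volume < n_volumes:
            design[volume, cond] = 1
    return design


# Hypothetical run: 10 volumes, one volume every 2 seconds, 3 conditions.
# Trials start at 0 s, 6 s and 12 s and show conditions 0, 2 and 0.
design = build_design_matrix(
    n_volumes=10,
    onsets_sec=[0.0, 6.0, 12.0],
    condition_ids=[0, 2, 0],
    n_conditions=3,
    tr_sec=2.0,
)

# Synthetic stand-in for the time-series input: 5 voxels x 10 volumes of noise.
rng = np.random.default_rng(0)
data = rng.standard_normal((5, 10))

print(design)
print(data.shape)
```

In this toy run, condition 0 is shown twice, so its column contains two 1s. Repeated presentations of the same condition are what allow a method to compare responses across trials.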

“As the fields of cognitive neuroscience and artificial intelligence have converged in recent years, task fMRI datasets have grown rapidly in size and scope,” said project contributor Michael Tarr, the Kavčić-Moura Professor of Cognitive and Brain Science at Carnegie Mellon University. He also is head of the Department of Psychology and a member of the Neuroscience Institute. “The hope is that bigger datasets will unlock a new array of data-driven discoveries, and GLMsingle could impact the quality and sensitivity of hundreds of existing and future datasets.”

According to the researchers, GLMsingle ‘learns’ to distinguish signal from noise in the data and automatically adjusts its strategy to match the unique attributes of each dataset. Researchers can then analyze the cleaned fMRI data to produce more robust and replicable results.

To test the toolbox, the team applied it to the Natural Scenes Dataset, a large-scale fMRI dataset, and BOLD5000, a human fMRI study that includes almost 5,000 distinct images depicting real-world scenes. They found that GLMsingle improved the quality of the visual information in the data by disentangling neural responses to neighboring trials, improving between-subject representational similarity, and enabling fine-grained image-level pattern decoding. These improvements led to more accurate analyses that can advance neuroscience research.

“Subjects in fMRI studies [perform tasks and] have their brain activity measured while seeing, hearing, doing or thinking specific things,” said Jacob S. Prince, a Ph.D. student at Harvard University and former member of the Tarr lab. Prince is first author on the paper. “The GLMsingle toolbox was designed with these task fMRI datasets in mind, which are basically the ‘bread and butter’ of cognitive neuroimaging.”

GLMsingle was designed for general use and can be applied to nearly any fMRI experiment that uses discrete events. The toolbox is publicly available and may improve the quality of past, present and future neuroimaging datasets that sample brain activity across many experimental conditions.


Tarr and Prince were joined by Ian Charest at the University of Montreal, Jan Kurzawski at New York University, John Pyles at the University of Washington and Kendrick Kay at the University of Minnesota on the project, titled “Improving the accuracy of single-trial fMRI response estimates using GLMsingle.” This work received support from the National Science Foundation.