A Distributed Computing Platform for fMRI Big Data Analytics

Abstract:

Since the BRAIN Initiative and the Human Brain Project began, several efforts have been made to address the computational challenges of neuroscience Big Data. These two projects promised to model the complex interaction of brain and behavior and to understand and diagnose brain diseases by collecting and analyzing large quantities of data. Archiving, analyzing, and sharing the growing neuroimaging datasets pose major challenges. New computational methods and technologies have emerged in the domain of Big Data but have not been fully adapted for use in neuroimaging. In this work, we introduce the current challenges of neuroimaging in a Big Data context. We review our efforts toward creating a data management system to organize large-scale fMRI datasets, and present our novel methods for distributed fMRI data processing that employ Hadoop and Spark. Finally, we demonstrate the significant performance gains of our methods when performing distributed dictionary learning.

Existing Work:

Neuroscience, like other leading sciences, has entered the big data era. This transition requires a cultural shift in the community: away from large but isolated efforts that apply a single technique to smaller problems in individual laboratories, and toward more horizontal approaches in which researchers integrate data collected with a variety of techniques to solve bigger problems, addressing the central questions of how the brain is functionally and structurally connected. We have categorized the current computational efforts of neuroscience experts in dealing with big data challenges into six groups: data management, data visualization, cloud storage, computing platforms, processing pipelines, and processing engines.

Proposed Work:

We introduced our endeavors to address each of the above categories, notably for fMRI data. We introduced HELPNI as an efficient neuroinformatics platform for data storage, processing pipelines, and data visualization. We used our HAFNI method to represent the fMRI data through a dictionary learning algorithm, and then we developed and implemented the D-r1DL framework on Spark for distributed functional network analysis on large-scale neuroimaging data. We tested its performance on both individual and group-wise fMRI data from the Human Connectome Project (HCP) Q1 release dataset and demonstrated the results through an online visualization tool.
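To make the idea of dictionary learning by rank-1 components concrete, the following is a minimal single-machine sketch in pure Python. It is not the D-r1DL implementation, whose update rules are not given here. It assumes one common formulation: the data matrix S (time points by voxels) is approximated by a unit-norm temporal atom u times a sparse spatial loading v, fitted by alternating updates, and further atoms are obtained by subtracting each fitted component from S (deflation). The function names, the sparsity parameter r, and the iteration count are hypothetical choices for illustration.

```python
import math
import random


def top_r_sparsify(v, r):
    """Keep the r largest-magnitude entries of v and set the rest to zero."""
    if r >= len(v):
        return list(v)
    order = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)
    keep = set(order[:r])
    return [x if j in keep else 0.0 for j, x in enumerate(v)]


def _project_columns(S, u):
    """Compute S^T u: one value per column (voxel)."""
    return [sum(S[t][p] * u[t] for t in range(len(S))) for p in range(len(S[0]))]


def rank1_sparse_fit(S, r, n_iter=50, seed=0):
    """Fit S ~ u v^T with ||u|| = 1 and at most r nonzero entries in v."""
    n_rows = len(S)
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n_rows)]
    norm = math.sqrt(sum(x * x for x in u))
    u = [x / norm for x in u]
    for _ in range(n_iter):
        # Update the sparse loading for the current atom.
        v = top_r_sparsify(_project_columns(S, u), r)
        # Update the atom for the current loading, then renormalize.
        u_new = [sum(row[p] * v[p] for p in range(len(v))) for row in S]
        norm = math.sqrt(sum(x * x for x in u_new))
        if norm == 0.0:
            break  # nothing left to explain in S
        u = [x / norm for x in u_new]
    # Make v consistent with the final u.
    v = top_r_sparsify(_project_columns(S, u), r)
    return u, v


def learn_dictionary(S, n_atoms, r):
    """Extract n_atoms rank-1 components one at a time by deflation."""
    residual = [row[:] for row in S]
    atoms, loadings = [], []
    for k in range(n_atoms):
        u, v = rank1_sparse_fit(residual, r, seed=k)
        # Deflation: remove the fitted component before fitting the next one.
        for t in range(len(residual)):
            for p in range(len(v)):
                residual[t][p] -= u[t] * v[p]
        atoms.append(u)
        loadings.append(v)
    return atoms, loadings, residual
```

Calling `learn_dictionary(S, n_atoms, r)` on a list-of-lists matrix returns the temporal atoms, their sparse spatial loadings, and the remaining residual. In this sketch the two matrix-vector products (S^T u and S v) dominate the cost; they are the natural candidates for partitioning across workers in an engine such as Spark, though how D-r1DL actually distributes the work is not specified in this text.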

Conclusion:

Neuroscience, like other leading sciences, has entered the big data era. This transition requires a cultural shift in the community: away from large but isolated efforts that apply a single technique to smaller problems in individual laboratories, and toward more horizontal approaches in which researchers integrate data collected with a variety of techniques to solve bigger problems, addressing the central questions of how the brain is functionally and structurally connected. We have categorized the current computational efforts of neuroscience experts in dealing with big data challenges into six groups: data management, data visualization, cloud storage, computing platforms, processing pipelines, and processing engines. In this work, we introduced our endeavors to address each of these categories, notably for fMRI data. We introduced HELPNI as an efficient neuroinformatics platform for data storage, processing pipelines, and data visualization. We used our HAFNI method to represent the fMRI data through a dictionary learning algorithm, and then we developed and implemented the D-r1DL framework on Spark for distributed functional network analysis on large-scale neuroimaging data. We tested its performance on both individual and group-wise fMRI data from the HCP Q1 release dataset and demonstrated the results through an online visualization tool. The results show that the framework can meet the desired scalability and reproducibility requirements for fMRI big data analysis and can serve as a useful tool for the community. The framework and the neuroinformatics system are both available online as a web service for public use and testing. Currently, we are working on applying the same algorithm to larger data using the Apache Flink framework. While Spark is vastly superior to Hadoop MapReduce for highly iterative computations, Flink possesses a few domain-specific advantages over Spark that yield additional performance gains for D-r1DL. We are also working on a general solution that combines deep learning techniques with parallel processing engines to provide a new processing method for fMRI signals.
