Interview with David Reshef, Continued:
Decoded Science: Does this program require manual data entry, or does it have the capacity to accept input from existing databases?
D. Reshef: Not at all (manual entry, that is). In fact, you can more or less feed in any existing data.
Decoded Science: If it is compatible with existing databases, is there any particular format that is most efficient?
D. Reshef: The only constraint on the data is that the current input format is just columns of numbers (e.g. comma or tab delimited files that can be generated by many different platforms). So any data set that can be formatted in this way (e.g. many columns where each column is a different variable) can be easily read in by the software.
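To make the input format concrete, here is a minimal sketch of how a comma- or tab-delimited file of numeric columns might be read into one list per variable. It is not the MINE software's own loader: the function name `read_numeric_columns`, the sample variable names, and the assumption that the first row holds column names are all hypothetical and chosen only for illustration.

```python
import csv


def read_numeric_columns(text, delimiter=None):
    """Parse delimited text into a dict mapping column name -> list of floats.

    Assumes (hypothetically) that the first row contains column names and
    that every remaining cell is a number.
    """
    lines = text.strip().splitlines()
    if delimiter is None:
        # Simple guess: use tabs if the header row contains one, else commas.
        delimiter = "\t" if "\t" in lines[0] else ","
    rows = list(csv.reader(lines, delimiter=delimiter))
    header = [name.strip() for name in rows[0]]
    columns = {name: [] for name in header}
    for row in rows[1:]:
        for name, value in zip(header, row):
            columns[name].append(float(value))
    return columns


# Made-up sample data: three variables, one per column.
sample = (
    "temperature,pressure,humidity\n"
    "20.5,101.3,0.41\n"
    "22.1,100.9,0.38\n"
    "19.8,101.7,0.45\n"
)
columns = read_numeric_columns(sample)
```

Each key in `columns` is one variable and each list holds that variable's values, which matches the "many columns where each column is a different variable" layout described above. Data exported from a spreadsheet or database as CSV or TSV would fit the same shape.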
Decoded Science: What inspired this project?
D. Reshef: We actually didn’t set out to address this problem at all originally. We were looking to create a new data analysis platform that would enable researchers to explore datasets easily and intuitively. Naturally, we thought the first thing such a program should do is help the user figure out which variables are the most interesting ones to look at. But that problem, which was much easier to state than to answer, turned out to be fascinating and we’ve been hung up on it ever since!
Decoded Science: What do you consider to be the most important implications of this project, for data mining, and for society in general?
D. Reshef: Our ability to collect and store data is growing every day. We see this project as part of a larger push, which is occurring simultaneously in many disciplines, to do more with the massive amounts of data being collected. To us, developing tools to help ease the data load on scientists, economists, doctors, politicians, and financial analysts alike is a compelling goal and we hope our work has brought us one step closer to achieving it.
Data Mining Takes Off
Our ability to sort, track, and prioritize patterns within huge sets of data increases all the time. With MINE, data mining has taken off like a rocket, and computing may never be the same.
Reshef, D. N., et al. Detecting novel associations in large data sets. Science. DOI: 10.1126/science.1205438. Accessed December 15, 2011.