Are Datasets Like Documents?: Evaluating Similarity-Based Ranked Search Over Scientific Data

Title	Are Datasets Like Documents?: Evaluating Similarity-Based Ranked Search Over Scientific Data
Publication Type	Journal Article
Year of Publication	2014
Authors	Megler VM, Maier D
Journal Title	IEEE Transactions on Knowledge and Data Engineering
Issue	99
ISSN	1041-4347
Abstract	The past decade has seen a dramatic increase in the amount of data captured and made available to scientists for research. This increase amplifies the difficulty scientists face in finding the data most relevant to their information needs. In prior work, we hypothesized that Information Retrieval-style ranked search can be applied to datasets to help a scientist discover the most relevant data amongst the thousands of datasets in many formats, much like text-based ranked search helps users make sense of the vast number of Internet documents. To test this hypothesis, we explored the use of ranked search for scientific data using an existing multi-terabyte observational archive as our test-bed. In this paper, we investigate whether the concept of varying relevance, and therefore ranked search, applies to numeric data – that is, are datasets enough like documents for Information Retrieval techniques and evaluation measures to apply? We present a user study that demonstrates that dataset similarity resonates with users as a basis for relevance and, therefore, for ranked search. We evaluate a prototype implementation of ranked search over datasets with a second user study and demonstrate that ranked search improves a scientist’s ability to find needed data.
DOI	10.1109/TKDE.2014.2320737