https://doi.org/10.1140/epjds/s13688-016-0089-x
Regular article
Quantifying decision making for data science: from data acquisition to modeling
iCeNSA, Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
* e-mail: nchawla@nd.edu
Received: 27 January 2016
Accepted: 9 August 2016
Published online: 20 August 2016
Organizations, irrespective of their size and type, are increasingly data-driven or aspire to become so. There is a rush to quantify the value of their own internal data, the value of integrating that internal data with external data, and the value of modeling such data. A question that analytics teams often grapple with is whether to acquire more data, expend additional effort on more complex modeling, or both. If the value of these decisions can be quantified a priori, it can be used to guide budget and investment decisions. To that end, we quantify the Net Present Value (NPV) of the tasks of additional data acquisition and more complex modeling, both of which are critical to the data science process. We develop a framework, NPVModel, for a comparative analysis of various external data acquisition and in-house model development scenarios, using NPVs of costs and returns as a measure of feasibility. We then demonstrate the effectiveness of NPVModel in prescribing strategies for various scenarios. Our framework not only acts as a suggestion engine but also provides valuable insights into budgeting and roadmap planning for Big Data ventures.
Key words: cost sensitive learning / business value / external data
© Nagrecha and Chawla, 2016