Leveraging for big data regression

Ping Ma, Xiaoxiao Sun

Research output: Contribution to journal › Article › peer-review

72 Scopus citations

Abstract

Rapid advances in science and technology over the past decade have produced an extraordinary amount of data, offering researchers an unprecedented opportunity to tackle complex research challenges. The opportunity, however, has not yet been fully utilized, because effective and efficient statistical tools for analyzing super-large datasets are still lacking. One major challenge is that the advance of computing resources still lags far behind the exponential growth of databases. To facilitate scientific discoveries using current computing resources, one may use an emerging family of statistical methods called leveraging. Leveraging methods are designed under a subsampling framework, in which one samples a small proportion of the data (a subsample) from the full sample and then performs the intended computations for the full sample using the small subsample as a surrogate. The key to the success of the leveraging methods is to construct nonuniform sampling probabilities so that influential data points are sampled with high probability. These methods are a distinctive development in big data analytics and allow broad access to massive amounts of information without resorting to high-performance computing or cloud computing.
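The subsampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes simple linear regression with an intercept (so the leverage scores have a closed form), sampling with replacement, and inverse-probability weights in the subsample fit. The function names `leverage_scores`, `weighted_line_fit`, and `leveraging_fit` are hypothetical and chosen only for this illustration.

```python
import random


def leverage_scores(x):
    """Leverage scores h_ii for simple linear regression with an intercept.

    Closed form: h_ii = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2.
    """
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


def leveraging_fit(x, y, r, seed=0):
    """Fit a line on a leverage-weighted subsample of size r (assumed scheme)."""
    rng = random.Random(seed)
    h = leverage_scores(x)
    total = sum(h)  # equals 2, the number of regression coefficients
    probs = [hi / total for hi in h]  # nonuniform sampling probabilities
    # Draw r indices with replacement; high-leverage points are favored.
    idx = rng.choices(range(len(x)), weights=probs, k=r)
    xs = [x[i] for i in idx]
    ys = [y[i] for i in idx]
    # Assumption: reweight each draw by 1/probability in the subsample fit.
    ws = [1.0 / probs[i] for i in idx]
    return weighted_line_fit(xs, ys, ws)
```

Calling `leveraging_fit(x, y, r=50)` on a large data set returns an intercept and slope computed from only 50 sampled points. Whether to reweight the subsample, and how to approximate leverage scores cheaply when there are many predictors, are design choices this sketch leaves out.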

Original language: English (US)
Pages (from-to): 70-76
Number of pages: 7
Journal: Wiley Interdisciplinary Reviews: Computational Statistics
Volume: 7
Issue number: 1
State: Published - Jan 1 2015
Externally published: Yes

Keywords

  • Big data
  • Estimation
  • Least squares
  • Leverage
  • Subsampling

ASJC Scopus subject areas

  • Statistics and Probability
