Presentations

 

Using data mining techniques in modelling of bird distribution data

Authors: W. M. Hochachka, Rich Caruana, Daniel Fink, Art Munson, Mirek Riedewald, Daria Sorokina, and Steve Kelling

Location & Date: 17th International Meeting of the European Bird Census Council. Chiavenna, Italy (April 2007)

Description: This talk was a summary of the material covered in the paper (to appear in the August 2007 issue of the Journal of Wildlife Management): an introduction to the philosophy and methods of data mining for ecologists. Examples were used to briefly introduce the audience to three strengths of data mining analyses: (1) production of accurate predictions, (2) identification of important predictor variables, and (3) identification of functional forms of relationships between predictors and the response variable.

keywords/keyphrases: data mining, exploratory analysis, bird distribution

Detecting Statistical Interactions with Groves of Trees

Authors: Daria Sorokina, Rich Caruana, Mirek Riedewald, Daniel Fink

Location & Date: The Second North East Student Colloquium on Artificial Intelligence (NESCAI'07), Cornell University, Ithaca, NY, April 2007.

Description: We propose a new approach to interaction detection that compares the performance of different regression models. The method relies on a new machine learning algorithm, a Grove of trees, which combines additive models with regression trees in a way that allows variable interactions to be carefully controlled. By comparing the performance of restricted and unrestricted groves of trees, the existence and degree of variable interactions in the response function can be reliably detected and estimated.
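
The comparison idea can be illustrated with a toy sketch. This is not the Groves algorithm: as an assumption for illustration, the restricted model below is a plain additive main-effects fit and the unrestricted model is a table of cell means over two categorical predictors. The data generator, function names, and sample sizes are all hypothetical. A large gap in held-out error between the two models suggests an interaction; a gap near zero suggests none.

```python
import random
from collections import defaultdict
from math import sqrt


def make_data(n, interaction, seed):
    """Simulate y = a + b + interaction * a * b + noise on a 4 x 4 grid."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a, b = rng.randrange(4), rng.randrange(4)
        y = a + b + interaction * a * b + rng.gauss(0.0, 0.5)
        rows.append((a, b, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def fit_restricted(train):
    """Additive stand-in: grand mean plus separate main effects, no interaction."""
    grand = mean([y for _, _, y in train])
    by_a, by_b = defaultdict(list), defaultdict(list)
    for a, b, y in train:
        by_a[a].append(y)
        by_b[b].append(y)
    effect_a = {k: mean(v) - grand for k, v in by_a.items()}
    effect_b = {k: mean(v) - grand for k, v in by_b.items()}
    return lambda a, b: grand + effect_a.get(a, 0.0) + effect_b.get(b, 0.0)


def fit_unrestricted(train):
    """Cell-means stand-in: one prediction per (a, b) pair, interactions allowed."""
    grand = mean([y for _, _, y in train])
    cells = defaultdict(list)
    for a, b, y in train:
        cells[(a, b)].append(y)
    cell_mean = {k: mean(v) for k, v in cells.items()}
    return lambda a, b: cell_mean.get((a, b), grand)


def rmse(model, rows):
    return sqrt(mean([(y - model(a, b)) ** 2 for a, b, y in rows]))


def interaction_gap(interaction, seed=0):
    """Held-out RMSE of the restricted model minus that of the unrestricted one."""
    train = make_data(2000, interaction, seed)
    test = make_data(2000, interaction, seed + 1)
    return rmse(fit_restricted(train), test) - rmse(fit_unrestricted(train), test)


print("gap without interaction:", round(interaction_gap(0.0), 3))
print("gap with interaction:   ", round(interaction_gap(1.0), 3))
```

The main-effects fit uses marginal means, which matches a least-squares additive fit only when the design is roughly balanced, as it is in this simulation.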

Semiparametric Analysis of Large-Scale Observational Data Using Hierarchical Predictive Models

Authors: Daniel Fink and Wesley M. Hochachka

Location & Date: The European Union for Bird Ringing (EURING) Technical Meetings, Dunedin, New Zealand (January 2007)

Description: Hierarchical models have emerged as the preferred tool for analyzing large sets of observational data, because (1) complicated, multifaceted processes can be factored into a series of simpler, conditionally independent sub-processes, and (2) a wide variety of parametric models can be incorporated and their validity explicitly tested. However, for many problems there is insufficient a priori knowledge to justify specifying parametric models at all stages of the hierarchy, even though an accurate (predictive) model is desired or even needed. For example, management of threatened or endangered species may require predictions of species' habitat preferences or responses to habitat alteration, even though insufficient information is available to construct a parametric model that is known to be a good abstraction of reality. Ecological problems characterized by having more predictor information than prior knowledge are likely only to become more common as large data sets become easier to obtain. For example, the Avian Knowledge Network (AKN) currently contains millions of bird monitoring records linked to nearly 1000 landscape predictors. As the number of predictors grows, it becomes more difficult to determine which predictors most affect the distribution and abundance of bird populations, and to determine the specific character of their effects for parametric modeling.

We propose a new semiparametric regression technique which we call the hierarchical predictive model (HPM) to produce highly accurate predictions even when a fully parametric model cannot be specified with confidence. HPMs specify the parametric hierarchy, where justified, while relying on the complementary strengths of powerful nonparametric data mining methods to automatically discover and fit important structure elsewhere in the hierarchy. The practical appeal of this approach is that it allows one to include as much parametric structure as is justified by subject-area knowledge. At the same time, this semiparametric regression model employs nonparametric techniques to automatically account for additional predictors and processes that are less well understood. This makes HPMs well suited for the exploratory analysis of large observational data sets, especially data sets containing large numbers of potentially informative covariates.


Exploring the ecological consistency of bird conservation regions across a gradient of human density

Authors: W. M. Hochachka, D. Fink, D. N. Bonter, R. A. Caruana, S. T. Kelling, A. Munson, M. Riedewald, D. Sorokina

Location & Date: 4th North American Ornithological Congress. Veracruz, Mexico. (October 2006)

Description: The impact of humans on their environment varies, and one axis of this variation is along a gradient of human population density. In our talk we examined whether the impacts of varying human density could be extrapolated from one ecological region within North America to another, using the NABCI Bird Conservation Regions (BCRs) to define ecological regions. We found that, even qualitatively, the relationships between human density and bird abundance varied among BCRs for some of the 17 species that we examined in our analyses of data from the eastern U.S. and adjacent Canada.

keywords/keyphrases: rural – urban gradient, Bird Conservation Region (BCR), prevalence, urbanization


Data mining to explore spatial and temporal variation in bird distribution: irruptive winter migrants

Authors: D. Fink, W. M. Hochachka, R. Caruana, S. Kelling, A. Munson, M. Riedewald, D. Sorokina

Location & Date: 4th North American Ornithological Congress. Veracruz, Mexico. (October 2006)

Description: The ability to describe and visualize spatial and temporal variation in bird distribution, without presupposing specific underlying patterns, is important during exploration of bird monitoring data. We demonstrate how data mining techniques can be used to model spatial and temporal variation from landscape-level monitoring data while controlling for variation in detectability based on covariate information. We use these techniques to explore the irruptive winter migrations of several species in the Eastern United States. The analyses are based on data from the citizen-science-based winter monitoring program, Project FeederWatch. Variation in detection rates is modeled as a function of effort spent watching birds as well as effort spent attracting birds to backyard feeders. We describe how migration patterns vary over regions and within the winter season. Statistical methods are then used to test for associations between winter migration patterns and large-scale habitat characteristics.

Hierarchical Predictive Models

Author: Daniel Fink

Location & Date: Interface 2006, 38th Symposium on the Interface of Statistics, Computing Science, and Applications. Massive Data Sets and Streams. (Pasadena, California, May 2006)

Description: Semiparametric regression models incorporate flexible nonparametric components within a parametric hierarchical modeling framework. The practical appeal of this approach is that it allows one to include as much parametric structure as is justified by subject-area knowledge. At the same time, the semiparametric regression model employs nonparametric techniques to account for additional predictors and processes that are less well understood. Most semiparametric regression techniques cannot model more than a handful of predictors nonparametrically. Methodology for including a general class of nonparametric predictive models within the hierarchical framework is presented for the regression and binary classification problems. Using data-mining techniques for the predictive model, we show that many more predictors can be handled nonparametrically. We also show that this method can be viewed as a general approach for extending data-mining techniques to deal with dependent data. Simulation studies are used to evaluate the hierarchical predictive models. The method is also used to predict patterns of variation in North American bird populations from a large spatial data set. These models provide useful information for conservation and land management.
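
As a rough illustration of mixing a parametric component with a nonparametric one, the sketch below fits a generic partially linear model, y = beta*x + f(z) + noise, by backfitting. It is not the hierarchical predictive model from the talk: the simulated data, the function names, and the use of simple bin averages as the nonparametric learner are all assumptions made for illustration. Any flexible predictor could stand in for the bin averages.

```python
import math
import random


def simulate(n, seed):
    """y = 2*x + sin(2*pi*z) + noise; only the linear x effect is 'known'."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, z = rng.uniform(-1, 1), rng.random()
        y = 2.0 * x + math.sin(2 * math.pi * z) + rng.gauss(0.0, 0.2)
        data.append((x, z, y))
    return data


def fit_partially_linear(data, n_bins=20, n_iter=20):
    """Backfit y ~ beta*x + f(z), with f a step function over equal-width z bins."""

    def bin_of(z):
        return min(int(z * n_bins), n_bins - 1)

    f = [0.0] * n_bins
    beta = 0.0
    for _ in range(n_iter):
        # Parametric step: least-squares slope on residuals after removing f(z).
        num = sum(x * (y - f[bin_of(z)]) for x, z, y in data)
        den = sum(x * x for x, _, _ in data)
        beta = num / den
        # Nonparametric step: bin averages of residuals after removing beta*x.
        sums, counts = [0.0] * n_bins, [0] * n_bins
        for x, z, y in data:
            k = bin_of(z)
            sums[k] += y - beta * x
            counts[k] += 1
        f = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return beta, f


beta_hat, f_hat = fit_partially_linear(simulate(2000, seed=0))
print("estimated slope:", round(beta_hat, 3))
```

The parametric step has no intercept term; the step function f absorbs the overall level of the response.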


The value of predicting variation in distribution and abundance of birds using data-mining and machine learning techniques

Authors: D. FINK, W. HOCHACHKA, M. RIEDEWALD and S. KELLING

Location & Date: American Ornithologists' Union (Santa Barbara, California, 2005)

Description: Predicting variation in the distribution and abundance of birds across a landscape is important from many perspectives, including use of this information in conservation and management. Traditionally, ornithologists have used parametric statistical techniques to identify environmental predictors of birds’ distributions. However, suites of potentially suitable techniques for the same purpose are actively under development in the fields of data-mining and machine learning, and to date these techniques are largely unknown to ornithologists. In this talk we will introduce several of these methods (including support vector machines and decision trees), explain the relative strengths of these methods in comparison to the more familiar parametric classification techniques, and assess the relative predictive power of multiple techniques in describing bird distributions from simulated data in which the true distributions are known. Our results indicate that these new techniques, when coupled with data on both presence and absence of birds, are a tool that should be more widely used by ornithologists.