NCEAS Working Groups
Machine learning for the environment
Project Description
We believe that environmental science, ecology, and conservation biology would be greatly enriched by expanding the ecologist's analytical toolbox to include machine learning (ML) approaches to data analysis. We use the term ML loosely, to distinguish a variety of newer computational methods for recognizing and analyzing patterns in data from parametric statistics. Generally, parametric methods assume highly restrictive theoretical properties of data, such as additivity, linearity, independence, and a specified distribution (e.g., normality). Ecological data, by contrast, represent highly complex systems and commonly violate these assumptions [1-3]. Unfortunately, failure to appreciate these subtleties of ecological data often results in misguided analysis and incomplete or incorrect conclusions. In recent years, ML researchers have developed techniques for analyzing data not suited to parametric statistics. Older machine learning algorithms include neural networks and decision trees. Newer techniques such as boosting and kernel methods (e.g., support vector machines) now provide further opportunities for extracting subtle patterns from complex data, while hybrid methods integrate parametric models and ML to exploit computation and hard-won biological understanding simultaneously. Despite successes elsewhere (e.g., bioinformatics, astrophysics), ML has not been widely adopted by ecologists. Complex situations that might be addressed with ML include identifying optimal policies for managing ecological systems under uncertainty, forecasting, nonlinear modeling, and scientific inference with non-independent data. Accommodating these scientific and statistical difficulties within parametric statistics ranges from cumbersome to impossible. Therefore, we propose a working group to identify obstacles, scope out promising research, produce case studies, and develop a book-length tutorial for ecologists on the practical application of ML.

Principal Investigator(s)
John M. Drake, William T. Langford
Project Dates
Start: June 1, 2006
completed
Participants
- Peter M. Buston, Consejo Superior de Investigaciones Científicas (CSIC)
- Rich Caruana, Cornell University
- Jonathan M. Chase, Washington University in St. Louis
- T. Jonathan Davies, University of California, Santa Barbara
- Thomas G. Dietterich, Oregon State University
- Andrew P. Dobson, Princeton University
- John M. Drake, University of Georgia
- Saso Dzeroski, Jozef Stefan Institute
- Jane Elith, University of Melbourne
- Cesare Furlanello, Istituto Trentino Di Cultura
- Trevor Hastie (affiliation unknown)
- Reuben P. Keller, University of Notre Dame
- Andreas Krause, Carnegie Mellon University
- William T. Langford, RMIT University
- Dragos Margineantu (affiliation unknown)
- Julian D. Olden, University of Washington
- Gill Ward, Stanford University
- Matt White, Arthur Rylah Institute for Environmental Research
- Bianca Zadrozny, Universidade Federal Fluminense
Products
- Journal Article / 2011: Determinants of reproductive success in dominant pairs of clownfish: A boosted regression tree analysis
- Journal Article / 2012: Trait-based risk assessment for invasive species: High performance across diverse taxonomic groups, geographic ranges and machine learning/statistical tools
- Journal Article / 2011: Scavenging: How carnivores and carrion structure communities