Projects undertaken by NCEAS staff and collaborators address key challenges including:
- Storage and management of environmental data
- Discovery and preparation of data for further analysis and synthesis
- Advanced automated machine processing of information and models
- Making these capabilities available to practicing scientists
Knowledge Network for Biocomplexity (KNB)
KNB is a network for data sharing that facilitates ecological and environmental research. KNB is a collaborative effort including ecologists and technologists. Partners include NCEAS, Long Term Ecological Research Network (LTER), San Diego Supercomputer Center (SDSC), and Texas Tech University (TTU). The goal of KNB is to enable efficient discovery, access, interpretation, integration, and analysis of complex ecological data from a highly distributed set of field stations, laboratories, research sites, and individual researchers. KNB software products include applications to describe, store, and query ecological data from a common framework. KNB produced a structured metadata format for ecological data (EML), software to generate this format (Morpho), and a robust metadata and data management system (Metacat) that enables researchers to participate in a distributed global network of Data Repositories. Funded in 1999 by the National Science Foundation - Knowledge & Distributed Intelligence Program DEB-0072909
Production Implementation of the Knowledge Network for Biocomplexity
In this project, our goal is to refine the software tools and technology frameworks developed as part of the KNB research effort, so that these are highly usable by research scientists on a practical basis. Dedicated software engineers and metadata coordinators will optimize and assist in the use of KNB technologies, with the specific aim of promoting the use of the KNB as a rich information source for the ecological community. Populating the KNB has two components: 1) identifying and locating appropriate data, and 2) facilitating their inclusion in the KNB system. Raising awareness of these tools within the ecological community will both make the system more useful and imbue the community with the interest and skills needed to enhance the long-term value of data. This, in turn, should stimulate new ways of conducting ecological research. KNB technologies facilitate the development and support of data registries being used by a growing number of organizations. For example, there is currently a prototype operating on behalf of the Ecological Society of America to allow authors to register data associated with their journal articles. Funded in 2003 by the Andrew W. Mellon Foundation
Science Environment for Ecological Knowledge (SEEK)
SEEK is a multi-institutional collaboration of ecologists, systematists, and computer scientists researching scientific workflow modeling with advanced semantics. The goals of SEEK are to make fundamental improvements in how researchers can 1) gain broad access to ecological data and information, 2) rapidly locate and utilize distributed computational services, and 3) employ powerful new methods for capturing, reproducing, and extending the analysis process itself. The SEEK approaches to data are compatible with the KNB technologies, but significantly extend these to incorporate data resources from the natural history museum and biodiversity science communities, as well as the geosciences and remote-sensing communities. Products include the scientific workflow application, Kepler, and the EcoGrid, a network of networks of ecologically relevant data and analytical components. This project incorporates cutting-edge advances in semantic mediation and knowledge representation. SEEK is a collaborative project of LTER, NCEAS, SDSC, the University of Kansas (Biodiversity Research Center), and the University of California, Davis. Funded in 2002 by the National Science Foundation - Information Technology Research program DBI-0225676
Kepler: The Kepler Project's overall goal is to produce an open-source scientific workflow system that allows scientists to design scientific workflows and execute them efficiently using emerging Grid-based approaches to distributed computation. Kepler work at NCEAS was originally funded as part of the Science Environment for Ecological Knowledge (SEEK). The transition from a research prototype to a reliable software system has been funded under the Kepler/CORE project, a multi-institutional collaboration with UC Davis, NCEAS, and UC San Diego. The Kepler project has grown to become a cross-project collaboration with contributing members from Kepler/CORE, SEEK, SDM Center, Ptolemy, GEON, and many others. Funded, in part, in 2002 by the National Science Foundation - Information Technology Research Program DBI-0225676 and in 2007 by the National Science Foundation - Office of Cyberinfrastructure OCI-0722079.
Ecological Metadata Language (EML): EML is a metadata specification that can be used to comprehensively describe ecological data in terms of content, structure, and research context. EML is a formalization and extension of prior work done by the Ecological Society of America and associated efforts (Michener et al., 1997, Ecological Applications). EML is defined and revised through an ongoing community effort, particularly involving the participation of ecological research station information managers and other interested parties. A number of prominent research organizations, such as NCEAS, LTER, the UC Natural Reserve System, Kruger National Park, and OBFS, are expressing interest in or actively using EML as their interchange language and cataloguing standard for ecological metadata. EML was generated largely as a product of the Knowledge Network for Biocomplexity.
Real-time Environment for Analytical Processing (REAP): The REAP project's goal is to extend the Kepler scientific workflow system to fully integrate access to sensor networks. New capabilities will include the ability to include sensor data in workflows; to monitor, inspect, and control sensor networks; and to simulate the design of new sensor networks. REAP is a collaboration among NCEAS, San Diego Supercomputer Center, UC Davis, OPeNDAP, UC Los Angeles, and Oregon State University. Funded in 2006 by the National Science Foundation - Cyberinfrastructure for Environmental Observatories (CEOP) Program
FIRST Project: The Faculty Institutes for Reforming Science Teaching (FIRST) project is developing new metadata standards for assessment in ecological education to facilitate the exchange of educational assessment data. Participants will be developing means for semantically describing assessment instruments to allow comparison of different assessment techniques. NCEAS is participating as a subcontractor on this project led by researchers at Michigan State University. Funded in 2006 by the National Science Foundation
Jalama: Capturing Data in the Field: Jalama, developed jointly with scientists at the Marine Science Institute at UCSB, investigated how rich metadata can be used to develop flexible, easy-to-use forms for ecological data entry in lab and field environments. Research focused on clarifying how to automate the creation of effective user interfaces for data collection. Software products from the project target both desktop and handheld computers. Funded in 2002 by the National Science Foundation - Biological Databases & Informatics program DBI-0131178
UC Natural Reserve System Data Registry (NRS): The University of California Natural Reserve System contributes to the understanding and management of the Earth and its natural systems by supporting university-level teaching, research and public service at protected natural areas throughout California. NCEAS collaborates with the NRS in building an information management system that facilitates research and education in the UC NRS. One of the major projects is the UC NRS Data Registry, which is based on the KNB technologies. Funded in 1999 by the University of California
VegBank - US National Vegetation Classification: The VegBank online data repository is being developed to store vegetation data in support of the US National Vegetation Classification. The system comprises three components used to archive vegetation plot data, plant taxonomic data, and vegetation community data. Major programming efforts and technology infrastructure are located at NCEAS, in partnership with investigators at the University of North Carolina and the Panel on Vegetation Classification, Ecological Society of America. Funded in 2000 and 2002 to the University of North Carolina by the National Science Foundation Biological Databases and Informatics Program DBI-0213794, DBI-9905838
Resource Discovery Initiative for Field Stations (RDIFS): RDIFS Research Coordination Network (RCN) activities focus principally on enhancing the ecological informatics infrastructure for field biology and developing mechanisms for discovery of data and information resources that can facilitate research and education at North American biological field stations. These objectives are being accomplished through two integrated networking activities: (1) research that encompasses five inter-related resource discovery activities and (2) an intensive training component that provides field station personnel with a solid foundation in the computational and informatics skills that are critical for developing, archiving, managing, and communicating data and information resources. As part of the LTER-led RDIFS effort, NCEAS has adapted tools from the KNB to create the Organization of Biological Field Stations Data Registry. LTER Network Office - funded in 2001 by the National Science Foundation - Research Coordination Networks Program
NCEAS Data Repository is a standards-based documentation of metadata and data from synthesis projects arising at NCEAS, based on KNB technologies.
LTER Data Catalog is a collaboration with LTER Data Managers and the LTER Network Office to develop metadata standards and promote search capabilities for data and metadata. It is based largely on KNB technologies.
Interaction Web Database provides web-based access and submissions of data concerning ecological interactions, particularly pollination/pollinator relationships.
Global Population Dynamics Database is an extensive collection of time series data from plant and animal populations, hosted by the Center for Population Biology at Silwood Park and co-developed with NCEAS and the Department of Ecology and Evolution at the University of Tennessee.
Kruger National Park, South Africa: NCEAS collaborates with information managers and scientists at Kruger National Park to develop effective informatics solutions for data collected within the park, as well as for research purposes, decision-making, and public edification. The collaboration has a special focus on the confederation of data, and the development and deployment of Kepler workflow solutions for conservation management analyses. The project is based on KNB and SEEK technologies. This approach is also being adopted at each of the 22 South African national parks (SANParks). SANParks Data Repository. Funded by the Andrew W. Mellon Foundation
Paleobiology Database is a web-based resource of fossil information that includes 52,000 collection records and 511,889 taxonomic occurrences from 13,962 published references. The project is led by research scientist Dr. John Alroy, who is a former NCEAS Postdoctoral Associate.
Webs on the Web (WOW): This project will develop the information technology needed to increase the quality, sophistication, and pedagogical accessibility of analyses and visualizations of ecological network data. The complexity of ecological network data is immense and therefore represents a challenging opportunity for software development targeting the ecological sciences. Funded in 2002 by the National Science Foundation - Biological Databases & Informatics Program
Metadata Editor: This project was an early effort designed to test the effectiveness of the then-emerging Extensible Markup Language (XML) for representing ecological metadata. The prototype metadata editor it produced informed our plans for the current Morpho and Metacat system developed under the KNB project. Funded by a 1997 National Science Foundation supplement to NCEAS
Postdoctoral Training in the Management of Environmental Information: NCEAS is involved with collaborators in developing new techniques for managing environmental information. For this project we have recruited postdoctoral researchers to work in three fundamental areas of informatics: Knowledge Representation, Taxonomic Nomenclature and Classification, and Informatics Training. The informatics training position has been developed in conjunction with the LTER Network Office, where the position is located. Through this project NCEAS is playing a pivotal role in the training of young scientists in the management and analysis of environmental information. Funded in 2003 by the Andrew W. Mellon Foundation


