Data Science Fellowship Program
Please note - We are not accepting applications for our Fellows Program at this time. Please check back in 2023 for new opportunities.
This practicum-style fellowship program gives early-career researchers the opportunity to gain the practical knowledge and skills needed to manage national-scale data repositories.
Through the program, fellows gain experience in one or more of the following activities:
- Solve data and software issues relating to environmental science, working closely with our data and informatics team
- Undertake research related to open data infrastructure and practice
- Conduct outreach and create learning materials designed to enhance awareness and understanding of reproducible research
Data Science Fellows are in residence at NCEAS for 8-12 months.
Fellows gain experience and mentorship in activities directly related to the research project undertaken. We also encourage fellows to develop new projects and collaborations.
Fellowship Benefits
- Programming skills, including skills in new data management tools and languages
- Exposure to the day-to-day activities of managing national-scale data repositories
- Data science research, development, and outreach experience with the NCEAS informatics team
- A deep understanding of data management and software for data systems
- Experience working with a team passionate about environmental data science
Program Contact
Jeanette Clark
Projects Data Coordinator
Current Project Opportunities
We are currently interested in supporting projects through the Arctic Data Center at NCEAS that support Arctic social science data preservation and management and/or the Arctic social science community directly. Below are some of the types of projects that would fit within this area.
- Arctic social science synthesis research
Opportunity for fellow-identified research activity that leverages social science data available within publicly accessible repositories, analysed with the support of the Arctic Data Center team
- Evaluation of social science researcher perspectives and practices around data sharing and reuse
This work would support the current social science working group in the development, implementation and analysis of a survey
- Analysis of impact of open data reuse on social science disciplines
In conjunction with a social science working group, examine the reuse of open data within different social science disciplines, and estimate the impact on those disciplines. For example, one might contrast the impact of reuse of large survey datasets like the US Census and the American Community Survey with available small community well-being surveys from individual projects or communities.
- Analysis of current social science data available through public repositories
Work in support of the current social science working group to provide independent data on evidence of data sharing across social sciences. Includes cross-referencing the funder awards database against available data and evaluating funder policies
- Community informed development of a social science data portal
Collaboration with the Arctic Data Center team on the development of one or more data portals supporting the social science community. Includes outreach and communication, focus group conversations and usability assessment to understand and meet the needs of the social science research community
- Data management training materials in support of social science research
In collaboration with the training team, develop education modules addressing topics of specific interest to the social science research community for integration into the NCEAS Learning Hub / Arctic training catalog.
- Social science representation on arcticdata.io
Create resource pages that collate relevant information for social science researchers and directly address questions that are high priority for social science data.
Former Project Opportunities
We have a comprehensive suite of R packages that allow us to process data and metadata. We can make the tools more accessible to a broader community of scientists by converting useful functions from our current R packages (arcticdatautils, datamgmt, metajam, recordr) to Python. This library, or libraries, should be built using the pre-existing DataONE Python library https://github.com/DataONEorg/d1_python as a dependency. This project will require familiarity with DataONE infrastructure and the current packages before advancing into the development phase for the Python package.
The goal of this project is to incorporate the Arctic Data Center support team’s data and metadata quality standards into the automated Quality Reports accessed from specific landing pages in the UI. There is a need to add checks for coordination between data objects, system metadata, and metadata to the currently existing metadata checks. This includes renovation of existing checks of EML objects, reading and checking data objects against the EML, and system metadata evaluation across all objects. A sophisticated approach is required so that the UI display renders quickly in all browsers and OSs, and displays a clear breakdown of how scores are calculated. This project entails both front-end design and back-end development and provides ample opportunities to collaborate with the development and support teams.
We currently have several R packages that have grown organically through several rounds of development. The goal of this project is to consolidate our R packages in a coherent way so that our users can interact with the DataONE API. We will use a hierarchical approach with low level packages designed for advanced users and public facing packages for high level interactions. The packages within scope of the project are: R dataone, datapack (helpers, public facing), arcticdatautils (helpers, non-public facing), datamgmt (helpers, non-public facing), recordr (prov), and metajam (download helper, public facing). There will also be a need to organize interactions with external packages such as EML, Assembly line from EDI, rdryad and zenodo. We need to figure out how to take the best parts of each of these, remove redundancy and put them into a coherent set of packages on CRAN.
The Arctic Data Center supports the preservation and curation of research data from within the panArctic region. The Center also provides documentation and training on best practices for data management. In this project, the fellow will create instructional materials designed for university-level students that can be taught as a component within established curricula. The materials will explore discovery, integration and use of Arctic research data and provide information on using the Arctic Data Center as part of the hands-on lesson. Other modules will explore the importance and creation of metadata as well as policies surrounding data preservation, use and reuse. Intended for use in classes focussed on geography, environmental science, archeology, etc., the modules will be augmented with thematic case studies drawn from data within the Arctic Data Center. Finally, the fellow will support the publication of these materials to the website as part of the Skillbuilding Hub, and promote them across appropriate research institutions and networks.
The Arctic Data Center supports the preservation and curation of research data from within the panArctic region. All current NSF-funded Arctic research metadata are deposited to the Arctic Data Center, and the Center also includes a large number of historic and previously funded data sets. Many of these data have been cited in publications by the data authors and by others conducting related or synthesis research. This publication database needs to be updated and the citations linked to the data within the Center. This project will undertake a literature and text search approach to identifying relevant publications in peer-reviewed journals. In addition, the Arctic Data Center and the KNB Data Repository need a mechanism to track these data set citations and record them in a structured database as annotations that are accessible to the Metacat data management server. Another component of this project would be to work with the software development team at NCEAS to design and implement such an extension, and utilize it to trigger updates to our DOI metadata that is sent to DataCite for managed data packages.
The LTER Network Office (LNO) fosters enhanced communication, collaboration, synthesis, training, and engagement across the LTER Network. To promote analysis and synthesis of LTER data, the LNO funds and supports synthesis working groups. The NCEAS computing team provides data science support to these scientists and helps them with any data challenges they might have. This is a unique opportunity to get hands-on training in data science, work with synthesis projects, and interact with researchers. The tasks will include: helping with data acquisition, processing, and analysis challenges; wrangling data for heterogeneous ecological and climate datasets; testing and setting up web-based tools to assist with scientific collaboration; and preserving scientific products by documenting and archiving scientific findings.
The Next Generation of Environmental Scientists are Data Scientists
Hear from four of our previous data science fellows about what they found valuable about their experiences.
The department is especially interested in candidates who can contribute to the diversity and excellence of the academic community through research, teaching and service.
The University of California is an Equal Opportunity/Affirmative Action Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability status, protected veteran status, or any other characteristic protected by law.