
National Center for Ecological Analysis and Synthesis

Michelle Mohr stepped into the classroom. Through the tall third-floor windows, she could see blue sky, swaying palm and eucalyptus, and green hills marking the eastern edge of Santa Barbara. The room was furnished with heavy, dark wooden bookshelves and tables arranged lengthwise. A large TV screen faced the bright windows. Mohr and the other students found seats and pulled out their laptops, facing one another across the long tables. Mohr was excited, but she was also very nervous.

She had come to meet someone she wasn’t very fond of - the programming language R.

Mohr had just finished her master’s degree in environmental science and was looking forward to starting her PhD at UC Santa Barbara’s Bren School of Environmental Science and Management. She was confident about conducting fieldwork, but knew that she needed to boost her coding skills to successfully analyze the data she would soon collect. She came to NCEAS in April of 2023 to participate in coreR - a five-day immersive course in R, data management, and best practices for reproducibility.

R - the programming language - sat hulkingly in a corner of the classroom. They were a cryptic figure, always muttering in lines that Mohr couldn’t quite understand. She only ever saw them in their Studio, which was difficult to access. Some of her grad school colleagues were able to waltz right in, and R would work with them willingly. But every time she entered R’s Studio and tried to engage with them, R would clam up or speak gibberish. What was she even supposed to ask of them? She knew that you could load packages with tools that she needed into the Studio, but she didn’t know how to access them. “This is so difficult,” she thought to herself, “I don’t know why anybody uses it.”

Mohr decided that she was just bad at working with R, plain and simple. Throughout graduate school, there wasn’t much she could do about it besides trying to teach herself. The university she attended for her master’s degree didn’t offer any courses in R. Mohr relied on Microsoft Excel to analyze the field data she collected herself. When she needed something coded in R, she got help from her colleagues. At her university, knowledge of R was a badge, an indication of a person’s educational journey. These lucky ones carried R with them like an inexhaustible treasure.

Mohr wanted to be one of those people, able to help others with her knowledge and to advance her own research. She knew that R had packages developed specifically for her field, dendrochronology – the study of dating tree rings to the exact year they were formed. If only she could get on better terms with R, then she could get access to those packages and link her raw tree ring data with climate data. Ultimately, she hoped to synthesize new findings about growth patterns and climate change.

Mohr was glad to be in the classroom. Working from the NCEAS server with up-to-date equipment and software versions meant that she didn’t have the same trouble loading content into the Studio.

The two instructors, Halina Do-Linh and Camila Vargas Poulsen, began the first day by welcoming R to the TV screen at the front of the room. In one fluid motion, R swept past Mohr and threw open the door to their Studio. All of the students had a clear view of what Do-Linh and Vargas Poulsen were doing in R’s Studio and could follow along on their own laptops.

Both Do-Linh and Vargas Poulsen are on good terms with R; they often work with them in their Studio, even when they aren’t teaching. They also invite R into their own workspaces in GitHub, an online platform for open-access collaborative coding and project management. In fact, all of the coreR course materials are neatly stored in GitHub for anyone to access.

The five-day immersive course was quickly underway. Mohr and her classmates spent eight hours each day leapfrogging from skill to skill, starting with the most basic building blocks of R. On the first day of the coreR course, Mohr and the other students took a break from the computer screens to learn the principles of reproducibility and documentation using LEGOS. She soon learned the principles of literate analysis and tidy data. She learned how to use Git and GitHub and, even more, how to use these tools to work collaboratively with others. She was introduced to the data life cycle and the importance of data management. Most importantly, she learned how these tools work together to set up a reproducible workflow.

Two women sit at a table. One is assembling a structure made out of LEGOS blocks. The other is reading instructions from a piece of paper.
Two coreR participants practice an exercise in reproducibility and documentation during the April 2023 course offering, using LEGOS and written instructions.

The coreR course changed Mohr’s perception of reproducibility and open science. Before the course, she thought of it as a less formal, less necessary measure. If someone was really interested in the code and data underlying a study, couldn’t they just contact the project lead or the first author to get it?

The classroom atmosphere helped Mohr learn quickly. Whenever she started to feel lost during a lesson, she could look around the table and recognize her own confusion in her classmates’ faces. R would sigh and roll their eyes. Do-Linh and Vargas Poulsen would pause, patiently answering questions and troubleshooting until everyone was on the same page again. Do-Linh and Vargas Poulsen know from their own experience that learning these tools can be hard. They strive to provide a supportive learning environment in which it is safe to struggle and ask for support.

An instructor leans over a participant's shoulder to help troubleshoot on the participant's laptop.
Camila Vargas Poulsen helps a coreR student troubleshoot a line of code during the October 2023 course offering. In-person support and instruction are a huge advantage of attending coreR.

And then, things started to change for Mohr. R seemed a little kinder to her each day of the course. She learned how to clean and wrangle her data, how to create maps, and how to work with spatial data with R - something she didn’t even know was possible before. “OK, wow, this is actually fun!” she thought to herself. R smiled, watching Mohr dart around the Studio, load packages onto the shelves, and arrange code into spatial visualizations.

* One year later… *

Mohr is well into her first year as a PhD candidate at UC Santa Barbara. She is studying the spread of an invasive pathogen called white pine blister rust in Yosemite National Park. Last summer, Mohr led a team of undergraduate students to survey infections among sugar pines.  Next summer they will return to look for and examine any individuals able to resist the pathogen.

Mohr calls up R just about every day. Together they set up shop in GitHub, where Mohr can easily collaborate on code with others. Reproducibility is the name of the game. Not only can everyone keep track of edits, add contributions, and make comments on lines of code, but they can also restore archived versions if they need to go back to a previous iteration of the work.

“It also helps me to collaborate with myself,” she says. “When I do go to publish a paper, [the documentation is] all organized there and easily accessible, not only for me, but for other people.” And in fact, Mohr is getting ready to publish the first paper of her PhD work, stemming from data that she collected and analyzed herself in R.

After last season’s sugar pine survey, an undergrad student caught the bug. Not blister rust, but the coding bug. Mohr introduced the student to her new colleagues, R and GitHub. Knowing from experience that these tools are great for collaboration, she set up a GitHub account for the student and shared code with her through a GitHub repository. This way, Mohr can keep track of the student’s progress, and the student can easily point Mohr to issues when things aren’t working correctly. They can also document all of their conversations about the code in the same repository instead of scattering thoughts and decisions throughout emails or other messaging threads.

Mohr reflects on her year of growth with gratitude. “It was something that helped me overcome a fear of coding,” she says of the coreR course. “It was such a great experience that I would not have been able to do paying out of my own pocket.” Mohr is one of five recipients of the NCEAS Director’s Scholarship, which supports participants for whom the costs of registration, travel, and lodging would be prohibitive. “It really changed my trajectory and what I could do for my PhD.”

A woman stands on a third floor terrace patio in Santa Barbara, CA on a sunny day.
Michelle Mohr at NCEAS, April 2023.

The Learning Hub will offer the next coreR course October 7-11, 2024, in Santa Barbara, CA. Registration for this course is now open, and early bird pricing is available to those who register before June 30. NCEAS is committed to advancing diversity and equity among workshop participants. We recognize the financial barrier that registration and travel costs present. For the upcoming course, we are able to award three full Director’s Scholarships. Apply here by August 2.

For more information, please visit our website or email us at learning-hub@nceas.ucsb.edu.

Keep it c.o.r.e!

Category: Feature

Tags: Data Science, Environmental Data Science, Learning Hub, coreR, Scholarship