
Using GitHub Desktop to work with RStudio and GitHub

31/5/2023

We earlier wrote a post showing how to use an SSH key to connect GitHub with RStudio. That post was quite popular, but I have given up teaching how to use SSH to connect to GitHub, as I had too much trouble with it, especially when teaching the course I coordinate (Big Data in BEES). For example, to access GitHub directly from RStudio on a Mac, we need to have Git installed; this means also installing Xcode, which is a big piece of software, and not everyone’s computer has space left for it.

Now I know it is so much easier to connect to GitHub with GitHub Desktop – no SSH key required.

Thus, my new workflow is:

Create a GitHub repo on the GitHub webpage (my profile page).
Clone the repo to my computer using GitHub Desktop to create a local folder (note that you need to log in to GitHub Desktop using your GitHub username and password).
Open an RStudio project using this local folder.
Then, you can just use RStudio normally but commit, push, and pull using GitHub Desktop.

In RStudio, it was not so easy to resolve problems, for example when a committed file was too big to push. In GitHub Desktop, such problems are more intuitive to resolve. Also, it is easy to open the GitHub repo page via GitHub Desktop.

Now I teach the above workflow in my class too, and students seem to start using GitHub and RStudio together quite smoothly via GitHub Desktop!

Here is a good YouTube video with more on Git, GitHub and GitHub Desktop (note that this video does not use RStudio, but it is still useful to watch).

Diversity in academia: are we doing enough?

29/4/2023

by Pietro Pollo

From https://www.tlnt.com/diversity-is-not-just-about-gender-and-race/

Following the last I-DEEL blog post (by Samantha), I’d like to continue the conversation about biases we see in academia (and elsewhere). I am glad to see this issue getting attention, especially because discrimination hits too close to home. I am even happier that I am part of a lab that is particularly open to this conversation, discussing what we can do through equity, diversity, and inclusion initiatives. Fostering an environment in which everyone feels comfortable is, after all, vital. However, is it enough to ensure our workplace is a safe place for all, if not all can get here?

Academia is full of people studying or working in a country they were not born or raised in. Australia is similar, as around half of its citizens’ parents were born elsewhere, making it a place filled with distinct cultures. However, the diversity in academia and in Australia ends up being superficial: the more powerful a position is (socially and professionally), the more likely it is to be filled by people of a certain phenotype and background (e.g. cis heterosexual white men from rich countries). This is a pattern seen everywhere, including in the country where I’m from (Brazil). This, in my opinion, ends up being the true root of the problem. Is there anything we can do about it?

First, research centres, universities, and PIs hold substantial power as they are responsible for hiring and recruiting new professors, researchers, and postgraduate students. Instead of relying on “objective” metrics (e.g. number of papers, journal impact factor, university rankings) to award those that have “done more/better”, they could consider candidates’ diverse backgrounds (in a meaningful and positive way). The metrics currently used to assess performance are not only problematic on their own (do they really reflect advances in science?), but they are also related to one’s phenotype and background for multiple reasons. Being born in a non-English-speaking country already represents a major constraint on people’s productivity, as they first need to master the language before being able to read the literature and write their research manuscripts. Thus, creating new ways to assess academics’ journeys that minimise bias is essential to build a diverse community.

Junior scientists are not exempt from responsibility either. Biases can occur when networking or reviewing manuscripts, even if unconscious. For instance, Fox et al. (2023) performed randomised trials in which they compared single- and double-blind reviews (i.e. reviewers did or didn’t have access to author information, respectively) in the journal Functional Ecology, showing that authors from developed countries received better review scores in single-blind reviews than in double-blind reviews. In other words, authors from rich countries were favoured by reviewers when their information was disclosed. Fox et al. (2023) also showed that editors, who can always see the authors’ information, sent manuscripts from developed countries’ authors for review much more often than manuscripts from others. These results were not because of language differences, as authors from both English-speaking and non-English-speaking developed countries were equally favoured. This means that even individuals with less power can perpetuate — or break — cycles of biased decisions.

Promoting diversity is a difficult task: there is no magical solution for the bias we see in academia. However, it is imperative to improve the current situation. I do not believe it should be a contest about who deserves more pity, but I also do not think we should ignore the barriers imposed on minorities and the clear advantages given to already privileged groups. It might seem unfair to deviate from a “pure meritocratic system” to seek alternatives, but meritocracy is an illusion when society is so unequal. If all of us could have the same opportunities that a rich, white British man like Charles Darwin had, perhaps more of us could advance science as much as he did.

References:
Fox, C.W., Meyer, J. & Aimé, E. (2023). Double-blind peer review affects reviewer ratings and editor decisions at an ecology journal. Functional Ecology, 1–14.

Happy Women’s History Month

26/3/2023

by Samantha Burke

Melbourne march in solidarity with US overturning Roe v Wade PC: Matt Hrkac 2021

This Women’s History Month, a friend asked me to contribute an audio recording about being a woman for a podcast. Anastasia Shavrova runs the podcast Conversations with Chordates (aka Convos with Chordates). Chordates is one of many science podcasts that I recommend to people looking for a new podcast, but Chordates has a much more relaxed and chatty style and sometimes delves into topics that are more tangentially related to science.

I’m incredibly honoured to contribute to this podcast alongside many amazing women. Please check out the episode Convos with Women.

Women’s history is incredibly interesting and integral to the growth of countries and society as a whole. However, as with many minorities, much of women’s history has been left out of general historical education. I-DEEL is committed to open and transparent science, though transparency shouldn’t be limited to science. I wanted to use my audio recording to highlight some of these moments in history to bring them back into the narrative. Please check out a blog post on my website that goes into this history in more detail.

This Women’s History Month, please take a moment to think about the transparency of women’s influence on the development of the world. In particular, consider the records (or lack thereof) of women’s involvement in science both historically and currently. Many women were not recognized as they should have been for their involvement in scientific developments. See this list of the untold stories of women in STEM and the Netflix documentary Picture a Scientist.

I-DEEL has recently updated our policies around Equity, Diversity, and Inclusion initiatives to better our support for each other within the lab, as it’s incredibly difficult to separate one’s identities from their work. We encourage others to do similarly. Acknowledging differences marks the first step to equality (such as Scripps Institute publishing an internal review on gender disparities in lab space).

Happy Women’s History Month, and thank you to all the women who’ve advanced science to what it is today.

The non-consumptive effects of academia

28/2/2023

By April Martinig

From the start of my academic journey, I have found myself at a crossroads. Research is supposed to be all consuming, and it is, if you let it be. My problem, or rather, the problem I am told by others to have, is that it is not my only passion.

I didn't quite grasp that this was an issue until a specific Wednesday during my PhD. Several hours into a lab meeting, I had to leave “early” to go to my varsity wrestling practice. I delayed going for as long as I could, but finally at 5 pm, already late, I had to go otherwise I would miss practice entirely. A lab-mate turned to me and very loudly said to the room how inappropriate it was that I was not prioritizing my graduate studies over all else. I still left the meeting, but their remark highlighted a sentiment[1] we don’t discourage often enough…

My lifestyle is not entirely different from pursuing a work-life balance; something many of us know, regardless of vocation, is already hard to maintain. Unlike what we might think of when hearing the latter, juggling two passions, fully and completely, means having very little time for life outside of them. But, without wrestling, I wouldn't be able to conduct research and without research, I wouldn't be the wrestler and coach I am today.

However, this dichotomy poses a fundamental issue as I pursue success in academia. As I continue to move up the academic ladder, the duality within me (i.e., being wildly passionate about something else in addition to research) does not seem to exist in “successful” academics. To be successful - to make it in the field - I find myself confronted with examples of researchers that work long hours and make sacrifices I deem too costly. This culture, which seems to continue to be glorified and perpetuated, is an example I won't buy into.

We shouldn't have to fight for space to be a balanced human. I can be a good academic and not make academia my whole life.

The crossroad is where I meet my passions, rather than follow them.

[1] As the old adage goes: one that chases two rabbits ends up with none.

Can ChatGPT do screening for a systematic review? Yes and more!!!

2/1/2023

by Shinichi Nakagawa

Before my Xmas break, I met ChatGPT (Generative Pre-trained Transformer). Since then, she has been my teacher, wise but admits her mistakes. Also, she is humorous (when I ask her to be) and very patient.

I decided to see whether ChatGPT can actually do the first stage of screening, i.e. title and abstract screening. After negotiating with her for a few hours, I cracked the code and passed her carefully worded selection criteria based on PECOS: Population, Exposure, Comparator, Outcome and Study design. And there she was. ChatGPT was telling me whether I should exclude or include a particular study after evaluating a study’s title and abstract.

I used lists of studies and criteria related to this protocol:

Vendl C, Taylor MD, Braeunig J, Gibson MJ, Hesselson D, Neely GG, Lagisz M, Nakagawa S. Profiling research on PFAS in wildlife: Protocol of a systematic evidence map and bibliometric analysis. Ecological Solutions and Evidence. 2021 Oct;2(4):e12106.

What amazed me was that ChatGPT matched the study with our criteria and summarized reasons. Wow, this is better than I can do (see examples: one recommending inclusion and the other recommending exclusion = both are spot on!)

I tested 15+ abstracts and ChatGPT was able to reproduce our decisions. So, I stopped there and then started to test whether she can extract some data from the text. This turned out to be more difficult as ChatGPT does not seem to take more than ~2,000 words as an input (although she claims there are no limits). Anyway, as long as I do not give her too much text, ChatGPT seems to be able to extract what animals were studied, and which PFAS chemicals and locations were mentioned, in the format below:

That is all astounding. But some questions remain. How reproducible is this data? Can we make this process much more systematic?

I am hoping to work with a computer scientist and see whether some of these processes can be automated for multiple articles. We are entering an exciting but uncertain time. One thing I can say is that I will be trying to incorporate ChatGPT into some parts of my systematic review workflow from now on, not as a replacement for a human screener but as an addition for now.
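As a rough idea of what automating this for multiple articles could look like, here is a minimal R sketch that only builds one screening prompt per study from PECOS criteria and checks prompt length against the ~2,000-word limit mentioned above. Everything in it is an assumption for illustration: the function name build_screening_prompt, the placeholder criteria (these are not the criteria from the protocol), and the toy studies. It does not call ChatGPT; how the prompts are sent is left open.

```r
# Hypothetical PECOS criteria: placeholders for illustration only,
# NOT the criteria used in the protocol cited above
criteria <- c(
  Population   = "placeholder population criterion",
  Exposure     = "placeholder exposure criterion",
  Comparator   = "placeholder comparator criterion",
  Outcome      = "placeholder outcome criterion",
  Study_design = "placeholder study design criterion"
)

# Hypothetical helper: combine the criteria with one title and abstract
build_screening_prompt <- function(criteria, title, abstract) {
  criteria_text <- paste0("- ", names(criteria), ": ", criteria, collapse = "\n")
  paste0(
    "Screen the study below against these PECOS criteria.\n",
    criteria_text, "\n\n",
    "Title: ", title, "\n",
    "Abstract: ", abstract, "\n\n",
    "Answer 'include' or 'exclude' and give a short reason for each criterion."
  )
}

# Toy reference list (made-up entries)
studies <- data.frame(
  title    = c("Toy title one", "Toy title two"),
  abstract = c("Toy abstract text one.", "Toy abstract text two.")
)

# One prompt per study
prompts <- mapply(
  build_screening_prompt,
  title = studies$title,
  abstract = studies$abstract,
  MoreArgs = list(criteria = criteria),
  USE.NAMES = FALSE
)

# Rough word count per prompt, to flag inputs that may be too long
n_words <- lengths(strsplit(prompts, "\\s+"))
too_long <- n_words > 2000

cat(prompts[1], "\n")
```

Keeping the prompt construction in a function means the same wording is used for every study, which is one small step towards making the process more reproducible.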

DuPont loses challenge over cancer victim's $40 mln verdict in PFAS case

19/12/2022

by Lorenzo Ricolfi

Image by 3D Animation Production Company from Pixabay, modified by adding pfas picture from https://www.setac.org/page/PFAS

On the 5th of December, a federal appeals court of the United States released the verdict on the legal litigation between the American multinational chemical company DuPont and a cancer survivor.

The official verdict document is available here and the full article by Clark Mindock (Reuters) is available here.

Per- and polyfluoroalkyl substances (PFAS) are a group of manufactured chemicals used in various industrial and consumer products. They are also known as "forever chemicals" because they do not break down easily in the environment or the human body. Some PFAS chemicals, including perfluorooctanoic acid (PFOA) and perfluorooctanesulfonic acid (PFOS), have been linked to specific health effects, including cancer, immune system effects, and developmental effects in infants and children.

In this case, the plaintiff, Travis Abbott, claimed that prolonged exposure to PFOA in his drinking water caused him to develop testicular cancer twice. A jury awarded him $40 million in damages after finding that PFOA was likely the cause of his illness. DuPont, the chemical manufacturer, had argued that Abbott's level of exposure was unlikely to have caused his cancer and had challenged the verdict, claiming it had been unfairly kept from raising defences based on the specifics of Abbott's alleged exposure.

The 6th Circuit Court of Appeals has upheld the jury's verdict, stating that DuPont could not challenge the decision, which relied on a finding in related cases that PFOA was linked to the man's cancer. This case is one of the thousands consolidated in multidistrict litigation (MDL) in Ohio, which claims that DuPont poisoned drinking water by discharging PFOA into waterways from its plant in West Virginia.

Picture from https://btlaw.com/insights/blogs/fast-facts-what-is-pfas

Legal disputes over the adverse health effects of environmental contaminants can be tricky for several reasons. One of the main challenges is the difficulty in proving a causal link between exposure to a particular contaminant and developing a specific health condition. This is because many factors can affect an individual's health, and isolating the effects of a specific environmental pollutant can be challenging. Additionally, the impact of environmental contamination may not manifest for many years, making it difficult to determine the exact cause of a specific health condition. In many cases, the burden of proof is on the plaintiffs to demonstrate that their health problems were caused by exposure to a particular contaminant.

Photo: M.Lagisz

Scientific research plays an essential role in all of this. By studying the chemical and biological properties of specific contaminants, researchers can better understand how these substances interact with living organisms and their surroundings. This information can be used to develop strategies for addressing and mitigating the effects of environmental contamination, such as identifying the source of contamination, developing methods for cleaning up contaminated sites, and implementing policies to prevent future contamination. Additionally, research can help identify the specific health effects of different contaminants and provide guidance on protecting individuals from exposure.

By providing a solid evidentiary base, scientific research can help establish the link between environmental pollution and adverse health effects, which can be crucial in determining the responsibility of polluters and the appropriate remedies to be taken.

Split reference list helper for pilot and collaborative screening rounds

30/11/2022

by Coralie Williams

When screening for a systematic review or meta-analysis, we conduct several pilot screening rounds. Pilot screenings help us refine our search string, decision tree, and increase the overall accuracy of our screening for literature reviews [check out this nice guide from the I-DEEL team for more info: Foo et al, 2021].

During a pilot screening, we want to select a random subset of references that would be a representative sample of the full set. When possible, screening rounds are conducted in collaboration with another reviewer. To speed up the screening process, we sometimes want to randomly allocate a subset of papers to a collaborator by splitting a reference list into subsets.

There are two reasons we’d want to automate the selection and splitting of a reference list:

It is time consuming to randomly select papers (>100 papers is tedious to select by hand!)
We are not really good at selecting things at random (actually computers aren’t really good at selecting truly at random either*)

Below is the R (www.r-project.org) code to run two functions that may come in useful when conducting your pilot and collaborative screenings with Rayyan (https://rayyan.ai/), or any other software where you can upload your pilot reference list.

1. Select random pilot set:

First, load the getpilotref function below in your environment:

# -----------------------------------
# getpilotref function 
# -----------------------------------
## Description: 
#     Function to obtain a random subset of references for pilot screening.
#
# Arguments
# - x: data frame with reference list
# - n: number of papers for pilot subset (default is 10)
# - write: logical argument whether to save the pilot list as a csv file 
#   in the current working directory (default is FALSE).
# - fileName: name of file (default is "pilot")

getpilotref <- function(x, n=10, write=FALSE, fileName="pilot"){
  
  # check that n is a single positive whole number no larger than the number of references
  if (!(length(n) == 1L && is.numeric(n) && n%%1==0 && n>0 && n<=nrow(x))) {
    stop("Incompatible value n supplied, please check. ",
         "n must be a positive integer no higher than the total number of references provided.")
  }
  
  # randomly sample n row indexes; sorting keeps the original order of the reference list
  pilot <- x[sort(sample.int(nrow(x), n)), , drop=FALSE]
  
  if (isTRUE(write)){
    
    # save generated pilot list in working directory using the name provided (requires the readr package)
    readr::write_csv(pilot, paste0(fileName, ".csv"), na="")
    
    # print out summary of saved file name
    cat(paste0("Pilot random sample set of ", n, " articles is saved as: ", fileName, ".csv\n"))
    
  }
  
  return(pilot)
}

Let’s try it out
Load example csv file that was exported from Rayyan (a reference list of papers in Ecology & Evolutionary Biology having the word “butterflies” in their title):


# Read example butterfly reference list
articles<-read.csv("https://raw.githubusercontent.com/coraliewilliams/2022/main/data/articles_butterfly.csv")

First, let’s obtain a random set of 10 papers without saving it as a csv file:

p10 <- getpilotref(articles)

Now, let’s obtain a subset of 100 papers for a pilot screening and save the subset as a csv file called pilot100.csv. Make sure you have the readr package installed and loaded in your environment.

library(readr)
p100 <- getpilotref(articles, n=100, write=T, fileName="pilot100")

## Pilot random sample set of 100 articles is saved as: pilot100.csv

This will save a csv file pilot100.csv in your working directory. If you are unsure where your working directory is, run the command getwd() in your console.

2. Split reference list with another collaborator

Load the splitref_prop function in your environment:

# -----------------------------------
# splitref_prop function 
# -----------------------------------
## Description: 
#     Function to split in two a reference list based on input proportions.
#
## Arguments: 
# - x: data frame with reference list
# - p: vector of two numerical proportions for each split, it must have two positive numerical values that sum to 1.
# - write: logical argument whether to save the pilot list as csv in current working directory.
# - sname: prefix to give to the names of the two split csv files (default is "split").

splitref_prop <- function(x, p=c(0.5, 0.5), write=FALSE, sname="split") {
  
    # check that p holds two positive proportions summing to 1 (with tolerance for floating-point error)
    if (!(length(p) == 2L && is.numeric(p) && all(p > 0) && isTRUE(all.equal(sum(p), 1)))) {
      stop("Incompatible values for p (proportions) supplied, please check. ",
           "Proportion values must be positive numbers less than 1, and the sum of all proportions should equal 1.")
    }
    
    # randomly shuffle the row indexes of the reference list
    rids <- sample.int(nrow(x))
    
    # number of references in the first split, based on the first proportion
    spl <- floor(p[1] * nrow(x))
    
    # get row indexes of the two subsets (seq_len() is safe if spl is 0)
    indx1 <- rids[seq_len(spl)]
    indx2 <- rids[(spl + 1):nrow(x)]
    
    # build the two subsets
    s1 <- x[indx1, , drop=FALSE]
    s2 <- x[indx2, , drop=FALSE]
    
    # save split subsets as two data frames in the global environment
    split1 <<- s1
    split2 <<- s2
    
    # print out summary message
    cat("Reference list was randomly split into", length(p), "proportions of", p[1]*100, "% and", p[2]*100, "%\n")
    
    if (isTRUE(write)) {
      # save files (requires the readr package)
      readr::write_csv(s1, paste0(sname, "_set1.csv"), na="")
      readr::write_csv(s2, paste0(sname, "_set2.csv"), na="")
    }
    
    # also return both subsets invisibly, so they can be captured without relying on global variables
    invisible(list(split1=s1, split2=s2))
}

Let’s try it out
Using the example butterfly reference list, let’s first split the reference list in two equal splits (50% each):

splitref_prop(articles)

## Reference list was randomly split into 2 proportions of 50 % and 50 %

This will give you two separate data frames to share between two reviewers: split1 and split2.

Now let’s get 30% of references in the first subset (split1) and 70% in the second subset (split2), for example if one reviewer has more time to spend on the screening:

splitref_prop(articles, p=c(0.3,0.7))

## Reference list was randomly split into 2 proportions of 30 % and 70 %

Let’s save the 30% and 70% split lists of references as csv files with the prefix “testsplit”:

splitref_prop(articles, p=c(0.3,0.7), write=T, sname="testsplit")

## Reference list was randomly split into 2 proportions of 30 % and 70 %

This will save two csv files, testsplit_set1.csv and testsplit_set2.csv, in your working directory.
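Before sharing the two files, it can be reassuring to check that a split is clean, i.e. that no reference appears in both subsets and none is lost. The sketch below reproduces the shuffle-and-cut idea on a made-up reference list in base R; the data frame refs, its key column, and the object names are assumptions for illustration, not part of the functions above.

```r
set.seed(1)  # only so that this illustration is repeatable

# Toy reference list with a unique key per reference (made-up data)
refs <- data.frame(
  key   = paste0("ref", 1:10),
  title = paste("Toy title", 1:10)
)

p <- c(0.3, 0.7)                      # proportions for the two reviewers
shuffled <- sample.int(nrow(refs))    # random order of row indexes
cut <- floor(p[1] * nrow(refs))       # size of the first subset

set1 <- refs[shuffled[seq_len(cut)], ]
set2 <- refs[shuffled[(cut + 1):nrow(refs)], ]

# Sanity checks: no overlap, and together the subsets cover the full list
no_overlap <- length(intersect(set1$key, set2$key)) == 0
complete   <- setequal(c(set1$key, set2$key), refs$key)

c(n_set1 = nrow(set1), n_set2 = nrow(set2))
no_overlap
complete
```

The same two checks can be run on split1 and split2 from a real reference list, provided it has a column that uniquely identifies each reference.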

*computers aren’t really good at selecting truly at random…Random number generators from most computer programs are actually “pseudo-random”, meaning they are produced from a deterministic mathematical model or algorithm. The R code above uses a pseudo-random number generator. Pseudo-random number generators are usually good enough for their intended purpose (basically better than what any human could do). A good pseudo-random number generator will reproduce statistics that are consistent with true randomness, but they are not truly random. A truly random number can be generated based on a constantly changing physical process that can’t be modelled as an algorithm. If you’re curious about true randomness check out these websites: https://www.random.org/; https://qrng.anu.edu.au/random-colours/.
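One practical consequence of pseudo-randomness is that a random selection can be made repeatable by fixing the seed of the generator. A minimal sketch, assuming an arbitrary seed value (2022) and a toy vector of 1,000 reference ids:

```r
ids <- 1:1000  # toy reference ids

# Fixing the seed makes the pseudo-random draw repeatable
set.seed(2022)
first_draw <- sample(ids, 10)

set.seed(2022)
second_draw <- sample(ids, 10)

# Both draws select exactly the same references
same_selection <- identical(first_draw, second_draw)
same_selection
```

Calling set.seed() with a recorded value before running the functions above would let a collaborator regenerate the same pilot set or split.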

(Any comments, questions or feedback, you can reach me at: coralie.williams@unsw.edu.au)

Say goodbye to fixed- and random-effects meta-analyses

27/10/2022

By Yefeng Yang

As I have been doing more surveys on meta-analytic practices in many disciplines and re-analysing more published meta-analysis (MA) papers, one “recommendation” has been growing stronger and stronger in my mind. That is, we should say goodbye to traditional fixed- and random-effects MAs and conduct our MAs using advanced methods like multilevel and multivariate models, because meta-analytic datasets are often multilevel and multivariate in nature. Doing so can make sure you properly handle statistical issues like dependency and heteroscedasticity, resulting in more robust parameter estimates and inferences. My main argument is that in the “worst-case” scenario, where your dataset does not have such a complex structure, these advanced models will automatically reduce to ordinary fixed- or random-effects models, with similar (or identical) results. More importantly, applying advanced methods can help you decompose variances (Figure 1) and separate correlations of true effects from those of observed effects (Figure 2), delivering new biological insights. In many published meta-analyses using fixed- and random-effects models, I can see that the between-study heterogeneity and correlation are overestimated.
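To make the “reduces to” argument concrete, here is one common way of writing a three-level meta-analytic model and its special cases. The notation is my own assumption for illustration (effect size j in study i), not something fixed by the text above.

```latex
% Multilevel (three-level) model: effect size j nested in study i
y_{ij} = \mu + u_i + m_{ij} + e_{ij}, \qquad
u_i \sim N(0, \sigma_u^2), \quad
m_{ij} \sim N(0, \sigma_m^2), \quad
e_{ij} \sim N(0, v_{ij})

% One effect size per study: u_i and m_i cannot be separated,
% so only their sum s_i = u_i + m_i is estimable (random-effects model)
y_i = \mu + s_i + e_i, \qquad
s_i \sim N(0, \tau^2), \quad \tau^2 = \sigma_u^2 + \sigma_m^2, \quad
e_i \sim N(0, v_i)

% No heterogeneity, \tau^2 = 0 (fixed-effect model)
y_i = \mu + e_i, \qquad e_i \sim N(0, v_i)
```

Here the v terms are the known sampling variances. Read from top to bottom, each model is the previous one with a variance component merged or set to zero, which is why fitting the more general model costs little when the data turn out to be simple.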

Figure 1. Imaginary example of hierarchical data structure.

Although these advanced methods are good, there are (at least) three remarks worth noting here. First, all your models should be built strictly based on predefined questions (e.g., a priori hypotheses). Second, before applying these models, you need to correctly understand the statistical theory behind them. Otherwise, you are very likely to disseminate misleading information if you publish results from them. Third (but not last), do not use complex models to fit a small-sample-size dataset. This is especially true for multivariate models, which are often heavily parameterized (even overparameterized). So, always do (at least some basic) model checking (e.g., likelihood profile, convergence) to ensure the stability of your model fitting.

Figure 2. Joint probability distribution (bivariate normal joint density). Photo source: Multivariate normal distribution. (2022, October 16). In Wikipedia. https://en.wikipedia.org/wiki/Multivariate_normal_distribution

As I have learned more about statistics, I have realised that many methods are just special forms of a more general framework. For example, the (two-sample) Student t-test is a special form of ANOVA, which is a special form of linear regression, which is a special form of the generalized linear model or linear mixed model, which is a special form of the generalized linear mixed model, which is a special form of the generalized additive mixed model. In the same vein, fixed-effect MA is a special form of random-effects MA, which is a special form of a multilevel or multivariate model. I can imagine that one might disagree with “say goodbye to fixed- and random-effects meta-analyses”. For example, fixed-effects MA can still provide valid inferences if you limit your results to the included studies (e.g., conditional inference). I acknowledge this is true as long as you are not going to generalize results beyond the included studies. I know asking people to resort to complex methods is difficult because people like easily understandable tools; just think about the P-value. I am always open and happy to see different ideas. Lastly, all the above claims only represent my personal intuition and opinion (I might extend them into a paper in future). They might be wrong and do not necessarily speak for my lab’s attitudes toward meta-analyses.
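The first links in that chain are easy to check numerically. The sketch below uses simulated data (the group labels, sample size, effect of 0.8, and seed are all arbitrary assumptions) to show that an equal-variance two-sample t-test, a one-way ANOVA, and a linear regression with a two-level factor give the same P-value, with the ANOVA F statistic equal to the squared t statistic.

```r
set.seed(42)  # arbitrary seed, only for repeatability

# Simulated two-group data (made up for illustration)
group <- factor(rep(c("a", "b"), each = 15))
y <- rnorm(30, mean = ifelse(group == "a", 0, 0.8), sd = 1)

# 1. Student t-test with equal variances assumed
t_res <- t.test(y ~ group, var.equal = TRUE)

# 2. The same comparison as a linear regression and as a one-way ANOVA table
fit <- lm(y ~ group)
aov_tab <- anova(fit)
lm_tab <- summary(fit)$coefficients

p_t     <- t_res$p.value
p_anova <- aov_tab[["Pr(>F)"]][1]
p_lm    <- lm_tab["groupb", "Pr(>|t|)"]

t_squared <- unname(t_res$statistic)^2
f_value   <- aov_tab[["F value"]][1]

# All three P-values agree, and F equals t squared
all.equal(p_t, p_anova)
all.equal(p_t, p_lm)
all.equal(t_squared, f_value)
```

The equivalence holds only for the equal-variance version of the t-test; the default Welch test in t.test() relaxes that assumption and will generally give a slightly different P-value.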

Attending an overseas conference – Ecological Society of America 2022

28/9/2022

by Samantha Burke

After over two years of lockdown, I had the opportunity to leave Australia to attend the Ecological Society of America’s (ESA) joint conference with the Canadian Society for Ecology and Evolution (CSEE). This conference marked my first time presenting an oral talk outside of UNSW. While it was exciting to share my research with others, I found learning about others’ research and networking with new people to be an equally exciting experience.

As my projects consist of systematic-like research, I was thrilled to see ESA created an entire session dedicated to meta-analysis in ecology. Ecologists are relatively new to conducting meta-analysis of their data, so this session was well-attended and directed conversation towards improving meta-science while it’s still in its early stages in ecology. These talks were all excellent and highlighted the upcoming importance and challenges of conducting systematic-like research in ecology and evolutionary biology.

In addition to meeting new people, I was able to connect with researchers I already knew. While in Montreal, I was able to meet I-DEEL’s newest post-doc, April Martinig, in person. April has been working remotely for the past few months, so it was great to attend her presentation on her previous work examining predator-prey interactions in culvert animal passages. As a Canadian citizen, she knew of the best places to go in Montreal, and we chatted over a delicious vegan lunch. We should all look forward to the research she’ll conduct with I-DEEL.

I also had the opportunity to meet members of the Society for Open, Reliable, and Transparent Ecology and Evolutionary biology (SORTEE), of which I’m a member. Even though I went to Canada intending to attend the ESA conference, SORTEE members attending the conference gathered for a mini meetup in Montreal. The society was able to reach out to more ecologists at the conference, and many people came to the meetup to hear firsthand what SORTEE is all about. If you’re interested, please check out a previous blog post by Rose O’Dea and the SORTEE website.

Attending a conference was such a privilege, especially one as diverse as ESA’s 2022 Conference. I look forward to continuing to share my work and learn from others.

SORTEE meetup at the ESA conference. Photo Credit: Dominique Roche

I-deel at ESEB2022 congress

24/8/2022

By Losia Lagisz
13 - 19 August 2022 has been a very busy and fun week – a week at ESEB (European Society for Evolutionary Biology) Congress in Prague, Czech Republic.

This congress was very special to us for five reasons:

We had four I-deel members attending (Shinichi, Losia, Szymek and Patrice) and one associated member (Totoro). Unfortunately, somehow, we do not have a photo with all of us together!
For Shinichi and Losia it was the first in-person conference in three years, and also the first overseas travel since the start of the Covid pandemic. For Totoro it was his first conference ever (and he did very well with his poster presentation).
There were hundreds of great presentations and posters – physically impossible to see them all. The diversity of topics and ideas was exciting and inspiring, as usual at ESEB.
We got to catch up with many of our good old friends and collaborators. We also met many interesting new people.
We organised a SORTEE in-person meet-up, with over 20 people attending from around the world. Some new members will potentially be joining SORTEE and lending their forces to the credibility revolution in ecology and evolution!

Big thanks to the organisers of ESEB2022 and we hope to be able to attend the next one – ESEB2025 to be held in Barcelona, Spain!