I-DEEL: Inter-Disciplinary Ecology and Evolution Lab

Evaluating AI-assisted data extraction in ecological evidence synthesis: performance of NotebookLM in systematic mapping and meta-analysis

15/3/2026

by Aneta Arct

Systematic reviews and meta-analyses increasingly shape research directions, conservation strategies, and policy decisions in ecology and evolution. Yet before any statistical model is fitted, before effect sizes are estimated and heterogeneity quantified, all evidence must first be extracted from primary studies. This stage is often treated as technical or clerical, but in reality, it is foundational. It is where unstructured text, tables, and supplementary materials are transformed into structured data that directly determine downstream inference.

With the rapid development of large language models (LLMs), tools such as NotebookLM promise to substantially accelerate this process. However, despite growing enthusiasm for AI-assisted evidence synthesis, empirical evaluations of their quantitative accuracy in ecological data extraction remain rare. A key advantage of NotebookLM is that it operates exclusively on user-provided source documents. Unlike open-ended chat models, it does not draw on general web knowledge but instead grounds its responses directly in the uploaded PDFs, providing citation-linked excerpts from those documents. This source-constrained design substantially reduces the risk of uncontrolled “hallucinations” and makes verification more transparent and auditable.

For the purposes of our most recent meta-analysis, we evaluated the freely available version of NotebookLM (Gemini 1.5) as a first-pass extractor in a challenging ecological domain: avian extra-pair paternity (EPP). We analyzed 189 peer-reviewed studies published between 2017 and 2025, each reporting genetic parentage data. These papers represent exactly the kind of material that makes extraction difficult: heterogeneous terminology, complex tables, and key values distributed across main text and supplementary materials.

To maintain contextual focus and reduce cross-document mixing, studies were uploaded and processed in controlled batches of 10–15 PDFs per session. This approach allowed the model to work within a limited, well-defined source set while still benefiting from contextual understanding across related studies. NotebookLM was prompted to extract 11 standardised study-level variables, spanning descriptive variables (bird species identity, molecular marker type used to determine parentage, nest-box use, start and end years of the study, reported start and end months of the breeding season, and whether EPP data were provided at the population–year level) and quantitative outcomes (offspring and/or brood sample size, proportion of extra-pair young (EPY), and brood-level extra-pair paternity (EPbr)). In total, 2,079 individual values were collected.
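For downstream auditing it helps to hold each study's extraction in a fixed schema. A minimal sketch of the 11-variable record in Python (the field names are our own illustrative choices, not NotebookLM output):

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class StudyRecord:
    """One row per study: the 11 variables extracted from each paper."""
    # Descriptive variables
    species: str                         # bird species identity
    marker_type: str                     # molecular marker used for parentage
    nest_box: bool                       # nest-box breeding population?
    start_year: Optional[int]            # first year of the study
    end_year: Optional[int]              # last year of the study
    breeding_start_month: Optional[str]  # reported start of breeding season
    breeding_end_month: Optional[str]    # reported end of breeding season
    population_year_level: bool          # EPP data given per population-year?
    # Quantitative outcomes
    n_sampled: Optional[int]             # offspring and/or broods sampled
    prop_epy: Optional[float]            # proportion of extra-pair young (EPY)
    ep_br: Optional[float]               # brood-level extra-pair paternity (EPbr)

# The schema has exactly 11 fields, matching the extraction protocol
assert len(fields(StudyRecord)) == 11
```

Fixing the schema up front makes every audited value traceable to a named field, which simplifies the error analysis reported below.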

Each of the 2,079 extracted values was manually audited against the original PDF by our team (Aneta Arct, Agnieszka Gudowska, Monika Gronowska, Karolina Skorb). Overall extraction accuracy reached 87.8%, meaning that 12.2% of values required manual correction. However, performance differed markedly depending on the type of data being extracted (Fig. 1).
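The headline figures are simple arithmetic over the audit table. Assuming the implied count of corrected values (254 of 2,079, consistent with the reported 12.2%), the accuracy works out as:

```python
n_values = 2079     # total values extracted and audited
n_corrected = 254   # values requiring manual correction (implied by the 12.2% figure)

accuracy = 1 - n_corrected / n_values
print(f"accuracy = {accuracy:.1%}")                        # accuracy = 87.8%
print(f"correction rate = {n_corrected / n_values:.1%}")   # correction rate = 12.2%
```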

Figure 1. Proportion of LLM-extracted values requiring manual correction across 189 avian parentage studies included in the meta-analysis. Colors indicate variable types: categorical descriptors, numeric descriptors, and outcome variables.

NotebookLM performed exceptionally well for categorical study descriptors. Species identity, marker type, and nest-box breeding status showed error rates below 3%, indicating that AI-assisted extraction is highly reliable for descriptive categorical information. Numeric study descriptors, such as geographic coordinates and study period, showed moderate correction rates (approximately 8–13%), suggesting that LLMs can generally extract structured numeric study information with reasonable accuracy. In contrast, quantitative outcome variables showed substantially higher error rates. Estimates of EPY proportion, brood-level EPP, and associated sample sizes (numbers of offspring or broods sampled) required correction in roughly 20–30% of cases. Statistical modelling confirmed that these outcome variables were nearly seven times more likely to require correction than categorical descriptors.
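The roughly seven-fold difference can be expressed as an odds ratio of correction between variable types. A toy calculation with illustrative counts (not our actual audit data; our real estimate came from a regression model, not this raw two-group comparison) shows the shape of the calculation:

```python
def odds(errors: int, n: int) -> float:
    """Odds that a value requires correction: errors vs. correct values."""
    return errors / (n - errors)

# Illustrative counts only: ~3.5% error rate for categorical descriptors,
# ~20% for quantitative outcomes.
odds_categorical = odds(20, 580)   # 20 corrections among 580 values
odds_outcome = odds(150, 750)      # 150 corrections among 750 values

odds_ratio = odds_outcome / odds_categorical
print(round(odds_ratio, 1))  # outcomes ~seven times likelier to need correction
```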

We also observed heterogeneity in LLM data-extraction performance across studies. Nearly half of the studies were extracted without any corrections at all (Fig. 2). Some papers were straightforward and consistently well handled by the model; others required substantial manual revision. This variability likely reflects differences in how ecological data are reported across studies and highlights the importance of systematic manual verification of extracted data.
Figure 2. Distribution of manual corrections following LLM-based data extraction across 189 studies included in the most recent meta-analysis of avian EPP.

Taken together, our results indicate that AI-assisted extraction is already a powerful support tool for ecological evidence synthesis. In particular, its high reliability for categorical descriptors suggests that such tools are especially well suited for systematic maps, where the primary goal is to catalogue species, study characteristics, methodological approaches, and geographic coverage rather than to compute effect sizes. In contrast, quantitative meta-analysis places much heavier inferential weight on numeric outcomes, such as proportions, sample sizes, and variance components. Our findings demonstrate that numerical fields remain vulnerable to extraction error and therefore require systematic and comprehensive human validation.

Accordingly, LLM-based workflows may currently offer the greatest efficiency gains at the stage of systematic mapping and metadata compilation, whereas fully automated quantitative synthesis remains premature without expert auditing. Rather than replacing human expertise, LLMs are best viewed as amplifiers of it: they can substantially accelerate structured data compilation while preserving inferential integrity when paired with transparent expert-led verification.


