Abstract
Introduction
Concentrations of elements found in plant tissues (termed the ionome by \citet{Salt_2008}) have been linked to crop yield for nearly two centuries. This concept emerged from the law of the minimum and the law of the optimum \cite{de_Wit_1992}. In the 1970s, E. R. Beaufils found that diagnosing from nutrient concentrations alone was somewhat unstable \cite{Walworth_1987}, so he developed the diagnosis and recommendation integrated system (DRIS), built on an arguably naive mathematical framework that was difficult to compute at the time. Using the tools of compositional data analysis \cite{Aitchison_1986}, \citet{dafir1992} used mathematically sound centered log-ratios and developed compositional nutrition diagnosis (CND).
Since about 2011, I have published several papers on data analysis in plant ionomics (listed on my
ORCID profile). This work began with the application of nutrient balances to leaves of several fruit crop species
\cite{Parent_2013}. The approach was to free nutrient concentrations from their constrained 0 to 100\% space using the isometric log-ratio transformation
\cite{Egozcue_2003}. Indeed, concentrations are compositional data, and analyzing such data without transformation is likely to return biased results
\cite{Aitchison_1986}. Recent work in biology has confirmed such problems
\cite{Silverman_2017,Mandal_2015,Friedman_2012,Morton_2017,Rivera_Pinto_2018,Jeanne_2019}.
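As a reminder of the transformations mentioned above, and using notation assumed here rather than taken from the cited works, let $\mathbf{x} = (x_1, \ldots, x_D)$ be a composition of $D$ strictly positive parts with geometric mean $g(\mathbf{x}) = \left( \prod_{i=1}^{D} x_i \right)^{1/D}$. The centered log-ratio (clr) and isometric log-ratio (ilr) transformations can then be sketched as
\begin{equation}
\mathrm{clr}(\mathbf{x}) = \left( \ln \frac{x_1}{g(\mathbf{x})}, \ldots, \ln \frac{x_D}{g(\mathbf{x})} \right),
\qquad
\mathrm{ilr}(\mathbf{x}) = \mathbf{V}^{\top} \, \mathrm{clr}(\mathbf{x}),
\end{equation}
where $\mathbf{V}$ is a $D \times (D-1)$ matrix whose columns form an orthonormal basis of the subspace in which clr vectors lie (their components sum to zero). The clr returns $D$ linearly dependent coordinates, whereas the ilr returns $D-1$ unconstrained coordinates. A different choice of basis only rotates the ilr coordinates.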
Even when compositions are preprocessed with log-ratios, how a healthy state is defined fundamentally affects both the health diagnosis and the corrective measures to be prescribed. Once the healthy state is defined, we must also delineate its boundaries in the ionomic space. This task can be approached with machine learning techniques, which can detect complex patterns better than the other statistical techniques I have explored (e.g. in \citealp{Parent_2013a}). Moreover, machine learning algorithms are nowadays quite easy to use.
Predicting the healthy state of a plant would be useless without an accurate recommendation system. The corrective measures needed to recover health can be summarized as a translation from an imbalanced nutrient status to a balanced one. In compositional data analysis, a translation from one compositional vector to another is called a perturbation. However, interpreting a perturbation remains a difficult cognitive task.
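In the usual notation of compositional data analysis (the symbols below are assumptions of this sketch, not taken from the cited works), perturbing a composition $\mathbf{x}$ by $\mathbf{p}$ gives
\begin{equation}
\mathbf{x} \oplus \mathbf{p} = \mathcal{C}\left( x_1 p_1, \ldots, x_D p_D \right),
\end{equation}
where the closure operation $\mathcal{C}$ divides each part by the sum of all parts so that the result sums to 1. The perturbation that moves an imbalanced composition $\mathbf{x}$ to a balanced reference $\mathbf{y}$ is then
\begin{equation}
\mathbf{p} = \mathbf{y} \ominus \mathbf{x} = \mathcal{C}\left( \frac{y_1}{x_1}, \ldots, \frac{y_D}{x_D} \right),
\end{equation}
so that $\mathbf{x} \oplus \mathbf{p} = \mathbf{y}$. With purely illustrative numbers (not measured data), $\mathbf{x} = (0.20, 0.30, 0.50)$ and $\mathbf{y} = (0.25, 0.25, 0.50)$ give ratios $(1.25, 0.833, 1.00)$ and, after closure, $\mathbf{p} \approx (0.405, 0.270, 0.324)$. The neutral perturbation is $(1/3, 1/3, 1/3)$. The third part is unchanged in concentration, yet its perturbation component differs from $1/3$, which hints at why perturbations are hard to read directly.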
This article describes the logic behind the approach I recommend for diagnosing plant ionomes. I show why we should not use confidence intervals and linear statistics, and how machine learning and the perturbation of well-documented Humboldtian agroecosystems can successfully address nutrient imbalance. I apply the proposed approach to a data set comprising macro- and micro-elements measured in blueberry leaves from Québec, Canada.