Concept

Example 2: Ecology

A study is conducted to better understand the ecosystem of a forest area and to help preserve its biodiversity. The question is to determine which factors influence which other factors.

The dataset consists of $n$ jointly recorded values of many pairs of variables $(X_k, Y_k)$, $k = 1, \dots, N$, measured in different locations. The variables describe, for example, the soil, humidity, lighting, or the presence of certain plants or animals. Each pair yields a sample $S_k = \{(x_{k1}, y_{k1}), \dots, (x_{kn}, y_{kn})\}$.
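As a minimal sketch of how such samples could be assembled, assume the field data arrive as one record per location, with all variables measured jointly, and that every unordered pair of variables becomes one candidate pair $(X_k, Y_k)$. The variable names and values below are made-up placeholders, and `build_samples` is a hypothetical helper, not part of the study.

```python
from itertools import combinations

# Made-up placeholder records (NOT real measurements): one dict per location,
# with every variable recorded jointly at that location.
records = [
    {"slope_aspect": 180.0, "hill_shade": 0.82, "humidity": 0.41},
    {"slope_aspect": 90.0, "hill_shade": 0.55, "humidity": 0.63},
    {"slope_aspect": 270.0, "hill_shade": 0.30, "humidity": 0.70},
    {"slope_aspect": 0.0, "hill_shade": 0.12, "humidity": 0.58},
]


def build_samples(records):
    """Hypothetical helper: return {(X_name, Y_name): S_k} for every pair of variables.

    S_k = [(x_k1, y_k1), ..., (x_kn, y_kn)] holds the n jointly recorded values,
    one tuple per location. The (X, Y) order is alphabetical and says nothing
    about causal direction.
    """
    names = sorted(records[0])
    return {
        (x_name, y_name): [(r[x_name], r[y_name]) for r in records]
        for x_name, y_name in combinations(names, 2)
    }


samples = build_samples(records)
for (x_name, y_name), s in samples.items():
    print(x_name, y_name, s)
```

With three variables and four locations this gives $N = 3$ candidate pairs, each with $n = 4$ points.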

Determining which factor influences which other factor is a complicated process. However, prior knowledge of physics and biology may allow us to label some pairs with their ground truth $G = g_k$ (for example, the aspect of a slope can influence hill shade, but not vice versa). The labeled dataset thus obtained, $\{(S_1, g_1), \dots, (S_k, g_k), \dots\}$, $k = 1, \dots, N$, is an empirical sample of the “mother distribution”.
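The labeling step can be sketched as follows, under two assumptions that are not in the text: prior knowledge is encoded as a set of known (cause, effect) variable names, and $g_k = +1$ means $X_k$ causes $Y_k$ while $g_k = -1$ means $Y_k$ causes $X_k$. Pairs with no prior knowledge stay unlabeled. The helper `label_pairs` and the sample values are hypothetical.

```python
# Assumed label convention (not from the text):
#   g = +1 if X causes Y, g = -1 if Y causes X.
# Prior knowledge as (cause, effect) variable names: the aspect of a slope
# influences hill shade, not the reverse.
KNOWN_CAUSES = {("slope_aspect", "hill_shade")}


def label_pairs(samples, known_causes):
    """Hypothetical helper: split {(X_name, Y_name): S_k} into labeled
    [(S_k, g_k), ...] and a dict of still-unlabeled samples."""
    labeled, unlabeled = [], {}
    for (x_name, y_name), s in samples.items():
        if (x_name, y_name) in known_causes:
            labeled.append((s, +1))
        elif (y_name, x_name) in known_causes:
            labeled.append((s, -1))
        else:
            unlabeled[(x_name, y_name)] = s
    return labeled, unlabeled


# Tiny made-up samples (n = 2 locations), for illustration only.
samples = {
    ("hill_shade", "humidity"): [(0.82, 0.41), (0.55, 0.63)],
    ("hill_shade", "slope_aspect"): [(0.82, 180.0), (0.55, 90.0)],
    ("humidity", "slope_aspect"): [(0.41, 180.0), (0.63, 90.0)],
}
labeled, unlabeled = label_pairs(samples, KNOWN_CAUSES)
print(labeled)          # [([(0.82, 180.0), (0.55, 90.0)], -1)]
print(list(unlabeled))  # [('hill_shade', 'humidity'), ('humidity', 'slope_aspect')]
```

Only the pair covered by prior knowledge receives a label; here it gets $g = -1$ because the sample stores hill shade as $X$ and slope aspect as $Y$.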

The hope is that a classifier trained on such data can automatically label more pairs, provided that the mechanisms underlying the other pairs bear some similarity to those of the labeled ones.
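To illustrate the idea of training a classifier on labeled pairs and applying it to unlabeled ones, here is a toy sketch in plain Python. Everything in it is an assumption made for illustration: synthetic pairs generated by non-invertible mechanisms $Y = f(X) + \text{noise}$, a single hand-crafted asymmetry feature (a binned estimate of how much variance of one variable is left unexplained by the other, computed in both directions), and a nearest-centroid classifier. Real cause-effect data are far less clean, and other features and classifiers could be used instead.

```python
import math
import random
import statistics


def unexplained_fraction(a, b, n_bins=10):
    """Binned estimate of the share of var(b) left unexplained by a.

    Points are sorted by a and cut into n_bins equal-size groups (any remainder
    is dropped); the mean within-group variance of b is divided by var(b).
    Near 0: b is almost a function of a. Near 1: a tells little about b.
    """
    points = sorted(zip(a, b))
    size = len(points) // n_bins
    within = [
        statistics.pvariance([y for _, y in points[i * size:(i + 1) * size]])
        for i in range(n_bins)
    ]
    return statistics.mean(within) / statistics.pvariance(b)


def features(sample):
    """Map a sample S_k = [(x, y), ...] to a feature vector (one feature here)."""
    xs = [x for x, _ in sample]
    ys = [y for _, y in sample]
    return [unexplained_fraction(xs, ys) - unexplained_fraction(ys, xs)]


def make_pair(rng, label, n=200, noise=0.05):
    """Hypothetical toy generator: X causes Y through a non-invertible f.

    label = +1 keeps the (cause, effect) column order; label = -1 swaps it.
    """
    f = rng.choice([lambda x: x * x, abs, lambda x: math.cos(3 * x)])
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [f(x) + rng.gauss(0, noise) for x in xs]
    return list(zip(xs, ys)) if label == 1 else list(zip(ys, xs))


def train_nearest_centroid(labeled):
    """Average the feature vectors of each class g in {+1, -1}."""
    centroids = {}
    for g in (1, -1):
        vecs = [features(s) for s, label in labeled if label == g]
        centroids[g] = [statistics.mean(col) for col in zip(*vecs)]
    return centroids


def predict(centroids, sample):
    """Return the label whose centroid is closest to the sample's features."""
    v = features(sample)

    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(v, c))

    return min(centroids, key=lambda g: sq_dist(centroids[g]))


rng = random.Random(0)
labeled = [(make_pair(rng, g), g) for g in [1, -1] * 20]  # pairs with "known" direction
truth = [rng.choice([1, -1]) for _ in range(20)]           # hidden directions
unlabeled = [make_pair(rng, g) for g in truth]

model = train_nearest_centroid(labeled)
predicted = [predict(model, s) for s in unlabeled]
accuracy = sum(p == t for p, t in zip(predicted, truth)) / len(truth)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because each toy mechanism is non-invertible, $Y$ is nearly a function of $X$ but not the other way round, so the feature is strongly negative for $X \to Y$ samples and strongly positive for swapped ones. The classifier only transfers to new pairs to the extent that their mechanisms resemble those seen during training, which is exactly the similarity condition stated above.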

Updated 2020-07-14

Tags

Data Science