Concept

Example 1: Genomics

Pharmaceutical companies are interested in determining the influence of genes on each other. A dataset may consist of jointly recorded gene activities for NN pairs of genes (Xk,Yk),k=1,N( X_k , Y_k ), k = 1, ⋯ N and for nn patients Sk={(xk1,yk1),,(xkn,ykn}S_k = \{( x_{k1} , y_{k 1 }), ⋯ , ( x_{ kn} , y_{kn} \} .

Determining which gene influences which other gene is a costly experimental process, so only a subset of pairs of genes are generally labeled with ground truth of causal relationship G=gkG = g_k .

The labeled dataset {(S1,g1),,(Sk,gk),},k=1N\{( S_1 , g_1 ), ⋯ , ( S_k , g_k ), ⋯\}, k = 1⋯ N , is an empirical sample of the “mother distribution” over pairs of genes and their causal labels, which may be used as training data to obtain a classifier, such that predictions of unknown causal labels can be made on new pairs of genes.

0

1

Updated 2020-07-14

Tags

Data Science

Related