Learn Before
Concept

Kernel Based Embeddings

Kernel embeddings have achieved strong performance in learning machines thanks to their representational power. To leverage this, Lopez-Paz et al. introduced kernel-based embeddings for feature construction in pairwise causal discovery.

Starting from the dataset of empirical samples $S = \{ S_i \}_{i=1}^n$ with $S_i = \{ (x_{ij}, y_{ij}) \}_{j=1}^{n_i}$, a kernel mean embedding projects each empirical distribution into the same Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}_k$. To obtain a homogeneous, low-dimensional embedding, Lopez-Paz et al. use random cosine features that approximate the empirical kernel mean embedding in low dimension:

$$\mu_{k,m}(P_{S_i}) = \frac{2C_k}{|S_i|} \sum_{(x_{ij}, y_{ij}) \in S_i} \left( \cos\left( w_l^x \, x_{ij} + w_l^y \, y_{ij} + b_l \right) \right)_{l=1}^m \in \mathbb{R}^m$$

where $\{(w_l, b_l)\}_{l=1}^m$ are the kernel parameters, sampled i.i.d. from $\mathcal{N}(0,2) \times \mathcal{U}[0, 2\pi]$, whose number $m$ sets the dimension of the output space; $P_{S_i}$ is the empirical distribution of $S_i$; and $C_k = \int p_k(w)\,dw$, with $p_k : \mathbb{R}^d \rightarrow \mathbb{R}$ the positive and integrable Fourier transform of the chosen kernel $k$, equal to 1 in this case.
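The embedding above can be sketched in a few lines of NumPy. This is a minimal illustration, not Lopez-Paz et al.'s reference implementation: the function name, the scalar-pair input shape, and the bandwidth parameter `gamma` are assumptions made for the example, with the Gaussian-kernel case ($C_k = 1$) and frequencies drawn from a zero-mean normal as in the text.

```python
import numpy as np

def random_feature_embedding(x, y, m=100, gamma=1.0, seed=0):
    """Sketch of the random cosine feature mean embedding described above.

    Projects an empirical sample S = {(x_j, y_j)} of scalar pairs into R^m
    by summing m random cosine features, approximating the kernel mean
    embedding of P_S for a Gaussian kernel (where C_k = 1).
    """
    rng = np.random.default_rng(seed)
    # Frequencies w ~ N(0, 2*gamma) per input coordinate (assumed Gaussian
    # kernel spectral density), phases b ~ Uniform[0, 2*pi], all i.i.d.
    wx = rng.normal(0.0, np.sqrt(2.0 * gamma), size=m)
    wy = rng.normal(0.0, np.sqrt(2.0 * gamma), size=m)
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)
    # One cosine feature vector per sample point: shape (|S|, m).
    feats = np.cos(np.outer(x, wx) + np.outer(y, wy) + b)
    # Sum over the sample with the 2*C_k/|S| factor from the formula.
    return (2.0 / len(x)) * feats.sum(axis=0)  # shape (m,)
```

Because the same frequencies and phases are reused for every sample $S_i$ (fixed seed), embeddings of different samples live in the same $\mathbb{R}^m$ and can be fed directly to a downstream classifier.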


Updated 2020-07-28

Tags

Data Science