Estimating Derivatives for Gaussian Processes - 2012-07-15
Building the basics for solving differential equations.
In a previous post I wrote about basic Gaussian Processes. Sometimes it is also of interest to estimate the gradient of the underlying function, preferably with a confidence interval. For a given covariance function $$k(x_i,x_j)$$ and an estimated function $$f(x)$$, the following relations hold:
$$\begin{array}{rcl} \text{cov} \left( f_i, \frac{\partial f_j}{\partial x_j} \right) & = & \frac{\partial k(x_i, x_j)}{\partial x_j} \\ \text{cov} \left( \frac{\partial f_i}{\partial x_i}, f_j \right) & = & \frac{\partial k(x_i, x_j)}{\partial x_i} \\ \text{cov} \left( \frac{\partial f_i}{\partial x_i}, \frac{\partial f_j}{\partial x_j} \right) & = & \frac{\partial^2 k(x_i, x_j)}{\partial x_i \partial x_j} \end{array}$$
The joint distribution of the measured values $$f$$ and the values $$f_*$$ that we want to estimate is:
$$\left( \begin{array}{c} f \\ f_* \end{array} \right) \sim N \left( \left( \begin{array}{c} 0 \\ 0 \end{array} \right), \left( \begin{array}{cc} K(X,X) + \sigma^2_n I & K(X,X_*) \\ K(X_*,X) & K(X_*,X_*) \end{array} \right) \right)$$
Combining this with the derivative relations above, the joint distribution of the observations and of the derivatives at the prediction points $$x^*_1, \ldots, x^*_n$$ becomes:
$$\left( \begin{array}{c} f \\ \frac{\partial f_*}{\partial x_*} \end{array} \right) \sim N \left( \left( \begin{array}{c} 0 \\ \vdots \\ 0 \\ \hline 0 \\ \vdots \\ 0 \end{array} \right), \left( \begin{array}{cccc|cccc} k(x_1,x_1) + \sigma^2_n & k(x_1,x_2) & \cdots & k(x_1,x_n) & \frac{\partial k(x_1,x^*_1)}{\partial x^*_1} & \frac{\partial k(x_1,x^*_2)}{\partial x^*_2} & \cdots & \frac{\partial k(x_1,x^*_n)}{\partial x^*_n} \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ k(x_n,x_1) & k(x_n,x_2) & \cdots & k(x_n,x_n) + \sigma^2_n & \frac{\partial k(x_n,x^*_1)}{\partial x^*_1} & \frac{\partial k(x_n,x^*_2)}{\partial x^*_2} & \cdots & \frac{\partial k(x_n,x^*_n)}{\partial x^*_n} \\ \hline \frac{\partial k(x^*_1,x_1)}{\partial x^*_1} & \frac{\partial k(x^*_1,x_2)}{\partial x^*_1} & \cdots & \frac{\partial k(x^*_1,x_n)}{\partial x^*_1} & \frac{\partial^2 k(x^*_1,x^*_1)}{\partial {x^*_1}^2} & \frac{\partial^2 k(x^*_1,x^*_2)}{\partial x^*_1 \partial x^*_2} & \cdots & \frac{\partial^2 k(x^*_1,x^*_n)}{\partial x^*_1 \partial x^*_n} \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ \frac{\partial k(x^*_n,x_1)}{\partial x^*_n} & \frac{\partial k(x^*_n,x_2)}{\partial x^*_n} & \cdots & \frac{\partial k(x^*_n,x_n)}{\partial x^*_n} & \frac{\partial^2 k(x^*_n,x^*_1)}{\partial x^*_n \partial x^*_1} & \frac{\partial^2 k(x^*_n,x^*_2)}{\partial x^*_n \partial x^*_2} & \cdots & \frac{\partial^2 k(x^*_n,x^*_n)}{\partial {x^*_n}^2} \end{array} \right) \right)$$
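To make the block structure concrete, here is a minimal sketch of how the four blocks can be assembled with NumPy for 1-D inputs; the helper names sq_exp and joint_cov are just for illustration, and the derivative expressions used inside are the squared exponential ones worked out further down. The full script at the end of the post fills the same blocks element by element instead of assembling the joint matrix.
import numpy as np
def sq_exp(x1, x2, ell):
    # Squared exponential covariance, broadcasts over array inputs.
    return np.exp(-0.5 * (x1 - x2) ** 2 / ell ** 2)
def joint_cov(x, xs, ell, noise_var):
    # Upper-left block: K(X, X) + sigma_n^2 I.
    K = sq_exp(x[:, None], x[None, :], ell) + noise_var * np.eye(len(x))
    # Upper-right block: cov(f(x_i), df(x*_j)/dx*_j) = dk(x_i, x*_j)/dx*_j.
    C = (x[:, None] - xs[None, :]) / ell ** 2 * sq_exp(x[:, None], xs[None, :], ell)
    # Lower-right block: cov(df(x*_i)/dx*_i, df(x*_j)/dx*_j).
    d = xs[:, None] - xs[None, :]
    D = (1.0 / ell ** 2 - d ** 2 / ell ** 4) * sq_exp(xs[:, None], xs[None, :], ell)
    # The lower-left block is the transpose of the upper-right one.
    return np.vstack([np.hstack([K, C]), np.hstack([C.T, D])])
With the data defined in the script at the end, joint_cov(x, xg, l, s) would return an (n + m) by (n + m) matrix, where n = len(x) and m = len(xg).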
Conditioning on the observations in the same way as for the function values then gives the distribution of the derivatives:
$$\left. \frac{\partial f_*}{\partial X_*} \right| X_*, f, X \sim N \left( \underbrace{ \frac{\partial K(X_*,X)}{\partial X_*} \left( K(X,X) + \sigma^2_n I \right)^{-1} f }_{\text{Expected derivative}}, \underbrace{ \frac{\partial^2 K(X_*,X_*)}{\partial X^2_*} - \frac{\partial K(X_*,X)}{\partial X_*} \left( K(X,X) + \sigma^2_n I \right)^{-1} \frac{\partial K(X,X_*)}{\partial X_*} }_{\text{Variance of derivative}} \right)$$
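Written out with NumPy, the conditioning step is just a few matrix operations. The sketch below is a compact restatement of the formula (the function name derivative_posterior is hypothetical), again assuming 1-D inputs and the squared exponential derivatives given below; the script at the end of the post computes the same quantities with explicit loops and a pseudo-inverse.
import numpy as np
def derivative_posterior(x, y, xs, ell, noise_var):
    # Posterior mean and variance of df/dx at the points xs.
    dxx = x[:, None] - x[None, :]
    Kn = np.exp(-0.5 * dxx ** 2 / ell ** 2) + noise_var * np.eye(len(x))
    dsx = xs[:, None] - x[None, :]
    dK = -dsx / ell ** 2 * np.exp(-0.5 * dsx ** 2 / ell ** 2)  # dK(X*, X)/dX*
    dss = xs[:, None] - xs[None, :]
    d2K = (1.0 / ell ** 2 - dss ** 2 / ell ** 4) * np.exp(-0.5 * dss ** 2 / ell ** 2)
    mean = dK.dot(np.linalg.solve(Kn, y))
    cov = d2K - dK.dot(np.linalg.solve(Kn, dK.T))
    return mean, np.diag(cov)
Using np.linalg.solve here avoids forming the inverse explicitly, which is numerically a bit safer than the pinv call in the script below; for a small example like this both give essentially the same result.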
To make this work we need explicit expressions for the derivatives of the covariance function. Using the squared exponential covariance function:
$$k(x_i,x_j) = \exp \left( -\frac{1}{2} \frac{||x_i-x_j||^2_2}{\ell^2} \right) $$
The derivatives with respect to $$x_i$$ and $$x_j$$ are:
$$\frac{\partial k(x_i,x_j)}{\partial x_i} = - \frac{x_i - x_j}{\ell^2} \exp \left( -\frac{1}{2} \frac{||x_i-x_j||^2_2}{\ell^2} \right)$$
$$\frac{\partial k(x_i,x_j)}{\partial x_j} = \frac{x_i - x_j}{\ell^2} \exp \left( -\frac{1}{2} \frac{||x_i-x_j||^2_2}{\ell^2} \right)$$
and the second-order derivatives are:
$$\frac{\partial^2 k(x_i, x_j)}{\partial x^2_i} = \frac{1}{\ell^2} \left( \frac{(x_i-x_j)^2}{\ell^2} - 1 \right) \exp \left( -\frac{1}{2} \frac{||x_i-x_j||^2_2}{\ell^2} \right)$$
$$\frac{\partial^2 k(x_i, x_j)}{\partial x^2_j} = \frac{1}{\ell^2} \left( \frac{(x_i-x_j)^2}{\ell^2} - 1 \right) \exp \left( -\frac{1}{2} \frac{||x_i-x_j||^2_2}{\ell^2} \right)$$
$$\frac{\partial^2 k(x_i, x_j)}{\partial x_i \partial x_j} = \frac{1}{\ell^2} \left( 1 - \frac{(x_i-x_j)^2}{\ell^2} \right) \exp \left( -\frac{1}{2} \frac{||x_i-x_j||^2_2}{\ell^2} \right)$$
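As a sanity check on the algebra, the derivatives above can be verified symbolically, for example with a quick SymPy snippet like the one below (using the scalar form of the kernel; every print should show zero):
import sympy as sp
xi, xj = sp.symbols('x_i x_j', real=True)
ell = sp.symbols('ell', positive=True)
k = sp.exp(-sp.Rational(1, 2) * (xi - xj) ** 2 / ell ** 2)
# First derivatives.
print(sp.simplify(sp.diff(k, xi) + (xi - xj) / ell ** 2 * k))
print(sp.simplify(sp.diff(k, xj) - (xi - xj) / ell ** 2 * k))
# Second derivatives.
print(sp.simplify(sp.diff(k, xi, 2) - ((xi - xj) ** 2 / ell ** 2 - 1) / ell ** 2 * k))
print(sp.simplify(sp.diff(k, xi, xj) - (1 - (xi - xj) ** 2 / ell ** 2) / ell ** 2 * k))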
Combining all of this gives the result shown in the following figure:
The solid blue line is the expected function value and the dashed blue lines mark the 95%-confidence interval. The thick red lines are the expected derivatives and the dashed red lines are the 95%-confidence intervals for the derivatives. A perhaps surprising result is that the derivative is very uncertain close to an observation while it is more certain in between. The uncertainty of the derivatives depends strongly on the length scale (a longer length scale reduces the uncertainty), and placing many observations in a tight interval also helps.
Python code for creating the plot is given below:
#!/usr/bin/env python2.7
import numpy as np
import matplotlib.pyplot as plt
def covfun(x1, x2, ell):
    # Squared exponential covariance.
    k = np.exp(-0.5 * ((x1 - x2) / ell) ** 2.0)
    return k

def dx1_covfun(x1, x2, ell):
    # Derivative of the covariance with respect to the first argument.
    ell2 = ell ** 2.0
    k = (x2 - x1) / ell2 * covfun(x1, x2, ell)
    return k

def dx2_covfun(x1, x2, ell):
    # Derivative of the covariance with respect to the second argument.
    ell2 = ell ** 2.0
    k = (x1 - x2) / ell2 * covfun(x1, x2, ell)
    return k

def ddx1x2_covfun(x1, x2, ell):
    # Mixed second derivative with respect to both arguments.
    ell2 = ell ** 2.0
    ell4 = ell ** 4.0
    k = (ell2 - (x1 - x2) ** 2.0) / ell4 * covfun(x1, x2, ell)
    return k
# Characteristic length.
l = np.sqrt(0.025)
# Noise variance.
s = 0.01
# Measured points.
x = np.array([0.1, 0.15, 0.18, 0.2, 0.4, 0.6, 0.8, 0.9])
y = np.array([0.2, 0.3, 0.4, 0.5, 0.7, 0.4, 0.3, 0.2])
# Grid to predict values for.
xg = np.linspace(0.0, 1.0, 61)
# Initialization of the covariance matrices.
K11 = np.zeros((len(x), len(x)))
K12 = np.zeros((len(x), len(xg)))
K21 = np.zeros((len(xg), len(x)))
K22 = np.zeros((len(xg), len(xg)))
D11 = np.zeros((len(x), len(x)))  # Not used in the calculations below.
D12 = np.zeros((len(x), len(xg)))
D21 = np.zeros((len(xg), len(x)))
D22 = np.zeros((len(xg), len(xg)))
# Fill covariance matrix with values.
for row in xrange(len(x)):
    for col in xrange(len(x)):
        K11[row, col] = covfun(x[row], x[col], l)
# Do the same for the parts where the grid is involved.
for row in xrange(len(x)):
    for col in xrange(len(xg)):
        K12[row, col] = covfun(x[row], xg[col], l)
        K21[col, row] = covfun(xg[col], x[row], l)
        D12[row, col] = dx2_covfun(x[row], xg[col], l)
        D21[col, row] = dx1_covfun(xg[col], x[row], l)
# And the grid itself.
for row in xrange(len(xg)):
    for col in xrange(len(xg)):
        K22[row, col] = covfun(xg[row], xg[col], l)
        D22[row, col] = ddx1x2_covfun(xg[row], xg[col], l)
# Posterior means and variances for the grid.
Kinv = np.linalg.pinv(K11 + s * np.eye(len(x)))
yg = np.dot(np.dot(K21, Kinv), y)
covyg = np.diag(K22 - np.dot(np.dot(K21, Kinv), K12))
dyg = np.dot(np.dot(D21, Kinv), y)
dcovyg = (D22 - np.dot(np.dot(D21, Kinv), D12))
dcovyg = np.diag(dcovyg)
plt.plot(xg, yg, 'b', linewidth=2)
plt.hold(True)
# The 95% band uses 1.96 times the posterior standard deviation.
plt.plot(xg, yg + np.sqrt(covyg) * 1.96, 'b--')
plt.plot(xg, yg - np.sqrt(covyg) * 1.96, 'b--')
L = 0.025  # Half-width of each plotted derivative segment.
for k in xrange(5, len(xg), 5):
    # Expected derivative drawn as a short segment through (xg[k], yg[k]).
    plt.plot(np.array([xg[k] - L, xg[k] + L]),
             np.array([yg[k] - dyg[k] * L,
                       yg[k] + dyg[k] * L]), 'r', linewidth=3)
    # 95%-confidence interval for the slope, again via the standard deviation.
    plt.plot(np.array([xg[k] - L, xg[k] + L]),
             np.array([yg[k] - (dyg[k] + 1.96 * np.sqrt(dcovyg[k])) * L,
                       yg[k] + (dyg[k] + 1.96 * np.sqrt(dcovyg[k])) * L]), 'r--')
    plt.plot(np.array([xg[k] - L, xg[k] + L]),
             np.array([yg[k] - (dyg[k] - 1.96 * np.sqrt(dcovyg[k])) * L,
                       yg[k] + (dyg[k] - 1.96 * np.sqrt(dcovyg[k])) * L]), 'r--')
plt.plot(x, y, 'ok')
plt.axis([0.0, 1.0, 0.0, 1.0])
plt.xlabel('$x$')
plt.ylabel('$f(x)$')
plt.show()