Gaussian Processes and Differential Equations

Naïve experiment to solve differential equations using GP.

So, this is inevitable… Of course, I wanted to try to solve a differential equation using Gaussian Processes and at he same time finding the necessary control signal to pass through a given set of points. I am not 100% confident that I am doing all steps correct here, but it demonstrates how to proceed.

I am starting with a simple differential equation $$\frac{df(t)}{dt} = -a f(t) + u(t)$$ and given some measurements $$f(\tau)$$ for a set of times $$\tau$$ a solution for $$f(t)$$ and $$u(t)$$ is to be found.

I am jumping to the results directly. The figure below shows the solution. The upper subplot contains $$f(t)$$ where the blue line is the outcome based on Gaussian Processes, and the red line is the answer given an ODE-solver with $$u(t)$$ as input.

Ordinary differential equation.

The match is not perfect, and if the constant $$a$$ is increased and not compensated with a shorter length scale for the Gaussian Process the control signal cannot keep up with the variations due to the differential equation.

So, to solve this problem I set up a larger covariance matrix compared to the other problems I wrote about earlier.

$$\begin{array}{r|cccc} & f(t) & f_(t) & \frac{df_(t)}{dt} & u(t) = \frac{df_(t)}{dt} + a f_(t) \ \hline f(t) & D_{11} & D_{12} & D_{13} & D_{14} \ f_(t) & D_{21} & D_{22} & D_{23} & D_{24} \ \frac{df_(t)}{dt} & D_{31} & D_{32} & D_{33} & D_{34} \ u(t) = \frac{df_(t)}{dt} + a f_(t) & D_{41} & D_{42} & D_{43} & D_{44} \end{array} $$

The covariance matrix has earlier been written as:

$$ K = \left( \begin{array}{cc} K_{11} & K_{12} \ K_{21} & K_{22} \end{array} \right) $$

which means that we can identify the parts of the matrix as:

$$ \begin{array}{rcl} K_{11} & = & D_{11} \ K_{12} & = & \left( \begin{array}{ccc} D_{12} & D_{13} & D_{14} \end{array} \right) \ K_{21} & = & \left( \begin{array}{c} D_{21} \ D_{31} \ D_{41} \end{array} \right) \ K_{22} & = & \left( \begin{array}{ccc} D_{22} & D_{23} & D_{24} \ D_{32} & D_{33} & D_{34} \ D_{42} & D_{43} & D_{44} \end{array} \right) \end{array}$$

Each element is a matrix containing:

$$\begin{array}{rl} D_{11} &= \text{cov}\left(f(t_i), f(t_j)\right) \ D_{12} &= \text{cov}\left(f(t_i), f(t^j)\right) \ D{13} &= \text{cov}\left(f(t_i), \frac{df(t^_j)}{dt^j}\right) \ D{14} &= \text{cov}\left(f(t_i), \frac{df(t^_j)}{dt^_j} + a \frac{df(t^_j)}{dt^_j}\right) = \text{cov}\left(f(t_i), \frac{df(t^_j)}{dt^_j}\right) + a,\text{cov}\left(f(t_i), f(t^j)\right) \ D{21} &= \text{cov}\left(f(t^i), f(t_j)\right) \ D{22} &= \text{cov}\left(f(t^_i), f(t^j) \right) \ D{23} &= \text{cov}\left(f(t^_i), \frac{df(t^_j)}{dt^j} \right) \ D{24} &= \text{cov}\left(f(t^_i), \frac{df(t^_j)}{dt^_j} + a \frac{df(t^_j}{dt^_j}\right) = \text{cov}\left(f(t^_i), \frac{df(t^_j)}{dt^_j} \right) + a,\text{cov}\left(f(t^_i), f(t^j)\right) \ D{31} &= \text{cov}\left(\frac{df(t^_i)}{t^i}, f(t_j)\right) \ D{32} &= \text{cov}\left(\frac{df(t^_i)}{t^_i}, f(t^j) \right) \ D{33} &= \text{cov}\left(\frac{df(t^_i)}{t^_i}, \frac{df(t^_j)}{dt^j} \right) \ D{34} &= \text{cov}\left(\frac{df(t^_i)}{t^_i}, \frac{df(t^_j)}{dt^_j} + a f(t^_j)\right) = \text{cov}\left(\frac{df(t^_i)}{t^_i}, \frac{df(t^_j) }{dt^_j} \right) + a,\text{cov}\left(\frac{df(t^_i)}{t^_i}, f(t^j)\right) \ D{41} &= \text{cov}\left(\frac{df(t^_i)}{t^_i} + af(t^_i), f(t_j)\right) = \text{cov}\left(\frac{df(t^_i)}{t^_i}, f(t_j)\right) + a,\text{cov}\left(f(t^i), f(t_j)\right) \ D{42} &= \text{cov}\left(\frac{df(t^_i)}{t^_i} + af(t^_i), f(t^_j) \right) = \text{cov}\left(\frac{df (t^_i)}{t^_i}, f(t^_j)\right) + a, \text{cov}\left(f(t^_i), f(t^j)\right) \ D{43} &= \text{cov}\left(\frac{df(t^_i) }{t^_i} + af(t^_i), \frac{df(t^_j)}{dt^_j} \right) = \text{cov}\left(\frac{df(t^_i)}{t^_i}, \frac{df(t^_j) }{dt^_j}\right) + a,\text{cov}\left(f(t^_i), \frac{df(t^_j)}{dt^j}\right) \ D{44} &= \text{cov}\left(\frac{df(t^_i)}{t^_i} + af(t^_i), \frac{df(t^_j)}{dt^_j} + a f(t^_j)\right) = \text{cov}\left(\frac{df(t^_i)}{t^_i}, \frac{df(t^_j)}{dt^_j} \right) + a,\text{cov}\left(\frac{df(t^_i) }{t^_i}, f(t^_j)\right) + a,\text{cov}\left(f(t^_i), \frac{df(t^_j)}{t^_j}\right) + a^2, \text{cov}\left(f(t^_i),f(t^_j)\right) \end{array}$$

Python code to perform all calculations:

#!/usr/bin/env python2.7
import numpy as np
import matplotlib.pyplot as plt
import scipy.integrate as si

def covfun(x1, x2, ell):
    k = np.exp(-0.5 * ((x1 - x2) / ell) ** 2.0)
    return k

def dx1_covfun(x1, x2, ell):
    ell2 = ell ** 2.0
    k = (x2 - x1) / ell2 * covfun(x1, x2, ell)
    return k

def dx2_covfun(x1, x2, ell):
    ell2 = ell ** 2.0
    k = (x1 - x2) / ell2 * covfun(x1, x2, ell)
    return k

def ddx1x2_covfun(x1, x2, ell):
    ell2 = ell ** 2.0
    ell4 = ell ** 4.0
    k = ((ell2 - (x1 - x2) ** 2.0) / ell4 * covfun(x1, x2, ell))
    return k

# Characteristic length.
l = np.sqrt(0.02)

# Noise variance.
s = 0.01

# Time constant.
a = 0.5

# Measured points.
x = np.array([0.0, 0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0])
y = np.array([0.0, 0.2, 0.5, 0.7, -0.1, 0.4, 0.3, 0.0])

# Grid to predict values for.
xg = np.linspace(0.0, 1.0, 61)

# Matrix to solve for.
# K11 | K12   K13   K14
# ----+----------------
# K21 | K22   K23   K24
# K31 | K32   K33   K34
# K41 | K42   K43   K44

# Initialization of covariance matrix.
K11 = np.zeros((len(x), len(x)))
K12 = np.zeros((len(x), len(xg)))
K13 = np.zeros((len(x), len(xg)))
K14 = np.zeros((len(x), len(xg)))
K21 = np.zeros((len(xg), len(x)))
K31 = np.zeros((len(xg), len(x)))
K41 = np.zeros((len(xg), len(x)))

K22 = np.zeros((len(xg), len(xg)))
K23 = np.zeros((len(xg), len(xg)))
K24 = np.zeros((len(xg), len(xg)))
K32 = np.zeros((len(xg), len(xg)))
K33 = np.zeros((len(xg), len(xg)))
K34 = np.zeros((len(xg), len(xg)))
K42 = np.zeros((len(xg), len(xg)))
K43 = np.zeros((len(xg), len(xg)))
K44 = np.zeros((len(xg), len(xg)))

# Fill covariance matrix with values.
for row in xrange(len(x)):
    for col in xrange(len(x)):
        K11[row, col] = covfun(x[row], x[col], l)

# Do the same for the parts where the grid is involved.
for row in xrange(len(x)):
    for col in xrange(len(xg)):
        K12[row, col] = covfun(x[row], xg[col], l)
        K13[row, col] = dx2_covfun(x[row], xg[col], l)
        K14[row, col] = (dx2_covfun(x[row], xg[col], l) +
                            a * covfun(x[row], xg[col], l))

        K21[col, row] = covfun(xg[col], x[row], l)
        K31[col, row] = dx1_covfun(xg[col], x[row], l)
        K41[col, row] = (dx1_covfun(xg[col], x[row], l) +
                            a * covfun(xg[col], x[row], l))

# And the grid itself.
for row in xrange(len(xg)):
    for col in xrange(len(xg)):
        K22[row, col] = covfun(xg[row], xg[col], l)
        K33[row, col] = ddx1x2_covfun(xg[row], xg[col], l)
        K44[row, col] = (a ** 2.0 * covfun(xg[row], xg[col], l) +
                         a * dx1_covfun(xg[row], xg[col], l) +
                         a * dx2_covfun(xg[row], xg[col], l) +
                         ddx1x2_covfun(xg[row], xg[col], l))

        K23[row, col] = dx2_covfun(xg[row], xg[col], l)
        K32[col, row] = dx1_covfun(xg[col], xg[row], l)

        K24[row, col] = (dx2_covfun(xg[row], xg[col], l) +
                         a * covfun(xg[row], xg[col], l))
        K42[col, row] = (dx1_covfun(xg[col], xg[row], l) +
                         a * covfun(xg[col], xg[row], l))

        K34[row, col] = (ddx1x2_covfun(xg[row], xg[col], l) +
                         a * dx1_covfun(xg[row], xg[col], l))
        K43[col, row] = (ddx1x2_covfun(xg[col], xg[row], l) +
                         a * dx2_covfun(xg[col], xg[row], l))

D12 = np.hstack((K12, K13, K14))
D21 = np.vstack((K21, K31, K41))
D22 = np.vstack((np.hstack((K22, K23, K24)),
                 np.hstack((K32, K33, K34)),
                 np.hstack((K42, K43, K44))))

# Expected values for grid.
Kinv = np.linalg.pinv(K11 + s * np.eye(len(x)))

# Estimates
xe = np.dot(np.dot(D21, Kinv), y)
varxe = np.diag(D22 - np.dot(np.dot(D21, Kinv), D12))

f = xe[0:len(xg)]
df = xe[len(xg):2 * len(xg)]
u = xe[2 * len(xg):]

varf = varxe[0:len(xg)]
vardf = varxe[len(xg):2 * len(xg)]
varu = varxe[2 * len(xg):]

# Solve differential equation based on u.
fsim = si.odeint(func=lambda x, t: -a * t + np.interp(t, xg, u),
                 y0=0.0, t=xg)

plt.subplot(211)
plt.plot(xg, f, 'b', linewidth=2)
plt.hold(True)
plt.plot(xg, fsim, 'r', linewidth=1)
plt.plot(xg, f + 1.96 * varf, 'b--')
plt.plot(xg, f - 1.96 * varf, 'b--')
plt.plot(x, y, 'ok')

plt.axis([0.0, 1.0, -0.5, 1.5])
plt.ylabel('$f(t)$')

plt.subplot(212)
plt.plot(xg, u, 'b', linewidth=2)
plt.axis([0.0, 1.0, -12.0, 12.0])
plt.xlabel('$t$')
plt.ylabel('$u(t)$')

plt.show()

Gaussian Processes and Differential Equations - 2012-07-17