Stefan's Site

Gaussian Processes and Differential Equations

So, this is inevitable... Of course, I wanted to try to solve a differential equation using Gaussian Processes and at he same time finding the necessary control signal to pass through a given set of points. I am not 100% confident that I am doing all steps correct here, but it demonstrates how to proceed.

I am starting with a simple differential equation \(\frac{df(t)}{dt} = -a f(t) + u(t)\) and given some measurements \(f(\tau)\) for a set of times \(\tau\) a solution for \(f(t)\) and \(u(t)\) is to be found.

I am jumping to the results directly. The figure below shows the solution. The upper subplot contains \(f(t)\) where the blue line is the outcome based on Gaussian Processes, and the red line is the answer given an ODE-solver with \(u(t)\) as input.

Ordinary differential equation.

The match is not perfect, and if the constant \(a\) is increased and not compensated with a shorter length scale for the Gaussian Process the control signal cannot keep up with the variations due to the differential equation.

So, to solve this problem I set up a larger covariance matrix compared to the other problems I wrote about earlier.

\[\begin{array}{r|cccc} & f(t) & f_*(t) & \frac{df_*(t)}{dt} & u(t) = \frac{df_*(t)}{dt} + a f_*(t) \\ \hline f(t) & D_{11} & D_{12} & D_{13} & D_{14} \\ f_*(t) & D_{21} & D_{22} & D_{23} & D_{24} \\ \frac{df_*(t)}{dt} & D_{31} & D_{32} & D_{33} & D_{34} \\ u(t) = \frac{df_*(t)}{dt} + a f_*(t) & D_{41} & D_{42} & D_{43} & D_{44} \end{array} \]

The covariance matrix has earlier been written as:

\[ K = \left( \begin{array}{cc} K_{11} & K_{12} \\ K_{21} & K_{22} \end{array} \right) \]

which means that we can identify the parts of the matrix as:

\[ \begin{array}{rcl} K_{11} & = & D_{11} \\ K_{12} & = & \left( \begin{array}{ccc} D_{12} & D_{13} & D_{14} \end{array} \right) \\ K_{21} & = & \left( \begin{array}{c} D_{21} \\ D_{31} \\ D_{41} \end{array} \right) \\ K_{22} & = & \left( \begin{array}{ccc} D_{22} & D_{23} & D_{24} \\ D_{32} & D_{33} & D_{34} \\ D_{42} & D_{43} & D_{44} \end{array} \right) \end{array}\]

Each element is a matrix containing:

\[\begin{align} D_{11} &= \text{cov}\left(f(t_i), f(t_j)\right) \\ D_{12} &= \text{cov}\left(f(t_i), f(t^*_j)\right) \\ D_{13} &= \text{cov}\left(f(t_i), \frac{df(t^*_j)}{dt^*_j}\right) \\ D_{14} &= \text{cov}\left(f(t_i), \frac{df(t^*_j)}{dt^*_j} + a \frac{df(t^*_j)}{dt^*_j}\right) = \text{cov}\left(f(t_i), \frac{df(t^*_j)}{dt^*_j}\right) + a\,\text{cov}\left(f(t_i), f(t^*_j)\right) \\ D_{21} &= \text{cov}\left(f(t^*_i), f(t_j)\right) \\ D_{22} &= \text{cov}\left(f(t^*_i), f(t^*_j) \right) \\ D_{23} &= \text{cov}\left(f(t^*_i), \frac{df(t^*_j)}{dt^*_j} \right) \\ D_{24} &= \text{cov}\left(f(t^*_i), \frac{df(t^*_j)}{dt^*_j} + a \frac{df(t^*_j}{dt^*_j}\right) = \text{cov}\left(f(t^*_i), \frac{df(t^*_j)}{dt^*_j} \right) + a\,\text{cov}\left(f(t^*_i), f(t^*_j)\right) \\ D_{31} &= \text{cov}\left(\frac{df(t^*_i)}{t^*_i}, f(t_j)\right) \\ D_{32} &= \text{cov}\left(\frac{df(t^*_i)}{t^*_i}, f(t^*_j) \right) \\ D_{33} &= \text{cov}\left(\frac{df(t^*_i)}{t^*_i}, \frac{df(t^*_j)}{dt^*_j} \right) \\ D_{34} &= \text{cov}\left(\frac{df(t^*_i)}{t^*_i}, \frac{df(t^*_j)}{dt^*_j} + a f(t^*_j)\right) = \text{cov}\left(\frac{df(t^*_i)}{t^*_i}, \frac{df(t^*_j) }{dt^*_j} \right) + a\,\text{cov}\left(\frac{df(t^*_i)}{t^*_i}, f(t^*_j)\right) \\ D_{41} &= \text{cov}\left(\frac{df(t^*_i)}{t^*_i} + af(t^*_i), f(t_j)\right) = \text{cov}\left(\frac{df(t^*_i)}{t^*_i}, f(t_j)\right) + a\,\text{cov}\left(f(t^*_i), f(t_j)\right) \\ D_{42} &= \text{cov}\left(\frac{df(t^*_i)}{t^*_i} + af(t^*_i), f(t^*_j) \right) = \text{cov}\left(\frac{df (t^*_i)}{t^*_i}, f(t^*_j)\right) + a\, \text{cov}\left(f(t^*_i), f(t^*_j)\right) \\ D_{43} &= \text{cov}\left(\frac{df(t^*_i) }{t^*_i} + af(t^*_i), \frac{df(t^*_j)}{dt^*_j} \right) = \text{cov}\left(\frac{df(t^*_i)}{t^*_i}, \frac{df(t^*_j) }{dt^*_j}\right) + a\,\text{cov}\left(f(t^*_i), \frac{df(t^*_j)}{dt^*_j}\right) \\ D_{44} &= \text{cov}\left(\frac{df(t^*_i)}{t^*_i} + af(t^*_i), \frac{df(t^*_j)}{dt^*_j} + a f(t^*_j)\right) = \text{cov}\left(\frac{df(t^*_i)}{t^*_i}, \frac{df(t^*_j)}{dt^*_j} \right) + a\,\text{cov}\left(\frac{df(t^*_i) }{t^*_i}, f(t^*_j)\right) + a\,\text{cov}\left(f(t^*_i), \frac{df(t^*_j)}{t^*_j}\right) + a^2\, \text{cov}\left(f(t^*_i),f(t^*_j)\right) \end{align}\]

Python code to perform all calculations:

#!/usr/bin/env python2.7

import numpy as np
import matplotlib.pyplot as plt
import scipy.integrate as si

def covfun(x1, x2, ell):
    k = np.exp(-0.5 * ((x1 - x2) / ell) ** 2.0)
    return k

def dx1_covfun(x1, x2, ell):
    ell2 = ell ** 2.0
    k = (x2 - x1) / ell2 * covfun(x1, x2, ell)
    return k

def dx2_covfun(x1, x2, ell):
    ell2 = ell ** 2.0
    k = (x1 - x2) / ell2 * covfun(x1, x2, ell)
    return k

def ddx1x2_covfun(x1, x2, ell):
    ell2 = ell ** 2.0
    ell4 = ell ** 4.0
    k = ((ell2 - (x1 - x2) ** 2.0) / ell4 * covfun(x1, x2, ell))
    return k

# Characteristic length.

l = np.sqrt(0.02)

# Noise variance.

s = 0.01

# Time constant.

a = 0.5

# Measured points.

x = np.array([0.0, 0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0])
y = np.array([0.0, 0.2, 0.5, 0.7, -0.1, 0.4, 0.3, 0.0])

# Grid to predict values for.

xg = np.linspace(0.0, 1.0, 61)

# Matrix to solve for.

# K11 | K12   K13   K14

# ----+----------------

# K21 | K22   K23   K24

# K31 | K32   K33   K34

# K41 | K42   K43   K44


# Initialization of covariance matrix.

K11 = np.zeros((len(x), len(x)))
K12 = np.zeros((len(x), len(xg)))
K13 = np.zeros((len(x), len(xg)))
K14 = np.zeros((len(x), len(xg)))
K21 = np.zeros((len(xg), len(x)))
K31 = np.zeros((len(xg), len(x)))
K41 = np.zeros((len(xg), len(x)))

K22 = np.zeros((len(xg), len(xg)))
K23 = np.zeros((len(xg), len(xg)))
K24 = np.zeros((len(xg), len(xg)))
K32 = np.zeros((len(xg), len(xg)))
K33 = np.zeros((len(xg), len(xg)))
K34 = np.zeros((len(xg), len(xg)))
K42 = np.zeros((len(xg), len(xg)))
K43 = np.zeros((len(xg), len(xg)))
K44 = np.zeros((len(xg), len(xg)))

# Fill covariance matrix with values.

for row in xrange(len(x)):
    for col in xrange(len(x)):
        K11[row, col] = covfun(x[row], x[col], l)

# Do the same for the parts where the grid is involved.

for row in xrange(len(x)):
    for col in xrange(len(xg)):
        K12[row, col] = covfun(x[row], xg[col], l)
        K13[row, col] = dx2_covfun(x[row], xg[col], l)
        K14[row, col] = (dx2_covfun(x[row], xg[col], l) +
                            a * covfun(x[row], xg[col], l))

        K21[col, row] = covfun(xg[col], x[row], l)
        K31[col, row] = dx1_covfun(xg[col], x[row], l)
        K41[col, row] = (dx1_covfun(xg[col], x[row], l) +
                            a * covfun(xg[col], x[row], l))

# And the grid itself.

for row in xrange(len(xg)):
    for col in xrange(len(xg)):
        K22[row, col] = covfun(xg[row], xg[col], l)
        K33[row, col] = ddx1x2_covfun(xg[row], xg[col], l)
        K44[row, col] = (a ** 2.0 * covfun(xg[row], xg[col], l) +
                         a * dx1_covfun(xg[row], xg[col], l) +
                         a * dx2_covfun(xg[row], xg[col], l) +
                         ddx1x2_covfun(xg[row], xg[col], l))

        K23[row, col] = dx2_covfun(xg[row], xg[col], l)
        K32[col, row] = dx1_covfun(xg[col], xg[row], l)

        K24[row, col] = (dx2_covfun(xg[row], xg[col], l) +
                         a * covfun(xg[row], xg[col], l))
        K42[col, row] = (dx1_covfun(xg[col], xg[row], l) +
                         a * covfun(xg[col], xg[row], l))

        K34[row, col] = (ddx1x2_covfun(xg[row], xg[col], l) +
                         a * dx1_covfun(xg[row], xg[col], l))
        K43[col, row] = (ddx1x2_covfun(xg[col], xg[row], l) +
                         a * dx2_covfun(xg[col], xg[row], l))

D12 = np.hstack((K12, K13, K14))
D21 = np.vstack((K21, K31, K41))
D22 = np.vstack((np.hstack((K22, K23, K24)),
                 np.hstack((K32, K33, K34)),
                 np.hstack((K42, K43, K44))))

# Expected values for grid.

Kinv = np.linalg.pinv(K11 + s * np.eye(len(x)))

# Estimates

xe = np.dot(np.dot(D21, Kinv), y)
varxe = np.diag(D22 - np.dot(np.dot(D21, Kinv), D12))

f = xe[0:len(xg)]
df = xe[len(xg):2 * len(xg)]
u = xe[2 * len(xg):]

varf = varxe[0:len(xg)]
vardf = varxe[len(xg):2 * len(xg)]
varu = varxe[2 * len(xg):]

# Solve differential equation based on u.

fsim = si.odeint(func=lambda x, t: -a * t + np.interp(t, xg, u),
                 y0=0.0, t=xg)

plt.subplot(211)
plt.plot(xg, f, 'b', linewidth=2)
plt.hold(True)
plt.plot(xg, fsim, 'r', linewidth=1)
plt.plot(xg, f + 1.96 * varf, 'b--')
plt.plot(xg, f - 1.96 * varf, 'b--')
plt.plot(x, y, 'ok')

plt.axis([0.0, 1.0, -0.5, 1.5])
plt.ylabel('$f(t)$')

plt.subplot(212)
plt.plot(xg, u, 'b', linewidth=2)
plt.axis([0.0, 1.0, -12.0, 12.0])
plt.xlabel('$t$')
plt.ylabel('$u(t)$')

plt.show()