MATHEMATICAL AND NUMERICAL METHODS

Abstract. These notes contain all examinable theoretical material for the

Methods course in 2021–2022, although I shall add further examples during

the year.

Contents

1. Introduction 3

1.1. Reading List 3

2. The Geometric Brownian Motion Universe 6

2.1. European Puts and Calls 10

2.2. Digital Options 12

2.3. European Puts and Calls 13

3. Brownian Motion 16

3.1. Simple Random Walk 16

3.2. Discrete Brownian Motion 17

3.3. Basic Properties of Brownian Motion 17

3.4. Martingales 18

3.5. Brownian Motion and Martingales 18

3.6. The Black–Scholes Equation 19

3.7. Itˆo Calculus 21

3.8. Itˆo rules and SDEs 26

3.9. Multivariate Geometric Brownian Motion 27

3.10. Asian Options 31

3.11. The Ornstein–Uhlenbeck Process 33

3.12. Feynman–Kac 35

3.13. Feynman–Kac and Black–Scholes I 36

3.14. Feynman–Kac and Black–Scholes II 37

4. The Binomial Model Universe 38

4.1. The Binomial Model and Delta Hedging 43

4.2. ∆-Hedging for GBM 44

5. The Partial Differential Equation Approach 46

5.1. The Diffusion Equation 46

5.2. Finite Difference Methods for the Diffusion Equation 48

5.3. The Fourier Transform and the von Neumann Stability Test 55

5.4. Stability and the Fourier Transform 57

5.5. Option Pricing via the Fourier transform 59

5.6. Fourier Transform Conventions 61

6. Mathematical Background Material 63

1

2 BRAD BAXTER

6.1. Probability Theory 63

6.2. Differential Equations 65

6.3. Recurrence Relations 66

6.4. Mortgages – a once exotic instrument 69

6.5. Pricing Mortgages via lack of arbitrage 70

7. Numerical Linear Algebra 71

7.1. Orthogonal Matrices 71

MATHEMATICAL AND NUMERICAL METHODS 3

1. Introduction

You can access these notes, and other material, via my office machine:

http://econ109.econ.bbk.ac.uk/brad/Methods/

My main lecture notes are available here:

http://econ109.econ.bbk.ac.uk/brad/Methods/new_methods_notes.pdf }

These notes are fairly stable, having evolved while teaching three different MSc

programmes: MSc Mathematical Finance at Imperial College, London, MSc Financial Engineering here, and now MSc Mathematical Finance. I do still add new

examples and make minor changes, so please check you have the latest version.

For those students who are only taking the Autumn Term of this course (also

known as Continuous Time Stochastic Processes), Sections 2 and 3 are the

key sections, although I shall also include some material from Section 4 (specifically

Delta Hedging for the Binomial Model, to motive its continuous time analogue).

Some students will also be taking my Matlab course, but my Matlab notes are

available to all:

http://econ109.econ.bbk.ac.uk/brad/Methods/matlab_intro_notes.pdf

My friend and colleague Raymond Brummelhuis provided very lucid notes for

an earlier version of this course:

http://econ109.econ.bbk.ac.uk/brad/Methods/old_methods_notes_RB.pdf

Raymond’s notes are still highly useful, but please do be remember that these

notes define the current syllabus.

Past exams can be downloaded from

http://econ109.econ.bbk.ac.uk/brad/FinEngExams/

Many students will find my Numerical Analysis notes helpful too:

http://econ109.econ.bbk.ac.uk/brad/Methods/nabook.pdf

I wrote these notes for an undergraduate course in Numerical Analysis when

lecturing at Imperial College, London, from 1995–2001. However, they have often

been found useful by MSc students who need to improve their general understanding

of theoretical Numerical Analysis. The first section of the notes is on matrix algebra

and contain many examples and exercises, together with solutions.

Finally, there is lots of interesting material, including extensive notes for several

related courses (e.g. Analysis) available on my office Linux server, so please do

explore:

http://econ109.econ.bbk.ac.uk/brad/

Despite my providing you with lots of online notes, in my experience,

students will benefit enormously from the old-fashioned method of taking

notes as I lecture.

1.1. Reading List. There are many books on mathematical finance, but very few

good ones. My strongest recommendations are for the books by Higham and Kwok.

However, the following books are all useful, but these notes are mostly either selfcontained or refer to my own online notes.

(i) M. Baxter and A. Rennie, Financial Calculus, Cambridge University Press.

This gives a fairly informal description of the mathematics of pricing, concentrating on martingales. It’s not a source of information for efficient

numerical methods.

4 BRAD BAXTER

(ii) A. Etheridge, A Course in Financial Calculus, Cambridge University Press.

This does not focus on the algorithmic side but is very lucid, although it

is probably better read once you are familiar with the contents of the first

term.

(iii) D. Higham, An Introduction to Financial Option Valuation, Cambridge

University Press. This book provides many excellent Matlab examples,

although its mathematical level is undergraduate.

(iv) J. Hull, Options, Futures and Other Derivatives, 6th edition. [Earlier editions are probably equally suitable for much of the course.] Fairly clear,

with lots of background information on finance. The mathematical treatment is lower than the level of much of our course (and this is not a

mathematically rigorous book), but it’s still the market leader in many

ways.

(v) D. Kennedy, Stochastic Financial Models, Chapman and Hall. This is

an excellent mathematical treatment, but probably best left until after

completing the first term of Methods.

(vi) J. Michael Steele, Stochastic Calculus and Financial Applications, Springer.

This is an excellent book, but is one to read near the end of this term,

once you are more comfortable with fundamentals.

(vii) P. Wilmott, S. Howison and J. Dewynne, The Mathematics of Financial

Derivatives, Cambridge University Press. This book is very useful for

its information on partial differential equations. If your first degree was

in engineering, mathematics or physics, then you probably spent many

happy hours learning about the diffusion equation. This book is very

much mathematical finance from the perspective of a traditional applied

mathematician. It places much less emphasis on probability theory than

most books on finance.

(viii) P. Wilmott, Paul Wilmott introduces Quantitative Finance, 2nd edition,

John Wiley. More chatty than his previous book. The author’s ego grew

enormously between the appearance of these texts, but there’s some good

material here.

(ix) Y.-K. Kwok, Mathematical Models of Financial Derivatives, Springer. Rather

dry, but very detailed treatment of finite difference methods. If you need

a single book for general reference work, then this is probably it.

There are lots of books suitable for mathematical revision. The Schaum series publishes many good inexpensive textbooks providing worked examples. The

inexpensive paperback Calculus, by K. G. Binmore (Cambridge University Press)

will also be useful to students wanting an introduction to, say, multiple integrals, as

will Mathematical Methods for Science Students, by Geoff Stephenson. At a slightly

higher level, All you wanted to know about Mathematics but were afraid to ask, by

L. Lyons (Cambridge University Press, 2 vols), is useful and informal.

The ubiquitous Numerical Recipes in C++, by S. Teukolsky et al, is extremely

useful. Its coverage of numerical methods is generally reliable and it’s available

online at www.nr.com. A good hard book on partial differential equations is that

of A. Iserles (Cambridge University Press).

At the time of writing, finance is going through a turbulent period, in which

politicians profess their longstanding doubts that the subject was well-founded –

surprisingly, many omitted to voice such doubts earlier! It is good to know that

MATHEMATICAL AND NUMERICAL METHODS 5

we have been here before. The following books are included for general cultural

interest.

(i) M. Balen, A Very English Deceit: The Secret History of the South Sea

Bubble and the First Great Financial Scandal.

(ii) C. Eagleton and J. William (eds), Money: A History.

(iii) C. P. Kindleberger, R. Aliber and R. Solow, Manias, Panics, and Crashes:

A History of Financial Crises, Wiley. This is still a classic.

(iv) J. Lanchester, How to Speak Money, Faber. This is an excellent introduction to finance and economics for all readers. Lanchester is a journalist

and author, as well as being a gifted expositor.

(v) N. N. Taleb, The Black Swan. In my view, this is greatly over-rated, but

you should still read it.

No text is perfect: please report any slips to b.baxter@bbk.ac.uk.

6 BRAD BAXTER

2. The Geometric Brownian Motion Universe

We shall begin with a brisk introduction to the main topics, filling in the details

later. The real economy is vastly complex, so mathematical finance begins with

vast oversimplification.

Let r be the risk-free interest rate, which we shall assume constant. This is

really the interest paid by the state when borrowing money via selling bonds, and

it is nominally risk-free in any state that issues its own currency, although the

real value of that currency can greatly decrease. We assume that everyone in our

mathematical economy can borrow and lend at this rate, so that such debts (or

investments, if lent) satisfy Bt = B0 exp(rt). In reality, banks and companies

borrow and lend at a higher rate r + δ, where δ increases with the perceived risk of

the lender, but this complication is ignored here.

Notation: In most (but not all) areas of mathematics, a function B depending on time t would be denoted B(t), but mathematical finance often uses the

alternative notation Bt which is very common in probability theory, statistics and

economics. I shall be consistent in using S(t) to denote the share price in Section

2, but we shall move to St in Section 3.

We shall assume that every risky asset (such as a share) is described by a random

process called geometric Brownian motion (GBM):

(2.1) S(t) = S(0)e

βt+σWt

, t > 0,

where Wt denotes Brownian motion, β ∈ R and σ is a non-negative parameter called

the volatility of the asset. You can think of Brownian motion as an important

generalization of random walk, but we shall postpone its detailed definition and

properties until Section 3. Fortunately all we need for now is the fundamental

property that WT is a normal (or Gaussian) random variable with mean zero and

variance T, that is,

(2.2) Wt ∼ N(0, t), for all t > 0.

As we shall see later, option pricing requires us to use β = r − σ

2/2, that is,

(2.3) S(t) = S(0)e

(r−σ

2/2)t+σWt

, t > 0,

and this is usually called risk neutral GBM. The reason for the disappearance of the

parameter β when pricing options is rather deep and extremely important, but will

be explained later. All you need to know at present is that (2.1) is the mathematical

model for share prices in the real world, but the risk neutral variant (2.3) is used

when pricing options, i.e. contracts whose value depends on the asset price.

Thus, when pricing options, to generate sample prices S(T) at some future time

T given the initial price S(0), we use

(2.4) S(T) = S(0) exp

(r − σ

2

/2)T + σ

√

T ZT

, where ZT ∼ N(0, 1),

because WT ∼ N(0, T), so that we can write WT = T

1/2ZT .

Example 2.1. Generating sample prices at a fixed time T using (2.4) is particularly

easy in Matlab and Octave:

S = S0*exp((r-sigma^2/2)*T + sigma*sqrt(T)*randn(m,1)

will construct a column vector of m sample prices once you’ve defined S0, r, σ and

T. To calculate the sample average price, we type sum(S)/m

MATHEMATICAL AND NUMERICAL METHODS 7

To analytically calculate ES(T) we need the following simple, yet crucial, lemma.

Lemma 2.1. If W ∼ N(0, 1), then E exp(λW) = exp(λ

2/2).

Proof. We have

Ee

λW =

Z ∞

−∞

e

λt(2π)

−1/2

e

−t

2/2

dt = (2π)

−1/2

Z ∞

−∞

e

− 1

2

(t

2−2λt)

dt.

The trick now is to complete the square in the exponent, that is,

t

2 − 2λt = (t − λ)

2 − λ

2

.

Thus

Ee

λW = (2π)

−1/2

Z ∞

−∞

exp

−

1

2

([t − λ]
2 − λ

2

)

dt = e

λ

2/2

.

This is also described in detail in Example 6.3.

Lemma 2.2. For every σ ≥ 0, we have the expected growth

(2.5) ES(t) = S(0)e

rt, t ≥ 0.

Proof. This is an easy consequence of Lemma 2.1.

The option pricing risk-neutral geometric Brownian motion universe might therefore seem rather strange, because every asset has the same expected growth e

rt as

the risk-free interest rate. Thus our universe of all possible assets in a risk-neutral

world is specified by one parameter only: the volatility σ. Please do remember that

this is not the asset price in the market, but simply a mathematical device required

for pricing options based on the asset.

A financial derivative is any function f(S, t). We shall concentrate on the following particular class of derivatives.

Definition 2.1. A European option is any function f ≡ f(S, t) that satisfies the

conditional expectation equation

(2.6) f(S(t), t) = e

−rhE

f(S(t + h), t + h)|S(t)

, for any h > 0.

We shall often simply write this as

f(S(t), t) = e

−rhEf(S(t + h), t + h)

but you should take care to remember that this is an expected future value given

the asset’s current value S(t). We see that (2.6) describes a contract f(S, t) whose

current value is the discounted value of its expected future value in the risk-neutral

GBM universe.

We can learn a great deal by studying the mathematical consequences of (2.6)

and (2.3).

Example 2.2. A plain vanilla European put option is simply an insurance contract

that allows us to sell one unit of the asset, for exercise price K, at time T in the

future. If the asset’s price S(T) is less than K at this expiry time, then the option is

worth K −S(T), otherwise it’s worthless. Such contracts protect us if we’re worried

that the asset’s price might drop. The pricing problem here to calculate the value

of the contract at time zero given its value at expiry, namely

(2.7) fP (S(T), T) = (K − S(T))+ ,

8 BRAD BAXTER

where (z)+ := max{z, 0}.

Typically, we know the value of the option f(S(T), T) for all values of the asset

S(T) at some future time T. Our problem is to compute its value at some earlier

time, because we’re buying or selling this option.

Example 2.3. A plain vanilla European call option gives us the right to buy one

unit of the asset at the exercise price K at time T. If the asset’s price S(T) exceeds

K at this expiry time, then the option is worth S(T) − K, otherwise it’s worthless,

implying the expiry value

(2.8) fC (S(T), T) = (S(T) − K)+ ,

using the same notation as Example 2.2. Such contracts protect us if we’re worried

that the asset’s price might rise.

How do we compute f(S(0), 0)? The difficult part is computing the expected

future value Ef(S(T), T). This can be done analytically for a tiny number of

options, including the European Put and Call (see Theorem 2.5), but usually we

must resort to a numerical calculation. This leads us to our first algorithm: Monte

Carlo simulation. Here we choose a large integer N and generate N pseudo-random

numbers Z1, Z2, . . . , ZN that have the normalized Gaussian distribution; in Matlab,

we simply write Z = randn(N,1). Using (2.3), these generate the future asset prices

(2.9) Sk = S(0) exp

(r −

σ

2

2

)T + σ

√

T Zk

, k = 1, . . . , N.

We then approximate the future expected value by an average, that is, we take

(2.10) f(S(0), 0) ≈

e

−rT

N

X

N

k=1

f(Sk, T).

Monte Carlo simulation has the great advantage that it is extremely simple to

program. Its disadvantage is that the error is usually a multiple of 1/

√

N, so that

very large N is needed for high accuracy (each decimal place of accuracy requires

about a hundred times more work). We note that (2.10) will compute the value of

any European option that is completely defined by a known final value f(S(T), T).

We shall now use Monte Carlo to approximately evaluate the European Call and

Put contracts. In fact, Put-Call parity, described below in Theorem 2.3, implies

that we only need a program to calculate one of these, because they are related by

the simple formula

(2.11) fC (S(0), 0) − fP (S(0), 0) = S(0) − Ke−rT

.

Here’s the Matlab program for the European Put.

%

% These are the parameters chosen in Example 11.6 of

% OPTIONS, FUTURES AND OTHER DERIVATIVES,

% by John C. Hull (Prentice Hall, 4th edn, 2000)

%

%% initial stock price

S0 = 42;

% unit of time = year

% 250 working days per year

MATHEMATICAL AND NUMERICAL METHODS 9

% continuous compounding risk-free rate

r = 0.1;

% exercise price

K = 40;

% time to expiration in years

T = 0.5;

% volatility of 20 per cent annually

sigma = 0.2;

% generate asset prices at expiry

Z = randn(N,1);

ST = S0*exp( (r-(sigma^2)/2)*T + sigma*sqrt(T)*Z );

% calculate put contract values at expiry

fput = max(K – ST,0.0);

% average put values at expiry and discount to present

mc_put = exp(-r*T)*sum(fput)/N

% calculate analytic value of put contract

wK = (log(K/S0) – (r – (sigma^2)/2)*T)/(sigma*sqrt(T));

a_put = K*exp(-r*T)*Phi(wK) – S0*Phi(wK – sigma*sqrt(T))

The function Phi denotes the cumulative distribution function for the normalized

Gaussian distribution, that is,

(2.12) Φ(x) = P(Z ≤ x) = Z x

−∞

(2π)

−1/2

e

−s

2/2

ds, for x ∈ R,

where Z ∼ N(0, 1).

Unfortunately, Matlab only provides the very similar error function, defined by

erf(y) = 2

√

π

Z y

0

exp(−s

2

) ds, y ∈ R.

It’s not hard to prove that

Φ(t) = 1

2

1 + erf(t/√

2)

, t ∈ R.

We can add this to Matlab using the following function.

function Y = Phi(t)

Y = 0.5*(1.0 + erf(t/sqrt(2)));

We have only revealed the tip of a massive iceberg in this brief introduction.

Firstly, the Black–Scholes model, where asset prices evolve according to (2.3), is

rather poor: reality is far messier. Further, there are many types of option which

are path-dependent: the value of the option at expiry depends not only on the final

price S(T), but on its previous values {S(t) : 0 ≤ t ≤ T}. In particular, there are

American options, where the contract can be exercised at any time before its expiry.

All of these points will be addressed in our course, but you should find that Hull’s

book provides excellent background reading (although his mathematical treatment

is often sketchy). Higham provides a clear Matlab-based exposition.

Although the future expected value usually requires numerical computation,

there are some simple cases that are analytically tractable. These are particularly

important because they often arise in examinations!

10 BRAD BAXTER

2.1. European Puts and Calls. It’s not too hard to calculate the values of these

options analytically. Further, the next theorem gives an important relation between

the prices of call and put options.

Theorem 2.3 (Put-Call parity). European Put and Call options, each with exercise

price K and expiry time T, satisfy

(2.13) fC (S, t) − fP (S, t) = S − Ke−rτ

, for S ∈ R and 0 ≤ t ≤ T,

where τ = T − t, the time-to-expiry.

Proof. The trick is the observation that

y = y+ − (−y)+,

for any y ∈ R. Thus

S(T) − K = (S(T) − K)+ − (K − S(T))+

= fC (S(T), T) − fP (S(T), T),

which implies

e

−rτE (S(T) − K|S(t) = S) = fC (S, t) − fP (S, t).

Now

E (S(T)|S(t) = S) = (2π)

−1/2

Z ∞

−∞

Se(r−σ

2/2)τ+σ

√

τwe

−w

2/2

dw

= Se(r−σ

2/2)τ

(2π)

−1/2

Z ∞

−∞

e

− 1

2 (w

2−2σ

√

τw) dw

= Serτ

,

and some simple algebraic manipulation completes the proof.

This is a useful check on the Monte Carlo approximations of the options’ values.

To derive their analytic values, we shall need the cumulative distribution function

(2.14) Φ(y) = (2π)

−1/2

Z y

−∞

e

−z

2/2

dz, y ∈ R,

for the Gaussian probability density, that is, P(Z ≤ y) = Φ(y) and P(a ≤ Z ≤ b) =

Φ(b)−Φ(a), for any normalized Gaussian random variable Z. Further, we have the

following relation which will be of use in subsequent formulae.

Lemma 2.4. We have 1 − Φ(a) = Φ(−a), for any a ∈ R.

Proof. Observe that

1 − Φ(a) = Z ∞

a

(2π)

−1/2

e

−s

2/2

ds

=

Z −a

−∞

(2π)

−1/2

e

−u

2/2

du

= Φ(−a),

where we have made the substitution u = −s.

MATHEMATICAL AND NUMERICAL METHODS 11

Theorem 2.5. A European Put option satisfies

(2.15) fP (S, t) = Ke−rτΦ(w(K)) − SΦ(w(K) − σ

√

τ ), for S ∈ R,

where τ = T − t, i.e. the time-to-expiry, and w(K) is defined by the equation

K = Se(r−σ

2/2)τ+σ

√

τw(K)

,

that is

(2.16) w(K) = log(K/S) − (r − σ

2/2)τ

σ

√

τ

.

Proof. We have

E (fP (S(T), T)|S(t) = S) = (2π)

−1/2

Z ∞

−∞

K − Se(r−σ

2/2)τ+σ

√

τw

+

e

−w

2/2

dw.

Now the function

w 7→ K − S exp((r − σ

2

/2)τ + σ

√

τw)

is strictly decreasing, so that

K − Se(r−σ

2/2)τ+σ

√

τw ≥ 0

if and only if w ≤ w(K), where w(K) is given by (2.16). Hence

E (fP (S(T), T)|S(t) = S) = (2π)

−1/2

Z w(K)

−∞

K − Se(r−σ

2/2)τ+σ

√

τw

e

−w

2/2

dw

= KΦ(w(K)) − Se(r−σ

2/2)τ

(2π)

−1/2

Z w(K)

−∞

e

− 1

2 (w

2−2σ

√

τw) dw

= KΦ(w(K)) − SerτΦ(w(K) − σ

√

τ ).

Thus

fP (S, t) = e

−rτE (fP (S(T), T)|S(t) = S)

= Ke−rτΦ(w(K)) − SΦ(w(K) − σ

√

τ ).

There is an almost standard notation for Theorem 2.5, which is contained in the

following corollary.

Corollary 2.6. A European Put option satisfies

(2.17) fP (S, t) = Ke−rτΦ(−d−) − SΦ(−d+), for S ∈ R,

where τ = T − t, i.e. the time-to-expiry, and

(2.18) d± =

log(S/K) + (r ± σ

2/2)τ

σ

√

τ

.

Proof. This is simply rewriting Theorem 2.5 in terms of (2.18).

We can now calculate the price of a European call using Corollary 2.6 and the

Put-Call parity Theorem 2.3.

12 BRAD BAXTER

Corollary 2.7. A European Call option satisfies

(2.19) fC (S, t) = SΦ(d+) − Ke−rτΦ(d−), for S ∈ R,

where τ = T − t, i.e. the time-to-expiry, and d± is given by (2.18).

Proof. Theorem 2.3 implies that

fC (S, t) = fP (S, t) + S − Ke−rτ

= Ke−rτΦ(−d−) − SΦ(−d+) + S − Ke−rτ

= S (1 − Φ(−d+)) − Ke−rτ (1 − Φ(−d−))

= SΦ(d+) − Ke−rτΦ(d−),

using Lemma 2.4.

Exercise 2.1. Modify the proof of Theorem 2.5 to derive the analytic price of a

European Call option. Check that your price agrees Corollary 2.7.

2.2. Digital Options. A digital option is simply an option that only takes the

values 0 and 1, that is, it is the indicator function for some event. Recall that, for

any indicator function IA, we have

EIA = P(A).

Our first example is the digital call option with exercise price K and expiry time

T is defined by

(2.20) fDC (S(T), T) = (

1 if S(T) ≥ K,

0 otherwise.

Theorem 2.8. The digital call option fDC satisfies

(2.21) fDC (S, t) = e

−rτΦ(d−),

where τ = T − t and d− is defined by (2.18).

Proof. Its price at any earlier time t ∈ [0, t) is therefore given by

fDC (S, t) = e

−rτE (fDC (S(T), T)|S(t) = S)

= e

−rτEfDC (Se(r−σ

2/2)τ+στ

1/2Z

(2.22) ,

(2.23)

where Z ∼ N(0, 1).

Now

Se(r−σ

2/2)τ+στ

1/2Z ≥ K

if and only if

log S + (r − σ

2

/2)τ + στ 1/2Z ≥ log K,

because the logarithm is an increasing function. Rearranging this inequality, we

find

Z ≥ −

log(S/K) − (r − σ

2/2)τ

στ 1/2

= −d−.

Thus

fDC (S, t) = e

−rτP (Z ≥ −d−)

= e

−rτ (1 − Φ(−d−))

= e

−rτΦ(d−),

MATHEMATICAL AND NUMERICAL METHODS 13

by Lemma 2.4.

It is now simple to define and price the digital put option fDP , which is defined

by

(2.24) fDP (S(T), T) = (

1 if S(T) < K,
0 otherwise.
A pair of digital put and call options with the same exercise price K and expiry
time T satisfy a digital put-call parity relation, specifically
fDC (S(T), T) + fDP (S(T), T) ≡ 1,
at expiry, which implies
(2.25) fDC (S, t) + fDP (S, t) ≡ e
−rτ
, for S ∈ R,
where τ = T − t.
Theorem 2.9. The digital put option fDP satisfies
(2.26) fDP (S, t) = e
−rτΦ(−d−),
where τ = T − t and d− is defined by (2.18).
Proof. We use (2.25) and Lemma 2.4:
fDP (S, t) = e
−rτ − fDC (S, t) = e
−rτ (1 − Φ(d−)) = e
−rτΦ(−d−).
Another way to express digital calls and puts is as follows. Observe that
(2.27) fDC (S(T), T) = (S(T) − K)
0
+ and fDP (S(T), T) = (K − S(T))0
+ .
Thus we have shown that
E
(S(T) − K)+ |S(t) = S
= SerτΦ(d+) − KΦ(d−),
E
(S(T) − K)
0
+ |S(t) = S
= Φ(d−),
E
(K − S(T))+ |S(t) = S
= KΦ(−d−) − SerτΦ(−d+),
E
(K − S(T))0
+ |S(t) = S
= Φ(−d−),
2.3. European Puts and Calls. The first and second partial derivatives of option
prices are important both numerically and financially. A fairly standard nomenclature has evolved: for any option f, its partial derivative ∂f /∂S ≡ ∂Sf is called the
“Delta” of the option. It is not our aim to provide an exhaustive list of the Greek
(and non-Greek) letters used to denote these partial derivatives, but it is important
to see how some of them are calculated.
Theorem 2.10. If we let fC denote the call option whose value is given by Theorem
2.7, then
(2.28) ∂SfC (S, t) = Φ(d+),
where d+ is defined by (2.18).
This calculation is made easier by several preliminary result
14 BRAD BAXTER
Lemma 2.11. We have
(2.29) ∂Sd± =
1
Sστ 1/2
.
Proof. Now
d± =
log S − log K + (r ± σ
2/2)τ
στ 1/2
,
so that
∂Sd± =
∂S log S
στ 1/2
=
1
Sστ 1/2
.
Lemma 2.12. The function Φ defined by (2.12) has derivative
(2.30) Φ0
(x) = (2π)
−1/2
e
−x
2/2
, for x ∈ R.
Proof. This is the fundamental theorem of calculus applied to the definition of
(2.12).
Lemma 2.13. We have
(2.31) 1
2
d
2
+ −
1
2
d
2
− = log(S/K) + rτ,
where d± is defined by (2.18).
Proof. Let use write d± = a ± b, where
a =
log(S/K) + rτ
στ 1/2
and
b =
σ
2
τ /2
στ 1/2
.
Hence
d
2
+ − d
2
− = (a + b)
2 − (a − b)
2 = 4ab,
and so
1
2
d
2
+ − d
2
−
= 2ab = log(S/K) + rτ.
Lemma 2.14. We have
(2.32) Se−d
2
+/2 − Ke−rτ e
−d
2
−/2 = 0.
Proof. Using Lemma 2.13, we see that
log K − rτ −
1
2
d
2
− = log S −
1
2
d
2
+,
and taking the exponential we find
Ke−rτ e
−d
2
−/2 = Se−d
2
+/2
,
as required.
We can now assemble these partial results to prove Theorem 2.10
MATHEMATICAL AND NUMERICAL METHODS 15
Proof. Theorem 2.10: Partially differentiating fC with respect to S, we obtain
∂SfC = Φ(d+) + L,
where
L =
Se−d
2
+/2 − Ke−rτ e
−d
2
−/2
Sστ 1/2(2π)
1/2
,
by Lemma 2.11 and Lemma 2.12. Hence ∂SfC = Φ(d+), by Lemma 2.14.
Theorem 2.15. If fP denotes the put option with exercise price K and expiry time
T, then
(2.33) ∂SfP (S, t) = −Φ(−d+),
where d+ is defined by (2.18).
Proof. If we partially differentiate the put-call parity relation (2.13) with respect
to S, then we obtain
∂fC − ∂fP = 1.
Hence, using Lemma 2.4,
∂fP = ∂fC − 1 = Φ(d+) − 1 = − (1 − Φ(d+)) = −Φ(−d+).
The Vega for any option V is simply ∂σV . It is also easily calculated for European
plain vanilla options.
Theorem 2.16. If fC denotes the call option with exercise price K and expiry
time T, then
(2.34) ∂σfC (S, t) = Sτ 1/2
(2π)
−1/2
e
−d
2
+/2
,
where d+ is defined by (2.18).
Proof. Partially differentiating (2.18) with respect to σ, we obtain
(2.35) ∂σd± = −
log(S/K) + rτ
τ
1/2σ
2
±
1
2
τ
1/2
.
Hence
∂σfC = SΦ
0
(d+)∂σd+ − Ke−rτΦ
0
(d−)∂σd−
= (2π)
−1/2
Se−d
2
+/2
∂σd+ − Ke−rτ e
−d
2
−/2
∂σd−
= G1 + G2,
where
G1 = (2π)
−1/2
Se−d
2
+/2 − Ke−rτ e
−d
2
−/2
− log(S/K) + rτ
σ
2τ
1/2
and
G2 =
1
2
τ
1/2
(2π)
−1/2
Se−d
2
+/2 + Ke−rτ e
−d
2
−/2
.
Applying Lemma 2.14, we see that G1 = 0 and
G2 = Sτ 1/2
(2π)
−1/2
e
−d
2
+/2
.
16 BRAD BAXTER
If we partially differentiate the put-call parity relation (2.13) with respect to σ,
then we find
∂σfC = ∂σfP .
The Theta of an option V is simply its partial derivative with respect to time,
i.e. ∂tV . For the plain vanilla calls, it’s useful to notice that
∂tV = −∂τV.
Theorem 2.17. If fC denotes the call option with exercise price K and expiry
time T, then
(2.36) ∂tfC (S, t) = −
σSe−d
2
+/2
2(2πτ )
1/2
− rKe−rτΦ(d−).
where d± is defined by (2.18).
Proof. We first note that ∂t = −∂τ , where τ = T −t is the time to expiry, as usual.
Partially differentiating fC with respect to τ , we obtain
∂τ fC (S, t) = SΦ
0
(d+)∂τ d+ + rKe−rτΦ(d+) − Ke−rτΦ
0
(d−)∂τ d−
= A1 + A2 + A3.
Now
A1 + A3 = SΦ
0
(d+)∂τ d+ − Ke−rτΦ
0
(d−)∂τ d−
= SΦ
0
(d+) (∂τ d+ − ∂τ d−),
using Lemma 2.14. It is not difficult to check that
∂τ d+ − ∂τ d− = στ −1/2
.
Hence
A1 + A3 =
1
2
στ −1/2Φ
0
(d+) = Sσ
2(2πτ )
1/2
e
−d
2
+/2
.
Adding this expression to A2, we obtain ∂τ fC .
3. Brownian Motion
3.1. Simple Random Walk. Let X1, X2, . . . be a sequence of independent random variables all of which satisfy
(3.1) P (Xi = ±1) = 1/2
and define
(3.2) Sn = X1 + X2 + · · · + Xn.
We can represent this graphically by plotting the points {(n, Sn) : n = 1, 2, . . .},
and one way to imagine this is as a random walk, in which the walker takes identical
steps forwards or backwards, each with probability 1/2. This model is called simple
random walk and, whilst easy to define, is a useful laboratory in which to improve
probabilistic intuition.
Another way to imagine Sn is to consider a game in which a fair coin is tossed
repeatedly. If I win the toss, then I win £1; losing the toss implies a loss of £1.
Thus Sn is my fortune at time n.
MATHEMATICAL AND NUMERICAL METHODS 17
Firstly note that
ESn = EX1 + · · · + EXn = 0.
Further, EX2
i = 1, for all i, so that var Xi = 1. Hence
var Sn = var X1 + var X2 + · · · + var Xn = n,
since X1, . . . , Xn are independent random variables.
3.2. Discrete Brownian Motion. We begin with a slightly more complicated
random walk this time. We choose a timestep h > 0 and let Z1, Z2, . . . be independent N(0, h) Gaussian random variables. We then define a curve B(h)

t by defining

B(h)

0 = 0 and

(3.3) B

(h)

(kh) = Z1 + Z2 + · · · + Zk,

for positive integer k. We then join the dots to obtain a piecewise linear function.

More precisely, we define

B

(h)

t = B

(h)

kh + (t − kh)

B(h)

(k+1)h − B(h)

kh

h

!

, for t ∈ (kh,(k + 1)h).

The resultant random walk is called discrete Brownian motion.

Proposition 3.1. If 0 ≤ a ≤ b ≤ c and a, b, c ∈ hZ, then the discrete Brownian motion increments B(h)

c − B(h)

b and B(h)

b − B(h)

a are independent random

variables. Further, B(h)

c − B(h)

b ∼ N(0, c − b) and B(h)

b − B(h)

a ∼ N(0, b − a).

Proof. Exercise.

3.3. Basic Properties of Brownian Motion. It’s not obvious that discrete

Brownian motion has a limit, in some sense, when we allow the timestep h to

converge to zero. However, it can be shown that this is indeed the case (and will

see the salient features of the L´evy–Cieselski construction of this limit later). For

the moment, we shall state the defining properties of Brownian motion.

Definition 3.1. There exists a stochastic process Wt, called Brownian motion,

which satisfies the following conditions:

(i) W0 = 0;

(ii) If 0 ≤ a ≤ b ≤ c, then the Brownian increments Wc − Wb and Wb − Wa

are independent random variables. Further, Wc − Wb ∼ N(0, c − b) and

Wb − Wa ∼ N(0, b − a);

(iii) Wt is continuous almost surely.

Proposition 3.2. Wt ∼ N(0, t) for all t > 0.

Proof. Just set a = 0 and b = t in (ii) of Definition 3.1.

The increments of Brownian motion are independent Gaussian random variables,

but the actual values Wa and Wb are not independent random variables, as we shall

now see.

Proposition 3.3. If a, b ∈ [0, ∞), then E (WaWb) = min{a, b}.

18 BRAD BAXTER

Proof. We assume 0 < a < b, the remaining cases being easily checked. Then
E (WaWb) = E
Wa [Wb − Wa] + W2
a
= E (Wa [Wb − Wa]) + E
W2
a
= 0 + a
= a.
Brownian motion is continuous almost surely but it is easy to see that it cannot
be differentiable. The key observation is that
(3.4) Wt+h − Wt
h
∼ N(0,
1
h
).
In other words, instead of converging to some limiting value, the variance of the
random variable (Wt+h − Wt)/h tends to infinity, as h → 0.
3.4. Martingales. A martingale is a mathematical version of a fair game, as we
shall first illustrate for simple random walk.
Proposition 3.4. We have
E (Sn+k|Sn) = Sn.
Proof. The key observation is that
Sn+k = Sn + Xn+1 + Xn+2 + · · · + Xn+k
and Xn+1, . . . , Xn+k are all independent of Sn = X1 + · · · + Xn. Thus
E (Sn+k|Sn) = Sn + EXn+1 + EXn+2 + · · · + EXn+k = Sn.
To see why this encodes the concept of a fair game, let us consider a biased coin
with the property that
E (Sn+10|Sn) = 1.1Sn.
Hence
E (Sn+10`|Sn) = 1.1
`Sn.
In other words, the expected fortune Sn+10` grows exponentially with `. For example, if we ensure that S4 = 3, by fixing the first four coin tosses in some fashion,
then our expected fortune will grow by 10% every 10 tosses thereafter.
3.5. Brownian Motion and Martingales.
Proposition 3.5. Brownian motion is a martingale, that is, E (Wt+h|Wt) = Wt,
for any h > 0.

Proof.

E (Wt+h|Wt) = E ([Wt+h − Wt] + Wt|Wt)

= E ([Wt+h − Wt]|Wt) + Wt

= E ([Wt+h − Wt]) + Wt

= 0 + Wt

= Wt.

MATHEMATICAL AND NUMERICAL METHODS 19

We can sometimes use a similar argument to prove that functionals of Brownian

motion are martingales.

Proposition 3.6. The stochastic process Xt = W2

t − t is a martingale, that is,

E (Xt+h|Xt) = Xt, for any h > 0.

Proof.

E (Xt+h|Xt) = E

[Wt+h − Wt + Wt]
2 − [t + h]|Wt

= E

t − t − h|Wt

= E[Wt+h − Wt] 2 + W2

t − t − h

= h + W2

t − t − h

= Xt.

The following example will be crucial.

Proposition 3.7. Geometric Brownian motion

Yt = e

α+βt+σWt

is a martingale, that is, E (Yt+h|Yt) = Yt, for any h > 0, if and only if β = −σ

2/2.

Proof.

E (Yt+h|Yt) = E

e

α+β(t+h)+σWt+h |Yt

= E

Yte

βh+σ(Wt+h−Wt)

|Yt

= YtEe

βh+σ(Wt+h−Wt)

= Yte

(β+σ

2/2)h

.

In this course, the mathematical model chosen for option pricing is risk-neutral

geometric Brownian motion: we choose a geometric Brownian motion St with the

property that Yt = e

−rtSt is a martingale. Thus we have

St = e

α+(β−r)t+σWt

and Proposition 3.7 implies that β − r = −σ

2/2, i.e.

St = e

α+(r−σ

2/2)t+σWt = S0e

(r−σ

2/2)t+σWt

.

3.6. The Black–Scholes Equation. We can also use (2.6) to derive the famous

Black–Scholes partial differential equation, which is satisfied by any European option. The key is to choose a small positive h in (2.6) and expand. We shall nee

20 BRAD BAXTER

Taylor’s theorem for functions of two variables, which states that

G(x + δx, y + δy) = G(x, y) +

∂G

∂x δx +

∂G

∂y δy

+

1

2

∂

2G

∂x2

(δx)

2 + 2

∂

2G

∂x∂y (δx)(δy) + ∂

2G

∂y2

(δy)

2

+ · · · .

(3.5)

Further, it simplifies matters to use “log-space”: we introduce u(t) := log S(t),

where log ≡ loge

in these notes (not logarithms to base 10). In log-space, (2.3)

becomes

(3.6) u(t + h) = u(t) + (r − σ

2

/2)h + σδWt,

where

(3.7) δWt = Wt+h − Wt ∼ N(0, h).

We also introduce

(3.8) g(u(t), t) := f(exp(u(t), t)),

so that (2.6) takes the form

(3.9) g(u(t), t) = e

−rhEg(u(t + h), t + h).

Now Taylor expansion yields the (initially daunting)

g(u(t + h), t + h) = g(u(t) + (r − σ

2

/2)h + σδWt, t + h)

= g(u(t), t) + ∂g

∂u

(r − σ

2

/2)h + σδWt

+

1

2

∂

2

g

∂u2

σ

2

(δWt)

2 + h

∂g

∂t (3.10) + · · · ,

ignoring all terms of higher order than h. Further, since δWt ∼ N(0, h), i.e. EδWt =

0 and E[(δWt)

2

] = h, we obtain

(3.11) Eg(u(t+h), t+h) = g(u(t), t) +h

∂g

∂u(r − σ

2

/2) + 1

2

∂

2

g

∂u2

σ

2 +

∂g

∂t

+· · · .

Recalling that

e

−rh = 1 − rh +

1

2

(rh)

2 + · · · ,

we find

g = (1 − rh + · · ·)

g + h

∂g

∂u(r − σ

2

/2) + 1

2

∂

2

g

∂u2

σ

2 +

∂g

∂t

+ · · ·

= g + h

−rg +

∂g

∂u(r − σ

2

/2) + 1

2

∂

2

g

∂u2

σ

2 +

∂g

∂t

(3.12) + · · · .

For this to be true for all h > 0, we must have

(3.13) −rg +

∂g

∂u(r − σ

2

/2) + 1

2

∂

2

g

∂u2

σ

2 +

∂g

∂t = 0,

and this is the celebrated Black–Scholes partial differential equation (PDE). Thus,

instead of computing an expected future value, we can calculate the solution of the

Black–Scholes PDE (3.13). The great advantage gained is that there are highly

efficient numerical methods for solving PDEs numerically. The disadvantages ar

MATHEMATICAL AND NUMERICAL METHODS 21

complexity of code and learning the mathematics needed to exploit these methods

effectively.

Exercise 3.1. Use the substitution S = exp(u) to transform (3.13) into the nonlinear form of the Black–Scholes PDE.

3.7. Itˆo Calculus. Equation (3.11) is really quite surprising, because the second

derivative contributes to the O(h) term. This observation is at the root of the Itˆo

rules. We begin by considering the quadratic variation In[a, b] of Brownian motion

on the interval [a, b]. Specifically, we choose a positive integer n and let nh = b−a.

We then define

(3.14) In[a, b] = Xn

k=1

Wa+kh − Wa+(k−1)h

2

.

We shall prove that EIn[a, b] = b − a, for every positive integer n, but that

var In[a, b] → 0, as n → ∞. We shall need the following simple property of Gaussian

random variables.

Lemma 3.8. Let Z ∼ N(0, 1). Then EZ

4 = 3.

Proof. Integrating by parts, we obtain

EZ

4 =

Z ∞

−∞

s

4

(2π)

−1/2

e

−s

2/2

ds

= (2π)

−1/2

Z ∞

−∞

s

3

d

ds

−e

−s

2/2

ds

= (2π)

−1/2

h

−s

3

e

−s

2/2

i∞

−∞

−

Z ∞

−∞

3s

2

−e

−s

2/2

ds

= 3.

Exercise 3.2. Calculate EZ

6 when Z ∼ N(0, 1). More generally, calculate EZ

2m

for any positive integer m.

Proposition 3.9. We have EIn[a, b] = b − a and var In[a, b] = 2(b − a)

2/n.

Proof. Firstly,

EIn[a, b] = Xn

k=1

E

Wa+kh − Wa+(k−1)h

2

=

Xn

k=1

h = nh = b − a.

Further, the Brownian increments Wa+kh−Wa+(k−1)h are independent N(0, h) random variables. We shall define independent N(0, 1) random variables Z1, Z2, . . . , Zn

by

Wa+kh − Wa+(k−1)h =

√

hZk, 1 ≤ k ≤

22 BRAD BAXTER

Hence

var In[a, b] = Xn

k=1

var √

hZk

2

=

Xn

k=1

var

hZ2

k

=

Xn

k=1

h

2

var

Z

2

k

=

Xn

k=1

h

2

EZ

4

k −

EZ

2

k

2

=

Xn

k=1

2h

2

= 2nh2

= 2(b − a)

2

/n.

With this in mind, we define

Z b

a

(dWt)

2 = b − a

and observe that we have shown that

Z b

a

(dWt)

2 =

Z b

a

dt,

for any 0 ≤ a < b. Thus we have really shown the Itˆo rule
dW2
t = dt.
Using a very similar technique, we can also prove that
dtdWt = 0.
We first define
Jn[a, b] = Xn
k=1
h
Wa+kh − Wa+(k−1)h
,
where nh = b − a, as before.
Proposition 3.10. We have EJn[a, b] = 0 and var Jn[a, b] = (b − a)
3/n2
.
Proof. Firstly,
EJn[a, b] = Xn
k=1
Eh
Wa+kh − Wa+(k−1)h
=
MATHEMATICAL AND NUMERICAL METHODS 23
The variance satisfies
var Jn[a, b] = Xn
k=1
var h
Wa+kh − Wa+(k−1)h
=
Xn
k=1
h
2
var
Wa+kh − Wa+(k−1)h
=
Xn
k=1
h
3
= nh3
= (b − a)
3
/n2
.
With this in mind, we define
Z b
a
dtdWt = 0, for any 0 ≤ a < b,
and observe that we have shown that
dtdWt = 0.
Exercise 3.3. Setting nh = b − a, define
Kn[a, b] = Xn
k=1
h
2
.
Prove that Kn[a, b] = (b − a)
2/n → 0, as n → ∞. Thus
Z b
a
(dt)
2 = 0,
for any 0 ≤ a < b. Hence we have (dt)
2 = 0.
Proposition 3.11 (Itˆo Rules). We have dW2
t = dt and dWtdt = dt2 = 0.
Proof. See Propositions 3.9, 3.10 and Exercise 3.3
The techniques used in Propositions 3.9 and 3.10 are crucial examples of the
basics of stochastic integration. We can generalize this technique to compute other
useful stochastic integrals, as we shall now see. However, computing these stochastic
integrals directly from limits of stochastic sums is cumbersome compared to direct
use of the Itˆo rules: compare the proof of Proposition 3.12 to the simplicity of
Example 3.3.
Proposition 3.12. We have
Z t
0
WsdWs =
1
2
W2
t − t
.
Proof. We have already seen that, when h = t/n,
(3.15) Xn
k=1
Wkh − W(k−1)h
2
24 BRAD BAXTER
as n → ∞. Further, we shall use the telescoping sum
(3.16) Xn
k=1
W2
kh − W2
(k−1)h
= W2
nh − W2
0 = W2
t
.
Subtracting (3.15) from (3.16), we obtain
(3.17)
Xn
k=1
hW2
kh − W2
(k−1)h
−
Wkh − W(k−1)h
2
i
= 2Xn
k=1
W(k−1)h
Wkh − W(k−1)h
.
Now the LHS converges to W2
t − t, whilst the RHS converges to
2
Z t
0
WsdWs,
whence (3.12).
Example 3.1. Here we shall derive a useful formula for
(3.18) Z t
0
f(s)dWs,
where f is continuously differentiable. The corresponding discrete stochastic sum is
(3.19) Sn =
Xn
k=1
f(kh)
Wkh − W(k−1)h
where nh = t, as usual. The key trick is to introduce another telescoping sum:
(3.20) Xn
k=1
f(kh)Wkh − f((k − 1)h)W(k−1)h
= f(t)Wt.
Subtracting (3.20) from (3.19) we find
Sn − f(t)Wt = −
Xn
k=1
(f(kh) − f((k − 1)h)) W(k−1)h
= −
Xn
k=1
hf0
(kh) + O(h
2
)
W(k−1)h
→ − Z t
0
f
0
(3.21) (s)Wsds,
as n → ∞. Hence
(3.22) Z t
0
f(s)dWs = f(t)Wt −
Z t
0
f
0
(s)Ws ds.
Exercise 3.4. Modify the technique of Example 3.1 to prove that
(3.23) E
"Z t
0
h(s)dWs
2
#
=
Z t
0
h(s)
2
ds.
This is the Itˆo isometry property.
We now come to Itˆo’s lemma it
MATHEMATICAL AND NUMERICAL METHODS 25
Lemma 3.13 (Itˆo’s Lemma for univariate functions). If f is any infinitely differentiable univariate function and Xt = f(Wt), then
(3.24) dXt = f
0
(Wt)dWt +
1
2
f
(2)(Wt)dt.
Proof. We have
Xt+dt = f(Wt+dt)
= f(Wt + dWt)
= f(Wt) + f
0
(Wt)dWt +
1
2
f
(2)(Wt)dW2
t
= Xt + f
0
(Wt)dWt +
1
2
f
(2)(Wt)dt.
Subtracting Xt from both sides gives (3.24).
Example 3.2. Let Xt = e
cWt
, where c ∈ C. Then, setting f(x) = exp(cx) in
Lemma 3.13, we obtain
dXt = Xt
cdWt +
1
2
c
2
dt
.
Example 3.3. Let Xt = W2
t
. Then, setting f(x) = x
2
in Lemma 3.13, we obtain
dXt = 2WtdWt + dt.
If we integrate this from 0 to T, say, then we obtain
XT − X0 = 2 Z T
0
WtdWt +
Z T
0
dt,
or
W2
T = 2 Z T
0
WtdWt + T,
that is
Z T
0
WtdWt =
1
2
W2
T − T
.
This is an excellent example of the Itˆo rules greatly simplifying direct calculation
with stochastic sums, because it is much easier than the direct proof of Proposition
3.12.
Example 3.4. Let Xt = Wn
t
, where n can be any positive integer. Then, setting
f(x) = x
n in Lemma 3.13, we obtain
dXt = nWn−1
t dWt +
1
2
n(n − 1)Wn−2
t dt.
We can also integrate Itˆo’s Lemma, as follows.
Example 3.5. Integrating (3.24) from a to b, we obtain
Z b
a
dXt =
Z b
a
f
0
(Wt)dWt +
1
2
Z b
a
f
(2)(Wt)dt,
i.e.
Xb − Xa =
Z b
a
f
0
(Wt)dWt +
1
2
Z b
a
f
(2)(Wt)dt
26 BRAD BAXTER
Lemma 3.13 is not quite sufficient to deal with geometric Brownian motion,
hence the following bivariate variant.
Lemma 3.14 (Itˆo’s Lemma for bivariate functions). If g(x1, t), for x1, t ∈ R, is
any infinitely differentiable function and Yt = g(Wt, t), then
(3.25) dYt =
∂g
∂x1
(Wt, t)dWt +
1
2
∂
2
g
∂x2
1
(Wt, t) + ∂g
∂t (Wt, t)
dt.
Proof. We have
Yt+dt = g(Wt+dt, t + dt)
= g(Wt + dWt, t + dt)
= g(Wt, t) + ∂g
∂x1
(Wt, t)dWt +
1
2
∂
2
g
∂x2
1
(Wt, t)dW2
t +
∂g
∂t (Wt, t)dt
= g(Wt, t) + ∂g
∂x1
(Wt, t)dWt +
1
2
∂
2
g
∂x2
1
(Wt, t) + ∂g
∂t (Wt, t)
dt
Subtracting Yt from both sides gives (3.25).
Example 3.6. Let Xt = e
α+βt+σWt
. Then, setting f(x1, t) = exp(α + βt + σx1)
in Lemma 3.13, we obtain
dXt = Xt
σdWt +
1
2
σ
2 + β
dt
.
Example 3.7. Let Xt = e
α+(r−σ
2/2)t+σWt
. Then, setting β = r−σ
2/2 in Example
3.6, we find
dXt = Xt (σdWt + rdt).
Exercise 3.5. Let Xt = W2
t − t. Find dXt.
3.8. Itˆo rules and SDEs. Suppose now that the asset price St is given by the
SDE
(3.26) dSt = St (µdt + σdWt),
that is, St is a geometric Brownian motion. Then the Itˆo rules imply that
(3.27) (dSt)
2 = σ
2S
2
t dt.
Hence, if we define Xt = f(St), then
(3.28)
dXt = f
0
(St)dSt+
1
2
f
(2)(St) (dSt)
2 = σf0
(St)StdWt+dt
µf0
(St)St +
1
2
σ
2S
2
t
f
(2)(St)
.
We illustrate this with the particularly important example of solving the SDE
for geometric Brownian motion.
Example 3.8. If f(x) = log x, then f
0
(x) = 1/x, f
(2)(x) = −1/x2 and (3.28)
becomes
dXt = σ
1
St
StdWt + dt
µ
1
St
St +
1
2
σ
2S
2
t
−1
S
2
t
= σdWt + dt
µ − σ
2
/2
MATHEMATICAL AND NUMERICAL METHODS 27
Integrating from t0 to t1, say, we obtain
Xt1 − Xt0 = σ (Wt1 − Wt0
) +
µ − σ
2
/2
(t1 − t0),
or
log St1
St0
= σ (Wt1 − Wt0
) +
µ − σ
2
/2
(t1 − t0).
Taking the exponential of both sides, we obtain
St1 = St0
e
(r−σ
2/2)(t1−t0)+σ(Wt1−Wt0 )
.
3.9. Multivariate Geometric Brownian Motion. So far we have considered
one asset only. In practice, we need to construct a multivariate GBM model that
allows us to incorporate dependencies between assets via a covariance matrix. To
do this, we first take a vector Brownian motion Wt ∈ R
n: its components are
independent Brownian motions. Its covariance matrix Ct at time t is simply a
multiple of the identity matrix:
Ct = EWtWT
t = tI.
Now take any real, invertible, symmetric n × n matrix A and define
Zt = AWt.
The covariance matrix Dt for this new stochastic process is given by
Dt = EZtZ
T
t = EAWtWT
t A = A
EWtWT
t
A = tA2
,
and A2
is a symmetric positive definite matrix.
Exercise 3.6. Prove that A2
is symmetric positive definite if A is real, symmetric
and invertible.
In practice, we calculate the covariance matrix M from historical data, hence
must construct a symmetric A satisfying A2 = M. Now a covariance matrix is
precisely a symmetric positive definite matrix, so that the following linear algebra
is vital. We shall use kxk to denote the Euclidean norm of the vector x ∈ R
n, that
is
(3.29) kxk =
Xn
k=1
x
2
k
1/2
, x ∈ R
n
.
Further, great algorithmic and theoretical importance attaches to those n×n matrices which preserve the Euclidean norm. More formally, an n × n matrix Q is called
orthogonal if kQxk = kxk, for all x ∈ R
n. It turns out that Q is an orthogonal
matrix if and only if QT Q = I, which is equivalent to stating that its columns are
orthonormal vectors. See Section 7 for further details.
Theorem 3.15. Let M ∈ R
n×n be symmetric. Then it can be written as M =
QDQT
, where Q is an orthogonal matrix and D is a diagonal matrix. The elements
of D are the eigenvalues of M, while the columns of Q are the eigenvectors. Further,
if M is positive definite, then its eigenvalues are all positive.
Proof. Any good linear algebra textbook should include a proof of this fact; a proof
is given in my numerical linear algebra note
28 BRAD BAXTER
Given the spectral decomposition M = QDQT
, with D = diag (λ1, λ2, . . . , λn),
we define
D1/2 = diag (λ
1/2
1
, λ1/2
2
, . . . , λ1/2
n
)
when M is positive definite. We can now define the matrix square-root M1/2 by
(3.30) M1/2 = QD1/2Q
T
.
Exercise 3.7. Prove that (M1/2
)
2 = M directly from (3.30).
Given the matrix square-root M1/2
for a chosen symmetric. positive definite
matrix M, we now define the assets
(3.31) Sk(t) = e
(r−Mkk/2)t+(M1/2Wt)
k , k = 1, 2, . . . , n,
where
M1/2Wt
k
denotes the kth element of the vector M1/2Wt. We now need
to check that our assets remain risk-neutral.
Proposition 3.16. Let the assets’ stochastic processes be defined by (3.31). Then
ESk(t) = e
rt
,
for all k ∈ {1, 2, . . . , n}.
Proof. The key calculation is
Ee
(M1/2Wt)
k = Ee
Pn
`=1(M1/2
)k`Wt(`)
= E
Yn
`=1
e
(M1/2
)k`Wt(`)
=
Yn
`=1
Ee
(M1/2
)k`Wt(`)
=
Yn
`=1
e
(M1/2
)
2
k`t/2
= e
(t/2)Pn
`=1(M1/2
)
2
k`
= e
(t/2)Mkk
,
using the independence of the components of Wt.
Exercise 3.8. Compute E[Sk(t)
2
].
Exercise 3.9. What’s the covariance matrix for the assets S1(t), . . . , Sn(t)?
In practice, it is usually easier to describe the covariance structure of multivariate
Brownian motion via the Itˆo rules, which take the simple form
(3.32) dWtdWT
t = M dt,
where M ∈ R
n×n is a symmetric positive definite matrix and Wt is an n-dimensional
Brownian motion.
Proposition 3.17. If Xt = f(Wt), then
(3.33) dXt = ∇f(Wt)
T
dWt +
1
2
dWT
t D2
f(Wt)dWt
or
(3.34) dXt =
Xn
j=1
∂f
∂xj
dWj,t +
dt
2
Xn
j=1
Xn
k=1
∂
2f
∂xj∂xk
Mjk
MATHEMATICAL AND NUMERICAL METHODS 29
Proof. This is left as an exercise.
Example 3.9. If n = 2 and
f(x1, x2) = e
a1x1+a2x2
,
and the correlated Brownian motions W1,t and W2,t satisfy
dW1,tdW2,t = ρdt,
for some constant correlation coefficient ρ ∈ [−1, 1], then Xt = f(W1,t, W2,t) satisfies
dXt =
a1dW1,t + a2dW2,t +
1
2
dt
a
2
1 + 2ρa1a2 + a
2
2
Xt.
Example 3.10. If n = 3 and
f(x1, x2, x3) = e
a1x1+a2x2+a3x3
,
and the correlated Brownian motions W1,t, W2,t, W3,t satisfy
dW1,tdW2,t = M12dt, dW2,tdW3,t = M23dt, dW3,tdW1,t = M31dt,
where M ∈ R
3×3
is a symmetric positive definite matrix which also satisfies
M11 = M22 = M33 = 1,
then Xt = f(W1,t, W2,t, W3,t) satisfies
dXt =
a1dW1,t + a2dW2,t + a3dW3,t +
1
2
dt
a
2
1 + a
2
2 + a
2
3 + 2a2a3M23 + 2a3a1M31 + 2a1a2M12
Xt.
Example 3.11. If
f(x) = e
a
T x
, x ∈ R
n
,
then
∇f(x) = af(x)
and
D2
f(x) = aaT
f(x).
Let Wt be any n-dimensional Brownian motion satisfying
dWtdWT
t = M dt,
where M ∈ R
n×n is a symmetric positive definite matrix. Then Xt = f(Wt)
satisfies
dXt =
a
T
dWt +
1
2
dWT
t M dWt
f(x),
or, in coordinate form,
dXt =
Xn
j=1
ajdWj,t +
1
2
dtXn
j=1
Xn
k=1
ajakMjk
f(x
30 BRAD BAXTER
Proposition 3.18. If Yt = g(Wt, t), then
(3.35) dYt = ∇g(Wt)
T
dWt +
∂g
∂t dt +
1
2
dWT
t D2
g(Wt)dWt
or
(3.36) dYt =
Xn
j=1
∂g
∂xj
dWj,t +
∂g
∂t dt +
dt
2
Xn
j=1
Xn
k=1
∂
2
g
∂xj∂xk
Mjk.
Proof. Exercise.
Example 3.12. If n = 2 and
g(x1, x2) = e
a1x1+a2x2+bt
,
and the correlated Brownian motions W1,t and W2,t satisfy
dW1,tdW2,t = ρdt,
for some constant correlation coefficient ρ ∈ [−1, 1], then Xt = f(W1,t, W2,t) satisfies
dXt =
a1dW1,t + a2dW2,t + bdt +
1
2
dt
a
2
1 + 2ρa1a2 + a
2
2
Xt.
Example 3.13. If
g(x, t) = e
a
T x+bt
, x ∈ R
n
,
then
∇g(x, t) = ag(x, t),
∂g
∂t = bg(x, t),
and
D2
g(x, t) = aaT
g(x, t).
Let Wt be any n-dimensional Brownian motion satisfying
dWtdWT
t = M dt,
where M ∈ R
n×n is a symmetric positive definite matrix. Then Yt = g(Wt, t)
satisfies
dYt =
a
T
dWt + bdt +
1
2
dWT
t M dWt
Yt,
or, in coordinate form
dYt =
Xn
j=1
ajdWj,t + bdt +
1
2
dtXn
j=1
Xn
k=1
ajakMjk
Yt
MATHEMATICAL AND NUMERICAL METHODS 31
3.10. Asian Options. European Put and Call options provide a useful laboratory
in which to understand and test methods. However, the main aim of Monte Carlo
is to calculate option prices for which there is no convenient analytic formula. We
shall illustrate this with Asian options. Specifically, we shall consider the option
(3.37) fA(S, T) =
S(T) −
1
T
Z T
0
S(τ ) dτ!
+
.
This is a path dependent option: its value depends on the history of the asset price,
not simply its final value.
Why would anyone trade Asian options? Consider a bank’s corporate client
trading in, say, Britain and the States. The client’s business is exposed to exchange
rate volatility: the pound’s value in dollars varies over time. Therefore the client
may well decide to hedge by buying an option to trade dollars for pounds at a
set rate at time T. This can be an expensive contract for the writer of the option,
because currency values can “blip”. An alternative contract is to make the exchange
rate at time T a time-average, as in (3.37). Any contract containing time-averages
of asset prices is usually called an Asian option, and there are many variants of
these. For example, the option dual to (3.37) (in the sense that a call option is dual
to a put option) is given by
(3.38) gA(S, T) =
1
T
Z T
0
S(τ ) dτ − S(T)
!
+
.
Pricing (3.37) via Monte Carlo is fairly simple. We choose a positive integer M
and subdivide the time interval [0, T] into M equal subintervals. We evolve the
asset price using the equation
(3.39) S(
(k + 1)T
M
) = S(
kT
M
)e
(r−σ
2/2) T
M +σ
√ T
M Zk
, k = 0, 1, . . . , M − 1,
where Z0, Z1, . . . , ZM−1 are independent N(0, 1) independent pseudorandom numbers. We can use the simple approximation
T
−1
Z T
0
S(τ ) dτ ≈ M−1
M
X−1
k=0
S(
kT
M
).
Exercise 3.10. Write a Matlab program to price the discrete Asian option defined
by
(3.40) fM(S, T) =
S(T) − M−1
M
X−1
k=0
S(kT /M)
!
+
.
We can also study the average
(3.41) A(T) = T
−1
Z T
0
S(t) dt
directly, and this is the subject of a recent paper of Raymond and myself. For
example,
(3.42) EA(T) = T
−1
Z T
0
ES(t) dt.
32 BRAD BAXTER
Exercise 3.11. Prove that
(3.43) EA(T) = S(0)
e
rT − 1
rT
.
Exercise 3.12. In a similar vein, find expressions for ES(a)S(b) and E
A(T)
2
MATHEMATICAL AND NUMERICAL METHODS 33
3.11. The Ornstein–Uhlenbeck Process. This interesting SDE displays meanreversion and requires a slightly more advanced technique. We consider the SDE
(3.44) dXt = −αXtdt + σdWt, t ≥ 0,
where α > 0 and σ ≥ 0 are constants and X0 = x0.

It’s very useful to consider the special case σ = 0 first, in which case the SDE

(3.44) becomes the ODE

(3.45) dXt

dt + αXt = 0.

There is a standard method for solving (3.45) using an integrating factor. Specifically, if we multiply (3.45) by exp(αt), then we obtain

(3.46) d

dt

Xte

αt

= 0,

so that Xt exp(αt) is constant. Hence, recalling the initial condition X0 = x0, we

must have

(3.47) Xt = x0e

−αt

.

Thus the solution decays exponentially to zero, at a rate determined by the positive

constant α, for any initial value x0.

Fortunately the integrating factor method also applies to the σ > 0 case, with a

little more work. Multiplying (3.44) by exp αt, we obtain

e

αt (dXt + αXtdt) = σeαtdWt,

or

(3.48) d

Xte

αt

= σeαtdWt,

using the infinitesimal increments variant on the product rule for differentiation.

Integrating (3.48) from 0 to s, we find

(3.49) Xse

αs − x0 =

Z s

0

d

Xte

αt

= σ

Z s

0

e

αtdWt,

or

(3.50) Xs = x0e

−αs + e

−αs Z s

0

e

αtdWt.

We can say more using the following important property of stochastic integrals.

Proposition 3.19. Let f : [0, ∞) → R be any infinitely differentiable function and

define the stochastic process

(3.51) Fs =

Z s

0

f(t)dWt.

Then EFs = 0 and

(3.52) var Fs =

Z s

0

f(t)

2

34 BRAD BAXTER

Proof. The key point is that (3.51) is the limit of the stochastic sum

Sn =

Xn

k=1

f(kh)

Wkh − W(k−1)h

,

where h > 0 and nh = s. Now the increments Wkh − W(k−1)h are independent and

satisfy

Wkh − W(k−1)h ∼ N(0, h),

by the axioms of Brownian motion, so

ESn = 0,

for all n. By independence of the terms in the sum, we see that

var Sn =

Xn

k=1

var

f(kh)

Wkh − W(k−1)h

=

Xn

k=1

f(kh)

2

var

Wkh − W(k−1)h

= h

Xn

k=1

f(kh)

2

→

Z s

0

f(t)

2

dt,

as n → ∞.

Applying Proposition 3.19 to the Ornstein–Uhlenbeck process solution (3.50),

we obtain EXs = x0 exp(−αs) and

var Xs = var σe−αs Z s

0

e

αtdWt

= σ

2

e

−2αs Z s

0

e

2αtdt

= σ

2

e

−2αs

e

2αs − 1

2α

= σ

2

1 − e

−2αs

2α

MATHEMATICAL AND NUMERICAL METHODS 35

3.12. Feynman–Kac. The derivation of the Black–Scholes PDE earlier is really

the first example of a much more general link between expectations of functions of

Brownian motion and PDEs.

If we consider the stochastic process Xt = u(Wt, t), then Itˆo’s Lemma 3.14 states

that

(3.53) du(Wt, t) = uxdWt +

ut +

1

2

uxx

dt.

Thus, if u satisfies the PDE

(3.54) ut +

1

2

uxx = 0,

then the dt component vanishes in (3.53), implying

(3.55) du(Wt, t) = ux(Wt, t)dWt.

If we now choose any times 0 ≤ t0 < t1, then integrating (3.55) yields
(3.56) u(Wt1
, t1) − u(Wt0
, t0) = Z t1
t0
du(Wt, t) = Z t1
t0
ux(Wt, t)dWt.
Taking the expectation of (3.56), conditioned on Wt0 = X, say, we obtain
(3.57) E (u(Wt1
, t1)|Wt0 = X)−u(X, t0) = E
Z t1
t0
ux(Wt, t) dWt|Wt0 = X
= 0,
by the independent increments property of Brownian motion. In other words, the
solution to the PDE (3.54) is given by
(3.58) u(X, t0) = E (u(Wt1
, t1)|Wt0 = X).
Now
Wt1 = Wt1 − Wt0 + Wt0 = Wt1 − Wt0 + X,
so we can rewrite (3.58) as
(3.59) u(X, t0) = Eu(X + Wt1 − Wt0
, t1),
or
(3.60) u(X, t0) = Eu(X +
√
t1 − t0Z, t1),
where Z ∼ N(0, 1). Now that we have derived (3.60), it’s clearer to replace t0 by
t, t1 by T and X by x, respectively, to obtain
(3.61) u(x, t) = Eu(x +
√
τZ, T),
where τ = T −t is the time to expiry. The key point here is that we have a solution
to the PDE (3.54) (which is called the time-reversed diffusion equation) expressed
as an expectation. If we express this expectation in terms of the N(0, 1) PDF, then
we obtain
(3.62) u(x, t) = Z ∞
−∞
u(x +
√
τs, T)(2π)
−1/2
e
−s
2/2
ds.
Example 3.14. Suppose the expiry value of u is given by
(3.63) u(y, T) = (
1 if a ≤ y ≤ b,
0 otherwise.
36 BRAD BAXTER
Then (3.62) implies that the solution of (3.54) is given by
u(x, t) = Z ∞
−∞
u(x +
√
τs, T)(2π)
−1/2
e
−s
2/2
ds
=
Z b√−x
τ
a√−x
τ
(2π)
−1/2
e
−s
2/2
ds
= Φ
b − x
√
τ
− Φ
a − x
√
τ
,
because
a ≤ x +
√
τs ≤ b
if and only if
a − x
√
τ
≤ x ≤
b − x
√
τ
.
Exercise 3.13. Show that, in terms of the time to expiry τ = T − t, the PDE
(3.54) becomes the diffusion equation
uτ =
1
2
uxx,
with solution
u(x, τ ) = Eu(x +
√
τZ, 0).
3.13. Feynman–Kac and Black–Scholes I. Suppose we consider the stochastic
process St defined by the SDE
(3.64) dSt
St
= rdt + σdWt,
which is, of course, geometric Brownian motion. We can solve this SDE using the
technique of Example 3.8.
Example 3.15. The stochastic process log St satisfies the SDE
(3.65) d log St =
r − σ
2
/2
dt + σdWt,
whence, integrating from t0 to t1, we obtain
(3.66) log St1
St0
=
r − σ
2
/2
(t1 − t0) + σ (Wt1 − Wt0
).
Taking the exponential, we find
(3.67) St1 = St0
e
(r−σ
2/2)(t1−t0)+σ(Wt1−Wt0 )
.
Applying Itˆo’s Lemma to the stochastic process V (St, t), where St is defined by
the SDE (3.64), we obtain
(3.68) dV (St, t) =
Vt +
1
2
σ
2S
2
t VSS + rStVS
dt + σStVSdWt.
Hence, if V (S, t) satisfies the PDE
(3.69) Vt +
1
2
σ
2S
2VSS + rSVS =
MATHEMATICAL AND NUMERICAL METHODS 37
together with the boundary condition
(3.70) V (S, T) = F(S),
for some known function F(S), then integrating (3.68) from t = t0 to t = T, we
find
(3.71) V (ST , T) − V (St0
, t0) = Z T
t0
σStVSdWt,
or, using the boundary condition (3.70),
(3.72) F(ST ) − V (St0
, t0) = Z T
t0
σStVSdWt,
If we now take the expectation of (3.71), conditioned on St0 = S, then
(3.73) E (F(ST )|St0 = S) − V (S, t0) = 0,
the RHS vanishing because of the independent increments property of Brownian
motion. Now, setting t0 = t and t1 = T in (3.67), we can rewrite (3.73) as
(3.74) V (S, t) = EF(Se(r−σ
2/2)τ+στ
1/2Z
),
or
(3.75) V (S, t) = EV (Se(r−σ
2/2)τ+στ
1/2Z
, T),
where
(3.76) τ = T − t
and Z ∼ N(0, 1).
3.14. Feynman–Kac and Black–Scholes II. To obtain Black–Scholes from Feynman–
Kac, we substitute
(3.77) V (S, t) = e
−rtU(S, t)
in (3.69). Now
VS = e
−rtUS, VSS = e
−rtUSS and Vt = e
−rt (Ut − rU),
so (3.69) becomes the Black–Scholes equation
(3.78) Ut +
1
2
σ
2S
2USS + rSUS − rU = 0.
Hence (3.75) becomes
(3.79) e
−rtU(S, t) = e
−rT EU(Se(r−σ
2/2)τ+στ
1/2Z
, T),
or
(3.80) U(S, t) = e
−rτEU(Se(r−σ
2/2)τ+στ
1/2Z
, T).
38 BRAD BAXTER
4. The Binomial Model Universe
The geometric Brownian Motion universe is an infinite one and, for practitioners,
has the added disadvantage of the mathematical difficulty of Brownian motion. It
is also possible to construct finite models with similar properties. This was first
demonstrated by Cox, Ross and Rubinstein in the late 1970s.
Our model will be entirely specified by two parameters, α > 0 and p ∈ [0, 1]. We

choose S0 > 0 and define

(4.1) Sk = Sk−1 exp(αXk), k > 0,

where the independent random variables X1, X2, . . . satisfy

(4.2) P (Xk = 1) = p, P (Xk = −1) = 1 − p =: q.

Thus

(4.3) Sm = S0e

α(X1+X2+···+Xm)

, m > 0.

It is usual to display this random process graphically.

1 e

−α

e

α

e

−2α

1

e

2α

e

−3α

e

−α

e

α

e

3α

e

−4α

e

−2α

1

e

2α

e

4α

p

q q

p

q

p

q

q

p

q

p

p

q

p

q

p

q

p

q

p

At this stage, we haven’t specified p and α. However, we can easily price a European option given these parameters. If Sk denotes our Binomial Model asset price

at time kh, for some positive time interval h, then the Binomial Model European

option requirement is given by

f(Sk−1,(k − 1)h) = e

−rhEf(Sk−1e

αXk

, kh)

= e

−rh

pf(Sk−1e

α

, kh) + (1 − p)f(Sk−1e

−α

, kh)

(4.4) .

Thus, given the m+1 possible asset prices at expiry time mh, and their corresponding option prices, we use (4.4) to calculate the m possible values of the option at

time (m − 1)h. Recurring this calculation provides the value of the option at time

0. Let’s illustrate this with an example

MATHEMATICAL AND NUMERICAL METHODS 39

Example 4.1. Suppose e

α = 2, p = 1/2 and D = e

−rh. Let m = 4 and let’s use

the Binomial Model to calculate all earlier values of the call option whose expiry

value is

f(S(mh), mh) = (S(mh) − 1)+ .

Using (4.4), we obtain the following diagram for the asset prices.

1 1/2

2

1/4

1

4

1/8

1/2

2

8

1/16

1/4

1

4

16

p

q q

p

q

p

q

q

p

q

p

p

q

p

q

p

q

p

q

p

The corresponding diagram for the option prices is as follows.

27D4/16 3D3/8

3D3

0

3D2/4

21D2/4

0

0

3D/2

9D

0

0

0

3

15

40 BRAD BAXTER

How do we choose the constants α and p? One way is to use them to mimic

geometric Brownian motion. Thus we choose a positive number h and use

(4.5) S(kh) = S((k − 1)h)e

(r−σ

2/2)h+σ

√

hZ, k > 0,

where, as usual Z ∼ N(0, 1).

Lemma 4.1. In the Geometric Brownian Motion Universe, we have

(4.6) ES(kh)|S((k − 1)h) = S((k − 1)h)e

rh

and

(4.7) ES(kh)

2

|S((k − 1)h) = S((k − 1)h)

2

e

(2r+σ

2

)h

.

Proof. These are easy exercises if you have digested Lemma 2.1 and Lemma 2.2:

everything rests on using the fact that E exp(cZ) = c

2/2 when Z ∼ N(0, 1), for any

real (or complex) number c.

There are analogous quantities in the Binomial Model.

Lemma 4.2. In the Binomial Model Universe, we have

(4.8) ESk|Sk−1 =

peα + (1 − p)e

−α

Sk−1

and

(4.9) ES

2

k

|Sk−1 =

pe2α + (1 − p)e

−2α

S

2

k−1

Proof. You should find these to be very easy given the definitions (4.1), (4.2) and

(4.3); revise elementary probability theory if this is not so!

One way to choose p and α is to require that the right hand sides of (4.6), (4.8)

and (4.7), (4.9) agree, that is,

e

rh = peα + (1 − p)e

−α

(4.10) ,

e

(2r+σ

2

)h = pe2α + (1 − p)e

−2α

(4.11) .

This ensures that our Binomial Model preserves risk neutrality.

Rearranging (4.10) and (4.11), we find

(4.12) p =

e

rh − e

−α

e

α − e−α

=

e

(2r+σ

2

)h − e

−2α

e

2α − e−2α

.

Further, the elementary algebraic identity

e

2α − e

−2α =

e

α + e

−α

e

α − e

−α

transforms (4.12) into

(4.13) e

rh − e

−α =

e

(2r+σ

2

)h − e

−2α

e

α + e−α

.

Exercise 4.1. Show that (4.12) implies the equation

(4.14) e

α + e

−α = e

(r+σ

2

)h + e

−r

MATHEMATICAL AND NUMERICAL METHODS 41

How do we solve (4.14)? The following analysis is a standard part of the theory of

hyperbolic trigonometric functions1

, but no background knowledge will be assumed.

If we write

(4.15) y =

1

2

e

(r+σ

2

)h + e

−rh

,

then (4.14) becomes

(4.16) e

α + e

−α = 2y,

that is

(4.17) (e

α

)

2 − 2y(e

α

) + 1 = 0.

This quadratic in e

α has solutions

(4.18) e

α = y ±

p

y

2 − 1

and, since (4.15) implies y ≥ 1, we see that each of these possible solutions is

positive. Thus the possible values for α are

(4.19) α1 = loge

y +

p

y

2 − 1

and

(4.20) α2 = loge

y −

p

y

2 − 1

.

Now

α1 + α2 = loge

hy +

p

y

2 − 1

i + loge

hy −

p

y

2 − 1

i

= loge

hy +

p

y

2 − 1

y −

p

y

2 − 1

i

= loge

y

2 − (y

2 − 1)

= loge 1

(4.21) = 0.

Since y +

p

y

2 − 1 ≥ 1, for y ≥ 1, we deduce that α1 ≥ 0 and α2 = −α1. Since we

have chosen α > 0, we conclude

(4.22) α = loge

h

y +

p

y

2 − 1

i

,

where y is given by (4.15).

Now (4.22) tells us the value of α required, but the expression is somewhat

complicated. However, if we return to (4.14), that is,

e

α + e

−α = e

(r+σ

2

)h + e

−rh

,

and consider small h, then α is also small and Taylor expansion yields

2 + α

2 + · · · = 1 + (r + σ

2

)h + · · · + 1 − rh + · · · ,

that is,

(4.23) α

2 + · · · = σ

2h + · · · .

1Specifically, this is the formula for the inverse hyperbolic cosine.

42 BRAD BAXTER

Cox and Ross had the excellent idea of ignoring the messy higher order terms, since

the model is only an approximation in any case. Thus the Cox–Ross Binomial Model

chooses

(4.24) α = σh1/2

.

The corresponding equation for the probability p becomes

(4.25) p =

e

rh − e

−σh1/2

e

σh1/2 − e−σh1/2

It’s useful, but tedious, to Taylor expand the RHS of (4.25). We obtain

p =

1 + rh + · · · −

1 − σh1/2 +

1

2

σ

2h + · · ·

2

σh1/2 + σ

3h

3/2/6 + · · ·

=

σh1/2 + (r − σ

2/2)h + · · ·

2σh1/2 (1 + σ

2h/6 + · · ·)

=

1

2

1 + σ

−1h

1/2

(r − σ

2/2) + · · ·

1 + σ

2h/6 + · · ·

=

1

2

h1 + σ

−1h

1/2

(r − σ

2

/2) + · · ·

1 − σ

2h/6 + · · ·

i

=

1

2

h

1 + σ

−1h

1/2

(r − σ

2

/2) + · · · i

(4.26) ,

to highest order, so that

(4.27) 1 − p =

1

2

h

1 − σ

−1h

1/2

(r − σ

2

/2) + · · · i

,

It’s tempting to omit the higher order terms, but we would then lose risk neutrality

in our Binomial Model.

Is the Binomial Model consistent with the Geometric Brownian Motion universe

as h → 0? We shall now show that the definition of a sufficiently smooth European

option in the Binomial Model still implies the Black–Scholes PDE in the limit as

h → 0.

Proposition 4.3. Let f : [0, ∞)×[0, ∞) → R be an infinitely differentiable function

satisfying

(4.28) f(S, t − h) = e

−rh

pf(Seσh1/2

, t) + (1 − p)f(Se−σh1/2

, t))

,

for all h > 0, where p is given by (4.25). Then f satisfies the Black–Scholes PDE.

Proof. As usual, it is much more convenient to use log-space. Thus we define

u = log S and

g(u, t) = f(S, t).

Hence (4.28) becomes

(4.29) g(u, t − h) = e

−rh

pg(u + σh1/2

, t) + (1 − p)g(u − σh1/2

, t)

MATHEMATICAL AND NUMERICAL METHODS 43

Using (4.26) and (4.27) and omitting terms whose order exceeds h for clarity, we

obtain

g − hgt + · · · = e

−rh1

2

h

1 + h

1/2σ

−1

(r − σ

2

/2)i

g + σh1/2

gu +

1

2

σ

2hguu

+

1

2

h

1 − h

1/2σ

−1

(r − σ

2

/2)i

g − σh1/2

gu +

1

2

σ

2hguu

= e

−rh

g + h

h

(r − σ

2

/2)gu +

1

2

σ

2

guui

=

1 − rh + O(h

2

)

g + h

h

(r − σ

2

/2)gu +

1

2

σ

2

guui

= g + h

h

−rg + (r − σ

2

/2)gu +

1

2

σ

2

guui

.

(4.30)

Equating the O(h) terms on both sides of equation (4.30) yields the Black–Scholes

equation

−gt = −rg + (r − σ

2

/2)gu +

1

2

σ

2

guu.

4.1. The Binomial Model and Delta Hedging. We begin with (4.1), as before,

but this time do not impose risk neutrality to determine the parameters p and α.

Instead, we use a delta hedging argument.

At time tn−1 = (n − 1)h, we construct a new portfolio

(4.31) Πn−1 = f(Sn−1, tn−1) − ∆n−1Sn−1.

and we choose ∆n−1 so that the evolution of Πn−1 is deterministic. Now, at time

tn = nh, the portfolio Πn−1 has the new value

(4.32) Πn = f(Sn−1e

αXn , tn) − ∆n−1Sn−1e

αXn .

Thus Πn is deterministic if the two possible values of (4.32) are equal, that is,

(4.33) f(Sn−1e

α

, tn) − ∆n−1Sn−1e

α = f(Sn−1e

−α

, tn) − ∆n−1Sn−1e

−α

.

It is useful to introduce the notation

(4.34) f± = f(Sn−1e

±α

, tn).

Then (4.33) and (4.34) imply that

(4.35) ∆n−1Sn−1 =

f+ − f−

e

α − e−α

.

Thus the resulting portfolio values are given by

(4.36) Πn−1 = f(Sn−1, tn−1) −

f+ − f−

e

α − e−α

and

Πn = f(Sn−1e

α

, tn) −

f+ − f−

e

α − e−α

e

α

=

f−e

α − f+e

−α

e

α − e−α

(4.37) .

44 BRAD BAXTER

Now that the portfolio’s evolution from Πn−1 to Πn is deterministic, we must have

Πn = exp(rh)Πn−1, i.e.

(4.38) f−e

α − f+e

−α

e

α − e−α

= e

rh

f(Sn−1, tn−1) −

f+ − f−

e

α − e−α

.

The key point here is that f(Sn−1, tn−1) is a linear combination of f+ and f−.

Specifically, if we introduce

(4.39) P =

e

rh − e

−α

e

α − e−α

,

then (4.38) becomes

(4.40) f(Sn−1, tn−1

) = e

−rh (P f+ + (1 − P) f−).

Note that original model probability p does not occur in this formula: instead, it

is as if we had begun with the alternative binomial model

(4.41) Sn = Sn−1e

αYn ,

where the independent Bernoulli random variables Y1, Y2, . . . , Yn satisfy P(Yk =

1) = P and P(Yk = −1) = 1 − P, where P is given by (4.39). Indeed, we have

(4.42) ESn|Sn−1 = Sn−1Ee

αYn = Sn−1e

rh

.

Exercise 4.2. Prove that ESn|Sn−1 = Sn−1e

rh

.

4.2. ∆-Hedging for GBM. We begin with the real world asset price

(4.43) St = e

α+βt+σWt

,

where we do not assume there is any connection between the parameters α, β and

σ: this is not risk-neutral GBM. It is a simple exercise in Itˆo calculus (see Example

3.6) to prove that

(4.44) dSt = St

σdWt +

β + σ

2

/2

dt

and

(4.45) (dSt)

2 = σ

2S

2

t dt.

By analogy with delta hedging in the Binomial Model (4.31), let us assume that

St = S and define the portfolio

(4.46) Πt = f(St, t) − ∆St,

where ∆ is a constant. Then

Πt+dt = f(St+dt, t + dt) − ∆St+dt

= f(S + dSt, t + dt) − ∆S − ∆dSt

= f(S, t) + dStfS +

1

2

dS2

t

fSS + dtft − ∆S − ∆dSt

= Πt + dSt (fS − ∆) + 1

2

dS2

t

fSS + dtft

= Πt + dSt (fS − ∆) +

1

2

σ

2S

2

fSS + ft

(4.47) dt.

In other words, we have the infinitesimal increment

(4.48) Πt+dt − Πt = dΠt = dSt (fS − ∆) +

1

2

σ

2S

2

fSS + ft

d

MATHEMATICAL AND NUMERICAL METHODS 45

Thus we eliminate the stochastic dSt component by setting

(4.49) ∆ = fS

and (4.47) then becomes

(4.50) dΠt =

ft +

1

2

σ

2S

2

fSS

dt,

or

(4.51) dΠt

dt = ft +

1

2

σ

2S

2

fSS.

Now there is no stochastic component in (4.50), so we must also have

(4.52) dΠt

dt = rΠt = r (f − fSS),

because all deterministic assets must grow at the risk-free rate. Equating (4.51)

and (4.52) yields

(4.53) ft +

1

2

σ

2S

2

fSS = r (f − fSS),

or

(4.54) ft − rf + rfSS +

1

2

σ

2S

2

fSS = 0,

which is the Black–Scholes PDE.

It is often useful to restate the Black–Scholes PDE in terms of the logarithm of

the asset price, i.e. via S = e

x

. Thus

∂S

dS

dx = ∂x,

or

(4.55) S∂S = ∂x.

Hence

∂xx = S∂S (S∂S)

= S (∂S + S∂SS)

= S∂S + S

2

(4.56) ∂SS,

or, using (4.55),

(4.57) S

2

∂SS = ∂xx − ∂x.

Therefore substituting (4.55) and (4.57) in (4.54) yields

0 = ft − rf + rfx +

1

2

σ

2

(fxx − fx)

= ft − rf +

r − σ

2

/2

fx +

1

2

σ

2

(4.58) fxx.

(4.59

46 BRAD BAXTER

5. The Partial Differential Equation Approach

One important way to price options is to solve the Black–Scholes partial differential equation (PDE), or some variant of Black–Scholes. Hence we study the

fundamentals of the numerical analysis of PDEs.

5.1. The Diffusion Equation. The diffusion equation arises in many physical

and stochastic situations. In the hope that the baroque will serve as a mnemonic,

we shall model the diffusion of poison along a line. Let u(x, t) be the density of

poison at location x and time t and consider the stochastic model

(5.1) u(x, t) = Eu(x + σ

√

hZ, t − h), x ∈ R, t ≥ 0,

where σ is a positive constant and Z ∼ N(0, 1). The idea here is that the poison

molecules perform a random walk along the line, just as share prices do in time. If

we assume that u has sufficiently many derivatives, then we obtain

u(x, t)

= E u(x, t) + σ

√

h

∂u

∂xZ +

1

2

σ

2hZ2 ∂

2u

∂x2

+ O(h

3/2

) − h

∂u

∂t + O(h

2

)

= u(x, t) + h

1

2

σ

2 ∂

2u

∂x2

−

∂u

∂t

+ O(h

3/2

).

In other words, dividing by h, we obtain

1

2

σ

2 ∂

2u

∂x2

−

∂u

∂t

+ O(h

1/2

) = 0.

Letting h → 0, we have derived the diffusion equation

(5.2) 1

2

σ

2 ∂

2u

∂x2

=

∂u

∂t .

This important partial differential equation is often called the heat equation.

Exercise 5.1. The d-dimensional form of our stochastic model for diffusion is

given by

u(x, t) = Eu(x + σ

√

hZ, t − h), x ∈ R

d

, t ≥ 0.

Here Z ∈ R

d

is a normalized Gaussian random vector: its component are independent N(0, 1) random variables and its probability density function is

p(z) = (2π)

−d/2

exp(−kzk

2

/2), z ∈ R

d

.

Assuming u is sufficiently differentiable, prove that u satisfies the d-dimensional

diffusion equation

∂u

∂t =

σ

2

2

X

d

k=1

∂

2u

∂x2

k

.

Variations on the diffusion equation occur in many fields including, of course,

mathematical finance. For example, the neutron density2 N(x, t) in Uranium 235

or Plutonium approximately obeys the partial differential equation

∂N

∂t = αN + β

X

d

k=1

∂

2N

∂x2

k

.

2

In mathematical finance, we choose our model to avoid exponential growth, but this is not

always the aim in nuclear physics.

MATHEMATICAL AND NUMERICAL METHODS 47

In fact, the Black–Scholes PDE is really the diffusion equation in disguise, as we

shall now show. In log-space, we consider any solution f(S, t ˜ ) of the Black–Scholes

equation, that is,

(5.3) −rf + (r − σ

2

/2) ∂f

∂S˜

+

1

2

σ

2 ∂

2f

∂S˜2

+

∂f

∂t = 0.

The inspired trick, a product of four centuries of mathematical play with differential

equations, is to substitute

(5.4) f(S, t ˜ ) = u(S, t ˜ )e

αS˜+βt

and to find the PDE satisfied by u. Now

∂f

∂S˜

=

uα +

∂u

∂S˜

e

αS˜+βt

and

∂

2f

∂S˜2

=

uα2 + 2α

∂u

∂S˜

+

∂

2u

∂S˜2

e

αS˜+βt

.

Substituting in the Black–Scholes equation results in

−ru +

r − σ

2

/2

αu −

∂u

∂S˜

+

σ

2

2

α

2u + 2α

∂u

∂S˜

+

∂

2u

∂S˜2

+ βu +

∂u

∂t = 0,

or

0 =

1

2

σ

2 ∂

2u

∂x2

+

∂u

∂t +

∂u

∂S˜

(r − σ

2

/2) + ασ2

+u

−r + α(r − σ

2

/2) + α

2σ

2

/2 + β

.

We can choose α and β to be any real numbers we please. In particular, if we set

α = −σ

−2

(r − σ

2/2), then the ∂u/∂S˜ term vanishes. We can then solve for β to

kill the u term.

Exercise 5.2. Find the value of β that annihilates the u term.

The practical consequence of this clever trick is that every problem involving the

Black–Scholes PDE can be transformed into an equivalent problem for the diffusion

equation. Therefore we now study methods for solving the diffusion equation.

There is an analytic solution for the diffusion equation that is sometimes useful.

If we set h = t in (5.1), then we obtain

(5.5) u(x, t) = Eu(x + σ

√

tZ, 0),

that is,

u(x, t) = Z ∞

−∞

u(x + σ

√

tz, 0)(2π)

−1/2

exp(−z

2

(5.6) /2) dz

=

Z ∞

−∞

(5.7) u(x − w, 0)G(w, t) dw,

using the substitution w = −σ

√

tz, where

G(w, t) = (2πσ2

t)

−1/2

exp

−

w

2

2σ

2t

, w ∈ R.

This is called the Green’s function for the diffusion equation. Of course, we must

now evaluate the integral. As for European options, analytic solutions exist for

some simple cases, but numerical integration must be used in gener

48 BRAD BAXTER

5.2. Finite Difference Methods for the Diffusion Equation. The simplest

finite difference method is called explicit Euler and it’s a BAD method. Fortunately

the insight gained from understanding why it’s bad enables us to construct good

methods. There is another excellent reason for you to be taught bad methods and

why they’re bad: stupidity is a renewable resource. In other words, simple bad

methods are often rediscovered.

We begin with some finite difference approximations to the time derivative

∂u

∂t ≈

u(x, t + k) − u(x, t)

k

and the space derivative, using the second central difference

∂

2u

∂x2

≈

u(x + h, t) − 2u(x, t) + u(x − h, t)

h

2

.

Exercise 5.3. Show that

g(x + h) − 2g(x) + g(x − h)

h

2

= g

(2)(x) + h

2

12

g

(4)(x) + O(h

4

)

and find the next two terms in the expansion.

Our model problem for this section will be the zero boundary value problem:

∂u

∂t =

∂

2u

∂x2

, 0 ≤ x ≤ 1, t ≥ 0,

(5.8) u(x, 0) = f(x), 0 ≤ x ≤ 1, u(0, t) = u(1, t) = 0, t ≥ 0.

We now choose a positive integer M and positive numbers T and k. We then

set h = 1/M, N = T /k and generate a discrete approximation

{U

n

m : 0 ≤ m ≤ M, 0 ≤ n ≤ N, }

to the values of the solution u at the points of the rectangular grid

{(mh, nk) : 0 ≤ m ≤ M, 0 ≤ n ≤ N}

using the recurrence

(5.9) U

n+1

m = U

n

m + µ

U

n

m+1 − 2U

n

m + U

n

m−1

, n ≥ 0, 1 ≤ m ≤ M − 1,

where

(5.10) µ =

k

h

2

and the boundary values for u imply the relations

(5.11) U

n

0 = U

n

M = 0 and U

0

m = u(mh, 0), 0 ≤ m ≤ M.

This is called explicit Euler.

In matrix terms3

, we have

(5.12) Un = TUn−1

, n ≥ 1,

3How do we find T? Equation ((5.12)) implies

U

n

m = (TUn−1

)m =

MX−1

`=1

Tm`U

n−1

` = µUn

m−1 + (1 − 2µ)U

n

m + µUn

m+1

MATHEMATICAL AND NUMERICAL METHODS 49

where

(5.13) Un =

U

n

1

.

.

.

U

n

M−1

∈ R

M−1

and T ∈ R

(M−1)×(M−1) is the tridiagonal symmetric Toeplitz (TST) matrix defined

by

(5.14) T =

1 − 2µ µ

µ 1 − 2µ µ

.

.

.

.

.

.

.

.

.

µ 1 − 2µ µ

µ 1 − 2µ

Hence

(5.15) Un = T

nU0

.

Unfortunately, explicit Euler is an unstable method unless µ ≤ 1/2. In other

words, the numbers {U

n

m : 1 ≤ m ≤ M − 1} grow exponentially as n → ∞. Here’s

an example using Matlab.

Example 5.1. The following Matlab fragment generates the explicit Euler approximations.

% Choose our parameters

mu = 0.7;

M=100; N=20;

% Pick (Gaussian) random initial values

uold = randn(M-1,1);

% construct the tridiagonal symmetric Toeplitz matrix T

T = (1-2*mu)*diag(ones(M-1,1)) + mu*( diag(ones(M-2,1),1) + diag(ones(M-2,1),-1) );

% iterate and plot

plot(uold)

hold on

for k=1:N

unew = T*uold;

plot(unew)

uold = unew;

end

If we run the above code for M = 6 and

U0 =

−0.034942

0.065171

−0.964159

0.406006

−1.450787

,

50 BRAD BAXTER

then we obtain

U20 =

−4972.4

8614.6

−9950.7

8620.5

−4978.3

.

Further, kU40k = 2.4 × 108

. The exponential instability is obvious. Experiment

with different values of µ, M and N.

The restriction µ ≤ 1/2, that is k ≤ h

2/2, might not seem particularly harmful at first. However, it means that small h values require tiny k values, and tiny

timesteps imply lots of work: an inefficient method. Now let’s derive this stability requirement. We begin by studying a more general problem based on (5.15).

Specifically, let A ∈ R

n×n be any symmetric matrix4

. Its spectral radius ρ(A) is

simply its largest eigenvalue in modulus, that is,

(5.16) ρ(A) = max{|λ1|, |λ2|, . . . , |λn|}.

Theorem 5.1. Let A ∈ R

n×n be any symmetric matrix and define the recurrence

xk = Axk−1, for k ≥ 1, where x0 ∈ R

n can be any initial vector.

(i) If ρ(A) < 1, then limk→∞ kxkk = 0, for any initial vector x0 ∈ R
n.
(ii) If ρ(A) ≤ 1, then the norms of the iterates kx1k, kx2k, . . . remain bounded.
(iii) If ρ(A) > 1, then we can choose x0 ∈ R

n such that limk→∞ kxkk = ∞.

Proof. We use the spectral decomposition introduced in Theorem 3.15, so that

A = QDQT

, where Q ∈ R

n×n is an orthogonal matrix and D ∈ R

n×n is a diagonal

matrix whose diagonal elements are the eigenvalues λ1, . . . , λn of A. Then

xk = Axk−1 = A

2xk−2 = · · · = A

kx0

and

A

k =

QDQT

QDQT

· · ·

QDQT

= QDkQ

T

.

Hence

xk = QDkQ

T x0

and, introducing zk := QT xk, we obtain

zk = Dk

z0,

and it is important to observe that kzkk = kQT xkk = kxkk, because QT

is an

orthogonal matrix. Since D is a diagonal matrix, this matrix equation is simply n

linear recurrences, namely

zk(`) = λ

k

`

z0(`), ` = 1, 2, . . . , n.

The following consequences are easily checked.

(i) If ρ(A) < 1, then each of these scalar sequences tends to zero, as k → ∞,
which implies that kxkk → 0.
(ii) If ρ(A) = 1, then each of these scalar sequences is bounded, which implies
that the sequence kxkk remains bounded.
4All of this theory can be generalized to nonsymmetric matrices using the Jordan canonical
form, but this advanced topic is not needed in this cour
MATHEMATICAL AND NUMERICAL METHODS 51
(iii) If ρ(A) > 1, then there is at least one eigenvalue, λi say, for which |λi

| > 1.

Hence, if z0(i) 6= 0, then the sequence |zk(i)| = |λ

k

i

z0(i)| → ∞, as k → ∞.

Definition 5.1. Let A ∈ R

n×n be any symmetric matrix. We say that A is spectrally stable if its spectral radius satisfies ρ(A) ≤ 1.

Example 5.2. Let A ∈ R

n×n be a symmetric matrix whose distinct eigenvalues

are 0.1, 0.1, . . . , 0.1, 10. If x0 ∈ R

n contains no component of the eigenvector corresponding to the eigenvalue 10, i.e. z0(n) = 0, then, in exact arithmetic, we shall

still obtain kxkk → 0, as k → ∞. However, a computer uses finite precision arithmetic, which implies that, even if z0(n) = 0, it is highly likely that z0(m) 6= 0,

for some m > 0, since the matrix-vector product is not computed exactly. This

(initially small) nonzero component will grow exponentially.

Theorem 5.1 is only useful when we can deduce the magnitude of the spectral

radius. Fortunately, this is possible for an important class of matrices.

Definition 5.2. We say that a matrix T(a, b) ∈ R

m×m is tridiagonal, symmetric

and Toeplitz (TST) if it has the form

(5.17) T(a, b) =

a b

b a b

.

.

.

.

.

.

.

.

.

b a b

b a

, a, b ∈ R.

TST matrices arise naturally in many applications. Fortunately they’re one of the

few nontrivial classes of matrices for which the eigenvalues and eigenvectors can

be analytically determined rather easily. In fact, every TST matrix has the same

eigenvectors, because

(5.18) T(a, b) = aI + 2bT0,

where T0 = T(0, 1/2) (this is not a recursive definition, simply an observation given

(5.18)). Hence, if T0v = λv, then T(a, b)v = (a + 2bλ)v. Thus we only need to

study T0.

In fact, every eigenvalue of T0 lies in the interval [−1, 1]. The proof is interesting

because it’s our only example of using a different norm. For any vector w ∈ R

m,

we define its infinity norm to be

kwk∞ = max{|w1|, |w2|, . . . , |wm|}.

Exercise 5.4. Show that

kT0zk∞ ≤ kzk∞,

for any vector z ∈ R

m.

We shall state our result formally for ease of reference.

Lemma 5.2. Every eigenvalue λ of T0 satisfies |λ| ≤ 1.

Proof. If T0v = λv, then

|λ|kvk∞ = kλvk∞ = kT0vk∞ ≤ kvk∞.

Hence |λ| ≤ 1.

52 BRAD BAXTER

Proposition 5.3. The eigenvalues of T0 ∈ R

m×m are given by

(5.19) λj = cos

jπ

m + 1

, j = 1, . . . , m,

and the corresponding eigenvector v

(j) has components

(5.20) v

(j)

k = sin

πjk

m + 1

, j, k = 1, . . . , m.

Proof. Suppose v is an eigenvector for T0, so that

vj+1 + vj−1 = 2λvj , 2 ≤ j ≤ m − 1,

and

v2 = 2λv1, vm−1 = 2λvm.

Thus the elements of the vector v are m values of the recurrence relation defined

by

vj+1 + vj−1 = 2λvj , j ∈ Z,

where v0 = vm+1 = 0. Here’s a rather slick trick: we know that |λ| ≤ 1, and a

general theoretical result states that the eigenvalues of a real symmetric matrix are

real, so we can write λ = cos θ, for some θ ∈ R. The associated equation for this

recurrence is therefore the quadratic

t

2 − 2t cos θ + 1 = 0

which we can factorize as

t − e

iθ t − e

−iθ

= 0.

Thus the general solution is

vj = reijθ + se−ijθ, j ∈ Z,

where r and s can be any complex numbers. But v0 = 0 implies s = −r, so we

obtain

vj = sin jθ, j ∈ Z,

on using the fact that every multiple of a sequence satisfying the recurrence is

another sequence satisfying the recurrence. The only other condition remaining to

be satisfied is vm+1 = 0, so that

sin ((m + 1)θ) = 0,

which implies (m + 1)θ is some integer multiple of π.

Exercise 5.5. Prove that the eigenvectors given in Proposition 5.3 are orthogonal

by direct calculation.

The spectral radius of the matrix T driving explicit Euler, defined by (5.15), is

now an easy consequence of our more general analysis of TST matrices.

Corollary 5.4. Let T ∈ R

(M−1)×(M−1) be the matrix driving explicit Euler, defined

by (5.15). Then ρ(T) ≤ 1 if and only if µ ≤ 1/2. Hence explicit Euler is spectrally

stable if and only if µ ≤ 1

MATHEMATICAL AND NUMERICAL METHODS 53

Proof. We need only observe that T = T(1−2µ, µ), so that Proposition 5.3 implies

that its eigenvalues are

λk = 1 − 2µ + 2µ cos(πk

M

)

= 1 − 4µ sin2

πk

2M

, k = 1, 2, . . . , M − 1.

Thus ρ(T) ≤ 1 if and only if µ ≤ 1/2, for otherwise |1 − 4µ| > 1.

We can also use TST matrices to understand implicit Euler: here we use

(5.21) U

n+1

m = U

n

m + µ

U

n+1

m+1 − 2U

n+1

m + U

n+1

m−1

, 1 ≤ m ≤ M − 1.

In matrix form, this becomes

(5.22) T(1 + 2µ, −µ)Un+1 = Un

,

using the notation of (5.17). Before using Proposition 5.3 to derive its eigenvalues,

we need a simple lemma.

Lemma 5.5. Let A ∈ R

n×n be any symmetric matrix, having spectral decomposition A = QDQT

. Then A−1 = QD−1QT

.

Proof. This is a very easy exercise.

Proposition 5.6. Implicit Euler is spectrally stable for all µ ≥ 0.

Proof. By Proposition 5.3, the eigenvalues of T(1 + 2µ, −µ) are given by

λk = 1 + 2µ − 2µ cos(πk

M

)

= 1 + 4µ sin2

(

πk

2M

).

Thus every eigenvalue of T(1 + 2µ, −µ) exceeds 1, which implies (by Lemma 5.5)

that every eigenvalue of its inverse lies in the interval (0, 1). Thus implicit Euler is

spectrally stable for all µ ≥ 0.

We have yet to prove that the answers produced by these methods converge to

the true solution as h → 0. We illustrate the general method using explicit Euler,

for µ ≤ 1/2, applied to the diffusion equation on [0, 1] with zero boundary (5.8). If

we define the error

(5.23) E

n

m := U

n

m − u(mh, nk).

then

(5.24) E

n+1

m − E

n

m − µ

E

n

m+1 − 2E

n

m + E

n

m−1

= L(x, t),

where the Local Truncation Error (LTE) L(x, t) is defined by

(5.25) L(x, t) = u(x, t + k) − u(x, t) − µ (u(x + h, t) − 2u(x, t) + u(x − h, t)),

recalling that, by definition,

(5.26) 0 = U

n+1

m − U

n

m − µ

U

n

m+1 − 2U

n

m + U

n

m−1

, 1 ≤ m ≤ M −

54 BRAD BAXTER

Thus we form the LTE by replacing U

n

m by u(x, t) in (5.26)5

. Taylor expanding and

recalling that k = µh2

, we obtain

L(x, t) = kut(x, t) + O(k

2

) − µ

h

2uxx(x, t) + O(h

4

)

= µh2uxx(x, t) − µh2uxx(x, t) + O(h

4

)

= O(h

4

(5.27) ),

using the fact that ut = uxx. Now choose a time interval [0, T]. Since L(x, t) is a

continuous function, (5.27) implies the inequality

(5.28) |L(x, t)| ≤ Ch4

, for 0 ≤ x ≤ 1 and 0 ≤ t ≤ T,

where the constant C depends on T. Further, rearranging (5.24) yields

(5.29) E

n+1

m = E

n

m + µ

E

n

m+1 − 2E

n

m + E

n

m−1

+ L(x, t),

and applying inequality (5.28), we obtain

(5.30) |E

n+1

m | ≤ (1 − 2µ)|E

n

m| + µ|E

n

m+1| + µ|E

n

m−1

| + Ch4

,

because 1 − 2µ ≥ 0 for µ ≤ 1/2. If we let ηn denote the maximum modulus error

at time nk, i.e.

(5.31) ηn = max{|E

n

1

|, |E

n

2

|, . . . , |E

n

M−1

|}

then (5.31) implies

(5.32) |E

n+1

m | ≤ (1 − 2µ)ηn + 2µηn + Ch4 = ηn + Ch4

,

whence

(5.33) ηn+1 ≤ (1 − 2µ)ηn + 2µηn + Ch4 = ηn + Ch4

.

Therefore, recurring (5.33)

(5.34) ηn ≤ ηn−1 + Ch4 ≤ ηn−2 + 2Ch4 ≤ · · · ≤ η0 + nCh4 = Cnh4

,

since E0

m ≡ 0. Now

(5.35) n ≤ N :=

T

k

=

T

µh2

,

so that (5.34) and (5.35) jointly provide the upper bound

(5.36) |U

n

m − u(mh, nk)| ≤ CT

µ

h

2

,

for 1 ≤ m ≤ M − 1 and 0 ≤ n ≤ N. Hence we have shown that the explicit Euler

approximation has uniform O(h

2

) convergence for 0 ≤ t ≤ T. The key here is the

order, not the constant in the bound: halving h reduces the error uniformly by 4.

Exercise 5.6. Refine the expansion of the LTE in (5.27) to obtain

L(x, t) = kut(x, t)+1

2

k

2utt(x, t)+O(k

3

)−µ

h

2uxx(x, t) + 1

12

h

4uxxxx(x, t) + O(h

6

)

.

Hence prove that

L(x, t) = 1

2

µh4utt(x, t) (µ − 1/6) + O(h

6

).

5You will see the same idea in the next section, where this will be called the associated functional equatio

MATHEMATICAL AND NUMERICAL METHODS 55

Hence show that, if µ = 1/6 in explicit Euler, we obtain the higher-order uniform

error

|U

n

m − u(mh, nk)| ≤ Dh4

,

for 1 ≤ m ≤ M − 1 and 0 ≤ n ≤ T /k, where D depends on T.

Implicit Euler owes its name to the fact that we must solve linear equations to

obtain the approximations at time (n + 1)h from those at time nh. This linear

system is tridiagonal, so Gaussian elimination only requires O(n) time to complete,

rather than the O(n

3

) time for a general n × n matrix. In fact, there is a classic

method that provides a higher order than implicit Euler together with excellent

stability: Crank–Nicolson is the implicit method defined by

U

n+1

m −

µ

2

U

n+1

m+1 − 2U

n+1

m + U

n+1

m−1

= U

n

m +

µ

2

U

n

m+1 − 2U

n

m + U

n

m−1

(5.37) .

In matrix form, we obtain

(5.38) T(1 + µ, −µ/2)Un+1 = T(1 − µ, µ/2)Un

,

or

(5.39) Un+1 = T(1 + µ, −µ/2)−1T(1 − µ, µ/2)Un

.

Now every TST matrix has the same eigenvectors. Thus the eigenvalues of the

product of TST matrices in (5.39) are given by

(5.40) λk =

1 − µ + µ cos( πk

M )

1 + µ − µ cos πk

M

=

1 − 2µ sin2

(

πk

2M )

1 + 2µ sin2

(

πk

2M )

.

Hence |λk| ∈ (0, 1) for all µ ≥ 0.

Exercise 5.7. Calculate the LTE of Crank-Nicolson when h = k.

5.3. The Fourier Transform and the von Neumann Stability Test. Given

any univariate function f : R → R for which the integral

(5.41) Z ∞

−∞

|f(x)| dx

is finite, we define its Fourier transform by the relation

(5.42) fb(z) = Z ∞

−∞

f(x) exp(−ixz) dx, z ∈ R.

The Fourier transform is used in this course to understand stability properties,

solve some partial differential equations and calculate the local truncation errors

for finite difference methods. It can also be used to derive analytic values of certain

options, as well as providing several key numerical methods.

Proposition 5.7. (i) Let

(5.43) Taf(x) = f(x + a), x ∈ R.

We say that Taf is the translate of f by a. Then

(5.44) Tdaf(z) = exp(iaz)fb(z), z ∈ R.

(ii) The Fourier transform of the derivative is given by

(5.45) fb0(z) = izfb(z), z ∈

56 BRAD BAXTER

Proof. (i)

Tdaf(z) = Z ∞

−∞

Taf(x)e

−ixz dx

=

Z ∞

−∞

f(x + a)e

−ixz dx

=

Z ∞

−∞

f(y)e

−i(y−a)z

dy

= e

iazfb(z).

(ii) Integrating by parts and using the fact that limx→±∞ f(x) = 0, which is

a consequence of (5.41), we obtain

fb0(z) = Z ∞

−∞

f

0

(x)e

−ixz dx

=

f(x)e

−ixzx=∞

x=−∞

−

Z ∞

−∞

f(x)

−ize−ixz

dx

= izfb(z).

Exercise 5.8. We have fd(2)(z) = (iz)

2fb(z) = −z

2fb(z). Find fd(k)(z)

Many students will have seen some use of the Fourier transform to solve differential equations. It is also vitally important to finite difference operators.

Example 5.3. Let’s analyse the second order central difference operator using the

Fourier transform. Thus we take

g(x) = f(x + h) − 2f(x) + f(x − h)

h

2

.

and observe that

gb(z) = h

−2

e

ihz − 2 + e

−ihz

fb(z) = 2h

−2

(cos(hz) − 1) fb(z).

Now6

cos(hz) = 1 −

h

2

z

2

2

+

h

4

z

4

4! − · · · ,

so that

gb(z) = 2h

−2

−

h

2

z

2

2

+

h

4

z

4

4! − · · ·

fb(z)

= −z

2

fb(z) + h

2

z

4

12

fb(z) + · · ·

= fd(2)(z) + h

2

fd(4)(z)

12

+ · · · .

Taking the inverse transform, we have computed the Taylor expansion of g:

g(x) = f

(2)(x) + (1/12)h

2

f

(4)(x) + · · · .

6Commit this Taylor expansion to memory if you don’t already know i

MATHEMATICAL AND NUMERICAL METHODS 57

Of course, there’s no need to use the Fourier transform to analyse the second

order central difference operator, but we have to learn to walk before we can run!

We shall also need the Fourier transform for functions of more than one variable.

For any bivariate function f(x1, x2) that tends to zero sufficiently rapidly at infinity,

we define

(5.46) fb(z1, z2) = Z ∞

−∞

Z ∞

−∞

f(x1, x2)e

−i(x1z1+x2z2)

dx1dx2, z1, z2 ∈ R.

In fact, it’s more convenient to write this using a slightly different notation:

(5.47) fb(z) = Z

R2

f(x) exp(−ix

T

z) dx, z ∈ R

2

.

This is still a double integral, although only one integration sign is used. Similarly,

for a function f(x), x ∈ R

n, we define

(5.48) fb(z) = Z

Rn

f(x) exp(−ix

T

z) dx, z ∈ R

n

.

Here

x

T

z =

Xn

k=1

xkzk, x, z ∈ R

n

.

The multivariate version of Proposition 5.7 is as follows.

Proposition 5.8. (i) Let

(5.49) Taf(x) = f(x + a), x ∈ R

n

.

We say that Taf is the translate of f by a. Then

(5.50) Tdaf(z) = exp(ia

T

z)fb(z), z ∈ R

n

.

Further, if α1, . . . , αn are non-negative integers and |α| = α1 + · · · + αn,

then

(ii)

(5.51) \∂

|α|f

∂xα1

1

∂xα2

2

· · · ∂xαn

n

(z) = (iz1)

α1

(iz2)

α2

· · ·(izn)

αn fb(z), z ∈ R

n

.

Proof. The proof is not formally examinable, but very similar to the multivariate

result.

5.4. Stability and the Fourier Transform. We can also use Fourier analysis to

avoid eigenanalysis when studying stability. We shall begin abstractly, but soon

apply the analysis to explicit and implicit Euler for the diffusion equation.

Suppose we have two sequences {uk}k∈Z and {vk}k∈Z related by

(5.52) X

k∈Z

bkvk =

X

k∈Z

akuk.

In most applications, uk ≈ u(kh), for some underlying function, so we study the

associated functional equation

(5.53) X

k∈Z

bkv(x + kh) = X

k∈Z

aku(x + kh).

58 BRAD BAXTER

The advantage of widening our investigation is that we can use the Fourier transform

to study (5.53). Specifically, we have

(5.54) vb(z)

X

k∈Z

bke

ikhz = ub(z)

X

k∈Z

ake

ikhz

,

or

(5.55) vb(z) =

a(hz)

b(hz)

ub(z) =: R(hz)ub(z),

where

(5.56) a(w) = X

k∈Z

ake

ikw and b(w) = X

k∈Z

bke

ikw.

Example 5.4. For explicit Euler, we have

vk = µuk+1 + (1 − 2µ)uk + µuk−1, k ∈ Z,

so that the associated functional equation is

v(x) = µu(x + h) + (1 − 2µ)u(x) + µu(x − h), x ∈ R,

whose Fourier transform is given by

vb(z) =

µeihz + 1 − 2µ + µe−ihz

ub(z)

= (1 − 2µ(1 − cos(hz))) ub(z)

=

1 − 4µ sin2

(hz/2)

(5.57) ub(z).

Thus vb(z) = r(hz)ub(z), where r(w) = 1 − 4µ sin2

(w/2).

When we advance forwards n steps in time using explicit Euler, we obtain in

Fourier transform space

(5.58) ucn(z) = r(hz)u[n−1(z) = (r(hz))2

u[n−2(z) = · · · = (r(hz))n

uc0(z).

Thus, if |r(w)| < 1, for all w ∈ R,then limn→∞ ucn(z) = 0, for all z ∈ R. However, if
|r(hz0)| > 1, then, by continuity, |r(hz)| > 1 for z sufficiently close to z0. Further,

since r(hz) is periodic, with period π/h, we deduce that |r(hz)| > 1 on π/h-integer

shifts of an interval centred at z0. Hence limn→∞ ucn(z) = ∞. Further, there is an

intimate connection between u and ub in the following sense.

Theorem 5.9 (Parseval’s Theorem). If f : R → R is continuous, then

(5.59) Z ∞

−∞

|f(x)|

2

dx =

1

2π

Z ∞

−∞

|fb(z)|

2

dz.

Proof. Not examinable.

Hence limn→∞ ucn(z) = ∞ implies

limn→∞ Z ∞

−∞

|un(x)|

2

dx = ∞.

this motivated the brilliant Hungarian mathematician John von Neumann to analyse the stability of finite difference operators via the Fourier transform.

Definition 5.3. If |r(hz)| ≤ 1, for all z ∈ R, then we say that the finite difference

operator is von Neumann stable, or Fourier stable.

Theorem 5.10. Explicit Euler is von Neumann stable if and only if µ ≤ 1/2, while

implicit Euler is von Neumann stable for all µ >

MATHEMATICAL AND NUMERICAL METHODS 59

Proof. For explicit Euler, we have already seen that

ucn(z) = r(hz)u[n−1(z),

where

r(w) = 1 − 4µ sin2

(w/2).

Thus |r(w)| ≤ 1, for all w ∈ R, if and only if |1 − 4µ| ≤ 1, i.e. µ ≤ 1/2.

For implicit Euler, we have the associated functional equation

un+1(x) = un(x) + µ (un+1(x + h) − 2un+1(x) + un+1(x − h)), x ∈ R.

Hence

−µeihz + (1 + 2µ) − µe−ihz

u[n+1(z) = ucn(z), z ∈ R,

or

1 + 4µ sin2

(hz/2)

u[n+1(z) = ucn(z), z ∈ R.

Therefore

u\n+1(z) = 1

1 + 4µ sin2

(hz/2)

ucn(z) =: r(hz)ucn(z), z ∈ R,

and 0 ≤ r(hz) ≤ 1, for all z ∈ R.

Exercise 5.9. Prove that Crank–Nicolson (5.37) is von Neumann stable for all

µ ≥ 0.

5.5. Option Pricing via the Fourier transform. The Fourier transform can

also be used to calculate solutions of the Black–Scholes equation, and its variants,

and this approach provides a powerful analytic and numerical technique.

We begin with the Black–Scholes equation in “log-space”:

(5.60) 0 = −rg + (r − σ

2

/2)gx + (σ

2

/2)gxx + gt,

where the asset price S = e

x and subscripts denote partial derivatives. We now let

gb(z, t) denote the Fourier transform of the option price g(x, t) at time t, that is,

(5.61) gb(z, t) = Z ∞

−∞

g(x, t)e

−ixz dx, z ∈ R,

The Fourier transform of (5.60) is therefore given by

(5.62) 0 = −rgb + iz(r − σ

2

/2)gb −

1

2

σ

2

z

2

gb + gbt.

In other words, we have, for each fixed z ∈ R, the ordinary differential equation

(5.63) gbt = −

−r + iz(r − σ

2

/2) −

1

2

σ

2

z

2

g, , b

with solution

(5.64) gb(z, t) = gb(z, t0)e

−

−r+iz(r−σ

2/2)− 1

2

σ

2

z

2

(t−t0)

.

When pricing a European option, we know the option’s expiry value g(x, T) and

wish to calculate its initial price g(x, 0). Substituting t = T and t0 = 0 in (5.64),

we therefore obtain

(5.65) gb(z, 0) = e

−r+iz(r−σ

2/2)− 1

2

σ

2

z

2

T

gb(z, T

60 BRAD BAXTER

In order to apply this, we shall need to know the Fourier transform of a Gaussian.

Proposition 5.11. Let G(x) = exp(−λkxk

2

), for x ∈ R

d

, where λ is a positive

constant. Then its Fourier transform is the Gaussian

(5.66) Gb(z) = (π/λ)

d/2

exp(−kzk

2

/(4λ)), z ∈ R

d

.

Proof. It’s usual to derive this result via contour integration, but here is a neat

proof via Itˆo’s lemma and Brownian motion. Let c ∈ C be any complex number

and define the stochastic process Xt = exp(cWt), for t ≥ 0. Then a straightforward

application of Itˆo’s lemma implies the relation

dXt = Xt

cdWt + (c

2

/2)dt

.

Taking expectations and defining m(t) = EXt, we obtain the differential equation

m0

(t) = (c

2/2)m(t), whence m(t) = exp(c

2

t/2). In other words,

Ee

cWt = e

c

2

t/2

,

which implies, on recalling that Wt ∼ N(0, t) and setting α = ct1/2

,

Ee

αZ = e

α

2/2

,

for any complex number α ∈ C.

Corollary 5.12. The Fourier transform of the univariate Gaussian probability

density function

p(x) = (2πσ2

)

−1/2

e

−x

2/(2σ

2

)

, x ∈ R,

is

pb(z) = e

−σ

2

z

2/2

, z ∈ R.

Proof. We simply set λ = 1/(2σ

2

) in Proposition 5.11.

Exercise 5.10. Calculate the Fourier transform of the multivariate Gaussian probability density function

p(x) = (2πσ2

)

−d/2

e

−kxk

2/(2σ

2

)

, x ∈ R

d

.

The cumulative distribution function (CDF) for the Gaussian probability density

N(0, σ2

) is given by

(5.67) Φσ2 (x) = Z x

−∞

(2πσ2

)

−1/2

e

−y

2/(2σ

2

)

dy, x ∈ R.

Thus the fundamental theorem of calculus implies that

Φ

0

σ2 (x) = (2πσ2

)

−1/2

e

−x

2/(2σ

2

)

.

Exercise 5.11. Calculate the price of the option whose expiry price is given by

f(S(T), T) = (

1 if a ≤ S(T) ≤ b,

0 otherwise.

In other words, this option is simply a bet that pays £1 if the final asset price lies

in the interval [a, b]
MATHEMATICAL AND NUMERICAL METHODS 61

5.6. Fourier Transform Conventions. There are several essentially identical

Fourier conventions in common use, but their minor differences are often confusing.

The most general definition is

(5.68) fb(z) = A

Z ∞

−∞

f(x)e

iCxz dx,

where A and C are nonzero real constants. The Fourier Inversion Theorem then

takes the form

(5.69) f(x) = A−1C

2π

Z sign(C)∞

− sign(C)∞

fb(z)e

−iCxz dx,

where

sign(C) = (

1 C > 0,

−1 C < 0.
Example 5.5. The following four cases are probably the most commonly encountered.
(i) C = −1, A = 1:
(
fb(z) = R ∞
−∞ f(x)e
−ixz dx
f(x) = −1
2π
R −∞
+∞ fb(z)e
ixz dz =
1
2π
R ∞
−∞ fb(z)e
ixz dz.
(ii) C = 2π, A = 1:
(
fb(z) = R ∞
−∞ f(x)e
2πixz dx
f(x) = R ∞
−∞ fb(z)e
−2πixz dz.
(iii) C = 1, A = 1/
√
2π:
(
fb(z) = √
1
2π
R ∞
−∞ f(x)e
ixz dx
f(x) = √
1
2π
R ∞
−∞ fb(z)e
−ixz dz.
(iv) C = 1, A = 1:
(
fb(z) = R ∞
−∞ f(x)e
ixz dx
f(x) = 1
2π
R ∞
−∞ fb(z)e
−ixz dz.
It’s not hard to show that
(5.70) Tdaf(z) = e
−iaCzfb(z)
and
(5.71) fb0(z) = −iCzfb(z),
where Taf(x) = f(x + a), for any a ∈ C.
Example 5.6. For the same four examples given earlier, we obtain the following
shifting and differentiation formulae.
(i) C = −1, A = 1:
Tdaf(z) = e
iazfb(z), fb0(z) = izfb(z).
(ii) C = 2π, A = 1:
Tdaf(z) = e
−2πiazfb(z), fb0(z) = −2πizfb(z).
62 BRAD BAXTER
(iii) C = 1, A = 1/
√
2π:
Tdaf(z) = e
−iazfb(z), fb0(z) = −izfb(z).
(iv) C = 1, A = 1:
Tdaf(z) = e
−iazfb(z), fb0(z) = −izfb(z).
Which, then, should we choose? It’s entirely arbitrary but, once made, the choice
is likely to be permanent, since changing convention greatly increases the chance
of algebraic errors. I have chosen C = −1 and A = 1 in lectures, mainly because
it’s probably the most common choice in applied mathematics. It was also the
convention chosen by my undergraduate lecturers at Cambridge, so the real reason
is probably habit!
MATHEMATICAL AND NUMERICAL METHODS 63
6. Mathematical Background Material
I’ve collected here a miscellany of mathematical methods used (or reviewed)
during the course.
6.1. Probability Theory. You may find my more extensive notes on Probability
Theory useful:
http://econ109.econ.bbk.ac.uk/brad/Probability_Course/probnotes.pdf
A random variable X is said to have (continuous) probability density function
p(t) if
(6.1) P(a < X < b) = Z b
a
p(t) dt.
We shall assume that p(t) is a continuous function (no jumps in value). In particular, we have
1 = P(X ∈ R) = Z ∞
−∞
p(t) dt.
Further, because
0 ≤ P(a < X < a + δa) = Z a+δa
a
p(t) dt ≈ p(a)δa,
for small δa, we conclude that p(t) ≥ 0, for all t ≥ 0. In other words, a probability
density function is simply a non-negative function p(t) whose integral is one. Here
are two fundamental examples.
Example 6.1. The Gaussian probability density function, with mean µ and variance σ
2
, is defined by
(6.2) p(t) = (2πσ2
)
−1/2
exp
−
(t − µ)
2
2σ
2
.
We say that the Gaussian is normalized if µ = 0 and σ = 1.
To prove that this is truly a probability density function, we require the important identity
(6.3) Z ∞
−∞
e
−Cx2
dx =
p
π/C,
which is valid for any C > 0. [In fact it’s valid for any complex number C whose

real part is positive.]
Example 6.2. The Cauchy probability density function is defined by

(6.4) p(t) = 1

π(1 + t

2)

.

This distribution might also be called the Mad Machine Gunner distribution; imagine our killer sitting at the origin of the (x, y) plane. He7

is firing (at a constant

rate) at the infinite line y = 1, his angle θ (with the x-axis) of fire being uniformly

distributed in the interval (0, π). Then the bullets have the Cauchy density.

7The sexism is quite accurate, since males produce vastly more violent psychopaths than

females.

64 BRAD BAXTER

If you draw some graphs of these probability densities, you should find that,

for small σ, the graph is concentrated around the value µ. For large σ, the graph

is rather flat. There are two important definitions that capture this behaviour

mathematically.

Definition 6.1. The mean, or expected value, of a random variable X with p.d.f

p(t) is defined by

(6.5) EX := Z ∞

−∞

tp(t) dt.

It’s very common to write µ instead EX when no ambiguity can arise. Its variance

Var X is given by

(6.6) Var X := Z ∞

−∞

(t − µ)

2

p(t) dt.

Exercise 6.1. Show that the Gaussian p.d.f. really does have mean µ and variance

σ

2

.

Exercise 6.2. What happens when we try to determine the mean and variance of

the Cauchy probability density defined in Example 6.4?

Exercise 6.3. Prove that Var X = E(X2

) − (EX)

2

.

We shall frequently have to calculate the expected value of functions of random

variables.

Theorem 6.1. If

Z ∞

−∞

|f(t)|p(t) dt

is finite, then

(6.7) E (f(X)) = Z ∞

−∞

f(t)p(t) dt.

Example 6.3. Let X denote a normalized Gaussian random variable. We shall

show that

(6.8) Ee

λX = e

λ

2/2

,

Indeed, applying (6.7), we have

Ee

λX =

Z ∞

−∞

e

λt(2π)

−1/2

e

−t

2/2

dt = (2π)

−1/2

Z ∞

−∞

e

− 1

2

(t

2−2λt)

dt.

The trick now is to complete the square in the exponent, that is,

t

2 − 2λt = (t − λ)

2 − λ

2

.

Thus

Ee

λX = (2π)

−1/2

Z ∞

−∞

exp

−

1

2

([t − λ]
2 − λ

2

)

dt = e

λ

2/2

.

Exercise 6.4. Let W be any Gaussian random variable with mean zero. Prove

that

(6.9) E

e

W

= e

1

2

E(W2

)

MATHEMATICAL AND NUMERICAL METHODS 65

6.2. Differential Equations. A differential equation, or ordinary differential equation (ODE), is simply a functional relationship specifying first, or higher derivatives,

of a function; the order of the equation is just the degree of its highest derivatives.

For example,

y

0

(t) = 4t

3 + y(t)

2

is a univariate first-order differential equation, whilst or

y

0

(t) = Ay(t),

where y(t) ∈ R

d and A ∈ R

d×d

is a first-order differential equation in d-variables. A

tiny class of differential equations can be solved analytically, but numerical methods

are required for the vast majority. The numerical analysis of differential equations

has been one of the most active areas of research in computational mathematics

since the 1960s and excellent free software exists. It is extremely unlikely that any

individual can better this software without years of effort and scholarship, so you

should use this software for any practical problem. You can find lots of information

at www.netlib.org and www.nr.org. This section contains the minimum relevant

theory required to make use of this software.

You should commit to memory one crucial first-order ODE:

Proposition 6.2. The general solution to

(6.10) y

0

(t) = λy(t), t ∈ R,

where λ can be any complex number, is given by

(6.11) y(t) = c exp(λt), t ∈ R.

Here c ∈ C is a constant. Note that c = y(0), so we can also write the equation as

y(t) = y(0) exp(λt).

Proof. If we multiply the equation y

0 − λy = 0 by the integrating factor exp(−λt),

then we obtain

0 =

d

dt (y(t) exp(−λt)),

that is

y(t) exp(−λt) = c,

for all t ∈ R.

In fact, there’s a useful slogan for ODEs: try an exponential exp(λt) or use

reliable numerical software.

Example 6.4. If we try y(t) = exp(λt) as a trial solution in

y

00 + 2y

0 − 3y = 0,

then we obtain

0 = exp(λt)

λ

2 + 2λ − 3

.

Since exp(λt) 6= 0, for any t, we deduce the associated equation

λ

2 + 2λ − 3 = 0.

The roots of this quadratic are 1 and −3, which is left as an easy exercise. Now

this ODE is linear: any linear combination of solutions is still a solution. Thus we

have a general family of solutions

α exp(t) + β exp(−3t)

66 BRAD BAXTER

for any complex numbers α and β. We need two pieces of information to solve for

these constants, such as y(t1) and y(t2), or, more usually, y(t1) and y

0

(t1). In fact

this is the general solution of the equation.

In fact, we can always change an mth order equation is one variable into an

equivalent first order equation in m variables, a technique that I shall call vectorizing (some books prefer the more pompous phrase “reduction of order”). Most

ODE software packages are designed for first order systems, so vectorizing has both

practical and theoretical importance.

For example, given

y

00(t) = sin(t) + (y

0

(t))3 − 2 (y(t))2

,

we introduce the vector function

z(t) =

y(t)

y

0

(t)

,

Then

z

0

(t) =

y

0

y

00

=

y

0

sin(t) + (y

0

)

3 − 2 (y)

2

.

In other words, writing

z(t) =

z1(t)

z2(t)

≡

y(t)

y

0

(t)

,

we have derived

z

0 =

z2

sin(t) + z

3

2 − 2z

2

1

,

which we can write as

z

0 = f(z, t).

Exercise 6.5. You probably won’t need to consider ODEs of order exceeding two

very often in finance, but the same trick works. Given

y

(n)

(t) =

nX−1

k=0

ak(t)y

(k)

(t),

we define the vector function z(t) ∈ R

n−1

by

zk(t) = y

(k)

(t), k = 0, 1, . . . , n − 1.

Then z

0

(t) = Mz(t). Find the matrix M.

6.3. Recurrence Relations. In its most general form, a recurrence relation is

simply a sequence of vectors v

(1)

, v

(2)

, . . . for which some functional relation generates v

(n) given the earlier iterates v

(1)

, . . . , v

(n−1). At this level of generality, very

little more can be said. However, the theory of linear recurrence relations is simple

and very similar to the techniques of differential equations.

The first order linear recurrence relation is simply the sequence {an : n =

0, 1, . . .} of complex numbers defined by

an = can−1.

Thus

an = can−1 = c

2

an−2 = c

3

an−3 = · · · = c

n

a0

and the solution is complete.

MATHEMATICAL AND NUMERICAL METHODS 67

The second order linear recurrence relation is slightly more demanding. Here

an+1 + pan + qan−1 = 0

and, inspired by the solution for the first order recurrence, we try an = c

n, for some

c 6= 0. Then

0 = c

n−1

c

2 + pc + q

,

or

0 = c

2 + pc + q.

If this has two distinct roots c1 and c2, then one possible solution to the second

order recurrence is

un = p1c

n

1 + p2c

n

2

,

for constants p1 and p2. However, is this the full set of solutions? What happens if

the quadratic has only one root?

Proposition 6.3. Let {an : n ∈ Z} be the sequence of complex numbers satisfying

the recurrence relation

an+1 + pan + qan−1 = 0, n ∈ Z.

If α1 and α2 are the roots of the associated quadratic

t

2 + pt + q = 0,

then the general solution is

an = c1α

n

1 + c2α

n

2

when α1 6= α2. If α1 = α2, then the general solution is

an = (v1n + v2) α

n

1

.

Proof. The same vectorizing trick used to change second order differential equations

in one variable into first order differential equations in two variables can also be

used here. We define a new sequence {b

(n)

: n ∈ Z} by

b

(n) =

an−1

an

.

Thus

b

(n) =

an−1

−pan−1 − qan−2

,

that is,

(6.12) b

(n) = Ab

(n−1)

,

where

(6.13) A =

0 1

−q −p

.

This first order recurrence has the simple solution

(6.14) b

(n) = A

nb

(0)

,

so our analytic solution reduces to calculation of the matrix power An. Now let

us begin with the case when the eigenvalues λ1 and λ2 are distinct. Then the

corresponding eigenvectors w(1) and w(2) are linearly independent. Hence we can

write our initial vector b

(0) as a unique linear combination of these eigenvectors:

b

(0) = b1w(1) + b2w(2)

68 BRAD BAXTER

Thus

b

(n) = b1A

nw(1) + b2A

nw(2) = b1λ

n

1 w(1) + b1λ

n

2 w(2)

.

Looking at the second component of the vector, we obtain

an = c1λ

n

1 + c2λ

n

2

.

Now the eigenvalues of A are the roots of the quadratic equation

det (A − λI) = det

−λ 1

−q −p − λ

,

in other words the roots of the quadratic

λ

2 + pλ + q = 0.

Thus the associated equation is precisely the characteristic equation of the matrix

A in the vectorized problem. Hence an = c1α

n

1 + c2α

n

2

.

We only need this case in the course, but I shall lead you through a careful

analysis of the case of coincident roots. It’s a good exercise for your matrix skills.

First note that the roots are coincident if and only if p

2 = 4q, in which case

A =

0 1

−p

2/4 −p

,

and the eigenvalue is −p/2. In fact, subsequent algebra is simplified if we substitute

α = −p/2, obtaining

A =

0 1

−α

2 2α

.

The remainder of the proof is left as the following exercise.

Exercise 6.6. Show that

A = αI + uvT

,

where

u =

1

α

, v =

−α

1

and note that v

T u = 0. Show also that

A

2 = α

2

I + 2αuvT

, A3 = α

3

I + 3α

2uvT

,

and use proof by induction to demonstrate that

A

n = α

n

I + nαn−1uvT

.

Hence find the general solution for an.

MATHEMATICAL AND NUMERICAL METHODS 69

6.4. Mortgages – a once exotic instrument. The objective of this section is

to illustrate some of the above techniques for analysing difference and differential

equations via mortgage pricing. You are presumably all too familiar with a repayment mortgage: we borrow a large sum M for a fairly large slice T of our lifespan,

repaying capital and interest using N regular payments. The interest rate is assumed to be constant and it’s a secured loan: our homes are forfeit on default. How

do we calculate our repayments?

Let h = T /N be the interval between payments, let Dh : [0, T] → R be our debt

as a function of time, and let A(h) be our payment. We shall assume that our

initial debt is Dh(0) = 1, because we can always multiply by the true initial cost

M of our house after the calculation. Thus D must satisfy the equations

(6.15) Dh(0) = 1, Dh(T) = 0 and Dh(`h) = Dh((` − 1)h)e

rh − A(h).

We see that Dh(h) = e

rh − A(h), while

Dh(2h) = Dh(h)e

rh − A(h) = e

2rh − A(h)

1 + e

rh

.

The pattern is now fairly obvious:

(6.16) Dh(`h) = e

`rh − A(h)

X

`−1

k=0

e

krh

,

and summing the geometric series8

(6.17) Dh(`h) = e

`rh − A(h)

e

`rh − 1

e

rh − 1

.

In order to achieve D(T) = 0, we choose

(6.18) A(h) = e

rh − 1

1 − e−rT .

Exercise 6.7. What happens if T → ∞?

Exercise 6.8. Prove that

(6.19) Dh(`h) = 1 − e

−r(T −`h)

1 − e−rT .

Thus, if t = `h is constant (so we increase ` as we reduce h), then

(6.20) Dh(t) = 1 − e

−r(T −t)

1 − e−rT .

Almost all mortgages are repaid by 300 monthly payments for 25 years. However,

until recently, many mortgages calculated interest yearly, which means that we

choose h = 1 in Exercise 6.7 and then divide A(1) by 12 to obtain the monthly

payment.

Exercise 6.9. Calculate the monthly repayment A(1) when M = 105

, T = 25,

r = 0.05 and h = 1. Now repeat the calculation using h = 1/12. Interpret your

result.

8Many students forget the simple formula. If S = 1 + a + a

2 + · · · + am−2 + am−1

, then

aS = a + a

2 + · · · + am−1 + am. Subtracting these expressions implies (a − 1)S = am − 1, all

other terms cancelling

70 BRAD BAXTER

In principle, there’s no reason why our repayment could not be continuous,

with interest being recalculated on our constantly decreasing debt. For continuous

repayment, our debt D : [0, T] → R satisfies the relations

(6.21) D(0) = 1, D(T) = 0 and D(t + h) = D(t)e

rh − hA.

Exercise 6.10. Prove that

(6.22) D0

(t) − rD(t) = −A,

where, in particular, you should prove that (6.21) implies the differentiability of

D(t). Solve this differential equation using the integrating factor e

−rt. You should

find the solution

(6.23) D(t)e

−rt − 1 = A

Z t

0

(−e

−rτ ) dτ = A

e

−rt − 1

r

.

Hence show that

(6.24) A =

r

1 − e−rT

and

(6.25) D(t) = 1 − e

−r(T −t)

1 − e−rT ,

agreeing with (6.20), i.e. Dh(kh) = D(kh), for all k. Prove that limr→∞ D(t) = 1

for 0 < t < T and interpret.
Observe that
(6.26) A(h)
Ah =
e
rh − 1
rh ≈ 1 + (rh/2),
so that continuous repayment is optimal for the borrower, but that the mortgage
provider is making a substantial profit. Greater competition has made yearly recalculations much rarer, and interest is often paid daily, i.e. h = 1/250, which is
rather close to continuous repayment.
Exercise 6.11. Construct graphs of D(t) for various values of r. Calculate the
time t0(r) at which half of the debt has been paid.
6.5. Pricing Mortgages via lack of arbitrage. There is a very slick arbitrage argument to deduce the continuous repayment mortgage debt formula (6.25).
Specifically, the simple fact that D(t) is a deterministic financial instrument implies,
via arbitrage, that D(t) = a+b exp(rt), so we need only choose the constants a and
b to satisfy D(0) = 1 and D(T) = 1, which imply a + b = 1 and a + b exp(rT) = 0.
Solving these provides a = exp(rT)/(exp(rT) − 1) and b = −1/(exp(rT) − 1), and
equivalence to (6.25) is easily checked.
MATHEMATICAL AND NUMERICAL METHODS 71
7. Numerical Linear Algebra
I shall not include much explicitly here, because you have my longer lecture notes
on numerical linear algebra:
http://econ109.econ.bbk.ac.uk/brad/Methods/nabook.pdf
Please do revise the first long chapter of those notes if need to brush up on
matrix algebra.
You will also find my Matlab notes useful:
http://econ109.econ.bbk.ac.uk/brad/Methods/matlab_intro_notes.pdf
7.1. Orthogonal Matrices. Modern numerical linear algebra began with the computer during the Second World War, its progress accelerating enormously as computers became faster and more convenient in the 1960s. One of the most vital
conclusions of this research field is the enormous practical importance of matrices
which leave Euclidean length invariant. More formally:
Definition 7.1. We shall say that Q ∈ R
n×n is distance-preserving if kQxk = kxk,
for all x ∈ R
n.
The following simple result is very useful.
Lemma 7.1. Let M ∈ R
n×n be any symmetric matrix for which x
TMx = 0, for
every x ∈ R
n. Then M is the zero matrix.
Proof. Let e1, e2, . . . , en ∈ R
n be the usual coordinate vectors. Then
Mjk = e
T
j Mek =
1
2
(ej + ek)
T M (ej + ek) = 0, 1 ≤ j, k ≤ n.
Theorem 7.2. The matrix Q ∈ R
n is distance-preserving if and only if QT Q = I.
Proof. If QT Q = I, then
kQxk
2 = (Qx)
T
(Qx) = x
T Q
T Qx = x
T x = kxk
2
,
and Q is distance-preserving. Conversely, if kQxk
2 = kxk
2
, for all x ∈ R
n, then
x
T
Q
T Q − I
x = 0, x ∈ R
n
.
Since QT Q − I is a symmetric matrix, Lemma 7.1 implies QT Q − I = 0, i.e.
QT Q = I.
The condition QT Q = I simply states that the columns of Q are orthonormal
vectors, that is, if the columns of Q are q1, q2, . . . , qn, then kq1k = · · · = kqnk = 1
and q
T
j qk = 0 when j 6= k. For this reason, Q is also called an orthogonal matrix.
We shall let O(n) denote the set of all (real) n × n orthogonal matrices.
Exercise 7.1. Let Q ∈ O(n). Prove that Q−1 = QT
. Further, prove that O(n) is
closed under matrix multiplication, that is, Q1Q2 ∈ O(n) when Q1, Q2 ∈ O(n). (In
other words, O(n) forms a group under matrix multiplication. This observation is
important, and O(n) is often called the Orthogonal Group.)
Department of Economics, Mathematics and Statistics, Birkbeck College, University
of London, Malet Street, London WC1E 7HX, England
Email address: b.baxter@bbk.ac.u