# # Operator inference
#
# In this example, we will identify an unknown operator describing a physical system by
# combining tools from scientific computing (PDEs, discretizations) and machine learning
# (minimizing prediction errors, backpropagation).
#
# For this, the [Julia](https://julialang.org/) language is a natural choice,
# in this example because of its automatic differentiation (AD) of native code.
#nb # ## The Julia bootstrap block (for Google Colab)
#nb # Source: https://colab.research.google.com/drive/1_4Yz3FKO5_uuYvamEfHqwtFT9WpCuSbm
#nb #
#nb # Run this block the first time to install the Julia kernel, then refresh this page (e.g., Ctrl-R)
#nb # so that Colab redirects to the installed Julia kernel,
#nb # after which you can do your own work.
#nb ## 1. install latest Julia using jill.py
#nb ## tip: one can install specific Julia version using e.g., `jill install 1.7`
#nb !pip install jill && jill install --upstream Official --confirm
#nb
#nb ## 2. install IJulia kernel
#nb ! julia -e 'using Pkg; pkg"add IJulia"; using IJulia; installkernel("Julia")'
#nb
#nb ## 3. hot-fix patch to strip the version suffix of the installed kernel so
#nb ## that this notebook kernelspec is version agnostic
#nb !jupyter kernelspec install $(jupyter kernelspec list | grep julia | tr -s ' ' | cut -d' ' -f3) --replace --name julia
#nb #-
#nb using Pkg; Pkg.add(["OrdinaryDiffEq", "DiffEqFlux", "Plots"])
# We start by loading some packages.
using LinearAlgebra
using SparseArrays
using OrdinaryDiffEq
using DiffEqFlux
using Plots
# That may have taken some time. Julia is a compiled language, and she won't
# hesitate to compile any code she can put her eyes on, even just to get a
# small specialization improvement. Once heavy simulations are launched,
# however, this efficiency will be worth your patience. The first time you run
# some of the cells below, it may also take some time, as functions such as
# `heatmap` specialize on your input types. Once a function is compiled,
# however, it should stay so for the remainder of the session.
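# To see this effect on a small scale, we may time the same call twice: the first call
# includes compilation for the given argument types, while the second call reuses the
# compiled specialization (the helper `square_all` below is just a throwaway example).
square_all(v) = v .^ 2
@time square_all(randn(10)) # includes compilation of `square_all` for `Vector{Float64}`
@time square_all(randn(10)) # reuses the compiled code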
# Since the problems will consist of inferring various matrices, we define a small helper
# function for visualizing our results:
plotmat(A; kwargs...) = heatmap(
reverse(A; dims = 1);
aspect_ratio = :equal,
xlims = (1 / 2, size(A, 2) + 1 / 2),
ylims = (1 / 2, size(A, 1) + 1 / 2),
## xticks = nothing,
## yticks = nothing,
kwargs...,
)
# We also provide a specialized method for sparse matrices, which must be densified before
# plotting.
plotmat(A::AbstractSparseMatrix; kwargs...) = plotmat(Matrix(A); kwargs...)
# ## Problem statement
#
# Consider a linear ordinary differential equation (ODE) parameterized by some operator
# $\mathbf{A}$:
#
# $$\frac{\mathrm{d} \mathbf{u}}{\mathrm{d} t} = \mathbf{A} \mathbf{u}, \quad \mathbf{u}(0)
# = \mathbf{u}_0$$
#
# where $\mathbf{u}_0$ are some initial conditions. To solve this system, we will use the
# [OrdinaryDiffEq](https://github.com/SciML/OrdinaryDiffEq.jl) package. It provides
# differentiable ODE solvers for problems defined by a parametrized ODE function $f$
# defining the right hand side of the ODE for a given state $\mathbf{u}$, time $t$ and
# parameters $p$ (in our case: $p = \mathbf{A}$), i.e. $\frac{\mathrm{d}
# \mathbf{u}}{\mathrm{d} t} = f(\mathbf{u}, p, t)$.
f(u, A, t) = A * u
# For convenience, we will define a solver function $\mathbf{S}: (\mathbf{A}, \mathbf{u}_0,
# t) \mapsto \mathbf{u}(t)$, where $\mathbf{u}$ is the solution to the above system for a
# given operator $\mathbf{A}$ and initial conditions $\mathbf{u}_0$. The method `Tsit5`
# is a fifth order Runge Kutta method (with an embedded fourth order error estimate).
#nb ?Tsit5
#nb #-
function S(A, u₀, t)
problem = ODEProblem(ODEFunction(f), u₀, (0.0, t[end]), A)
solve(problem, Tsit5(); saveat = t)
end
# Consider now the diffusion equation
#
# $$\frac{\partial u}{\partial t} = \kappa \frac{\partial^2 u}{\partial x^2}, \quad x \in
# \Omega = [0, 1]$$
#
# with diffusivity $\kappa > 0$, homogeneous Dirichlet boundary conditions $u(0, t) = u(1,
# t) = 0$, and initial conditions $u(x, 0) = u_0(x)$.
κ = 0.005
a = 0.0
b = 1.0
L = b - a
# The domain $\Omega$ may be discretized using a uniform grid $\mathbf{x} = (x_n)_{0 \leq n
# \leq N}$ of $N + 1$ equidistant points. We will also make a refined grid `xfine` for
# plotting.
xfine = LinRange(a, b, 1001)
N = 50
x = LinRange(a, b, N + 1)
Δx = L / N
# On the above grid, the diffusion operator $\frac{\partial^2}{\partial x^2}$ with constant
# Dirichlet boundary conditions may be approximated using the matrix
#
# $$\mathbf{D} = \frac{1}{\Delta x^2} \begin{pmatrix}
# 0 & \dots & \dots & \dots & 0 \\
# 1 & -2 & 1 & & \\
# & \ddots & \ddots & \ddots & \\
# & & 1 & -2 & 1 \\
# 0 & \dots & \dots & \dots & 0 \\
# \end{pmatrix}.$$
#
# This approximation is second order accurate:
#
# $$(\mathbf{D} \mathbf{u})_n = \frac{\partial^2 u}{\partial x^2}(x_n) + \mathcal{O}(\Delta
# x^2).$$
D = 1 / Δx^2 * spdiagm(-1 => fill(1.0, N), 0 => fill(-2.0, N + 1), 1 => fill(1.0, N))
D[1, :] .= 0 # Do not change first value
D[end, :] .= 0 # Do not change last value
plotmat(D)
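# As a quick sanity check of the claimed second order accuracy (not needed for what
# follows), we may apply $\mathbf{D}$ to a smooth test function with known second
# derivative and look at the error on the interior points (the boundary rows of
# $\mathbf{D}$ are zeroed out on purpose). The error should shrink as $\Delta x^2$
# when the grid is refined.
vtest = sin.(π .* x)
maximum(abs, (D * vtest .+ π^2 .* vtest)[2:(end - 1)])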
# The semi-discrete solution $t \mapsto \mathbf{u}(t) = (u(x_n, t))_n \in \mathbb{R}^{N +
# 1}$ may be approximated using the solver $\mathbf{S}$:
#
# $$\mathbf{u}(t) \approx \mathbf{S}(\kappa \mathbf{D}, u_0(\mathbf{x}), t),$$
#
# i.e. we set $\mathbf{A} = \kappa \mathbf{D}$ in the ODE.
A_ref = κ * D
plotmat(A_ref)
# The diffusion operator $\frac{\partial^2}{\partial x^2}$ with homogeneous boundary
# conditions on $\Omega$ admits an eigenfunction basis $(X_k)_{k \in \mathbb{N}^*}$ with
#
# $$X_k(x) = \sqrt{\frac{2}{L}} \sin \left( \frac{\pi k x}{L} \right).$$
#
# The associated eigenvalues are given by $\lambda_k = -\frac{\pi^2 k^2}{L^2}$. Since these
# functions form a basis of $L^2(\Omega)$, all solutions to the diffusion equation with
# homogeneous boundary conditions may be written in the form
#
# $$u(x, t) = \sum_{k \in \mathbb{N}^*} c_k \exp \left( - \kappa \frac{\pi^2 k^2}{L^2}
# t \right) X_k(x),$$
#
# where the coefficients $(c_k)_{k \in \mathbb{N}^*}$ are determined by projecting the
# initial conditions onto the (orthonormal) eigenfunction basis:
#
# $$c_k = \int_\Omega u_0(x) X_k(x) \, \mathrm{d} x.$$
#
# In particular, we may use this formulation to generate exact solutions to the diffusion
# equation by providing arbitrary coefficients. These solutions may be used as training data
# for identifying the "unknown" discrete diffusion operator $\mathbf{A}$ (of which a very
# promising second-order accurate candidate is given by $\mathbf{A}_\text{ref} = \kappa
# \mathbf{D}$).
X(k, x) = √(2 / L) * sin(π * k * (x - a) / L)
p = plot(; xlabel = "x")
for k = 1:5
plot!(p, xfine, X.(k, xfine); label = "k = $k")
end
p
#-
function create_solution(c, k)
u(x, t) = sum(c * exp(-κ * (π * k / L)^2 * t) * X(k, x) for (c, k) in zip(c, k))
∂u∂t(x, t) =
-sum(
c * κ * (π * k / L)^2 * exp(-κ * (π * k / L)^2 * t) * X(k, x) for
(c, k) ∈ zip(c, k)
)
u, ∂u∂t
end
# A particular solution is given below, containing three different frequencies. We may use
# this solution to test our solver, from now on referred to as the *full order model* (FOM).
k = [3, 7, 10]
c = [0.7, 0.3, 0.4]
u, ∂u∂t = create_solution(c, k)
p = plot();
tplot = LinRange(0.0, 1.0, 5)
sol = S(A_ref, u.(x, 0.0), tplot)
for (i, t) ∈ enumerate(tplot)
plot!(p, xfine, u.(xfine, t); label = "t = $t", color = i) # Exact
scatter!(p, x, sol[i]; label = nothing, markeralpha = 0.5, color = i) # FOM
end
p
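# As a small sanity check of the projection formula above, we may recover the
# coefficients $c_k$ from the initial condition by numerical quadrature on the fine
# grid. Up to quadrature error, this should reproduce `c = [0.7, 0.3, 0.4]` (the
# trapezoidal weights `w` are introduced here just for this check).
w = fill(step(xfine), length(xfine))
w[1] /= 2
w[end] /= 2
[sum(w .* u.(xfine, 0.0) .* X.(kᵢ, xfine)) for kᵢ ∈ k]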
# ## Learning the operator intrusively
#
# Before inferring the unknown operator, we need some "training" data to compare
# with. This will consist of snapshots of different initial conditions diffused
# for different durations. We will sample normally distributed random
# coefficients decaying with the frequency $k$, and put the results in a
# snapshot tensor of size $(N + 1) \times n_\text{sample} \times n_t$.
tsnap = LinRange(0.0, 1.0, 51)[2:end]
nsample = 200
K = 50
k = 1:K
c = [randn(K) ./ k for _ = 1:nsample]
solutions = [create_solution(c, k) for c ∈ c]
u = [s[1] for s ∈ solutions]
∂u∂t = [s[2] for s ∈ solutions]
init = [u(x, 0.0) for x ∈ x, u ∈ u]
train = [u(x, t) for x ∈ x, u ∈ u, t ∈ tsnap]
∂train∂t = [∂u∂t(x, t) for x ∈ x, ∂u∂t ∈ ∂u∂t, t ∈ tsnap]
# We also need an instantaneous performance metric (loss/cost/objective
# function). This function should compare our predictions with snapshots of
# the exact solutions. Here we will use a simple $L^2$-distance (mean squared
# error). Note that the `ODESolution` object behaves like an array of size $(N
# + 1) \times n_\text{sample} \times n_t$, meaning that we solve for all the
# different initial conditions at the same time.
loss(A, u₀, uₜ, t) = sum(abs2, S(A, u₀, t) - uₜ) / prod(size(uₜ))
loss(A) = loss(A, init, train, tsnap)
# As an initial guess for the "unknown" operator $\mathbf{A}$ we will simply
# use a matrix of zeros.
A = zeros(N + 1, N + 1)
# We may also visualize the performance of our operator.
function ploterr(A, u, tplot = LinRange(0.0, 1.0, 5))
sol = S(A, u.(x, 0.0), tplot)
p = plot(; xlabel = "x")
for (i, t) ∈ enumerate(tplot)
plot!(p, xfine, u.(xfine, t); color = i, label = nothing)
scatter!(p, x, sol[i]; label = "t = $t", color = i, markeralpha = 0.5)
end
p
end
ploterr(A_ref, u[6])
# A callback function is called after every iteration of gradient descent, allowing us to
# check the performance of our operator in real time during training. The return value
# `false` tells the optimizer not to stop the iterations early. We can already check how
# our initial guess for $\mathbf{A}$ performs.
#
# You may choose any of the callbacks below. For Jupyter notebooks, the first two
# options will print a plot after every iteration, which will quickly become very
# verbose. You may want to add a counter, plotting only every 10th iteration or so, or use
# the less verbose third callback option.
function callback(A, loss)
println(loss)
flush(stdout)
display(
plot(
plotmat(A; title = "Predict"),
plotmat(A_ref; title = "Reference");
layout = (1, 2),
),
)
false
end
callback(A, loss(A))
#-
function callback(A, loss)
println(loss)
flush(stdout)
display(ploterr(A, u[1], LinRange(0.0, 2.0, 5)))
false
end
callback(A, loss(A))
#-
callback(A, loss) = (println(loss);
flush(stdout);
false)
callback(A, loss(A))
# The intrusive training consists of improving the operator through gradient
# descent applied to the loss function. The optimizer
# [`ADAM`](https://arxiv.org/abs/1412.6980) performs a first order gradient
# descent, but with some sophisticated momentum terms exploiting the
# stochasticity of the loss function. For larger problems we could use a
# subset of the different solutions $u$, time steps $t$ and spatial points $x$
# at every evaluation of `loss`, but for now we will just use the entire
# dataset.
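# For illustration, a mini-batched variant of the loss could look like the sketch
# below, sampling a random subset of the training solutions at every evaluation
# (it is not used in the training that follows; `loss_batch` and the batch size are
# ad hoc choices).
function loss_batch(A; nbatch = 20)
    i = rand(1:nsample, nbatch) # random subset of the training solutions
    loss(A, init[:, i], train[:, i, :], tsnap)
end
loss_batch(A)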
result = DiffEqFlux.sciml_train(loss, A, ADAM(0.01); cb = callback, maxiters = 1000)
Afit = result.u
plotmat(Afit)
#-
ploterr(Afit, u[4])
# Notice that at no point did we explicitly specify the gradient of `loss`, `S`
# or even `f` with respect to the matrix `A`. Yet still we performed a gradient
# descent. Since the entire computational graph is composed of pure Julia code,
# automatic differentiation engines, in this particular case
# [Zygote](https://github.com/FluxML/Zygote.jl), can use the chain rule to
# compute gradients. We may access this gradient explicitly. Let us check for
# the initial guess first:
∂L∂A = first(Zygote.gradient(loss, A))
plotmat(∂L∂A)
# and for the final result:
∂L∂Afit = first(Zygote.gradient(loss, Afit))
plotmat(∂L∂Afit)
# The gradient of the fitted operator is indeed much closer to zero (indicating
# a possible minimum):
norm(∂L∂A)
#-
norm(∂L∂Afit)
# Note also that the optimizer used here is [ADAM](https://arxiv.org/abs/1412.6980), which
# is typically used to train neural networks. In fact, our ODE solver is not so different
# from a neural network. Here we used a 4/5 Runge Kutta solver, but consider for
# illustrative purposes a simple forward Euler scheme $\frac{\mathbf{u}^{n+1} -
# \mathbf{u}^n}{\Delta t} = \mathbf{A} \mathbf{u}^n$. It satisfies the definition of a
# "vanilla" neural network:
#
# | Neural network | ODE |
# | :------------: | :-: |
# | $$\mathbf{x} \to \left[ \operatorname{NN}_\theta \right] \to \mathbf{y}$$ | $$\mathbf{u}_0 \to \left[ \frac{\mathrm{d} \mathbf{u}}{\mathrm{d} t} = \mathbf{A} \mathbf{u} \right] \to \mathbf{u}(T)$$ |
# | $$\mathbf{h}_0 = \mathbf{x}$$ | $$\mathbf{u}_0 = \mathbf{u}_0$$ |
# | $$\mathbf{h}_{k + 1} = \sigma_k(\mathbf{W}_k \mathbf{h}_k + \mathbf{b}_k)$$ | $$\mathbf{u}_{k + 1} = (\mathbf{I} + \Delta t_k \mathbf{A}) \mathbf{u}_k$$ |
# | $$\mathbf{y} = \mathbf{h}_K$$ | $$\mathbf{u}(T) = \mathbf{u}_K$$ |
# | $$\underset{\theta}{\min} \mathbb{E}_{(\mathbf{x}, \mathbf{y}) \sim \mathcal P} \| \mathbf{y}_\theta(\mathbf{x}) - \mathbf{y} \|^2$$ | $$\underset{\mathbf{A}}{\min} \mathbb{E}_{\mathbf{u} \sim \mathcal{U}} \| \mathbf{u}_\mathbf{A}(T) - \mathbf{u}(T) \|^2$$ |
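# To make the analogy concrete, here is a sketch of the forward Euler scheme written as
# a stack of identical linear "layers" (for illustration only; the training above uses
# `Tsit5` rather than this function, and `euler_network` is an ad hoc name).
function euler_network(A, u₀, T; nsteps = 100)
    Δt = T / nsteps
    u = u₀
    for _ = 1:nsteps
        u = u + Δt * (A * u) # one "layer": weights I + Δt A, identity activation
    end
    u
end
euler_network(A_ref, u[1].(x, 0.0), 1.0)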
#
#
#
# ### Adding a regularization
#
# We may also try out different loss functions. Since the resulting matrix from the above
# fit is quite dense, we could try to enforce sparsity by adding an $L^1$ penalty, which
# adds a cost for every coefficient that moves away from zero.
ℓ₁loss(A) = loss(A) + 1e-2 * sum(abs, A) / prod(size(A))
result = DiffEqFlux.sciml_train(ℓ₁loss, A, ADAM(0.01); cb = callback, maxiters = 1000)
A_ℓ₁ = result.u
plotmat(A_ℓ₁)
#-
ℓ₂loss(A) = loss(A) + 1e-2 * sum(abs2, A) / prod(size(A))
result = DiffEqFlux.sciml_train(ℓ₂loss, A, ADAM(0.01); cb = callback, maxiters = 1000)
A_ℓ₂ = result.u
plotmat(A_ℓ₂)
#-
ploterr(A_ℓ₁, u[1])
#-
ploterr(A_ℓ₂, u[1])
# Let us compare the resulting matrices:
plot(
plotmat(Afit; title = "No reg"),
plotmat(A_ℓ₁; title = "L¹"),
plotmat(A_ℓ₂; title = "L²");
layout = (1, 3),
size = (900, 300),
)
# Indeed, the $L^1$-regularized matrix is sparser.
# ## Non-intrusive operator inference
#
# It is also possible to infer the operator $\mathbf{A}$ without ever computing the gradient
# of $\mathbf{S}$, using snapshot matrices only. Consider the above defined snapshot matrix
# $\mathbf{U}$ as well as its left hand side equivalent $\dot{\mathbf{U}}$ (containing
# snapshots of the time derivatives $\frac{\mathrm{d} \mathbf{u}}{\mathrm{d} t}(t_k)$). The
# operator that best satisfies the original ODE at the snapshots of interest should be the
# solution to the following minimization problem:
#
# $$\underset{\mathbf{A} \in \mathbb{R}^{N \times N}}{\min} \ell(\mathbf{A}),$$
#
# where $\ell$ is some performance metric, typically consisting of a data fitting term
# $\ell_\text{data}(\mathbf{A}) = \| \mathbf{A} \mathbf{U} - \dot{\mathbf{U}} \|_F^2$, where
# $\| \mathbf{X} \|_F = \sqrt{\sum_{i j} X_{i j}^2}$ is the Frobenius norm (we could use any
# discrete norm here). We would also like to add a regularization term $\ell_\text{reg}$ to
# enforce some expected behavior on the operator. Since the first and last component of the
# solution vector $\mathbf{u}$ are zero (because of the boundary conditions), the first and
# last columns of $\mathbf{A}$ do not affect the value of $\ell_\text{data}$. An
# $L^2$-regularization $\ell_\text{reg}(\mathbf{A}) = \lambda \| \mathbf{A} \|_F^2$ simply
# encourages these two columns to take the value zero.
#
# The solution in the case of an $L^2$-regularization is given by
#
# $$\mathbf{A} = \underset{\mathbf{A} \in \mathbb{R}^{N \times N}}{\operatorname{argmin}} \|
# \mathbf{A} \mathbf{U} - \dot{\mathbf{U}} \|_F^2 + \lambda \| \mathbf{A} \|_F^2 =
# \dot{\mathbf{U}} \mathbf{U}^\mathsf{T} (\mathbf{U} \mathbf{U}^\mathsf{T} + \lambda
# \mathbf{I})^{-1}.$$
# Information from one single solution may not fully describe the dynamics of other
# solutions, as they may contain different frequencies. We may build an augmented snapshot
# matrix by concatenating snapshot matrices for many different initial conditions. Here we
# will use the exact solutions, but we could also have used the approximations as above.
U = reshape(train, N + 1, :)
∂U∂t = reshape(∂train∂t, N + 1, :)
plotmat(reshape(permutedims(train, (1, 3, 2)), N + 1, :); aspect_ratio = :none)
# We may zoom in on the first five solutions:
plotmat(
reshape(permutedims(train, (1, 3, 2)), N + 1, :)[:, 1:(5 * length(tsnap))];
aspect_ratio = :none,
)
#-
A_ls = ∂U∂t * U' / (U * U' + 1e-8I)
plotmat(A_ls)
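# A couple of quick checks (not needed for what follows): the relative residual of the
# least squares fit should be fairly small, and the first and last columns of the
# inferred operator should be close to zero, since they are only constrained by the
# $L^2$-regularization discussed above.
norm(A_ls * U - ∂U∂t) / norm(∂U∂t)
#-
norm(A_ls[:, 1]), norm(A_ls[:, end])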
#-
ploterr(A_ls, u[7], LinRange(0.0, 10.0, 5))
# # Proper orthogonal decomposition (POD)
#
# Above we learned the discrete diffusion operator in the canonical basis of
# $\mathbb{R}^N$. Another useful basis is obtained from a *proper orthogonal
# decomposition* (POD). It is determined from snapshot data of the solution at
# different time steps (`tsnap`) and possibly different initial conditions.
# Truncating this basis at a level $P \ll N$ will yield *the* basis of size $P$
# with the smallest error energy for the training data (among all possible
# bases spanning $P$-dimensional subspaces of $L^2(\Omega)$).
# The POD basis is simply a collection of left singular vectors of our snapshot matrix
# $\mathbf{U}$. We will keep the $P$ first basis functions (they are the most important, as
# `svd` orders them by decreasing singular value). The basis functions will be stored as
# columns in the matrix $\mathbf{\Phi} \in \mathbb{R}^{N \times P}$.
P = 20
decomp = svd(U)
Φ = decomp.U[:, 1:P]
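# The fraction of the snapshot "energy" captured by the first $P$ singular values gives
# an idea of how much is lost by the truncation (a quick check; the exact number depends
# on the random training data):
sum(abs2, decomp.S[1:P]) / sum(abs2, decomp.S)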
# We may plot some POD modes.
p = plot(; xlabel = "x");
i = 0
for k ∈ [1, 3, 7]
i += 1
plot!(p, x, Φ[:, k]; label = "Mode $k", color = i)
## plot!(p, x, X.(k, x); label = "k = $k", color = i)
## plot!(p, x, XK[:, k]; label = nothing, linestyle = :dash, color = i)
end
p
# We may check the orthogonality by computing the inner product between each basis function
# pair.
plotmat(Φ'Φ)
# The matrix $\mathbf{Φ} \mathbf{Φ}^\mathsf{T}$ can be considered a so-called
# "autoencoder", with "encoder" $\mathbf{Φ}^\mathsf{T}$ and "decoder" $\mathbf{Φ}$. The
# autoencoder should be closer to identity when keeping more modes, i.e. we may be tempted
# to write something like $\mathbf{Φ} \mathbf{Φ}^\mathsf{T} \underset{P \to N}{\to}
# \mathbf{I}$ (by abuse of mathematical notation).
plotmat(Φ * Φ')
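# We may also quantify how well the autoencoder reproduces the snapshot data. This
# relative reconstruction error should decrease towards zero as $P$ approaches the
# number of grid points (a quick check, not needed for what follows):
norm(Φ * (Φ' * U) - U) / norm(U)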
# Projecting the full order model onto the POD basis yields the reduced order model
#
# $$\frac{\mathrm{d} \tilde{\mathbf{u}}}{\mathrm{d} t} = \tilde{\mathbf{A}}
# \tilde{\mathbf{u}},$$
#
# where $\tilde{\mathbf{u}}$ are the coordinates of the ROM solution in the POD basis and
# $\tilde{\mathbf{A}} = \mathbf{\Phi}^\mathsf{T} \mathbf{A} \mathbf{\Phi} \in \mathbb{R}^{P
# \times P}$ is the reduced order operator. Later, we will try to infer this operator
# directly from data. Note that the ROM solution is given by $\mathbf{u}_\text{ROM} =
# \mathbf{\Phi} \tilde{\mathbf{u}}$.
#
# Note also that the FOM and ROM have the same form, just different sizes (and "tilde"s
# appearing everywhere). The ROM solution may thus simply be computed by
#
# $$\mathbf{u}_\text{ROM}(t) = \mathbf{\Phi} \mathbf{S}(\tilde{\mathbf{A}},
# \mathbf{\Phi}^\mathsf{T} \mathbf{u}_0, t).$$
S_POD(A_pod, u₀, t) = Φ * S(A_pod, Φ'u₀, t)
# Let us compare the solution of the ROM and FOM:
tplot = LinRange(0.0, 1.0, 5)
sample = 3
sol = S(A_ref, u[sample].(x, 0.0), tplot)
sol_pod = S_POD(Φ' * A_ref * Φ, u[sample].(x, 0.0), tplot)
p = plot(; xlabel = "x");
for (i, t) ∈ enumerate(tplot)
## scatter!(p, x, u.(x, t); label = nothing, markeralpha = 0.5, color = i) # Exact
plot!(p, x, sol[i]; label = "t = $t", color = i) # FOM
scatter!(p, x, sol_pod[:, i]; label = nothing, markeralpha = 0.5, color = i) # ROM
end
p
# Try using fewer modes and see what happens!
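# One way to try this without editing the cells above (a sketch; `Ptest` and `Φₚ` are
# just local names for this experiment) is to rebuild the projected operator for a few
# basis sizes and compare the ROM prediction at the final time with the exact solution:
for Ptest ∈ (2, 5, 10, 20)
    Φₚ = decomp.U[:, 1:Ptest]
    solₚ = Φₚ * S(Φₚ' * A_ref * Φₚ, Φₚ' * u[sample].(x, 0.0), tplot)
    uₑ = u[sample].(x, tplot[end])
    println("P = $Ptest: relative error at t = $(tplot[end]) is ", norm(solₚ[:, end] - uₑ) / norm(uₑ))
end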
# ## Learning the operator in the POD basis
#
# Similarly to the full order model case, we may fit the POD operator
# $\tilde{\mathbf{A}}$ using intrusive and non-intrusive approaches.
# ### Non-intrusive approach
V = Φ'U
∂V∂t = Φ'∂U∂t
A_POD_ls = ∂V∂t * V' / (V * V' + 1e-8I)
plotmat(A_POD_ls)
plotmat(Φ * A_POD_ls * Φ')
ploterr(Φ * A_POD_ls * Φ', u[7], LinRange(0.0, 1.0, 5))
# ### Intrusive approach
#
# First we need to create snapshot tensors of the POD solutions/observations:
init_POD = Φ'init
train_POD = zeros(P, nsample, length(tsnap))
for i ∈ eachindex(tsnap)
train_POD[:, :, i] = Φ' * train[:, :, i]
end
# We may also reuse the loss function and ODE solver from the full order case
# (only the sizes change).
loss_POD(A_POD) = loss(A_POD, init_POD, train_POD, tsnap)
result = DiffEqFlux.sciml_train(
loss_POD,
zeros(P, P),
ADAM(0.01);
cb = (A, l) -> callback(Φ * A * Φ', l),
maxiters = 1000,
)
A_POD_fit = result.u
#-
plotmat(A_POD_fit)
#-
plotmat(Φ * A_POD_fit * Φ')
#-
ploterr(Φ * A_POD_fit * Φ', u[7], LinRange(0.0, 1.0, 5))
# ## Learning the operator in the eigenfunction basis
#
# In the continuous eigenfunction basis $(X_k)_k$, the continuous diffusion
# operator is an infinite diagonal matrix. Projecting the solution onto a
# truncated eigenfunction basis would thus result in a finite diagonal matrix.
# ### Non-intrusive approach
K_eig = 20
XK = √Δx .* X.((1:K_eig)', x)
#-
plotmat(XK'XK)
#-
plotmat(XK * XK')
#-
W = XK'U
∂W∂t = XK'∂U∂t
A_eig_ls = ∂W∂t * W' / (W * W' + 1e-8I)
plotmat(A_eig_ls)
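# Since the eigenfunctions diagonalize the diffusion operator, the diagonal of the
# inferred matrix should be close to the analytical eigenvalues $\kappa \lambda_k =
# -\kappa \pi^2 k^2 / L^2$, at least for the lower modes that dominate the training data
# (a quick check; the two columns below are the inferred and analytical values):
[diag(A_eig_ls) [-κ * (π * j / L)^2 for j = 1:K_eig]]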
#-
plotmat(XK * A_eig_ls * XK')
#-
ploterr(XK * A_eig_ls * XK', u[7], LinRange(0.0, 1.0, 5))
# ### Intrusive approach
init_eig = XK'init
train_eig = zeros(K_eig, nsample, length(tsnap))
for i ∈ eachindex(tsnap)
train_eig[:, :, i] = XK' * train[:, :, i]
end
#-
loss_eig(A) = loss(A, init_eig, train_eig, tsnap)
#-
result = DiffEqFlux.sciml_train(
loss_eig,
zeros(K_eig, K_eig),
ADAM(0.01);
cb = (A, l) -> callback(XK * A * XK', l),
maxiters = 1000,
)
A_eig_fit = result.u
#-
plotmat(A_eig_fit)
#-
plotmat(XK * A_eig_fit * XK')
#-
ploterr(XK * A_eig_fit * XK', u[7], LinRange(0.0, 1.0, 5))
# ## Summary
#
# In this example, we considered a uniformly discretized diffusion equation
# with homogeneous Dirichlet boundary conditions for three discrete functional
# bases:
#
# * the canonical basis of $\mathbb{R}^N$: $\mathbf{I} = (\mathbf{e}_n)_n \in
# \mathbb{R}^{N \times N}$
# * the eigenfunction basis of $\frac{\partial^2}{\partial x^2}$: $\mathbf{X} =
# (X_k(\mathbf{x}))_k \in \mathbb{R}^{N \times K}$, $K \ll N$
# * the POD basis $\mathbf{\Phi} = (\mathbf{\phi}_p)_p \in \mathbb{R}^{N \times
# P}$, $P \ll N$
#
# where the diffusion operator $\frac{\partial^2}{\partial x^2}$ was
# represented by $\mathbf{A}$, $\bar{\mathbf{A}}$, and $\tilde{\mathbf{A}}$
# respectively. These three operators, of respective sizes $N \times N$, $K
# \times K$, and $P \times P$, were then inferred using two methods:
#
# * intrusive inference, where the ODE-solver $\mathbf{S}$ needs to be available and
# differentiable, and the operator is trained using gradient descent;
# * non-intrusive inference, using snapshot matrices only.
#
# For this simple test case, the latter option seems to work well. However, for
# equations where the differential operator is nonlinear, a simple
# least squares fit will not do. Having access to a differentiable ODE solver
# opens up a new world.