Letting the computer do science
Let's think a little. What do you think is the process by which scientific discoveries are made?
First, you have some situation or event of interest from which you want to discover the rules that govern it. Second, you carefully design the experiments to get as much unbiased data as you can. Third, you analyze that data to gain some knowledge and, hopefully, you can begin to write some equations that condense the underlying process. Finally, you keep doing experiments to confirm that the equations you have invented are correct. You are doing science, my friend!
Throughout the book, we have been learning a wide variety of statistical methods that sought to be as general as possible, but that required us to define the model to be used. The equations, then, were already defined and the algorithm only had to find the best parameters (or distribution of them) to fit that model to the data.
But what if I told you that now we can start "talking" with the computer? That we can ask the computer to learn the model itself from the data. Not the parameters, but the equations that govern the process generating the data we give to the computer. Even more, that we can now "share" some incomplete knowledge that we have of some process and ask the computer to learn, with minimal data, the part of the knowledge that we lack. What? Is that even possible? Let's see.
The language of science
In order to start understanding whether that fairytale is possible, first we need to understand the ways we have to "encode" the dynamics of those processes. As Steven Strogatz said, "Since Newton, mankind has come to realize that the laws of physics are always expressed in the language of differential equations". And we can argue that it is a language that not only belongs to physics, but to all of science and, in general, to the world in which we live.
But before any of you run off in fear, let's demystify this subject a little. What is a differential equation and why is it useful?
Well, the first thing to note is that differential equations emerge whenever it's easier to describe change than absolute values. As we saw in the Ultima Online Catastrophe, it is much easier to describe and define why populations grow or shrink than to explain why they have particular absolute values at a particular point in time. Come on! It's much easier to comprehend that if there are lots of predators, the prey population will shrink, than to understand why there are, for example, 223,543 preys and 112,764 predators on the 6th of May. Does this make sense?
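To make this concrete, these are the predator-prey equations the chapter refers to, written here as a sketch reconstructed from the lotka_volterra function defined later in this same chapter (the parameter names are the ones used in that code):

$$\frac{dPrey}{dt} = Prey \cdot (birth_{prey} - mort_{prey} \cdot Pred)$$

$$\frac{dPred}{dt} = Pred \cdot (birth_{pred} \cdot Prey - mort_{pred})$$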
Remember that d can be read as "change", so a whole expression like dPrey/dt can be read as "the change of the prey population over the change of time".
Now, try to think of a way to put the absolute values of each population over time into equations. Have any clue? No? As we said, change is much easier to describe.
Or you can take a much more familiar example: in Newtonian mechanics, motion is described in terms of force.
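In symbols, that familiar law reads:

$$F = m \cdot a$$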
But force determines acceleration, which is itself a statement about change. We are so familiar with that equation that we tend to forget it is a differential equation (and, as Steven mentions, it is the mother of all differential equations).
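Writing acceleration explicitly as a rate of change makes that visible:

$$F = m \cdot \frac{dv}{dt} = m \cdot \frac{d^2x}{dt^2}$$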
This transformation is just showing something that everyone already knows: acceleration is the change of velocity over time, and velocity is the change of position over time. And that implies that acceleration is the second derivative (the change of the change) of position over time.
We just learned that the language of differential equations is fundamental for doing science. So, if we want the computer to learn equations that explain scientific events, it must know how to deal with this type of equation. And this is easily solved by the Scientific Machine Learning (SciML) ecosystem.
Scientific Machine Learning for model discovery
But dealing with differential equations is not the main thing that SciML has to offer us. Instead, it gives us a way to do science in cooperation with artificial intelligence. What? To be able to comprehend this, let's review how "classic" machine learning works.
It turns out that a neural network is literally a function. It is a function in the sense that it takes a bunch of numbers, applies a series of transformations, and returns another bunch of numbers:
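As a minimal sketch of that idea (assuming the DiffEqFlux package that is loaded later in this chapter; the layer sizes and input values here are arbitrary):

begin
using DiffEqFlux
# A tiny network that takes 2 numbers and returns 2 numbers
nn_demo = FastChain(FastDense(2, 16, tanh), FastDense(16, 2))
θ_demo = initial_params(nn_demo)   # its adjustable "connections" (parameters)
nn_demo(Float32[0.5, 1.0], θ_demo) # evaluating it is just calling a function
end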
So, artificial neural networks are functions. But they are special functions, as they can change the connections that make up the specific function they represent. They do this in a process called training, where they adjust their connections (parameters) in order to predict correctly. So, with only one neural network, we can "represent" lots of functions. What's more, there is the Universal Approximation Theorem, which says that a neural network that is deep and wide enough (that is, has enough parameters) can approximate any function. You only need to feed it with enough data, so it can learn the optimal set of weights for its parameters.
This is why neural networks come hand in hand with big data: you need a lot of data in order to let the neural network learn the correct weights. But there is a problem: big data costs billions, or may not even be available! (If you don't believe me, ask the Large Hadron Collider scientists to run 1 million experiments to train a NN. I'm sure they'll be happy to help you :P)
Can you imagine a way to drastically reduce the data needed to train the NN? Well, how about incorporating scientific knowledge into machine learning? If we think about it for a moment, we can realize that a scientific model is worth a thousand datasets. The equations work like a proxy for thousands of experiments, people investigating, years of research. In other words: tons of data.
So if we create a way to pass on all of that precious knowledge, so the model can focus on learning only the specific part of the equation that we don't know, it could do it with a minimal quantity of data! Lucky for us, Christopher Rackauckas and his team already found a way.
The concept we are talking about is called "Universal Differential Equations". Let's use them to recover some missing equation components from the Virtual Catastrophe from the last chapter!
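Schematically, a Universal Differential Equation is an ordinary differential equation in which part of the right-hand side is a neural network (this is only the general idea, written in generic notation):

$$\frac{du}{dt} = f_{known}(u, p, t) + NN_{\theta}(u)$$

The known structure carries the accumulated scientific knowledge, and the network only has to learn the small piece that is missing.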
Looking for the catastrophe culprit
So let's imagine again (yes, we imagine lots of things in this book) that we are Richard Garriott the day before the release of his game, tuning the last details of his virtual ecosystem. The model is simple but powerful, and ready to go:
So after a delicate tuning, he determines that the best parameters for his virtual ecosystem are the ones set in the p0 vector below: a prey birth rate of 1.3, a prey mortality of 0.9, a predator birth rate of 0.8 and a predator mortality of 1.8.
He smiles and happily goes to sleep, thinking that tomorrow is the big day.
Let's see the system equilibrium that he decided on.
begin
using DifferentialEquations, DiffEqSensitivity, StatsPlots
using Plots
gr()
end
begin
#The Lotka-Volterra model Garriott defined for Ultima Online
function lotka_volterra(du,u,p,t)
prey, pred = u
birth_prey, mort_prey, birth_pred, mort_pred = p
du[1] = dprey = (birth_prey - mort_prey * pred)*prey
du[2] = dpred = (birth_pred * prey - mort_pred)*pred
end
p0 = Float32[1.3, 0.9, 0.8, 1.8]
u0 = Float32[0.44249296,4.6280594]
prob_ = ODEProblem(lotka_volterra,u0,(0.0,40.0),p0)
end;
sol = solve(prob_,Tsit5());
plot(sol)
So the system seems in complete equilibrium.
The infamous day begins.
And finally we arrive at the day when the madness begins.
Garriott wakes up early, doesn't have any breakfast and goes to meet his team. Everything is ready. The countdown starts: 3, 2, 1... And the game is online, running.
After the champagne, the hugs and a little celebration, Garriott returns to work and starts analyzing the metrics to see if everything is alright, and it is. He relaxes a little bit until something catches his attention: the curves of carnivorous and herbivorous animals are a little different than they should be. There are still too few data points (only four hours since the release) to be alarmed, but he decides to do a deeper analysis. Luckily, a few days ago he had read a paper on Universal ODEs, so he thinks they can help him in this case.
function lotka_volterra_players(du,u,p,t)
#Lotka-Volterra function with players that hunt
#Of course, Garriott doesn't know about this new players part of the equation.
#He only saw some differences in the real curve vs the one he expected.
birth_prey, mort_prey, birth_pred, mort_pred, players_prey, players_pred = p
du[1] = (birth_prey - mort_prey * u[2] - players_prey)*u[1]
du[2] = (birth_pred * u[1] - mort_pred - players_pred)*u[2]
end
begin
tspan = (0.0f0,4.0f0)
p_ = Float32[1.3, 0.9, 0.8, 1.8, 0.4, 0.4]
prob = ODEProblem(lotka_volterra_players, u0,tspan, p_)
solution = solve(prob, Vern7(), abstol=1e-12, reltol=1e-12, saveat = 0.1)
end;
begin
scatter(solution, alpha = 0.25, title="The data Garriott was seeing")
plot!(solution, alpha = 0.5)
end
begin
expected_prob = ODEProblem(lotka_volterra, u0,tspan, p0)
expected_solution = solve(expected_prob, Vern7(), abstol=1e-12, reltol=1e-12, saveat = 0.1)
end;
begin
scatter(expected_solution, alpha = 0.25, title="The data Garriott was expecting to see")
plot!(expected_solution, alpha = 0.5)
end
As you can see, the animals were taking more time to recover. The period of the cycle was longer than it should be: a clear sign that something was killing them. But he wanted to be sure. The Universal ODEs were key to doing so.
So, he starts thinking: "I know that the model has to be running because I can see it in the code! So maybe something external is producing this divergence. Something that I don't know. But something that a neural network could find out." Let's see.
begin
using OrdinaryDiffEq
using ModelingToolkit
using DataDrivenDiffEq
using LinearAlgebra, Optim
using DiffEqFlux, Flux
end
2×41 Array{Float32,2}:
0.44198 0.330608 0.26585 0.223638 … 1.67964 1.83585 2.00891
4.62752 3.83069 3.14661 2.5762 0.00398262 0.00372037 0.00460533
begin
X = Array(solution)
#And let's add some noise to make it more difficult. Why? Because it's fun!
Xₙ = X + Float32(1e-3)*randn(eltype(X), size(X))
end
begin
# Define the neural network which will learn the missing terms L(prey, pred)
L = FastChain(FastDense(2, 32, tanh),FastDense(32, 32, tanh), FastDense(32, 2))
p = initial_params(L)
function dudt_(u, p,t)
prey, pred = u
z = L(u,p)
[p_[1]*prey - p_[2]*prey*pred + z[1], # known prey dynamics + NN correction
-p_[4]*pred + p_[3]*prey*pred + z[2]] # known predator dynamics + NN correction
end
end
So let's stop for a minute to analyze the code that Garriott just proposed.
In the first two lines, he just defines the neural network that is going to learn the missing components of the two equations (one for the dynamics of the prey and the other for the dynamics of the predator) and fills the variable p with its untrained parameters.
Then, he defines the Universal Differential Equation, where he specifies the parts of the model that he knows and adds a neural network to learn other things that might be happening (and we know that indeed were happening). In other words, he is proposing:
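In symbols, he is proposing the known Lotka-Volterra structure plus one neural network term per equation (this is just the dudt_ function below rewritten, with z₁ and z₂ denoting the two outputs of the network L):

$$\frac{dPrey}{dt} = birth_{prey} \cdot Prey - mort_{prey} \cdot Prey \cdot Pred + z_1(Prey, Pred)$$

$$\frac{dPred}{dt} = -mort_{pred} \cdot Pred + birth_{pred} \cdot Prey \cdot Pred + z_2(Prey, Pred)$$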
So, as we already know, he is just adding a function. Which one? Well, we already know which ones those are:
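Since we generated the data with lotka_volterra_players, we (unlike Garriott) know the missing pieces are the hunting done by the players, that is, the terms

$$-players_{prey} \cdot Prey \qquad \text{and} \qquad -players_{pred} \cdot Pred$$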
begin
prob_nn = ODEProblem(dudt_,u0, tspan, p)
sol_nn = solve(prob_nn, Tsit5(), u0 = u0, p = p, saveat = solution.t)
end;
begin
plot(solution)
plot!(sol_nn, title="The untrained NN is far from the real curve")
end
function predict(θ)
# Solve the UDE with the current parameters θ, sampled at the same time points as the data
Array(solve(prob_nn, Vern7(), u0 = u0, p=θ, saveat = solution.t,
abstol=1e-6, reltol=1e-6,
sensealg = InterpolatingAdjoint(autojacvec=ReverseDiffVJP())))
end
function loss(θ)
# Sum of squared errors between the noisy data Xₙ and the UDE prediction
pred = predict(θ)
sum(abs2, Xₙ .- pred), pred
end
begin
const losses = []
#just adding a callback to supervise the network's learning
callback(θ,l,pred) = begin
push!(losses, l)
if length(losses)%50==0
println("Current loss after $(length(losses)) iterations: $(losses[end])")
end
false
end
end
And let's train the NN!
# First train with ADAM for better convergence
res1 = DiffEqFlux.sciml_train(loss, p, ADAM(0.01), cb=callback, maxiters = 200);
# Train with BFGS
res2 = DiffEqFlux.sciml_train(loss, res1.minimizer, BFGS(initial_stepnorm=0.01), cb=callback, maxiters = 10000);
# Plot the losses
plot(losses, yaxis = :log, xaxis = :log, xlabel = "Iterations", ylabel = "Loss")
begin
# Neural network guess
L̂ = L(Xₙ,res2.minimizer)
# Plot the data and the approximation
NNsolution = predict(res2.minimizer)
# Trained on noisy data vs real solution
plot(solution.t, NNsolution')
plot!(solution.t, X', title="The trained NN has fitted well")
end
Nice! Now that our neural network has learned the input-output relation that makes the entire system behave like the data Garriott was seeing on that infamous morning, we need to transform that input-output behaviour into an actual function. We do this in order to gain interpretability of what may be happening and, in a scientific frame, to learn the underlying model. We do this by creating a function space and searching for the function (or linear combination of functions) that best describes the input-output relation the NN has learned. The loss function to do so is designed in a way that the result will be the least complex one, that is, the answer will be the simplest function that behaves like the NN.
begin
## Sparse Identification
# Create a Basis
u[1:2]
# Lots of polynomials
polys = Operation[1]
for i ∈ 1:5
push!(polys, u[1]^i)
push!(polys, u[2]^i)
for j ∈ i:5
if i != j
push!(polys, (u[1]^i)*(u[2]^j))
push!(polys, u[2]^i*u[1]^i)
end
end
end
end
cos(u₁)
cos(u₂)
sin(u₁)
sin(u₂)
identity(1)
u₁ ^ 1
u₂ ^ 1
u₁ ^ 1 * u₂ ^ 2
u₂ ^ 1 * u₁ ^ 1
u₁ ^ 1 * u₂ ^ 3
u₂ ^ 1 * u₁ ^ 1
u₁ ^ 1 * u₂ ^ 4
u₂ ^ 1 * u₁ ^ 1
u₁ ^ 1 * u₂ ^ 5
u₂ ^ 1 * u₁ ^ 1
u₁ ^ 2
u₂ ^ 2
u₁ ^ 2 * u₂ ^ 3
u₂ ^ 2 * u₁ ^ 2
u₁ ^ 2 * u₂ ^ 4
u₂ ^ 2 * u₁ ^ 2
u₁ ^ 2 * u₂ ^ 5
u₂ ^ 2 * u₁ ^ 2
u₁ ^ 3
u₂ ^ 3
u₁ ^ 3 * u₂ ^ 4
u₂ ^ 3 * u₁ ^ 3
u₁ ^ 3 * u₂ ^ 5
u₂ ^ 3 * u₁ ^ 3
u₁ ^ 4
u₂ ^ 4
u₁ ^ 4 * u₂ ^ 5
u₂ ^ 4 * u₁ ^ 4
u₁ ^ 5
u₂ ^ 5
begin
# And some other stuff
h = [cos.(u)...; sin.(u)...; polys...]
basis = Basis(h, u)
h
end
29 dimensional basis in ["u₁", "u₂"]
basis
So, as you can see above, we just created a function space of 29 dimensions. That space includes every possible linear combination of its dimensions. And we are going to ask SINDy to give us the simplest function in that space that shows the same input-output behaviour the neural network just learned.
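For intuition, this sparse identification step (SINDy) amounts to a regression problem over the basis we just built (this is the generic formulation of the method, not code from this chapter): find a coefficient matrix Ξ with as few nonzero entries as possible such that

$$\hat{L}(X) \approx \Theta(X) \, \Xi$$

where Θ(X) is the 29-dimensional basis evaluated on the data and L̂ is the output of the trained neural network.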
Without saying more, let's do it!
Sparse Identification Result with 2 active terms.
begin
# Create an optimizer for the SINDy problem
opt = SR3()
# Create the thresholds which should be used in the search process
λ = exp10.(-7:0.1:3)
# Target function to choose the results from; x = L0 of coefficients and L2-Error of the model
g(x) = x[1] < 1 ? Inf : norm(x, 2)
Ψ = SINDy(Xₙ[:, 2:end], L̂[:, 2:end], basis, λ, opt, g = g, maxiter = 10000, normalize = true, denoise = true)
end
p₁ * u₁
Ψ.equations[1]
p₂ * u₂
Ψ.equations[2]
OMG! The equations were perfectly restored! You can read this as:
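That is, reading u₁ as the prey population and u₂ as the predator population, the network's two outputs are (up to the fitted parameters) simply

$$z_1 \approx p_1 \cdot Prey, \qquad z_2 \approx p_2 \cdot Pred$$

which is exactly the form of the player-hunting terms that were secretly removing animals from the ecosystem.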
So, remembering that we defined the data Garriott was seeing as:
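(these are just the lotka_volterra_players equations from the beginning of the chapter, rewritten)

$$\frac{dPrey}{dt} = Prey \cdot (birth_{prey} - mort_{prey} \cdot Pred - players_{prey})$$

$$\frac{dPred}{dt} = Pred \cdot (birth_{pred} \cdot Prey - mort_{pred} - players_{pred})$$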
And that we also defined the players' hunting rates as players_prey = 0.4 and players_pred = 0.4 (the values in the p_ vector above).
Let's ask for the parameters then:
parameters(Ψ)
So, the parameters are a bit off. But now that we have the equations restored, we can run another SINDy to gain much more accuracy:
Sparse Identification Result with 2 active terms.
begin
unknown_sys = ODESystem(Ψ)
unknown_eq = ODEFunction(unknown_sys)
# Just the equations
b = Basis((u, p, t)->unknown_eq(u, [1.; 1.], t), u)
# Retune for better parameters -> we could also use DiffEqFlux or other parameter estimation tools here.
Ψf = SINDy(Xₙ[:, 2:end], L̂[:, 2:end], b, STRRidge(0.01), maxiter = 100, convergence_error = 1e-18)
end
parameters(Ψf)
So we recovered the equations and their parameters with outstanding accuracy. And that is even more incredible if we remember that we did this with a minimal amount of data.
After seeing that, Garriott took a big, deep breath. He immediately understood what was going on: the players were mass-killing the animals. He called his team and started planning a strategy to face it, not knowing that it was already a lost cause...
Summary
In this chapter, we continued to deepen our understanding of systems of differential equations and their complex behavior. We went a step further, introducing the concept of Universal Differential Equations, which allow us, given a very small amount of data, to estimate some unknown term of the system. This opens a very big door, connecting machine learning and science, which can greatly enhance the production of knowledge.