# FlowHMM: Flow-based continuous hidden Markov models

## Abstract
Continuous hidden Markov models (HMMs) assume that observations are generated from a mixture of Gaussian densities, which limits their ability to model more complex distributions. In this work, we address this shortcoming and propose a novel continuous HMM that learns general continuous observation densities without constraining them to follow a Gaussian distribution or a mixture thereof. To that end, we leverage deep flow-based architectures that model complex, non-Gaussian functions. Moreover, to simplify optimization and avoid the costly expectation-maximization algorithm, we use the co-occurrence matrix of discretized observations and consider the joint distribution of pairs of co-observed values. Even though our model is trained on discretized observations, it represents a continuous variant of an HMM during inference, since a separate flow model is applied for each hidden state. Experiments on synthetic and real datasets show that our method outperforms Gaussian baselines.
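For intuition, the co-occurrence matrix of discretized observations mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's implementation; the bin count and equal-width binning are assumptions made for the example:

```python
import numpy as np

def cooccurrence_matrix(obs, n_bins):
    """Discretize a 1-D observation sequence into equal-width bins and
    count how often bin a is followed by bin b, normalized so the result
    is a joint distribution over pairs of co-observed values."""
    edges = np.linspace(obs.min(), obs.max(), n_bins + 1)
    idx = np.clip(np.digitize(obs, edges) - 1, 0, n_bins - 1)
    Q = np.zeros((n_bins, n_bins))
    for a, b in zip(idx[:-1], idx[1:]):
        Q[a, b] += 1.0
    return Q / Q.sum()

rng = np.random.default_rng(0)
Q = cooccurrence_matrix(rng.normal(size=1000), n_bins=10)
```

The matrix `Q` is what the Q_training mode fits against, up to the details of the actual discretization used in the code.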
- Create the new conda environment
  ```shell
  conda env create -f conda.yml
  ```
- or update the existing one
  ```shell
  conda env update -f conda.yml --prune
  ```
- Install all the required packages with a single command
  ```shell
  poetry install
  ```
- synthetic dataset with the following distributions (see Example 1 in the paper):
  - 2 Gaussians
  - 1 Uniform

  ```shell
  python main.py -e examples/SYNTHETIC_2G_1U.yaml \
      --nr_epochs=500 \
      --training_type=Q_training \
      --add_noise=True --noise_var=0.1 \
      --show_plots \
      --extra_n=$N
  ```

  (use `--training_type=ML` for the alternative training mode) where `N` is the length of the training observation sequence. We chose N = 1000, 10000, 100000; see `SYNTHETIC_2G_1U.yaml` for more details.
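The kind of data used in this example can be sketched as follows. The transition matrix and emission parameters below are hypothetical placeholders; the actual values are defined in `SYNTHETIC_2G_1U.yaml`:

```python
import numpy as np

def sample_hmm(n, rng):
    """Sample n observations from a 3-state HMM whose emissions are
    two Gaussians and one uniform (illustrative parameters only)."""
    A = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8]])          # transition matrix
    emit = [lambda: rng.normal(-3.0, 1.0),   # state 0: Gaussian
            lambda: rng.normal(3.0, 0.5),    # state 1: Gaussian
            lambda: rng.uniform(-1.0, 1.0)]  # state 2: uniform
    s = 0
    obs = np.empty(n)
    for t in range(n):
        obs[t] = emit[s]()
        s = rng.choice(3, p=A[s])            # next hidden state
    return obs

obs = sample_hmm(1000, np.random.default_rng(0))
```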
- synthetic dataset with the following distributions (see Example 2 in the paper):
  - 1 Beta
  - 1 Uniform
  - 1 Gaussian

  ```shell
  python main.py -e examples/SYNTHETIC_1B_1U.yaml \
      --nr_epochs=1000 \
      --add_noise=True --noise_var=0.00005 \
      --training_type=Q_training \
      --show_plots \
      --extra_n=$N --extra_L=$L
  ```

  (use `--training_type=ML` for the alternative training mode) where `N` is the length of the training observation sequence and `L` is the number of hidden states (flow models to learn). We chose N = 1000, 10000, 100000 and L = 2, 3; see `SYNTHETIC_1B_1U_1G.yaml` for more details.
- 2D synthetic dataset with two "Moons" and one Uniform distribution (see Example 5 in the paper):

  ```shell
  # variant (a); for variant (b) use examples/SYNTHETIC_2d_data_1U_2Moons_A2.yaml
  python main2d.py \
      -e examples/SYNTHETIC_2d_data_1U_2Moons.yaml \
      --nr_epochs=500 \
      --training_type=Q_training \
      --extra_n=1000 --lrate=0.01 \
      --add_noise --noise_var=0.001 \
      --show_plots
  ```

  (use `--training_type=ML` for the alternative training mode)
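A "moons"-style 2D point cloud of the kind used in this example can be generated with plain NumPy. This is a hedged sketch of the data shape only; the repository draws its dataset parameters from the YAML config, not from this snippet:

```python
import numpy as np

def make_moons(n, noise, rng):
    """Two interleaving half-circles ('moons') with additive
    Gaussian noise; n points per moon, 2*n points total."""
    t = rng.uniform(0.0, np.pi, n)
    upper = np.column_stack([np.cos(t), np.sin(t)])
    lower = np.column_stack([1.0 - np.cos(t), 0.5 - np.sin(t)])
    pts = np.vstack([upper, lower])
    return pts + rng.normal(scale=noise, size=pts.shape)

pts = make_moons(500, noise=0.05, rng=np.random.default_rng(0))
```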
- 2D synthetic dataset with one bivariate Gaussian, one Uniform, and one distribution related to geometric Brownian motion (see Example 6 in the paper):

  ```shell
  # variant (a); for variant (b) use examples/SYNTHETIC_2d_data_1G_1U_1GeomBrownianMotion_A2.yaml
  python main2d.py \
      -e examples/SYNTHETIC_2d_data_1G_1U_1GeomBrownianMotion.yaml \
      --nr_epochs=500 \
      --training_type=Q_training \
      --extra_n=1000 \
      --lrate=0.01 \
      --add_noise --noise_var=0.001 \
      --show_plots
  ```

  (use `--training_type=ML` for the alternative training mode)
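The geometric-Brownian-motion component mentioned above follows the standard exact discretization S_{t+dt} = S_t · exp((μ − σ²/2)·dt + σ·√dt·Z). A minimal sketch of such a path; the drift and volatility values are illustrative, not taken from the config:

```python
import numpy as np

def gbm_path(n, mu, sigma, dt, s0, rng):
    """Simulate n steps of geometric Brownian motion using the
    exact log-normal increment discretization."""
    z = rng.normal(size=n)
    log_incr = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.cumsum(log_incr))

path = gbm_path(1000, mu=0.1, sigma=0.2, dt=0.01, s0=1.0,
                rng=np.random.default_rng(0))
```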