diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..ce48355
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,146 @@
+# MTSBenchmark
+**/checkpoints/
+**/dev
+**/data/
+**/.vscode
+
+.DS_Store
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+*.pkl
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+#   For a library or package, you might want to ignore these files since the code is
+#   intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..db3e2b5
--- /dev/null
+++ b/README.md
@@ -0,0 +1,95 @@
+## MTAD: Tools and Benchmark for Multivariate Time Series Anomaly Detection
+
+***MTAD*** is a **M**ultivariate **T**ime Series **A**nomaly **D**etection toolkit. It provides a comprehensive benchmarking protocol and state-of-the-art methods behind a unified, easy-to-use interface.
+
+A multivariate time series is a group of inherently correlated time series. For example, in manufacturing and Information Technology (IT) systems, an entity (e.g., a physical machine or a software service) is generally equipped with a monitoring mechanism to ensure its security or reliability.
+
+In contrast to anomaly detection on a single time series, extensive recent studies indicate that the dependencies hidden in a multivariate time series (MTS) are of great importance for accurate anomaly detection; that is, the anomaly detector should consider the MTS as a whole. To this end, state-of-the-art methods have resorted to deep learning to capture these dependencies for more accurate anomaly detection.
+
+The ultimate goal of MTAD is to unify the various evaluation protocols currently in use, so as to reveal the best performance of recently proposed models.
+
+
+
+### Our evaluation protocol
+
+**Threshold Selection:**
+
+- EVT-based method
+- Searching for the optimal threshold
+
+**Metrics:**
+
+- Precision, Recall, F1-score: *how accurate is a model?*
+  - with **or** without point adjustment
+- Delay: *how timely can a model report an anomaly?*
+- Efficiency: *how fast can a model be trained and perform anomaly detection?*
+
+
+
+### Requirement
+
+> cd ./requirements
+>
+> pip install -r \<requirement file\>
+
+### Run our benchmark
+
+The following example runs the benchmark for LSTM. Its configuration files are stored in the `./benchmark_config/` folder.
+
+```
+cd benchmark
+python lstm_benchmark.py
+```
+
+### Models integrated in this tool
+
+**General Machine Learning-based Models**
+
+| Model   | Paper reference                                              |
+| :------ | :----------------------------------------------------------- |
+| PCA     | **[2003]** Shyu M L, Chen S C, Sarinnapakorn K, et al. A novel anomaly detection scheme based on principal component classifier |
+| iForest | **[ICDM'2008]** Fei Tony Liu, Kai Ming Ting, Zhi-Hua Zhou: Isolation Forest |
+| LODA    | **[Machine Learning'2016]** Tomás Pevný. Loda: Lightweight online detector of anomalies |
+
+**Deep Learning-based Models**
+
+| Model       | Paper reference                                              |
+| :---------- | :----------------------------------------------------------- |
+| AE          | **[AAAI'2019]** Andrea Borghesi, Andrea Bartolini, Michele Lombardi, Michela Milano, Luca Benini. Anomaly Detection Using Autoencoders in High Performance Computing Systems |
+| LSTM        | **[KDD'2018]** Kyle Hundman, Valentino Constantinou, Christopher Laporte, Ian Colwell, Tom Söderström. Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding |
+| LSTM-VAE    | **[Arxiv'2017]** A Multimodal Anomaly Detector for Robot-Assisted Feeding Using an LSTM-based Variational Autoencoder |
+| DAGMM       | **[ICLR'2018]** Bo Zong, Qi Song, Martin Renqiang Min, Wei Cheng, Cristian Lumezanu, Dae-ki Cho, Haifeng Chen. Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection |
+| MSCRED      | **[AAAI'2019]** Chuxu Zhang, Dongjin Song, Yuncong Chen, Xinyang Feng, Cristian Lumezanu, Wei Cheng, Jingchao Ni, Bo Zong, Haifeng Chen, Nitesh V. Chawla. A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data |
+| OmniAnomaly | **[KDD'2019]** Ya Su, Youjian Zhao, Chenhao Niu, Rong Liu, Wei Sun, Dan Pei. Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network |
+| MTAD-GAT | **[ICDM'2020]** Multivariate Time-series Anomaly Detection via Graph Attention Networks |
+| USAD | **[KDD'2020]** USAD: UnSupervised Anomaly Detection on Multivariate Time Series. |
+| InterFusion | **[KDD'2021]** Zhihan Li, Youjian Zhao, Jiaqi Han, Ya Su, Rui Jiao, Xidao Wen, Dan Pei. Multivariate Time Series Anomaly Detection and Interpretation using Hierarchical Inter-Metric and Temporal Embedding |
+| TranAD | **[VLDB'2021]** TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data |
+| RANSynCoders | **[KDD'2021]** Practical Approach to Asynchronous Multivariate Time Series Anomaly Detection and Localization |
+| AnomalyTransformer | **[ICLR'2022]** Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy |
+| GANF | **[ICLR'2022]** Graph-Augmented Normalizing Flows for Anomaly Detection of Multiple Time Series |
+
+
+
+### Datasets 
+
+The following datasets are kindly released by different institutions or schools. Raw datasets can be downloaded or requested via the link right after each dataset name. The processed datasets can be found [here](https://drive.google.com/drive/folders/1NEGyB4y8CvUB8TX2Wh83Eas_QHtufGPR?usp=sharing)⬇️ (SMD, SMAP, and MSL).
+
+- Server Machine Dataset (**SMD**) [Download raw datasets⬇️](https://github.com/NetManAIOps/OmniAnomaly.git)
+
+  > Collected from a large Internet company, containing 5-week-long monitoring KPIs of 28 machines. The meaning of each KPI can be found [here](https://github.com/NetManAIOps/OmniAnomaly/issues/22).
+
+- Soil Moisture Active Passive satellite (**SMAP**) and Mars Science Laboratory rover (**MSL**) [Download raw datasets⬇️](link)
+
+  > They are collected from the running spacecraft and contain a set of telemetry anomalies corresponding to actual spacecraft issues involving various subsystems and channel types.
+
+- Secure Water Treatment (**SWAT**) [Apply here\*](https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/)
+
+  > SWAT is collected from a real-world industrial water treatment plant and contains 11-day-long multivariate KPIs. In particular, the system is in a normal state for the first seven days and under attack for the following four days.
+
+- Water Distribution (**WADI**) [Apply here\*](https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/)
+
+  > An extended dataset of SWAT. 14-day-long operation KPIs are collected when the system is running normally and 2-day-long KPIs are obtained when the system is in attack scenarios.
+
+\* The WADI and SWAT datasets were released by iTrust and must be requested individually. One can request the raw datasets and preprocess them with our preprocessing scripts.
+
diff --git a/benchmark/AutoEncoder_benchmark.py b/benchmark/AutoEncoder_benchmark.py
new file mode 100644
index 0000000..a7f4d2b
--- /dev/null
+++ b/benchmark/AutoEncoder_benchmark.py
@@ -0,0 +1,92 @@
+import sys
+from pyod.models.auto_encoder import AutoEncoder
+
+sys.path.append("../")
+
+import logging
+import argparse
+from common import data_preprocess
+from common.dataloader import load_dataset
+from common.utils import seed_everything, load_config, set_logger, print_to_json
+from common.evaluation import Evaluator, TimeTracker
+from common.exp import store_entity
+
+seed_everything()
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--config",
+        type=str,
+        default="./benchmark_config/",
+        help="The config directory.",
+    )
+    parser.add_argument("--expid", type=str, default="autoencoder_SMD")
+    parser.add_argument("--gpu", type=int, default=-1)
+    args = vars(parser.parse_args())
+
+    config_dir = args["config"]
+    experiment_id = args["expid"]
+
+    params = load_config(config_dir, experiment_id)
+    set_logger(params, args)
+    logging.info(print_to_json(params))
+
+    data_dict = load_dataset(
+        data_root=params["data_root"],
+        entities=params["entities"],
+        dim=params["dim"],
+        valid_ratio=params["valid_ratio"],
+        test_label_postfix=params["test_label_postfix"],
+        test_postfix=params["test_postfix"],
+        train_postfix=params["train_postfix"],
+        nrows=params["nrows"],
+    )
+
+    # preprocessing
+    pp = data_preprocess.preprocessor(model_root=params["model_root"])
+    data_dict = pp.normalize(data_dict, method=params["normalize"])
+
+    # train and test on each entity
+    evaluator = Evaluator(**params["eval"])
+    for entity in params["entities"]:
+        logging.info("Fitting dataset: {}".format(entity))
+
+        train = data_dict[entity]["train"]
+        test = data_dict[entity]["test"]
+
+        model = AutoEncoder(
+            hidden_neurons=params["hidden_neurons"],
+            batch_size=params["batch_size"],
+            epochs=params["nb_epoch"],
+            l2_regularizer=params["l2_regularizer"],
+            verbose=1,
+        )
+
+        tt = TimeTracker(nb_epoch=params["nb_epoch"])
+
+        tt.train_start()
+        model.fit(train)
+        tt.train_end()
+
+        train_anomaly_score = model.decision_function(train)
+
+        tt.test_start()
+        anomaly_score = model.decision_function(test)
+        tt.test_end()
+
+        anomaly_label = data_dict[entity]["test_label"]
+
+        store_entity(
+            params,
+            entity,
+            train_anomaly_score,
+            anomaly_score,
+            anomaly_label,
+            time_tracker=tt.get_data(),
+        )
+    evaluator.eval_exp(
+        exp_folder=params["model_root"],
+        entities=params["entities"],
+        merge_folder=params["benchmark_dir"],
+        extra_params=params,
+    )
diff --git a/benchmark/LODA_benchmark.py b/benchmark/LODA_benchmark.py
new file mode 100644
index 0000000..f01b072
--- /dev/null
+++ b/benchmark/LODA_benchmark.py
@@ -0,0 +1,85 @@
+import sys
+
+sys.path.append("../")
+import logging
+from common import data_preprocess
+from common.dataloader import load_dataset
+from common.utils import seed_everything, load_config, set_logger, print_to_json
+from common.exp import store_entity
+from common.evaluation import Evaluator, TimeTracker
+from pyod.models.loda import LODA
+
+seed_everything()
+if __name__ == "__main__":
+    import argparse
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--config",
+        type=str,
+        default="./benchmark_config/",
+        help="The config directory.",
+    )
+    parser.add_argument("--expid", type=str, default="loda_SMD")
+    parser.add_argument("--gpu", type=int, default=-1)
+    args = vars(parser.parse_args())
+
+    config_dir = args["config"]
+    experiment_id = args["expid"]
+    params = load_config(config_dir, experiment_id)
+    set_logger(params, args)
+    logging.info(print_to_json(params))
+
+    data_dict = load_dataset(
+        data_root=params["data_root"],
+        entities=params["entities"],
+        dim=params["dim"],
+        valid_ratio=params["valid_ratio"],
+        test_label_postfix=params["test_label_postfix"],
+        test_postfix=params["test_postfix"],
+        train_postfix=params["train_postfix"],
+        nrows=params["nrows"],
+    )
+
+    # preprocessing
+    pp = data_preprocess.preprocessor(model_root=params["model_root"])
+    data_dict = pp.normalize(data_dict, method=params["normalize"])
+
+    # train and test on each entity
+    evaluator = Evaluator(**params["eval"])
+    for entity in params["entities"]:
+        logging.info("Fitting dataset: {}".format(entity))
+
+        train = data_dict[entity]["train"]
+        test = data_dict[entity]["test"]
+        test_label = data_dict[entity]["test_label"]
+
+        model = LODA(n_bins=params["n_bins"], n_random_cuts=params["n_random_cuts"])
+
+        tt = TimeTracker()
+        tt.train_start()
+        model.fit(train)
+        tt.train_end()
+
+        train_anomaly_score = model.decision_function(train)
+
+        tt.test_start()
+        anomaly_score = model.decision_function(test)
+        tt.test_end()
+
+        anomaly_label = test_label
+
+        store_entity(
+            params,
+            entity,
+            train_anomaly_score,
+            anomaly_score,
+            anomaly_label,
+            time_tracker=tt.get_data(),
+        )
+    evaluator.eval_exp(
+        exp_folder=params["model_root"],
+        entities=params["entities"],
+        merge_folder=params["benchmark_dir"],
+        extra_params=params,
+    )
diff --git a/benchmark/PCA_benchmark.py b/benchmark/PCA_benchmark.py
new file mode 100644
index 0000000..91aec93
--- /dev/null
+++ b/benchmark/PCA_benchmark.py
@@ -0,0 +1,86 @@
+import sys
+import logging
+from pyod.models.pca import PCA
+
+sys.path.append("../")
+
+from common import data_preprocess
+from common.dataloader import load_dataset
+from common.utils import seed_everything, load_config, set_logger, print_to_json
+from common.evaluation import Evaluator, TimeTracker
+from common.exp import store_entity
+
+seed_everything()
+if __name__ == "__main__":
+    import argparse
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--config",
+        type=str,
+        default="./benchmark_config/",
+        help="The config directory.",
+    )
+    parser.add_argument("--expid", type=str, default="pca_SMD")
+    parser.add_argument("--gpu", type=int, default=-1)
+    args = vars(parser.parse_args())
+
+    config_dir = args["config"]
+    experiment_id = args["expid"]
+
+    params = load_config(config_dir, experiment_id)
+    set_logger(params, args)
+    logging.info(print_to_json(params))
+
+    data_dict = load_dataset(
+        data_root=params["data_root"],
+        entities=params["entities"],
+        dim=params["dim"],
+        valid_ratio=params["valid_ratio"],
+        test_label_postfix=params["test_label_postfix"],
+        test_postfix=params["test_postfix"],
+        train_postfix=params["train_postfix"],
+        nrows=params["nrows"],
+    )
+
+    # preprocessing
+    pp = data_preprocess.preprocessor(model_root=params["model_root"])
+    data_dict = pp.normalize(data_dict, method=params["normalize"])
+
+    # train and test on each entity
+    evaluator = Evaluator(**params["eval"])
+    for entity in params["entities"]:
+        logging.info("Fitting dataset: {}".format(entity))
+        train = data_dict[entity]["train"]
+        test = data_dict[entity]["test"]
+        test_label = data_dict[entity]["test_label"]
+
+        # PCA outlier detector from PyOD with default hyperparameters
+        model = PCA()
+
+        tt = TimeTracker()
+        tt.train_start()
+        model.fit(train)
+        tt.train_end()
+
+        # get outlier scores
+        train_anomaly_score = model.decision_function(train)
+        tt.test_start()
+        anomaly_score = model.decision_function(test)
+        tt.test_end()
+        anomaly_label = test_label
+
+        store_entity(
+            params,
+            entity,
+            train_anomaly_score,
+            anomaly_score,
+            anomaly_label,
+            time_tracker=tt.get_data(),
+        )
+    evaluator.eval_exp(
+        exp_folder=params["model_root"],
+        entities=params["entities"],
+        merge_folder=params["benchmark_dir"],
+        extra_params=params,
+    )
diff --git a/benchmark/RANS_benchmark.py b/benchmark/RANS_benchmark.py
new file mode 100644
index 0000000..b9775eb
--- /dev/null
+++ b/benchmark/RANS_benchmark.py
@@ -0,0 +1,113 @@
+import os
+
+os.chdir(os.path.dirname(os.path.realpath(__file__)))
+import sys
+
+sys.path.append("../")
+import logging
+from common import data_preprocess
+from common.dataloader import load_dataset
+from common.utils import seed_everything, load_config, set_logger, print_to_json
+from common.evaluation import Evaluator, TimeTracker
+from common.exp import store_entity
+from networks.RANS import RANSynCoders
+
+
+seed_everything()
+if __name__ == "__main__":
+    import argparse
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--config",
+        type=str,
+        default="./benchmark_config/",
+        help="The config directory.",
+    )
+    parser.add_argument("--expid", type=str, default="rans_SMD")
+    parser.add_argument("--gpu", type=int, default=-1)
+    args = vars(parser.parse_args())
+
+    config_dir = args["config"]
+    experiment_id = args["expid"]
+
+    params = load_config(config_dir, experiment_id)
+    set_logger(params, args)
+    logging.info(print_to_json(params))
+
+    data_dict = load_dataset(
+        data_root=params["data_root"],
+        entities=params["entities"],
+        valid_ratio=params["valid_ratio"],
+        dim=params["dim"],
+        test_label_postfix=params["test_label_postfix"],
+        test_postfix=params["test_postfix"],
+        train_postfix=params["train_postfix"],
+        nrows=params["nrows"],
+    )
+
+    # preprocessing
+    pp = data_preprocess.preprocessor(model_root=params["model_root"])
+    data_dict = pp.normalize(data_dict, method=params["normalize"])
+
+    # train and test on each entity
+    evaluator = Evaluator(**params["eval"])
+    for entity in params["entities"]:
+        logging.info("Fitting dataset: {}".format(entity))
+        x_train = data_dict[entity]["train"]
+        x_test = data_dict[entity]["test"]
+
+        N = 5 * round((x_train.shape[1] / 3) / 5)
+        z = int((N / 2) - 1)
+
+        model = RANSynCoders(
+            n_estimators=N,
+            max_features=N,
+            encoding_depth=params["encoder_layers"],
+            latent_dim=z,
+            decoding_depth=params["decoder_layers"],
+            activation=params["activation"],
+            output_activation=params["output_activation"],
+            delta=params["delta"],
+            synchronize=params["synchronize"],
+            max_freqs=params["S"],
+        )
+
+        tt = TimeTracker(nb_epoch=params["nb_epoch"])
+        tt.train_start()
+
+        model.fit(
+            x_train,
+            epochs=params["nb_epoch"],
+            batch_size=params["batch_size"],
+            freq_warmup=params["freq_warmup"],
+            sin_warmup=params["sin_warmup"],
+        )
+        tt.train_end()
+
+        train_anomaly_score = model.predict_prob(
+            x_train, N, batch_size=10 * params["batch_size"]
+        )
+
+        tt.test_start()
+        anomaly_score = model.predict_prob(
+            x_test, N, batch_size=10 * params["batch_size"]
+        )
+        tt.test_end()
+
+        anomaly_label = data_dict[entity]["test_label"]
+
+        store_entity(
+            params,
+            entity,
+            train_anomaly_score,
+            anomaly_score,
+            anomaly_label,
+            time_tracker=tt.get_data(),
+        )
+    evaluator.eval_exp(
+        exp_folder=params["model_root"],
+        entities=params["entities"],
+        merge_folder=params["benchmark_dir"],
+        extra_params=params,
+    )
diff --git a/benchmark/anomaly_transformer_benchmark.py b/benchmark/anomaly_transformer_benchmark.py
new file mode 100644
index 0000000..a9a2026
--- /dev/null
+++ b/benchmark/anomaly_transformer_benchmark.py
@@ -0,0 +1,113 @@
+import sys
+
+sys.path.append("../")
+import logging
+import argparse
+from networks.anomaly_transformer.solver import AnomalyTransformer
+
+from common import data_preprocess
+from common.dataloader import get_dataloaders, load_dataset
+from common.utils import seed_everything, load_config, set_logger, print_to_json
+from common.evaluation import Evaluator, TimeTracker
+from common.exp import store_entity
+
+seed_everything()
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--config",
+        type=str,
+        default="./benchmark_config/",
+        help="The config directory.",
+    )
+    parser.add_argument("--expid", type=str, default="anomaly_transformer_SMD")
+    parser.add_argument("--gpu", type=int, default=-1)
+    args = vars(parser.parse_args())
+
+    config_dir = args["config"]
+    experiment_id = args["expid"]
+
+    params = load_config(config_dir, experiment_id)
+    set_logger(params, args)
+    logging.info(print_to_json(params))
+
+    data_dict = load_dataset(
+        data_root=params["data_root"],
+        entities=params["entities"],
+        dim=params["dim"],
+        valid_ratio=params["valid_ratio"],
+        test_label_postfix=params["test_label_postfix"],
+        test_postfix=params["test_postfix"],
+        train_postfix=params["train_postfix"],
+        nrows=params["nrows"],
+    )
+
+    # preprocessing
+    pp = data_preprocess.preprocessor(model_root=params["model_root"])
+    data_dict = pp.normalize(data_dict, method=params["normalize"])
+
+    # sliding windows
+    window_dict = data_preprocess.generate_windows(
+        data_dict,
+        window_size=params["window_size"],
+        stride=params["stride"],
+    )
+
+    # train and test on each entity
+    evaluator = Evaluator(**params["eval"])
+    for entity in params["entities"]:
+        logging.info("Fitting dataset: {}".format(entity))
+        windows = window_dict[entity]
+        train_windows = windows["train_windows"]
+        test_windows = windows["test_windows"]
+        test_windows_label = windows["test_label"]
+
+        train_loader, valid_loader, test_loader = get_dataloaders(
+            train_windows,
+            test_windows,
+            batch_size=params["batch_size"],
+            num_workers=params["num_workers"],
+        )
+
+        model = AnomalyTransformer(
+            lr=params["lr"],
+            num_epochs=params["nb_epoch"],
+            k=params["k"],
+            win_size=params["window_size"],
+            input_c=params["dim"],
+            output_c=params["dim"],
+            batch_size=params["batch_size"],
+            model_save_path=params["model_root"],
+            device=params["device"],
+        )
+
+        tt = TimeTracker(nb_epoch=params["nb_epoch"])
+
+        tt.train_start()
+        model.fit(train_loader, valid_loader)
+        tt.train_end()
+
+        tt.test_start()
+        anomaly_score, anomaly_label = model.predict_prob(
+            test_loader, test_windows_label
+        )
+        tt.test_end()
+
+        train_anomaly_score, _ = model.predict_prob(
+            train_loader, test_windows_label
+        )
+
+        store_entity(
+            params,
+            entity,
+            train_anomaly_score,
+            anomaly_score,
+            anomaly_label,
+            time_tracker=tt.get_data(),
+        )
+    evaluator.eval_exp(
+        exp_folder=params["model_root"],
+        entities=params["entities"],
+        merge_folder=params["benchmark_dir"],
+        extra_params=params,
+    )
diff --git a/benchmark/benchmark_config/dataset_config/ASD.yaml b/benchmark/benchmark_config/dataset_config/ASD.yaml
new file mode 100644
index 0000000..5c1ad33
--- /dev/null
+++ b/benchmark/benchmark_config/dataset_config/ASD.yaml
@@ -0,0 +1,40 @@
+Base:
+    dataset: "asd"
+    data_root: ../data/ASD/
+    model_root: "./benchmark_exp_details"
+    benchmark_dir: "./benchmark_results"
+    train_postfix: "train.pkl"
+    test_postfix: "test.pkl"
+    test_label_postfix: "test_label.pkl"
+    dim: 19
+    nrows: null
+    entities:
+        - omi-1
+        - omi-2
+        - omi-3
+        - omi-4
+        - omi-5
+        - omi-6
+        - omi-7
+        - omi-8
+        - omi-9
+        - omi-10
+        - omi-11
+        - omi-12
+
+ASD_x2_valid:
+    valid_ratio: 0
+    nrows: 2000
+    entities: ["omi-1", "omi-2"]
+
+ASD_x2:
+    valid_ratio: 0
+    nrows: 2000
+    entities: ["omi-1", "omi-2"]
+
+ASD_x2_full:
+    valid_ratio: 0
+    entities: ["omi-1", "omi-2"]
+
+ASD:
+    valid_ratio: 0
\ No newline at end of file
diff --git a/benchmark/benchmark_config/dataset_config/MSL.yaml b/benchmark/benchmark_config/dataset_config/MSL.yaml
new file mode 100644
index 0000000..fa5b16b
--- /dev/null
+++ b/benchmark/benchmark_config/dataset_config/MSL.yaml
@@ -0,0 +1,54 @@
+Base:
+    dataset: "msl"
+    data_root: ../data/MSL/
+    model_root: "./benchmark_exp_details"
+    benchmark_dir: "./benchmark_results"
+    train_postfix: "train.pkl"
+    test_postfix: "test.pkl"
+    test_label_postfix: "test_label.pkl"
+    dim: 55
+    nrows: null
+    entities:
+        ["M-6",
+        "M-1",
+        "M-2",
+        "S-2",
+        "P-10",
+        "T-4",
+        "T-5",
+        "F-7",
+        "M-3",
+        "M-4",
+        "M-5",
+        "P-15",
+        "C-1",
+        "C-2",
+        "T-12",
+        "T-13",
+        "F-4",
+        "F-5",
+        "D-14",
+        "T-9",
+        "P-14",
+        "T-8",
+        "P-11",
+        "D-15",
+        "D-16",
+        "M-7",
+        "F-8"]
+
+MSL_x2_valid:
+    valid_ratio: 0
+    nrows: 2000
+    entities: ["M-6", "M-1"]
+
+MSL_x2:
+    valid_ratio: 0
+    entities: ["M-6", "M-1"]
+
+MSL_usad_test:
+    valid_ratio: 0
+    entities: ["T-9"]
+
+MSL:
+    valid_ratio: 0
\ No newline at end of file
diff --git a/benchmark/benchmark_config/dataset_config/SMAP.yaml b/benchmark/benchmark_config/dataset_config/SMAP.yaml
new file mode 100644
index 0000000..4812e91
--- /dev/null
+++ b/benchmark/benchmark_config/dataset_config/SMAP.yaml
@@ -0,0 +1,97 @@
+Base:
+    dataset: "smap"
+    data_root: ../data/SMAP/
+    model_root: "./benchmark_exp_details"
+    benchmark_dir: "./benchmark_results"
+    train_postfix: "train.pkl"
+    test_postfix: "test.pkl"
+    test_label_postfix: "test_label.pkl"
+    dim: 25
+    nrows: null
+    entities:
+        [
+        "P-1",
+        "S-1",
+        "E-1",
+        "E-2",
+        "E-3",
+        "E-4",
+        "E-5",
+        "E-6",
+        "E-7",
+        "E-8",
+        "E-9",
+        "E-10",
+        "E-11",
+        "E-12",
+        "E-13",
+        "A-1",
+        "D-1",
+        "P-2",
+        "P-3",
+        "D-2",
+        "D-3",
+        "D-4",
+        "A-2",
+        "A-3",
+        "A-4",
+        "G-1",
+        "G-2",
+        "D-5",
+        "D-6",
+        "D-7",
+        "F-1",
+        "P-4",
+        "G-3",
+        "T-1",
+        "T-2",
+        "D-8",
+        "D-9",
+        "F-2",
+        "G-4",
+        "T-3",
+        "D-11",
+        # "D-12",
+        "B-1",
+        "G-6",
+        "G-7",
+        "P-7",
+        "R-1",
+        "A-5",
+        "A-6",
+        "A-7",
+        "D-13",
+        "P-2",
+        "A-8",
+        "A-9",
+        "F-3",
+    ]
+
+SMAP_x2_valid:
+    valid_ratio: 0
+    nrows: 2000
+    entities: ["P-1","S-1"]
+
+SMAP_test:
+    valid_ratio: 0
+    entities: ["B-1",
+        "G-6",
+        "G-7",
+        "P-7",
+        "R-1",
+        "A-5",
+        "A-6",
+        "A-7",
+        "D-13",
+        "P-2",
+        "A-8",
+        "A-9",
+        "F-3",
+    ]
+
+SMAP_x2_full:
+    valid_ratio: 0
+    entities: ["D-13"]
+
+SMAP:
+    valid_ratio: 0
\ No newline at end of file
diff --git a/benchmark/benchmark_config/dataset_config/SMD.yaml b/benchmark/benchmark_config/dataset_config/SMD.yaml
new file mode 100644
index 0000000..f2aa8d9
--- /dev/null
+++ b/benchmark/benchmark_config/dataset_config/SMD.yaml
@@ -0,0 +1,42 @@
+Base:
+    dataset: "smd"
+    data_root: ../data/SMD/
+    model_root: "./benchmark_exp_details"
+    benchmark_dir: "./benchmark_results"
+    train_postfix: "train.pkl"
+    test_postfix: "test.pkl"
+    test_label_postfix: "test_label.pkl"
+    dim: 38
+    nrows: null
+    entities:
+        - machine-1-1
+        - machine-1-2
+        - machine-1-3
+        - machine-1-4
+        - machine-1-5
+        - machine-1-6
+        - machine-1-7
+        - machine-1-8
+        - machine-2-1
+        - machine-2-2
+        - machine-2-3
+        - machine-2-4
+        - machine-2-5
+        - machine-2-6
+        - machine-2-7
+        - machine-2-8
+        - machine-2-9
+        - machine-3-1
+        - machine-3-2
+        - machine-3-3
+        - machine-3-4
+        - machine-3-5
+        - machine-3-6
+        - machine-3-7
+        - machine-3-8
+        - machine-3-9
+        - machine-3-10
+        - machine-3-11
+
+SMD:
+    valid_ratio: 0
\ No newline at end of file
diff --git a/benchmark/benchmark_config/dataset_config/SWAT.yaml b/benchmark/benchmark_config/dataset_config/SWAT.yaml
new file mode 100644
index 0000000..96511e5
--- /dev/null
+++ b/benchmark/benchmark_config/dataset_config/SWAT.yaml
@@ -0,0 +1,24 @@
+Base:
+    dataset: "swat"
+    data_root: ../data/SWAT/
+    model_root: "./benchmark_exp_details"
+    benchmark_dir: "./benchmark_results"
+    train_postfix: "train.pkl"
+    test_postfix: "test.pkl"
+    test_label_postfix: "test_label.pkl"
+    dim: 40
+    nrows: null
+    entities:
+        - swat
+
+
+SWAT:
+    valid_ratio: 0
+    entities: ["swat"]
+
+SWAT_test:
+    nrows: 5000
+    valid_ratio: 0
+    entities: ["swat"]
+
+
diff --git a/benchmark/benchmark_config/dataset_config/WADI.yaml b/benchmark/benchmark_config/dataset_config/WADI.yaml
new file mode 100644
index 0000000..fb0eb47
--- /dev/null
+++ b/benchmark/benchmark_config/dataset_config/WADI.yaml
@@ -0,0 +1,24 @@
+Base:
+    dataset: "wadi"
+    data_root: ../data/WADI/
+    model_root: "./benchmark_exp_details"
+    benchmark_dir: "./benchmark_results"
+    train_postfix: "train.pkl"
+    test_postfix: "test.pkl"
+    test_label_postfix: "test_label.pkl"
+    dim: 93
+    nrows: null
+    entities:
+        - wadi
+
+
+WADI:
+    valid_ratio: 0
+    entities: ["wadi"]
+
+WADI_test:
+    nrows: 5000
+    valid_ratio: 0
+    entities: ["wadi"]
+
+
diff --git a/benchmark/benchmark_config/eval_config.yaml b/benchmark/benchmark_config/eval_config.yaml
new file mode 100644
index 0000000..9ea4329
--- /dev/null
+++ b/benchmark/benchmark_config/eval_config.yaml
@@ -0,0 +1,8 @@
+Base:
+    metrics: ["f1", "delay"]
+    pot_params: {"q": 1.0e-2, "level": [0.99, 0.98, 0.97, 0.9, 0.8], "dynamic": False}
+    # pot_params: {"q": 1.0e-3, "level": [0.8], "dynamic": False}
+    best_params: {"target_metric": "f1", "target_direction": "max"}
+    thresholding: ["best", "pot"]
+    point_adjustment: [True, False]
+
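Review note on `point_adjustment: [True, False]`: the evaluator code is not part of this hunk, so the following is only a minimal sketch of the usual point-adjustment convention in time-series anomaly detection, where a true anomaly segment counts as fully detected once any point inside it is flagged. The function name `point_adjust` and the toy arrays are assumptions for illustration, not the repository's API.

```python
def point_adjust(pred, label):
    """Return adjusted predictions under the point-adjustment convention.

    pred, label: equal-length sequences of 0/1 values.
    Hypothetical helper for illustration; not taken from the repository.
    """
    adjusted = [int(bool(p)) for p in pred]
    n = len(label)
    start = 0
    while start < n:
        if not label[start]:
            start += 1
            continue
        # Find the end (exclusive) of the current ground-truth anomaly segment.
        end = start
        while end < n and label[end]:
            end += 1
        # If any point in the segment was detected, mark the whole segment.
        if any(adjusted[start:end]):
            adjusted[start:end] = [1] * (end - start)
        start = end
    return adjusted


label = [0, 1, 1, 1, 0, 0, 1, 1, 0]
pred = [0, 0, 1, 0, 0, 1, 0, 0, 0]
print(point_adjust(pred, label))
```

False positives outside any labelled segment are left untouched, and segments with no detected point stay undetected. Scores are typically higher with adjustment than without, which is presumably why the config evaluates both settings.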
diff --git a/benchmark/benchmark_config/model_config/anomaly_transformer.yaml b/benchmark/benchmark_config/model_config/anomaly_transformer.yaml
new file mode 100644
index 0000000..84df09e
--- /dev/null
+++ b/benchmark/benchmark_config/model_config/anomaly_transformer.yaml
@@ -0,0 +1,76 @@
+Base:
+    model_id: anomaly_transformer
+    normalize: "minmax"
+
+anomaly_transformer_SMD:
+    dataset_id: SMD
+    batch_size: 128
+    window_size: 100
+    nb_epoch: 10
+    l2_regularizer: 0.1
+    stride: 1
+    num_workers: 1
+    lr: 1.0e-4
+    k: 3
+    device: 2
+
+anomaly_transformer_ASD:
+    dataset_id: ASD
+    batch_size: 128
+    window_size: 100
+    nb_epoch: 10
+    l2_regularizer: 0.1
+    stride: 1
+    num_workers: 1
+    lr: 1.0e-4
+    k: 3
+    device: 2
+
+anomaly_transformer_SWAT:
+    dataset_id: SWAT
+    batch_size: 128
+    window_size: 100
+    nb_epoch: 10
+    l2_regularizer: 0.1
+    stride: 1
+    num_workers: 1
+    lr: 1.0e-4
+    k: 3
+    device: 2
+
+anomaly_transformer_WADI:
+    dataset_id: WADI
+    batch_size: 128
+    window_size: 100
+    nb_epoch: 10
+    l2_regularizer: 0.1
+    stride: 1
+    num_workers: 1
+    lr: 1.0e-4
+    k: 3
+    device: 2
+
+anomaly_transformer_SMAP:
+    dataset_id: SMAP
+    batch_size: 128
+    window_size: 100
+    nb_epoch: 10
+    l2_regularizer: 0.1
+    stride: 1
+    num_workers: 1
+    lr: 1.0e-4
+    k: 3
+    device: 2
+
+anomaly_transformer_MSL:
+    dataset_id: MSL
+    batch_size: 128
+    window_size: 100
+    nb_epoch: 10
+    l2_regularizer: 0.1
+    stride: 1
+    num_workers: 1
+    lr: 1.0e-4
+    k: 3
+    device: 2
+
diff --git a/benchmark/benchmark_config/model_config/autoencoder.yaml b/benchmark/benchmark_config/model_config/autoencoder.yaml
new file mode 100644
index 0000000..987d674
--- /dev/null
+++ b/benchmark/benchmark_config/model_config/autoencoder.yaml
@@ -0,0 +1,51 @@
+Base:
+    model_id: AutoEncoder
+    normalize: "minmax"
+
+autoencoder_SMD:
+    dataset_id: SMD
+    hidden_neurons: [64, 32, 32, 64]
+    batch_size: 512
+    nb_epoch: 50
+    l2_regularizer: 0.1
+    device: 2
+autoencoder_ASD:
+    dataset_id: ASD
+    normalize: "minmax"
+    hidden_neurons: [64, 16, 16, 64]
+    batch_size: 512
+    nb_epoch: 50
+    l2_regularizer: 0.1
+    device: 2
+autoencoder_SWAT:
+    dataset_id: SWAT
+    normalize: "minmax"
+    hidden_neurons: [64, 32, 32, 64]
+    batch_size: 512
+    nb_epoch: 50
+    l2_regularizer: 0.1
+    device: 2
+autoencoder_WADI:
+    dataset_id: WADI
+    normalize: "minmax"
+    hidden_neurons: [64, 32, 32, 64]
+    batch_size: 512
+    nb_epoch: 50
+    l2_regularizer: 0.1
+    device: 2
+autoencoder_SMAP:
+    dataset_id: SMAP
+    normalize: "minmax"
+    hidden_neurons: [64, 16, 16, 64]
+    batch_size: 512
+    nb_epoch: 50
+    l2_regularizer: 0.1
+    device: 2
+autoencoder_MSL:
+    dataset_id: MSL
+    normalize: "minmax"
+    hidden_neurons: [64, 32, 32, 64]
+    batch_size: 512
+    nb_epoch: 50
+    l2_regularizer: 0.1
+    device: 2
diff --git a/benchmark/benchmark_config/model_config/dagmm.yaml b/benchmark/benchmark_config/model_config/dagmm.yaml
new file mode 100644
index 0000000..fac9098
--- /dev/null
+++ b/benchmark/benchmark_config/model_config/dagmm.yaml
@@ -0,0 +1,72 @@
+Base:
+    normalize: "standard"
+    model_id: dagmm
+    reverse_score: False
+
+dagmm_SMD:
+    dataset_id: SMD
+    batch_size: 512
+    nb_epoch: 20
+    compression_hiddens: [128, 64, 2]
+    estimation_hiddens: [100, 50]
+    estimation_dropout_ratio: 0.25
+    lr: 0.0001
+    lambdaone: 0.1
+    lambdatwo: 0.0001
+    device: 2
+dagmm_ASD:
+    dataset_id: ASD
+    batch_size: 512
+    nb_epoch: 20
+    compression_hiddens: [128, 64, 2]
+    estimation_hiddens: [100, 50]
+    estimation_dropout_ratio: 0.25
+    lr: 0.0001
+    lambdaone: 0.1
+    lambdatwo: 0.0001
+    device: 2
+dagmm_SWAT:
+    dataset_id: SWAT
+    batch_size: 512
+    nb_epoch: 20
+    compression_hiddens: [128, 64, 2]
+    estimation_hiddens: [100, 50]
+    estimation_dropout_ratio: 0.25
+    lr: 0.0001
+    lambdaone: 0.1
+    lambdatwo: 0.0001
+    device: 2
+dagmm_WADI:
+    dataset_id: WADI
+    batch_size: 512
+    nb_epoch: 20
+    compression_hiddens: [128, 64, 2]
+    estimation_hiddens: [100, 50]
+    estimation_dropout_ratio: 0.25
+    lr: 0.0001
+    lambdaone: 0.1
+    lambdatwo: 0.0001
+    device: 2
+dagmm_SMAP:
+    dataset_id: SMAP_test
+    batch_size: 512
+    nb_epoch: 20
+    compression_hiddens: [128, 64, 2]
+    estimation_hiddens: [100, 5]
+    estimation_dropout_ratio: 0
+    lr: 0.001
+    lambdaone: 0
+    lambdatwo: 0
+    device: 2
+dagmm_MSL:
+    dataset_id: MSL
+    batch_size: 512
+    nb_epoch: 20
+    compression_hiddens: [128, 64, 2]
+    estimation_hiddens: [100, 50]
+    estimation_dropout_ratio: 0.25
+    lr: 0.0001
+    lambdaone: 0.1
+    lambdatwo: 0.0001
+    device: 2
+
diff --git a/benchmark/benchmark_config/model_config/ganf.yaml b/benchmark/benchmark_config/model_config/ganf.yaml
new file mode 100644
index 0000000..36c1cf1
--- /dev/null
+++ b/benchmark/benchmark_config/model_config/ganf.yaml
@@ -0,0 +1,136 @@
+Base:
+    model_id: ganf
+    normalize: 'minmax'
+
+ganf_SMD:
+    dataset_id: SMD
+    window_size: 100
+    stride: 1
+    n_blocks: 1
+    input_size: 1
+    hidden_size: 32
+    n_hidden: 1
+    dropout: 0.1
+    batch_norm: False
+    batch_size: 512
+    weight_decay: 5.0e-4
+    nb_epoch: 10
+    lr: 2.0e-3
+    h_tol: 1.0e-4
+    rho_max: 1000000
+    lambda1: 0.0
+    rho_init: 1.0
+    alpha_init: 0.0
+    shuffle: True
+    num_workers: 1
+    device: 0
+ganf_ASD:
+    dataset_id: ASD
+    window_size: 100
+    stride: 1
+    n_blocks: 1
+    input_size: 1
+    hidden_size: 32
+    n_hidden: 1
+    dropout: 0.1
+    batch_norm: False
+    batch_size: 512
+    weight_decay: 5.0e-4
+    nb_epoch: 10
+    lr: 2.0e-3
+    h_tol: 1.0e-4
+    rho_max: 1000000
+    lambda1: 0.0
+    rho_init: 1.0
+    alpha_init: 0.0
+    shuffle: True
+    num_workers: 1
+    device: 0
+ganf_SWAT:
+    dataset_id: SWAT
+    window_size: 100
+    stride: 1
+    n_blocks: 1
+    input_size: 1
+    hidden_size: 32
+    n_hidden: 1
+    dropout: 0.1
+    batch_norm: False
+    batch_size: 512
+    weight_decay: 5.0e-4
+    nb_epoch: 10
+    lr: 2.0e-3
+    h_tol: 1.0e-4
+    rho_max: 1000000
+    lambda1: 0.0
+    rho_init: 1.0
+    alpha_init: 0.0
+    shuffle: True
+    num_workers: 1
+    device: 0
+ganf_WADI:
+    dataset_id: WADI
+    window_size: 100
+    stride: 1
+    n_blocks: 1
+    input_size: 1
+    hidden_size: 32
+    n_hidden: 1
+    dropout: 0.1
+    batch_norm: False
+    batch_size: 512
+    weight_decay: 5.0e-4
+    nb_epoch: 10
+    lr: 2.0e-3
+    h_tol: 1.0e-4
+    rho_max: 1000000
+    lambda1: 0.0
+    rho_init: 1.0
+    alpha_init: 0.0
+    shuffle: True
+    num_workers: 1
+    device: 0
+ganf_SMAP:
+    dataset_id: SMAP
+    window_size: 100
+    stride: 1
+    n_blocks: 1
+    input_size: 1
+    hidden_size: 32
+    n_hidden: 1
+    dropout: 0.1
+    batch_norm: False
+    batch_size: 512
+    weight_decay: 5.0e-4
+    nb_epoch: 10
+    lr: 2.0e-3
+    h_tol: 1.0e-4
+    rho_max: 1000000
+    lambda1: 0.0
+    rho_init: 1.0
+    alpha_init: 0.0
+    shuffle: True
+    num_workers: 1
+    device: 0
+ganf_MSL:
+    dataset_id: MSL
+    window_size: 100
+    stride: 1
+    n_blocks: 1
+    input_size: 1
+    hidden_size: 32
+    n_hidden: 1
+    dropout: 0.1
+    batch_norm: False
+    batch_size: 512
+    weight_decay: 5.0e-4
+    nb_epoch: 10
+    lr: 2.0e-3
+    h_tol: 1.0e-4
+    rho_max: 1000000
+    lambda1: 0.0
+    rho_init: 1.0
+    alpha_init: 0.0
+    shuffle: True
+    num_workers: 1
+    device: 0
\ No newline at end of file
diff --git a/benchmark/benchmark_config/model_config/iforest.yaml b/benchmark/benchmark_config/model_config/iforest.yaml
new file mode 100644
index 0000000..444b2d7
--- /dev/null
+++ b/benchmark/benchmark_config/model_config/iforest.yaml
@@ -0,0 +1,27 @@
+Base:
+    model_id: iforest
+    normalize: "minmax"
+
+iforest_SMD:
+    dataset_id: SMD
+    n_estimators: 100
+
+iforest_MSL:
+    dataset_id: MSL
+    n_estimators: 100
+
+iforest_SMAP:
+    dataset_id: SMAP
+    n_estimators: 100
+
+iforest_SWAT:
+    dataset_id: SWAT
+    n_estimators: 100
+
+iforest_WADI:
+    dataset_id: WADI
+    n_estimators: 100
+
+iforest_ASD:
+    dataset_id: ASD
+    n_estimators: 100
\ No newline at end of file
diff --git a/benchmark/benchmark_config/model_config/interfusion.yaml b/benchmark/benchmark_config/model_config/interfusion.yaml
new file mode 100644
index 0000000..c527457
--- /dev/null
+++ b/benchmark/benchmark_config/model_config/interfusion.yaml
@@ -0,0 +1,72 @@
+Base:
+    model_id: interfusion
+    normalize: "minmax"
+
+
+interfusion_SMD:
+    dataset_id: SMD
+    batch_size: 128
+    window_size: 100
+    stride: 1
+    patience: 5
+    lr: 0.001
+    num_workers: 1
+    pretrain_max_epoch: 20
+    nb_epoch: 20
+    device: 0  # -1 for cpu, 0 for cuda:0
+interfusion_ASD:
+    dataset_id: ASD
+    batch_size: 128
+    window_size: 100
+    stride: 1
+    patience: 5
+    lr: 0.001
+    num_workers: 1
+    pretrain_max_epoch: 20
+    nb_epoch: 20
+    device: 0  # -1 for cpu, 0 for cuda:0
+interfusion_SWAT:
+    dataset_id: SWAT
+    batch_size: 128
+    window_size: 100
+    stride: 1
+    patience: 5
+    lr: 0.001
+    num_workers: 1
+    pretrain_max_epoch: 20
+    nb_epoch: 20
+    device: 0  # -1 for cpu, 0 for cuda:0
+interfusion_WADI:
+    dataset_id: WADI
+    batch_size: 128
+    window_size: 100
+    stride: 1
+    patience: 5
+    lr: 0.001
+    num_workers: 1
+    pretrain_max_epoch: 20
+    nb_epoch: 20
+    device: 0  # -1 for cpu, 0 for cuda:0
+interfusion_SMAP:
+    dataset_id: SMAP
+    batch_size: 128
+    window_size: 100
+    stride: 1
+    patience: 5
+    lr: 0.001
+    num_workers: 1
+    pretrain_max_epoch: 20
+    nb_epoch: 20
+    device: 0  # -1 for cpu, 0 for cuda:0
+interfusion_MSL:
+    dataset_id: MSL
+    batch_size: 128
+    window_size: 100
+    stride: 1
+    patience: 5
+    lr: 0.001
+    num_workers: 1
+    pretrain_max_epoch: 20
+    nb_epoch: 20
+    device: 0  # -1 for cpu, 0 for cuda:0
+
diff --git a/benchmark/benchmark_config/model_config/loda.yaml b/benchmark/benchmark_config/model_config/loda.yaml
new file mode 100644
index 0000000..2f9029a
--- /dev/null
+++ b/benchmark/benchmark_config/model_config/loda.yaml
@@ -0,0 +1,29 @@
+Base:
+    model_id: LODA
+    normalize: "minmax"
+
+loda_SMD:
+    dataset_id: SMD
+    n_bins: 10
+    n_random_cuts: 100
+loda_ASD:
+    dataset_id: ASD
+    n_bins: 10
+    n_random_cuts: 100
+loda_SWAT:
+    dataset_id: SWAT
+    n_bins: 10
+    n_random_cuts: 100
+loda_WADI:
+    dataset_id: WADI
+    n_bins: 10
+    n_random_cuts: 100
+loda_SMAP:
+    dataset_id: SMAP
+    n_bins: 10
+    n_random_cuts: 100
+loda_MSL:
+    dataset_id: MSL
+    n_bins: 10
+    n_random_cuts: 100
+    
\ No newline at end of file
diff --git a/benchmark/benchmark_config/model_config/lstm.yaml b/benchmark/benchmark_config/model_config/lstm.yaml
new file mode 100644
index 0000000..cb2e355
--- /dev/null
+++ b/benchmark/benchmark_config/model_config/lstm.yaml
@@ -0,0 +1,105 @@
+Base:
+    model_id: lstm
+    normalize: "minmax"
+
+lstm_SMD:
+    dataset_id: SMD
+    batch_size: 1024
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    patience: 5
+    device: 0  # -1 for cpu, 0 for cuda:0
+    lr: 0.001
+    hidden_size: 256
+    num_layers: 2
+    dropout: 0
+    prediction_length: 1
+    prediction_dims: []
+    num_workers: 4
+
+
+lstm_ASD:
+    dataset_id: ASD
+    normalize: "minmax"
+    batch_size: 1024
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    patience: 5
+    device: 0  # -1 for cpu, 0 for cuda:0
+    lr: 0.001
+    hidden_size: 256
+    num_layers: 2
+    dropout: 0
+    prediction_length: 1
+    prediction_dims: []
+    num_workers: 4
+
+lstm_SWAT:
+    dataset_id: SWAT
+    normalize: "minmax"
+    batch_size: 1024
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    patience: 5
+    device: 0  # -1 for cpu, 0 for cuda:0
+    lr: 0.001
+    hidden_size: 256
+    num_layers: 2
+    dropout: 0
+    prediction_length: 1
+    prediction_dims: []
+    num_workers: 4
+
+lstm_WADI:
+    dataset_id: WADI
+    normalize: "minmax"
+    batch_size: 1024
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    patience: 5
+    device: 0  # -1 for cpu, 0 for cuda:0
+    lr: 0.001
+    hidden_size: 256
+    num_layers: 2
+    dropout: 0
+    prediction_length: 1
+    prediction_dims: []
+    num_workers: 4
+
+lstm_SMAP:
+    dataset_id: SMAP
+    normalize: "minmax"
+    batch_size: 1024
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    patience: 5
+    device: 0  # -1 for cpu, 0 for cuda:0
+    lr: 0.001
+    hidden_size: 256
+    num_layers: 2
+    dropout: 0
+    prediction_length: 1
+    prediction_dims: []
+    num_workers: 4
+
+lstm_MSL:
+    dataset_id: MSL
+    normalize: "minmax"
+    batch_size: 1024
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    patience: 5
+    device: 0  # -1 for cpu, 0 for cuda:0
+    lr: 0.001
+    hidden_size: 256
+    num_layers: 2
+    dropout: 0
+    prediction_length: 1
+    prediction_dims: []
+    num_workers: 4
\ No newline at end of file
diff --git a/benchmark/benchmark_config/model_config/mscred.yaml b/benchmark/benchmark_config/model_config/mscred.yaml
new file mode 100644
index 0000000..7d03830
--- /dev/null
+++ b/benchmark/benchmark_config/model_config/mscred.yaml
@@ -0,0 +1,64 @@
+Base:
+    model_id: mscred
+    normalize: "minmax"
+
+mscred_SMD:
+    dataset_id: SMD
+    window_size: 100
+    stride: 1
+    batch_size: 64
+    nb_epoch: 3
+    device: 1
+    step_max: 5
+    gap_time: 10
+    lr: 0.0002
+mscred_ASD:
+    dataset_id: ASD
+    window_size: 100
+    stride: 1
+    batch_size: 64
+    nb_epoch: 3
+    device: 1
+    step_max: 5
+    gap_time: 10
+    lr: 0.0002
+mscred_SWAT:
+    dataset_id: SWAT
+    window_size: 100
+    stride: 1
+    batch_size: 64
+    nb_epoch: 3
+    device: 1
+    step_max: 5
+    gap_time: 10
+    lr: 0.0002
+mscred_WADI:
+    dataset_id: WADI
+    window_size: 100
+    stride: 1
+    batch_size: 32
+    nb_epoch: 3
+    device: 1
+    step_max: 5
+    gap_time: 10
+    lr: 0.0002
+mscred_SMAP:
+    dataset_id: SMAP
+    window_size: 100
+    stride: 1
+    batch_size: 64
+    nb_epoch: 3
+    device: 1
+    step_max: 5
+    gap_time: 10
+    lr: 0.0002
+mscred_MSL:
+    dataset_id: MSL
+    window_size: 100
+    stride: 1
+    batch_size: 64
+    nb_epoch: 3
+    device: 1
+    step_max: 5
+    gap_time: 10
+    lr: 0.0002 
\ No newline at end of file
diff --git a/benchmark/benchmark_config/model_config/mtad_gat.yaml b/benchmark/benchmark_config/model_config/mtad_gat.yaml
new file mode 100644
index 0000000..fee64ce
--- /dev/null
+++ b/benchmark/benchmark_config/model_config/mtad_gat.yaml
@@ -0,0 +1,142 @@
+Base:
+    model_id: mtad_gat
+    normalize: "minmax"
+
+mtad_gat_SMD:
+    dataset_id: SMD
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    shuffle: True
+    num_workers: 1
+    init_lr: 3.0e-4
+    kernel_size: 7
+    feat_gat_embed_dim: null
+    time_gat_embed_dim: null
+    use_gatv2: True
+    gru_n_layers: 1
+    gru_hid_dim: 150
+    forecast_n_layers: 1
+    forecast_hid_dim: 150
+    recon_n_layers: 1
+    recon_hid_dim: 150
+    dropout: 0.3
+    alpha: 0.2
+    gamma: 1
+    device: 3
+mtad_gat_ASD:
+    dataset_id: ASD
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    shuffle: True
+    num_workers: 1
+    init_lr: 3.0e-4
+    kernel_size: 7
+    feat_gat_embed_dim: null
+    time_gat_embed_dim: null
+    use_gatv2: True
+    gru_n_layers: 1
+    gru_hid_dim: 150
+    forecast_n_layers: 1
+    forecast_hid_dim: 150
+    recon_n_layers: 1
+    recon_hid_dim: 150
+    dropout: 0.3
+    alpha: 0.2
+    gamma: 1
+    device: 3
+mtad_gat_SWAT:
+    dataset_id: SWAT
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    shuffle: True
+    num_workers: 1
+    init_lr: 3.0e-4
+    kernel_size: 7
+    feat_gat_embed_dim: null
+    time_gat_embed_dim: null
+    use_gatv2: True
+    gru_n_layers: 1
+    gru_hid_dim: 150
+    forecast_n_layers: 1
+    forecast_hid_dim: 150
+    recon_n_layers: 1
+    recon_hid_dim: 150
+    dropout: 0.3
+    alpha: 0.2
+    gamma: 1
+    device: 3
+mtad_gat_WADI:
+    dataset_id: WADI
+    batch_size: 128
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    shuffle: True
+    num_workers: 1
+    init_lr: 3.0e-4
+    kernel_size: 7
+    feat_gat_embed_dim: null
+    time_gat_embed_dim: null
+    use_gatv2: True
+    gru_n_layers: 1
+    gru_hid_dim: 150
+    forecast_n_layers: 1
+    forecast_hid_dim: 150
+    recon_n_layers: 1
+    recon_hid_dim: 150
+    dropout: 0.3
+    alpha: 0.2
+    gamma: 1
+    device: 3
+mtad_gat_SMAP:
+    dataset_id: SMAP
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    shuffle: True
+    num_workers: 1
+    init_lr: 3.0e-4
+    kernel_size: 7
+    feat_gat_embed_dim: null
+    time_gat_embed_dim: null
+    use_gatv2: True
+    gru_n_layers: 1
+    gru_hid_dim: 150
+    forecast_n_layers: 1
+    forecast_hid_dim: 150
+    recon_n_layers: 1
+    recon_hid_dim: 150
+    dropout: 0.3
+    alpha: 0.2
+    gamma: 1
+    device: 3
+mtad_gat_MSL:
+    dataset_id: MSL
+    batch_size: 256
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    shuffle: True
+    num_workers: 1
+    init_lr: 3.0e-4
+    kernel_size: 7
+    feat_gat_embed_dim: null
+    time_gat_embed_dim: null
+    use_gatv2: True
+    gru_n_layers: 1
+    gru_hid_dim: 150
+    forecast_n_layers: 1
+    forecast_hid_dim: 150
+    recon_n_layers: 1
+    recon_hid_dim: 150
+    dropout: 0.3
+    alpha: 0.2
+    gamma: 1
+    device: 1
diff --git a/benchmark/benchmark_config/model_config/omnianomaly.yaml b/benchmark/benchmark_config/model_config/omnianomaly.yaml
new file mode 100644
index 0000000..c4e3e4e
--- /dev/null
+++ b/benchmark/benchmark_config/model_config/omnianomaly.yaml
@@ -0,0 +1,59 @@
+Base:
+    model_id: omnianomaly
+    normalize: "minmax"
+    reverse_score: True
+
+omnianomaly_SMD:
+    dataset_id: SMD
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    l2_reg: 0.1
+    initial_lr: 1.0e-3
+    device: 3
+omnianomaly_ASD:
+    dataset_id: ASD
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    l2_reg: 0.1
+    initial_lr: 1.0e-3
+    device: 3
+omnianomaly_SWAT:
+    dataset_id: SWAT
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    l2_reg: 0.1
+    initial_lr: 1.0e-3
+    device: 3
+omnianomaly_WADI:
+    dataset_id: WADI
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    l2_reg: 0.1
+    initial_lr: 1.0e-3
+    device: 3
+omnianomaly_SMAP:
+    dataset_id: SMAP
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    l2_reg: 0.1
+    initial_lr: 1.0e-3
+    device: 3
+omnianomaly_MSL:
+    dataset_id: MSL
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    l2_reg: 0.1
+    initial_lr: 1.0e-3
+    device: 3 
\ No newline at end of file
diff --git a/benchmark/benchmark_config/model_config/pca.yaml b/benchmark/benchmark_config/model_config/pca.yaml
new file mode 100644
index 0000000..c5126a7
--- /dev/null
+++ b/benchmark/benchmark_config/model_config/pca.yaml
@@ -0,0 +1,16 @@
+Base:
+    model_id: PCA
+    normalize: "minmax"
+
+PCA_SMD:
+    dataset_id: SMD
+PCA_ASD:
+    dataset_id: ASD
+PCA_SWAT:
+    dataset_id: SWAT
+PCA_WADI:
+    dataset_id: WADI
+PCA_SMAP:
+    dataset_id: SMAP
+PCA_MSL:
+    dataset_id: MSL
diff --git a/benchmark/benchmark_config/model_config/rans.yaml b/benchmark/benchmark_config/model_config/rans.yaml
new file mode 100644
index 0000000..b4481ab
--- /dev/null
+++ b/benchmark/benchmark_config/model_config/rans.yaml
@@ -0,0 +1,111 @@
+Base:
+    model_id: RANS
+    normalize: "minmax"
+
+
+rans_SMAP_test:
+    dataset_id: SMAP_x2_full
+    device: 1 # -1 for cpu, 0 for cuda:0
+    num_workers: 1
+    encoder_layers: 1 
+    decoder_layers: 1
+    activation: 'relu'
+    output_activation: 'relu'
+    S: 2
+    delta: 0.05
+    batch_size: 512
+    synchronize: True
+    freq_warmup: 1
+    sin_warmup: 1
+    nb_epoch: 1
+
+rans_SMD:
+    dataset_id: SMD
+    device: 1 # -1 for cpu, 0 for cuda:0
+    num_workers: 1
+    encoder_layers: 1 
+    decoder_layers: 2 
+    activation: 'relu'
+    output_activation: 'relu'
+    S: 5 
+    delta: 0.05
+    synchronize: True
+    batch_size: 512
+    freq_warmup: 5 
+    sin_warmup: 5 
+    nb_epoch: 50
+rans_ASD:
+    dataset_id: ASD
+    device: 1 # -1 for cpu, 0 for cuda:0
+    num_workers: 1
+    encoder_layers: 1 
+    decoder_layers: 2 
+    activation: 'relu'
+    output_activation: 'relu'
+    S: 5 
+    delta: 0.05
+    synchronize: True
+    batch_size: 512
+    freq_warmup: 5 
+    sin_warmup: 5 
+    nb_epoch: 50
+rans_SWAT:
+    dataset_id: SWAT
+    device: 1 # -1 for cpu, 0 for cuda:0
+    num_workers: 1
+    encoder_layers: 1 
+    decoder_layers: 2 
+    activation: 'relu'
+    output_activation: 'relu'
+    S: 5 
+    delta: 0.05
+    synchronize: True
+    batch_size: 512
+    freq_warmup: 5 
+    sin_warmup: 5 
+    nb_epoch: 50
+rans_WADI:
+    dataset_id: WADI
+    device: 1 # -1 for cpu, 0 for cuda:0
+    num_workers: 1
+    encoder_layers: 1 
+    decoder_layers: 2 
+    activation: 'relu'
+    output_activation: 'relu'
+    S: 5 
+    delta: 0.05
+    synchronize: True
+    batch_size: 512
+    freq_warmup: 5 
+    sin_warmup: 5 
+    nb_epoch: 50
+rans_SMAP:
+    dataset_id: SMAP
+    device: 1 # -1 for cpu, 0 for cuda:0
+    num_workers: 1
+    encoder_layers: 1 
+    decoder_layers: 2 
+    activation: 'relu'
+    output_activation: 'relu'
+    S: 5 
+    delta: 0.05
+    synchronize: True
+    batch_size: 512
+    freq_warmup: 5 
+    sin_warmup: 5 
+    nb_epoch: 50
+rans_MSL:
+    dataset_id: MSL
+    device: 1 # -1 for cpu, 0 for cuda:0
+    num_workers: 1
+    encoder_layers: 1 
+    decoder_layers: 2 
+    activation: 'relu'
+    output_activation: 'relu'
+    S: 5 
+    delta: 0.05
+    synchronize: True
+    batch_size: 512
+    freq_warmup: 5 
+    sin_warmup: 50
+    nb_epoch: 100
\ No newline at end of file
diff --git a/benchmark/benchmark_config/model_config/tranad.yaml b/benchmark/benchmark_config/model_config/tranad.yaml
new file mode 100644
index 0000000..1f9f460
--- /dev/null
+++ b/benchmark/benchmark_config/model_config/tranad.yaml
@@ -0,0 +1,64 @@
+Base:
+    model_id: tranad
+    normalize: "minmax"
+
+tranad_SMD:
+    dataset_id: SMD
+    num_workers: 1
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    device: 3  # -1 for cpu, 0 for cuda:0
+    lr: 0.001
+    hidden_size: 64
+tranad_ASD:
+    dataset_id: ASD
+    num_workers: 1
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    device: 3  # -1 for cpu, 0 for cuda:0
+    lr: 0.001
+    hidden_size: 64
+tranad_SWAT:
+    dataset_id: SWAT
+    num_workers: 1
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    device: 3  # -1 for cpu, 0 for cuda:0
+    lr: 0.001
+    hidden_size: 64
+tranad_WADI:
+    dataset_id: WADI
+    num_workers: 1
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    device: 3  # -1 for cpu, 0 for cuda:0
+    lr: 0.001
+    hidden_size: 64
+tranad_SMAP:
+    dataset_id: SMAP
+    num_workers: 1
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    device: 3  # -1 for cpu, 0 for cuda:0
+    lr: 0.001
+    hidden_size: 64
+tranad_MSL:
+    dataset_id: MSL
+    num_workers: 1
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    nb_epoch: 10
+    device: 3  # -1 for cpu, 0 for cuda:0
+    lr: 0.001
+    hidden_size: 64
diff --git a/benchmark/benchmark_config/model_config/usad.yaml b/benchmark/benchmark_config/model_config/usad.yaml
new file mode 100644
index 0000000..6448f5d
--- /dev/null
+++ b/benchmark/benchmark_config/model_config/usad.yaml
@@ -0,0 +1,70 @@
+Base:
+    model_id: usad
+    normalize: "minmax"
+
+usad_SMD:
+    dataset_id: SMD
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    num_workers: 1
+    device: 0  # -1 for cpu, 0 for cuda:0
+    nb_epoch: 10
+    lr: 0.001
+    hidden_size: 64
+usad_ASD:
+    dataset_id: ASD
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    num_workers: 1
+    device: 0  # -1 for cpu, 0 for cuda:0
+    nb_epoch: 10
+    lr: 0.001
+    hidden_size: 64
+usad_SWAT:
+    dataset_id: SWAT
+    batch_size: 256
+    window_size: 100
+    stride: 1
+    num_workers: 1
+    device: 0  # -1 for cpu, 0 for cuda:0
+    nb_epoch: 10
+    lr: 0.001
+    hidden_size: 64
+usad_WADI:
+    dataset_id: WADI
+    batch_size: 256
+    window_size: 100
+    stride: 1
+    num_workers: 1
+    device: 0  # -1 for cpu, 0 for cuda:0
+    nb_epoch: 10
+    lr: 0.001
+    hidden_size: 64
+usad_SMAP:
+    dataset_id: SMAP
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    num_workers: 1
+    device: 0  # -1 for cpu, 0 for cuda:0
+    nb_epoch: 10
+    lr: 0.001
+    hidden_size: 64
+usad_MSL:
+    dataset_id: MSL
+    batch_size: 512
+    window_size: 100
+    stride: 1
+    num_workers: 1
+    device: 0  # -1 for cpu, 0 for cuda:0
+    nb_epoch: 10
+    lr: 0.001
+    hidden_size: 64
+    
+    
+    
+    
+    
+    
diff --git a/benchmark/dagmm_benchmark.py b/benchmark/dagmm_benchmark.py
new file mode 100644
index 0000000..85b86b8
--- /dev/null
+++ b/benchmark/dagmm_benchmark.py
@@ -0,0 +1,95 @@
+import sys
+
+sys.path.append("../")
+import logging
+import argparse
+from common import data_preprocess
+from common.dataloader import load_dataset
+from common.utils import seed_everything, load_config, set_logger, print_to_json
+from common.evaluation import Evaluator, TimeTracker
+from common.exp import store_entity
+from networks.dagmm.dagmm import DAGMM
+
+seed_everything()
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--config",
+        type=str,
+        default="./benchmark_config/",
+        help="The config directory.",
+    )
+    parser.add_argument("--expid", type=str, default="dagmm_SMD")
+    parser.add_argument("--gpu", type=int, default=-1)
+    args = vars(parser.parse_args()) 
+
+    config_dir = args["config"]
+    experiment_id = args["expid"]
+
+    params = load_config(config_dir, experiment_id)
+    set_logger(params, args)
+    logging.info(print_to_json(params))
+
+    data_dict = load_dataset(
+        data_root=params["data_root"],
+        entities=params["entities"],
+        dim=params["dim"],
+        valid_ratio=params["valid_ratio"],
+        test_label_postfix=params["test_label_postfix"],
+        test_postfix=params["test_postfix"],
+        train_postfix=params["train_postfix"],
+        nrows=params["nrows"],
+    )
+
+    # preprocessing
+    pp = data_preprocess.preprocessor(model_root=params["model_root"])
+    data_dict = pp.normalize(data_dict, method=params["normalize"])
+
+    evaluator = Evaluator(**params["eval"])
+    for entity in params["entities"]:
+        logging.info("Fitting dataset: {}".format(entity))
+
+        train = data_dict[entity]["train"]
+        test = data_dict[entity]["test"]
+        test_label = data_dict[entity]["test_label"]
+
+        model = DAGMM(
+            comp_hiddens=params["compression_hiddens"],
+            est_hiddens=params["estimation_hiddens"],
+            est_dropout_ratio=params["estimation_dropout_ratio"],
+            minibatch_size=params["batch_size"],
+            epoch_size=params["nb_epoch"],
+            learning_rate=params["lr"],
+            lambda1=params["lambdaone"],
+            lambda2=params["lambdatwo"],
+        )
+
+        # predict anomaly score
+        tt = TimeTracker(nb_epoch=params["nb_epoch"])
+
+        tt.train_start()
+        model.fit(train)
+        tt.train_end()
+
+        train_anomaly_score = model.predict_prob(train)
+        tt.test_start()
+        anomaly_score = model.predict_prob(test)
+        tt.test_end()
+
+        anomaly_label = test_label
+
+        store_entity(
+            params,
+            entity,
+            train_anomaly_score,
+            anomaly_score,
+            anomaly_label,
+            time_tracker=tt.get_data(),
+        )
+        del model
+    evaluator.eval_exp(
+        exp_folder=params["model_root"],
+        entities=params["entities"],
+        merge_folder=params["benchmark_dir"],
+        extra_params=params,
+    )
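Review note on how the script above consumes its configuration: it reads model hyperparameters (`compression_hiddens`, `lr`), dataset fields (`data_root`, `dim`, `entities`) and `params["eval"]` from a single dict, which suggests that `load_config` flattens the model, dataset and eval YAML files. `load_config` itself is not shown in this part of the diff, so the sketch below is only one plausible way such a merge could work. The name `merge_config`, the precedence order, and the inlined dicts (standing in for parsed YAML) are assumptions.

```python
def merge_config(model_cfg, dataset_cfg, eval_cfg, expid):
    """Hypothetical stand-in for load_config: flatten three configs into one dict."""
    # Start from the model file's shared section, then apply the experiment section.
    model_params = dict(model_cfg.get("Base", {}))
    model_params.update(model_cfg[expid])

    # dataset_id names the section to read from the dataset file.
    dataset_id = model_params["dataset_id"]
    params = dict(dataset_cfg.get("Base", {}))
    params.update(dataset_cfg[dataset_id])

    # Assumption: model-level keys win if both files define the same key.
    params.update(model_params)
    params["eval"] = dict(eval_cfg["Base"])
    return params


# Dicts mirroring a subset of dagmm.yaml, SWAT.yaml and eval_config.yaml.
model_cfg = {
    "Base": {"model_id": "dagmm", "normalize": "standard"},
    "dagmm_SWAT": {"dataset_id": "SWAT", "batch_size": 512, "nb_epoch": 20},
}
dataset_cfg = {
    "Base": {"dataset": "swat", "dim": 40, "nrows": None, "entities": ["swat"]},
    "SWAT": {"valid_ratio": 0, "entities": ["swat"]},
}
eval_cfg = {"Base": {"metrics": ["f1", "delay"], "thresholding": ["best", "pot"]}}

params = merge_config(model_cfg, dataset_cfg, eval_cfg, "dagmm_SWAT")
print(params["dim"], params["normalize"], params["eval"]["metrics"])
```

Under this reading, a model file whose shared section is not named `Base` would silently lose `model_id` and `normalize`, so the section name is worth keeping consistent across all model configs.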
diff --git a/benchmark/ganf_benchmark.py b/benchmark/ganf_benchmark.py
new file mode 100644
index 0000000..9761733
--- /dev/null
+++ b/benchmark/ganf_benchmark.py
@@ -0,0 +1,123 @@
+import os
+
+os.chdir(os.path.dirname(os.path.realpath(__file__)))
+import sys
+
+sys.path.append("../")
+import logging
+import argparse
+from common import data_preprocess
+from common.dataloader import load_dataset, get_dataloaders
+from common.utils import seed_everything, load_config, set_logger, print_to_json
+from common.evaluation import Evaluator, TimeTracker
+from common.exp import store_entity
+from networks.ganf.GANF import GANF
+
+seed_everything()
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--config",
+        type=str,
+        default="./benchmark_config/",
+        help="The config directory.",
+    )
+    parser.add_argument("--expid", type=str, default="ganf_SMD")
+    parser.add_argument("--gpu", type=int, default=-1)
+    args = vars(parser.parse_args())
+
+    config_dir = args["config"]
+    experiment_id = args["expid"]
+
+    params = load_config(config_dir, experiment_id)
+    set_logger(params, args)
+    logging.info(print_to_json(params))
+
+    data_dict = load_dataset(
+        data_root=params["data_root"],
+        entities=params["entities"],
+        valid_ratio=params["valid_ratio"],
+        dim=params["dim"],
+        test_label_postfix=params["test_label_postfix"],
+        test_postfix=params["test_postfix"],
+        train_postfix=params["train_postfix"],
+        nrows=params["nrows"],
+    )
+
+    # preprocessing
+    pp = data_preprocess.preprocessor(model_root=params["model_root"])
+    data_dict = pp.normalize(data_dict, method=params["normalize"])
+
+    # sliding windows
+    window_dict = data_preprocess.generate_windows(
+        data_dict,
+        window_size=params["window_size"],
+        stride=params["stride"],
+    )
+
+    # train and test on each entity
+    evaluator = Evaluator(**params["eval"])
+    for entity in params["entities"]:
+        logging.info("Fitting dataset: {}".format(entity))
+        windows = window_dict[entity]
+        train_windows = windows["train_windows"]
+        test_windows = windows["test_windows"]
+
+        train_loader, _, test_loader = get_dataloaders(
+            train_windows,
+            test_windows,
+            next_steps=0,
+            batch_size=params["batch_size"],
+            shuffle=params["shuffle"],
+            num_workers=params["num_workers"],
+        )
+
+        model = GANF(
+            n_blocks=params["n_blocks"],
+            input_size=params["input_size"],
+            hidden_size=params["hidden_size"],
+            n_hidden=params["n_hidden"],
+            dropout=params["dropout"],
+            batch_norm=params["batch_norm"],
+            model_root=params["model_root"],
+            device=params["device"],
+        )
+
+        tt = TimeTracker(nb_epoch=params["nb_epoch"])
+        tt.train_start()
+        model.fit(
+            train_loader,
+            n_sensor=params["dim"],
+            weight_decay=params["weight_decay"],
+            n_epochs=params["nb_epoch"],
+            lr=params["lr"],
+            h_tol=params["h_tol"],
+            rho_max=params["rho_max"],
+            lambda1=params["lambda1"],
+            rho_init=params["rho_init"],
+            alpha_init=params["alpha_init"],
+        )
+        tt.train_end()
+
+        tt.test_start()
+        anomaly_score, anomaly_label = model.predict_prob(
+            test_loader, windows["test_label"]
+        )
+        tt.test_end()
+
+        train_anomaly_score = model.predict_prob(train_loader)
+
+        store_entity(
+            params,
+            entity,
+            train_anomaly_score,
+            anomaly_score,
+            anomaly_label,
+            time_tracker=tt.get_data(),
+        )
+    evaluator.eval_exp(
+        exp_folder=params["model_root"],
+        entities=params["entities"],
+        merge_folder=params["benchmark_dir"],
+        extra_params=params,
+    )
diff --git a/benchmark/iforest_benchmark.py b/benchmark/iforest_benchmark.py
new file mode 100644
index 0000000..d15c6d2
--- /dev/null
+++ b/benchmark/iforest_benchmark.py
@@ -0,0 +1,89 @@
+import sys
+
+sys.path.append("../")
+import logging
+import argparse
+
+from common import data_preprocess
+from common.dataloader import load_dataset
+from common.utils import seed_everything, load_config, set_logger, print_to_json
+from common.exp import store_entity
+from common.evaluation import Evaluator, TimeTracker
+from pyod.models.iforest import IForest
+
+
+seed_everything()
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--config",
+        type=str,
+        default="./benchmark_config/",
+        help="The config directory.",
+    )
+    parser.add_argument("--expid", type=str, default="iforest_SMD")
+    parser.add_argument("--gpu", type=int, default=-1)
+    args = vars(parser.parse_args())
+
+    config_dir = args["config"]
+    experiment_id = args["expid"]
+
+    params = load_config(config_dir, experiment_id)
+    set_logger(params, args)
+    logging.info(print_to_json(params))
+
+    data_dict = load_dataset(
+        data_root=params["data_root"],
+        entities=params["entities"],
+        dim=params["dim"],
+        valid_ratio=params["valid_ratio"],
+        test_label_postfix=params["test_label_postfix"],
+        test_postfix=params["test_postfix"],
+        train_postfix=params["train_postfix"],
+        nrows=params["nrows"],
+    )
+
+    # preprocessing
+    pp = data_preprocess.preprocessor(model_root=params["model_root"])
+    data_dict = pp.normalize(data_dict, method=params["normalize"])
+
+    # train and test on each entity
+    evaluator = Evaluator(**params["eval"])
+    for entity in params["entities"]:
+        logging.info("Fitting dataset: {}".format(entity))
+
+        train = data_dict[entity]["train"]
+        test = data_dict[entity]["test"]
+        test_label = data_dict[entity]["test_label"]
+
+        model = IForest(n_estimators=params["n_estimators"])
+
+        tt = TimeTracker()
+        tt.train_start()
+        model.fit(train)
+        tt.train_end()
+
+        train_anomaly_score = model.decision_function(train)
+
+        tt.test_start()
+        anomaly_score = model.decision_function(test)
+        tt.test_end()
+
+        anomaly_label = test_label
+
+        # Store scores and labels for evaluation
+        store_entity(
+            params,
+            entity,
+            train_anomaly_score,
+            anomaly_score,
+            anomaly_label,
+            time_tracker=tt.get_data(),
+        )
+    evaluator.eval_exp(
+        exp_folder=params["model_root"],
+        entities=params["entities"],
+        merge_folder=params["benchmark_dir"],
+        extra_params=params,
+        eval_single=True,
+    )
diff --git a/benchmark/interfusion_benchmark.py b/benchmark/interfusion_benchmark.py
new file mode 100644
index 0000000..5c46c2a
--- /dev/null
+++ b/benchmark/interfusion_benchmark.py
@@ -0,0 +1,101 @@
+# -*- coding: utf-8 -*-
+import os
+import sys
+
+sys.path.append("../")
+
+import logging
+from common import data_preprocess
+from common.dataloader import load_dataset
+from common.utils import seed_everything, load_config, set_logger, print_to_json
+from networks.InterFusion import InterFusion
+from common.evaluation import Evaluator, TimeTracker
+from common.exp import store_entity
+
+seed_everything()
+
+if __name__ == "__main__":
+    import argparse
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--config",
+        type=str,
+        default="./benchmark_config/",
+        help="The config directory.",
+    )
+    parser.add_argument("--expid", type=str, default="interfusion_SMD")
+    parser.add_argument("--gpu", type=int, default=-1)
+    args = vars(parser.parse_args())
+
+    config_dir = args["config"]
+    experiment_id = args["expid"]
+
+    params = load_config(config_dir, experiment_id)
+    set_logger(params, args)
+    logging.info(print_to_json(params))
+
+    data_dict = load_dataset(
+        data_root=params["data_root"],
+        entities=params["entities"],
+        dim=params["dim"],
+        valid_ratio=params["valid_ratio"],
+        test_label_postfix=params["test_label_postfix"],
+        test_postfix=params["test_postfix"],
+        train_postfix=params["train_postfix"],
+        nrows=params["nrows"],
+    )
+    pp = data_preprocess.preprocessor(model_root=params["model_root"])
+    data_dict = pp.normalize(data_dict, method=params["normalize"])
+
+    # train and test on each entity
+    evaluator = Evaluator(**params["eval"])
+    for entity in params["entities"]:
+        logging.info("Fitting dataset: {}".format(entity))
+        train = data_dict[entity]["train"]
+        valid = data_dict[entity].get("valid", None)
+        test, test_label = (
+            data_dict[entity]["test"],
+            data_dict[entity]["test_label"],
+        )
+
+        model = InterFusion(
+            dataset=params["dataset"],
+            model_root=params["model_root"],
+            dim=params["dim"],
+        )
+        tt = TimeTracker(nb_epoch=params["nb_epoch"])
+
+        tt.train_start()
+        model.fit(
+            x_train=train,
+            x_valid=valid,
+            lr=params["lr"],
+            window_size=params["window_size"],
+            batch_size=params["batch_size"],
+            pretrain_max_epoch=params["pretrain_max_epoch"],
+            max_epoch=params["nb_epoch"],
+        )
+        tt.train_end()
+
+        train_anomaly_score = model.predict_prob(train, None)
+
+        tt.test_start()
+        anomaly_score, anomaly_label = model.predict_prob(test, test_label)
+        tt.test_end()
+
+        store_entity(
+            params,
+            entity,
+            train_anomaly_score,
+            anomaly_score,
+            anomaly_label,
+            time_tracker=tt.get_data(),
+        )
+        del model
+    evaluator.eval_exp(
+        exp_folder=params["model_root"],
+        entities=params["entities"],
+        merge_folder=params["benchmark_dir"],
+        extra_params=params,
+    )
diff --git a/benchmark/lstm_benchmark.py b/benchmark/lstm_benchmark.py
new file mode 100644
index 0000000..4c9b6cf
--- /dev/null
+++ b/benchmark/lstm_benchmark.py
@@ -0,0 +1,120 @@
+import os
+
+os.chdir(os.path.dirname(os.path.realpath(__file__)))
+import sys
+
+sys.path.append("../")
+import logging
+from common import data_preprocess
+from common.dataloader import get_dataloaders, load_dataset
+from common.utils import seed_everything, load_config, set_logger, print_to_json
+from common.evaluation import Evaluator, TimeTracker
+from common.exp import store_entity
+from networks.lstm import LSTM
+
+
+seed_everything()
+if __name__ == "__main__":
+    import argparse
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--config",
+        type=str,
+        default="./benchmark_config/",
+        help="The config directory.",
+    )
+    parser.add_argument("--expid", type=str, default="lstm_SMD")
+    parser.add_argument("--gpu", type=int, default=-1)
+    args = vars(parser.parse_args())
+
+    config_dir = args["config"]
+    experiment_id = args["expid"]
+
+    params = load_config(config_dir, experiment_id)
+    set_logger(params, args)
+    logging.info(print_to_json(params))
+
+    data_dict = load_dataset(
+        data_root=params["data_root"],
+        entities=params["entities"],
+        dim=params["dim"],
+        valid_ratio=params["valid_ratio"],
+        test_label_postfix=params["test_label_postfix"],
+        test_postfix=params["test_postfix"],
+        train_postfix=params["train_postfix"],
+        nrows=params["nrows"],
+    )
+
+    # preprocessing
+    pp = data_preprocess.preprocessor(model_root=params["model_root"])
+    data_dict = pp.normalize(data_dict, method=params["normalize"])
+
+    # sliding windows
+    window_dict = data_preprocess.generate_windows(
+        data_dict,
+        window_size=params["window_size"],
+        stride=params["stride"],
+    )
+
+    # train and test on each entity
+    evaluator = Evaluator(**params["eval"])
+    for entity in params["entities"]:
+        logging.info("Fitting dataset: {}".format(entity))
+        windows = window_dict[entity]
+        train_windows = windows["train_windows"]
+        test_windows = windows["test_windows"]
+
+        train_loader, _, test_loader = get_dataloaders(
+            train_windows,
+            test_windows,
+            batch_size=params["batch_size"],
+            num_workers=params["num_workers"],
+        )
+
+        model = LSTM(
+            in_channels=params["dim"],
+            num_layers=params["num_layers"],
+            dropout=params["dropout"],
+            window_size=params["window_size"],
+            prediction_length=params["prediction_length"],
+            prediction_dims=params["prediction_dims"],
+            patience=params["patience"],
+            save_path=params["model_root"],
+            nb_epoch=params["nb_epoch"],
+            lr=params["lr"],
+            device=params["device"],
+        )
+
+        tt = TimeTracker(nb_epoch=params["nb_epoch"])
+        tt.train_start()
+        model.fit(
+            train_loader,
+            test_loader=test_loader,
+            test_label=windows["test_label"],
+        )
+        tt.train_end()
+
+        model.load_encoder()
+        train_anomaly_score = model.predict_prob(train_loader)
+
+        tt.test_start()
+        anomaly_score, anomaly_label = model.predict_prob(
+            test_loader, windows["test_label"]
+        )
+        tt.test_end()
+
+        store_entity(
+            params,
+            entity,
+            train_anomaly_score,
+            anomaly_score,
+            anomaly_label,
+            time_tracker=tt.get_data(),
+        )
+    evaluator.eval_exp(
+        exp_folder=params["model_root"],
+        entities=params["entities"],
+        merge_folder=params["benchmark_dir"],
+        extra_params=params,
+    )
diff --git a/benchmark/mscred_benchmark.py b/benchmark/mscred_benchmark.py
new file mode 100644
index 0000000..6e85e5c
--- /dev/null
+++ b/benchmark/mscred_benchmark.py
@@ -0,0 +1,104 @@
+import sys
+
+sys.path.append("../")
+
+import logging
+
+from common import data_preprocess
+from common.dataloader import load_dataset, get_dataloaders
+from common.utils import seed_everything, load_config, set_logger, print_to_json
+from networks.mscred import MSCRED
+from common.evaluation import Evaluator, TimeTracker
+from common.exp import store_entity
+
+
+seed_everything()
+if __name__ == "__main__":
+    import argparse
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--config",
+        type=str,
+        default="./benchmark_config/",
+        help="The config directory.",
+    )
+    parser.add_argument("--expid", type=str, default="mscred_SMD")
+    parser.add_argument("--gpu", type=int, default=-1)
+    args = vars(parser.parse_args())
+
+    config_dir = args["config"]
+    experiment_id = args["expid"]
+
+    params = load_config(config_dir, experiment_id)
+    set_logger(params, args)
+    logging.info(print_to_json(params))
+
+    data_dict = load_dataset(
+        data_root=params["data_root"],
+        entities=params["entities"],
+        dim=params["dim"],
+        valid_ratio=params["valid_ratio"],
+        test_label_postfix=params["test_label_postfix"],
+        test_postfix=params["test_postfix"],
+        train_postfix=params["train_postfix"],
+        nrows=params["nrows"],
+    )
+    # sliding windows
+    window_dict = data_preprocess.generate_windows(
+        data_dict,
+        window_size=params["window_size"],
+        stride=params["stride"],
+    )
+    # train and test on each entity
+    evaluator = Evaluator(**params["eval"])
+    for entity in params["entities"]:
+        logging.info("Fitting dataset: {}".format(entity))
+        windows = window_dict[entity]
+        train_windows = windows["train_windows"]
+        test_windows = windows["test_windows"]
+
+        train_loader, _, test_loader = get_dataloaders(
+            train_windows, test_windows, batch_size=params["batch_size"]
+        )
+
+        model = MSCRED(
+            params["dim"],
+            params["window_size"],
+            lr=params["lr"],
+            model_root=params["model_root"],
+            device=params["device"],
+        )
+
+        tt = TimeTracker(nb_epoch=params["nb_epoch"])
+
+        tt.train_start()
+        model.fit(
+            params["nb_epoch"],
+            train_loader,
+            training=True,
+        )
+        tt.train_end()
+
+        train_anomaly_score = model.predict_prob(train_loader)
+
+        tt.test_start()
+        anomaly_score, anomaly_label = model.predict_prob(
+            test_loader, windows["test_label"]
+        )
+        tt.test_end()
+
+        store_entity(
+            params,
+            entity,
+            train_anomaly_score,
+            anomaly_score,
+            anomaly_label,
+            time_tracker=tt.get_data(),
+        )
+    evaluator.eval_exp(
+        exp_folder=params["model_root"],
+        entities=params["entities"],
+        merge_folder=params["benchmark_dir"],
+        extra_params=params,
+    )
diff --git a/benchmark/mtad_gat_benchmark.py b/benchmark/mtad_gat_benchmark.py
new file mode 100644
index 0000000..4e58ce4
--- /dev/null
+++ b/benchmark/mtad_gat_benchmark.py
@@ -0,0 +1,130 @@
+import os
+
+os.chdir(os.path.dirname(os.path.realpath(__file__)))
+import sys
+
+sys.path.append("../")
+import logging
+from common import data_preprocess
+from common.dataloader import load_dataset, get_dataloaders
+from common.utils import seed_everything, load_config, set_logger, print_to_json
+from common.evaluation import Evaluator, TimeTracker
+from common.exp import store_entity
+from networks.mtad_gat import MTAD_GAT
+
+
+seed_everything()
+if __name__ == "__main__":
+    import argparse
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--config",
+        type=str,
+        default="./benchmark_config/",
+        help="The config directory.",
+    )
+    parser.add_argument("--expid", type=str, default="mtad_gat_SMD")
+    parser.add_argument("--gpu", type=int, default=-1)
+    args = vars(parser.parse_args())
+
+    config_dir = args["config"]
+    experiment_id = args["expid"]
+
+    params = load_config(config_dir, experiment_id)
+    set_logger(params, args)
+    logging.info(print_to_json(params))
+
+    data_dict = load_dataset(
+        data_root=params["data_root"],
+        entities=params["entities"],
+        valid_ratio=params["valid_ratio"],
+        dim=params["dim"],
+        test_label_postfix=params["test_label_postfix"],
+        test_postfix=params["test_postfix"],
+        train_postfix=params["train_postfix"],
+        nrows=params["nrows"],
+    )
+
+    # preprocessing
+    pp = data_preprocess.preprocessor(model_root=params["model_root"])
+    data_dict = pp.normalize(data_dict, method=params["normalize"])
+
+    # sliding windows
+    window_dict = data_preprocess.generate_windows(
+        data_dict,
+        window_size=params["window_size"],
+        stride=params["stride"],
+    )
+
+    # train and test on each entity
+    evaluator = Evaluator(**params["eval"])
+    for entity in params["entities"]:
+        logging.info("Fitting dataset: {}".format(entity))
+        windows = window_dict[entity]
+        train_windows = windows["train_windows"]
+        test_windows = windows["test_windows"]
+
+        train_loader, _, test_loader = get_dataloaders(
+            train_windows,
+            test_windows,
+            next_steps=1,
+            batch_size=params["batch_size"],
+            shuffle=params["shuffle"],
+            num_workers=params["num_workers"],
+        )
+
+        model = MTAD_GAT(
+            n_features=params["dim"],
+            window_size=params["window_size"],
+            out_dim=params["dim"],
+            kernel_size=params["kernel_size"],
+            feat_gat_embed_dim=params["feat_gat_embed_dim"],
+            time_gat_embed_dim=params["time_gat_embed_dim"],
+            use_gatv2=params["use_gatv2"],
+            gru_n_layers=params["gru_n_layers"],
+            gru_hid_dim=params["gru_hid_dim"],
+            forecast_n_layers=params["forecast_n_layers"],
+            forecast_hid_dim=params["forecast_hid_dim"],
+            recon_n_layers=params["recon_n_layers"],
+            recon_hid_dim=params["recon_hid_dim"],
+            dropout=params["dropout"],
+            alpha=params["alpha"],
+            device=params["device"],
+        )
+
+        tt = TimeTracker(nb_epoch=params["nb_epoch"])
+
+        tt.train_start()
+        model.fit(
+            train_loader,
+            val_loader=None,
+            n_epochs=params["nb_epoch"],
+            batch_size=params["batch_size"],
+            init_lr=params["init_lr"],
+            model_root=params["model_root"],
+        )
+        tt.train_end()
+
+        train_anomaly_score = model.predict_prob(train_loader, gamma=params["gamma"])
+
+        tt.test_start()
+        anomaly_score, anomaly_label = model.predict_prob(
+            test_loader, gamma=params["gamma"], window_labels=windows["test_label"]
+        )
+        tt.test_end()
+
+        store_entity(
+            params,
+            entity,
+            train_anomaly_score,
+            anomaly_score,
+            anomaly_label,
+            time_tracker=tt.get_data(),
+        )
+    evaluator.eval_exp(
+        exp_folder=params["model_root"],
+        entities=params["entities"],
+        merge_folder=params["benchmark_dir"],
+        extra_params=params,
+    )
diff --git a/benchmark/omnianomaly_benchmark.py b/benchmark/omnianomaly_benchmark.py
new file mode 100644
index 0000000..3ee265c
--- /dev/null
+++ b/benchmark/omnianomaly_benchmark.py
@@ -0,0 +1,112 @@
+# -*- coding: utf-8 -*-
+import os
+import sys
+
+sys.path.append("../")
+
+import logging
+import warnings
+
+warnings.filterwarnings("ignore", category=DeprecationWarning)
+warnings.filterwarnings("ignore", category=FutureWarning)
+
+from networks.omni_anomaly.detector import OmniDetector
+
+from common import data_preprocess
+from common.dataloader import get_dataloaders, load_dataset
+from common.utils import seed_everything, load_config, set_logger, print_to_json
+from common.evaluation import Evaluator, TimeTracker
+from common.exp import store_entity
+
+
+seed_everything()
+if __name__ == "__main__":
+    import argparse
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--config",
+        type=str,
+        default="./benchmark_config/",
+        help="The config directory.",
+    )
+    parser.add_argument("--expid", type=str, default="omnianomaly_SMD")
+    parser.add_argument("--gpu", type=int, default=-1)
+    args = vars(parser.parse_args())
+
+    config_dir = args["config"]
+    experiment_id = args["expid"]
+
+    params = load_config(config_dir, experiment_id)
+    set_logger(params, args)
+    logging.info(print_to_json(params))
+
+    data_dict = load_dataset(
+        data_root=params["data_root"],
+        entities=params["entities"],
+        dim=params["dim"],
+        valid_ratio=params["valid_ratio"],
+        test_label_postfix=params["test_label_postfix"],
+        test_postfix=params["test_postfix"],
+        train_postfix=params["train_postfix"],
+        nrows=params["nrows"],
+    )
+
+    # preprocessing
+    pp = data_preprocess.preprocessor(model_root=params["model_root"])
+    data_dict = pp.normalize(data_dict, method=params["normalize"])
+
+    # sliding windows
+    window_dict = data_preprocess.generate_windows(
+        data_dict,
+        window_size=params["window_size"],
+        stride=params["stride"],
+    )
+
+    # train and test on each entity
+    evaluator = Evaluator(**params["eval"], reverse_score=params["reverse_score"])
+    for entity in params["entities"]:
+        logging.info("Fitting dataset: {}".format(entity))
+        windows = window_dict[entity]
+        train_windows = windows["train_windows"]
+        test_windows = windows["test_windows"]
+        test_label_windows = windows["test_label"]
+        # batch data
+        train_loader, _, test_loader = get_dataloaders(
+            train_windows, test_windows, batch_size=params["batch_size"]
+        )
+
+        model = OmniDetector(
+            dim=params["dim"],
+            model_root=params["model_root"],
+            window_size=params["window_size"],
+            initial_lr=params["initial_lr"],
+            l2_reg=params["l2_reg"],
+        )
+        tt = TimeTracker(nb_epoch=params["nb_epoch"])
+
+        tt.train_start()
+        model.fit(train_loader)
+        tt.train_end()
+
+        train_anomaly_score = model.predict_prob(train_loader)
+        tt.test_start()
+        anomaly_score, anomaly_label = model.predict_prob(
+            test_loader, test_label_windows
+        )
+        tt.test_end()
+
+        store_entity(
+            params,
+            entity,
+            train_anomaly_score,
+            anomaly_score,
+            anomaly_label,
+            time_tracker=tt.get_data(),
+        )
+    evaluator.eval_exp(
+        exp_folder=params["model_root"],
+        entities=params["entities"],
+        merge_folder=params["benchmark_dir"],
+        extra_params=params,
+    )
diff --git a/benchmark/tranad_benchmark.py b/benchmark/tranad_benchmark.py
new file mode 100644
index 0000000..4be4087
--- /dev/null
+++ b/benchmark/tranad_benchmark.py
@@ -0,0 +1,109 @@
+import os
+
+os.chdir(os.path.dirname(os.path.realpath(__file__)))
+import sys
+
+sys.path.append("../")
+import logging
+from networks.tranad import *
+from common import data_preprocess
+from common.dataloader import load_dataset, get_dataloaders
+from common.utils import seed_everything, load_config, set_logger, print_to_json
+from networks.tranad.models import TranAD
+from common.evaluation import Evaluator, TimeTracker
+from common.exp import store_entity
+
+seed_everything()
+if __name__ == "__main__":
+    import argparse
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--config",
+        type=str,
+        default="./benchmark_config/",
+        help="The config directory.",
+    )
+    parser.add_argument("--expid", type=str, default="tranad_SMD")
+    parser.add_argument("--gpu", type=int, default=-1)
+    args = vars(parser.parse_args())
+
+    config_dir = args["config"]
+    experiment_id = args["expid"]
+
+    params = load_config(config_dir, experiment_id)
+    set_logger(params, args)
+    logging.info(print_to_json(params))
+
+    data_dict = load_dataset(
+        data_root=params["data_root"],
+        entities=params["entities"],
+        valid_ratio=params["valid_ratio"],
+        dim=params["dim"],
+        test_label_postfix=params["test_label_postfix"],
+        test_postfix=params["test_postfix"],
+        train_postfix=params["train_postfix"],
+        nrows=params["nrows"],
+    )
+
+    # preprocessing
+    pp = data_preprocess.preprocessor(model_root=params["model_root"])
+    data_dict = pp.normalize(data_dict, method=params["normalize"])
+
+    # sliding windows
+    window_dict = data_preprocess.generate_windows(
+        data_dict,
+        window_size=params["window_size"],
+        stride=params["stride"],
+    )
+
+    # train and test on each entity
+    evaluator = Evaluator(**params["eval"])
+    for entity in params["entities"]:
+        logging.info("Fitting dataset: {}".format(entity))
+        windows = window_dict[entity]
+        train_windows = windows["train_windows"]
+        test_windows = windows["test_windows"]
+
+        train_loader, _, test_loader = get_dataloaders(train_windows, test_windows)
+
+        model = TranAD(
+            params["dim"],
+            params["window_size"],
+            lr=params["lr"],
+            model_root=params["model_root"],
+            device=params["device"],
+        )
+
+        tt = TimeTracker(nb_epoch=params["nb_epoch"])
+
+        tt.train_start()
+        model.fit(
+            params["nb_epoch"],
+            train_loader,
+            training=True,
+        )
+        tt.train_end()
+
+        train_anomaly_score = model.predict_prob(train_loader)
+
+        tt.test_start()
+        anomaly_score, anomaly_label = model.predict_prob(
+            test_loader, windows["test_label"]
+        )
+        tt.test_end()
+
+        store_entity(
+            params,
+            entity,
+            train_anomaly_score,
+            anomaly_score,
+            anomaly_label,
+            time_tracker=tt.get_data(),
+        )
+    evaluator.eval_exp(
+        exp_folder=params["model_root"],
+        entities=params["entities"],
+        merge_folder=params["benchmark_dir"],
+        extra_params=params,
+    )
diff --git a/benchmark/usad_benchmark.py b/benchmark/usad_benchmark.py
new file mode 100644
index 0000000..d1182c7
--- /dev/null
+++ b/benchmark/usad_benchmark.py
@@ -0,0 +1,107 @@
+import sys
+
+sys.path.append("../")
+import logging
+
+from common import data_preprocess
+from common.dataloader import load_dataset
+from networks.usad import UsadModel
+from common.utils import seed_everything, load_config, set_logger, print_to_json
+from common.evaluation import Evaluator, TimeTracker
+from common.exp import store_entity
+
+seed_everything()
+if __name__ == "__main__":
+    import argparse
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--config",
+        type=str,
+        default="./benchmark_config/",
+        help="The config directory.",
+    )
+    parser.add_argument("--expid", type=str, default="usad_SMD")
+    parser.add_argument("--gpu", type=int, default=-1)
+    args = vars(parser.parse_args())
+
+    config_dir = args["config"]
+    experiment_id = args["expid"]
+
+    params = load_config(config_dir, experiment_id)
+    set_logger(params, args)
+    logging.info(print_to_json(params))
+
+    data_dict = load_dataset(
+        data_root=params["data_root"],
+        entities=params["entities"],
+        valid_ratio=params["valid_ratio"],
+        dim=params["dim"],
+        test_label_postfix=params["test_label_postfix"],
+        test_postfix=params["test_postfix"],
+        train_postfix=params["train_postfix"],
+        nrows=params["nrows"],
+    )
+
+    # preprocessing
+    pp = data_preprocess.preprocessor(model_root=params["model_root"])
+    data_dict = pp.normalize(data_dict, method=params["normalize"])
+
+    # sliding windows
+    window_dict = data_preprocess.generate_windows(
+        data_dict,
+        window_size=params["window_size"],
+        stride=params["stride"],
+    )
+
+    # train and test on each entity
+    evaluator = Evaluator(**params["eval"])
+    for entity in params["entities"]:
+        logging.info("Fitting dataset: {}".format(entity))
+        windows = window_dict[entity]
+        train_windows = windows["train_windows"]
+        test_windows = windows["test_windows"]
+        test_windows_label = windows["test_label"]
+
+        model = UsadModel(
+            w_size=train_windows.shape[1] * train_windows.shape[2],
+            z_size=train_windows.shape[1] * params["hidden_size"],
+            device=params["device"],
+        )
+        tt = TimeTracker(nb_epoch=params["nb_epoch"])
+
+        tt.train_start()
+        model.fit(
+            windows_train=train_windows,
+            windows_val=None,
+            epochs=params["nb_epoch"],
+            batch_size=params["batch_size"],
+        )
+        tt.train_end()
+
+        train_anomaly_score = model.predict_prob(
+            windows_test=train_windows,
+            batch_size=params["batch_size"],
+        )
+        tt.test_start()
+        anomaly_score, anomaly_label = model.predict_prob(
+            windows_test=test_windows,
+            batch_size=params["batch_size"],
+            windows_label=test_windows_label,
+        )
+        tt.test_end()
+
+        store_entity(
+            params,
+            entity,
+            train_anomaly_score,
+            anomaly_score,
+            anomaly_label,
+            time_tracker=tt.get_data(),
+        )
+    evaluator.eval_exp(
+        exp_folder=params["model_root"],
+        entities=params["entities"],
+        merge_folder=params["benchmark_dir"],
+        extra_params=params,
+    )
diff --git a/common/autotuner.py b/common/autotuner.py
new file mode 100644
index 0000000..c14665c
--- /dev/null
+++ b/common/autotuner.py
@@ -0,0 +1,132 @@
+import os
+import yaml
+import itertools
+import hashlib
+import time
+import subprocess
+import glob
+import numpy as np
+from .utils import load_config, print_to_json, load_dataset_config
+
+
+
+def enumerate_params(config_file, exclude_expid=()):
+    with open(config_file, "r") as cfg:
+        config_dict = yaml.load(cfg, Loader=yaml.FullLoader)
+    # tuning space
+    tune_dict = config_dict["tuner_space"]
+    for k, v in tune_dict.items():
+        if not isinstance(v, list):
+            tune_dict[k] = [v]
+    experiment_id = config_dict["base_expid"]
+    # Resolve the base config dir up front: the dataset lookup below needs it
+    # even when the model config is given inline.
+    base_config_dir = config_dict.get("base_config", os.path.dirname(config_file))
+    if "model_config" in config_dict:
+        model_dict = dict(config_dict["model_config"].get("Base", {}))
+        model_dict.update(config_dict["model_config"][experiment_id])
+    else:
+        model_dict = load_config(base_config_dir, experiment_id)
+
+    dataset_id = config_dict.get("dataset_id", model_dict["dataset_id"])
+    
+    if "dataset_config" in config_dict:
+        dataset_dict = config_dict["dataset_config"][dataset_id]
+    else:
+        dataset_dict = load_dataset_config(base_config_dir, dataset_id)
+        
+    if model_dict["dataset_id"] == "TBD": # rename base expid
+        model_dict["dataset_id"] = dataset_id
+        experiment_id = model_dict["model"] + "_" + dataset_id
+        
+    # key checking
+    tuner_keys = set(tune_dict.keys())
+    base_keys = set(model_dict.keys()).union(set(dataset_dict.keys()))
+    if len(tuner_keys - base_keys) > 0:
+        raise RuntimeError("Invalid params in tuner config: {}".format(tuner_keys - base_keys))
+
+    config_dir = config_file.replace(".yaml", "")
+    if not os.path.exists(config_dir):
+        os.makedirs(config_dir)
+
+    # enumerate dataset para combinations
+    dataset_dict = {k: tune_dict[k] if k in tune_dict else [v] for k, v in dataset_dict.items()}
+    dataset_para_keys = list(dataset_dict.keys())
+    dataset_para_combs = dict()
+    for idx, values in enumerate(itertools.product(*map(dataset_dict.get, dataset_para_keys))):
+        dataset_params = dict(zip(dataset_para_keys, values))
+        # if dataset_params["data_format"] == "h5":
+        #     dataset_para_combs[dataset_id] = dataset_params
+        # else:
+        hash_id = hashlib.md5(print_to_json(dataset_params).encode("utf-8")).hexdigest()[0:8]
+        dataset_para_combs[dataset_id + "_{}".format(hash_id)] = dataset_params
+
+    # dump dataset para combinations to config file
+    dataset_config = os.path.join(config_dir, "dataset_config.yaml")
+    with open(dataset_config, "w") as fw:
+        yaml.dump(dataset_para_combs, fw, default_flow_style=None, indent=4)
+
+    # enumerate model para combinations
+    model_dict = {k: tune_dict[k] if k in tune_dict else [v] for k, v in model_dict.items()}
+    model_para_keys = list(model_dict.keys())
+    model_param_combs = dict()
+    for idx, values in enumerate(itertools.product(*map(model_dict.get, model_para_keys))):
+        model_param_combs[idx + 1] = dict(zip(model_para_keys, values))
+        
+    # update dataset_id into model params
+    merged_param_combs = dict()
+    for idx, item in enumerate(itertools.product(model_param_combs.values(),
+                                                 dataset_para_combs.keys())):
+        para_dict = item[0]
+        para_dict["dataset_id"] = item[1]
+        random_number = ""
+        # if para_dict["debug"]:
+        #     random_number = str(np.random.randint(1e8)) # add a random number to avoid duplicate during debug
+        hash_id = hashlib.md5((print_to_json(para_dict) + random_number).encode("utf-8")).hexdigest()[0:8]
+        hash_expid = experiment_id + "_{:03d}_{}".format(idx + 1, hash_id)
+        if hash_expid not in exclude_expid:
+            merged_param_combs[hash_expid] = para_dict.copy()
+
+    # dump model para combinations to config file
+    model_config = os.path.join(config_dir, "model_config.yaml")
+    with open(model_config, "w") as fw:
+        yaml.dump(merged_param_combs, fw, default_flow_style=None, indent=4)
+    print("Enumerate all tuner configurations done.")    
+    return config_dir
+
+def load_experiment_ids(config_dir):
+    model_configs = glob.glob(os.path.join(config_dir, "model_config.yaml"))
+    if not model_configs:
+        model_configs = glob.glob(os.path.join(config_dir, "model_config/*.yaml"))
+    experiment_id_list = []
+    for config in model_configs:
+        with open(config, "r") as cfg:
+            config_dict = yaml.load(cfg, Loader=yaml.FullLoader)
+            experiment_id_list += config_dict.keys()
+    return sorted(experiment_id_list)
+
+def grid_search(model_name, config_dir, gpu_list, expid_tag=None):
+    config_dir = os.path.abspath(config_dir)
+    experiment_id_list = load_experiment_ids(config_dir)
+    if expid_tag is not None:
+        experiment_id_list = [expid for expid in experiment_id_list if str(expid_tag) in expid]
+        assert len(experiment_id_list) > 0, "tag={} does not match any expid!".format(expid_tag)
+    gpu_list = list(gpu_list)
+    idle_queue = list(range(len(gpu_list)))
+    processes = dict()
+    while len(experiment_id_list) > 0:
+        if len(idle_queue) > 0:
+            idle_idx = idle_queue.pop(0)
+            gpu_id = gpu_list[idle_idx]
+            expid = experiment_id_list.pop(0)
+            # build the argument list directly so paths containing spaces stay intact
+            cmd = ["python", "-u", "{}_benchmark.py".format(model_name), "--config", config_dir,
+                   "--expid", str(expid), "--gpu", str(gpu_id)]
+            p = subprocess.Popen(cmd, cwd="../benchmark")
+            processes[idle_idx] = p
+        else:
+            time.sleep(5)
+            for idle_idx, p in processes.items():
+                if p.poll() is not None: # terminated
+                    idle_queue.append(idle_idx)
+    [p.wait() for p in processes.values()]
\ No newline at end of file
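
Reviewer note: the enumeration in `enumerate_params` above boils down to a Cartesian product over per-key candidate lists, with a short MD5 digest of each combination serving as a stable id. The sketch below illustrates only that idea. `expand_grid` and `short_hash` are hypothetical names that do not exist in this patch, and `json.dumps(..., sort_keys=True)` is an assumed stand-in for the repository's `print_to_json`, whose exact output is not shown here.

```python
import hashlib
import itertools
import json


def short_hash(params, length=8):
    """Return a stable short digest of a parameter dict (hypothetical helper)."""
    payload = json.dumps(params, sort_keys=True)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()[:length]


def expand_grid(base_params, tuner_space):
    """Expand base_params over tuner_space into a list of (id, params) pairs."""
    unknown = set(tuner_space) - set(base_params)
    if unknown:
        raise ValueError("Invalid params in tuner space: {}".format(sorted(unknown)))
    # Tuned keys contribute their candidate list; untuned keys keep one base value.
    candidates = {}
    for key, value in base_params.items():
        if key in tuner_space:
            tuned = tuner_space[key]
            candidates[key] = tuned if isinstance(tuned, list) else [tuned]
        else:
            candidates[key] = [value]
    keys = list(candidates)
    combos = []
    products = itertools.product(*(candidates[k] for k in keys))
    for idx, values in enumerate(products, start=1):
        params = dict(zip(keys, values))
        combos.append(("{:03d}_{}".format(idx, short_hash(params)), params))
    return combos
```

Because the digest is computed from a key-sorted serialization, the same combination should map to the same id across runs, which is what makes an exclusion list of already finished ids workable.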
diff --git a/common/config.py b/common/config.py
new file mode 100644
index 0000000..69692cb
--- /dev/null
+++ b/common/config.py
@@ -0,0 +1,112 @@
+import os
+import time
+import argparse
+import logging
+import glob
+
+
+class eval_config:
+    def __init__(self) -> None:
+        self.metrics = {"f1": True, "adjusted_f1": True, "delay": True, "time": True}
+
+
+class model_config(eval_config):
+    def __init__(self) -> None:
+        super().__init__()
+        pass
+
+
+entities = {
+    "SMD": ["machine-1-{}".format(i) for i in range(1, 9)]
+    + ["machine-2-{}".format(i) for i in range(1, 10)]
+    + ["machine-3-{}".format(i) for i in range(1, 12)],
+    "SMAP": [
+        "P-1",
+        "S-1",
+        "E-1",
+        "E-2",
+        "E-3",
+        "E-4",
+        "E-5",
+        "E-6",
+        "E-7",
+        "E-8",
+        "E-9",
+        "E-10",
+        "E-11",
+        "E-12",
+        "E-13",
+        "A-1",
+        "D-1",
+        "P-2",
+        "P-3",
+        "D-2",
+        "D-3",
+        "D-4",
+        "A-2",
+        "A-3",
+        "A-4",
+        "G-1",
+        "G-2",
+        "D-5",
+        "D-6",
+        "D-7",
+        "F-1",
+        "P-4",
+        "G-3",
+        "T-1",
+        "T-2",
+        "D-8",
+        "D-9",
+        "F-2",
+        "G-4",
+        "T-3",
+        "D-11",
+        "D-12",
+        "B-1",
+        "G-6",
+        "G-7",
+        "P-7",
+        "R-1",
+        "A-5",
+        "A-6",
+        "A-7",
+        "D-13",
+        "P-2",
+        "A-8",
+        "A-9",
+        "F-3",
+    ],
+    "MSL": [
+        "M-6",
+        "M-1",
+        "M-2",
+        "S-2",
+        "P-10",
+        "T-4",
+        "T-5",
+        "F-7",
+        "M-3",
+        "M-4",
+        "M-5",
+        "P-15",
+        "C-1",
+        "C-2",
+        "T-12",
+        "T-13",
+        "F-4",
+        "F-5",
+        "D-14",
+        "T-9",
+        "P-14",
+        "T-8",
+        "P-11",
+        "D-15",
+        "D-16",
+        "M-7",
+        "F-8",
+    ],
+    "WADI": ["wadi"],
+    "SWAT": ["swat"],
+    "WADI_SPLIT": ["wadi-1", "wadi-2", "wadi-3"],  # if OOM occurs
+}
diff --git a/common/data_preprocess.py b/common/data_preprocess.py
new file mode 100644
index 0000000..cb5ed69
--- /dev/null
+++ b/common/data_preprocess.py
@@ -0,0 +1,108 @@
+import logging
+import os
+import pickle
+from collections import defaultdict
+import numpy as np
+from sklearn.preprocessing import (
+    KBinsDiscretizer,
+    MinMaxScaler,
+    RobustScaler,
+    StandardScaler,
+)
+
+from common.utils import load_hdf5, save_hdf5
+
+
+class preprocessor:
+    def __init__(self, model_root):
+        self.model_root = model_root
+        self.vocab_size = None
+        self.discretizer_list = defaultdict(list)
+
+    def save(self, filepath):
+        filepath = os.path.join(filepath, "preprocessor.pkl")
+        logging.info("Saving preprocessor into {}".format(filepath))
+        with open(filepath, "wb") as fw:
+            pickle.dump(self.__dict__, fw)
+
+    def load(self, filepath):
+        filepath = os.path.join(filepath, "preprocessor.pkl")
+        logging.info("Loading preprocessor from {}".format(filepath))
+        with open(filepath, "rb") as fw:
+            self.__dict__.update(pickle.load(fw))
+
+    def normalize(self, data_dict, method="minmax"):
+        if method == "none":
+            return data_dict
+        logging.info("Normalizing data with {}".format(method))
+        normalized_dict = defaultdict(dict)
+        for k, subdata_dict in data_dict.items():
+            # method: minmax, standard, or robust
+            # the scaler is fit on the training split only
+            scalers = {
+                "minmax": MinMaxScaler,
+                "standard": StandardScaler,
+                "robust": RobustScaler,
+            }
+            est = scalers[method]()  # raises KeyError for an unknown method
+
+            train_ = est.fit_transform(subdata_dict["train"])
+            test_ = est.transform(subdata_dict["test"])
+
+            # assign back
+            normalized_dict[k]["train"] = train_
+            normalized_dict[k]["test"] = test_
+            for subk in subdata_dict.keys():
+                if subk not in ["train", "test"]:
+                    normalized_dict[k][subk] = subdata_dict[subk]
+        return normalized_dict
+
+
+def get_windows(ts, labels=None, window_size=128, stride=1, dim=None):
+    i = 0
+    ts_len = ts.shape[0]
+    windows = []
+    label_windows = []
+    while i + window_size < ts_len:
+        if dim is not None:
+            windows.append(ts[i : i + window_size, dim])
+        else:
+            windows.append(ts[i : i + window_size])
+        if labels is not None:
+            label_windows.append(labels[i : i + window_size])
+        i += stride
+    if labels is not None:
+        return np.array(windows, dtype=np.float32), np.array(
+            label_windows, dtype=np.float32
+        )
+    else:
+        return np.array(windows, dtype=np.float32), None
+
+
+def generate_windows(data_dict, window_size=100, nrows=None, stride=1, **kwargs):
+    logging.info("Generating sliding windows (size {}).".format(window_size))
+    results = defaultdict(dict)
+    for dataname, subdata_dict in data_dict.items():
+        for k in ["train", "valid", "test"]:
+            if k not in subdata_dict: continue
+            data = subdata_dict[k][0:nrows]
+            if k == "train":
+                data_windows, _ = get_windows(
+                    data, window_size=window_size, stride=stride
+                )
+                results[dataname]["train_windows"] = data_windows
+            if k == "valid":
+                data_windows, _ = get_windows(
+                    data, window_size=window_size, stride=stride
+                )
+                results[dataname]["valid_windows"] = data_windows
+            if k == "test":
+                test_label = subdata_dict["test_label"][0:nrows]
+                data_windows, test_label = get_windows(
+                    data, test_label, window_size=window_size, stride=1
+                )
+                results[dataname]["test_windows"] = data_windows
+                results[dataname]["test_label"] = test_label
+            logging.info("Windows for {} #: {}".format(k, data_windows.shape))
+
+    return results
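
Reviewer note: a compact way to reason about how many windows `get_windows` produces is to enumerate the start indices directly. The sketch below mirrors the loop condition used above (`i + window_size < ts_len`), so the last full window, the one ending exactly at the final row, is not emitted. `sliding_windows` is a hypothetical name used only for illustration.

```python
import numpy as np


def sliding_windows(ts, window_size, stride=1):
    """Stack windows ts[i : i + window_size] for i = 0, stride, 2 * stride, ..."""
    ts = np.asarray(ts, dtype=np.float32)
    # A start index i is used while i + window_size < len(ts), which is the
    # same condition as the while loop in get_windows.
    starts = range(0, max(len(ts) - window_size, 0), stride)
    windows = [ts[i : i + window_size] for i in starts]
    if not windows:
        # Keep a well-defined shape even when the series is too short.
        return np.empty((0, window_size) + ts.shape[1:], dtype=np.float32)
    return np.stack(windows)
```

For a series with 10 rows and `window_size=4`, this gives 6 windows at stride 1 and 3 windows at stride 2. A series whose length equals the window size gives none.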
diff --git a/common/dataloader.py b/common/dataloader.py
new file mode 100644
index 0000000..5e1028a
--- /dev/null
+++ b/common/dataloader.py
@@ -0,0 +1,135 @@
+import logging
+import os
+import pickle
+import numpy as np
+from collections import defaultdict
+from torch.utils.data import DataLoader, Dataset
+
+data_path_dict = {
+    "SMD": "./datasets/anomaly/SMD/processed",
+    "SMAP": "./datasets/anomaly/SMAP-MSL/processed_SMAP",
+    "MSL": "./datasets/anomaly/SMAP-MSL/processed_MSL",
+    "WADI": "./datasets/anomaly/WADI/processed",
+    "SWAT": "./datasets/anomaly/SWAT/processed",
+    "WADI_SPLIT": "./datasets/anomaly/WADI_SPLIT/processed",
+    "SWAT_SPLIT": "./datasets/anomaly/SWAT_SPLIT/processed",
+}
+
+
+def get_data_dim(dataset):
+    if "SMAP" in dataset:
+        return 25
+    elif "MSL" in dataset:
+        return 55
+    elif "SMD" in dataset:
+        return 38
+    elif "WADI" in dataset:
+        return 93
+    elif "SWAT" in dataset:
+        return 40
+    else:
+        raise ValueError("unknown dataset " + str(dataset))
+
+
+def load_dataset(
+    data_root,
+    entities,
+    valid_ratio,
+    dim,
+    test_label_postfix,
+    test_postfix,
+    train_postfix,
+    nan_value=0,
+    nrows=None,
+):
+    """
+    dim: number of features (columns) of the multivariate time series
+    """
+    logging.info("Loading data from {}".format(data_root))
+
+    data = defaultdict(dict)
+    total_train_len, total_valid_len, total_test_len = 0, 0, 0
+    for dataname in entities:
+        with open(
+            os.path.join(data_root, "{}_{}".format(dataname, train_postfix)), "rb"
+        ) as f:
+            train = pickle.load(f).reshape((-1, dim))[0:nrows, :]
+            split_idx = int(len(train) * valid_ratio)
+            if split_idx > 0:  # skip the split when it would leave the validation set empty
+                train, valid = train[:-split_idx], train[-split_idx:]
+                data[dataname]["valid"] = np.nan_to_num(valid, nan=nan_value)
+                total_valid_len += len(valid)
+            data[dataname]["train"] = np.nan_to_num(train, nan=nan_value)
+            total_train_len += len(train)
+        with open(
+            os.path.join(data_root, "{}_{}".format(dataname, test_postfix)), "rb"
+        ) as f:
+            test = pickle.load(f).reshape((-1, dim))[0:nrows, :]
+            data[dataname]["test"] = np.nan_to_num(test, nan=nan_value)
+            total_test_len += len(test)
+        with open(
+            os.path.join(data_root, "{}_{}".format(dataname, test_label_postfix)), "rb"
+        ) as f:
+            data[dataname]["test_label"] = pickle.load(f).reshape(-1)[0:nrows]
+    logging.info("Loading {} entities done.".format(len(entities)))
+    logging.info(
+        "Train/Valid/Test: {}/{}/{} lines.".format(
+            total_train_len, total_valid_len, total_test_len
+        )
+    )
+
+    return data
+
+
+class sliding_window_dataset(Dataset):
+    def __init__(self, data, next_steps=0):
+        self.data = data
+        self.next_steps = next_steps
+
+    def __getitem__(self, index):
+        if self.next_steps == 0:
+            x = self.data[index]
+            return x
+        else:
+            x = self.data[index, 0 : -self.next_steps]
+            y = self.data[index, -self.next_steps :]
+            return x, y
+
+    def __len__(self):
+        return len(self.data)
+
+
+def get_dataloaders(
+    train_data,
+    test_data,
+    valid_data=None,
+    next_steps=0,
+    batch_size=32,
+    shuffle=True,
+    num_workers=1,
+):
+
+    train_loader = DataLoader(
+        sliding_window_dataset(train_data, next_steps),
+        batch_size=batch_size,
+        shuffle=shuffle,
+        num_workers=num_workers,
+    )
+
+    test_loader = DataLoader(
+        sliding_window_dataset(test_data, next_steps),
+        batch_size=batch_size,
+        shuffle=False,
+        num_workers=num_workers,
+    )
+
+    if valid_data is not None:
+        valid_loader = DataLoader(
+            sliding_window_dataset(valid_data, next_steps),
+            batch_size=batch_size,
+            shuffle=shuffle,
+            num_workers=num_workers,
+        )
+    else:
+        valid_loader = None
+    return train_loader, valid_loader, test_loader
diff --git a/common/evaluation/__init__.py b/common/evaluation/__init__.py
new file mode 100644
index 0000000..1667c2d
--- /dev/null
+++ b/common/evaluation/__init__.py
@@ -0,0 +1,6 @@
+from .point_adjustment import *
+from .spot import *
+from .metrics import *
+from .thresholding import *
+from .eval_pipline import *
+from .time_tracker import *
diff --git a/common/evaluation/eval_pipline.py b/common/evaluation/eval_pipline.py
new file mode 100644
index 0000000..8fe9e2e
--- /dev/null
+++ b/common/evaluation/eval_pipline.py
@@ -0,0 +1,333 @@
+import logging
+import os
+
+import pandas as pd
+import numpy as np
+from sklearn.preprocessing import MinMaxScaler
+from collections import defaultdict, namedtuple
+from common.exp import json_pretty_dump
+from ..utils import load_hdf5, load_json, print_to_json
+from .metrics import compute_binary_metrics, compute_delay
+from .point_adjustment import adjust_pred
+from .thresholding import best_th, eps_th, pot_th
+
+
+def get_comb_key(thresholding, point_adjustment):
+    return "{}{}".format(thresholding, "_adjusted" if point_adjustment else "")
+
+
+def results2csv(results, filepath):
+    columns = [
+        "uptime",
+        "dataset_id",
+        "strategy",
+        "exp_id",
+        "model_id",
+        "length",
+        "f1_adjusted",
+        "pc_adjusted",
+        "rc_adjusted",
+        "f1",
+        "pc",
+        "rc",
+        "delay",
+        "train_time",
+        "test_time",
+        "nb_epoch",
+        "nb_eval_entity",
+        "nb_total_entity",
+    ]
+
+    filedir = os.path.dirname(filepath)
+    os.makedirs(filedir, exist_ok=True)
+
+    total_rows = []
+    basic_info = {
+        key: value for key, value in results.items() if not isinstance(value, dict)
+    }
+
+    for key, value in results.items():
+        if isinstance(value, dict):
+            row = {"strategy": key, **value, **basic_info}
+            total_rows.append(row)
+
+    if os.path.isfile(filepath):
+        logging.info(f"File {filepath} exists, loading directly.")
+        df = pd.read_csv(filepath)
+    else:
+        df = pd.DataFrame()
+    total_rows.extend(df.to_dict(orient="records"))
+    pd.DataFrame(total_rows, columns=columns).to_csv(filepath, index=False)
+    logging.info(f"Appended exp results to {filepath}.")
+
+
+class Evaluator:
+    """
+
+    thresholding (str or list of str): options: "best", "eps", "pot"
+    """
+
+    def __init__(
+        self,
+        metrics,
+        thresholding="best",
+        pot_params={"q": 1e-3, "level": 0.99, "dynamic": False},
+        best_params={"target_metric": "f1", "target_direction": "max"},
+        point_adjustment=False,
+        reverse_score=False
+    ):
+        if isinstance(thresholding, str):
+            thresholding = [thresholding]
+        if isinstance(point_adjustment, (bool, str)):
+            point_adjustment = [point_adjustment]
+
+        self.thresholding = thresholding
+        self.metrics = metrics
+        self.best_params = best_params
+        self.pot_params = pot_params
+        self.point_adjustment = point_adjustment
+        self.reverse_score = reverse_score
+
+    def score2pred(
+        self,
+        thresholding,
+        anomaly_score,
+        anomaly_label,
+        train_anomaly_score=None,
+        point_adjustment=False,
+    ):
+        if self.reverse_score:
+            anomaly_score = -anomaly_score
+
+
+        pred_results = {"anomaly_pred": None, "anomaly_pred_adjusted": None, "th": None}
+
+        if thresholding == "best":
+            th = best_th(
+                anomaly_score,
+                anomaly_label,
+                point_adjustment=point_adjustment,
+                **self.best_params,
+            )
+        elif thresholding == "pot":
+            th = pot_th(train_anomaly_score, anomaly_score, **self.pot_params)
+        elif thresholding == "eps":
+            th = eps_th(train_anomaly_score, reg_level=1)
+        else:
+            raise ValueError("Unknown thresholding: {}".format(thresholding))
+        anomaly_pred = (anomaly_score >= th).astype(int)
+
+        pred_results["anomaly_pred"] = anomaly_pred
+        pred_results["th"] = th
+        if point_adjustment:
+            pred_results["anomaly_pred_adjusted"] = adjust_pred(
+                anomaly_pred, anomaly_label
+            )
+        return pred_results
+
+    def eval(
+        self,
+        anomaly_label,
+        anomaly_score=None,
+        train_anomaly_score=None,
+    ):
+        eval_results = {}
+        for point_adjustment in self.point_adjustment:
+            for thresholding in self.thresholding:
+                eval_results_tmp = {}
+
+                pred_results = self.score2pred(
+                    thresholding,
+                    anomaly_score,
+                    anomaly_label,
+                    train_anomaly_score,
+                    point_adjustment,
+                )
+                eval_results_tmp["th"] = pred_results["th"]
+                anomaly_pred = pred_results["anomaly_pred"]
+
+                eval_results_tmp.update(
+                    self.cal_metrics(anomaly_pred, anomaly_label, point_adjustment)
+                )
+
+                key = get_comb_key(thresholding, point_adjustment)
+                eval_results[key] = eval_results_tmp
+        return eval_results
+
+    def cal_metrics(self, anomaly_pred, anomaly_label, point_adjustment):
+        logging.info(
+            "Pred pos {}/{}, Label pos {}/{}".format(
+                anomaly_pred.sum(),
+                anomaly_pred.shape[0],
+                anomaly_label.sum(),
+                anomaly_label.shape[0],
+            )
+        )
+        eval_metrics = {"length": anomaly_pred.shape[0]}
+        for metric in self.metrics:
+            if metric in ["f1", "pc", "rc"]:
+                eval_metrics.update(
+                    compute_binary_metrics(
+                        anomaly_pred,
+                        anomaly_label,
+                        point_adjustment,
+                    )
+                )
+            if metric == "delay":
+                eval_metrics["delay"] = compute_delay(anomaly_pred, anomaly_label)
+        return eval_metrics
+
+    def eval_exp(
+        self, exp_folder, entities, merge_folder, extra_params, eval_single=False
+    ):
+        eval_results = {
+            "dataset_id": extra_params["dataset_id"],
+            "exp_id": extra_params["exp_id"],
+            "model_id": extra_params["model_id"],
+            "train_time": 0,
+            "test_time": 0,
+            "nb_epoch": 0,
+            "nb_eval_entity": 0,
+            "nb_total_entity": len(entities),
+            "uptime": extra_params["uptime"],
+        }
+        # score to pred, for every entity
+        merge_dict = {
+            "anomaly_pred": defaultdict(list),
+            "anomaly_label": defaultdict(list),
+        }
+        for entity in entities:
+            entity_folder = os.path.join(exp_folder, entity)
+            try:
+                score_dict = load_hdf5(
+                    os.path.join(entity_folder, f"score_{entity}.hdf5")
+                )
+                time_track = load_json(os.path.join(entity_folder, "time.json"))
+            except Exception:
+                logging.warning("Failed to load entity {}.".format(entity))
+                continue
+
+            # merge efficiency info to eval_results
+            eval_results["train_time"] += float(time_track["train_time"])
+            eval_results["test_time"] += float(time_track["test_time"])
+            eval_results["nb_epoch"] += int(time_track["nb_epoch"])
+            eval_results["nb_eval_entity"] += 1
+
+            thresholds = {}
+            eval_results_single = defaultdict(dict)
+            for point_adjustment in self.point_adjustment:
+                for thresholding in self.thresholding:
+                    EvalKey = namedtuple("key", ["point_adjustment", "thresholding"])
+                    eval_key = EvalKey(point_adjustment, thresholding)
+
+                    pred_results = self.score2pred(
+                        thresholding,
+                        score_dict["anomaly_score"],
+                        score_dict["anomaly_label"],
+                        score_dict["train_anomaly_score"],
+                        point_adjustment,
+                    )
+
+                    key = get_comb_key(eval_key.thresholding, eval_key.point_adjustment)
+                    thresholds[key] = pred_results["th"]
+                    merge_dict["anomaly_pred"][eval_key].append(
+                        pred_results["anomaly_pred"]
+                    )
+                    merge_dict["anomaly_label"][eval_key].append(
+                        score_dict["anomaly_label"]
+                    )
+
+                    if eval_single:
+                        eval_results_single[key]["th"] = pred_results["th"]
+                        eval_results_single[key].update(
+                            self.cal_metrics(
+                                pred_results["anomaly_pred"],
+                                score_dict["anomaly_label"],
+                                point_adjustment,
+                            )
+                        )
+            if eval_single:
+                json_pretty_dump(
+                    eval_results_single,
+                    os.path.join(entity_folder, "eval_results.json"),
+                )
+
+            json_pretty_dump(thresholds, os.path.join(entity_folder, "thresholds.json"))
+
+        # merge effectiveness info to eval_results
+        for eval_key in merge_dict["anomaly_pred"].keys():
+            key = get_comb_key(eval_key.thresholding, eval_key.point_adjustment)
+
+            pred_cat = np.concatenate(merge_dict["anomaly_pred"][eval_key])
+            label_cat = np.concatenate(merge_dict["anomaly_label"][eval_key])
+            eval_result_tmp = self.cal_metrics(
+                pred_cat, label_cat, eval_key.point_adjustment
+            )
+            eval_result_tmp["length"] = pred_cat.shape[0]
+            eval_results[key] = eval_result_tmp
+
+        logging.info(print_to_json(eval_results))
+        logging.info(
+            "Evaluated {}/{} entities.".format(
+                eval_results["nb_eval_entity"], len(entities)
+            )
+        )
+
+        results2csv(
+            eval_results,
+            os.path.join(merge_folder, extra_params["dataset_id"], "bench_results.csv"),
+        )
+
+    def eval_exp_simple(
+        self, exp_folder, entities, merge_folder, extra_params, eval_single=False
+    ):
+        merge_dict = {
+            "anomaly_pred": defaultdict(list),
+            "anomaly_label": defaultdict(list),
+        }
+        for entity in entities:
+            entity_folder = os.path.join(exp_folder, entity)
+            try:
+                score_dict = load_hdf5(
+                    os.path.join(entity_folder, f"score_{entity}.hdf5")
+                )
+                time_track = load_json(os.path.join(entity_folder, "time.json"))
+            except Exception:
+                logging.warning("Failed to load entity {}.".format(entity))
+                continue
+
+            for point_adjustment in self.point_adjustment:
+                for thresholding in self.thresholding:
+                    EvalKey = namedtuple("key", ["point_adjustment", "thresholding"])
+                    eval_key = EvalKey(point_adjustment, thresholding)
+
+                    pred_results = self.score2pred(
+                        thresholding,
+                        score_dict["anomaly_score"],
+                        score_dict["anomaly_label"],
+                        score_dict["train_anomaly_score"],
+                        point_adjustment,
+                    )
+                    merge_dict["anomaly_pred"][eval_key].append(
+                        pred_results["anomaly_pred"]
+                    )
+                    merge_dict["anomaly_label"][eval_key].append(
+                        score_dict["anomaly_label"]
+                    )
+
+        eval_results = {}
+        # merge effectiveness info to eval_results
+        for eval_key in merge_dict["anomaly_pred"].keys():
+            key = get_comb_key(eval_key.thresholding, eval_key.point_adjustment)
+
+            pred_cat = np.concatenate(merge_dict["anomaly_pred"][eval_key])
+            label_cat = np.concatenate(merge_dict["anomaly_label"][eval_key])
+            eval_result_tmp = self.cal_metrics(
+                pred_cat, label_cat, eval_key.point_adjustment
+            )
+            eval_result_tmp["length"] = pred_cat.shape[0]
+            eval_results[key] = eval_result_tmp
+
+            logging.info("{} pred_sum: {}".format(eval_key, pred_cat.sum()))
+
+        logging.info(print_to_json(eval_results))
diff --git a/common/evaluation/metrics.py b/common/evaluation/metrics.py
new file mode 100644
index 0000000..3f9f93b
--- /dev/null
+++ b/common/evaluation/metrics.py
@@ -0,0 +1,51 @@
+import logging
+import numpy as np
+from sklearn.metrics import f1_score, precision_score, recall_score
+from .point_adjustment import adjust_pred
+
+
+def compute_binary_metrics(anomaly_pred, anomaly_label, adjustment=False):
+    if not adjustment:
+        eval_anomaly_pred = anomaly_pred
+        metrics = {
+            "f1": f1_score(anomaly_label, eval_anomaly_pred),
+            "pc": precision_score(anomaly_label, eval_anomaly_pred),
+            "rc": recall_score(anomaly_label, eval_anomaly_pred),
+        }
+    else:
+        eval_anomaly_pred = adjust_pred(anomaly_pred, anomaly_label)
+        metrics = {
+            "f1_adjusted": f1_score(anomaly_label, eval_anomaly_pred),
+            "pc_adjusted": precision_score(anomaly_label, eval_anomaly_pred),
+            "rc_adjusted": recall_score(anomaly_label, eval_anomaly_pred),
+        }
+    return metrics
+
+
+def compute_delay(anomaly_pred, anomaly_label):
+    def onehot2interval(arr):
+        result = []
+        record = False
+        for idx, item in enumerate(arr):
+            if item == 1 and not record:
+                start = idx
+                record = True
+            if item == 0 and record:
+                record = False
+                result.append((start, idx))  # end is exclusive, like [a, b)
+        # an interval still open at the end of the array is closed at len(arr)
+        return result + [(start, len(arr))] if record else result
+
+    count = 0
+    total_delay = 0
+    pred = np.array(anomaly_pred)
+    label = np.array(anomaly_label)
+    for start, end in onehot2interval(label):
+        pred_interval = pred[start:end]
+        if pred_interval.sum() > 0:
+            delay = np.where(pred_interval == 1)[0][0]
+            delay = delay / len(pred_interval)  # normalized by the interval
+            total_delay += delay
+            count += 1
+    avg_delay = total_delay / (1e-6 + count)
+    return avg_delay
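
Reviewer note: scikit-learn's `f1_score`, `precision_score` and `recall_score` take the ground truth first and the prediction second. F1 is symmetric in its two arguments, but precision and recall trade places when the arguments are swapped. The NumPy-only sketch below makes the direction explicit. `binary_prf` is a hypothetical helper written for this note, not part of the patch.

```python
import numpy as np


def binary_prf(label, pred):
    """Precision, recall and F1 for 0/1 arrays, with label as the ground truth."""
    label = np.asarray(label).astype(bool)
    pred = np.asarray(pred).astype(bool)
    tp = int(np.sum(label & pred))   # flagged and truly anomalous
    fp = int(np.sum(~label & pred))  # flagged but normal
    fn = int(np.sum(label & ~pred))  # anomalous but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

With `label = [0, 1, 1, 1, 0, 0]` and `pred = [0, 1, 0, 0, 1, 0]`, precision is 1/2 and recall is 1/3. Calling the helper with the two arrays exchanged yields 1/3 and 1/2 instead, while F1 stays at 0.4.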
diff --git a/common/evaluation/point_adjustment.py b/common/evaluation/point_adjustment.py
new file mode 100644
index 0000000..afc90eb
--- /dev/null
+++ b/common/evaluation/point_adjustment.py
@@ -0,0 +1,28 @@
+import copy
+
+
+def adjust_pred(pred, label):
+    """
+    Borrow from https://github.com/NetManAIOps/OmniAnomaly/blob/master/omni_anomaly/eval_methods.py
+    """
+    adjusted_pred = copy.deepcopy(pred)
+
+    anomaly_state = False
+    anomaly_count = 0
+    latency = 0
+    for i in range(len(adjusted_pred)):
+        if label[i] and adjusted_pred[i] and not anomaly_state:
+            anomaly_state = True
+            anomaly_count += 1
+            for j in range(i, -1, -1):
+                if not label[j]:
+                    break
+                else:
+                    if not adjusted_pred[j]:
+                        adjusted_pred[j] = True
+                        latency += 1
+        elif not label[i]:
+            anomaly_state = False
+        if anomaly_state:
+            adjusted_pred[i] = True
+    return adjusted_pred
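
Reviewer note: point adjustment, as implemented by `adjust_pred`, treats a ground-truth anomaly segment as fully detected once any point inside it is flagged; predictions outside labelled segments are left alone. The sketch below expresses the same idea per segment rather than per point. `point_adjust_segments` is a hypothetical name, and the version here is an illustration under that reading of the protocol, not a drop-in replacement.

```python
import numpy as np


def point_adjust_segments(pred, label):
    """Mark every labelled segment that contains a hit as fully predicted."""
    pred = np.asarray(pred).astype(bool)
    label = np.asarray(label).astype(bool)
    adjusted = pred.copy()
    # Pad with False so segments touching either end still have two edges.
    padded = np.concatenate(([False], label, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    # Edges alternate between a segment start (inclusive) and end (exclusive).
    for start, end in zip(edges[0::2], edges[1::2]):
        if pred[start:end].any():
            adjusted[start:end] = True
    return adjusted.astype(int)
```

Because a single hit credits the whole segment, adjusted scores are usually much higher than point-wise ones, so the two should be reported side by side rather than compared directly.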
diff --git a/common/evaluation/spot.py b/common/evaluation/spot.py
new file mode 100644
index 0000000..f9765fb
--- /dev/null
+++ b/common/evaluation/spot.py
@@ -0,0 +1,493 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Created on Mon Dec 12 10:08:16 2016
+
+@author: Alban Siffer 
+@company: Amossys
+@license: GNU GPLv3
+"""
+
+import logging
+from math import log, floor
+
+import numpy as np
+import pandas as pd
+from scipy.optimize import minimize
+
+"""
+================================= MAIN CLASS ==================================
+"""
+
+
+class SPOT:
+    """
+    This class allows to run SPOT algorithm on univariate dataset (upper-bound)
+
+    Attributes
+    ----------
+    proba : float
+        Detection level (risk), chosen by the user
+
+    extreme_quantile : float
+        current threshold (bound between normal and abnormal events)
+
+    data : numpy.array
+        stream
+
+    init_data : numpy.array
+        initial batch of observations (for the calibration/initialization step)
+
+    init_threshold : float
+        initial threshold computed during the calibration step
+
+    peaks : numpy.array
+        array of peaks (excesses above the initial threshold)
+
+    n : int
+        number of observed values
+
+    Nt : int
+        number of observed peaks
+    """
+
+    def __init__(self, q=1e-4):
+        """
+        Constructor
+
+        Parameters
+        ----------
+        q : float
+            Detection level (risk)
+
+        Returns
+        ----------
+        SPOT object
+        """
+        self.proba = q
+        self.extreme_quantile = None
+        self.data = None
+        self.init_data = None
+        self.init_threshold = None
+        self.peaks = None
+        self.n = 0
+        self.Nt = 0
+
+    def __str__(self):
+        s = ""
+        s += "Streaming Peaks-Over-Threshold Object\n"
+        s += "Detection level q = %s\n" % self.proba
+        if self.data is not None:
+            s += "Data imported : Yes\n"
+            s += "\t initialization  : %s values\n" % self.init_data.size
+            s += "\t stream : %s values\n" % self.data.size
+        else:
+            s += "Data imported : No\n"
+            return s
+
+        if self.n == 0:
+            s += "Algorithm initialized : No\n"
+        else:
+            s += "Algorithm initialized : Yes\n"
+            s += "\t initial threshold : %s\n" % self.init_threshold
+
+            r = self.n - self.init_data.size
+            if r > 0:
+                s += "Algorithm run : Yes\n"
+                s += "\t number of observations : %s (%.2f %%)\n" % (
+                    r,
+                    100 * r / self.n,
+                )
+            else:
+                s += "\t number of peaks  : %s\n" % self.Nt
+                s += "\t extreme quantile : %s\n" % self.extreme_quantile
+                s += "Algorithm run : No\n"
+        return s
+
+    def fit(self, init_data, data):
+        """
+        Import data to SPOT object
+
+        Parameters
+        ----------
+        init_data : list, numpy.array, pandas.Series, int or float
+            initial batch to calibrate the algorithm; an int takes the first
+            init_data values of data, a float in (0, 1) that fraction of data
+
+        data : list, numpy.array or pandas.Series
+            data for the run
+        """
+        if isinstance(data, list):
+            self.data = np.array(data)
+        elif isinstance(data, np.ndarray):
+            self.data = data
+        elif isinstance(data, pd.Series):
+            self.data = data.values
+        else:
+            logging.info("This data format (%s) is not supported" % type(data))
+            return
+
+        if isinstance(init_data, list):
+            self.init_data = np.array(init_data)
+        elif isinstance(init_data, np.ndarray):
+            self.init_data = init_data
+        elif isinstance(init_data, pd.Series):
+            self.init_data = init_data.values
+        elif isinstance(init_data, int):
+            self.init_data = self.data[:init_data]
+            self.data = self.data[init_data:]
+        elif isinstance(init_data, float) and 0 < init_data < 1:
+            r = int(init_data * self.data.size)
+            self.init_data = self.data[:r]
+            self.data = self.data[r:]
+        else:
+            logging.info("The initial data cannot be set")
+            return
+
+    def add(self, data):
+        """
+        This function allows to append data to the already fitted data
+
+        Parameters
+        ----------
+        data : list, numpy.array, pandas.Series
+            data to append
+        """
+        if isinstance(data, list):
+            data = np.array(data)
+        elif isinstance(data, np.ndarray):
+            data = data
+        elif isinstance(data, pd.Series):
+            data = data.values
+        else:
+            logging.info("This data format (%s) is not supported" % type(data))
+            return
+
+        self.data = np.append(self.data, data)
+        return
+
+    def initialize(self, level=0.98, min_extrema=False, verbose=False):
+        """
+        Run the calibration (initialization) step
+
+        Parameters
+        ----------
+        level : float
+            (default 0.98) Probability associated with the initial threshold t
+        min_extrema : bool
+            (default False) If True, find min extrema instead of max extrema
+        verbose : bool
+            (default False) If True, logs details about the batch
+            initialization (initial threshold, number of peaks, fitted GPD
+            parameters and extreme quantile)
+        """
+        if min_extrema:
+            self.init_data = -self.init_data
+            self.data = -self.data
+            level = 1 - level
+
+        level = level - floor(level)
+
+        n_init = self.init_data.size
+
+        S = np.sort(self.init_data)  # we sort X to get the empirical quantile
+
+        self.init_threshold = S[
+            int(level * n_init)
+        ]  # t is fixed for the whole algorithm
+        logging.info(
+            "n_init: {}, level: {}, level*n_init: {}".format(
+                n_init, level, int(level * n_init)
+            )
+        )
+
+        # initial peaks
+        self.peaks = (
+            self.init_data[self.init_data > self.init_threshold] - self.init_threshold
+        )
+        self.Nt = self.peaks.size
+        self.n = n_init
+
+        if verbose:
+            logging.info("Initial threshold : %s" % self.init_threshold)
+            logging.info("Number of peaks : %s" % self.Nt)
+            logging.info("Grimshaw maximum log-likelihood estimation ... ")
+
+        g, s, l = self._grimshaw()
+        self.extreme_quantile = self._quantile(g, s)
+
+        if verbose:
+            logging.info("[done]")
+            logging.info("\t" + chr(0x03B3) + " = " + str(g))
+            logging.info("\t" + chr(0x03C3) + " = " + str(s))
+            logging.info("\tL = " + str(l))
+            logging.info(
+                "Extreme quantile (probability = %s): %s"
+                % (self.proba, self.extreme_quantile)
+            )
+
+        return
+
+    def _rootsFinder(fun, jac, bounds, npoints, method):
+        """
+        Find possible roots of a scalar function
+
+        Parameters
+        ----------
+        fun : function
+            scalar function
+        jac : function
+            first order derivative of the function
+        bounds : tuple
+            (min,max) interval for the roots search
+        npoints : int
+            maximum number of roots to output
+        method : str
+            'regular' : regular sample of the search interval, 'random' : uniform (distribution) sample of the search interval
+
+        Returns
+        ----------
+        numpy.array
+            possible roots of the function
+        """
+        if method == "regular":
+            step = (bounds[1] - bounds[0]) / (npoints + 1)
+            X0 = np.arange(bounds[0] + step, bounds[1], step)
+        elif method == "random":
+            X0 = np.random.uniform(bounds[0], bounds[1], npoints)
+
+        def objFun(X, f, jac):
+            g = 0
+            j = np.zeros(X.shape)
+            i = 0
+            for x in X:
+                fx = f(x)
+                g = g + fx ** 2
+                j[i] = 2 * fx * jac(x)
+                i = i + 1
+            return g, j
+
+        opt = minimize(
+            lambda X: objFun(X, fun, jac),
+            X0,
+            method="L-BFGS-B",
+            jac=True,
+            bounds=[bounds] * len(X0),
+        )
+
+        X = opt.x
+        X = np.round(X, decimals=5)
+        return np.unique(X)
+
+    def _log_likelihood(Y, gamma, sigma):
+        """
+        Compute the log-likelihood for the Generalized Pareto Distribution (μ=0)
+
+        Parameters
+        ----------
+        Y : numpy.array
+            observations
+        gamma : float
+            GPD index parameter
+        sigma : float
+            GPD scale parameter (>0)
+
+        Returns
+        ----------
+        float
+            log-likelihood of the sample Y to be drawn from a GPD(γ,σ,μ=0)
+        """
+        n = Y.size
+        if gamma != 0:
+            tau = gamma / sigma
+            L = -n * log(sigma) - (1 + (1 / gamma)) * (np.log(1 + tau * Y)).sum()
+        else:
+            L = -n * (1 + log(Y.mean()))  # exponential limit, sigma = mean(Y)
+        return L
+
+    def _grimshaw(self, epsilon=1e-8, n_points=10):
+        """
+        Compute the GPD parameters estimation with the Grimshaw's trick
+
+        Parameters
+        ----------
+        epsilon : float
+            numerical margin keeping the search bounds off the singularities (default : 1e-8)
+        n_points : int
+            maximum number of candidates for maximum likelihood (default : 10)
+
+        Returns
+        ----------
+        gamma_best,sigma_best,ll_best
+            gamma estimates, sigma estimates and corresponding log-likelihood
+        """
+
+        def u(s):
+            return 1 + np.log(s).mean()
+
+        def v(s):
+            return np.mean(1 / s)
+
+        def w(Y, t):
+            s = 1 + t * Y
+            us = u(s)
+            vs = v(s)
+            return us * vs - 1
+
+        def jac_w(Y, t):
+            s = 1 + t * Y
+            us = u(s)
+            vs = v(s)
+            jac_us = (1 / t) * (1 - vs)
+            jac_vs = (1 / t) * (-vs + np.mean(1 / s ** 2))
+            return us * jac_vs + vs * jac_us
+
+        Ym = self.peaks.min()
+        YM = self.peaks.max()
+        Ymean = self.peaks.mean()
+
+        a = -1 / YM
+        if abs(a) < 2 * epsilon:
+            epsilon = abs(a) / n_points
+
+        a = a + epsilon
+        b = 2 * (Ymean - Ym) / (Ymean * Ym)
+        c = 2 * (Ymean - Ym) / (Ym ** 2)
+
+        # We look for possible roots
+        left_zeros = SPOT._rootsFinder(
+            lambda t: w(self.peaks, t),
+            lambda t: jac_w(self.peaks, t),
+            (a + epsilon, -epsilon),
+            n_points,
+            "regular",
+        )
+
+        right_zeros = SPOT._rootsFinder(
+            lambda t: w(self.peaks, t),
+            lambda t: jac_w(self.peaks, t),
+            (b, c),
+            n_points,
+            "regular",
+        )
+
+        # all the possible roots
+        zeros = np.concatenate((left_zeros, right_zeros))
+
+        # 0 is always a solution so we initialize with it
+        gamma_best = 0
+        sigma_best = Ymean
+        ll_best = SPOT._log_likelihood(self.peaks, gamma_best, sigma_best)
+
+        # we look for better candidates
+        for z in zeros:
+            gamma = u(1 + z * self.peaks) - 1
+            sigma = gamma / z
+            ll = SPOT._log_likelihood(self.peaks, gamma, sigma)
+            if ll > ll_best:
+                gamma_best = gamma
+                sigma_best = sigma
+                ll_best = ll
+
+        return gamma_best, sigma_best, ll_best
+
+    def _quantile(self, gamma, sigma):
+        """
+        Compute the quantile at level 1-q
+
+        Parameters
+        ----------
+        gamma : float
+            GPD parameter
+        sigma : float
+            GPD parameter
+
+        Returns
+        ----------
+        float
+            quantile at level 1-q for the GPD(γ,σ,μ=0)
+        """
+        r = self.n * self.proba / self.Nt
+        if gamma != 0:
+            return self.init_threshold + (sigma / gamma) * (pow(r, -gamma) - 1)
+        else:
+            return self.init_threshold - sigma * log(r)
+
+    def run(self, with_alarm=True, dynamic=True):
+        """
+        Run SPOT on the stream
+
+        Parameters
+        ----------
+        with_alarm : bool
+            (default = True) If False, SPOT will adapt the threshold assuming
+            there are no abnormal values
+        dynamic : bool
+            (default = True) If False, the GPD fit is not updated on the
+            stream; values are only compared against the initial threshold
+
+        Returns
+        ----------
+        dict
+            keys : 'thresholds' and 'alarms'
+            'thresholds' contains the extreme quantiles and 'alarms' contains
+            the indexes of the values which have triggered alarms
+        """
+        if self.n > self.init_data.size:
+            logging.warning(
+                "Warning : the algorithm seems to have already been run, you "
+                "should initialize before running again"
+            )
+            return {}
+
+        # list of the thresholds
+        th = []
+        alarm = []
+        # Loop over the stream
+        for i in range(self.data.size):
+
+            if not dynamic:
+                if self.data[i] > self.init_threshold and with_alarm:
+                    self.extreme_quantile = self.init_threshold
+                    alarm.append(i)
+            else:
+                # If the observed value exceeds the current threshold (alarm case)
+                if self.data[i] > self.extreme_quantile:
+                    # if we want to alarm, we put it in the alarm list
+                    if with_alarm:
+                        alarm.append(i)
+                    # otherwise we add it in the peaks
+                    else:
+                        self.peaks = np.append(
+                            self.peaks, self.data[i] - self.init_threshold
+                        )
+                        self.Nt += 1
+                        self.n += 1
+                        # and we update the thresholds
+
+                        g, s, l = self._grimshaw()
+                        self.extreme_quantile = self._quantile(g, s)
+
+                # case where the value exceeds the initial threshold but not the alarm ones
+                elif self.data[i] > self.init_threshold:
+                    # we add it in the peaks
+                    self.peaks = np.append(
+                        self.peaks, self.data[i] - self.init_threshold
+                    )
+                    self.Nt += 1
+                    self.n += 1
+                    # and we update the thresholds
+
+                    g, s, l = self._grimshaw()
+                    self.extreme_quantile = self._quantile(g, s)
+                else:
+                    self.n += 1
+
+            th.append(self.extreme_quantile)  # thresholds record
+
+        return {"thresholds": th, "alarms": alarm}
+
+
+"""
+============================ UPPER & LOWER BOUNDS =============================
+"""
diff --git a/common/evaluation/thresholding.py b/common/evaluation/thresholding.py
new file mode 100644
index 0000000..612d089
--- /dev/null
+++ b/common/evaluation/thresholding.py
@@ -0,0 +1,150 @@
+import logging
+import numpy as np
+import more_itertools as mit
+from .metrics import compute_binary_metrics
+from .spot import SPOT
+
+
+def pot_th(train_anomaly_score, anomaly_score, q=1e-3, level=0.99, dynamic=False):
+    """
+    Run POT method on given score.
+    :param train_anomaly_score (np.ndarray): The data to get init threshold.
+                    For `OmniAnomaly`, it should be the anomaly score of train set.
+    :param anomaly_score (np.ndarray): The data to run POT method.
+                    For `OmniAnomaly`, it should be the anomaly score of test set.
+    :param q (float): Detection level (risk)
+    :param level (float or list): Probability (or candidates tried in order) for the initial threshold t
+    :param dynamic (bool): If True, SPOT updates the threshold along the stream
+    :return float: mean of the SPOT thresholds, or a percentile of the score if POT fails
+    Method from OmniAnomaly (https://github.com/NetManAIOps/OmniAnomaly)
+    """
+    logging.info(f"Computing the threshold using POT with q={q}, level={level}...")
+    logging.info(
+        "[POT] Train score max: {}, min: {}".format(
+            train_anomaly_score.max(), train_anomaly_score.min()
+        )
+    )
+    logging.info(
+        "[POT] Test score max: {}, min: {}".format(
+            anomaly_score.max(), anomaly_score.min()
+        )
+    )
+    logging.debug(f"[POT] shapes: {train_anomaly_score.shape}, {anomaly_score.shape}")
+
+    pot_th = None
+    if not isinstance(level, list):
+        level = [level]
+    for l in level:
+        try:
+            s = SPOT(q)  # SPOT object
+            s.fit(train_anomaly_score, anomaly_score)
+            s.initialize(level=l, min_extrema=False)  # Calibration step
+            ret = s.run(dynamic=dynamic, with_alarm=False)
+            pot_th = np.mean(ret["thresholds"])
+            logging.info(f"Hit level={l}")
+            break
+        except Exception as e:
+            logging.warning(f"POT failed at level={l}: {e}")
+    if pot_th is None:
+        pot_th = np.percentile(anomaly_score, level[0] * 100)
+        logging.info(
+            "POT cannot find the threshold, use {}% percentile {}".format(
+                level[0] * 100, pot_th
+            )
+        )
+    return pot_th
+
+
+def eps_th(train_anomaly_score, reg_level=1):
+    """
+    Threshold method proposed by Hundman et al. (https://arxiv.org/abs/1802.04431)
+    Code from TelemAnom (https://github.com/khundman/telemanom)
+    """
+    logging.info("Computing the threshold with eps...")
+    e_s = train_anomaly_score
+    best_epsilon = None
+    max_score = -10000000
+    mean_e_s = np.mean(e_s)
+    sd_e_s = np.std(e_s)
+
+    for z in np.arange(2.5, 12, 0.5):
+        epsilon = mean_e_s + sd_e_s * z
+        pruned_e_s = e_s[e_s < epsilon]
+
+        i_anom = np.argwhere(e_s >= epsilon).reshape(
+            -1,
+        )
+        buffer = np.arange(1, 50)
+        i_anom = np.sort(
+            np.concatenate(
+                (
+                    i_anom,
+                    np.array([i + buffer for i in i_anom]).flatten(),
+                    np.array([i - buffer for i in i_anom]).flatten(),
+                )
+            )
+        )
+        i_anom = i_anom[(i_anom < len(e_s)) & (i_anom >= 0)]
+        i_anom = np.sort(np.unique(i_anom))
+
+        if len(i_anom) > 0:
+            groups = [list(group) for group in mit.consecutive_groups(i_anom)]
+            # E_seq = [(g[0], g[-1]) for g in groups if not g[0] == g[-1]]
+
+            mean_perc_decrease = (mean_e_s - np.mean(pruned_e_s)) / mean_e_s
+            sd_perc_decrease = (sd_e_s - np.std(pruned_e_s)) / sd_e_s
+            if reg_level == 0:
+                denom = 1
+            elif reg_level == 1:
+                denom = len(i_anom)
+            elif reg_level == 2:
+                denom = len(i_anom) ** 2
+
+            score = (mean_perc_decrease + sd_perc_decrease) / denom
+
+            if score >= max_score and len(i_anom) < (len(e_s) * 0.5):
+                max_score = score
+                best_epsilon = epsilon
+
+    if best_epsilon is None:
+        best_epsilon = np.max(e_s)
+    return best_epsilon
+
+
+def best_th(
+    anomaly_score,
+    anomaly_label,
+    target_metric="f1",
+    target_direction="max",
+    point_adjustment=False,
+):
+    logging.info("Searching for the best threshold...")
+    search_range = np.linspace(0, 1, 100)
+    search_history = []
+    if point_adjustment:
+        target_metric = target_metric + "_adjusted"
+
+    for anomaly_percent in search_range:
+        theta = np.percentile(anomaly_score, 100 * (1 - anomaly_percent))
+        pred = (anomaly_score >= theta).astype(int)
+
+        metric_dict = compute_binary_metrics(pred, anomaly_label, point_adjustment)
+        current_value = metric_dict[target_metric]
+
+        logging.debug(f"th={theta}, {target_metric}={current_value}")
+
+        search_history.append(
+            {
+                "best_value": current_value,
+                "best_theta": theta,
+                "target_metric": target_metric,
+                "target_direction": target_direction,
+            }
+        )
+
+    result = (
+        max(search_history, key=lambda x: x["best_value"])
+        if target_direction == "max"
+        else min(search_history, key=lambda x: x["best_value"])
+    )
+    return result["best_theta"]
diff --git a/common/evaluation/time_tracker.py b/common/evaluation/time_tracker.py
new file mode 100644
index 0000000..f4bb877
--- /dev/null
+++ b/common/evaluation/time_tracker.py
@@ -0,0 +1,29 @@
+import time
+
+
+class TimeTracker:
+    def __init__(self, nb_epoch=1):
+        self.train_time = 0
+        self.test_time = 0
+        self.nb_epoch = nb_epoch
+
+    def train_start(self):
+        self.s_train = time.time()
+
+    def train_end(self):
+        self.e_train = time.time()
+        self.train_time = self.e_train - self.s_train
+
+    def test_start(self):
+        self.s_test = time.time()
+
+    def test_end(self):
+        self.e_test = time.time()
+        self.test_time = self.e_test - self.s_test
+
+    def get_data(self):
+        return {
+            "train_time": self.train_time,
+            "test_time": self.test_time,
+            "nb_epoch": self.nb_epoch,
+        }
diff --git a/common/evaluation_.py b/common/evaluation_.py
new file mode 100644
index 0000000..5f42563
--- /dev/null
+++ b/common/evaluation_.py
@@ -0,0 +1,190 @@
+import os
+import sys
+
+import json
+import glob
+import hashlib
+import numpy as np
+from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
+from sklearn.cluster import AgglomerativeClustering
+from sklearn.preprocessing import MinMaxScaler
+from common.evaluation.spot import SPOT
+from common.utils import pprint
+
+
+def evaluate_all(
+    anomaly_score,
+    anomaly_label,
+    train_anomaly_score=None,
+    q=1e-3,
+    level=None,
+    verbose=True,
+):
+    # normalize anomaly_score
+    print("Normalizing anomaly scores.")
+    anomaly_score = normalize_1d(anomaly_score)
+    if train_anomaly_score is not None:
+        train_anomaly_score = normalize_1d(train_anomaly_score)
+
+    metrics = {}
+    # compute auc
+    try:
+        auc = roc_auc_score(anomaly_label, anomaly_score)
+    except ValueError:
+        auc = 0
+        print("All zero in anomaly label, set auc=0")
+    metrics["1.AUC"] = auc
+
+    # compute salience
+    # salience = compute_salience(anomaly_score, anomaly_label)
+    # metrics["2.Salience"] = salience
+
+    # iterate thresholds
+    # _, theta_iter, _, pred = iter_thresholds(
+    #     anomaly_score, anomaly_label, metric="f1", normalized=True
+    # )
+    # _, adjust_pred = point_adjustment(pred, anomaly_label)
+    # metrics_iter = compute_point2point(pred, adjust_pred, anomaly_label)
+    # metrics_iter["delay"] = compute_delay(anomaly_label, pred)
+    # metrics_iter["theta"] = theta_iter
+    # metrics["3.Iteration Based"] = metrics_iter
+
+    # # EVT needs anomaly scores on training data for initialization
+    # if train_anomaly_score is not None:
+    #     print("Finding thresholds via EVT.")
+    #     theta_evt, pred_evt = compute_th_evt(
+    #         train_anomaly_score, anomaly_score, anomaly_label, q, level
+    #     )
+    #     _, adjust_pred_evt = point_adjustment(pred_evt, anomaly_label)
+    #     metrics_evt = compute_point2point(pred_evt, adjust_pred_evt, anomaly_label)
+    #     metrics_evt["delay"] = compute_delay(anomaly_label, pred_evt)
+    #     metrics_evt["theta"] = theta_evt
+    #     metrics["4.EVT Based"] = metrics_evt
+
+    if verbose:
+        print("\n" + "-" * 20 + "\n")
+        pprint(metrics)
+
+    return metrics
+
+
+def normalize_1d(arr):
+    est = MinMaxScaler()
+    return est.fit_transform(arr.reshape(-1, 1)).reshape(-1)
+
+
+def json_pretty_dump(obj, filename):
+    with open(filename, "w") as fw:
+        json.dump(
+            obj,
+            fw,
+            sort_keys=True,
+            indent=4,
+            separators=(",", ": "),
+            ensure_ascii=False,
+        )
+
+
+def store_benchmarking_results(
+    hash_id,
+    benchmark_dir,
+    dataset,
+    entity,
+    args,
+    model_name,
+    anomaly_score,
+    anomaly_label,
+    time_tracker,
+):
+    value_store_dir = os.path.join(benchmark_dir, model_name, hash_id, dataset, entity)
+    os.makedirs(value_store_dir, exist_ok=True)
+    np.savez(os.path.join(value_store_dir, "anomaly_score"), anomaly_score)
+    np.savez(os.path.join(value_store_dir, "anomaly_label"), anomaly_label)
+
+    json_pretty_dump(time_tracker, os.path.join(value_store_dir, "time.json"))
+
+    param_store_dir = os.path.join(benchmark_dir, model_name, hash_id)
+
+    param_store = {"cmd": "python {}".format(" ".join(sys.argv))}
+    param_store.update(args)
+
+    json_pretty_dump(param_store, os.path.join(param_store_dir, "params.json"))
+    print("Store output of {} to {} done.".format(model_name, param_store_dir))
+    return os.path.join(benchmark_dir, model_name, hash_id, dataset)
+
+
+def evaluate_benchmarking_folder(
+    folder, benchmarking_dir, hash_id, dataset, model_name
+):
+    total_adj_f1 = []
+    total_train_time = []
+    total_test_time = []
+    folder_count = 0
+    for folder in glob.glob(os.path.join(folder, "*")):
+        folder_name = os.path.basename(folder)
+        print("Evaluating {}".format(folder_name))
+
+        anomaly_score = np.load(
+            os.path.join(folder, "anomaly_score.npz"), allow_pickle=True
+        )["arr_0"].item()["test"]
+
+        anomaly_score_train = np.load(
+            os.path.join(folder, "anomaly_score.npz"), allow_pickle=True
+        )["arr_0"].item()["train"]
+
+        anomaly_label = np.load(os.path.join(folder, "anomaly_label.npz"))[
+            "arr_0"
+        ].astype(int)
+        with open(os.path.join(folder, "time.json")) as fr:
+            time = json.load(fr)
+
+        best_f1, best_theta, best_adjust_pred, best_raw_pred = iter_thresholds(
+            anomaly_score, anomaly_label, metric="f1", adjustment=True
+        )
+
+        try:
+            auc = roc_auc_score(anomaly_label, anomaly_score)
+        except ValueError as e:
+            auc = 0
+            print("All zero in anomaly label, set auc=0")
+
+        metrics = {}
+        metrics_iter, metrics_evt, theta_iter, theta_evt = evaluate_all(
+            anomaly_score, anomaly_label, anomaly_score_train
+        )
+
+        total_adj_f1.append(metrics_iter["adj_f1"])
+        total_train_time.append(time["train"])
+        total_test_time.append(time["test"])
+
+        metrics["metrics_iteration"] = metrics_iter
+        metrics["metrics_iteration"]["theta"] = theta_iter
+        metrics["metrics_evt"] = metrics_evt
+        metrics["metrics_evt"]["theta"] = theta_evt
+        # metrics["train_time"] = time["train"]
+        # metrics["test_time"] = time["test"]
+
+        print(metrics)
+        json_pretty_dump(metrics, os.path.join(folder, "metrics.json"))
+        folder_count += 1
+
+    total_adj_f1 = np.array(total_adj_f1)
+    adj_f1_mean = total_adj_f1.mean()
+    adj_f1_std = total_adj_f1.std()
+
+    train_time_sum = sum(total_train_time)
+    test_time_sum = sum(total_test_time)
+
+    with open(
+        os.path.join(benchmarking_dir, f"{dataset}_{model_name}.txt"), "a+"
+    ) as fw:
+        params = " ".join(sys.argv)
+        info = f"{hash_id}\tcount:{folder_count}\t{params}\ttrain:{train_time_sum:.4f} test:{test_time_sum:.4f}\tadj f1: [{adj_f1_mean:.4f}({adj_f1_std:.4f})]\n"
+        fw.write(info)
+    print(info)
+
+
+if __name__ == "__main__":
+    anomaly_label = [0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1]
+    anomaly_score = np.random.uniform(0, 1, size=len(anomaly_label))
+    evaluate_all(anomaly_score, anomaly_label)
diff --git a/common/exp.py b/common/exp.py
new file mode 100644
index 0000000..2a6ad4d
--- /dev/null
+++ b/common/exp.py
@@ -0,0 +1,55 @@
+import logging
+import yaml
+import json
+import os
+from .utils import save_hdf5
+
+
+BENCHMARK_DIR = "./benchmark_results"
+
+
+def json_pretty_dump(obj, filename):
+    with open(filename, "w") as fw:
+        json.dump(
+            {str(k): str(v) for k, v in obj.items()},
+            fw,
+            sort_keys=True,
+            indent=4,
+            separators=(",", ": "),
+            ensure_ascii=False,
+        )
+
+
+def store_entity(
+    params,
+    entity,
+    train_anomaly_score,
+    anomaly_score,
+    anomaly_label,
+    eval_results={},
+    time_tracker={},
+):
+    exp_folder = params["model_root"]
+    entity_folder = os.path.join(exp_folder, entity)
+    os.makedirs(entity_folder, exist_ok=True)
+
+    # save params
+    with open(os.path.join(exp_folder, "params.yaml"), "w") as fw:
+        yaml.dump(params, fw)
+
+    # save results
+    json_pretty_dump(eval_results, os.path.join(entity_folder, "eval_results.json"))
+
+    # save time
+    json_pretty_dump(time_tracker, os.path.join(entity_folder, "time.json"))
+
+    # save scores
+    score_dict = {
+        "anomaly_label": anomaly_label,
+        "anomaly_score": anomaly_score,
+        "train_anomaly_score": train_anomaly_score,
+    }
+    save_hdf5(os.path.join(entity_folder, f"score_{entity}.hdf5"), score_dict)
+
+    logging.info(f"Saving results for {entity} done.")
+
diff --git a/common/utils.py b/common/utils.py
new file mode 100644
index 0000000..bad44bd
--- /dev/null
+++ b/common/utils.py
@@ -0,0 +1,197 @@
+import sys
+import numpy as np
+import json
+import logging
+import h5py
+import random
+import os
+import torch
+import tensorflow as tf
+import glob
+import yaml
+from collections import OrderedDict
+from datetime import datetime
+
+# Disable YAML anchors/aliases (e.g. &id001 / *id001) in dumped files
+yaml.Dumper.ignore_aliases = lambda *args : True
+
+DEFAULT_RANDOM_SEED = 2022
+
+
+def load_config(config_dir, experiment_id, eval_id=""):
+    params = dict()
+    model_configs = glob.glob(os.path.join(config_dir, "model_config.yaml"))
+    if not model_configs:
+        model_configs = glob.glob(os.path.join(config_dir, "model_config/*.yaml"))
+    if not model_configs:
+        raise RuntimeError("config_dir={} is not valid!".format(config_dir))
+    found_params = dict()
+    for config in model_configs:
+        with open(config, "r") as cfg:
+            config_dict = yaml.load(cfg, Loader=yaml.FullLoader)
+            if "Base" in config_dict:
+                found_params["Base"] = config_dict["Base"]
+            if experiment_id in config_dict:
+                found_params[experiment_id] = config_dict[experiment_id]
+        if len(found_params) == 2:
+            break
+    if experiment_id not in found_params:
+        raise ValueError("exp_id={} not found in config".format(experiment_id))
+    # Update base settings first so that values can be overridden when they
+    # conflict with experiment_id settings
+    params.update(found_params.get("Base", {}))
+    params.update(found_params.get(experiment_id))
+    params["exp_id"] = experiment_id
+    dataset_params = load_dataset_config(config_dir, params["dataset_id"])
+    params.update(dataset_params)
+    if "eval" not in params:
+        eval_params = load_eval_config(config_dir, params.get(eval_id, "Base"))
+        params["eval"] = eval_params
+    return params
+
+
+def load_dataset_config(config_dir, dataset_id):
+    dataset_configs = glob.glob(os.path.join(config_dir, "dataset_config.yaml"))
+    if not dataset_configs:
+        dataset_configs = glob.glob(os.path.join(config_dir, "dataset_config/*.yaml"))
+    for config in dataset_configs:
+        with open(config, "r") as cfg:
+            try:
+                config_dict = yaml.load(cfg, Loader=yaml.FullLoader)
+                if "Base" in config_dict:
+                    dataset_config = config_dict["Base"]
+                else:
+                    dataset_config = {}
+                if dataset_id in config_dict:
+                    dataset_config.update(config_dict[dataset_id])
+                    return dataset_config
+            except TypeError:
+                pass
+    raise RuntimeError("dataset_id={} is not found in config.".format(dataset_id))
+
+
+def load_eval_config(config_dir, eval_id="Base"):
+    eval_configs = glob.glob(os.path.join(config_dir, "eval_config.yaml"))
+    if not eval_configs:
+        eval_configs = glob.glob(os.path.join(config_dir, "eval_config/*.yaml"))
+    for config in eval_configs:
+        with open(config, "r") as cfg:
+            config_dict = yaml.load(cfg, Loader=yaml.FullLoader)
+            eval_config = config_dict["Base"]
+            if eval_id in config_dict:
+                eval_config.update(config_dict[eval_id])
+                return eval_config
+    raise RuntimeError("eval_id={} is not found in config.".format(eval_id))
+
+
+def set_device(gpu=-1):
+    import torch
+
+    if gpu != -1 and torch.cuda.is_available():
+        # set_logger() already restricted CUDA_VISIBLE_DEVICES to the chosen GPU,
+        # so that GPU is always visible here as index 0.
+        device = torch.device("cuda:0")
+    else:
+        device = torch.device("cpu")
+    return device
+
+
+def pprint(d, indent=0):
+    d = sorted(d.items(), key=lambda x: x[0])
+    for key, value in d:
+        print("\t" * indent + str(key))
+        if isinstance(value, dict):
+            pprint(value, indent + 1)
+        else:
+            print("\t" * (indent + 1) + str(round(value, 4)))
+
+
+def load_hdf5(infile):
+    logging.info("Loading hdf5 from {}".format(infile))
+    with h5py.File(infile, "r") as f:
+        return {key: f[key][:] for key in list(f.keys())}
+
+
+def save_hdf5(outfile, arr_dict):
+    logging.info("Saving hdf5 to {}".format(outfile))
+    with h5py.File(outfile, "w") as f:
+        for key in arr_dict.keys():
+            f.create_dataset(key, data=arr_dict[key])
+
+
+def print_to_json(data, sort_keys=True):
+    new_data = {k: str(v) for k, v in data.items()}
+    if sort_keys:
+        new_data = OrderedDict(sorted(new_data.items(), key=lambda x: x[0]))
+    return json.dumps(new_data, indent=4)
+
+
+def load_json(infile):
+    with open(infile, "r") as fr:
+        return json.load(fr)
+
+
+def update_from_nni_params(params, nni_params):
+    if nni_params:
+        params.update(nni_params)
+    return params
+
+
+def seed_basic(seed=DEFAULT_RANDOM_SEED):
+    random.seed(seed)
+    os.environ["PYTHONHASHSEED"] = str(seed)
+    np.random.seed(seed)
+
+
+def seed_tf(seed=DEFAULT_RANDOM_SEED):
+    try:
+        tf.random.set_seed(seed)
+    except AttributeError:  # fall back to the TF 1.x API
+        tf.set_random_seed(seed)
+
+
+def seed_torch(seed=DEFAULT_RANDOM_SEED):
+    torch.manual_seed(seed)
+    torch.cuda.manual_seed(seed)
+    torch.backends.cudnn.deterministic = True
+    torch.backends.cudnn.benchmark = False
+
+
+def seed_everything(seed=DEFAULT_RANDOM_SEED):
+    seed_basic(seed)
+    seed_tf(seed)
+    seed_torch(seed)
+
+
+def set_logger(params, args, log_file=None):
+    if log_file is None:
+        log_dir = os.path.join(
+            params["model_root"],
+            params["dataset_id"],
+            params["model_id"],
+            params["exp_id"],
+        )
+        log_file = os.path.join(log_dir, "execution.log")
+    log_dir = os.path.dirname(log_file)
+    os.makedirs(log_dir, exist_ok=True)
+    params["model_root"] = log_dir
+    params["uptime"] = datetime.now().strftime("%Y%m%d-%H%M%S")
+
+    if "gpu" in args:
+        params["device"] = args["gpu"]
+    # Clear existing root handlers; otherwise basicConfig() is a no-op and the file gets no logs.
+    for handler in logging.root.handlers[:]:
+        logging.root.removeHandler(handler)
+
+    logging.basicConfig(
+        level=logging.INFO,
+        format="%(asctime)s P%(process)d %(levelname)s %(message)s",
+        handlers=[logging.FileHandler(log_file, mode="w"), logging.StreamHandler()],
+    )
+
+    if params.get("device", -1) != -1:
+        os.environ["CUDA_VISIBLE_DEVICES"] = "{}".format(params["device"])
+        logging.info("Using device: cuda: {}".format(params["device"]))
+    else:
+        os.environ["CUDA_VISIBLE_DEVICES"] = ""
+        logging.info("Using device: cpu.")
diff --git a/networks/InterFusion/__init__.py b/networks/InterFusion/__init__.py
new file mode 100644
index 0000000..88cd447
--- /dev/null
+++ b/networks/InterFusion/__init__.py
@@ -0,0 +1 @@
+from .wrapper import *
diff --git a/networks/InterFusion/algorithm/InterFusion.py b/networks/InterFusion/algorithm/InterFusion.py
new file mode 100644
index 0000000..da34c7c
--- /dev/null
+++ b/networks/InterFusion/algorithm/InterFusion.py
@@ -0,0 +1,634 @@
+from enum import Enum
+from typing import Optional, List
+
+import logging
+import tensorflow as tf
+from tensorflow.contrib.rnn import static_rnn, static_bidirectional_rnn
+from tensorflow.contrib.framework import arg_scope
+import tfsnippet as spt
+from tfsnippet.bayes import BayesianNet
+from tfsnippet.utils import instance_reuse, VarScopeObject, reopen_variable_scope
+from tfsnippet.distributions import FlowDistribution, Normal
+from tfsnippet.layers import l2_regularizer
+
+import mltk
+from .recurrent_distribution import RecurrentDistribution
+from .real_nvp import dense_real_nvp
+from .conv1d_ import conv1d, deconv1d
+
+
+class RNNCellType(str, Enum):
+    GRU = "GRU"
+    LSTM = "LSTM"
+    Basic = "Basic"
+
+
+class ModelConfig(mltk.Config):
+    x_dim: int = -1
+    z_dim: int = 3
+    u_dim: int = 1
+    window_length = 100
+    output_shape: List[int] = [25, 25, 50, 50, 100]
+    z2_dim: int = 13
+    l2_reg = 0.0001
+    posterior_flow_type: Optional[str] = mltk.config_field(
+        choices=["rnvp", "nf"], default="rnvp"
+    )
+    # can be 'rnvp' for RealNVP, 'nf' for planarNF, None for not using posterior flow.
+    posterior_flow_layers = 20
+    rnn_cell: RNNCellType = RNNCellType.GRU  # can be 'GRU', 'LSTM' or 'Basic'
+    rnn_hidden_units = 500
+    use_leaky_relu = False
+    use_bidirectional_rnn = False  # whether to use bidirectional rnn or not
+    use_self_attention = (
+        False  # whether to apply self-attention to the hidden states before inferring qz.
+    )
+    unified_px_logstd = False
+    dropout_feature = False  # dropout on the features in arnn
+    logstd_min = -5.0
+    logstd_max = 2.0
+    use_prior_flow = (
+        False  # If True, use RealNVP prior flow to enhance the representation of p(z).
+    )
+    prior_flow_layers = 20
+
+    connect_qz = True
+    connect_pz = True
+
+
+# The final InterFusion model.
+class MTSAD(VarScopeObject):
+    def __init__(self, config: ModelConfig, name=None, scope=None):
+        self.config = config
+        super(MTSAD, self).__init__(name=name, scope=scope)
+
+        with reopen_variable_scope(self.variable_scope):
+            if self.config.rnn_cell == RNNCellType.Basic:
+                self.d_fw_cell = tf.nn.rnn_cell.BasicRNNCell(
+                    self.config.rnn_hidden_units, name="d_fw_cell"
+                )
+                self.a_fw_cell = tf.nn.rnn_cell.BasicRNNCell(
+                    self.config.rnn_hidden_units, name="a_fw_cell"
+                )
+                if self.config.use_bidirectional_rnn:
+                    self.d_bw_cell = tf.nn.rnn_cell.BasicRNNCell(
+                        self.config.rnn_hidden_units, name="d_bw_cell"
+                    )
+                    self.a_bw_cell = tf.nn.rnn_cell.BasicRNNCell(
+                        self.config.rnn_hidden_units, name="a_bw_cell"
+                    )
+            elif self.config.rnn_cell == RNNCellType.LSTM:
+                self.d_fw_cell = tf.nn.rnn_cell.LSTMCell(
+                    self.config.rnn_hidden_units, name="d_fw_cell"
+                )
+                self.a_fw_cell = tf.nn.rnn_cell.LSTMCell(
+                    self.config.rnn_hidden_units, name="a_fw_cell"
+                )
+                if self.config.use_bidirectional_rnn:
+                    self.d_bw_cell = tf.nn.rnn_cell.LSTMCell(
+                        self.config.rnn_hidden_units, name="d_bw_cell"
+                    )
+                    self.a_bw_cell = tf.nn.rnn_cell.LSTMCell(
+                        self.config.rnn_hidden_units, name="a_bw_cell"
+                    )
+            elif self.config.rnn_cell == RNNCellType.GRU:
+                self.d_fw_cell = tf.nn.rnn_cell.GRUCell(
+                    self.config.rnn_hidden_units, name="d_fw_cell"
+                )
+                self.a_fw_cell = tf.nn.rnn_cell.GRUCell(
+                    self.config.rnn_hidden_units, name="a_fw_cell"
+                )
+                if self.config.use_bidirectional_rnn:
+                    self.d_bw_cell = tf.nn.rnn_cell.GRUCell(
+                        self.config.rnn_hidden_units, name="d_bw_cell"
+                    )
+                    self.a_bw_cell = tf.nn.rnn_cell.GRUCell(
+                        self.config.rnn_hidden_units, name="a_bw_cell"
+                    )
+            else:
+                raise ValueError("rnn cell must be one of GRU, LSTM or Basic.")
+
+            if self.config.posterior_flow_type == "nf":
+                self.posterior_flow = spt.layers.planar_normalizing_flows(
+                    n_layers=self.config.posterior_flow_layers, scope="posterior_flow"
+                )
+            elif self.config.posterior_flow_type == "rnvp":
+                self.posterior_flow = dense_real_nvp(
+                    flow_depth=self.config.posterior_flow_layers,
+                    activation=tf.nn.leaky_relu
+                    if self.config.use_leaky_relu
+                    else tf.nn.relu,
+                    kernel_regularizer=l2_regularizer(self.config.l2_reg),
+                    scope="posterior_flow",
+                )
+            else:
+                self.posterior_flow = None
+
+            if self.config.use_prior_flow:
+                self.prior_flow = dense_real_nvp(
+                    flow_depth=self.config.prior_flow_layers,
+                    activation=tf.nn.leaky_relu
+                    if self.config.use_leaky_relu
+                    else tf.nn.relu,
+                    kernel_regularizer=l2_regularizer(self.config.l2_reg),
+                    is_prior_flow=True,
+                    scope="prior_flow",
+                )
+            else:
+                self.prior_flow = None
+
+    def _my_rnn_net(
+        self,
+        x,
+        window_length,
+        fw_cell,
+        bw_cell=None,
+        time_axis=1,
+        use_bidirectional_rnn=False,
+    ):
+        """
+        Get the base rnn model.
+        :param x: The rnn input.
+        :param window_length: The window length of input along time axis.
+        :param fw_cell: Forward rnn cell.
+        :param bw_cell: Optional. Backward rnn cell, only used when use_bidirectional_rnn=True.
+        :param time_axis: The time axis of input x. Default 1.
+        :param use_bidirectional_rnn: Whether to use a bidirectional rnn. Default False.
+        :return: Tensor (batch_size, window_length, rnn_hidden_units). The output of rnn.
+        """
+
+        x = tf.unstack(value=x, num=window_length, axis=time_axis)
+
+        if use_bidirectional_rnn:
+            outputs, _, _ = static_bidirectional_rnn(
+                fw_cell, bw_cell, x, dtype=tf.float32
+            )
+        else:
+            outputs, _ = static_rnn(fw_cell, x, dtype=tf.float32)
+
+        outputs = tf.stack(
+            outputs, axis=time_axis
+        )  # (batch_size, window_length, rnn_hidden_units)
+        return outputs
+
+    @instance_reuse
+    def a_rnn_net(
+        self,
+        x,
+        window_length,
+        time_axis=1,
+        use_bidirectional_rnn=False,
+        use_self_attention=False,
+        is_training=False,
+    ):
+        """
+        Reverse rnn network a: runs over the reversed sequence to capture future information for the q-net.
+        """
+
+        def dropout_fn(input):
+            return tf.layers.dropout(input, rate=0.5, training=is_training)
+
+        flag = False
+        if len(x.shape) == 4:  # (n_samples, batch_size, window_length, x_dim)
+            x, s1, s2 = spt.ops.flatten_to_ndims(x, 3)
+            flag = True
+        elif len(x.shape) != 3:
+            logging.error("rnn input must be 3-D or 4-D, got shape %s.", x.shape)
+
+        # reverse the input sequence
+        reversed_x = tf.reverse(x, axis=[time_axis])
+
+        if use_bidirectional_rnn:
+            reversed_outputs = self._my_rnn_net(
+                x=reversed_x,
+                window_length=window_length,
+                fw_cell=self.a_fw_cell,
+                bw_cell=self.a_bw_cell,
+                time_axis=time_axis,
+                use_bidirectional_rnn=use_bidirectional_rnn,
+            )
+        else:
+            reversed_outputs = self._my_rnn_net(
+                x=reversed_x,
+                window_length=window_length,
+                fw_cell=self.a_fw_cell,
+                time_axis=time_axis,
+                use_bidirectional_rnn=use_bidirectional_rnn,
+            )
+
+        outputs = tf.reverse(reversed_outputs, axis=[time_axis])
+
+        # self attention
+        if use_self_attention:
+            outputs1 = spt.layers.dense(
+                outputs,
+                500,
+                activation_fn=tf.nn.tanh,
+                use_bias=True,
+                scope="arnn_attention_dense1",
+            )
+            outputs1 = tf.nn.softmax(
+                spt.layers.dense(
+                    outputs1,
+                    window_length,
+                    use_bias=False,
+                    scope="arnn_attention_dense2",
+                ),
+                axis=1,
+            )
+            M_t = tf.matmul(tf.transpose(outputs, perm=[0, 2, 1]), outputs1)
+            outputs = tf.transpose(M_t, perm=[0, 2, 1])
+
+        # feature extraction layers
+        outputs = spt.layers.dense(
+            outputs,
+            units=500,
+            activation_fn=tf.nn.leaky_relu
+            if self.config.use_leaky_relu
+            else tf.nn.relu,
+            kernel_regularizer=l2_regularizer(self.config.l2_reg),
+            scope="arnn_feature_dense1",
+        )
+        if self.config.dropout_feature:
+            outputs = dropout_fn(outputs)
+        outputs = spt.layers.dense(
+            outputs,
+            units=500,
+            activation_fn=tf.nn.leaky_relu
+            if self.config.use_leaky_relu
+            else tf.nn.relu,
+            kernel_regularizer=l2_regularizer(self.config.l2_reg),
+            scope="arnn_feature_dense2",
+        )
+        if self.config.dropout_feature:
+            outputs = dropout_fn(outputs)
+
+        if flag:
+            outputs = spt.ops.unflatten_from_ndims(outputs, s1, s2)
+
+        return outputs
+
+    @instance_reuse
+    def qz_mean_layer(self, x):
+        return spt.layers.dense(x, units=self.config.z_dim, scope="qz_mean")
+
+    @instance_reuse
+    def qz_logstd_layer(self, x):
+        return tf.clip_by_value(
+            spt.layers.dense(x, units=self.config.z_dim, scope="qz_logstd"),
+            clip_value_min=self.config.logstd_min,
+            clip_value_max=self.config.logstd_max,
+        )
+
+    @instance_reuse
+    def pz_mean_layer(self, x):
+        return spt.layers.dense(x, units=self.config.z_dim, scope="pz_mean")
+
+    @instance_reuse
+    def pz_logstd_layer(self, x):
+        return tf.clip_by_value(
+            spt.layers.dense(x, units=self.config.z_dim, scope="pz_logstd"),
+            clip_value_min=self.config.logstd_min,
+            clip_value_max=self.config.logstd_max,
+        )
+
+    @instance_reuse
+    def hz2_deconv(self, z2):
+        with arg_scope(
+            [deconv1d],
+            kernel_size=5,
+            activation_fn=tf.nn.leaky_relu
+            if self.config.use_leaky_relu
+            else tf.nn.relu,
+            kernel_regularizer=l2_regularizer(self.config.l2_reg),
+        ):
+            h_z = deconv1d(
+                z2,
+                out_channels=self.config.x_dim,
+                output_shape=self.config.output_shape[0],
+                strides=2,
+            )
+            h_z = deconv1d(
+                h_z,
+                out_channels=self.config.x_dim,
+                output_shape=self.config.output_shape[1],
+                strides=1,
+            )
+            h_z = deconv1d(
+                h_z,
+                out_channels=self.config.x_dim,
+                output_shape=self.config.output_shape[2],
+                strides=2,
+            )
+            h_z = deconv1d(
+                h_z,
+                out_channels=self.config.x_dim,
+                output_shape=self.config.output_shape[3],
+                strides=1,
+            )
+            h_z2 = deconv1d(
+                h_z,
+                out_channels=self.config.x_dim,
+                output_shape=self.config.output_shape[4],
+                strides=2,
+            )
+        return h_z2
+
+    @instance_reuse
+    def q_net(self, x, observed=None, u=None, n_z=None, is_training=False):
+        # vs.name = self.variable_scope.name + "/q_net"
+        logging.info("q_net builder: %r", locals())
+
+        net = BayesianNet(observed=observed)
+
+        def dropout_fn(input):
+            return tf.layers.dropout(input, rate=0.5, training=is_training)
+
+        # use the pretrained z2, which compresses x along the time dimension
+        qz2_mean, qz2_logstd = self.h_for_qz(x, is_training=is_training)
+
+        qz2_distribution = Normal(mean=qz2_mean, logstd=qz2_logstd)
+
+        qz2_distribution = qz2_distribution.batch_ndims_to_value(2)
+
+        z2 = net.add("z2", qz2_distribution, n_samples=n_z, is_reparameterized=True)
+
+        # d_{1:t} from deconv
+        h_z = self.h_for_px(z2)
+
+        # a_{1:t}, (batch_size, window_length, dense_hidden_units)
+        arnn_out = self.a_rnn_net(
+            h_z,
+            window_length=self.config.window_length,
+            use_bidirectional_rnn=self.config.use_bidirectional_rnn,
+            use_self_attention=self.config.use_self_attention,
+            is_training=is_training,
+        )
+
+        if self.config.connect_qz:
+            qz_distribution = RecurrentDistribution(
+                arnn_out,
+                mean_layer=self.qz_mean_layer,
+                logstd_layer=self.qz_logstd_layer,
+                z_dim=self.config.z_dim,
+                window_length=self.config.window_length,
+            )
+        else:
+            qz_mean = spt.layers.dense(
+                arnn_out, units=self.config.z_dim, scope="qz1_mean"
+            )
+            qz_logstd = tf.clip_by_value(
+                spt.layers.dense(arnn_out, units=self.config.z_dim, scope="qz1_logstd"),
+                clip_value_min=self.config.logstd_min,
+                clip_value_max=self.config.logstd_max,
+            )
+            qz_distribution = Normal(mean=qz_mean, logstd=qz_logstd)
+
+        if self.posterior_flow is not None:
+            qz_distribution = FlowDistribution(
+                distribution=qz_distribution, flow=self.posterior_flow
+            ).batch_ndims_to_value(1)
+        else:
+            qz_distribution = qz_distribution.batch_ndims_to_value(2)
+
+        z1 = net.add("z1", qz_distribution, is_reparameterized=True)
+
+        return net
+
+    @instance_reuse
+    def p_net(self, observed=None, u=None, n_z=None, is_training=False):
+        logging.info("p_net builder: %r", locals())
+
+        net = BayesianNet(observed=observed)
+
+        pz2_distribution = Normal(
+            mean=tf.zeros([self.config.z2_dim, self.config.x_dim]),
+            logstd=tf.zeros([self.config.z2_dim, self.config.x_dim]),
+        ).batch_ndims_to_value(2)
+
+        z2 = net.add("z2", pz2_distribution, n_samples=n_z, is_reparameterized=True)
+
+        # e_{1:t} from deconv, shared params
+        h_z2 = self.h_for_px(z2)
+
+        if self.config.connect_pz:
+            pz_distribution = RecurrentDistribution(
+                h_z2,
+                mean_layer=self.pz_mean_layer,
+                logstd_layer=self.pz_logstd_layer,
+                z_dim=self.config.z_dim,
+                window_length=self.config.window_length,
+            )
+        else:
+            # non-recurrent pz
+            pz_mean = spt.layers.dense(h_z2, units=self.config.z_dim, scope="pz_mean")
+            pz_logstd = tf.clip_by_value(
+                spt.layers.dense(h_z2, units=self.config.z_dim, scope="pz_logstd"),
+                clip_value_min=self.config.logstd_min,
+                clip_value_max=self.config.logstd_max,
+            )
+            pz_distribution = Normal(mean=pz_mean, logstd=pz_logstd)
+
+        if self.prior_flow is not None:
+            pz_distribution = FlowDistribution(
+                distribution=pz_distribution, flow=self.prior_flow
+            ).batch_ndims_to_value(1)
+        else:
+            pz_distribution = pz_distribution.batch_ndims_to_value(2)
+
+        z1 = net.add("z1", pz_distribution, is_reparameterized=True)
+
+        h_z1 = spt.layers.dense(z1, units=self.config.x_dim)
+
+        h_z = spt.ops.broadcast_concat(h_z1, h_z2, axis=-1)
+
+        h_z = spt.layers.dense(
+            h_z,
+            units=500,
+            activation_fn=tf.nn.leaky_relu
+            if self.config.use_leaky_relu
+            else tf.nn.relu,
+            kernel_regularizer=l2_regularizer(self.config.l2_reg),
+            scope="feature_dense1",
+        )
+
+        h_z = spt.layers.dense(
+            h_z,
+            units=500,
+            activation_fn=tf.nn.leaky_relu
+            if self.config.use_leaky_relu
+            else tf.nn.relu,
+            kernel_regularizer=l2_regularizer(self.config.l2_reg),
+            scope="feature_dense2",
+        )
+
+        x_mean = spt.layers.dense(h_z, units=self.config.x_dim, scope="x_mean")
+        if self.config.unified_px_logstd:
+            x_logstd = tf.clip_by_value(
+                tf.get_variable(
+                    name="x_logstd",
+                    shape=(),
+                    trainable=True,
+                    dtype=tf.float32,
+                    initializer=tf.constant_initializer(-1.0, dtype=tf.float32),
+                ),
+                clip_value_min=self.config.logstd_min,
+                clip_value_max=self.config.logstd_max,
+            )
+        else:
+            x_logstd = tf.clip_by_value(
+                spt.layers.dense(h_z, units=self.config.x_dim, scope="x_logstd"),
+                clip_value_min=self.config.logstd_min,
+                clip_value_max=self.config.logstd_max,
+            )
+
+        x = net.add(
+            "x",
+            Normal(mean=x_mean, logstd=x_logstd).batch_ndims_to_value(2),
+            is_reparameterized=True,
+        )
+
+        return net
+
+    def reconstruct(self, x, u, mask, n_z=None):
+        with tf.name_scope("model.reconstruct"):
+            qnet = self.q_net(x=x, u=u, n_z=n_z)
+            pnet = self.p_net(observed={"z1": qnet["z1"], "z2": qnet["z2"]}, u=u)
+        return pnet["x"]
+
+    def get_score(self, x_embed, x_eval, u, n_z=None):
+        with tf.name_scope("model.get_score"):
+            qnet = self.q_net(x=x_embed, u=u, n_z=n_z)
+            pnet = self.p_net(observed={"z1": qnet["z1"], "z2": qnet["z2"]}, u=u)
+            score = pnet["x"].distribution.base_distribution.log_prob(x_eval)
+            recons_mean = pnet["x"].distribution.base_distribution.mean
+            recons_std = pnet["x"].distribution.base_distribution.std
+            if n_z is not None:
+                score = tf.reduce_mean(score, axis=0)
+                recons_mean = tf.reduce_mean(recons_mean, axis=0)
+                recons_std = tf.reduce_mean(recons_std, axis=0)
+        return score, recons_mean, recons_std
+
+    @instance_reuse
+    def h_for_qz(self, x, is_training=False):
+        with arg_scope(
+            [conv1d],
+            kernel_size=5,
+            activation_fn=tf.nn.leaky_relu
+            if self.config.use_leaky_relu
+            else tf.nn.relu,
+            kernel_regularizer=l2_regularizer(self.config.l2_reg),
+        ):
+            h_x = conv1d(x, out_channels=self.config.x_dim, strides=2)  # 50
+            h_x = conv1d(h_x, out_channels=self.config.x_dim)
+            h_x = conv1d(h_x, out_channels=self.config.x_dim, strides=2)  # 25
+            h_x = conv1d(h_x, out_channels=self.config.x_dim)
+            h_x = conv1d(h_x, out_channels=self.config.x_dim, strides=2)  # 13
+
+        qz_mean = conv1d(h_x, kernel_size=1, out_channels=self.config.x_dim)
+        qz_logstd = conv1d(h_x, kernel_size=1, out_channels=self.config.x_dim)
+        qz_logstd = tf.clip_by_value(
+            qz_logstd,
+            clip_value_min=self.config.logstd_min,
+            clip_value_max=self.config.logstd_max,
+        )
+        return qz_mean, qz_logstd
+
+    @instance_reuse
+    def h_for_px(self, z):
+        with arg_scope(
+            [deconv1d],
+            kernel_size=5,
+            activation_fn=tf.nn.leaky_relu
+            if self.config.use_leaky_relu
+            else tf.nn.relu,
+            kernel_regularizer=l2_regularizer(self.config.l2_reg),
+        ):
+            h_z = deconv1d(
+                z,
+                out_channels=self.config.x_dim,
+                output_shape=self.config.output_shape[0],
+                strides=2,
+            )
+            h_z = deconv1d(
+                h_z,
+                out_channels=self.config.x_dim,
+                output_shape=self.config.output_shape[1],
+                strides=1,
+            )
+            h_z = deconv1d(
+                h_z,
+                out_channels=self.config.x_dim,
+                output_shape=self.config.output_shape[2],
+                strides=2,
+            )
+            h_z = deconv1d(
+                h_z,
+                out_channels=self.config.x_dim,
+                output_shape=self.config.output_shape[3],
+                strides=1,
+            )
+            h_z = deconv1d(
+                h_z,
+                out_channels=self.config.x_dim,
+                output_shape=self.config.output_shape[4],
+                strides=2,
+            )
+        return h_z
+
+    @instance_reuse
+    def pretrain_q_net(self, x, observed=None, n_z=None, is_training=False):
+        # vs.name = self.variable_scope.name + "/q_net"
+        logging.info("pretrain_q_net builder: %r", locals())
+
+        net = BayesianNet(observed=observed)
+
+        def dropout_fn(input):
+            return tf.layers.dropout(input, rate=0.5, training=is_training)
+
+        qz_mean, qz_logstd = self.h_for_qz(x, is_training=is_training)
+
+        qz_distribution = Normal(mean=qz_mean, logstd=qz_logstd)
+
+        qz_distribution = qz_distribution.batch_ndims_to_value(2)
+
+        z = net.add("z", qz_distribution, n_samples=n_z, is_reparameterized=True)
+
+        return net
+
+    @instance_reuse
+    def pretrain_p_net(self, observed=None, n_z=None, is_training=False):
+        logging.info("pretrain_p_net builder: %r", locals())
+
+        net = BayesianNet(observed=observed)
+
+        pz_distribution = Normal(
+            mean=tf.zeros([self.config.z2_dim, self.config.x_dim]),
+            logstd=tf.zeros([self.config.z2_dim, self.config.x_dim]),
+        )
+
+        pz_distribution = pz_distribution.batch_ndims_to_value(2)
+
+        z = net.add("z", pz_distribution, n_samples=n_z, is_reparameterized=True)
+
+        h_z = self.h_for_px(z)
+
+        px_mean = conv1d(
+            h_z, kernel_size=1, out_channels=self.config.x_dim, scope="pre_px_mean"
+        )
+        px_logstd = conv1d(
+            h_z, kernel_size=1, out_channels=self.config.x_dim, scope="pre_px_logstd"
+        )
+        px_logstd = tf.clip_by_value(
+            px_logstd,
+            clip_value_min=self.config.logstd_min,
+            clip_value_max=self.config.logstd_max,
+        )
+
+        x = net.add(
+            "x",
+            Normal(mean=px_mean, logstd=px_logstd).batch_ndims_to_value(2),
+            is_reparameterized=True,
+        )
+
+        return net
diff --git a/networks/InterFusion/algorithm/InterFusion_swat.py b/networks/InterFusion/algorithm/InterFusion_swat.py
new file mode 100644
index 0000000..db6e925
--- /dev/null
+++ b/networks/InterFusion/algorithm/InterFusion_swat.py
@@ -0,0 +1,407 @@
+from enum import Enum
+from typing import Optional, List
+
+import logging
+import tensorflow as tf
+from tensorflow.contrib.rnn import static_rnn, static_bidirectional_rnn
+from tensorflow.contrib.framework import arg_scope
+import tfsnippet as spt
+from tfsnippet.bayes import BayesianNet
+from tfsnippet.utils import (instance_reuse,
+                             VarScopeObject,
+                             reopen_variable_scope)
+from tfsnippet.distributions import FlowDistribution, Normal
+from tfsnippet.layers import l2_regularizer
+
+import mltk
+from .recurrent_distribution import RecurrentDistribution
+from .real_nvp import dense_real_nvp
+from .conv1d_ import conv1d, deconv1d
+
+
+class RNNCellType(str, Enum):
+    GRU = 'GRU'
+    LSTM = 'LSTM'
+    Basic = 'Basic'
+
+
+class ModelConfig(mltk.Config):
+    x_dim: int = -1
+    z_dim: int = 3
+    u_dim: int = 1
+    window_length = 30
+    output_shape: List[int] = [15, 15, 30]
+    z2_dim: int = 8
+    l2_reg = 0.0001
+    posterior_flow_type: Optional[str] = mltk.config_field(choices=['rnvp', 'nf'], default='rnvp')
+    # can be 'rnvp' for RealNVP, 'nf' for planarNF, None for not using posterior flow.
+    posterior_flow_layers = 20
+    rnn_cell: RNNCellType = RNNCellType.GRU  # can be 'GRU', 'LSTM' or 'Basic'
+    rnn_hidden_units = 500
+    use_leaky_relu = False
+    use_bidirectional_rnn = False       # whether to use bidirectional rnn or not
+    use_self_attention = False          # whether to apply self-attention to the hidden states before inferring qz.
+    unified_px_logstd = False
+    dropout_feature = False             # dropout on the features in arnn
+    logstd_min = -5.
+    logstd_max = 2.
+    use_prior_flow = False              # If True, use RealNVP prior flow to enhance the representation of p(z).
+    prior_flow_layers = 20
+
+    connect_qz = True
+    connect_pz = True
+
+
+# InterFusion model for SWaT & WADI (differs in the number of layers)
+class MTSAD_SWAT(VarScopeObject):
+
+    def __init__(self, config: ModelConfig, name=None, scope=None):
+        self.config = config
+        super(MTSAD_SWAT, self).__init__(name=name, scope=scope)
+
+        with reopen_variable_scope(self.variable_scope):
+            if self.config.rnn_cell == RNNCellType.Basic:
+                self.d_fw_cell = tf.nn.rnn_cell.BasicRNNCell(self.config.rnn_hidden_units, name='d_fw_cell')
+                self.a_fw_cell = tf.nn.rnn_cell.BasicRNNCell(self.config.rnn_hidden_units, name='a_fw_cell')
+                if self.config.use_bidirectional_rnn:
+                    self.d_bw_cell = tf.nn.rnn_cell.BasicRNNCell(self.config.rnn_hidden_units, name='d_bw_cell')
+                    self.a_bw_cell = tf.nn.rnn_cell.BasicRNNCell(self.config.rnn_hidden_units, name='a_bw_cell')
+            elif self.config.rnn_cell == RNNCellType.LSTM:
+                self.d_fw_cell = tf.nn.rnn_cell.LSTMCell(self.config.rnn_hidden_units, name='d_fw_cell')
+                self.a_fw_cell = tf.nn.rnn_cell.LSTMCell(self.config.rnn_hidden_units, name='a_fw_cell')
+                if self.config.use_bidirectional_rnn:
+                    self.d_bw_cell = tf.nn.rnn_cell.LSTMCell(self.config.rnn_hidden_units, name='d_bw_cell')
+                    self.a_bw_cell = tf.nn.rnn_cell.LSTMCell(self.config.rnn_hidden_units, name='a_bw_cell')
+            elif self.config.rnn_cell == RNNCellType.GRU:
+                self.d_fw_cell = tf.nn.rnn_cell.GRUCell(self.config.rnn_hidden_units, name='d_fw_cell')
+                self.a_fw_cell = tf.nn.rnn_cell.GRUCell(self.config.rnn_hidden_units, name='a_fw_cell')
+                if self.config.use_bidirectional_rnn:
+                    self.d_bw_cell = tf.nn.rnn_cell.GRUCell(self.config.rnn_hidden_units, name='d_bw_cell')
+                    self.a_bw_cell = tf.nn.rnn_cell.GRUCell(self.config.rnn_hidden_units, name='a_bw_cell')
+            else:
+                raise ValueError('rnn cell must be one of GRU, LSTM or Basic.')
+
+            if self.config.posterior_flow_type == 'nf':
+                self.posterior_flow = spt.layers.planar_normalizing_flows(n_layers=self.config.posterior_flow_layers,
+                                                                          scope='posterior_flow')
+            elif self.config.posterior_flow_type == 'rnvp':
+                self.posterior_flow = dense_real_nvp(flow_depth=self.config.posterior_flow_layers,
+                                                     activation=tf.nn.leaky_relu if self.config.use_leaky_relu else tf.nn.relu,
+                                                     kernel_regularizer=l2_regularizer(self.config.l2_reg),
+                                                     scope='posterior_flow')
+            else:
+                self.posterior_flow = None
+
+            if self.config.use_prior_flow:
+                self.prior_flow = dense_real_nvp(flow_depth=self.config.prior_flow_layers,
+                                                 activation=tf.nn.leaky_relu if self.config.use_leaky_relu else tf.nn.relu,
+                                                 kernel_regularizer=l2_regularizer(self.config.l2_reg),
+                                                 is_prior_flow=True,
+                                                 scope='prior_flow')
+            else:
+                self.prior_flow = None
+
+    def _my_rnn_net(self, x, window_length, fw_cell, bw_cell=None,
+                    time_axis=1, use_bidirectional_rnn=False):
+        """
+        Get the base rnn model for d-net and a-net.
+        :param x: The rnn input.
+        :param window_length: The window length of input along time axis.
+        :param fw_cell: Forward rnn cell.
+        :param bw_cell: Optional. Backward rnn cell, only used when use_bidirectional_rnn=True.
+        :param time_axis: Which is the time axis in input x, default 1.
+        :param use_bidirectional_rnn: Whether or not to use a bidirectional rnn. Default False.
+        :return: Tensor (batch_size, window_length, rnn_hidden_units). The output of rnn.
+        """
+
+        x = tf.unstack(value=x, num=window_length, axis=time_axis)
+
+        if use_bidirectional_rnn:
+            outputs, _, _ = static_bidirectional_rnn(fw_cell, bw_cell, x, dtype=tf.float32)
+        else:
+            outputs, _ = static_rnn(fw_cell, x, dtype=tf.float32)
+
+        outputs = tf.stack(outputs, axis=time_axis)     # (batch_size, window_length, rnn_hidden_units)
+        return outputs
+
+    @instance_reuse
+    def a_rnn_net(self, x, window_length, time_axis=1,
+                  use_bidirectional_rnn=False, use_self_attention=False, is_training=False):
+        """
+        Reverse rnn network a, which captures the future information in q_net.
+        """
+        def dropout_fn(input):
+            return tf.layers.dropout(input, rate=.5, training=is_training)
+
+        flag = False
+        if len(x.shape) == 4:               # (n_samples, batch_size, window_length, x_dim)
+            x, s1, s2 = spt.ops.flatten_to_ndims(x, 3)
+            flag = True
+        elif len(x.shape) != 3:
+            logging.error('rnn input shape error.')
+
+        # reverse the input sequence
+        reversed_x = tf.reverse(x, axis=[time_axis])
+
+        if use_bidirectional_rnn:
+            reversed_outputs = self._my_rnn_net(x=reversed_x, window_length=window_length, fw_cell=self.a_fw_cell,
+                                       bw_cell=self.a_bw_cell, time_axis=time_axis,
+                                       use_bidirectional_rnn=use_bidirectional_rnn)
+        else:
+            reversed_outputs = self._my_rnn_net(x=reversed_x, window_length=window_length, fw_cell=self.a_fw_cell,
+                                       time_axis=time_axis, use_bidirectional_rnn=use_bidirectional_rnn)
+
+        outputs = tf.reverse(reversed_outputs, axis=[time_axis])
+
+        # self attention
+        if use_self_attention:
+            outputs1 = spt.layers.dense(outputs, 500, activation_fn=tf.nn.tanh, use_bias=True, scope='arnn_attention_dense1')
+            outputs1 = tf.nn.softmax(spt.layers.dense(outputs1, window_length,
+                                                      use_bias=False, scope='arnn_attention_dense2'), axis=1)
+            M_t = tf.matmul(tf.transpose(outputs, perm=[0, 2, 1]), outputs1)
+            outputs = tf.transpose(M_t, perm=[0, 2, 1])
+
+        # feature extraction layers
+        outputs = spt.layers.dense(outputs, units=500, activation_fn=tf.nn.leaky_relu if self.config.use_leaky_relu else tf.nn.relu,
+                                   kernel_regularizer=l2_regularizer(self.config.l2_reg), scope='arnn_feature_dense1')
+        if self.config.dropout_feature:
+            outputs = dropout_fn(outputs)
+        outputs = spt.layers.dense(outputs, units=500, activation_fn=tf.nn.leaky_relu if self.config.use_leaky_relu else tf.nn.relu,
+                                   kernel_regularizer=l2_regularizer(self.config.l2_reg), scope='arnn_feature_dense2')
+        if self.config.dropout_feature:
+            outputs = dropout_fn(outputs)
+
+        if flag:
+            outputs = spt.ops.unflatten_from_ndims(outputs, s1, s2)
+
+        return outputs
+
+    @instance_reuse
+    def qz_mean_layer(self, x):
+        return spt.layers.dense(x, units=self.config.z_dim, scope='qz_mean')
+
+    @instance_reuse
+    def qz_logstd_layer(self, x):
+        return tf.clip_by_value(spt.layers.dense(x, units=self.config.z_dim, scope='qz_logstd'),
+                                clip_value_min=self.config.logstd_min, clip_value_max=self.config.logstd_max)
+
+    @instance_reuse
+    def pz_mean_layer(self, x):
+        return spt.layers.dense(x, units=self.config.z_dim, scope='pz_mean')
+
+    @instance_reuse
+    def pz_logstd_layer(self, x):
+        return tf.clip_by_value(spt.layers.dense(x, units=self.config.z_dim, scope='pz_logstd'),
+                                clip_value_min=self.config.logstd_min, clip_value_max=self.config.logstd_max)
+
+    @instance_reuse
+    def hz2_deconv(self, z2):
+        with arg_scope([deconv1d],
+                       kernel_size=5,
+                       activation_fn=tf.nn.leaky_relu if self.config.use_leaky_relu else tf.nn.relu,
+                       kernel_regularizer=l2_regularizer(self.config.l2_reg)):
+            h_z = deconv1d(z2, out_channels=self.config.x_dim, output_shape=self.config.output_shape[0], strides=2)
+            h_z = deconv1d(h_z, out_channels=self.config.x_dim, output_shape=self.config.output_shape[1], strides=1)
+            h_z2 = deconv1d(h_z, out_channels=self.config.x_dim, output_shape=self.config.output_shape[2], strides=2)
+        return h_z2
+
+    @instance_reuse
+    def q_net(self, x, observed=None, u=None, n_z=None, is_training=False):
+        # vs.name = self.variable_scope.name + "/q_net"
+        logging.info('q_net builder: %r', locals())
+
+        net = BayesianNet(observed=observed)
+
+        def dropout_fn(input):
+            return tf.layers.dropout(input, rate=.5, training=is_training)
+
+        # use the pretrained z2, which compresses x along the time dimension
+        qz2_mean, qz2_logstd = self.h_for_qz(x, is_training=is_training)
+
+        qz2_distribution = Normal(mean=qz2_mean, logstd=qz2_logstd)
+
+        qz2_distribution = qz2_distribution.batch_ndims_to_value(2)
+
+        z2 = net.add('z2', qz2_distribution, n_samples=n_z, is_reparameterized=True)
+
+        # d_{1:t} from deconv
+        h_z = self.h_for_px(z2)
+
+        # a_{1:t}, (batch_size, window_length, dense_hidden_units)
+        arnn_out = self.a_rnn_net(h_z, window_length=self.config.window_length,
+                                  use_bidirectional_rnn=self.config.use_bidirectional_rnn,
+                                  use_self_attention=self.config.use_self_attention,
+                                  is_training=is_training)
+
+        if self.config.connect_qz:
+            qz_distribution = RecurrentDistribution(arnn_out,
+                                                    mean_layer=self.qz_mean_layer, logstd_layer=self.qz_logstd_layer,
+                                                    z_dim=self.config.z_dim, window_length=self.config.window_length)
+        else:
+            qz_mean = spt.layers.dense(arnn_out, units=self.config.z_dim, scope='qz1_mean')
+            qz_logstd = tf.clip_by_value(spt.layers.dense(arnn_out, units=self.config.z_dim, scope='qz1_logstd'),
+                                         clip_value_min=self.config.logstd_min, clip_value_max=self.config.logstd_max)
+            qz_distribution = Normal(mean=qz_mean, logstd=qz_logstd)
+
+        if self.posterior_flow is not None:
+            qz_distribution = FlowDistribution(distribution=qz_distribution, flow=self.posterior_flow).batch_ndims_to_value(1)
+        else:
+            qz_distribution = qz_distribution.batch_ndims_to_value(2)
+
+        z1 = net.add('z1', qz_distribution, is_reparameterized=True)
+
+        return net
+
+    @instance_reuse
+    def p_net(self, observed=None, u=None, n_z=None, is_training=False):
+        logging.info('p_net builder: %r', locals())
+
+        net = BayesianNet(observed=observed)
+
+        pz2_distribution = Normal(mean=tf.zeros([self.config.z2_dim, self.config.x_dim]),
+                                 logstd=tf.zeros([self.config.z2_dim, self.config.x_dim])).batch_ndims_to_value(2)
+
+        z2 = net.add('z2', pz2_distribution, n_samples=n_z, is_reparameterized=True)
+
+        # e_{1:t} from deconv, shared params
+        h_z2 = self.h_for_px(z2)
+
+        if self.config.connect_pz:
+            pz_distribution = RecurrentDistribution(h_z2,
+                                                    mean_layer=self.pz_mean_layer, logstd_layer=self.pz_logstd_layer,
+                                                    z_dim=self.config.z_dim, window_length=self.config.window_length)
+        else:
+            # non-recurrent pz
+            pz_mean = spt.layers.dense(h_z2, units=self.config.z_dim, scope='pz_mean')
+            pz_logstd = tf.clip_by_value(spt.layers.dense(h_z2,
+                                                          units=self.config.z_dim, scope='pz_logstd'),
+                                                          clip_value_min=self.config.logstd_min,
+                                                          clip_value_max=self.config.logstd_max)
+            pz_distribution = Normal(mean=pz_mean, logstd=pz_logstd)
+
+        if self.prior_flow is not None:
+            pz_distribution = FlowDistribution(distribution=pz_distribution, flow=self.prior_flow).batch_ndims_to_value(1)
+        else:
+            pz_distribution = pz_distribution.batch_ndims_to_value(2)
+
+        z1 = net.add('z1', pz_distribution, is_reparameterized=True)
+
+        h_z1 = spt.layers.dense(z1, units=self.config.x_dim)
+
+        h_z = spt.ops.broadcast_concat(h_z1, h_z2, axis=-1)
+
+        h_z = spt.layers.dense(h_z, units=500, activation_fn=tf.nn.leaky_relu if self.config.use_leaky_relu else tf.nn.relu,
+                               kernel_regularizer=l2_regularizer(self.config.l2_reg), scope='feature_dense1')
+
+        h_z = spt.layers.dense(h_z, units=500, activation_fn=tf.nn.leaky_relu if self.config.use_leaky_relu else tf.nn.relu,
+                               kernel_regularizer=l2_regularizer(self.config.l2_reg), scope='feature_dense2')
+
+        x_mean = spt.layers.dense(h_z, units=self.config.x_dim, scope='x_mean')
+        if self.config.unified_px_logstd:
+            x_logstd = tf.clip_by_value(
+                tf.get_variable(name='x_logstd', shape=(), trainable=True, dtype=tf.float32,
+                                initializer=tf.constant_initializer(-1., dtype=tf.float32)),
+                                clip_value_min=self.config.logstd_min, clip_value_max=self.config.logstd_max)
+        else:
+            x_logstd = tf.clip_by_value(spt.layers.dense(h_z, units=self.config.x_dim, scope='x_logstd'),
+                                        clip_value_min=self.config.logstd_min, clip_value_max=self.config.logstd_max)
+
+        x = net.add('x',
+                    Normal(mean=x_mean, logstd=x_logstd).batch_ndims_to_value(2),
+                    is_reparameterized=True)
+
+        return net
+
+    def reconstruct(self, x, u, mask, n_z=None):
+        with tf.name_scope('model.reconstruct'):
+            qnet = self.q_net(x=x, u=u, n_z=n_z)
+            pnet = self.p_net(observed={'z1': qnet['z1'], 'z2': qnet['z2']}, u=u)
+        return pnet['x']
+
+    def get_score(self, x_embed, x_eval, u, n_z=None):
+        with tf.name_scope('model.get_score'):
+            qnet = self.q_net(x=x_embed, u=u, n_z=n_z)
+            pnet = self.p_net(observed={'z1': qnet['z1'], 'z2': qnet['z2']}, u=u)
+            score = pnet['x'].distribution.base_distribution.log_prob(x_eval)
+            recons_mean = pnet['x'].distribution.base_distribution.mean
+            recons_std = pnet['x'].distribution.base_distribution.std
+            if n_z is not None:
+                score = tf.reduce_mean(score, axis=0)
+                recons_mean = tf.reduce_mean(recons_mean, axis=0)
+                recons_std = tf.reduce_mean(recons_std, axis=0)
+        return score, recons_mean, recons_std
+
+    @instance_reuse
+    def h_for_qz(self, x, is_training=False):
+        with arg_scope([conv1d],
+                       kernel_size=5,
+                       activation_fn=tf.nn.leaky_relu if self.config.use_leaky_relu else tf.nn.relu,
+                       kernel_regularizer=l2_regularizer(self.config.l2_reg)):
+            h_x = conv1d(x, out_channels=self.config.x_dim, strides=2)   # 15
+            h_x = conv1d(h_x, out_channels=self.config.x_dim)
+            h_x = conv1d(h_x, out_channels=self.config.x_dim, strides=2)        # 8
+
+        qz_mean = conv1d(h_x, kernel_size=1, out_channels=self.config.x_dim)
+        qz_logstd = conv1d(h_x, kernel_size=1, out_channels=self.config.x_dim)
+        qz_logstd = tf.clip_by_value(qz_logstd, clip_value_min=self.config.logstd_min,
+                                     clip_value_max=self.config.logstd_max)
+        return qz_mean, qz_logstd
+
+    @instance_reuse
+    def h_for_px(self, z):
+        with arg_scope([deconv1d],
+                       kernel_size=5,
+                       activation_fn=tf.nn.leaky_relu if self.config.use_leaky_relu else tf.nn.relu,
+                       kernel_regularizer=l2_regularizer(self.config.l2_reg)):
+            h_z = deconv1d(z, out_channels=self.config.x_dim, output_shape=self.config.output_shape[0], strides=2)
+            h_z = deconv1d(h_z, out_channels=self.config.x_dim, output_shape=self.config.output_shape[1], strides=1)
+            h_z = deconv1d(h_z, out_channels=self.config.x_dim, output_shape=self.config.output_shape[2], strides=2)
+        return h_z
+
+    @instance_reuse
+    def pretrain_q_net(self, x, observed=None, n_z=None, is_training=False):
+        # vs.name = self.variable_scope.name + "/q_net"
+        logging.info('pretrain_q_net builder: %r', locals())
+
+        net = BayesianNet(observed=observed)
+
+        def dropout_fn(input):
+            return tf.layers.dropout(input, rate=.5, training=is_training)
+
+        qz_mean, qz_logstd = self.h_for_qz(x, is_training=is_training)
+
+        qz_distribution = Normal(mean=qz_mean, logstd=qz_logstd)
+
+        qz_distribution = qz_distribution.batch_ndims_to_value(2)
+
+        z = net.add('z', qz_distribution, n_samples=n_z, is_reparameterized=True)
+
+        return net
+
+    @instance_reuse
+    def pretrain_p_net(self, observed=None, n_z=None, is_training=False):
+        logging.info('pretrain_p_net builder: %r', locals())
+
+        net = BayesianNet(observed=observed)
+
+        pz_distribution = Normal(mean=tf.zeros([self.config.z2_dim, self.config.x_dim]),
+                                 logstd=tf.zeros([self.config.z2_dim, self.config.x_dim]))
+
+        pz_distribution = pz_distribution.batch_ndims_to_value(2)
+
+        z = net.add('z',
+                    pz_distribution,
+                    n_samples=n_z, is_reparameterized=True)
+
+        h_z = self.h_for_px(z)
+
+        px_mean = conv1d(h_z, kernel_size=1, out_channels=self.config.x_dim, scope='pre_px_mean')
+        px_logstd = conv1d(h_z, kernel_size=1, out_channels=self.config.x_dim, scope='pre_px_logstd')
+        px_logstd = tf.clip_by_value(px_logstd, clip_value_min=self.config.logstd_min,
+                                     clip_value_max=self.config.logstd_max)
+
+        x = net.add('x',
+                    Normal(mean=px_mean, logstd=px_logstd).batch_ndims_to_value(2),
+                    is_reparameterized=True)
+
+        return net
diff --git a/networks/InterFusion/algorithm/__init__.py b/networks/InterFusion/algorithm/__init__.py
new file mode 100644
index 0000000..bb2b167
--- /dev/null
+++ b/networks/InterFusion/algorithm/__init__.py
@@ -0,0 +1,4 @@
+from .recurrent_distribution import RecurrentDistribution
+from .real_nvp import dense_real_nvp
+from .utils import *
+from .mcmc_recons import *
\ No newline at end of file
diff --git a/networks/InterFusion/algorithm/conv1d_.py b/networks/InterFusion/algorithm/conv1d_.py
new file mode 100644
index 0000000..1b0fa01
--- /dev/null
+++ b/networks/InterFusion/algorithm/conv1d_.py
@@ -0,0 +1,534 @@
+import numpy as np
+import tensorflow as tf
+from tensorflow.contrib.framework import add_arg_scope
+
+from tfsnippet.ops import (assert_rank, assert_scalar_equal, flatten_to_ndims,
+                           unflatten_from_ndims)
+from tfsnippet.utils import (validate_positive_int_arg, ParamSpec,
+                             is_tensor_object, assert_deps,
+                             get_shape,
+                             add_name_and_scope_arg_doc, model_variable,
+                             maybe_check_numerics, maybe_add_histogram, InputSpec,
+                             get_static_shape, validate_enum_arg, validate_int_tuple_arg)
+
+__all__ = [
+    'conv1d', 'deconv1d', 'validate_conv1d_input', 'get_deconv_output_length', 'batch_norm_1d'
+]
+
+
+@add_arg_scope
+def batch_norm_1d(input, channels_last=True, training=False, name=None,
+                  scope=None):
+    """
+    Apply batch normalization on 1D convolutional layer.
+
+    Args:
+        input (tf.Tensor): The input tensor.
+        channels_last (bool): Whether or not the channel dimension is the last axis?
+        training (bool or tf.Tensor): Whether or not the model is under
+            training stage?
+
+    Returns:
+        tf.Tensor: The normalized tensor.
+    """
+    with tf.variable_scope(scope, default_name=name or 'batch_norm_1d'):
+        input, s1, s2 = flatten_to_ndims(input, ndims=3)
+        output = tf.layers.batch_normalization(
+            input,
+            axis=-1 if channels_last else -2,
+            training=training,
+            name='norm'
+        )
+        output = unflatten_from_ndims(output, s1, s2)
+        return output
+
+
+def validate_conv1d_input(input, channels_last, arg_name='input'):
+    """
+    Validate the input for 1-d convolution.
+    Args:
+        input: The input tensor, must be at least 3-d.
+        channels_last (bool): Whether or not the last dimension is the
+            channels dimension? (i.e., the data format is (batch, length, channels))
+        arg_name (str): Name of the input argument.
+    Returns:
+        (tf.Tensor, int, str): The validated input tensor, the number of input
+            channels, and the data format.
+    """
+    if channels_last:
+        channel_axis = -1
+        input_spec = InputSpec(shape=('...', '?', '?', '*'))
+        data_format = "NWC"
+    else:
+        channel_axis = -2
+        input_spec = InputSpec(shape=('...', '?', '*', '?'))
+        data_format = "NCW"
+    input = input_spec.validate(arg_name, input)
+    input_shape = get_static_shape(input)
+    in_channels = input_shape[channel_axis]
+
+    return input, in_channels, data_format
+
+
+def get_deconv_output_length(input_length, kernel_size, strides, padding):
+    """
+    Get the output length of deconvolution at a specific dimension.
+    Args:
+        input_length: Input tensor length.
+        kernel_size: The size of the kernel.
+        strides: The stride of convolution.
+        padding: One of {"same", "valid"}, case-insensitive.
+    Returns:
+        int: The output length of deconvolution.
+    """
+    padding = validate_enum_arg(
+        'padding', str(padding).upper(), ['SAME', 'VALID'])
+    output_length = input_length * strides
+    if padding == 'VALID':
+        output_length += max(kernel_size - strides, 0)
+    return output_length
+
+
+@add_arg_scope
+@add_name_and_scope_arg_doc
+def conv1d(input,
+           out_channels,
+           kernel_size,
+           strides=1,
+           dilations=1,
+           padding='same',
+           channels_last=True,
+           activation_fn=None,
+           normalizer_fn=None,
+           gated=False,
+           gate_sigmoid_bias=2.,
+           kernel=None,
+           kernel_mask=None,
+           kernel_initializer=None,
+           kernel_regularizer=None,
+           kernel_constraint=None,
+           use_bias=None,
+           bias=None,
+           bias_initializer=tf.zeros_initializer(),
+           bias_regularizer=None,
+           bias_constraint=None,
+           trainable=True,
+           name=None,
+           scope=None):
+    """
+    1D convolutional layer.
+    Args:
+        input (Tensor): The input tensor, at least 3-d.
+        out_channels (int): The channel numbers of the output.
+        kernel_size (int or tuple(int,)): Kernel size over spatial dimensions.
+        strides (int): Strides over spatial dimensions.
+        dilations (int): The dilation factor over spatial dimensions.
+        padding: One of {"valid", "same"}, case-insensitive.
+        channels_last (bool): Whether or not the channel axis is the last
+            axis in `input`? (i.e., the data format is "NWC")
+        activation_fn: The activation function.
+        normalizer_fn: The normalizer function.
+        gated (bool): Whether or not to use gate on output?
+            `output = activation_fn(output) * sigmoid(gate)`.
+        gate_sigmoid_bias (Tensor): The bias added to `gate` before applying
+            the `sigmoid` activation.
+        kernel (Tensor): Instead of creating a new variable, use this tensor.
+        kernel_mask (Tensor): If specified, multiply this mask onto `kernel`,
+            i.e., the actual kernel to use will be `kernel * kernel_mask`.
+        kernel_initializer: The initializer for `kernel`.
+            Would be ``tf.glorot_normal_initializer()`` if not specified.
+        kernel_regularizer: The regularizer for `kernel`.
+        kernel_constraint: The constraint for `kernel`.
+        use_bias (bool or None): Whether or not to use `bias`?
+            If :obj:`True`, will always use bias.
+            If :obj:`None`, will use bias only if `normalizer_fn` is not given.
+            If :obj:`False`, will never use bias.
+            Default is :obj:`None`.
+        bias (Tensor): Instead of creating a new variable, use this tensor.
+        bias_initializer: The initializer for `bias`.
+        bias_regularizer: The regularizer for `bias`.
+        bias_constraint: The constraint for `bias`.
+        trainable (bool): Whether or not the parameters are trainable?
+    Returns:
+        tf.Tensor: The output tensor.
+    """
+    if not channels_last:
+        raise ValueError('Currently only channels_last=True is supported.')
+    input, in_channels, data_format = \
+        validate_conv1d_input(input, channels_last)
+    out_channels = validate_positive_int_arg('out_channels', out_channels)
+    dtype = input.dtype.base_dtype
+    if gated:
+        out_channels *= 2
+
+    # check functional arguments
+    padding = validate_enum_arg(
+        'padding', str(padding).upper(), ['VALID', 'SAME'])
+    dilations = validate_positive_int_arg('dilations', dilations)
+    strides = validate_positive_int_arg('strides', strides)
+
+    if dilations > 1 and not channels_last:
+        raise ValueError('`channels_last` == False is incompatible with '
+                         '`dilations` > 1.')
+
+    if strides > 1 and dilations > 1:
+        raise ValueError('`strides` > 1 is incompatible with `dilations` > 1.')
+
+    if use_bias is None:
+        use_bias = normalizer_fn is None
+
+    # get the specification of outputs and parameters
+    kernel_size = validate_int_tuple_arg('kernel_size', kernel_size)
+    kernel_shape = kernel_size + (in_channels, out_channels)
+    bias_shape = (out_channels,)
+
+    # validate the parameters
+    if kernel is not None:
+        kernel_spec = ParamSpec(shape=kernel_shape, dtype=dtype)
+        kernel = kernel_spec.validate('kernel', kernel)
+    if kernel_mask is not None:
+        kernel_mask_spec = InputSpec(dtype=dtype)
+        kernel_mask = kernel_mask_spec.validate('kernel_mask', kernel_mask)
+    if kernel_initializer is None:
+        kernel_initializer = tf.glorot_normal_initializer()
+    if bias is not None:
+        bias_spec = ParamSpec(shape=bias_shape, dtype=dtype)
+        bias = bias_spec.validate('bias', bias)
+
+    # the main part of the conv1d layer
+    with tf.variable_scope(scope, default_name=name or 'conv1d'):
+        c_axis = -1 if channels_last else -2
+
+        # create the variables
+        if kernel is None:
+            kernel = model_variable(
+                'kernel',
+                shape=kernel_shape,
+                dtype=dtype,
+                initializer=kernel_initializer,
+                regularizer=kernel_regularizer,
+                constraint=kernel_constraint,
+                trainable=trainable
+            )
+
+        if kernel_mask is not None:
+            kernel = kernel * kernel_mask
+
+        maybe_add_histogram(kernel, 'kernel')
+        kernel = maybe_check_numerics(kernel, 'kernel')
+
+        if use_bias and bias is None:
+            bias = model_variable(
+                'bias',
+                shape=bias_shape,
+                initializer=bias_initializer,
+                regularizer=bias_regularizer,
+                constraint=bias_constraint,
+                trainable=trainable
+            )
+            maybe_add_histogram(bias, 'bias')
+            bias = maybe_check_numerics(bias, 'bias')
+
+        # flatten to 3d
+        output, s1, s2 = flatten_to_ndims(input, 3)
+
+        # do convolution
+        if dilations > 1:
+            output = tf.nn.convolution(
+                input=output,
+                filter=kernel,
+                dilation_rate=(dilations,),
+                padding=padding,
+                data_format=data_format
+            )
+        else:
+            output = tf.nn.conv1d(
+                value=output,
+                filters=kernel,
+                stride=strides,
+                padding=padding,
+                data_format=data_format
+            )
+
+        # add bias
+        if use_bias:
+            output = tf.add(output, bias)
+
+        # apply the normalization function if specified
+        if normalizer_fn is not None:
+            output = normalizer_fn(output)
+
+        # split into halves if gated
+        if gated:
+            output, gate = tf.split(output, 2, axis=c_axis)
+
+        # apply the activation function if specified
+        if activation_fn is not None:
+            output = activation_fn(output)
+
+        # apply the gate if required
+        if gated:
+            if gate_sigmoid_bias is None:
+                gate_sigmoid_bias = model_variable(
+                    'gate_sigmoid_bias',
+                    shape=bias_shape,
+                    initializer=bias_initializer,
+                    regularizer=bias_regularizer,
+                    constraint=bias_constraint,
+                    trainable=trainable
+                )
+                maybe_add_histogram(gate_sigmoid_bias, 'gate_sigmoid_bias')
+                gate_sigmoid_bias = maybe_check_numerics(gate_sigmoid_bias, 'gate_sigmoid_bias')
+            output = output * tf.sigmoid(gate + gate_sigmoid_bias, name='gate')
+
+        # unflatten back to original shape
+        output = unflatten_from_ndims(output, s1, s2)
+
+        maybe_add_histogram(output, 'output')
+        output = maybe_check_numerics(output, 'output')
+    return output
+
+
+@add_arg_scope
+@add_name_and_scope_arg_doc
+def deconv1d(input,
+             out_channels,
+             kernel_size,
+             strides=1,
+             padding='same',
+             channels_last=True,
+             output_shape=None,
+             activation_fn=None,
+             normalizer_fn=None,
+             gated=False,
+             gate_sigmoid_bias=2.,
+             kernel=None,
+             kernel_initializer=None,
+             kernel_regularizer=None,
+             kernel_constraint=None,
+             use_bias=None,
+             bias=None,
+             bias_initializer=tf.zeros_initializer(),
+             bias_regularizer=None,
+             bias_constraint=None,
+             trainable=True,
+             name=None,
+             scope=None):
+    """
+    1D deconvolutional layer.
+    Args:
+        input (Tensor): The input tensor, at least 3-d.
+        out_channels (int): The channel numbers of the deconvolution output.
+        kernel_size (int or tuple(int,)): Kernel size over spatial dimensions.
+        strides (int): Strides over spatial dimensions.
+        padding: One of {"valid", "same"}, case-insensitive.
+        channels_last (bool): Whether or not the channel axis is the last
+            axis in `input`? (i.e., the data format is "NWC")
+        output_shape: If specified, use this as the shape of the
+            deconvolution output; otherwise compute the size of each dimension
+            by::
+                output_size = input_size * strides
+                if padding == 'valid':
+                    output_size += max(kernel_size - strides, 0)
+        activation_fn: The activation function.
+        normalizer_fn: The normalizer function.
+        gated (bool): Whether or not to use gate on output?
+            `output = activation_fn(output) * sigmoid(gate)`.
+        gate_sigmoid_bias (Tensor): The bias added to `gate` before applying
+            the `sigmoid` activation.
+        kernel (Tensor): Instead of creating a new variable, use this tensor.
+        kernel_initializer: The initializer for `kernel`.
+            Would be ``tf.glorot_normal_initializer()`` if not specified.
+        kernel_regularizer: The regularizer for `kernel`.
+        kernel_constraint: The constraint for `kernel`.
+        use_bias (bool or None): Whether or not to use `bias`?
+            If :obj:`True`, will always use bias.
+            If :obj:`None`, will use bias only if `normalizer_fn` is not given.
+            If :obj:`False`, will never use bias.
+            Default is :obj:`None`.
+        bias (Tensor): Instead of creating a new variable, use this tensor.
+        bias_initializer: The initializer for `bias`.
+        bias_regularizer: The regularizer for `bias`.
+        bias_constraint: The constraint for `bias`.
+        trainable (bool): Whether or not the parameters are trainable?
+    Returns:
+        tf.Tensor: The output tensor.
+    """
+    if not channels_last:
+        raise ValueError('Currently only channels_last=True is supported.')
+    input, in_channels, data_format = \
+        validate_conv1d_input(input, channels_last)
+    out_channels = validate_positive_int_arg('out_channels', out_channels)
+    dtype = input.dtype.base_dtype
+    if gated:
+        out_channels *= 2
+
+    # check functional arguments
+    padding = validate_enum_arg(
+        'padding', str(padding).upper(), ['VALID', 'SAME'])
+    strides = validate_positive_int_arg('strides', strides)
+
+    if use_bias is None:
+        use_bias = normalizer_fn is None
+
+    # get the specification of outputs and parameters
+    kernel_size = validate_int_tuple_arg('kernel_size', kernel_size)
+    kernel_shape = kernel_size + (out_channels, in_channels)
+    bias_shape = (out_channels,)
+
+    given_w = None
+    given_output_shape = output_shape
+
+    if is_tensor_object(given_output_shape):
+        given_output_shape = tf.convert_to_tensor(given_output_shape)
+    elif given_output_shape is not None:
+        given_w = given_output_shape
+
+    # validate the parameters
+    if kernel is not None:
+        kernel_spec = ParamSpec(shape=kernel_shape, dtype=dtype)
+        kernel = kernel_spec.validate('kernel', kernel)
+    if kernel_initializer is None:
+        kernel_initializer = tf.glorot_normal_initializer()
+    if bias is not None:
+        bias_spec = ParamSpec(shape=bias_shape, dtype=dtype)
+        bias = bias_spec.validate('bias', bias)
+
+    # the main part of the deconv1d layer
+    with tf.variable_scope(scope, default_name=name or 'deconv1d'):
+        with tf.name_scope('output_shape'):
+            # detect the input shape and axis arrangements
+            input_shape = get_static_shape(input)
+            if channels_last:
+                c_axis, w_axis = -1, -2
+            else:
+                c_axis, w_axis = -2, -1
+
+            output_shape = [None, None, None]
+            output_shape[c_axis] = out_channels
+            if given_output_shape is None:
+                if input_shape[w_axis] is not None:
+                    output_shape[w_axis] = get_deconv_output_length(
+                        input_shape[w_axis], kernel_shape[0], strides,
+                        padding
+                    )
+            else:
+                if not is_tensor_object(given_output_shape):
+                    output_shape[w_axis] = given_w
+
+            # infer the batch shape in 3-d
+            batch_shape = input_shape[:-2]
+            if None not in batch_shape:
+                output_shape[0] = int(np.prod(batch_shape))
+
+            # now the static output shape is ready
+            output_static_shape = tf.TensorShape(output_shape)
+
+            # prepare for the dynamic batch shape
+            if output_shape[0] is None:
+                output_shape[0] = tf.reduce_prod(get_shape(input)[:-2])
+
+            # prepare for the dynamic spatial dimensions
+            if output_shape[w_axis] is None:
+                if given_output_shape is None:
+                    input_shape = get_shape(input)
+                    if output_shape[w_axis] is None:
+                        output_shape[w_axis] = get_deconv_output_length(
+                            input_shape[w_axis], kernel_shape[0],
+                            strides, padding
+                        )
+                else:
+                    assert is_tensor_object(given_output_shape)
+                    with assert_deps([
+                        assert_rank(given_output_shape, 1),
+                        assert_scalar_equal(
+                            tf.size(given_output_shape), 1)
+                    ]):
+                        output_shape[w_axis] = given_output_shape[0]
+
+            # compose the final dynamic shape
+            if any(is_tensor_object(s) for s in output_shape):
+                output_shape = tf.stack(output_shape)
+            else:
+                output_shape = tuple(output_shape)
+
+        # create the variables
+        if kernel is None:
+            kernel = model_variable(
+                'kernel',
+                shape=kernel_shape,
+                dtype=dtype,
+                initializer=kernel_initializer,
+                regularizer=kernel_regularizer,
+                constraint=kernel_constraint,
+                trainable=trainable
+            )
+
+        maybe_add_histogram(kernel, 'kernel')
+        kernel = maybe_check_numerics(kernel, 'kernel')
+
+        if use_bias and bias is None:
+            bias = model_variable(
+                'bias',
+                shape=bias_shape,
+                initializer=bias_initializer,
+                regularizer=bias_regularizer,
+                constraint=bias_constraint,
+                trainable=trainable
+            )
+            maybe_add_histogram(bias, 'bias')
+            bias = maybe_check_numerics(bias, 'bias')
+
+        # flatten to 3d
+        output, s1, s2 = flatten_to_ndims(input, 3)
+
+        # do the transposed convolution (deconvolution)
+        output = tf.contrib.nn.conv1d_transpose(
+            value=output,
+            filter=kernel,
+            output_shape=output_shape,
+            stride=strides,
+            padding=padding,
+            data_format=data_format
+        )
+        if output_static_shape is not None:
+            output.set_shape(output_static_shape)
+
+        # add bias
+        if use_bias:
+            output = tf.add(output, bias)
+
+        # apply the normalization function if specified
+        if normalizer_fn is not None:
+            output = normalizer_fn(output)
+
+        # split into halves if gated
+        if gated:
+            output, gate = tf.split(output, 2, axis=c_axis)
+
+        # apply the activation function if specified
+        if activation_fn is not None:
+            output = activation_fn(output)
+
+        # apply the gate if required
+        if gated:
+            if gate_sigmoid_bias is None:
+                gate_sigmoid_bias = model_variable(
+                    'gate_sigmoid_bias',
+                    shape=bias_shape,
+                    initializer=bias_initializer,
+                    regularizer=bias_regularizer,
+                    constraint=bias_constraint,
+                    trainable=trainable
+                )
+                maybe_add_histogram(gate_sigmoid_bias, 'gate_sigmoid_bias')
+                gate_sigmoid_bias = maybe_check_numerics(gate_sigmoid_bias, 'gate_sigmoid_bias')
+            output = output * tf.sigmoid(gate + gate_sigmoid_bias, name='gate')
+
+        # unflatten back to original shape
+        output = unflatten_from_ndims(output, s1, s2)
+
+        maybe_add_histogram(output, 'output')
+        output = maybe_check_numerics(output, 'output')
+
+    return output
diff --git a/networks/InterFusion/algorithm/mcmc_recons.py b/networks/InterFusion/algorithm/mcmc_recons.py
new file mode 100644
index 0000000..7eb77b8
--- /dev/null
+++ b/networks/InterFusion/algorithm/mcmc_recons.py
@@ -0,0 +1,68 @@
+import tensorflow as tf
+
+
+__all__ = ['masked_reconstruct', 'mcmc_reconstruct']
+
+
+def masked_reconstruct(reconstruct, x, u, mask, name=None):
+    """
+    Replace masked elements of `x` with reconstructed outputs.
+    Potentially anomalous points in `x` can be masked and replaced by their reconstructed values,
+    which makes the reconstruction more likely to follow the normal pattern of `x`.
+    Args:
+        reconstruct ((tf.Tensor, tf.Tensor, tf.Tensor) -> tf.Tensor): Function for reconstructing `x`.
+        x: The tensor to be reconstructed by `reconstruct`.
+        u: Additional input for reconstructing `x`.
+        mask (tf.Tensor): The mask, which must be broadcastable to the shape of `x`.
+            Non-zero entries mark the elements of `x` to be replaced.
+        name (str): Name of this operation in TensorFlow graph.
+            (default "masked_reconstruct")
+    Returns:
+        tf.Tensor: `x` with masked elements replaced by reconstructed outputs.
+    """
+    with tf.name_scope(name, default_name='masked_reconstruct'):
+        x = tf.convert_to_tensor(x)  # type: tf.Tensor
+        mask = tf.convert_to_tensor(mask, dtype=tf.int32)  # type: tf.Tensor
+
+        mask = tf.broadcast_to(mask, tf.shape(x))
+
+        # get the reconstructed x; with a pixelcnn decoder, only masking the last point is currently supported.
+        x_recons = reconstruct(x, u, mask)
+
+        # get masked outputs
+        return tf.where(tf.cast(mask, dtype=tf.bool), x_recons, x)
+
+
+def mcmc_reconstruct(reconstruct, x, u, mask, iter_count,
+                     back_prop=True, name=None):
+    """
+    Iteratively reconstruct `x` with `mask` for `iter_count` times.
+    This method will call :func:`masked_reconstruct` for `iter_count` times,
+    with the output from previous iteration as the input `x` for the next
+    iteration.  The output of the final iteration is returned.
+    Args:
+        reconstruct: Function for reconstructing `x`.
+        x: The tensor to be reconstructed by `reconstruct`.
+        u: Additional input for reconstructing `x`.
+        mask (tf.Tensor): The mask, which must be broadcastable to the shape of `x`.
+            Non-zero entries mark the elements of `x` to be replaced.
+        iter_count (int or tf.Tensor):
+            Number of MCMC iterations (must be greater than 1).
+        back_prop (bool): Whether or not to support back-propagation through
+            all the iterations. (default :obj:`True`)
+        name (str): Name of this operation in TensorFlow graph.
+            (default "mcmc_reconstruct")
+    Returns:
+        tf.Tensor: The iteratively reconstructed `x`.
+    """
+    with tf.name_scope(name, default_name='mcmc_reconstruct'):
+
+        # do the masked reconstructions
+        x_recons, _ = tf.while_loop(
+            lambda x_i, i: i < iter_count,
+            lambda x_i, i: (masked_reconstruct(reconstruct, x_i, u, mask), i + 1),
+            [x, tf.constant(0, dtype=tf.int32)],
+            back_prop=back_prop
+        )
+
+        return x_recons
diff --git a/networks/InterFusion/algorithm/real_nvp.py b/networks/InterFusion/algorithm/real_nvp.py
new file mode 100644
index 0000000..59c7c51
--- /dev/null
+++ b/networks/InterFusion/algorithm/real_nvp.py
@@ -0,0 +1,118 @@
+import tensorflow as tf
+import tfsnippet as spt
+from tensorflow.contrib.framework import arg_scope
+import numpy as np
+from tfsnippet.layers.flows.utils import ZeroLogDet
+
+
+class FeatureReversingFlow(spt.layers.FeatureMappingFlow):
+
+    def __init__(self, axis=-1, value_ndims=1, name=None, scope=None):
+        super(FeatureReversingFlow, self).__init__(
+            axis=int(axis), value_ndims=value_ndims, name=name, scope=scope)
+
+    @property
+    def explicitly_invertible(self):
+        return True
+
+    def _build(self, input=None):
+        pass
+
+    def _reverse_feature(self, x, compute_y, compute_log_det):
+        n_features = spt.utils.get_static_shape(x)[self.axis]
+        if n_features is None:
+            raise ValueError('The feature dimension must be fixed.')
+        assert (0 > self.axis >= -self.value_ndims >=
+                -len(spt.utils.get_static_shape(x)))
+        permutation = np.asarray(list(reversed(range(n_features))),
+                                 dtype=np.int32)
+
+        # compute y
+        y = None
+        if compute_y:
+            y = tf.gather(x, permutation, axis=self.axis)
+
+        # compute log_det
+        log_det = None
+        if compute_log_det:
+            log_det = ZeroLogDet(spt.utils.get_shape(x)[:-self.value_ndims],
+                                 x.dtype.base_dtype)
+
+        return y, log_det
+
+    def _transform(self, x, compute_y, compute_log_det):
+        return self._reverse_feature(x, compute_y, compute_log_det)
+
+    def _inverse_transform(self, y, compute_x, compute_log_det):
+        return self._reverse_feature(y, compute_x, compute_log_det)
+
+
+def dense_real_nvp(flow_depth: int,
+                   activation,
+                   kernel_regularizer,
+                   scope: str,
+                   use_invertible_flow=True,
+                   strict_invertible=False,
+                   use_actnorm_flow=False,
+                   dense_coupling_n_hidden_layers=1,
+                   dense_coupling_n_hidden_units=100,
+                   coupling_scale_shift_initializer='zero',     # 'zero' or 'normal'
+                   coupling_scale_shift_normal_initializer_stddev=0.001,
+                   coupling_scale_type='sigmoid',               # 'sigmoid' or 'exp'
+                   coupling_sigmoid_scale_bias=2.,
+                   is_prior_flow=False) -> spt.layers.BaseFlow:
+    def shift_and_scale(x1, n2):
+
+        with arg_scope([spt.layers.dense],
+                       activation_fn=activation,
+                       kernel_regularizer=kernel_regularizer):
+            h = x1
+            for j in range(dense_coupling_n_hidden_layers):
+                h = spt.layers.dense(h,
+                                     units=dense_coupling_n_hidden_units,
+                                     scope='hidden_{}'.format(j))
+
+        # compute shift and scale
+        if coupling_scale_shift_initializer == 'zero':
+            pre_params_initializer = tf.zeros_initializer()
+        else:
+            pre_params_initializer = tf.random_normal_initializer(
+                stddev=coupling_scale_shift_normal_initializer_stddev)
+        pre_params = spt.layers.dense(h,
+                                      units=n2 * 2,
+                                      kernel_initializer=pre_params_initializer,
+                                      scope='shift_and_scale')
+
+        shift = pre_params[..., :n2]
+        scale = pre_params[..., n2:]
+
+        return shift, scale
+
+    with tf.variable_scope(scope):
+        flows = []
+        for i in range(flow_depth):
+            level = []
+            if use_invertible_flow:
+                level.append(
+                    spt.layers.InvertibleDense(
+                        strict_invertible=strict_invertible)
+                )
+            else:
+                level.append(FeatureReversingFlow())
+            level.append(
+                spt.layers.CouplingLayer(
+                    tf.make_template(
+                        'coupling', shift_and_scale, create_scope_now_=True),
+                    scale_type=coupling_scale_type,
+                    sigmoid_scale_bias=coupling_sigmoid_scale_bias,
+                )
+            )
+            if use_actnorm_flow:
+                level.append(spt.layers.ActNorm())
+            flows.extend(level)
+        flow = spt.layers.SequentialFlow(flows)
+
+    if is_prior_flow:
+        flow = flow.invert()
+
+    return flow
diff --git a/networks/InterFusion/algorithm/recurrent_distribution.py b/networks/InterFusion/algorithm/recurrent_distribution.py
new file mode 100644
index 0000000..1b55889
--- /dev/null
+++ b/networks/InterFusion/algorithm/recurrent_distribution.py
@@ -0,0 +1,196 @@
+import tensorflow as tf
+import tfsnippet as spt
+from tfsnippet.distributions import Distribution, Normal
+from tfsnippet.stochastic import StochasticTensor
+import numpy as np
+
+
+class RecurrentDistribution(Distribution):
+    def __init__(self, input, mean_layer, logstd_layer, z_dim, window_length,
+                 is_reparameterized=True, check_numerics=False):
+        batch_shape = spt.utils.concat_shapes([spt.utils.get_shape(input)[:-1], [z_dim]])
+        batch_static_shape = tf.TensorShape(spt.utils.get_static_shape(input)[:-1] + (z_dim,))
+
+        super(RecurrentDistribution, self).__init__(dtype=input.dtype,
+                                                    is_continuous=True,
+                                                    is_reparameterized=is_reparameterized,
+                                                    value_ndims=0,
+                                                    batch_shape=batch_shape,
+                                                    batch_static_shape=batch_static_shape)
+
+        self.mean_layer = mean_layer
+        self.logstd_layer = logstd_layer
+        self.z_dim = z_dim
+        self._check_numerics = check_numerics
+        self.window_length = window_length
+        self.origin_input = input
+        if len(input.shape) > 3:
+            input, s1, s2 = spt.ops.flatten_to_ndims(input, 3)
+            self.time_first_input = tf.transpose(input, [1, 0, 2])
+            self.s1 = s1
+            self.s2 = s2
+            self.need_unflatten = True
+        elif len(input.shape) == 3:
+            self.time_first_input = tf.transpose(input, [1, 0, 2])  # (window_length, batch_size, feature_dim)
+            self.s1 = None
+            self.s2 = None
+            self.need_unflatten = False
+        else:
+            raise ValueError('Invalid input shape in recurrent distribution.')
+        self._mu = None
+        self._logstd = None
+
+    def mean(self):
+        return self._mu
+
+    def logstd(self):
+        return self._logstd
+
+    def _normal_pdf(self, x, mu, logstd):
+        c = -0.5 * np.log(2 * np.pi)
+        precision = tf.exp(-2 * logstd)
+        if self._check_numerics:
+            precision = tf.check_numerics(precision, "precision")
+        log_prob = c - logstd - 0.5 * precision * tf.square(x - mu)
+        if self._check_numerics:
+            log_prob = tf.check_numerics(log_prob, 'log_prob')
+        return log_prob
+
+    def sample_step(self, a, t):
+        z_previous, mu_z_previous, logstd_z_previous, _ = a
+        noise, input = t
+
+        # use the previously sampled z to derive (mu, sigma) for the next timestamp; each sample step may introduce small noise.
+        concat_input = spt.ops.broadcast_concat(input, z_previous, axis=-1)
+
+        mu = self.mean_layer(concat_input)  # n_sample * batch_size * z_dim
+
+        logstd = self.logstd_layer(concat_input)  # n_sample * batch_size * z_dim
+
+        std = spt.utils.maybe_check_numerics(tf.exp(logstd), name='recurrent_distribution_z_std',
+                                             message='z_std in recurrent distribution is not finite.')
+
+        z_n = mu + std * noise
+
+        log_prob = self._normal_pdf(z_n, mu, logstd)
+
+        return z_n, mu, logstd, log_prob
+
+    def log_prob_step(self, a, t):
+        z_previous, _, _, log_prob_previous = a
+        given_n, input_n = t
+
+        concat_input = spt.ops.broadcast_concat(input_n, z_previous, axis=-1)  # same feature order as sample_step
+
+        mu = self.mean_layer(concat_input)
+
+        logstd = self.logstd_layer(concat_input)
+
+        log_prob_n = self._normal_pdf(given_n, mu, logstd)
+
+        return given_n, mu, logstd, log_prob_n
+
+    def sample(self, n_samples=None, is_reparameterized=None, group_ndims=0, compute_density=False,
+               name=None):
+
+        if n_samples is None:
+            n_samples = 1
+            n_samples_is_none = True
+        else:
+            n_samples_is_none = False
+
+        with tf.name_scope(name=name, default_name='sample'):
+            noise = tf.random_normal(shape=[n_samples, tf.shape(self.time_first_input)[0],
+                                            tf.shape(self.time_first_input)[1], self.z_dim])  # (n_samples, window_length, batch_size, z_dim)
+            noise = tf.transpose(noise, [1, 0, 2, 3])   # (window_length, n_samples, batch_size, z_dim)
+
+            time_indices_shape = tf.convert_to_tensor([n_samples, tf.shape(self.time_first_input)[1], self.z_dim])  # (n_samples, batch_size, z_dim)
+
+            results = tf.scan(fn=self.sample_step,
+                              elems=(noise, self.time_first_input),
+                              initializer=(tf.zeros(time_indices_shape),
+                                           tf.zeros(time_indices_shape),
+                                           tf.zeros(time_indices_shape),
+                                           tf.zeros(time_indices_shape)),
+                              back_prop=True
+                              )  # 4 * window_length * n_samples * batch_size * z_dim
+
+            samples = tf.transpose(results[0], [1, 2, 0, 3])  # n_samples * batch_size * window_length * z_dim
+
+            log_prob = tf.transpose(results[-1], [1, 2, 0, 3])  # (n_samples, batch_size, window_length, z_dim)
+
+            if self.need_unflatten:
+                # unflatten to (n_samples, n_samples_of_input_tensor, batch_size, window_length, z_dim)
+                samples = tf.stack([spt.ops.unflatten_from_ndims(samples[i], self.s1, self.s2) for i in range(n_samples)], axis=0)
+                log_prob = tf.stack([spt.ops.unflatten_from_ndims(log_prob[i], self.s1, self.s2) for i in range(n_samples)], axis=0)
+
+            log_prob = spt.reduce_group_ndims(tf.reduce_sum, log_prob, group_ndims)
+
+            if n_samples_is_none:
+                t = StochasticTensor(
+                    distribution=self,
+                    tensor=tf.reduce_mean(samples, axis=0),
+                    group_ndims=group_ndims,
+                    is_reparameterized=self.is_reparameterized,
+                    log_prob=tf.reduce_mean(log_prob, axis=0)
+                )
+                self._mu = tf.reduce_mean(tf.transpose(results[1], [1, 2, 0, 3]), axis=0)
+                self._logstd = tf.reduce_mean(tf.transpose(results[2], [1, 2, 0, 3]), axis=0)
+                if self.need_unflatten:
+                    self._mu = spt.ops.unflatten_from_ndims(self._mu, self.s1, self.s2)
+                    self._logstd = spt.ops.unflatten_from_ndims(self._logstd, self.s1, self.s2)
+            else:
+                t = StochasticTensor(
+                    distribution=self,
+                    tensor=samples,
+                    n_samples=n_samples,
+                    group_ndims=group_ndims,
+                    is_reparameterized=self.is_reparameterized,
+                    log_prob=log_prob
+                )
+                self._mu = tf.transpose(results[1], [1, 2, 0, 3])
+                self._logstd = tf.transpose(results[2], [1, 2, 0, 3])
+                if self.need_unflatten:
+                    self._mu = tf.stack([spt.ops.unflatten_from_ndims(self._mu[i], self.s1, self.s2) for i in range(n_samples)], axis=0)
+                    self._logstd = tf.stack([spt.ops.unflatten_from_ndims(self._logstd[i], self.s1, self.s2) for i in range(n_samples)], axis=0)
+
+            return t
+
+    def log_prob(self, given, group_ndims=0, name=None):
+        with tf.name_scope(name=name, default_name='log_prob'):
+            if self.need_unflatten:
+                assert len(given.shape) == len(self.origin_input.shape)
+                assert given.shape[0] == self.origin_input.shape[0]
+                time_first_input = tf.transpose(self.origin_input, [2, 0, 1, 3])    # (window, sample, batch, feature)
+                # time_indices_shape: (n_sample, batch_size, z_dim)
+                time_indices_shape = tf.convert_to_tensor([tf.shape(given)[0], tf.shape(time_first_input)[2], self.z_dim])
+                given = tf.transpose(given, [2, 0, 1, 3])
+            else:
+                if len(given.shape) > 3:    # (n_sample, batch_size, window_length, z_dim)
+                    time_indices_shape = tf.convert_to_tensor([tf.shape(given)[0], tf.shape(self.time_first_input)[1], self.z_dim])
+                    given = tf.transpose(given, [2, 0, 1, 3])
+                    time_first_input = self.time_first_input
+                else:                       # (batch_size, window_length, z_dim)
+                    time_indices_shape = tf.convert_to_tensor([tf.shape(self.time_first_input)[1], self.z_dim])
+                    given = tf.transpose(given, [1, 0, 2])
+                    time_first_input = self.time_first_input
+            results = tf.scan(fn=self.log_prob_step,
+                               elems=(given, time_first_input),
+                               initializer=(tf.zeros(time_indices_shape),
+                                            tf.zeros(time_indices_shape),
+                                            tf.zeros(time_indices_shape),
+                                            tf.zeros(time_indices_shape)),
+                               back_prop=True
+                               )        # (window_length, ?, batch_size, z_dim)
+            if len(given.shape) > 3:
+                log_prob = tf.transpose(results[-1], [1, 2, 0, 3])
+            else:
+                log_prob = tf.transpose(results[-1], [1, 0, 2])
+
+            log_prob = spt.reduce_group_ndims(tf.reduce_sum, log_prob, group_ndims)
+            return log_prob
+
+    def prob(self, given, group_ndims=0, name=None):
+        with tf.name_scope(name=name, default_name='prob'):
+            log_prob = self.log_prob(given, group_ndims, name)
+            return tf.exp(log_prob)
diff --git a/networks/InterFusion/algorithm/utils.py b/networks/InterFusion/algorithm/utils.py
new file mode 100644
index 0000000..7d10a4d
--- /dev/null
+++ b/networks/InterFusion/algorithm/utils.py
@@ -0,0 +1,276 @@
+import tfsnippet as spt
+import numpy as np
+import os
+import pickle
+from sklearn.preprocessing import MinMaxScaler
+from typing import *
+import tensorflow as tf
+
+# Normalization method used by `preprocess`: 'min_max', 'mean_std' or 'none'.
+# `alpha` below is only used by the 'mean_std' method.
+method = 'min_max'
+alpha = 4.0  # mean +/- alpha * std
+
+
+def get_sliding_window_data_flow(window_size, batch_size, x, u=None, y=None, shuffle=False, skip_incomplete=False) -> spt.DataFlow:
+    n = len(x)
+    seq = np.arange(window_size - 1, n, dtype=np.int32).reshape([-1, 1])
+    seq_df: spt.DataFlow = spt.DataFlow.arrays(
+        [seq], shuffle=shuffle, skip_incomplete=skip_incomplete, batch_size=batch_size)
+    offset = np.arange(-window_size + 1, 1, dtype=np.int32)
+
+    if y is not None:
+        if u is not None:
+            df = seq_df.map(lambda idx: (x[idx + offset], u[idx + offset], y[idx + offset]))
+        else:
+            df = seq_df.map(lambda idx: (x[idx + offset], y[idx + offset]))
+    else:
+        if u is not None:
+            df = seq_df.map(lambda idx: (x[idx + offset], u[idx + offset]))
+        else:
+            df = seq_df.map(lambda idx: (x[idx + offset],))
+
+    return df
+
+
+def time_generator(timestamp):
+    mins = 60
+    hours = 24
+    days = 7
+    timestamp %= (mins * hours * days)
+    res = np.zeros([mins + hours + days])
+    res[int(timestamp / hours / mins)] = 1  # day
+    res[days + int((timestamp % (mins * hours)) / mins)] = 1  # hours
+    res[days + hours + int(timestamp % mins)] = 1  # min
+    return res
+
+
+def get_data_dim(dataset):
+    if dataset == 'SWaT':
+        return 51
+    elif dataset == 'WADI':
+        return 118
+    elif str(dataset).startswith('machine'):
+        return 38
+    elif str(dataset).startswith('omi'):
+        return 19
+    else:
+        raise ValueError('unknown dataset '+str(dataset))
+
+
+def get_data(dataset, max_train_size=None, max_test_size=None, print_log=True, do_preprocess=True, train_start=0,
+             test_start=0, valid_portion=0.3, prefix="./data/processed"):
+    """
+    Load the train and test data of `dataset` from the pickle files under `prefix`.
+    Returns ((train_data [train_size, x_dim], None), (test_data [test_size, x_dim], test_label [test_size])).
+    """
+    if max_train_size is None:
+        train_end = None
+    else:
+        train_end = train_start + max_train_size
+    if max_test_size is None:
+        test_end = None
+    else:
+        test_end = test_start + max_test_size
+    print('load data of:', dataset)
+    print("train: ", train_start, train_end)
+    print("test: ", test_start, test_end)
+    x_dim = get_data_dim(dataset)
+    f = open(os.path.join(prefix, dataset + '_train.pkl'), "rb")
+    train_data = pickle.load(f).reshape((-1, x_dim))[train_start:train_end, :]
+    f.close()
+    try:
+        f = open(os.path.join(prefix, dataset + '_test.pkl'), "rb")
+        test_data = pickle.load(f).reshape((-1, x_dim))[test_start:test_end, :]
+        f.close()
+    except (KeyError, FileNotFoundError):
+        test_data = None
+    try:
+        f = open(os.path.join(prefix, dataset + "_test_label.pkl"), "rb")
+        test_label = pickle.load(f).reshape((-1))[test_start:test_end]
+        f.close()
+    except (KeyError, FileNotFoundError):
+        test_label = None
+    if do_preprocess:
+        # train_data = preprocess(train_data)
+        # test_data = preprocess(test_data)
+        train_data, test_data = preprocess(train_data, test_data, valid_portion=valid_portion)
+    print("train set shape: ", train_data.shape)
+    print("test set shape: ", test_data.shape)
+    print("test set label shape: ", None if test_label is None else test_label.shape)
+    return (train_data, None), (test_data, test_label)
+
+
+def preprocess(train, test, valid_portion=0):
+    train = np.asarray(train, dtype=np.float32)
+    test = np.asarray(test, dtype=np.float32)
+
+    if len(train.shape) == 1 or len(test.shape) == 1:
+        raise ValueError('Data must be a 2-D array')
+
+    if np.any(sum(np.isnan(train)) != 0):
+        print('Train data contains null values. Will be replaced with 0')
+        train = np.nan_to_num(train)
+
+    if np.any(sum(np.isnan(test)) != 0):
+        print('Test data contains null values. Will be replaced with 0')
+        test = np.nan_to_num(test)
+
+    # revise here for other preprocess methods
+    if method == 'min_max':
+        if valid_portion > 0:
+            split_idx = int(len(train) * valid_portion)
+            train, valid = train[:-split_idx], train[-split_idx:]
+            scaler = MinMaxScaler().fit(train)
+            train = scaler.transform(train)
+            valid = scaler.transform(valid)
+            valid = np.clip(valid, a_min=-3.0, a_max=3.0)
+            test = scaler.transform(test)
+            test = np.clip(test, a_min=-3.0, a_max=3.0)
+            train = np.concatenate([train, valid], axis=0)
+            print('Data normalized with min-max scaler')
+        else:
+            scaler = MinMaxScaler().fit(train)
+            train = scaler.transform(train)
+            test = scaler.transform(test)
+            test = np.clip(test, a_min=-3.0, a_max=3.0)
+            print('Data normalized with min-max scaler')
+
+    elif method == 'mean_std':
+
+        def my_transform(value, ret_all=True, mean=None, std=None):
+            if mean is None:
+                mean = np.mean(value, axis=0)
+            if std is None:
+                std = np.std(value, axis=0)
+            for i in range(value.shape[0]):
+                clip_value = mean + alpha * std  # compute clip value: (mean - a * std, mean + a * std)
+                temp = value[i] < clip_value
+                value[i] = temp * value[i] + (1 - temp) * clip_value
+                clip_value = mean - alpha * std
+                temp = value[i] > clip_value
+                value[i] = temp * value[i] + (1 - temp) * clip_value
+                std = np.maximum(std, 1e-5)  # to avoid std -> 0
+                value[i] = (value[i] - mean) / std  # normalization
+            return (value, mean, std) if ret_all else value
+
+        train, _mean, _std = my_transform(train)
+        test = my_transform(test, False, _mean, _std)
+        print('Data normalized with standard scaler method')
+
+    elif method == 'none':
+        print('No pre-processing')
+
+    else:
+        raise RuntimeError('unknown preprocess method')
+
+    return train, test
+
+
+TensorLike = Union[tf.Tensor, spt.StochasticTensor]
+
+
+class GraphNodes(Dict[str, TensorLike]):
+    """A dict that maps name to TensorFlow graph nodes."""
+
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+
+        for k, v in self.items():
+            if not spt.utils.is_tensor_object(v):
+                raise TypeError(f'The value of `{k}` is not a tensor: {v!r}.')
+
+    def eval(self,
+             session: tf.Session = None,
+             feed_dict: Dict[tf.Tensor, Any] = None) -> Dict[str, Any]:
+        """
+        Evaluate all the nodes with the specified `session`.
+        Args:
+            session: The TensorFlow session.
+            feed_dict: The feed dict.
+        Returns:
+            The node evaluation outputs.
+        """
+        if session is None:
+            session = spt.utils.get_default_session_or_error()
+
+        keys = list(self)
+        tensors = [self[key] for key in keys]
+        outputs = session.run(tensors, feed_dict=feed_dict)
+
+        return dict(zip(keys, outputs))
+
+    def add_prefix(self, prefix: str) -> 'GraphNodes':
+        """
+        Add a common prefix to the names of all nodes in this collection.
+        Args:
+             prefix: The common prefix.
+        """
+        return GraphNodes({f'{prefix}{k}': v for k, v in self.items()})
+
+
+def get_score(recons_probs, preserve_feature_dim=False, score_avg_window_size=1):
+    """
+    Evaluate the anomaly score at each timestamp from the reconstruction probabilities produced by the model.
+    :param recons_probs: (data_length-window_length+1, window_length, x_dim). The reconstruction probabilities for
+    each timestamp and each dimension of x, evaluated in sliding windows of length 'window_length'. The larger the
+    reconstruction probability, the less likely a point is an anomaly.
+    :param preserve_feature_dim: bool. Whether to keep the feature dimension. If True, preserve the anomaly score on
+    each feature dimension. If False, sum the anomaly scores over the feature dimension and return a single score for
+    each timestamp.
+    :param score_avg_window_size: int. How many scores from different sliding windows are used to evaluate the anomaly
+    score at a given timestamp. By default score_avg_window_size=1: only the score of the last point in each sliding
+    window is used, directly as the final anomaly score at that timestamp. When score_avg_window_size > 1, the last
+    'score_avg_window_size' scores in each sliding window are used. Then for timestamp t, if t is the last point
+    of sliding window k, the anomaly score of t is the average of score_{t} over sliding windows
+    [k, k+1, ..., k+score_avg_window_size-1].
+    :return: Anomaly scores (reconstruction probabilities) at each timestamp,
+    with shape ``(data_length - window_length + score_avg_window_size,)`` if `preserve_feature_dim` is `False`,
+    or ``(data_length - window_length + score_avg_window_size, x_dim)`` if `preserve_feature_dim` is `True`.
+    The first `window_length - score_avg_window_size` points are discarded since there aren't enough previous values to evaluate the score.
+    """
+    data_length = recons_probs.shape[0] + recons_probs.shape[1] - 1
+    window_length = recons_probs.shape[1]
+    score_collector = [[] for i in range(data_length)]
+    for i in range(recons_probs.shape[0]):
+        for j in range(score_avg_window_size):
+            score_collector[i + window_length - j - 1].append(recons_probs[i, -j-1])
+
+    score_collector = score_collector[window_length-score_avg_window_size:]
+    scores = []
+    for i in range(len(score_collector)):
+        scores.append(np.mean(score_collector[i], axis=0))
+    scores = np.array(scores)                 # average over the score_avg_window. (data_length-window_length+score_avg_window_size, x_dim)
+    if not preserve_feature_dim:
+        scores = np.sum(scores, axis=-1)
+    return scores
+
+
+def get_avg_recons(recons_vals, window_length, recons_avg_window_size=1):
+    """
+    Get the averaged reconstruction values for plotting. The last `recons_avg_window_size` points in each reconstructed
+    sliding window are used; the final reconstruction value at each timestamp is the mean of the values collected for it.
+    :param recons_vals: original reconstruction values. shape: (data_length - window_length + 1, window_length, x_dim)
+    :param window_length: int. Length of the sliding window over the original data.
+    :param recons_avg_window_size: int. How many points are used in each reconstructed sliding window.
+    :return: final reconstruction curve, with shape (data_length, x_dim). The first `window_length - recons_avg_window_size`
+    points use the values of the first reconstruction window; the rest are averaged according to `recons_avg_window_size`.
+    """
+    data_length = recons_vals.shape[0] + window_length - 1
+    recons_collector = [[] for i in range(data_length)]
+    for i in range(recons_vals.shape[0]):
+        for j in range(recons_avg_window_size):
+            recons_collector[i + window_length - j - 1].append(recons_vals[i, -j-1, :])
+
+    if recons_vals.shape[1] < window_length:
+        for i in range(window_length - recons_avg_window_size):
+            recons_collector[i] = [recons_vals[0, -1, :]]
+    else:
+        for i in range(window_length - recons_avg_window_size):
+            recons_collector[i] = [recons_vals[0, i, :]]
+
+    final_recons = []
+    for i in range(len(recons_collector)):
+        final_recons.append(np.mean(recons_collector[i], axis=0))
+    final_recons = np.array(final_recons)    # average over the recons_avg_window. (data_length, x_dim)
+    return final_recons
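
Note on `get_score` above: the sliding-window averaging can be illustrated with a minimal pure-Python sketch for a single feature. It is not part of the patch; the function name `average_window_scores` and the toy values are assumptions made for illustration only.

```python
# Pure-Python sketch of the sliding-window score averaging (single feature).
# window_scores[i][p] is the score of position p in sliding window i.
# All names and values here are illustrative, not taken from the patch.


def average_window_scores(window_scores, avg_size=1):
    n_windows = len(window_scores)
    window_length = len(window_scores[0])
    data_length = n_windows + window_length - 1
    collector = [[] for _ in range(data_length)]
    for i, window in enumerate(window_scores):
        for j in range(avg_size):
            # The j-th point from the end of window i is timestamp
            # i + window_length - 1 - j in the original series.
            collector[i + window_length - 1 - j].append(window[-j - 1])
    # Drop the leading timestamps that never received a score.
    kept = collector[window_length - avg_size:]
    return [sum(vals) / len(vals) for vals in kept]


# Three windows of length 3 over a series of 5 points.
windows = [
    [0.0, 1.0, 2.0],
    [1.0, 3.0, 4.0],
    [2.0, 6.0, 8.0],
]
last_only = average_window_scores(windows, avg_size=1)
averaged = average_window_scores(windows, avg_size=2)
print(last_only)
print(averaged)
```

With `avg_size=1` only the last point of each window contributes, so three scores come back; with `avg_size=2` one more leading timestamp is kept and the interior timestamps average two windows each.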
diff --git a/networks/InterFusion/predict.py b/networks/InterFusion/predict.py
new file mode 100644
index 0000000..7e8e103
--- /dev/null
+++ b/networks/InterFusion/predict.py
@@ -0,0 +1,633 @@
+import mltk
+import os
+
+import logging
+
+from .algorithm.utils import (
+    GraphNodes,
+    time_generator,
+    get_sliding_window_data_flow,
+    get_score,
+)
+import tfsnippet as spt
+import tensorflow as tf
+from tqdm import tqdm
+from .algorithm.InterFusion import MTSAD
+from .algorithm.InterFusion_swat import MTSAD_SWAT
+import numpy as np
+from typing import Optional
+from .algorithm.mcmc_recons import mcmc_reconstruct, masked_reconstruct
+
+__all__ = ["PredictConfig", "final_testing", "build_test_graph"]
+
+
+class PredictConfig(mltk.Config):
+    load_model_dir: Optional[str]
+
+    # evaluation params
+    test_n_z = 100
+    test_batch_size = 50
+    test_start = 0
+    max_test_size = None  # `None` means full test set
+
+    output_dirs = "analysis_results"
+    train_score_filename = "train_score.pkl"
+    test_score_filename = "test_score.pkl"
+    preserve_feature_dim = False  # whether to preserve the feature dim in score. If `True`, the score will be a 2-dim ndarray
+    anomaly_score_calculate_latency = 1  # How many scores are averaged for the final score at a timestamp. `1` means use last point in each sliding window only.
+
+    use_mcmc = True  # use mcmc on the last point for anomaly detection
+    mcmc_iter = 10
+    mcmc_rand_mask = False
+    n_mc_chain: int = 10
+    pos_mask = True
+    mcmc_track = True  # use mcmc tracker for anomaly interpretation and calculate IPS.
+
+
+def build_test_graph(
+    chain: spt.VariationalChain, input_x, origin_chain: spt.VariationalChain = None
+) -> GraphNodes:
+    test_recons = tf.reduce_mean(chain.model["x"].log_prob(), axis=0)
+
+    logpx = chain.model["x"].log_prob()
+    logpz = chain.model["z2"].log_prob() + chain.model["z1"].log_prob()
+    logqz_x = chain.variational["z1"].log_prob() + chain.variational["z2"].log_prob()
+    test_lb = tf.reduce_mean(logpx + logpz - logqz_x, axis=0)
+
+    log_joint = logpx + logpz
+    latent_log_prob = logqz_x
+    test_ll = spt.importance_sampling_log_likelihood(
+        log_joint=log_joint, latent_log_prob=latent_log_prob, axis=0
+    )
+    test_nll = -test_ll
+
+    # average over sample dim
+    if origin_chain is not None:
+        full_recons_prob = tf.reduce_mean(
+            (
+                chain.model["x"].distribution.base_distribution.log_prob(input_x)
+                - origin_chain.model["x"].distribution.base_distribution.log_prob(
+                    input_x
+                )
+            ),
+            axis=0,
+        )
+    else:
+        full_recons_prob = tf.reduce_mean(
+            chain.model["x"].distribution.base_distribution.log_prob(input_x), axis=0
+        )
+
+    if origin_chain is not None:
+        origin_log_joint = (
+            origin_chain.model["x"].log_prob()
+            + origin_chain.model["z1"].log_prob()
+            + origin_chain.model["z2"].log_prob()
+        )
+        origin_latent_log_prob = (
+            origin_chain.variational["z1"].log_prob()
+            + origin_chain.variational["z2"].log_prob()
+        )
+        origin_ll = spt.importance_sampling_log_likelihood(
+            log_joint=origin_log_joint, latent_log_prob=origin_latent_log_prob, axis=0
+        )
+        test_ll_score = test_ll - origin_ll
+    else:
+        test_ll_score = test_ll
+
+    outputs = {
+        "test_nll": test_nll,
+        "test_lb": test_lb,
+        "test_recons": test_recons,
+        "test_kl": test_recons - test_lb,
+        "full_recons_prob": full_recons_prob,
+        "test_ll": test_ll_score,
+    }
+
+    return GraphNodes(outputs)
+
+
+def build_recons_graph(
+    chain: spt.VariationalChain, window_length, feature_dim, unified_x_std=False
+) -> GraphNodes:
+    # average over sample dim
+    recons_x = tf.reduce_mean(
+        chain.model["x"].distribution.base_distribution.mean, axis=0
+    )
+    recons_x = spt.utils.InputSpec(shape=["?", window_length, feature_dim]).validate(
+        "recons", recons_x
+    )
+    if unified_x_std:
+        recons_x_std = chain.model["x"].distribution.base_distribution.std
+        recons_x_std = spt.ops.broadcast_to_shape(recons_x_std, tf.shape(recons_x))
+    else:
+        recons_x_std = tf.reduce_mean(
+            chain.model["x"].distribution.base_distribution.std, axis=0
+        )
+    recons_x_std = spt.utils.InputSpec(
+        shape=["?", window_length, feature_dim]
+    ).validate("recons_std", recons_x_std)
+    return GraphNodes({"recons_x": recons_x, "recons_x_std": recons_x_std})
+
+
+def final_testing(
+    test_metrics: GraphNodes,
+    input_x,
+    input_u,
+    data_flow: spt.DataFlow,
+    total_batch_count,
+    y_test=None,
+    mask=None,
+    rand_x=None,
+):
+    data_flow = data_flow.threaded(5)
+    full_recons_collector = []
+    ll_collector = []
+    epoch_out = {}
+    stats = {}
+    session = spt.utils.get_default_session_or_error()
+    with data_flow:
+        for batch_x, batch_u in tqdm(
+            data_flow, unit="step", total=total_batch_count, ascii=True
+        ):
+            if mask is not None:
+                batch_mask = np.zeros(shape=batch_x.shape)
+                batch_mask[:, -1, :] = 1  # mask all dims of the last point in x
+                if rand_x is not None:
+                    batch_output = test_metrics.eval(
+                        session,
+                        feed_dict={
+                            input_x: batch_x,
+                            input_u: batch_u,
+                            mask: batch_mask,
+                            rand_x: np.random.random(batch_x.shape),
+                        },
+                    )
+                else:
+                    batch_output = test_metrics.eval(
+                        session,
+                        feed_dict={
+                            input_x: batch_x,
+                            input_u: batch_u,
+                            mask: batch_mask,
+                        },
+                    )
+            else:
+                batch_output = test_metrics.eval(
+                    session, feed_dict={input_x: batch_x, input_u: batch_u}
+                )
+            for k, v in batch_output.items():
+
+                if k == "full_recons_prob":
+                    full_recons_collector.append(v)
+                elif k == "test_ll":
+                    ll_collector.append(v)
+                    if k not in epoch_out:
+                        epoch_out[k] = []
+                    epoch_out[k].append(v)
+                else:
+                    if k not in epoch_out:
+                        epoch_out[k] = []
+                    epoch_out[k].append(v)
+
+    # save the results of this epoch, and compute epoch stats. Take average over both batch and window_length dim.
+    for k, v in epoch_out.items():
+        epoch_out[k] = np.concatenate(epoch_out[k], axis=0)
+        if k not in stats:
+            stats[k] = []
+        stats[k].append(float(np.mean(epoch_out[k])))
+
+    # collect the full reconstruction probabilities used to calculate the anomaly score
+    full_recons_probs = np.concatenate(
+        full_recons_collector, axis=0
+    )  # (data_length-window_length+1, window_length, x_dim)
+    ll = np.concatenate(ll_collector, axis=0)
+
+    if y_test is not None:
+        assert full_recons_probs.shape[0] + full_recons_probs.shape[1] - 1 == len(
+            y_test
+        )
+        tmp1 = []
+        for i in range(full_recons_probs.shape[0]):
+            if y_test[i + full_recons_probs.shape[1] - 1] < 0.5:
+                tmp1.append(
+                    np.sum(full_recons_probs[i, -1], axis=-1)
+                )  # normal point recons score
+        stats["normal_point_test_recons"] = [float(np.mean(tmp1))]
+
+    # calculate average statistics
+    for k, v in stats.items():
+        stats[k] = float(np.mean(v))
+
+    return stats, full_recons_probs, ll
+
+
+def mcmc_tracker(
+    flow: spt.DataFlow,
+    baseline,
+    model,
+    input_x,
+    input_u,
+    mask,
+    max_iter,
+    total_window_num,
+    window_length,
+    x_dim,
+    mask_last=False,
+    pos_mask=False,
+    use_rand_mask=False,
+    n_mc_chain=1,
+):
+    # the baseline is the avg total score in a window on training set.
+    session = spt.utils.get_default_session_or_error()
+    last_x = tf.placeholder(
+        dtype=tf.float32, shape=[None, window_length, x_dim], name="last_x"
+    )
+
+    x_r = masked_reconstruct(model.reconstruct, last_x, input_u, mask)
+    score, recons_mean, recons_std = model.get_score(
+        x_embed=x_r, x_eval=input_x, u=input_u
+    )
+    tot_score = tf.reduce_sum(tf.multiply(score, tf.cast((1 - mask), score.dtype)))
+
+    def avg_multi_chain(x, n_chain):
+        shape = (-1,) + (n_chain,) + x.shape[1:]
+        return np.mean(x.reshape(shape), axis=1)
+
+    res = {}
+    with flow.threaded(5) as flow:
+        for (
+            batch_x,
+            batch_u,
+            batch_score,
+            batch_ori_recons,
+            batch_ori_std,
+            batch_idx,
+        ) in tqdm(flow, unit="step", total=total_window_num, ascii=True):
+            batch_idx = batch_idx[0]
+            res[batch_idx] = {
+                "x": [batch_x],
+                "recons": [batch_ori_recons],
+                "std": [batch_ori_std],
+                "score": [batch_score],
+                "K": [0],
+                "iter": [-1],
+                "mask": [np.zeros(shape=batch_x.shape)],
+                "total_score": [np.mean(np.sum(batch_score, axis=-1))],
+            }
+            best_score = batch_score
+            best_total_score = np.mean(np.sum(batch_score, axis=-1))
+            best_K = 0
+            if pos_mask:
+                pos_scores = np.mean(batch_score, axis=0)  # (window, x_dim)
+                sorted_pos_idx = np.argsort(pos_scores, axis=None)
+                potential_dim_num = np.sum(
+                    (pos_scores < (baseline / (x_dim * window_length))).astype(np.int32)
+                )
+            else:
+                dim_scores = np.mean(batch_score, axis=(-2, -3))  # (x_dim, )
+                sorted_dim_idx = np.argsort(dim_scores)
+                potential_dim_num = np.sum(
+                    (dim_scores < (baseline / (x_dim * window_length))).astype(np.int32)
+                )  # num of dims whose avg score < baseline
+
+            if potential_dim_num > 0:
+                K_init = max(potential_dim_num // 5, 1)
+                K_inc = max(potential_dim_num // 10, 1)
+            else:
+                res[batch_idx]["best_score"] = best_score
+                res[batch_idx]["best_total_score"] = best_total_score
+                res[batch_idx]["best_K"] = best_K
+                continue
+            if use_rand_mask:
+                rand_x = np.random.random(size=batch_x.shape)
+            if pos_mask:
+                max_K = x_dim * window_length
+            else:
+                max_K = x_dim
+            for K in range(K_init, min(potential_dim_num + 1, max_K), K_inc):
+                if pos_mask:
+                    mask_idx = sorted_pos_idx[:K]
+                    batch_mask = np.zeros(shape=batch_x.shape)
+                    batch_mask = batch_mask.reshape([batch_x.shape[0], -1])
+                    batch_mask[:, mask_idx] = 1
+                    batch_mask = batch_mask.reshape(batch_x.shape)
+                else:
+                    mask_idx = sorted_dim_idx[:K]
+                    batch_mask = np.zeros(shape=batch_x.shape)
+                    batch_mask[:, :, mask_idx] = 1
+                if mask_last:
+                    batch_mask[:, -1, :] = 1
+
+                batch_last_x = batch_x
+                if use_rand_mask:
+                    batch_last_x = np.where(
+                        batch_mask.astype(bool), rand_x, batch_last_x
+                    )
+                if n_mc_chain > 1:
+                    init_x = np.repeat(batch_x, n_mc_chain, axis=0)
+                    init_u = np.repeat(batch_u, n_mc_chain, axis=0)
+                    init_mask = np.repeat(batch_mask, n_mc_chain, axis=0)
+                    init_last_x = np.repeat(batch_last_x, n_mc_chain, axis=0)
+                for i in range(max_iter):
+                    if n_mc_chain > 1:
+                        x_mc, x_recons, x_std, x_score, x_tot_score = session.run(
+                            [x_r, recons_mean, recons_std, score, tot_score],
+                            feed_dict={
+                                input_x: init_x,
+                                input_u: init_u,
+                                mask: init_mask,
+                                last_x: init_last_x,
+                            },
+                        )
+                        init_last_x = x_mc
+                        x_mc = avg_multi_chain(x_mc, n_mc_chain)
+                        x_recons = avg_multi_chain(x_recons, n_mc_chain)
+                        x_std = avg_multi_chain(x_std, n_mc_chain)
+                        x_score = avg_multi_chain(x_score, n_mc_chain)
+                        x_tot_score = float(x_tot_score) / float(n_mc_chain)
+                    else:
+                        x_mc, x_recons, x_std, x_score, x_tot_score = session.run(
+                            [x_r, recons_mean, recons_std, score, tot_score],
+                            feed_dict={
+                                input_x: batch_x,
+                                input_u: batch_u,
+                                mask: batch_mask,
+                                last_x: batch_last_x,
+                            },
+                        )
+                        batch_last_x = x_mc
+                    total_score = (
+                        float(x_tot_score)
+                        / (window_length * x_dim - np.sum(batch_mask))
+                        / batch_x.shape[0]
+                        * x_dim
+                    )
+                    res[batch_idx]["x"].append(x_mc)
+                    res[batch_idx]["recons"].append(x_recons)
+                    res[batch_idx]["std"].append(x_std)
+                    res[batch_idx]["score"].append(x_score)
+                    res[batch_idx]["K"].append(K)
+                    res[batch_idx]["iter"].append(i)
+                    res[batch_idx]["mask"].append(batch_mask)
+                    res[batch_idx]["total_score"].append(total_score)
+
+                last_score = res[batch_idx]["total_score"][-1]
+                if last_score >= best_total_score:
+                    best_total_score = last_score
+                    best_score = res[batch_idx]["score"][-1]
+                    best_K = res[batch_idx]["K"][-1]
+
+                if best_total_score >= (baseline / window_length):
+                    break
+            res[batch_idx]["best_score"] = best_score
+            res[batch_idx]["best_total_score"] = best_total_score
+            res[batch_idx]["best_K"] = best_K
+    return res
+
+
+def log_mean_exp(x, axis, keepdims=False):
+    x_max = np.max(x, axis=axis, keepdims=True)
+    ret = x_max + np.log(np.mean(np.exp(x - x_max), axis=axis, keepdims=True))
+    if not keepdims:
+        ret = np.squeeze(ret, axis=axis)
+    return ret
+
+
+def log_sum_exp(x, axis, keepdims=False):
+    x_max = np.max(x, axis=axis, keepdims=True)
+    ret = x_max + np.log(np.sum(np.exp(x - x_max), axis=axis, keepdims=True))
+    if not keepdims:
+        ret = np.squeeze(ret, axis=axis)
+    return ret
+
+
+def predict_prob(x_test, y_test, train_config, model_root):
+    tf.reset_default_graph()
+    # tf.compat.v1.reset_default_graph()
+
+    exp = mltk.Experiment(PredictConfig(), output_dir=model_root)
+    test_config = exp.config
+    test_config.load_model_dir = model_root
+
+    logging.info(mltk.format_key_values(train_config, title="Train configurations"))
+    logging.info(mltk.format_key_values(test_config, title="Test configurations"))
+
+    # set TFSnippet settings
+    spt.settings.enable_assertions = False
+    spt.settings.check_numerics = train_config.check_numerics
+    test_config.test_batch_size = train_config.train.batch_size
+
+    exp.make_dirs(test_config.output_dirs)
+
+    if train_config.use_time_info:
+        u_test = np.asarray(
+            [time_generator(_i) for _i in range(len(x_test))]
+        )  # (test_size, u_dim)
+    else:
+        u_test = np.zeros([len(x_test), train_config.model.u_dim])
+
+    # prepare data_flow
+    test_flow = get_sliding_window_data_flow(
+        window_size=train_config.model.window_length,
+        batch_size=test_config.test_batch_size,
+        x=x_test,
+        u=u_test,
+        shuffle=False,
+        skip_incomplete=False,
+    )
+
+    # build computation graph
+    if train_config.dataset == "SWaT" or train_config.dataset == "WADI":
+        model = MTSAD_SWAT(train_config.model, scope="model")
+    else:
+        model = MTSAD(train_config.model, scope="model")
+
+    # input placeholders
+    input_x = tf.placeholder(
+        dtype=tf.float32,
+        shape=[None, train_config.model.window_length, train_config.model.x_dim],
+        name="input_x",
+    )
+    input_u = tf.placeholder(
+        dtype=tf.float32,
+        shape=[None, train_config.model.window_length, train_config.model.u_dim],
+        name="input_u",
+    )
+    mask = tf.placeholder(
+        dtype=tf.int32,
+        shape=[None, train_config.model.window_length, train_config.model.x_dim],
+        name="mask",
+    )
+    rand_x = tf.placeholder(
+        dtype=tf.float32,
+        shape=[None, train_config.model.window_length, train_config.model.x_dim],
+        name="rand_x",
+    )
+
+    tmp_out = None
+    if test_config.use_mcmc:
+        with tf.name_scope("mcmc_init"):
+            tmp_qnet = model.q_net(input_x, u=input_u, n_z=test_config.test_n_z)
+            tmp_chain = tmp_qnet.chain(
+                model.p_net, observed={"x": input_x}, latent_axis=0, u=input_u
+            )
+            tmp_out = tf.reduce_mean(tmp_chain.vi.lower_bound.elbo())
+
+    # derive testing nodes
+    with tf.name_scope("testing"):
+        if test_config.use_mcmc:
+            if (
+                test_config.mcmc_rand_mask
+            ):  # use random value to mask the initial input for mcmc (otherwise use the original one)
+                if (
+                    test_config.n_mc_chain > 1
+                ):  # average the results of multi-mcmc chain for each input x.
+                    init_x = tf.where(tf.cast(mask, dtype=tf.bool), rand_x, input_x)
+                    init_x, s1, s2 = spt.ops.flatten_to_ndims(
+                        tf.tile(
+                            tf.expand_dims(init_x, 1), [1, test_config.n_mc_chain, 1, 1]
+                        ),
+                        3,
+                    )
+                    init_u, _, _ = spt.ops.flatten_to_ndims(
+                        tf.tile(
+                            tf.expand_dims(input_u, 1),
+                            [1, test_config.n_mc_chain, 1, 1],
+                        ),
+                        3,
+                    )
+                    init_mask, _, _ = spt.ops.flatten_to_ndims(
+                        tf.tile(
+                            tf.expand_dims(mask, 1), [1, test_config.n_mc_chain, 1, 1]
+                        ),
+                        3,
+                    )
+                    x_mcmc = mcmc_reconstruct(
+                        model.reconstruct,
+                        init_x,
+                        init_u,
+                        init_mask,
+                        test_config.mcmc_iter,
+                        back_prop=False,
+                    )
+                    x_mcmc = spt.ops.unflatten_from_ndims(x_mcmc, s1, s2)
+                    x_mcmc = tf.reduce_mean(x_mcmc, axis=1)
+                else:
+                    init_x = tf.where(tf.cast(mask, dtype=tf.bool), rand_x, input_x)
+                    x_mcmc = mcmc_reconstruct(
+                        model.reconstruct,
+                        init_x,
+                        input_u,
+                        mask,
+                        test_config.mcmc_iter,
+                        back_prop=False,
+                    )
+            else:
+                if test_config.n_mc_chain > 1:
+                    init_x, s1, s2 = spt.ops.flatten_to_ndims(
+                        tf.tile(
+                            tf.expand_dims(input_x, 1),
+                            [1, test_config.n_mc_chain, 1, 1],
+                        ),
+                        3,
+                    )
+                    init_u, _, _ = spt.ops.flatten_to_ndims(
+                        tf.tile(
+                            tf.expand_dims(input_u, 1),
+                            [1, test_config.n_mc_chain, 1, 1],
+                        ),
+                        3,
+                    )
+                    init_mask, _, _ = spt.ops.flatten_to_ndims(
+                        tf.tile(
+                            tf.expand_dims(mask, 1), [1, test_config.n_mc_chain, 1, 1]
+                        ),
+                        3,
+                    )
+                    x_mcmc = mcmc_reconstruct(
+                        model.reconstruct,
+                        init_x,
+                        init_u,
+                        init_mask,
+                        test_config.mcmc_iter,
+                        back_prop=False,
+                    )
+                    x_mcmc = spt.ops.unflatten_from_ndims(x_mcmc, s1, s2)
+                    x_mcmc = tf.reduce_mean(x_mcmc, axis=1)
+                else:
+                    x_mcmc = mcmc_reconstruct(
+                        model.reconstruct,
+                        input_x,
+                        input_u,
+                        mask,
+                        test_config.mcmc_iter,
+                        back_prop=False,
+                    )
+        else:
+            x_mcmc = input_x
+
+        test_q_net = model.q_net(x_mcmc, u=input_u, n_z=test_config.test_n_z)
+        test_chain = test_q_net.chain(
+            model.p_net, observed={"x": input_x}, latent_axis=0, u=input_u
+        )
+
+        test_metrics = build_test_graph(test_chain, input_x)
+
+    # obtain params to restore
+    variables_to_restore = tf.global_variables()
+
+    logging.info("path: {}".format(test_config.load_model_dir))
+    restore_path = os.path.join(
+        test_config.load_model_dir, "result_params/restored_params.dat"
+    )
+
+    # obtain the variables initializer
+    var_initializer = tf.variables_initializer(tf.global_variables())
+    test_flow = test_flow.threaded(5)
+
+    with spt.utils.create_session().as_default() as session:
+
+        session.run(var_initializer)
+
+        saver = tf.train.Saver(var_list=variables_to_restore)
+        saver.restore(session, restore_path)
+
+        logging.info("Model params restored.")
+        # Evaluate the whole network
+        if test_config.use_mcmc:
+            for batch_x, batch_u in test_flow:
+                _ = session.run(tmp_out, feed_dict={input_x: batch_x, input_u: batch_u})
+                break
+
+        # do evaluation
+        logging.info("Evaluating scores...")
+
+        test_batch_count = (
+            len(x_test) - train_config.model.window_length + test_config.test_batch_size
+        ) // test_config.test_batch_size
+
+        test_stats, test_full_recons_probs, test_ll = final_testing(
+            test_metrics,
+            input_x,
+            input_u,
+            test_flow,
+            test_batch_count,
+            y_test,
+            mask=mask if test_config.use_mcmc else None,
+            rand_x=rand_x if test_config.mcmc_rand_mask else None,
+        )
+
+        logging.info(mltk.format_key_values(test_stats, "Final testing statistics"))
+        exp.update_results(test_stats)
+
+        test_score = get_score(
+            test_full_recons_probs,
+            preserve_feature_dim=test_config.preserve_feature_dim,
+            score_avg_window_size=test_config.anomaly_score_calculate_latency,
+        )
+
+        if y_test is not None:
+            y_test = y_test[-len(test_score) :]
+            return test_score, y_test
+        else:
+            return test_score
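
Note on the `log_mean_exp` helper in predict.py above: subtracting the maximum before exponentiating is what keeps the computation finite for very negative log-weights. A minimal sketch using only the standard library follows; it works on a flat list rather than a NumPy array along an axis, and the sample values are assumptions for illustration.

```python
import math


def log_mean_exp(values):
    """Stable log(mean(exp(v))) for a non-empty list of floats."""
    v_max = max(values)
    # After shifting by the maximum, the largest exponent is exp(0) = 1,
    # so the sum cannot underflow to zero.
    shifted_sum = sum(math.exp(v - v_max) for v in values)
    return v_max + math.log(shifted_sum / len(values))


# Illustrative log-weights, far below the range where exp() is representable.
log_weights = [-1000.0, -1001.0, -1002.0]
stable = log_mean_exp(log_weights)

# The naive form underflows: every exp() is 0.0, and log(0.0) raises.
try:
    naive = math.log(sum(math.exp(v) for v in log_weights) / len(log_weights))
except ValueError:
    naive = None

print(stable, naive)
```

The same shift applies to `log_sum_exp`; only the division by the number of terms is dropped.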
diff --git a/networks/InterFusion/train.py b/networks/InterFusion/train.py
new file mode 100644
index 0000000..a72d503
--- /dev/null
+++ b/networks/InterFusion/train.py
@@ -0,0 +1,511 @@
+import os
+import mltk
+import logging
+import numpy as np
+import tensorflow as tf
+import tfsnippet as spt
+from tfsnippet.scaffold import TrainLoop
+from tfsnippet.trainer import Trainer, Evaluator
+from networks.InterFusion.algorithm.utils import (
+    get_sliding_window_data_flow,
+    time_generator,
+    GraphNodes,
+)
+
+from networks.InterFusion.algorithm.InterFusion import ModelConfig, MTSAD
+from networks.InterFusion.algorithm.InterFusion_swat import MTSAD_SWAT
+from .predict import PredictConfig
+
+
+class TrainConfig(mltk.Config):
+    # training params
+    batch_size = 100
+    train_start = 0
+    max_train_size = None  # `None` means full train set
+    initial_lr = 0.001
+    lr_anneal_factor = 0.5
+    lr_anneal_epoch_freq = 10
+    lr_anneal_step_freq = None
+    pretrain_lr_anneal_epoch_freq = 10
+
+    early_stopping = True
+    valid_portion = 0.3
+
+    save_test_stats = True
+
+
+class ExpConfig(mltk.Config):
+    # model params
+    model = ModelConfig()
+
+    use_time_info = False  # whether to use time information (minute, hour, day) as input u. discarded.
+    model_type = "mtsad"
+
+    train = TrainConfig()
+    test = PredictConfig()
+
+    # debugging params
+    write_summary = False
+    write_histogram_summary = False
+    check_numerics = False
+    save_results = True
+    save_ckpt = True
+    ckpt_epoch_freq = 10
+    ckpt_max_keep = 10
+    pretrain_ckpt_epoch_freq = 20
+    pretrain_ckpt_max_keep = 10
+
+    exp_dir_save_path = None  # The file path to save the exp dirs for batch run training on different datasets.
+
+    def init_params(self, dataset):
+        self.dataset = dataset
+        name = self.dataset.lower()
+        if name in ("swat", "wadi"):
+            self.train.save_test_stats = False
+            self.train.pretrain_lr_anneal_epoch_freq = 5
+            self.train.lr_anneal_epoch_freq = 5
+        # dataset-specific learning rate and latent dimension
+        if name == "swat":
+            self.train.initial_lr = 0.0005
+            self.model.z_dim = 2
+        elif name == "wadi":
+            self.train.initial_lr = 0.0002
+            self.model.z_dim = 4
+
+
+def get_lr_value(
+    init_lr,
+    anneal_factor,
+    anneal_freq,
+    loop: spt.TrainLoop,
+) -> spt.DynamicValue:
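+    """
+    Get the learning rate scheduler for the given train loop.
+
+    Args:
+        init_lr: The initial learning rate.
+        anneal_factor: The factor the learning rate is multiplied by at each annealing.
+        anneal_freq: The number of epochs between annealings.
+        loop: The train loop object.
+    Returns:
+        A dynamic value that returns the learning rate each time its `.get()` is called.
+    """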
+    return spt.AnnealingScalar(
+        loop=loop,
+        initial_value=init_lr,
+        ratio=anneal_factor,
+        epochs=anneal_freq,
+    )
+
+
+def sgvb_loss(qnet, pnet, metrics_dict: GraphNodes, prefix="train_", name=None):
+    with tf.name_scope(name, default_name="sgvb_loss"):
+        logpx_z = pnet["x"].log_prob(name="logpx_z")
+        logpz1_z2 = pnet["z1"].log_prob(name="logpz1_z2")
+        logpz2 = pnet["z2"].log_prob(name="logpz2")
+        logpz = logpz1_z2 + logpz2
+        logqz1_x = qnet["z1"].log_prob(name="logqz1_x")
+        logqz2_x = qnet["z2"].log_prob(name="logqz2_x")
+        logqz_x = logqz1_x + logqz2_x
+
+        recons_term = tf.reduce_mean(logpx_z)
+        kl_term = tf.reduce_mean(logqz_x - logpz)
+        metrics_dict[prefix + "recons"] = recons_term
+        metrics_dict[prefix + "kl"] = kl_term
+
+        return -tf.reduce_mean(logpx_z + logpz - logqz_x)
+
+
+def fit(
+    dataset,
+    model_root,
+    x_train,
+    x_valid,
+    dim,
+    lr,
+    window_size,
+    batch_size,
+    pretrain_max_epoch,
+    max_epoch,
+):
+    tf.reset_default_graph()
+    exp = mltk.Experiment(ExpConfig(), output_dir=model_root)
+    exp.save_config()
+    config = exp.config
+
+    spt.settings.check_numerics = config.check_numerics
+    spt.settings.enable_assertions = False
+
+    exp.make_dirs("train_summary")
+    exp.make_dirs("result_params")
+    exp.make_dirs("ckpt_params")
+    exp.make_dirs(config.test.output_dirs)
+
+    config.model.x_dim = dim
+    config.model.window_length = window_size
+    config.train.batch_size = batch_size
+    config.train.initial_lr = lr
+    config.train.pretrain_max_epoch = pretrain_max_epoch
+    config.train.max_epoch = max_epoch
+    config.init_params(dataset)
+
+    if config.use_time_info:
+        u_train = np.asarray([time_generator(_i) for _i in range(len(x_train))])
+        if x_valid is not None:
+            u_valid = np.asarray([time_generator(_i) for _i in range(len(x_valid))])
+        else:
+            u_valid = None
+    else:
+        u_train = np.zeros([len(x_train), config.model.u_dim])
+        if x_valid is not None:
+            u_valid = np.zeros([len(x_valid), config.model.u_dim])
+        else:
+            u_valid = None
+
+    # prepare data_flow
+    train_flow = get_sliding_window_data_flow(
+        window_size=config.model.window_length,
+        batch_size=config.train.batch_size,
+        x=x_train,
+        u=u_train,
+        shuffle=True,
+        skip_incomplete=True,
+    )
+
+    if x_valid is not None:
+        valid_flow = get_sliding_window_data_flow(
+            window_size=config.model.window_length,
+            batch_size=config.train.batch_size,
+            x=x_valid,
+            u=u_valid,
+            shuffle=False,
+            skip_incomplete=False,
+        )
+
+    # build computation graph
+    if config.dataset.lower() in ("swat", "wadi"):
+        model = MTSAD_SWAT(config.model, scope="model")
+    else:
+        model = MTSAD(config.model, scope="model")
+
+    # input placeholders
+    input_x = tf.placeholder(
+        dtype=tf.float32,
+        shape=[None, config.model.window_length, config.model.x_dim],
+        name="input_x",
+    )
+    input_u = tf.placeholder(
+        dtype=tf.float32,
+        shape=[None, config.model.window_length, config.model.u_dim],
+        name="input_u",
+    )
+    learning_rate = tf.placeholder(dtype=tf.float32, shape=(), name="learning_rate")
+    is_training = tf.placeholder(dtype=tf.bool, shape=(), name="is_training")
+
+    # derive training nodes
+    with tf.name_scope("training"):
+        # pretrain time-vae to get z2
+        pretrain_q_net = model.pretrain_q_net(input_x, is_training=is_training)
+        pretrain_chain = pretrain_q_net.chain(
+            model.pretrain_p_net, observed={"x": input_x}, is_training=is_training
+        )
+        pretrain_loss = (
+            tf.reduce_mean(pretrain_chain.vi.training.sgvb())
+            + tf.losses.get_regularization_loss()
+        )
+        pretrain_train_recons = tf.reduce_mean(pretrain_chain.model["x"].log_prob())
+
+        # train the whole network with z1 and z2
+        train_q_net = model.q_net(input_x, u=input_u, is_training=is_training)
+        train_chain = train_q_net.chain(
+            model.p_net, observed={"x": input_x}, u=input_u, is_training=is_training
+        )
+        train_metrics = GraphNodes()
+        vae_loss = sgvb_loss(
+            train_chain.variational,
+            train_chain.model,
+            train_metrics,
+            name="train_sgvb_loss",
+        )
+        reg_loss = tf.losses.get_regularization_loss()
+        loss = vae_loss + reg_loss
+        train_metrics["loss"] = loss
+
+    with tf.name_scope("validation"):
+        # pretrain validation
+        pretrain_valid_q_net = model.pretrain_q_net(input_x, n_z=config.test.test_n_z)
+        pretrain_valid_chain = pretrain_valid_q_net.chain(
+            model.pretrain_p_net, observed={"x": input_x}, latent_axis=0
+        )
+        pretrain_valid_loss = (
+            tf.reduce_mean(pretrain_valid_chain.vi.training.sgvb())
+            + tf.losses.get_regularization_loss()
+        )
+        pretrain_valid_recons = tf.reduce_mean(
+            pretrain_valid_chain.model["x"].log_prob()
+        )
+
+        # validation of the whole network
+        valid_q_net = model.q_net(input_x, u=input_u, n_z=config.test.test_n_z)
+        valid_chain = valid_q_net.chain(
+            model.p_net, observed={"x": input_x}, latent_axis=0, u=input_u
+        )
+        valid_metrics = GraphNodes()
+        valid_loss = (
+            sgvb_loss(
+                valid_chain.variational,
+                valid_chain.model,
+                valid_metrics,
+                prefix="valid_",
+                name="valid_sgvb_loss",
+            )
+            + tf.losses.get_regularization_loss()
+        )
+        valid_metrics["valid_loss"] = valid_loss
+
+    # pretrain
+    pre_variables_to_save = sum(
+        [
+            tf.global_variables("model/pretrain_q_net"),
+            tf.global_variables("model/pretrain_p_net"),
+            tf.global_variables("model/h_for_qz"),
+            tf.global_variables("model/h_for_px"),
+        ],
+        [],
+    )
+    pre_train_params = sum(
+        [
+            tf.trainable_variables("model/pretrain_q_net"),
+            tf.trainable_variables("model/pretrain_p_net"),
+            tf.trainable_variables("model/h_for_qz"),
+            tf.trainable_variables("model/h_for_px"),
+        ],
+        [],
+    )
+    pre_optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
+    pre_gradients = pre_optimizer.compute_gradients(
+        pretrain_loss, var_list=pre_train_params
+    )
+    with tf.name_scope("PreClipGradients"):
+        for i, (g, v) in enumerate(pre_gradients):
+            if g is not None:
+                pre_gradients[i] = (
+                    tf.clip_by_norm(
+                        spt.utils.maybe_check_numerics(
+                            g, message="gradient on %s exceed" % str(v.name)
+                        ),
+                        10,
+                    ),
+                    v,
+                )
+    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
+        pre_train_op = pre_optimizer.apply_gradients(pre_gradients)
+
+    # obtain params and gradients (whole model)
+    variables_to_save = tf.global_variables()
+    train_params = tf.trainable_variables()
+
+    # optimizer
+    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
+    gradients = optimizer.compute_gradients(loss, var_list=train_params)
+    # clip gradient by norm
+    with tf.name_scope("ClipGradients"):
+        for i, (g, v) in enumerate(gradients):
+            if g is not None:
+                gradients[i] = (
+                    tf.clip_by_norm(
+                        spt.utils.maybe_check_numerics(
+                            g, message="gradient on %s exceed" % str(v.name)
+                        ),
+                        10,
+                    ),
+                    v,
+                )
+    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
+        train_op = optimizer.apply_gradients(gradients)
+
+    pre_var_groups = [
+        model.variable_scope.name + "/pretrain_q_net",
+        model.variable_scope.name + "/pretrain_p_net",
+        model.variable_scope.name + "/h_for_qz",
+        model.variable_scope.name + "/h_for_px",
+    ]
+
+    var_groups = [
+        # for q_net
+        model.variable_scope.name + "/q_net",
+        # for p_net
+        model.variable_scope.name + "/p_net",
+        # for flow
+        model.variable_scope.name + "/posterior_flow",
+    ]
+
+    var_initializer = tf.variables_initializer(tf.global_variables())
+
+    train_flow = train_flow.threaded(5)
+
+    if x_valid is not None:
+        valid_flow = valid_flow.threaded(5)
+
+    pre_loop = TrainLoop(
+        param_vars=pre_variables_to_save,
+        var_groups=pre_var_groups,
+        max_epoch=config.train.pretrain_max_epoch,
+        summary_dir=(
+            exp.abspath("pre_train_summary") if config.write_summary else None
+        ),
+        summary_graph=tf.get_default_graph(),
+        summary_commit_freqs={"pretrain_loss": 10},
+        early_stopping=config.train.early_stopping,
+        valid_metric_name="pretrain_valid_loss",
+        valid_metric_smaller_is_better=True,
+        checkpoint_dir=(exp.abspath("pre_ckpt_params") if config.save_ckpt else None),
+        checkpoint_epoch_freq=config.pretrain_ckpt_epoch_freq,
+        checkpoint_max_to_keep=config.pretrain_ckpt_max_keep,
+        print_func=logging.info,
+    )
+
+    loop = TrainLoop(
+        param_vars=variables_to_save,
+        var_groups=var_groups,
+        max_epoch=config.train.max_epoch,
+        summary_dir=(exp.abspath("train_summary") if config.write_summary else None),
+        summary_graph=tf.get_default_graph(),
+        summary_commit_freqs={"loss": 10},
+        early_stopping=config.train.early_stopping,
+        valid_metric_name="valid_loss",
+        valid_metric_smaller_is_better=True,
+        checkpoint_dir=(exp.abspath("ckpt_params") if config.save_ckpt else None),
+        checkpoint_epoch_freq=config.ckpt_epoch_freq,
+        checkpoint_max_to_keep=config.ckpt_max_keep,
+        print_func=logging.info,
+    )
+
+    if config.write_histogram_summary:
+        summary_op = tf.summary.merge_all()
+    else:
+        summary_op = None
+
+    pre_lr_value = get_lr_value(
+        config.train.initial_lr,
+        config.train.lr_anneal_factor,
+        config.train.pretrain_lr_anneal_epoch_freq,
+        pre_loop,
+    )
+    lr_value = get_lr_value(
+        config.train.initial_lr,
+        config.train.lr_anneal_factor,
+        config.train.lr_anneal_epoch_freq,
+        loop,
+    )
+
+    pre_trainer = Trainer(
+        loop=pre_loop,
+        train_op=pre_train_op,
+        inputs=[input_x, input_u],
+        data_flow=train_flow,
+        feed_dict={learning_rate: pre_lr_value, is_training: True},
+        metrics={
+            "pretrain_loss": pretrain_loss,
+            "pretrain_train_recons": pretrain_train_recons,
+        },
+        summaries=summary_op,
+    )
+
+    trainer = Trainer(
+        loop=loop,
+        train_op=train_op,
+        inputs=[input_x, input_u],
+        data_flow=train_flow,
+        feed_dict={learning_rate: lr_value, is_training: True},
+        metrics=train_metrics,
+        summaries=summary_op,
+    )
+
+    if x_valid is not None:
+        pre_validator = Evaluator(
+            loop=pre_loop,
+            metrics={
+                "pretrain_valid_loss": pretrain_valid_loss,
+                "pretrain_valid_recons": pretrain_valid_recons,
+            },
+            inputs=[input_x, input_u],
+            data_flow=valid_flow,
+            time_metric_name="pre_valid_time",
+        )
+
+        pre_validator.events.on(
+            spt.EventKeys.AFTER_EXECUTION,
+            lambda e: exp.update_results(pre_validator.last_metrics_dict),
+        )
+
+        validator = Evaluator(
+            loop=loop,
+            metrics=valid_metrics,
+            inputs=[input_x, input_u],
+            data_flow=valid_flow,
+            time_metric_name="valid_time",
+        )
+
+        validator.events.on(
+            spt.EventKeys.AFTER_EXECUTION,
+            lambda e: exp.update_results(validator.last_metrics_dict),
+        )
+
+    train_losses = []
+    tmp_collector = []
+    valid_losses = []
+
+    def on_metrics_collected(loop: TrainLoop, metrics):
+        if "loss" in metrics:
+            tmp_collector.append(metrics["loss"])
+        if loop.epoch % 1 == 0:
+            if "valid_loss" in metrics:
+                if x_valid is not None:
+                    valid_losses.append(metrics["valid_loss"])
+                train_losses.append(np.mean(tmp_collector))
+                tmp_collector.clear()
+
+    loop.events.on(spt.EventKeys.METRICS_COLLECTED, on_metrics_collected)
+
+    if x_valid is not None:
+        pre_trainer.evaluate_after_epochs(pre_validator, freq=1)
+        trainer.evaluate_after_epochs(validator, freq=1)
+
+    pre_trainer.log_after_epochs(freq=1)
+    trainer.log_after_epochs(freq=1)
+
+    with spt.utils.create_session().as_default() as session:
+
+        session.run(var_initializer)
+
+        with pre_loop:
+            pre_trainer.run()
+
+        logging.info("PreTraining Finished.")
+
+        if config.save_results:
+            saver = tf.train.Saver(var_list=pre_variables_to_save)
+            saver.save(
+                session,
+                os.path.join(
+                    exp.abspath("result_params"), "restored_pretrain_params.dat"
+                ),
+            )
+
+        logging.info("Pretrain Model saved.")
+        logging.info("Start training the whole network.")
+
+        with loop:
+            trainer.run()
+
+        logging.info("Training Finished.")
+        if config.save_results:
+            saver = tf.train.Saver(var_list=variables_to_save)
+            saver.save(
+                session,
+                os.path.join(exp.abspath("result_params"), "restored_params.dat"),
+            )
+        logging.info("Model saved.")
+
+    return exp
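The `sgvb_loss` function in the training module above logs a reconstruction term and a KL term, then returns the negative mean of `logpx_z + logpz - logqz_x`. The following is a minimal sketch of that arithmetic in plain Python, assuming the log-probabilities have already been summed over `z1` and `z2`. The name `sgvb_terms` and the sample values are hypothetical and only illustrate that the returned loss equals the KL term minus the reconstruction term.

```python
from statistics import fmean


def sgvb_terms(logpx_z, logpz, logqz_x):
    """Return (recons, kl, loss) from per-sample log-probabilities.

    Hypothetical helper: the inputs are plain lists standing in for the
    tensors that sgvb_loss reads from the p-net and the q-net.
    """
    recons = fmean(logpx_z)
    kl = fmean([q - p for q, p in zip(logqz_x, logpz)])
    # Negative ELBO averaged over samples, mirroring the returned loss.
    loss = -fmean([x + p - q for x, p, q in zip(logpx_z, logpz, logqz_x)])
    return recons, kl, loss


recons, kl, loss = sgvb_terms(
    logpx_z=[-1.0, -3.0],
    logpz=[-2.0, -2.0],
    logqz_x=[-1.5, -0.5],
)
# The loss is the KL term minus the reconstruction term.
assert abs(loss - (kl - recons)) < 1e-12
```

Because the two logged metrics already determine the loss, watching `train_recons` and `train_kl` separately shows which part of the objective drives a change in `loss`.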
diff --git a/networks/InterFusion/wrapper.py b/networks/InterFusion/wrapper.py
new file mode 100644
index 0000000..362aece
--- /dev/null
+++ b/networks/InterFusion/wrapper.py
@@ -0,0 +1,36 @@
+from .train import ExpConfig, TrainConfig, fit
+from .predict import predict_prob
+
+
+class InterFusion:
+    def __init__(self, dataset, model_root, dim):
+        self.dataset = dataset
+        self.model_root = model_root
+        self.dim = dim
+        self.train_exp = None
+
+    def fit(
+        self,
+        x_train,
+        x_valid,
+        lr,
+        window_size,
+        batch_size,
+        pretrain_max_epoch,
+        max_epoch,
+    ):
+        self.train_exp = fit(
+            self.dataset,
+            self.model_root,
+            x_train,
+            x_valid,
+            self.dim,
+            lr,
+            window_size,
+            batch_size,
+            pretrain_max_epoch,
+            max_epoch,
+        )
+
+    def predict_prob(self, x_test, y_test):
+        return predict_prob(x_test, y_test, self.train_exp.config, self.model_root)
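The wrapper's `predict_prob` reads `self.train_exp.config`, and `self.train_exp` stays `None` until `fit` has run, so calling the two methods out of order would fail with an `AttributeError`. The sketch below shows one way to make that ordering explicit. It is a stand-in under stated assumptions: `InterFusionSketch`, its stub bodies, and the `RuntimeError` guard are hypothetical and not part of the wrapper in this patch.

```python
class InterFusionSketch:
    """Hypothetical stand-in mirroring the wrapper's fit/predict ordering."""

    def __init__(self):
        self.train_exp = None  # set by fit(), as in the wrapper

    def fit(self, x_train):
        # Stand-in for train.fit(...), which returns the experiment object.
        self.train_exp = {"config": {"n_train": len(x_train)}}

    def predict_prob(self, x_test):
        # Assumed guard: fail early with a clear message if fit() was skipped.
        if self.train_exp is None:
            raise RuntimeError("call fit() before predict_prob()")
        # Stand-in for predict.predict_prob(...): one dummy score per sample.
        return [0.0 for _ in x_test]


model = InterFusionSketch()
try:
    model.predict_prob([[0.1, 0.2]])
    failed_early = False
except RuntimeError:
    failed_early = True

model.fit([[0.1, 0.2], [0.3, 0.4]])
scores = model.predict_prob([[0.5, 0.6]])
```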
diff --git a/networks/RANS/__init__.py b/networks/RANS/__init__.py
new file mode 100644
index 0000000..8b3f387
--- /dev/null
+++ b/networks/RANS/__init__.py
@@ -0,0 +1,2 @@
+# Expose the public names of .main at package level.
+from .main import *
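`RANSynCoders.build` in the `networks/RANS/main.py` diff that follows compiles its models with `quantile_loss(0.5, ...)`, `quantile_loss(1 - self.delta, ...)`, and `quantile_loss(self.delta, ...)`. The definition of `quantile_loss` is not part of this excerpt, so the following is only a sketch that assumes it is the standard pinball loss. The name `pinball_loss` and the sample data are hypothetical.

```python
def pinball_loss(q, y, f):
    """Mean pinball (quantile) loss of predictions f against targets y."""
    errors = [yi - fi for yi, fi in zip(y, f)]
    # Under-prediction (e > 0) costs q * e; over-prediction costs (1 - q) * |e|.
    return sum(max(q * e, (q - 1) * e) for e in errors) / len(errors)


delta = 0.05
y = [1.0, 2.0, 3.0]
too_low = [0.0, 1.0, 2.0]   # under-predicts every target by 1
too_high = [2.0, 3.0, 4.0]  # over-predicts every target by 1

# Upper bound (q = 1 - delta): being below the data is expensive.
upper_low = pinball_loss(1 - delta, y, too_low)
upper_high = pinball_loss(1 - delta, y, too_high)

# Lower bound (q = delta): being above the data is expensive.
lower_low = pinball_loss(delta, y, too_low)
lower_high = pinball_loss(delta, y, too_high)
```

Under this assumption, the asymmetry pushes the first output toward the `1 - delta` quantile and the second toward the `delta` quantile, and `q = 0.5` reduces to half the mean absolute error, which matches the "median loss" comments in the training loops.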
diff --git a/networks/RANS/main.py b/networks/RANS/main.py
new file mode 100644
index 0000000..2816676
--- /dev/null
+++ b/networks/RANS/main.py
@@ -0,0 +1,557 @@
+# -*- coding: utf-8 -*-
+"""
+Created on Wed Dec 16 12:30:26 2020
+
+@author: aabdulaal
+................................................................................................................................
+"""
+
+import os
+import logging
+import numpy as np
+from scipy.signal import find_peaks
+from spectrum import Periodogram
+from joblib import dump, load
+from .models import freqcoder, sincoder, RANCoders
+import tensorflow as tf
+import tensorflow.keras.backend as K
+from tensorflow.python.keras.layers import Input
+from tensorflow.python.keras.models import Model, model_from_json
+from typing import List, Optional
+
+
+class RANSynCoders:
+    """class for building, training, and testing rancoders models"""
+
+    def __init__(
+        self,
+        # Rancoders inputs:
+        n_estimators: int = 100,
+        max_features: int = 3,
+        encoding_depth: int = 2,
+        latent_dim: int = 2,
+        decoding_depth: int = 2,
+        activation: str = "linear",
+        output_activation: str = "linear",
+        delta: float = 0.05,  # quantile bound for regression
+        # Synchronization inputs
+        synchronize: bool = False,
+        force_synchronization: bool = True,  # if synchronization is true but no significant frequencies found
+        min_periods: int = 3,  # if synchronize and forced, this is the minimum bound on cycles to look for in train set
+        freq_init: Optional[
+            List[float]
+        ] = None,  # initial guess for the dominant angular frequency
+        max_freqs: int = 1,  # the number of sinusoidal signals to fit
+        min_dist: int = 60,  # minimum distance (in samples) between neighbouring peaks in the PSD
+        trainable_freq: bool = False,  # whether to make the frequency a variable during layer weight training
+        bias: bool = True,  # add intercept (vertical displacement)
+    ):
+        # Rancoders inputs:
+        self.n_estimators = n_estimators
+        self.max_features = max_features
+        self.encoding_depth = encoding_depth
+        self.latent_dim = latent_dim
+        self.decoding_depth = decoding_depth
+        self.activation = activation
+        self.output_activation = output_activation
+        self.delta = delta
+
+        # Synchronization inputs
+        self.synchronize = synchronize
+        self.force_synchronization = force_synchronization
+        self.min_periods = min_periods
+        self.freq_init = freq_init  # in radians (angular frequency)
+        self.max_freqs = max_freqs
+        self.min_dist = min_dist
+        self.trainable_freq = trainable_freq
+        self.bias = bias
+
+        # set all variables to default to float32
+        tf.keras.backend.set_floatx("float32")
+
+    def build(self, input_shape, initial_stage: bool = False):
+        x_in = Input(
+            shape=(input_shape[-1],)
+        )  # created for either raw signal or synchronized signal
+        if initial_stage:
+            freq_out = freqcoder()(x_in)
+            self.freqcoder = Model(inputs=x_in, outputs=freq_out)
+            self.freqcoder.compile(
+                optimizer="adam", loss=lambda y, f: quantile_loss(0.5, y, f)
+            )
+        else:
+            bounds_out = RANCoders(
+                n_estimators=self.n_estimators,
+                max_features=self.max_features,
+                encoding_depth=self.encoding_depth,
+                latent_dim=self.latent_dim,
+                decoding_depth=self.decoding_depth,
+                delta=self.delta,
+                activation=self.activation,
+                output_activation=self.output_activation,
+                name="rancoders",
+            )(x_in)
+            self.rancoders = Model(inputs=x_in, outputs=bounds_out)
+            self.rancoders.compile(
+                optimizer="adam",
+                loss=[
+                    lambda y, f: quantile_loss(1 - self.delta, y, f),
+                    lambda y, f: quantile_loss(self.delta, y, f),
+                ],
+            )
+            if self.synchronize:
+                t_in = Input(shape=(input_shape[-1],))
+                sin_out = sincoder(
+                    freq_init=self.freq_init, trainable_freq=self.trainable_freq
+                )(t_in)
+                self.sincoder = Model(inputs=t_in, outputs=sin_out)
+                self.sincoder.compile(
+                    optimizer="adam", loss=lambda y, f: quantile_loss(0.5, y, f)
+                )
+
+    def fit(
+        self,
+        x: np.ndarray,
+        epochs: int = 100,
+        batch_size: int = 360,
+        shuffle: bool = True,
+        freq_warmup: int = 10,  # number of warmup epochs to prefit the frequency
+        sin_warmup: int = 10,  # number of warmup epochs to prefit the sinusoidal representation
+        pos_amp: bool = True,  # whether to constrain amplitudes to be +ve only
+    ):
+        t = np.tile(np.array(range(x.shape[0])).reshape(-1, 1), (1, x.shape[1]))
+        # Prepare the training batches.
+        dataset = tf.data.Dataset.from_tensor_slices(
+            (x.astype(np.float32), t.astype(np.float32))
+        )
+        shuffled = dataset.shuffle(buffer_size=x.shape[0]) if shuffle else dataset
+        dataset = shuffled.batch(batch_size)  # always batch, even when not shuffling
+
+        # build and compile models (stage 1)
+        if self.synchronize:
+            self.build(x.shape, initial_stage=True)
+            if self.freq_init:
+                self.build(x.shape)
+        else:
+            self.build(x.shape)
+
+        # pretraining step 1:
+        if freq_warmup > 0 and self.synchronize and not self.freq_init:
+            for epoch in range(freq_warmup):
+                logging.info("\nStart of frequency pre-train epoch %d" % (epoch,))
+                for step, (x_batch, t_batch) in enumerate(dataset):
+                    # Prefit the oscillation encoder
+                    with tf.GradientTape() as tape:
+                        # forward pass
+                        z, x_pred = self.freqcoder(x_batch)
+
+                        # compute loss
+                        x_loss = self.freqcoder.loss(x_batch, x_pred)  # median loss
+
+                    # retrieve gradients and update weights
+                    grads = tape.gradient(x_loss, self.freqcoder.trainable_weights)
+                    self.freqcoder.optimizer.apply_gradients(
+                        zip(grads, self.freqcoder.trainable_weights)
+                    )
+                logging.info("pre-reconstruction_loss: {}".format(tf.reduce_mean(x_loss).numpy()))
+
+            # estimate dominant frequency
+            z = (
+                self.freqcoder(x)[0].numpy().reshape(-1)
+            )  # must be done on full unshuffled series
+            z = ((z - z.min()) / (z.max() - z.min())) * (
+                1 - -1
+            ) + -1  #  scale between -1 & 1
+            p = Periodogram(z, sampling=1)
+            p()
+            peak_idxs = find_peaks(p.psd, distance=self.min_dist, height=(0, np.inf))[0]
+            peak_order = p.psd[peak_idxs].argsort()[
+                -self.min_periods - self.max_freqs :
+            ][
+                ::-1
+            ]  # max PSDs found
+            peak_idxs = peak_idxs[peak_order]
+            if peak_idxs[0] < self.min_periods and not self.force_synchronization:
+                self.synchronize = False
+                logging.info(
+                    "no common oscillations found, switching off synchronization attempts"
+                )
+            elif max(peak_idxs[: self.min_periods]) >= self.min_periods:
+                idxs = peak_idxs[peak_idxs >= self.min_periods]
+                peak_freqs = [
+                    p.frequencies()[idx]
+                    for idx in idxs[: min(len(idxs), self.max_freqs)]
+                ]
+                self.freq_init = [2 * np.pi * f for f in peak_freqs]
+                logging.info(
+                    "found common oscillations at period(s) = {}".format(
+                        [1 / f for f in peak_freqs]
+                    )
+                )
+            else:
+                self.synchronize = False
+                logging.info(
+                    "no common oscillations found, switching off synchronization attempts"
+                )
+
+            # build and compile models (stage 2)
+            self.build(x.shape)
+
+        # pretraining step 2:
+        if sin_warmup > 0 and self.synchronize:
+            for epoch in range(sin_warmup):
+                logging.info(
+                    "\nStart of sine representation pre-train epoch %d" % (epoch,)
+                )
+                for step, (x_batch, t_batch) in enumerate(dataset):
+                    # Train the sine wave encoder
+                    with tf.GradientTape() as tape:
+                        # forward pass
+                        s = self.sincoder(t_batch)
+
+                        # compute loss
+                        s_loss = self.sincoder.loss(x_batch, s)  # median loss
+
+                    # retrieve gradients and update weights
+                    grads = tape.gradient(s_loss, self.sincoder.trainable_weights)
+                    self.sincoder.optimizer.apply_gradients(
+                        zip(grads, self.sincoder.trainable_weights)
+                    )
+                logging.info("sine_loss: {}".format(tf.reduce_mean(s_loss).numpy()))
+
+            # invert params (all amplitudes should either be -ve or +ve). Here we make them +ve
+            if pos_amp:
+                a_adj = tf.where(
+                    self.sincoder.layers[1].amp[:, 0] < 0,
+                    self.sincoder.layers[1].amp[:, 0] * -1,
+                    self.sincoder.layers[1].amp[:, 0],
+                )  # invert all -ve amplitudes
+                wb_adj = tf.where(
+                    self.sincoder.layers[1].amp[:, 0] < 0,
+                    self.sincoder.layers[1].wb[:, 0] + np.pi,
+                    self.sincoder.layers[1].wb[:, 0],
+                )  # shift inverted waves by half cycle
+                wb_adj = tf.where(
+                    wb_adj > 2 * np.pi, self.sincoder.layers[1].wb[:, 0] - np.pi, wb_adj
+                )  # any cycle > freq must be reduced by half the cycle
+                g_adj = tf.where(
+                    self.sincoder.layers[1].amp[:, 0] < 0,
+                    self.sincoder.layers[1].disp - a_adj,
+                    self.sincoder.layers[1].disp,
+                )  # adjust the vertical displacements after reversing amplitude signs
+                K.set_value(self.sincoder.layers[1].amp[:, 0], a_adj)
+                K.set_value(self.sincoder.layers[1].wb[:, 0], wb_adj)
+                K.set_value(self.sincoder.layers[1].disp, g_adj)
+
+        # train anomaly detector
+        for epoch in range(epochs):
+            logging.info("\nStart of epoch %d" % (epoch,))
+            if self.synchronize:
+                for step, (x_batch, t_batch) in enumerate(dataset):
+                    # Train the sine wave encoder
+                    with tf.GradientTape() as tape:
+                        # forward pass
+                        s = self.sincoder(t_batch)
+
+                        # compute loss
+                        s_loss = self.sincoder.loss(x_batch, s)  # median loss
+
+                    # retrieve gradients and update weights
+                    grads = tape.gradient(s_loss, self.sincoder.trainable_weights)
+                    self.sincoder.optimizer.apply_gradients(
+                        zip(grads, self.sincoder.trainable_weights)
+                    )
+
+                    # synchronize batch
+                    b = (
+                        self.sincoder.layers[1].wb / self.sincoder.layers[1].freq
+                    )  # phase shift(s)
+                    b_sync = b - tf.expand_dims(b[:, 0], axis=-1)
+                    th_sync = tf.expand_dims(
+                        tf.expand_dims(self.sincoder.layers[1].freq, axis=0), axis=0
+                    ) * (
+                        tf.expand_dims(t_batch, axis=-1)
+                        + tf.expand_dims(b_sync, axis=0)
+                    )  # synchronized angle
+                    e = (x_batch - s) * tf.sin(
+                        self.sincoder.layers[1].freq[0]
+                        * ((np.pi / (2 * self.sincoder.layers[1].freq[0])) - b[:, 0])
+                    )  # noise
+                    x_batch_sync = (
+                        tf.reduce_sum(
+                            tf.expand_dims(self.sincoder.layers[1].amp, axis=0)
+                            * tf.sin(th_sync),
+                            axis=-1,
+                        )
+                        + self.sincoder.layers[1].disp
+                        + e
+                    )
+
+                    # train the rancoders
+                    with tf.GradientTape() as tape:
+                        # forward pass
+                        o_hi, o_lo = self.rancoders(x_batch_sync)
+
+                        # compute losses
+                        o_hi_loss = self.rancoders.loss[0](
+                            tf.tile(
+                                tf.expand_dims(x_batch_sync, axis=0),
+                                (self.n_estimators, 1, 1),
+                            ),
+                            o_hi,
+                        )
+                        o_lo_loss = self.rancoders.loss[1](
+                            tf.tile(
+                                tf.expand_dims(x_batch_sync, axis=0),
+                                (self.n_estimators, 1, 1),
+                            ),
+                            o_lo,
+                        )
+                        o_loss = o_hi_loss + o_lo_loss
+
+                    # retrieve gradients and update weights
+                    grads = tape.gradient(o_loss, self.rancoders.trainable_weights)
+                    self.rancoders.optimizer.apply_gradients(
+                        zip(grads, self.rancoders.trainable_weights)
+                    )
+                logging.info(
+                    "sine_loss: {}, upper_bound_loss: {}, lower_bound_loss: {}".format(
+                        tf.reduce_mean(s_loss).numpy(),
+                        tf.reduce_mean(o_hi_loss).numpy(),
+                        tf.reduce_mean(o_lo_loss).numpy(),
+                    )
+                )
+            else:
+                for step, (x_batch, t_batch) in enumerate(dataset):
+                    # train the rancoders
+                    with tf.GradientTape() as tape:
+                        # forward pass
+                        o_hi, o_lo = self.rancoders(x_batch)
+
+                        # compute losses
+                        o_hi_loss = self.rancoders.loss[0](
+                            tf.tile(
+                                tf.expand_dims(x_batch, axis=0),
+                                (self.n_estimators, 1, 1),
+                            ),
+                            o_hi,
+                        )
+                        o_lo_loss = self.rancoders.loss[1](
+                            tf.tile(
+                                tf.expand_dims(x_batch, axis=0),
+                                (self.n_estimators, 1, 1),
+                            ),
+                            o_lo,
+                        )
+                        o_loss = o_hi_loss + o_lo_loss
+
+                    # retrieve gradients and update weights
+                    grads = tape.gradient(o_loss, self.rancoders.trainable_weights)
+                    self.rancoders.optimizer.apply_gradients(
+                        zip(grads, self.rancoders.trainable_weights)
+                    )
+                logging.info(
+                    "upper_bound_loss: {} lower_bound_loss: {}".format(
+                        tf.reduce_mean(o_hi_loss).numpy(),
+                        tf.reduce_mean(o_lo_loss).numpy(),
+                    )
+                )
+
+    def predict_prob(
+        self,
+        x: np.ndarray,
+        # t: np.ndarray,
+        N: int,
+        batch_size: int = 1000,
+        desync: bool = False,
+    ):
+        t = np.tile(np.array(range(x.shape[0])).reshape(-1, 1), (1, x.shape[1]))
+        # Prepare the inference batches (unshuffled, in time order).
+        dataset = tf.data.Dataset.from_tensor_slices(
+            (x.astype(np.float32), t.astype(np.float32))
+        )
+        dataset = dataset.batch(batch_size)
+        batches = int(np.ceil(x.shape[0] / batch_size))
+
+        # loop through the batches of the dataset.
+        if self.synchronize:
+            s, x_sync, o_hi, o_lo = (
+                [None] * batches,
+                [None] * batches,
+                [None] * batches,
+                [None] * batches,
+            )
+            for step, (x_batch, t_batch) in enumerate(dataset):
+                s_i = self.sincoder(t_batch).numpy()
+                b = (
+                    self.sincoder.layers[1].wb / self.sincoder.layers[1].freq
+                )  # phase shift(s)
+                b_sync = b - tf.expand_dims(b[:, 0], axis=-1)
+                th_sync = tf.expand_dims(
+                    tf.expand_dims(self.sincoder.layers[1].freq, axis=0), axis=0
+                ) * (
+                    tf.expand_dims(t_batch, axis=-1) + tf.expand_dims(b_sync, axis=0)
+                )  # synchronized angle
+                e = (x_batch - s_i) * tf.sin(
+                    self.sincoder.layers[1].freq[0]
+                    * ((np.pi / (2 * self.sincoder.layers[1].freq[0])) - b[:, 0])
+                )  # noise
+                x_sync_i = (
+                    tf.reduce_sum(
+                        tf.expand_dims(self.sincoder.layers[1].amp, axis=0)
+                        * tf.sin(th_sync),
+                        axis=-1,
+                    )
+                    + self.sincoder.layers[1].disp
+                    + e
+                ).numpy()
+                o_hi_i, o_lo_i = self.rancoders(x_sync_i)
+                o_hi_i, o_lo_i = (
+                    tf.transpose(o_hi_i, [1, 0, 2]).numpy(),
+                    tf.transpose(o_lo_i, [1, 0, 2]).numpy(),
+                )
+                if desync:
+                    o_hi_i, o_lo_i = self.predict_desynchronize(
+                        x_batch, x_sync_i, o_hi_i, o_lo_i
+                    )
+                s[step], x_sync[step], o_hi[step], o_lo[step] = (
+                    s_i,
+                    x_sync_i,
+                    o_hi_i,
+                    o_lo_i,
+                )
+            sins, synched, upper, lower = (
+                np.concatenate(s, axis=0),
+                np.concatenate(x_sync, axis=0),
+                np.concatenate(o_hi, axis=0),
+                np.concatenate(o_lo, axis=0),
+            )
+
+            synched_tiles = np.tile(
+                synched.reshape(synched.shape[0], 1, synched.shape[1]), (1, N, 1)
+            )
+            result = np.where((synched_tiles < lower) | (synched_tiles > upper), 1, 0)
+            inference = np.mean(np.mean(result, axis=1), axis=1)
+            return inference
+        else:
+            o_hi, o_lo = [None] * batches, [None] * batches
+            for step, (x_batch, t_batch) in enumerate(dataset):
+                o_hi_i, o_lo_i = self.rancoders(x_batch)
+                o_hi_i, o_lo_i = (
+                    tf.transpose(o_hi_i, [1, 0, 2]).numpy(),
+                    tf.transpose(o_lo_i, [1, 0, 2]).numpy(),
+                )
+                o_hi[step], o_lo[step] = o_hi_i, o_lo_i
+            return np.concatenate(o_hi, axis=0), np.concatenate(o_lo, axis=0)
+
+    def save(self, filepath: str = os.path.join(os.getcwd(), "ransyncoders.z")):
+        file = {"params": self.get_config()}
+        if self.synchronize:
+            file["freqcoder"] = {
+                "model": self.freqcoder.to_json(),
+                "weights": self.freqcoder.get_weights(),
+            }
+            file["sincoder"] = {
+                "model": self.sincoder.to_json(),
+                "weights": self.sincoder.get_weights(),
+            }
+        file["rancoders"] = {
+            "model": self.rancoders.to_json(),
+            "weights": self.rancoders.get_weights(),
+        }
+        dump(file, filepath, compress=True)
+
+    @classmethod
+    def load(cls, filepath: str = os.path.join(os.getcwd(), "ransyncoders.z")):
+        file = load(filepath)
+        instance = cls()
+        for param, val in file["params"].items():
+            setattr(instance, param, val)
+        if instance.synchronize:
+            instance.freqcoder = model_from_json(
+                file["freqcoder"]["model"], custom_objects={"freqcoder": freqcoder}
+            )
+            instance.freqcoder.set_weights(file["freqcoder"]["weights"])
+            instance.sincoder = model_from_json(
+                file["sincoder"]["model"], custom_objects={"sincoder": sincoder}
+            )
+            instance.sincoder.set_weights(file["sincoder"]["weights"])
+        instance.rancoders = model_from_json(
+            file["rancoders"]["model"], custom_objects={"RANCoders": RANCoders}
+        )
+        instance.rancoders.set_weights(file["rancoders"]["weights"])
+        return instance
+
+    def predict_desynchronize(
+        self, x: np.ndarray, x_sync: np.ndarray, o_hi: np.ndarray, o_lo: np.ndarray
+    ):
+        if self.synchronize:
+            E = (o_hi + o_lo) / 2  # expected values
+            deviation = (
+                tf.expand_dims(x_sync, axis=1) - E
+            )  # input (synchronized) deviation from expected
+            deviation = self.desynchronize(deviation)  # desynchronize
+            E = (
+                tf.expand_dims(x, axis=1) - deviation
+            )  # expected values in desynchronized form
+            offset = (o_hi - o_lo) / 2  # this is the offset from the expected value
+            offset = abs(self.desynchronize(offset))  # desynchronize
+            o_hi, o_lo = (
+                E + offset,
+                E - offset,
+            )  # add bound displacement to expected values
+            return o_hi.numpy(), o_lo.numpy()
+        else:
+            raise ParameterError(
+                "synchronize", "parameter not set correctly for this method"
+            )
+
+    def desynchronize(self, e: np.ndarray):
+        if self.synchronize:
+            b = (
+                self.sincoder.layers[1].wb / self.sincoder.layers[1].freq
+            )  # phase shift(s)
+            return (
+                e
+                * tf.sin(
+                    self.sincoder.layers[1].freq[0]
+                    * ((np.pi / (2 * self.sincoder.layers[1].freq[0])) + b[:, 0])
+                ).numpy()
+            )
+        else:
+            raise ParameterError(
+                "synchronize", "parameter not set correctly for this method"
+            )
+
+    def get_config(self):
+        config = {
+            "n_estimators": self.n_estimators,
+            "max_features": self.max_features,
+            "encoding_depth": self.encoding_depth,
+            "latent_dim": self.latent_dim,
+            "decoding_depth": self.decoding_depth,
+            "activation": self.activation,
+            "output_activation": self.output_activation,
+            "delta": self.delta,
+            "synchronize": self.synchronize,
+            "force_synchronization": self.force_synchronization,
+            "min_periods": self.min_periods,
+            "freq_init": self.freq_init,
+            "max_freqs": self.max_freqs,
+            "min_dist": self.min_dist,
+            "trainable_freq": self.trainable_freq,
+            "bias": self.bias,
+        }
+        return config
+
+
+# Loss function
+def quantile_loss(q, y, f):
+    e = y - f
+    return K.mean(K.maximum(q * e, (q - 1) * e), axis=-1)
+
+
+class ParameterError(Exception):
+    def __init__(self, expression, message):
+        self.expression = expression
+        self.message = message
diff --git a/networks/RANS/models.py b/networks/RANS/models.py
new file mode 100644
index 0000000..f7beda2
--- /dev/null
+++ b/networks/RANS/models.py
@@ -0,0 +1,224 @@
+# -*- coding: utf-8 -*-
+"""
+Created on Thu Dec 10 11:52:54 2020
+
+@author: aabdulaal
+................................................................................................................................
+"""
+import numpy as np
+import tensorflow as tf
+from tensorflow.python.keras.constraints import NonNeg
+from tensorflow.python.keras.initializers import Constant
+from tensorflow.python.keras.layers import Dense, Layer
+from tensorflow.python.keras.models import Model
+from typing import List, Optional
+
+# ==============================================================================================================================
+# SINCODER
+# ==============================================================================================================================
+class freqcoder(Layer):
+    """ 
+    Encode a multivariate series into a latent space of size 1 to extract its common oscillations (similar to PCA with one component).
+    """
+    def __init__(self, **kwargs):
+        super(freqcoder, self).__init__(**kwargs)
+        self.kwargs = kwargs
+        
+    def build(self, input_shape):
+        self.latent = Dense(1, activation='linear')
+        self.decoder = Dense(input_shape[-1], activation='linear')
+    
+    def call(self, inputs):
+        z = self.latent(inputs)
+        x_pred = self.decoder(z)
+        return z, x_pred
+    
+    def get_config(self):
+        base_config = super(freqcoder, self).get_config()
+        return dict(list(base_config.items()))
+    
+class sincoder(Layer):
+    """ Fit m sinusoidal waves to an input t-matrix (matrix of m epochtimes) """
+    def __init__(self, freq_init: Optional[List[float]] = None, max_freqs: int = 1, trainable_freq: bool = False, **kwargs):
+        super(sincoder, self).__init__(**kwargs)
+        self.freq_init = freq_init
+        if freq_init:
+            self.max_freqs = len(freq_init)
+        else:
+            self.max_freqs = max_freqs
+        self.trainable_freq = trainable_freq
+        self.kwargs = kwargs
+        
+    def build(self, input_shape):
+        self.amp = self.add_weight(shape=(input_shape[-1], self.max_freqs), initializer="zeros", trainable=True)
+        if self.freq_init and not self.trainable_freq:
+            self.freq = [self.add_weight(initializer=Constant(f), trainable=False) for f in self.freq_init]
+        elif self.freq_init:
+            self.freq = [self.add_weight(initializer=Constant(f), constraint=NonNeg(), trainable=True) for f in self.freq_init]
+        else:
+            self.freq = [
+                self.add_weight(initializer="zeros", constraint=NonNeg(), trainable=True) for f in range(self.max_freqs)
+            ]
+        self.wb = self.add_weight(
+            shape=(input_shape[-1], self.max_freqs), initializer="zeros", trainable=True
+        )  # angular frequency (w) x phase shift
+        self.disp = self.add_weight(shape=input_shape[-1], initializer="zeros", trainable=True)
+    
+    def call(self, inputs):
+        th = tf.expand_dims(
+            tf.expand_dims(self.freq, axis=0), axis=0
+        ) * tf.expand_dims(inputs, axis=-1) + tf.expand_dims(self.wb, axis=0)
+        return tf.reduce_sum(tf.expand_dims(self.amp, axis=0) * tf.sin(th), axis=-1) + self.disp
+    
+    def get_config(self):
+        base_config = super(sincoder, self).get_config()
+        config = {"freq_init": self.freq_init, "max_freqs": self.max_freqs, "trainable_freq": self.trainable_freq}
+        return dict(list(base_config.items()) + list(config.items()))
+
+# ==============================================================================================================================
+# RANCODER
+# ==============================================================================================================================
+class Encoder(Layer):
+    def __init__(self, latent_dim: int, activation: str, depth: int = 2, **kwargs,):
+        super(Encoder, self).__init__(**kwargs)
+        self.latent_dim = latent_dim
+        self.activation = activation
+        self.depth = depth
+        self.kwargs = kwargs
+        
+    def build(self, input_shape):
+        self.hidden = {
+            'hidden_{}'.format(i): Dense(
+                int(input_shape[-1] / (2**(i+1))), activation=self.activation,
+            ) for i in range(self.depth)
+        }
+        self.latent = Dense(self.latent_dim, activation=self.activation)
+        
+    def call(self, inputs):
+        x = self.hidden['hidden_0'](inputs)
+        for i in range(1, self.depth):
+            x = self.hidden['hidden_{}'.format(i)](x)
+        return self.latent(x)
+    
+    def get_config(self):
+        base_config = super(Encoder, self).get_config()
+        config = {"latent_dim": self.latent_dim, "activation": self.activation,"depth": self.depth,}
+        return dict(list(base_config.items()) + list(config.items()))
+    
+class Decoder(Layer):
+    def __init__(self, output_dim: int, activation: str, output_activation: str,depth: int, **kwargs):
+        super(Decoder, self).__init__(**kwargs)
+        self.output_dim = output_dim
+        self.activation = activation
+        self.output_activation = output_activation
+        self.depth = depth
+        self.kwargs = kwargs
+        
+    def build(self, input_shape):
+        self.hidden = {
+            'hidden_{}'.format(i): Dense(
+                int(self.output_dim/ (2**(self.depth-i))), activation=self.activation,
+            ) for i in range(self.depth)
+        }
+        self.restored = Dense(self.output_dim, activation=self.output_activation)
+        
+    def call(self, inputs):
+        x = self.hidden['hidden_0'](inputs)
+        for i in range(1, self.depth):
+            x = self.hidden['hidden_{}'.format(i)](x)
+        return self.restored(x)
+    
+    def get_config(self):
+        base_config = super(Decoder, self).get_config()
+        config = {
+            "output_dim": self.output_dim, 
+            "activation": self.activation, 
+            "output_activation": self.output_activation, 
+            "depth": self.depth,
+        }
+        return dict(list(base_config.items()) + list(config.items()))
+    
+class RANCoders(Layer):
+    def __init__(
+            self, 
+            n_estimators: int = 100,
+            max_features: int = 3,
+            encoding_depth: int = 2,
+            latent_dim: int = 2, 
+            decoding_depth: int = 2,
+            delta: float = 0.05,
+            activation: str = 'linear',
+            output_activation: str = 'linear',
+            **kwargs,
+    ):
+        super(RANCoders, self).__init__(**kwargs)
+        self.n_estimators = n_estimators
+        self.max_features = max_features
+        self.encoding_depth = encoding_depth
+        self.latent_dim = latent_dim
+        self.decoding_depth = decoding_depth
+        self.delta = delta
+        self.activation = activation
+        self.output_activation = output_activation
+        self.kwargs = kwargs
+        
+    def build(self, input_shape):
+        assert(input_shape[-1] > self.max_features)
+        self.encoders = {
+            'encoder_{}'.format(i): Encoder(
+                self.latent_dim, self.activation, depth=self.encoding_depth,
+            ) for i in range(self.n_estimators)
+        }
+        self.decoders_upper = {
+            'decoder_hi_{}'.format(i): Decoder(
+                input_shape[-1], self.activation, self.output_activation, self.decoding_depth
+            ) for i in range(self.n_estimators)
+        }
+        self.decoders_lower = {
+            'decoder_lo_{}'.format(i): Decoder(
+                input_shape[-1], self.activation, self.output_activation, self.decoding_depth
+            ) for i in range(self.n_estimators)
+        }
+        self.randsamples = tf.Variable(
+            np.concatenate(
+                [
+                    np.random.choice(
+                        input_shape[-1], replace=False, size=(1, self.max_features),
+                    ) for i in range(self.n_estimators)
+                ]
+            ), trainable=False
+        )  # the feature selector (bootstrapping)
+        
+    def call(self, inputs):
+        z = {
+            'z_{}'.format(i): self.encoders['encoder_{}'.format(i)](
+                tf.gather(inputs, self.randsamples[i], axis=-1)
+            ) for i in range(self.n_estimators)
+        }
+        w_hi = {
+            'w_{}'.format(i): self.decoders_upper['decoder_hi_{}'.format(i)](
+                z['z_{}'.format(i)]
+            ) for i in range(self.n_estimators)
+        }
+        w_lo = {
+            'w_{}'.format(i): self.decoders_lower['decoder_lo_{}'.format(i)](
+                z['z_{}'.format(i)]
+            ) for i in range(self.n_estimators)
+        }
+        o_hi = tf.concat([tf.expand_dims(i, axis=0) for i in w_hi.values()], axis=0)  
+        o_lo = tf.concat([tf.expand_dims(i, axis=0) for i in w_lo.values()], axis=0)
+        return o_hi, o_lo
+    
+    def get_config(self):
+        base_config = super(RANCoders, self).get_config()
+        config = {
+            "n_estimators": self.n_estimators,
+            "max_features": self.max_features,
+            "encoding_depth": self.encoding_depth,
+            "latent_dim": self.latent_dim,
+            "decoding_depth": self.decoding_depth,
+            "delta": self.delta,
+            "activation": self.activation,
+            "output_activation": self.output_activation,
+        }
+        return dict(list(base_config.items()) + list(config.items()))
\ No newline at end of file
diff --git a/networks/__init__.py b/networks/__init__.py
new file mode 100644
index 0000000..5b0ee68
--- /dev/null
+++ b/networks/__init__.py
@@ -0,0 +1,25 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+
+#   http://www.apache.org/licenses/LICENSE-2.0
+
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+# import pkgutil
+
+# __all__ = []
+# for loader, module_name, is_pkg in pkgutil.walk_packages(__path__):
+#     __all__.append(module_name)
+#     module = loader.find_module(module_name).load_module(module_name)
+#     exec('%s = module' % module_name)
diff --git a/networks/anomaly_transformer/__init__.py b/networks/anomaly_transformer/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/networks/anomaly_transformer/model/AnomalyTransformer.py b/networks/anomaly_transformer/model/AnomalyTransformer.py
new file mode 100644
index 0000000..bc4eb22
--- /dev/null
+++ b/networks/anomaly_transformer/model/AnomalyTransformer.py
@@ -0,0 +1,110 @@
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+from .attn import AnomalyAttention, AttentionLayer
+from .embed import DataEmbedding
+
+
+class EncoderLayer(nn.Module):
+    def __init__(self, attention, d_model, d_ff=None, dropout=0.1, activation="relu"):
+        super(EncoderLayer, self).__init__()
+        d_ff = d_ff or 4 * d_model
+        self.attention = attention
+        self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1)
+        self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1)
+        self.norm1 = nn.LayerNorm(d_model)
+        self.norm2 = nn.LayerNorm(d_model)
+        self.dropout = nn.Dropout(dropout)
+        self.activation = F.relu if activation == "relu" else F.gelu
+
+    def forward(self, x, attn_mask=None):
+        new_x, attn, mask, sigma = self.attention(x, x, x, attn_mask=attn_mask)
+        x = x + self.dropout(new_x)
+        y = x = self.norm1(x)
+        y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
+        y = self.dropout(self.conv2(y).transpose(-1, 1))
+
+        return self.norm2(x + y), attn, mask, sigma
+
+
+class Encoder(nn.Module):
+    def __init__(self, attn_layers, norm_layer=None):
+        super(Encoder, self).__init__()
+        self.attn_layers = nn.ModuleList(attn_layers)
+        self.norm = norm_layer
+
+    def forward(self, x, attn_mask=None):
+        # x [B, L, D]
+        series_list = []
+        prior_list = []
+        sigma_list = []
+        for attn_layer in self.attn_layers:
+            x, series, prior, sigma = attn_layer(x, attn_mask=attn_mask)
+            series_list.append(series)
+            prior_list.append(prior)
+            sigma_list.append(sigma)
+
+        if self.norm is not None:
+            x = self.norm(x)
+
+        return x, series_list, prior_list, sigma_list
+
+
+class Anomaly_Transformer(nn.Module):
+    def __init__(
+        self,
+        win_size,
+        device,
+        enc_in,
+        c_out,
+        d_model=512,
+        n_heads=8,
+        e_layers=3,
+        d_ff=512,
+        dropout=0.0,
+        activation="gelu",
+        output_attention=True,
+    ):
+        super(Anomaly_Transformer, self).__init__()
+        self.output_attention = output_attention
+
+        # Encoding
+        self.embedding = DataEmbedding(enc_in, d_model, dropout)
+
+        # Encoder
+        self.encoder = Encoder(
+            [
+                EncoderLayer(
+                    AttentionLayer(
+                        AnomalyAttention(
+                            win_size,
+                            device,
+                            False,
+                            attention_dropout=dropout,
+                            output_attention=output_attention,
+                        ),
+                        d_model,
+                        n_heads,
+                    ),
+                    d_model,
+                    d_ff,
+                    dropout=dropout,
+                    activation=activation,
+                )
+                for _ in range(e_layers)
+            ],
+            norm_layer=torch.nn.LayerNorm(d_model),
+        )
+
+        self.projection = nn.Linear(d_model, c_out, bias=True)
+
+    def forward(self, x):
+        enc_out = self.embedding(x)
+        enc_out, series, prior, sigmas = self.encoder(enc_out)
+        enc_out = self.projection(enc_out)
+
+        if self.output_attention:
+            return enc_out, series, prior, sigmas
+        else:
+            return enc_out  # [B, L, D]
diff --git a/networks/anomaly_transformer/model/__init__.py b/networks/anomaly_transformer/model/__init__.py
new file mode 100755
index 0000000..e69de29
diff --git a/networks/anomaly_transformer/model/attn.py b/networks/anomaly_transformer/model/attn.py
new file mode 100644
index 0000000..24900fa
--- /dev/null
+++ b/networks/anomaly_transformer/model/attn.py
@@ -0,0 +1,112 @@
+import torch
+import torch.nn as nn
+import numpy as np
+import math
+from math import sqrt
+
+
+class TriangularCausalMask:
+    def __init__(self, B, L, device="cpu"):
+        mask_shape = [B, 1, L, L]
+        with torch.no_grad():
+            self._mask = torch.triu(
+                torch.ones(mask_shape, dtype=torch.bool), diagonal=1
+            ).to(device)
+
+    @property
+    def mask(self):
+        return self._mask
+
+
+class AnomalyAttention(nn.Module):
+    def __init__(
+        self,
+        win_size,
+        device,
+        mask_flag=True,
+        scale=None,
+        attention_dropout=0.0,
+        output_attention=False,
+    ):
+        super(AnomalyAttention, self).__init__()
+        self.device = device
+        self.scale = scale
+        self.mask_flag = mask_flag
+        self.output_attention = output_attention
+        self.dropout = nn.Dropout(attention_dropout)
+        window_size = win_size
+        self.distances = torch.zeros((window_size, window_size)).to(self.device)
+        for i in range(window_size):
+            for j in range(window_size):
+                self.distances[i][j] = abs(i - j)
+
+    def forward(self, queries, keys, values, sigma, attn_mask):
+        B, L, H, E = queries.shape
+        _, S, _, D = values.shape
+        scale = self.scale or 1.0 / sqrt(E)
+
+        scores = torch.einsum("blhe,bshe->bhls", queries, keys)
+        if self.mask_flag:
+            if attn_mask is None:
+                attn_mask = TriangularCausalMask(B, L, device=queries.device)
+            scores.masked_fill_(attn_mask.mask, -np.inf)
+        attn = scale * scores
+
+        sigma = sigma.transpose(1, 2)  # B L H ->  B H L
+        window_size = attn.shape[-1]
+        sigma = torch.sigmoid(sigma * 5) + 1e-5
+        sigma = torch.pow(3, sigma) - 1
+        sigma = sigma.unsqueeze(-1).repeat(1, 1, 1, window_size)  # B H L L
+        prior = (
+            self.distances.unsqueeze(0)
+            .unsqueeze(0)
+            .repeat(sigma.shape[0], sigma.shape[1], 1, 1)
+            .to(self.device)
+        )
+        prior = (
+            1.0
+            / (math.sqrt(2 * math.pi) * sigma)
+            * torch.exp(-(prior ** 2) / 2 / (sigma ** 2))
+        )
+
+        series = self.dropout(torch.softmax(attn, dim=-1))
+        V = torch.einsum("bhls,bshd->blhd", series, values)
+
+        if self.output_attention:
+            return (V.contiguous(), series, prior, sigma)
+        else:
+            return (V.contiguous(), None, None, None)
+
+
+class AttentionLayer(nn.Module):
+    def __init__(self, attention, d_model, n_heads, d_keys=None, d_values=None):
+        super(AttentionLayer, self).__init__()
+
+        d_keys = d_keys or (d_model // n_heads)
+        d_values = d_values or (d_model // n_heads)
+        self.norm = nn.LayerNorm(d_model)
+        self.inner_attention = attention
+        self.query_projection = nn.Linear(d_model, d_keys * n_heads)
+        self.key_projection = nn.Linear(d_model, d_keys * n_heads)
+        self.value_projection = nn.Linear(d_model, d_values * n_heads)
+        self.sigma_projection = nn.Linear(d_model, n_heads)
+        self.out_projection = nn.Linear(d_values * n_heads, d_model)
+
+        self.n_heads = n_heads
+
+    def forward(self, queries, keys, values, attn_mask):
+        B, L, _ = queries.shape
+        _, S, _ = keys.shape
+        H = self.n_heads
+        x = queries
+        queries = self.query_projection(queries).view(B, L, H, -1)
+        keys = self.key_projection(keys).view(B, S, H, -1)
+        values = self.value_projection(values).view(B, S, H, -1)
+        sigma = self.sigma_projection(x).view(B, L, H)
+
+        out, series, prior, sigma = self.inner_attention(
+            queries, keys, values, sigma, attn_mask
+        )
+        out = out.view(B, L, -1)
+
+        return self.out_projection(out), series, prior, sigma
diff --git a/networks/anomaly_transformer/model/embed.py b/networks/anomaly_transformer/model/embed.py
new file mode 100755
index 0000000..7a9b532
--- /dev/null
+++ b/networks/anomaly_transformer/model/embed.py
@@ -0,0 +1,54 @@
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from torch.nn.utils import weight_norm
+import math
+
+
+class PositionalEmbedding(nn.Module):
+    def __init__(self, d_model, max_len=5000):
+        super(PositionalEmbedding, self).__init__()
+        # Compute the positional encodings once in log space.
+        pe = torch.zeros(max_len, d_model).float()
+        pe.requires_grad = False
+
+        position = torch.arange(0, max_len).float().unsqueeze(1)
+        div_term = (torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model)).exp()
+
+        pe[:, 0::2] = torch.sin(position * div_term)
+        pe[:, 1::2] = torch.cos(position * div_term)
+
+        pe = pe.unsqueeze(0)
+        self.register_buffer('pe', pe)
+
+    def forward(self, x):
+        return self.pe[:, :x.size(1)]
+
+
+class TokenEmbedding(nn.Module):
+    def __init__(self, c_in, d_model):
+        super(TokenEmbedding, self).__init__()
+        padding = 1 if torch.__version__ >= '1.5.0' else 2
+        self.tokenConv = nn.Conv1d(in_channels=c_in, out_channels=d_model,
+                                   kernel_size=3, padding=padding, padding_mode='circular', bias=False)
+        for m in self.modules():
+            if isinstance(m, nn.Conv1d):
+                nn.init.kaiming_normal_(m.weight, mode='fan_in', nonlinearity='leaky_relu')
+
+    def forward(self, x):
+        x = self.tokenConv(x.permute(0, 2, 1)).transpose(1, 2)
+        return x
+
+
+class DataEmbedding(nn.Module):
+    def __init__(self, c_in, d_model, dropout=0.0):
+        super(DataEmbedding, self).__init__()
+
+        self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model)
+        self.position_embedding = PositionalEmbedding(d_model=d_model)
+
+        self.dropout = nn.Dropout(p=dropout)
+
+    def forward(self, x):
+        x = self.value_embedding(x) + self.position_embedding(x)
+        return self.dropout(x)
diff --git a/networks/anomaly_transformer/solver.py b/networks/anomaly_transformer/solver.py
new file mode 100644
index 0000000..ec436ea
--- /dev/null
+++ b/networks/anomaly_transformer/solver.py
@@ -0,0 +1,385 @@
+import logging
+import torch
+import torch.nn as nn
+import numpy as np
+import os
+import time
+
+from common.utils import set_device
+
+from .model.AnomalyTransformer import Anomaly_Transformer
+
+torch.autograd.set_detect_anomaly(True)
+
+
+def my_kl_loss(p, q):
+    res = p * (torch.log(p + 0.0001) - torch.log(q + 0.0001))
+    return torch.mean(torch.sum(res, dim=-1), dim=1)
+
+
+def adjust_learning_rate(optimizer, epoch, lr_):
+    lr_adjust = {epoch: lr_ * (0.5 ** ((epoch - 1) // 1))}
+    if epoch in lr_adjust.keys():
+        lr = lr_adjust[epoch]
+        for param_group in optimizer.param_groups:
+            param_group["lr"] = lr
+        logging.info("Updating learning rate to {}".format(lr))
+
+
+class EarlyStopping:
+    def __init__(self, patience=7, verbose=False, delta=0):
+        self.patience = patience
+        self.verbose = verbose
+        self.counter = 0
+        self.best_score = None
+        self.best_score2 = None
+        self.early_stop = False
+        self.val_loss_min = np.inf
+        self.val_loss2_min = np.inf
+        self.delta = delta
+
+    def __call__(self, val_loss, val_loss2, model, path):
+        score = -val_loss
+        score2 = -val_loss2
+        if self.best_score is None:
+            self.best_score = score
+            self.best_score2 = score2
+            self.save_checkpoint(val_loss, val_loss2, model, path)
+        elif (
+            score < self.best_score + self.delta
+            or score2 < self.best_score2 + self.delta
+        ):
+            self.counter += 1
+            logging.info(
+                f"EarlyStopping counter: {self.counter} out of {self.patience}"
+            )
+            if self.counter >= self.patience:
+                self.early_stop = True
+        else:
+            self.best_score = score
+            self.best_score2 = score2
+            self.save_checkpoint(val_loss, val_loss2, model, path)
+            self.counter = 0
+
+    def save_checkpoint(self, val_loss, val_loss2, model, path):
+        if self.verbose:
+            logging.info(
+                f"Validation loss decreased ({self.val_loss_min:.6f} --> {val_loss:.6f}).  Saving model ..."
+            )
+        torch.save(
+            model.state_dict(),
+            os.path.join(path, "checkpoint.pth"),
+        )
+        self.val_loss_min = val_loss
+        self.val_loss2_min = val_loss2
+
+
+class AnomalyTransformer(object):
+    DEFAULTS = {}
+
+    def __init__(self, **config):
+
+        self.__dict__.update(AnomalyTransformer.DEFAULTS, **config)
+
+        self.device = set_device(self.device)
+        self.build_model()
+        self.criterion = nn.MSELoss(reduction="mean")
+
+    def build_model(self):
+        self.model = Anomaly_Transformer(
+            win_size=self.win_size,
+            device=self.device,
+            enc_in=self.input_c,
+            c_out=self.output_c,
+            e_layers=3,
+        )
+        self.optimizer = torch.optim.Adam(self.model.parameters(), lr=self.lr)
+        self.model.to(self.device)
+
+    def vali(self, vali_loader):
+        self.model.eval()
+
+        loss_1 = []
+        loss_2 = []
+        for i, (input_data, _) in enumerate(vali_loader):
+            input = input_data.float().to(self.device)
+            output, series, prior, _ = self.model(input)
+            series_loss = 0.0
+            prior_loss = 0.0
+            for u in range(len(prior)):
+                series_loss = (
+                    series_loss
+                    + torch.mean(
+                        my_kl_loss(
+                            series[u],
+                            (
+                                prior[u]
+                                / torch.unsqueeze(
+                                    torch.sum(prior[u], dim=-1), dim=-1
+                                ).repeat(1, 1, 1, self.win_size)
+                            ).detach(),
+                        )
+                    )
+                    + torch.mean(
+                        my_kl_loss(
+                            (
+                                prior[u]
+                                / torch.unsqueeze(
+                                    torch.sum(prior[u], dim=-1), dim=-1
+                                ).repeat(1, 1, 1, self.win_size)
+                            ).detach(),
+                            series[u],
+                        )
+                    )
+                )
+                prior_loss = (
+                    prior_loss
+                    + torch.mean(
+                        my_kl_loss(
+                            (
+                                prior[u]
+                                / torch.unsqueeze(
+                                    torch.sum(prior[u], dim=-1), dim=-1
+                                ).repeat(1, 1, 1, self.win_size)
+                            ),
+                            series[u].detach(),
+                        )
+                    )
+                    + torch.mean(
+                        my_kl_loss(
+                            series[u].detach(),
+                            (
+                                prior[u]
+                                / torch.unsqueeze(
+                                    torch.sum(prior[u], dim=-1), dim=-1
+                                ).repeat(1, 1, 1, self.win_size)
+                            ),
+                        )
+                    )
+                )
+            series_loss = series_loss / len(prior)
+            prior_loss = prior_loss / len(prior)
+
+            rec_loss = self.criterion(output, input)
+            loss_1.append((rec_loss - self.k * series_loss).item())
+            loss_2.append((rec_loss + self.k * prior_loss).item())
+
+        return np.average(loss_1), np.average(loss_2)
+
+    def fit(self, train_loader, vali_loader):
+        self.train_loader = train_loader
+        self.vali_loader = vali_loader
+
+        logging.info("======================TRAIN MODE======================")
+
+        time_now = time.time()
+        path = self.model_save_path
+        if not os.path.exists(path):
+            os.makedirs(path)
+        early_stopping = EarlyStopping(patience=3, verbose=True)
+        train_steps = len(self.train_loader)
+
+        for epoch in range(self.num_epochs):
+            iter_count = 0
+            loss1_list = []
+
+            epoch_time = time.time()
+            self.model.train()
+            for i, input_data in enumerate(self.train_loader):
+
+                self.optimizer.zero_grad()
+                iter_count = iter_count + 1
+                input = input_data.float().to(self.device)
+
+                output, series, prior, _ = self.model(input)
+
+                # calculate Association discrepancy
+                series_loss = 0.0
+                prior_loss = 0.0
+                for u in range(len(prior)):
+                    series_loss = (
+                        series_loss
+                        + torch.mean(
+                            my_kl_loss(
+                                series[u],
+                                (
+                                    prior[u]
+                                    / torch.unsqueeze(
+                                        torch.sum(prior[u], dim=-1), dim=-1
+                                    ).repeat(1, 1, 1, self.win_size)
+                                ).detach(),
+                            )
+                        )
+                        + torch.mean(
+                            my_kl_loss(
+                                (
+                                    prior[u]
+                                    / torch.unsqueeze(
+                                        torch.sum(prior[u], dim=-1), dim=-1
+                                    ).repeat(1, 1, 1, self.win_size)
+                                ).detach(),
+                                series[u],
+                            )
+                        )
+                    )
+                    prior_loss = (
+                        prior_loss
+                        + torch.mean(
+                            my_kl_loss(
+                                (
+                                    prior[u]
+                                    / torch.unsqueeze(
+                                        torch.sum(prior[u], dim=-1), dim=-1
+                                    ).repeat(1, 1, 1, self.win_size)
+                                ),
+                                series[u].detach(),
+                            )
+                        )
+                        + torch.mean(
+                            my_kl_loss(
+                                series[u].detach(),
+                                (
+                                    prior[u]
+                                    / torch.unsqueeze(
+                                        torch.sum(prior[u], dim=-1), dim=-1
+                                    ).repeat(1, 1, 1, self.win_size)
+                                ),
+                            )
+                        )
+                    )
+                series_loss = series_loss / len(prior)
+                prior_loss = prior_loss / len(prior)
+
+                rec_loss = self.criterion(output, input)
+
+                loss1_list.append((rec_loss - self.k * series_loss).item())
+                loss1 = rec_loss - self.k * series_loss
+                loss2 = rec_loss + self.k * prior_loss
+
+                if (i + 1) % 100 == 0:
+                    speed = (time.time() - time_now) / iter_count
+                    left_time = speed * ((self.num_epochs - epoch) * train_steps - i)
+                    logging.info(
+                        "\tspeed: {:.4f}s/iter; left time: {:.4f}s".format(
+                            speed, left_time
+                        )
+                    )
+                    iter_count = 0
+                    time_now = time.time()
+
+                # Minimax strategy
+                loss1.backward(retain_graph=True)
+                loss2.backward()
+                self.optimizer.step()
+                # self.optimizer.step()
+
+            logging.info(
+                "Epoch: {} cost time: {}".format(epoch + 1, time.time() - epoch_time)
+            )
+            train_loss = np.average(loss1_list)
+
+            if self.vali_loader is not None:
+                vali_loss1, vali_loss2 = self.vali(self.vali_loader)
+                early_stopping(vali_loss1, vali_loss2, self.model, path)
+                if early_stopping.early_stop:
+                    logging.info("Early stopping")
+                    break
+            else:
+                torch.save(
+                    self.model.state_dict(),
+                    os.path.join(path, "checkpoint.pth"),
+                )
+                vali_loss1 = 0
+            logging.info(
+                "Epoch: {0}, Steps: {1} | Train Loss: {2:.7f} Vali Loss: {3:.7f} ".format(
+                    epoch + 1, train_steps, train_loss, vali_loss1
+                )
+            )
+            adjust_learning_rate(self.optimizer, epoch + 1, self.lr)
+
+    def predict_prob(self, test_loader, windows_label=None):
+        self.test_loader = test_loader
+        self.model.load_state_dict(
+            torch.load(os.path.join(str(self.model_save_path), "checkpoint.pth"))
+        )
+        self.model.eval()
+        temperature = 50
+
+        logging.info("======================TEST MODE======================")
+        criterion = nn.MSELoss(reduction="none")
+        # evaluation on the test set
+        test_labels = []
+        attens_energy = []
+        for i, input_data in enumerate(self.test_loader):
+            input = input_data.float().to(self.device)
+            output, series, prior, _ = self.model(input)
+
+            loss = torch.mean(criterion(input, output), dim=-1)
+
+            series_loss = 0.0
+            prior_loss = 0.0
+            for u in range(len(prior)):
+                if u == 0:
+                    series_loss = (
+                        my_kl_loss(
+                            series[u],
+                            (
+                                prior[u]
+                                / torch.unsqueeze(
+                                    torch.sum(prior[u], dim=-1), dim=-1
+                                ).repeat(1, 1, 1, self.win_size)
+                            ).detach(),
+                        )
+                        * temperature
+                    )
+                    prior_loss = (
+                        my_kl_loss(
+                            (
+                                prior[u]
+                                / torch.unsqueeze(
+                                    torch.sum(prior[u], dim=-1), dim=-1
+                                ).repeat(1, 1, 1, self.win_size)
+                            ),
+                            series[u].detach(),
+                        )
+                        * temperature
+                    )
+                else:
+                    series_loss = series_loss + (
+                        my_kl_loss(
+                            series[u],
+                            (
+                                prior[u]
+                                / torch.unsqueeze(
+                                    torch.sum(prior[u], dim=-1), dim=-1
+                                ).repeat(1, 1, 1, self.win_size)
+                            ).detach(),
+                        )
+                        * temperature
+                    )
+                    prior_loss = prior_loss + (
+                        my_kl_loss(
+                            (
+                                prior[u]
+                                / torch.unsqueeze(
+                                    torch.sum(prior[u], dim=-1), dim=-1
+                                ).repeat(1, 1, 1, self.win_size)
+                            ),
+                            series[u].detach(),
+                        )
+                        * temperature
+                    )
+            metric = torch.softmax((-series_loss - prior_loss), dim=-1)
+
+            cri = metric * loss
+            cri = cri.detach().cpu().numpy()
+            attens_energy.append(cri)
+
+        attens_energy = np.concatenate(attens_energy, axis=0)
+        anomaly_score = np.array(attens_energy).mean(axis=1)
+
+        if windows_label is not None:
+            windows_label = (np.sum(windows_label, axis=1) >= 1) + 0
+            return anomaly_score, windows_label
+        else:
+            return anomaly_score
diff --git a/networks/dagmm/__init__.py b/networks/dagmm/__init__.py
new file mode 100644
index 0000000..cc05500
--- /dev/null
+++ b/networks/dagmm/__init__.py
@@ -0,0 +1,6 @@
+# -*- coding: utf-8 -*-
+
+from .compression_net import CompressionNet
+from .estimation_net import EstimationNet
+from .gmm import GMM
+from .dagmm import DAGMM
diff --git a/networks/dagmm/compression_net.py b/networks/dagmm/compression_net.py
new file mode 100644
index 0000000..ed9ab7f
--- /dev/null
+++ b/networks/dagmm/compression_net.py
@@ -0,0 +1,119 @@
+import tensorflow as tf
+
+class CompressionNet:
+    """ Compression Network.
+    This network converts the input data to the representations
+    suitable for calculating anomaly scores in the "Estimation Network".
+
+    The network output consists of the following 2 components:
+    1) reduced low-dimensional representations learned by the AutoEncoder.
+    2) the features derived from the reconstruction error.
+    """
+    def __init__(self, hidden_layer_sizes, activation=tf.nn.tanh):
+        """
+        Parameters
+        ----------
+        hidden_layer_sizes : list of int
+            list of the size of hidden layers.
+            For example, if the sizes are [n1, n2],
+            the sizes of created networks are:
+            input_size -> n1 -> n2 -> n1 -> input_size
+            (network outputs the representation of "n2" layer)
+        activation : function
+            activation function of hidden layer.
+            the last layer uses linear function.
+        """
+        self.hidden_layer_sizes = hidden_layer_sizes
+        self.activation = activation
+
+    def compress(self, x):
+        self.input_size = x.shape[1]
+
+        with tf.variable_scope("Encoder"):
+            z = x
+            n_layer = 0
+            for size in self.hidden_layer_sizes[:-1]:
+                n_layer += 1
+                z = tf.layers.dense(z, size, activation=self.activation,
+                    name="layer_{}".format(n_layer))
+
+            # activation function of last layer is linear
+            n_layer += 1
+            z = tf.layers.dense(z, self.hidden_layer_sizes[-1],
+                name="layer_{}".format(n_layer))
+
+        return z
+
+    def reverse(self, z):
+        with tf.variable_scope("Decoder"):
+            n_layer = 0
+            for size in self.hidden_layer_sizes[:-1][::-1]:
+                n_layer += 1
+                z = tf.layers.dense(z, size, activation=self.activation,
+                    name="layer_{}".format(n_layer))
+
+            # activation function of last layer is linear
+            n_layer += 1
+            x_dash = tf.layers.dense(z, self.input_size,
+                name="layer_{}".format(n_layer))
+
+        return x_dash
+
+    def loss(self, x, x_dash):
+        def euclid_norm(x):
+            return tf.sqrt(tf.reduce_sum(tf.square(x), axis=1))
+
+        # Calculate Euclid norm, distance
+        norm_x = euclid_norm(x)
+        norm_x_dash = euclid_norm(x_dash)
+        dist_x = euclid_norm(x - x_dash)
+        dot_x = tf.reduce_sum(x * x_dash, axis=1)
+
+        # Based on the original paper, features of reconstruction error
+        # are composed of these loss functions:
+        #  1. loss_E : relative Euclidean distance
+        #  2. loss_C : cosine distance, 0.5 * (1 - cosine similarity)
+        min_val = 1e-3
+        loss_E = dist_x  / (norm_x + min_val)
+        loss_C = 0.5 * (1.0 - dot_x / (norm_x * norm_x_dash + min_val))
+        return tf.concat([loss_E[:,None], loss_C[:,None]], axis=1)
+
+    def extract_feature(self, x, x_dash, z_c):
+        z_r = self.loss(x, x_dash)
+        return tf.concat([z_c, z_r], axis=1)
+
+    def inference(self, x):
+        """ convert input to output tensor, which is composed of
+        low-dimensional representation and reconstruction error.
+
+        Parameters
+        ----------
+        x : tf.Tensor shape : (n_samples, n_features)
+            Input data
+
+        Returns
+        -------
+        z : tf.Tensor shape : (n_samples, n2 + 2)
+            Result data
+            Second dimension of this data is equal to
+            sum of compressed representation size and
+            number of loss function (=2)
+
+        x_dash : tf.Tensor shape : (n_samples, n_features)
+            Reconstructed data for calculation of
+            reconstruction error.
+        """
+
+        with tf.variable_scope("CompNet"):
+            # AutoEncoder
+            z_c = self.compress(x)
+            x_dash = self.reverse(z_c)
+
+            # compose feature vector
+            z = self.extract_feature(x, x_dash, z_c)
+
+        return z, x_dash
+
+    def reconstruction_error(self, x, x_dash):
+        return tf.reduce_mean(tf.reduce_sum(
+            tf.square(x - x_dash), axis=1), axis=0)
diff --git a/networks/dagmm/dagmm.py b/networks/dagmm/dagmm.py
new file mode 100644
index 0000000..00ec013
--- /dev/null
+++ b/networks/dagmm/dagmm.py
@@ -0,0 +1,252 @@
+import tensorflow as tf
+import numpy as np
+import time
+import joblib
+import logging
+
+from .compression_net import CompressionNet
+from .estimation_net import EstimationNet
+from .gmm import GMM
+
+from os import makedirs
+from os.path import exists, join
+
+
+class DAGMM:
+    """Deep Autoencoding Gaussian Mixture Model.
+
+    This implementation is based on the paper:
+    Bo Zong+ (2018) Deep Autoencoding Gaussian Mixture Model
+    for Unsupervised Anomaly Detection, ICLR 2018
+    (this is an UNOFFICIAL implementation)
+    """
+
+    MODEL_FILENAME = "DAGMM_model"
+    SCALER_FILENAME = "DAGMM_scaler"
+
+    def __init__(
+        self,
+        comp_hiddens,
+        est_hiddens,
+        est_dropout_ratio=0.5,
+        minibatch_size=1024,
+        epoch_size=100,
+        learning_rate=0.0001,
+        lambda1=0.1,
+        lambda2=0.0001,
+        comp_activation=tf.nn.tanh,
+        est_activation=tf.nn.tanh,
+    ):
+        """
+        Parameters
+        ----------
+        comp_hiddens : list of int
+            sizes of hidden layers of compression network
+            For example, if the sizes are [n1, n2],
+            structure of compression network is:
+            input_size -> n1 -> n2 -> n1 -> input_size
+        comp_activation : function
+            activation function of compression network
+        est_hiddens : list of int
+            sizes of hidden layers of estimation network.
+            The last element of this list is assigned as n_comp.
+            For example, if the sizes are [n1, n2],
+            structure of estimation network is:
+            input_size -> n1 -> n2 (= n_comp)
+        est_activation : function
+            activation function of estimation network
+        est_dropout_ratio : float (optional)
+            dropout ratio of estimation network applied during training
+            if 0 or None, dropout is not applied.
+        minibatch_size: int (optional)
+            mini batch size during training
+        epoch_size : int (optional)
+            epoch size during training
+        learning_rate : float (optional)
+            learning rate during training
+        lambda1 : float (optional)
+            a parameter of loss function (for energy term)
+        lambda2 : float (optional)
+            a parameter of loss function
+            (for sum of diagonal elements of covariance)
+        random_seed : int (optional)
+            random seed used when fit() is called.
+        """
+        self.comp_net = CompressionNet(comp_hiddens, comp_activation)
+        self.est_net = EstimationNet(est_hiddens, est_activation)
+        self.est_dropout_ratio = est_dropout_ratio
+
+        n_comp = est_hiddens[-1]
+        self.gmm = GMM(n_comp)
+
+        self.minibatch_size = minibatch_size
+        self.epoch_size = epoch_size
+        self.learning_rate = learning_rate
+        self.lambda1 = lambda1
+        self.lambda2 = lambda2
+
+        self.scaler = None
+
+        self.graph = None
+        self.sess = None
+
+        self.time_tracker = {}
+
+    def __del__(self):
+        if self.sess is not None:
+            self.sess.close()
+
+    def fit(self, x):
+        """Fit the DAGMM model according to the given data.
+
+        Parameters
+        ----------
+        x : array-like, shape (n_samples, n_features)
+            Training data.
+        """
+        n_samples, n_features = x.shape
+
+        with tf.Graph().as_default() as graph:
+            self.graph = graph
+
+            # Create Placeholder
+            self.input = input = tf.placeholder(
+                dtype=tf.float32, shape=[None, n_features]
+            )
+            self.drop = drop = tf.placeholder(dtype=tf.float32, shape=[])
+
+            # Build graph
+            z, x_dash = self.comp_net.inference(input)
+            gamma = self.est_net.inference(z, drop)
+            self.gmm.fit(z, gamma)
+            energy = self.gmm.energy(z)
+
+            self.x_dash = x_dash
+
+            # Loss function
+            loss = (
+                self.comp_net.reconstruction_error(input, x_dash)
+                + self.lambda1 * tf.reduce_mean(energy)
+                + self.lambda2 * self.gmm.cov_diag_loss()
+            )
+
+            # Minimizer
+            minimizer = tf.train.AdamOptimizer(self.learning_rate).minimize(loss)
+
+            # Number of batches
+            n_batch = (n_samples - 1) // self.minibatch_size + 1
+
+            # Create TensorFlow session and initialize variables
+            init = tf.global_variables_initializer()
+
+            config = tf.ConfigProto(allow_soft_placement=True)
+            config.gpu_options.allow_growth = True
+            self.sess = tf.Session(graph=graph, config=config)
+            self.sess.run(init)
+
+            # Training
+            idx = np.arange(x.shape[0])
+            np.random.shuffle(idx)
+
+            start = time.time()
+            for epoch in range(1, self.epoch_size + 1):
+                for batch in range(n_batch):
+                    i_start = batch * self.minibatch_size
+                    i_end = (batch + 1) * self.minibatch_size
+                    x_batch = x[idx[i_start:i_end]]
+
+                    self.sess.run(
+                        minimizer,
+                        feed_dict={input: x_batch, drop: self.est_dropout_ratio},
+                    )
+
+                if epoch % 5 == 0:
+                    loss_val = self.sess.run(loss, feed_dict={input: x, drop: 0})
+                    logging.info(
+                        " epoch {}/{} : train loss = {:.3f}".format(
+                            epoch, self.epoch_size, loss_val
+                        )
+                    )
+            end = time.time()
+            self.time_tracker["train"] = end - start
+
+            # Fix GMM parameter
+            fix = self.gmm.fix_op()
+            self.sess.run(fix, feed_dict={input: x, drop: 0})
+            self.energy = self.gmm.energy(z)
+
+            tf.add_to_collection("save", self.input)
+            tf.add_to_collection("save", self.energy)
+
+            self.saver = tf.train.Saver()
+
+    def predict_prob(self, x):
+        """Calculate anomaly scores (sample energy) on samples in x.
+
+        Parameters
+        ----------
+        x : array-like, shape (n_samples, n_features)
+            Data for which anomaly scores are calculated.
+            n_features must be equal to n_features of the fitted data.
+
+        Returns
+        -------
+        energies : array-like, shape (n_samples)
+            Calculated sample energies.
+        """
+        if self.sess is None:
+            raise Exception("Trained model does not exist.")
+
+        start = time.time()
+        energies = self.sess.run(self.energy, feed_dict={self.input: x})
+        end = time.time()
+        self.time_tracker["test"] = end - start
+        return energies
+
+    def save(self, fdir):
+        """Save trained model to designated directory.
+        This method must be called after training
+        (otherwise an exception is raised).
+
+        Parameters
+        ----------
+        fdir : str
+            Path of the directory where the trained model is saved.
+            If it does not exist, it is created automatically.
+        """
+        if self.sess is None:
+            raise Exception("Trained model does not exist.")
+
+        if not exists(fdir):
+            makedirs(fdir)
+
+        model_path = join(fdir, self.MODEL_FILENAME)
+        self.saver.save(self.sess, model_path)
+
+    def restore(self, fdir):
+        """Restore trained model from designated directory.
+
+        Parameters
+        ----------
+        fdir : str
+            Path of the directory where the trained model is saved.
+        """
+        if not exists(fdir):
+            raise Exception("Model directory does not exist.")
+
+        model_path = join(fdir, self.MODEL_FILENAME)
+        meta_path = model_path + ".meta"
+
+        with tf.Graph().as_default() as graph:
+            self.graph = graph
+            config = tf.ConfigProto(allow_soft_placement=True)
+            config.gpu_options.allow_growth = True
+            self.sess = tf.Session(graph=graph, config=config)
+            self.saver = tf.train.import_meta_graph(meta_path)
+            self.saver.restore(self.sess, model_path)
+
+            self.input, self.energy = tf.get_collection("save")
+
+        if getattr(self, "normalize", False):
+            scaler_path = join(fdir, self.SCALER_FILENAME)
+            self.scaler = joblib.load(scaler_path)
diff --git a/networks/dagmm/estimation_net.py b/networks/dagmm/estimation_net.py
new file mode 100644
index 0000000..50602e6
--- /dev/null
+++ b/networks/dagmm/estimation_net.py
@@ -0,0 +1,65 @@
+# -*- coding: utf-8 -*-
+import tensorflow as tf
+
+
+class EstimationNet:
+    """Estimation Network
+
+    This network converts an input feature vector to softmax probabilities.
+    Because the loss function for this network is not defined here,
+    it should be implemented outside of this class.
+    """
+
+    def __init__(self, hidden_layer_sizes, activation=tf.nn.relu):
+        """
+        Parameters
+        ----------
+        hidden_layer_sizes : list of int
+            list of sizes of hidden layers.
+            For example, if the sizes are [n1, n2],
+            layer sizes of the network are:
+            input_size -> n1 -> n2
+            (network outputs the softmax probabilities of "n2" layer)
+        activation : function
+            activation function of hidden layer.
+            the function of the last layer is the softmax function.
+        """
+        self.hidden_layer_sizes = hidden_layer_sizes
+        self.activation = activation
+
+    def inference(self, z, dropout_ratio=None):
+        """Output softmax probabilities
+
+        Parameters
+        ----------
+        z : tf.Tensor shape : (n_samples, n_features)
+            Data inferenced by this network
+        dropout_ratio : tf.Tensor shape : 0-dimension float (optional)
+            Specify dropout ratio
+            (if None, dropout is not applied)
+
+        Returns
+        -------
+        probs : tf.Tensor shape : (n_samples, n_classes)
+            Calculated probabilities
+        """
+        with tf.variable_scope("EstNet"):
+            n_layer = 0
+            for size in self.hidden_layer_sizes[:-1]:
+                n_layer += 1
+                z = tf.layers.dense(
+                    z, size, activation=self.activation, name="layer_{}".format(n_layer)
+                )
+                if dropout_ratio is not None:
+                    z = tf.layers.dropout(
+                        z, dropout_ratio, name="drop_{}".format(n_layer)
+                    )
+
+            # Last layer uses linear function (=logits)
+            size = self.hidden_layer_sizes[-1]
+            logits = tf.layers.dense(z, size, activation=None, name="logits")
+
+            # Softmax output
+            output = tf.contrib.layers.softmax(logits)
+
+        return output
diff --git a/networks/dagmm/gmm.py b/networks/dagmm/gmm.py
new file mode 100644
index 0000000..5084a30
--- /dev/null
+++ b/networks/dagmm/gmm.py
@@ -0,0 +1,145 @@
+# -*- coding: utf-8 -*-
+import numpy as np
+import tensorflow as tf
+
+
+class GMM:
+    """ Gaussian Mixture Model (GMM) """
+
+    def __init__(self, n_comp):
+        self.n_comp = n_comp
+        self.phi = self.mu = self.sigma = None
+        self.training = False
+
+    def create_variables(self, n_features):
+        with tf.variable_scope("GMM"):
+            phi = tf.Variable(
+                tf.zeros(shape=[self.n_comp]), dtype=tf.float32, name="phi"
+            )
+            mu = tf.Variable(
+                tf.zeros(shape=[self.n_comp, n_features]), dtype=tf.float32, name="mu"
+            )
+            sigma = tf.Variable(
+                tf.zeros(shape=[self.n_comp, n_features, n_features]),
+                dtype=tf.float32,
+                name="sigma",
+            )
+            L = tf.Variable(
+                tf.zeros(shape=[self.n_comp, n_features, n_features]),
+                dtype=tf.float32,
+                name="L",
+            )
+
+        return phi, mu, sigma, L
+
+    def fit(self, z, gamma):
+        """fit data to GMM model
+
+        Parameters
+        ----------
+        z : tf.Tensor, shape (n_samples, n_features)
+            data fitted to GMM.
+        gamma : tf.Tensor, shape (n_samples, n_comp)
+            probabilities; each row corresponds to a row of z.
+        """
+
+        with tf.variable_scope("GMM"):
+            # Calculate mu, sigma
+            # i   : index of samples
+            # k   : index of components
+            # l,m : index of features
+            gamma_sum = tf.reduce_sum(gamma, axis=0)
+            self.phi = phi = tf.reduce_mean(gamma, axis=0)
+            self.mu = mu = tf.einsum("ik,il->kl", gamma, z) / gamma_sum[:, None]
+            z_centered = tf.sqrt(gamma[:, :, None]) * (z[:, None, :] - mu[None, :, :])
+            self.sigma = sigma = (
+                tf.einsum("ikl,ikm->klm", z_centered, z_centered)
+                / gamma_sum[:, None, None]
+            )
+
+            # Calculate a cholesky decomposition of covariance in advance
+            n_features = z.shape[1]
+            min_vals = tf.diag(tf.ones(n_features, dtype=tf.float32)) * 1e-6
+            self.L = tf.cholesky(sigma + min_vals[None, :, :])
+
+        self.training = False
+
+    def fix_op(self):
+        """return operator to fix parameters of GMM
+        Using this operator outside of this class,
+        you can fix the current parameters to static tensor variables.
+
+        After you call this method, you have to run the resulting
+        operator immediately, and call energy() to use the static
+        variables of the model parameters.
+
+        Returns
+        -------
+        op : operator of tensorflow
+            operator to assign current parameter to variables
+        """
+
+        phi, mu, sigma, L = self.create_variables(self.mu.shape[1])
+
+        op = tf.group(
+            tf.assign(phi, self.phi),
+            tf.assign(mu, self.mu),
+            tf.assign(sigma, self.sigma),
+            tf.assign(L, self.L),
+        )
+
+        self.phi, self.phi_org = phi, self.phi
+        self.mu, self.mu_org = mu, self.mu
+        self.sigma, self.sigma_org = sigma, self.sigma
+        self.L, self.L_org = L, self.L
+
+        self.training = False
+
+        return op
+
+    def energy(self, z):
+        """calculate an energy of each row of z
+
+        Parameters
+        ----------
+        z : tf.Tensor, shape (n_samples, n_features)
+            data; the energy is calculated for each row.
+
+        Returns
+        -------
+        energy : tf.Tensor, shape (n_samples)
+            calculated energies
+        """
+
+        if self.training and self.phi is None:
+            self.phi, self.mu, self.sigma, self.L = self.create_variables(z.shape[1])
+
+        with tf.variable_scope("GMM_energy"):
+            # Instead of inverse covariance matrix, exploit cholesky decomposition
+            # for stability of calculation.
+            z_centered = z[:, None, :] - self.mu[None, :, :]  # ikl
+            v = tf.matrix_triangular_solve(
+                self.L, tf.transpose(z_centered, [1, 2, 0])
+            )  # kli
+
+            # log(det(Sigma)) = 2 * sum[log(diag(L))]
+            log_det_sigma = 2.0 * tf.reduce_sum(
+                tf.log(tf.matrix_diag_part(self.L)), axis=1
+            )
+
+            # To calculate energies, use "log-sum-exp" (different from original paper)
+            d = z.get_shape().as_list()[1]
+            logits = tf.log(self.phi[:, None]) - 0.5 * (
+                tf.reduce_sum(tf.square(v), axis=1)
+                + d * tf.log(2.0 * np.pi)
+                + log_det_sigma[:, None]
+            )
+            energies = -tf.reduce_logsumexp(logits, axis=0)
+
+        return energies
+
+    def cov_diag_loss(self):
+        with tf.variable_scope("GMM_diag_loss"):
+            diag_loss = tf.reduce_sum(tf.divide(1, tf.matrix_diag_part(self.sigma)))
+
+        return diag_loss
diff --git a/networks/dagmm/main.py b/networks/dagmm/main.py
new file mode 100644
index 0000000..fc71ecd
--- /dev/null
+++ b/networks/dagmm/main.py
@@ -0,0 +1,54 @@
+import tensorflow as tf
+from DAGMM.dagmm import DAGMM
+import numpy as np
+import pandas as pd
+import os
+
+# Initialize
+model = DAGMM(
+    comp_hiddens=[32, 16, 2], comp_activation=tf.nn.tanh,
+    est_hiddens=[80, 40], est_activation=tf.nn.tanh,
+    est_dropout_ratio=0.25
+)
+
+# Locate the input CSV files and extract their identifiers
+data_dir_path = 'C:/Users/Administrator/Downloads/DAGMM-master/SMD/data_concat/'
+csvs = os.listdir(data_dir_path)
+
+csv_path = []
+
+for i in csvs:
+    csv_path.append(data_dir_path + i)
+
+numbers = []
+
+for j in csvs:
+    name_temp = os.path.split(j)[1]
+    numbers.append(name_temp[5:-4])
+
+
+def generate_score(number):
+    # Read the raw data.
+    input_dir_path = 'C:/Users/Administrator/Downloads/DAGMM-master/SMD/data_concat/data-' + number + '.csv'
+    data = np.array(pd.read_csv(input_dir_path, header=None), dtype=np.float64)
+    x_train = data[: len(data) // 2]
+    x_test = data[len(data) // 2:]
+    print(len(x_train))
+    print(len(x_test))
+    model.fit(x_train)
+    if not os.path.exists('../score'):
+        os.makedirs('../score')
+    # Evaluate energies
+    # (the higher the energy, the more anomalous the sample)
+    energy = model.predict(x_test)
+    np.save('../score/' + number + '.npy', energy)
+    # Save fitted model to the directory
+    model.save('./model/fitted_model' + number)
+
+    # Restore saved model from directory
+    model.restore('./model/fitted_model' + number)
+
+
+for j in numbers:
+    generate_score(j)
+    print('Finish generating ' + j)
diff --git a/networks/ganf/DROCC.py b/networks/ganf/DROCC.py
new file mode 100644
index 0000000..969bb82
--- /dev/null
+++ b/networks/ganf/DROCC.py
@@ -0,0 +1,217 @@
+import os
+import numpy as np
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+class LSTM_FC(nn.Module):
+    def __init__(self,
+                 input_dim=32,
+                 num_classes=1, 
+                 num_hidden_nodes=8
+    ):
+
+        super(LSTM_FC, self).__init__()
+        self.input_dim = input_dim
+        self.num_classes = num_classes
+        self.num_hidden_nodes = num_hidden_nodes
+        self.encoder = nn.LSTM(input_size=self.input_dim, hidden_size=self.num_hidden_nodes,
+                                num_layers=1, batch_first=True)
+        self.fc = nn.Linear(self.num_hidden_nodes, self.num_classes)
+        activ = nn.ReLU(True)
+
+    def forward(self, input):
+        features = self.encoder(input)[0][:,-1,:]
+        # pdb.set_trace()
+        logits = self.fc(features)
+        return logits
+
+    def half_forward_start(self, x):
+        features = self.encoder(x)[0][:,-1,:]
+        return features
+
+    def half_forward_end(self, x):
+        logits = self.fc(x)
+        return logits
+
+class DROCCTrainer:
+    """
+    Trainer class that implements the DROCC algorithm proposed in
+    https://arxiv.org/abs/2002.12718
+    """
+
+    def __init__(self, model, optimizer, lamda, radius, gamma, device):
+        """Initialize the DROCC Trainer class
+        Parameters
+        ----------
+        model: Torch neural network object
+        optimizer: Torch optimizer used to update the model parameters.
+        lamda: Weight given to the adversarial loss
+        radius: Radius of hypersphere to sample points from.
+        gamma: Parameter to vary projection.
+        device: torch.device object for device to use.
+        """     
+        self.model = model
+        self.optimizer = optimizer
+        self.lamda = lamda
+        self.radius = radius
+        self.gamma = gamma
+        self.device = device
+
+    def train(self, train_loader, lr_scheduler, total_epochs, save_path, name,
+                only_ce_epochs=5, ascent_step_size=0.001, ascent_num_steps=50):
+        """Trains the model on the given training dataset and saves a
+        checkpoint after every epoch.
+        Parameters
+        ----------
+        train_loader: Dataloader object for the training dataset.
+        lr_scheduler: Learning rate scheduler, stepped once per epoch.
+        total_epochs: Total number of epochs for training.
+        save_path: Directory in which the per-epoch checkpoints are saved.
+        name: Filename prefix for the saved checkpoints.
+        only_ce_epochs: Number of epochs for initial pretraining.
+        ascent_step_size: Step size for gradient ascent for adversarial
+                          generation of negative points.
+        ascent_num_steps: Number of gradient ascent steps for adversarial
+                          generation of negative points.
+        """
+        self.ascent_num_steps = ascent_num_steps
+        self.ascent_step_size = ascent_step_size
+        for epoch in range(total_epochs): 
+            #Make the weights trainable
+            self.model.train()
+            
+            #Placeholder for the respective 2 loss values
+            epoch_adv_loss = 0.0  #AdvLoss
+            epoch_ce_loss = 0.0  #Cross entropy Loss
+            
+            batch_idx = -1
+            for data in train_loader:
+                batch_idx += 1
+                data = data.to(self.device)
+                target = torch.ones([data.shape[0]], dtype=torch.float32).to(self.device)
+
+                data = torch.transpose(data, dim0=1, dim1=2)
+                data = data.reshape(data.shape[0], data.shape[1], data.shape[2]*data.shape[3])
+
+                self.optimizer.zero_grad()
+                
+                # Extract the logits for cross entropy loss
+                logits = self.model(data)
+                logits = torch.squeeze(logits, dim = 1)
+                ce_loss = F.binary_cross_entropy_with_logits(logits, target)
+                # Add to the epoch variable for printing average CE Loss
+                epoch_ce_loss += ce_loss.item()
+
+                '''
+                Adversarial Loss is calculated only for the positive data points (label==1).
+                '''
+                if  epoch >= only_ce_epochs:
+                    data = data[target == 1]
+                    # AdvLoss 
+                    adv_loss = self.one_class_adv_loss(data)
+                    epoch_adv_loss += adv_loss.item()
+
+                    loss = ce_loss + adv_loss * self.lamda
+                else: 
+                    # If only CE based training has to be done
+                    loss = ce_loss
+                
+                # Backprop
+                loss.backward()
+                self.optimizer.step()
+            lr_scheduler.step()
+                    
+            epoch_ce_loss = epoch_ce_loss/(batch_idx + 1)  #Average CE Loss
+            epoch_adv_loss = epoch_adv_loss/(batch_idx + 1) #Average AdvLoss
+
+            print('Epoch: {}, CE Loss: {}, AdvLoss: {}'.format(
+                epoch, epoch_ce_loss, epoch_adv_loss))
+            self.save(os.path.join(save_path, "{}_{}.pt".format(name, epoch)))
+    def test(self, test_loader):
+        """Evaluate the model on the given test dataset.
+        Parameters
+        ----------
+        test_loader: Dataloader object for the test dataset.
+        Returns the negated logits as anomaly scores (higher = more anomalous).
+        """        
+        self.model.eval()
+        scores = []
+        with torch.no_grad():
+            for data in test_loader:
+                data = data.to(self.device)
+
+                data = torch.transpose(data, dim0=1, dim1=2)
+                data = data.reshape(data.shape[0], data.shape[1], data.shape[2]*data.shape[3])
+
+
+                logits = self.model(data).cpu().numpy()
+                scores.append(logits)
+        scores = -np.concatenate(scores)
+
+        return scores
+        
+    
+    def one_class_adv_loss(self, x_train_data):
+        """Computes the adversarial loss:
+        1) Sample points initially at random around the positive training
+            data points
+        2) Gradient ascent to find the most optimal point in set N_i(r) 
+            classified as +ve (label=0). This is done by maximizing 
+            the CE loss wrt label 0
+        3) Project the points between spheres of radius R and gamma * R 
+            (set N_i(r))
+        4) Pass the calculated adversarial points through the model, 
+            and calculate the CE loss wrt target class 0
+        
+        Parameters
+        ----------
+        x_train_data: Batch of data to compute loss on.
+        """
+        batch_size = len(x_train_data)
+        # Randomly sample points around the training data
+        # We will perform SGD on these to find the adversarial points
+        x_adv = torch.randn(x_train_data.shape).to(self.device).detach().requires_grad_()
+        x_adv_sampled = x_adv + x_train_data
+
+        for step in range(self.ascent_num_steps):
+            with torch.enable_grad():
+
+                new_targets = torch.zeros(batch_size, 1).to(self.device)
+                new_targets = torch.squeeze(new_targets)
+                new_targets = new_targets.to(torch.float)
+                
+                logits = self.model(x_adv_sampled)         
+                logits = torch.squeeze(logits, dim = 1)
+                new_loss = F.binary_cross_entropy_with_logits(logits, new_targets)
+
+                grad = torch.autograd.grad(new_loss, [x_adv_sampled])[0]
+                grad_norm = torch.norm(grad, p=2, dim = tuple(range(1, grad.dim())))
+                grad_norm = grad_norm.view(-1, *[1]*(grad.dim()-1))
+                grad_normalized = grad/grad_norm 
+            with torch.no_grad():
+                x_adv_sampled.add_(self.ascent_step_size * grad_normalized)
+
+            if (step + 1) % 10==0:
+                # Project the normal points to the set N_i(r)
+                h = x_adv_sampled - x_train_data
+                norm_h = torch.sqrt(torch.sum(h**2, 
+                                                dim=tuple(range(1, h.dim()))))
+                alpha = torch.clamp(norm_h, self.radius, 
+                                    self.gamma * self.radius).to(self.device)
+                # Make use of broadcast to project h
+                proj = (alpha/norm_h).view(-1, *[1] * (h.dim()-1))
+                h = proj * h
+                x_adv_sampled = x_train_data + h  #These adv_points are now on the surface of hyper-sphere
+
+        adv_pred = self.model(x_adv_sampled)
+        adv_pred = torch.squeeze(adv_pred, dim=1)
+        adv_loss = F.binary_cross_entropy_with_logits(adv_pred, (new_targets * 0))
+
+        return adv_loss
+
+    def save(self, path):
+        torch.save(self.model.state_dict(),path)
+
+    def load(self, path):
+        self.model.load_state_dict(torch.load(path))
\ No newline at end of file
diff --git a/networks/ganf/DeepSAD.py b/networks/ganf/DeepSAD.py
new file mode 100644
index 0000000..e634a0d
--- /dev/null
+++ b/networks/ganf/DeepSAD.py
@@ -0,0 +1,427 @@
+import json
+import torch
+import logging
+import time
+import os
+
+import torch.optim as optim
+from networks.ganf.RNN import RecurrentAE
+from networks.ganf.GAN import CNNAE
+from networks.ganf.utils import roc_auc_all
+import numpy as np
+
+
+
+class AETrainer:
+
+    def __init__(self, device: str = 'cuda'):
+
+        self.device = device
+
+    def train(self, train_loader, ae_net, args):
+        logger = logging.getLogger()
+
+        # Set device for network
+        ae_net = ae_net.to(self.device)
+
+        # Set optimizer (Adam optimizer for now)
+        optimizer = optim.Adam(ae_net.parameters(), lr=args.lr, weight_decay=args.weight_decay)
+
+        # Training
+        print('Starting pretraining...')
+        start_time = time.time()
+        ae_net.train()
+        for epoch in range(10):
+
+
+            loss_epoch = 0.0
+            n_batches = 0
+            epoch_start_time = time.time()
+            for data in train_loader:
+
+                if isinstance(data, list):
+                    data = data[0]
+                
+                x = data.to(self.device)
+                x = torch.transpose(x, dim0=2, dim1=3)
+                inputs = x.reshape(x.shape[0], x.shape[1]*x.shape[2], x.shape[3])
+
+                # Zero the network parameter gradients
+                optimizer.zero_grad()
+
+                # Update network parameters via backpropagation: forward + backward + optimize
+                outputs = ae_net(inputs)
+                scores = torch.sum((outputs - inputs) ** 2, dim=tuple(range(1, outputs.dim())))
+                loss = torch.mean(scores)
+                loss.backward()
+                optimizer.step()
+
+                loss_epoch += loss.item()
+                n_batches += 1
+
+            # log epoch statistics
+            epoch_train_time = time.time() - epoch_start_time
+            print('  Epoch {}/{}\t Time: {:.3f}\t Loss: {:.8f}'
+                        .format(epoch + 1, 10, epoch_train_time, loss_epoch / n_batches))
+
+        pretrain_time = time.time() - start_time
+        print('Pretraining time: %.3f' % pretrain_time)
+        print('Finished pretraining.')
+
+        return ae_net
+
+class DeepSVDDTrainer:
+
+    def __init__(self, device: str = 'cuda'):
+
+        self.device = device
+        # Deep SVDD parameters
+       
+        self.c = None
+
+
+    def train(self, train_loader, net, args):
+        self.args = args
+
+        # Set device for network
+        net = net.to(self.device)
+
+        # Set optimizer (Adam optimizer for now)
+        optimizer = optim.Adam(net.parameters(), lr=args.lr, weight_decay=args.weight_decay)
+
+        # Set learning rate scheduler
+        scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20], gamma=0.1)
+
+        # Initialize hypersphere center c (if c not loaded)
+        if self.c is None:
+            print('Initializing center c...')
+            self.c = self.init_center_c(train_loader, net)
+            print(self.c.shape)
+            print('Center c initialized.')
+
+        # Training
+        print('Starting training...')
+        start_time = time.time()
+        net.train()
+
+        save_path = os.path.join(args.output_dir,args.name)
+        if not os.path.exists(save_path):
+            os.makedirs(save_path)
+
+        for epoch in range(args.n_epochs):
+
+            loss_epoch = 0.0
+            n_batches = 0
+            epoch_start_time = time.time()
+
+            for data in train_loader:
+                x = data.to(self.device)
+                x = torch.transpose(x, dim0=2, dim1=3)
+                inputs = x.reshape(x.shape[0], x.shape[1]*x.shape[2], x.shape[3])
+
+                # Zero the network parameter gradients
+                optimizer.zero_grad()
+
+                # Update network parameters via backpropagation: forward + backward + optimize
+                outputs = net(inputs).squeeze().mean(dim=-1)
+                dist = torch.sum((outputs - self.c) ** 2, dim=1)
+
+                loss = torch.mean(dist)
+                loss.backward()
+                optimizer.step()
+
+                loss_epoch += loss.item()
+                n_batches += 1
+            # Step the LR scheduler once per epoch, after the optimizer updates
+            scheduler.step()
+
+            # log epoch statistics
+            epoch_train_time = time.time() - epoch_start_time
+            print('  Epoch {}/{}\t Time: {:.3f}\t Loss: {:.8f}'
+                        .format(epoch + 1, args.n_epochs, epoch_train_time, loss_epoch / n_batches))
+            torch.save({'c': self.c, 'net_dict': net.state_dict()}, os.path.join(save_path, "{}_{}.pt".format(args.name, epoch)))
+        self.train_time = time.time() - start_time
+        print('Training time: %.3f' % self.train_time)
+
+        print('Finished training.')
+
+        return net
+
+    def init_center_c(self, train_loader, net, eps=0.1):
+        """Initialize hypersphere center c as the mean from an initial forward pass on the data."""
+        n_samples = 0
+        c = 0.0
+
+        net.eval()
+        with torch.no_grad():
+            for data in train_loader:
+                # get the inputs of the batch
+                x = data.to(self.device)
+                x = torch.transpose(x, dim0=2, dim1=3)
+                inputs = x.reshape(x.shape[0], x.shape[1]*x.shape[2], x.shape[3])
+                outputs = net(inputs).squeeze() 
+                n_samples += outputs.shape[0]
+                c += torch.sum(outputs, dim=0)
+
+        c /= n_samples
+
+        # If c_i is too close to 0, set to +-eps. Reason: a zero unit can be trivially matched with zero weights.
+        c[(abs(c) < eps) & (c < 0)] = -eps
+        c[(abs(c) < eps) & (c > 0)] = eps
+
+        return c.mean(dim=-1)
+
+
+class DeepSVDD(object):
+
+    def __init__(self, n_features, hidden_size, device):
+
+        self.c = None  # hypersphere center c
+
+
+        self.trainer = None
+
+        # if encoder=='RNN':
+        # self.ae_net = RecurrentAE(n_features, hidden_size, device)
+        self.ae_net = CNNAE(n_features, hidden_size).to(device)
+        self.net = self.ae_net.encoder
+
+        self.ae_trainer = None
+        self.results = {
+            'test_auc': None
+        }
+
+    def train(self, dataset, args, device: str = 'cuda'):
+        """Trains the Deep SVDD model on the training data."""
+
+        self.trainer = DeepSVDDTrainer(device=device)
+        # Get the model
+        self.trainer.train(dataset, self.net, args)
+        self.c = self.trainer.c
+
+    def test(self, test_loader, delta_t, sigma, device):
+
+        self.net.eval()
+        self.net.to(device)
+        loss = []
+
+        with torch.no_grad():
+            for data in test_loader:
+                
+                x = data.to(device)
+                x = torch.transpose(x, dim0=2, dim1=3)
+                inputs = x.reshape(x.shape[0], x.shape[1]*x.shape[2], x.shape[3])
+                outputs = self.net(inputs).squeeze().mean(dim=-1)
+                batch_loss= torch.sum((outputs - self.c) ** 2, dim=1).cpu().numpy()
+                loss.append(batch_loss)
+        loss = np.concatenate(loss)
+
+        auc_score, fps,tps = roc_auc_all(loss, delta_t, sigma)
+        print("mean: {:.4f}, median: {:.4f}, auc:{:.4f}".format(np.mean(loss), np.median(loss),auc_score))
+        self.results['test_auc'] = auc_score
+        return auc_score, fps,tps
+
+    def pretrain(self, train_loader, args, device):
+        """Pretrains the weights for the Deep SVDD network phi via autoencoder."""
+
+        self.ae_trainer = AETrainer(device=device)
+        self.ae_net = self.ae_trainer.train(train_loader, self.ae_net, args)
+        self.net = self.ae_net.encoder
+
+    def save_model(self, export_model):
+        """Save Deep SVDD model to export_model."""
+
+        net_dict = self.net.state_dict()
+
+        torch.save({'c': self.c,
+                    'net_dict': net_dict}, export_model)
+
+    def load_model(self, model_path):
+        """Load Deep SVDD model from model_path."""
+
+        model_dict = torch.load(model_path)
+
+        self.c = model_dict['c']
+        self.net.load_state_dict(model_dict['net_dict'])
+
+    def save_results(self, export_json):
+        """Save results dict to a JSON-file."""
+        with open(export_json, 'w') as fp:
+            json.dump(self.results, fp)
+
+
+import os
+class DeepSADTrainer:
+
+    def __init__(self, device: str = 'cuda'):
+
+        self.device = device
+       
+        self.c = None
+
+
+    def train(self, train_loader, net, args):
+        self.args = args
+
+        # Set device for network
+        net = net.to(self.device)
+
+        # Set optimizer (Adam optimizer for now)
+        optimizer = optim.Adam(net.parameters(), lr=args.lr, weight_decay=args.weight_decay)
+
+        # Set learning rate scheduler
+        scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20], gamma=0.1)
+
+        # Initialize hypersphere center c (if c not loaded)
+        if self.c is None:
+            logging.info('Initializing center c...')
+            self.c = self.init_center_c(train_loader, net)
+            logging.info('Center c initialized.')
+
+        # Training
+        print('Starting training...')
+        start_time = time.time()
+        net.train()
+
+        save_path = os.path.join(args.output_dir,args.name)
+        if not os.path.exists(save_path):
+            os.makedirs(save_path)
+
+        for epoch in range(args.n_epochs):
+
+
+
+            loss_epoch = 0.0
+            n_batches = 0
+            epoch_start_time = time.time()
+
+            for data, semi_targets in train_loader:
+
+                x = data.to(self.device)
+                x = torch.transpose(x, dim0=2, dim1=3)
+                inputs = x.reshape(x.shape[0], x.shape[1]*x.shape[2], x.shape[3])
+
+                semi_targets = semi_targets.to(self.device)
+                # Zero the network parameter gradients
+                optimizer.zero_grad()
+
+                # Update network parameters via backpropagation: forward + backward + optimize
+                outputs = net(inputs).squeeze().mean(dim=-1)
+                dist = torch.sum((outputs - self.c) ** 2, dim=1)
+
+                losses = torch.where(semi_targets == 0, dist, args.eta * ((dist + 1e-6) ** semi_targets.float()))
+                loss = torch.mean(losses)
+                loss.backward()
+                optimizer.step()
+
+                loss_epoch += loss.item()
+                n_batches += 1
+            scheduler.step()
+            # log epoch statistics
+
+            epoch_train_time = time.time() - epoch_start_time
+            print('  Epoch {}/{}\t Time: {:.3f}\t Loss: {:.8f}'
+                        .format(epoch + 1, args.n_epochs, epoch_train_time, loss_epoch / n_batches))
+            torch.save({'c': self.c, 'net_dict': net.state_dict()}, os.path.join(save_path, "{}_{}.pt".format(args.name, epoch)))
+
+        self.train_time = time.time() - start_time
+        print('Training time: %.3f' % self.train_time)
+
+        print('Finished training.')
+
+        return net
+
+    def init_center_c(self, train_loader, net, eps=0.1):
+        """Initialize hypersphere center c as the mean from an initial forward pass on the data."""
+        n_samples = 0
+        c = 0.0
+
+        net.eval()
+        with torch.no_grad():
+            for data, _ in train_loader:
+                # get the inputs of the batch
+                x = data.to(self.device)
+                x = torch.transpose(x, dim0=2, dim1=3)
+                inputs = x.reshape(x.shape[0], x.shape[1]*x.shape[2], x.shape[3])
+                outputs = net(inputs).squeeze() 
+                n_samples += outputs.shape[0]
+                c += torch.sum(outputs, dim=0)
+
+        c /= n_samples
+
+        # If c_i is too close to 0, set to +-eps. Reason: a zero unit can be trivially matched with zero weights.
+        c[(abs(c) < eps) & (c < 0)] = -eps
+        c[(abs(c) < eps) & (c > 0)] = eps
+
+        return c.mean(dim=-1)
+class DeepSAD(object):
+
+    def __init__(self, n_features, hidden_size, device):
+
+        self.c = None  # hypersphere center c
+
+
+        self.trainer = None
+
+        self.ae_net = CNNAE(n_features, hidden_size).to(device)
+        self.net = self.ae_net.encoder
+
+        self.ae_trainer = None
+        self.results = {
+            'test_auc': None
+        }
+
+    def train(self, dataset, args, device: str = 'cuda'):
+
+        self.trainer = DeepSADTrainer(device=device)
+        # Get the model
+        self.trainer.train(dataset, self.net, args)
+        self.c = self.trainer.c
+
+    def test(self, test_loader, delta_t, sigma, device):
+        self.net.eval()
+        self.net.to(device)
+        loss = []
+
+        with torch.no_grad():
+            for data in test_loader:
+                
+                x = data.to(device)
+                x = torch.transpose(x, dim0=2, dim1=3)
+                inputs = x.reshape(x.shape[0], x.shape[1]*x.shape[2], x.shape[3])
+                outputs = self.net(inputs).squeeze().mean(dim=-1)
+                batch_loss= torch.sum((outputs - self.c) ** 2, dim=1).cpu().numpy()
+                loss.append(batch_loss)
+        loss = np.concatenate(loss)
+
+        auc_score, fps,tps = roc_auc_all(loss, delta_t, sigma)
+        print("mean: {:.4f}, median: {:.4f}, auc:{:.4f}".format(np.mean(loss), np.median(loss),auc_score))
+        self.results['test_auc'] = auc_score
+        return auc_score,fps,tps
+
+    def pretrain(self, train_loader, args, device):
+
+        self.ae_trainer = AETrainer(device=device)
+        self.ae_net = self.ae_trainer.train(train_loader, self.ae_net, args)
+        self.net = self.ae_net.encoder
+
+    def save_model(self, export_model):
+        """Save Deep SAD model to export_model."""
+
+        net_dict = self.net.state_dict()
+
+        torch.save({'c': self.c,
+                    'net_dict': net_dict}, export_model)
+
+    def load_model(self, model_path, load_ae=False):
+        """Load Deep SAD model from model_path."""
+
+        model_dict = torch.load(model_path)
+
+        self.c = model_dict['c']
+        self.net.load_state_dict(model_dict['net_dict'])
+
+    def save_results(self, export_json):
+        """Save results dict to a JSON-file."""
+        with open(export_json, 'w') as fp:
+            json.dump(self.results, fp)
diff --git a/networks/ganf/GAN.py b/networks/ganf/GAN.py
new file mode 100644
index 0000000..155515b
--- /dev/null
+++ b/networks/ganf/GAN.py
@@ -0,0 +1,306 @@
+import os
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from timeit import default_timer as timer
+import logging
+
+
+def ConvEncoder(activation = nn.LeakyReLU, in_channels:int = 3, n_c:int = 64,
+                    k_size:int = 5):
+    
+    enc = nn.Sequential(*(nn.Conv1d(in_channels, n_c, k_size, stride=2, padding=2),
+                                            nn.BatchNorm1d(n_c),
+                                            activation(),
+                                            nn.Conv1d(n_c, n_c*2, k_size, stride=2, padding=2),
+                                            nn.BatchNorm1d(n_c*2),
+                                            activation(),
+                                            nn.Conv1d(n_c*2, n_c*4, k_size, stride=2, padding=2),
+                                            nn.BatchNorm1d(n_c*4),
+                                            activation()))
+    return enc
+
+def ConvDecoder(activation = nn.LeakyReLU, in_channels:int = 3, n_c:int = 64,
+                    k_size:int = 5):
+
+    decoder = nn.Sequential(*(nn.ConvTranspose1d(n_c*4, n_c*2, k_size, stride=2, padding=2, output_padding=0),
+                                            torch.nn.BatchNorm1d(n_c*2),
+                                            activation(),
+                                            torch.nn.ConvTranspose1d(n_c*2, n_c, k_size,stride=2, padding=2, output_padding=1),
+                                            torch.nn.BatchNorm1d(n_c),
+                                            activation(),
+                                            torch.nn.ConvTranspose1d(n_c, in_channels, k_size,stride=2, padding=2, output_padding=1)))
+    return decoder
+
+class CNNAE(torch.nn.Module):
+    """Convolutional autoencoder"""
+
+    def __init__(self,in_channels:int = 3, n_channels:int = 16,
+                    kernel_size:int = 5):
+        super(CNNAE, self).__init__()
+
+        # Encoder and decoder configuration
+        activation = torch.nn.LeakyReLU
+        self.in_channels = in_channels
+        self.n_c = n_channels
+        self.k_size = kernel_size
+
+        self.encoder = ConvEncoder(activation, in_channels, n_channels, kernel_size)
+
+        self.decoder = ConvDecoder(activation, in_channels, n_channels, kernel_size)
+
+
+    def forward(self, x:torch.Tensor):
+
+        z = self.encoder.forward(x)
+
+        x_out = self.decoder.forward(z)
+
+        return x_out
+
+class R_Net(torch.nn.Module):
+
+    def __init__(self, activation = torch.nn.LeakyReLU, in_channels:int = 3, n_channels:int = 16,
+                    kernel_size:int = 5, std:float = 0.2):
+
+        super(R_Net, self).__init__()
+
+        self.activation = activation
+        self.in_channels = in_channels
+        self.n_c = n_channels
+        self.k_size = kernel_size
+        self.std = std
+
+        self.Encoder = ConvEncoder(activation, in_channels, n_channels, kernel_size)
+
+        self.Decoder = ConvDecoder(activation, in_channels, n_channels, kernel_size)
+
+    def forward(self, x:torch.Tensor, noise:bool = True):
+
+        x_hat = self.add_noise(x) if noise else x
+        z = self.Encoder.forward(x_hat)
+
+        x_out = self.Decoder.forward(z)
+
+        return x_out
+
+    def add_noise(self, x):
+
+        noise = torch.randn_like(x) * self.std
+        x_hat = x + noise
+
+        return x_hat
+
+class D_Net(torch.nn.Module):
+
+    def __init__(self, in_resolution:int, activation = torch.nn.LeakyReLU, in_channels:int = 3, n_channels:int = 16, kernel_size:int = 5):
+
+        super(D_Net, self).__init__()
+
+        self.activation = activation
+        self.in_resolution = in_resolution
+        self.in_channels = in_channels
+        self.n_c = n_channels
+        self.k_size = kernel_size
+
+        self.cnn = ConvEncoder(activation, in_channels, n_channels, kernel_size)
+
+        # Compute output dimension after conv part of D network
+
+        self.out_dim = self._compute_out_dim()
+
+        self.fc = torch.nn.Linear(self.out_dim, 1)
+
+    def _compute_out_dim(self):
+        
+        test_x = torch.zeros(1, self.in_channels, self.in_resolution)
+        for p in self.cnn.parameters():
+            p.requires_grad = False
+        test_x = self.cnn(test_x)
+        out_dim = torch.prod(torch.tensor(test_x.shape[1:])).item()
+        for p in self.cnn.parameters():
+            p.requires_grad = True
+
+        return out_dim
+
+    def forward(self, x:torch.Tensor):
+
+        x = self.cnn(x)
+
+        x = torch.flatten(x, start_dim = 1)
+
+        out = self.fc(x)
+
+        return out
+
+def R_Loss(d_net: torch.nn.Module, x_real: torch.Tensor, x_fake: torch.Tensor, lambd: float) -> dict:
+
+    pred = d_net(x_fake)
+    y = torch.ones_like(pred)
+
+    rec_loss = F.mse_loss(x_fake, x_real)
+    gen_loss = F.binary_cross_entropy_with_logits(pred, y) # generator loss
+
+    L_r = gen_loss + lambd * rec_loss
+
+    return {'rec_loss' : rec_loss, 'gen_loss' : gen_loss, 'L_r' : L_r}
+
+def D_Loss(d_net: torch.nn.Module, x_real: torch.Tensor, x_fake: torch.Tensor) -> torch.Tensor:
+
+    pred_real = d_net(x_real)
+    pred_fake = d_net(x_fake.detach())
+    
+    y_real = torch.ones_like(pred_real)
+    y_fake = torch.zeros_like(pred_fake)
+
+    real_loss = F.binary_cross_entropy_with_logits(pred_real, y_real)
+    fake_loss = F.binary_cross_entropy_with_logits(pred_fake, y_fake)
+
+    return real_loss + fake_loss
+
+# Wasserstein GAN loss (https://arxiv.org/abs/1701.07875)
+
+def R_WLoss(d_net: torch.nn.Module, x_real: torch.Tensor, x_fake: torch.Tensor, lambd: float) -> dict:
+
+    pred = torch.sigmoid(d_net(x_fake))
+
+    rec_loss = F.mse_loss(x_fake, x_real)
+    gen_loss = -torch.mean(pred) # Wasserstein G loss: - E[ D(G(x)) ]
+
+    L_r = gen_loss + lambd * rec_loss
+
+    return {'rec_loss' : rec_loss, 'gen_loss' : gen_loss, 'L_r' : L_r}
+
+def D_WLoss(d_net: torch.nn.Module, x_real: torch.Tensor, x_fake: torch.Tensor) -> torch.Tensor:
+
+    pred_real = torch.sigmoid(d_net(x_real))
+    pred_fake = torch.sigmoid(d_net(x_fake.detach()))
+    
+    dis_loss = -torch.mean(pred_real) + torch.mean(pred_fake) # Wasserstein D loss: -E[D(x_real)] + E[D(x_fake)]
+
+    return dis_loss
+
+# %%
+def train_model(r_net: torch.nn.Module,
+                d_net: torch.nn.Module,
+                train_dataset: torch.utils.data.Dataset,
+                valid_dataset: torch.utils.data.Dataset,
+                r_loss = R_Loss,
+                d_loss = D_Loss,
+                lr_scheduler = None,
+                optimizer_class = torch.optim.Adam,
+                optim_r_params: dict = {},
+                optim_d_params: dict = {},
+                learning_rate: float = 0.001,
+                scheduler_r_params: dict = {},
+                scheduler_d_params: dict = {},
+                batch_size: int = 1024,
+                max_epochs: int = 40,
+                epoch_step: int = 1,
+                save_step: int = 5,
+                lambd: float = 0.2,
+                device: torch.device = torch.device('cuda'),
+                save_path: str = ".") -> tuple:
+
+    optim_r = optimizer_class(r_net.parameters(), lr = learning_rate, **optim_r_params)
+    optim_d = optimizer_class(d_net.parameters(), lr = learning_rate, **optim_d_params)
+
+    if lr_scheduler:
+        scheduler_r = lr_scheduler(optim_r, **scheduler_r_params)
+        scheduler_d = lr_scheduler(optim_d, **scheduler_d_params)
+
+    train_loader = torch.utils.data.DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
+    valid_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=batch_size)
+
+    for epoch in range(max_epochs):
+
+        start = timer()
+        train_metrics = train_single_epoch(r_net, d_net, optim_r, optim_d, r_loss, d_loss, train_loader, lambd, device)
+        valid_metrics = validate_single_epoch(r_net, d_net, r_loss, d_loss, valid_loader, device)
+        time = timer() - start
+
+
+        if epoch % epoch_step == 0:
+            logging.info(f'Epoch {epoch}:')
+            logging.info('Train Metrics: %s', train_metrics)
+            logging.info('Val Metrics: %s', valid_metrics)
+            logging.info(f'TIME: {time:.2f} s')
+
+        if lr_scheduler:
+            scheduler_r.step()
+            scheduler_d.step()
+
+        if epoch % save_step == 0:
+            torch.save(r_net.state_dict(), os.path.join(save_path, "r_net_{}.pt".format(epoch)))
+            torch.save(d_net.state_dict(), os.path.join(save_path, "d_net_{}.pt".format(epoch)))
+            logging.info(f'Saving model on epoch {epoch}')
+
+    return (r_net, d_net)
+
+def train_single_epoch(r_net, d_net, optim_r, optim_d, r_loss, d_loss, train_loader, lambd, device) -> dict:
+
+    r_net.train()
+    d_net.train()
+
+    train_metrics = {'rec_loss' : 0, 'gen_loss' : 0, 'dis_loss' : 0}
+
+    for data in train_loader:
+
+        x = data.to(device)
+        x = torch.transpose(x, dim0=2, dim1=3)
+        x_real = x.reshape(x.shape[0], x.shape[1]*x.shape[2], x.shape[3])
+        
+        x_fake = r_net(x_real)
+        d_net.zero_grad()
+
+        dis_loss = d_loss(d_net, x_real, x_fake)
+
+        dis_loss.backward()
+        optim_d.step()
+
+        r_net.zero_grad()
+
+        r_metrics = r_loss(d_net, x_real, x_fake, lambd) # L_r = gen_loss + lambda * rec_loss
+
+        r_metrics['L_r'].backward()
+        optim_r.step()
+
+        train_metrics['rec_loss'] += r_metrics['rec_loss'].item()  # accumulate plain floats, not graph-attached tensors
+        train_metrics['gen_loss'] += r_metrics['gen_loss'].item()
+        train_metrics['dis_loss'] += dis_loss.item()
+
+    train_metrics['rec_loss'] = train_metrics['rec_loss'] / len(train_loader)  # len(loader) is the number of batches
+    train_metrics['gen_loss'] = train_metrics['gen_loss'] / len(train_loader)
+    train_metrics['dis_loss'] = train_metrics['dis_loss'] / len(train_loader)
+
+    return train_metrics
+
+def validate_single_epoch(r_net, d_net, r_loss, d_loss, valid_loader, device) -> dict:
+
+    r_net.eval()
+    d_net.eval()
+
+    valid_metrics = {'rec_loss' : 0, 'gen_loss' : 0, 'dis_loss' : 0}
+
+    with torch.no_grad():
+        for data in valid_loader:
+
+            x = data.to(device)
+            x = torch.transpose(x, dim0=2, dim1=3)
+            x_real = x.reshape(x.shape[0], x.shape[1]*x.shape[2], x.shape[3])
+
+            x_fake = r_net(x_real)
+
+            dis_loss = d_loss(d_net, x_real, x_fake)
+
+            r_metrics = r_loss(d_net, x_real, x_fake, 0)
+                
+            valid_metrics['rec_loss'] += r_metrics['rec_loss'].item()  # accumulate plain floats
+            valid_metrics['gen_loss'] += r_metrics['gen_loss'].item()
+            valid_metrics['dis_loss'] += dis_loss.item()
+
+        valid_metrics['rec_loss'] = valid_metrics['rec_loss'] / len(valid_loader)  # len(loader) is the number of batches
+        valid_metrics['gen_loss'] = valid_metrics['gen_loss'] / len(valid_loader)
+        valid_metrics['dis_loss'] = valid_metrics['dis_loss'] / len(valid_loader)
+
+        return valid_metrics
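Review note on the epoch metrics above: each epoch value is a sum of per-batch mean losses divided by a batch count. A `DataLoader` that keeps its last partial batch yields `ceil(N / batch_size)` batches, which differs from `N / batch_size` whenever `N` is not a multiple of the batch size. The sketch below is dependency-free and only illustrates the arithmetic; the helper names `num_batches` and `epoch_mean` and the sample numbers are hypothetical and not part of this patch.

```python
import math


def num_batches(n_samples: int, batch_size: int) -> int:
    """Number of batches yielded when the last partial batch is kept."""
    return math.ceil(n_samples / batch_size)


def epoch_mean(batch_losses: list) -> float:
    """Average of per-batch mean losses over the batches actually seen."""
    return sum(batch_losses) / len(batch_losses)


# Hypothetical numbers: 2500 samples with batch size 1024 give 3 batches.
n_samples, batch_size = 2500, 1024
losses = [0.5, 0.5, 0.5]

exact = epoch_mean(losses)                        # divides by the 3 batches seen
approx = sum(losses) / (n_samples / batch_size)   # divides by about 2.44
```

With a partial last batch, the fractional divisor inflates the reported mean, so dividing by the true batch count is the safer choice.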
diff --git a/networks/ganf/GANF.py b/networks/ganf/GANF.py
new file mode 100644
index 0000000..256943c
--- /dev/null
+++ b/networks/ganf/GANF.py
@@ -0,0 +1,241 @@
+import os
+import logging
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+import numpy as np
+from networks.ganf.NF import MAF, RealNVP
+from torch.nn.utils import clip_grad_value_
+from common.utils import set_device
+from torch.nn.init import xavier_uniform_
+
+
+class GNN(nn.Module):
+    """
+    The GNN module applied in GANF
+    """
+
+    def __init__(self, input_size, hidden_size):
+
+        super(GNN, self).__init__()
+        self.lin_n = nn.Linear(input_size, hidden_size)
+        self.lin_r = nn.Linear(input_size, hidden_size, bias=False)
+        self.lin_2 = nn.Linear(hidden_size, hidden_size)
+
+    def forward(self, h, A):
+        ## A: K X K
+        ## h: N X K X L X D
+
+        h_n = self.lin_n(torch.einsum("nkld,kj->njld", h, A))
+        h_r = self.lin_r(h[:, :, :-1])
+        h_n[:, :, 1:] += h_r
+        h = self.lin_2(F.relu(h_n))
+
+        return h
+
+
+class GANF(nn.Module):
+    def __init__(
+        self,
+        n_blocks,
+        input_size,
+        hidden_size,
+        n_hidden,
+        dropout=0.1,
+        model="MAF",
+        batch_norm=True,
+        model_root="./checkpoint",
+        device="cpu",
+    ):
+        super(GANF, self).__init__()
+        self.device = set_device(device)
+        self.model_root = model_root
+        self.rnn = nn.LSTM(
+            input_size=input_size,
+            hidden_size=hidden_size,
+            batch_first=True,
+            dropout=dropout,
+        )
+        self.gcn = GNN(input_size=hidden_size, hidden_size=hidden_size)
+        if model == "MAF":
+            self.nf = MAF(
+                n_blocks,
+                input_size,
+                hidden_size,
+                n_hidden,
+                cond_label_size=hidden_size,
+                batch_norm=batch_norm,
+                activation="tanh",
+            )
+        else:
+            self.nf = RealNVP(
+                n_blocks,
+                input_size,
+                hidden_size,
+                n_hidden,
+                cond_label_size=hidden_size,
+                batch_norm=batch_norm,
+            )
+
+    def forward(self, x, A):
+
+        return self.test(x, A).mean()
+
+    def test(self, x, A):
+        # x: N X K X L X D
+        full_shape = x.shape
+
+        # reshape: N*K, L, D
+        x = x.reshape((x.shape[0] * x.shape[1], x.shape[2], x.shape[3]))
+        h, _ = self.rnn(x)
+
+        # reshape: N, K, L, H
+        h = h.reshape((full_shape[0], full_shape[1], h.shape[1], h.shape[2]))
+
+        h = self.gcn(h, A)
+
+        # reshape: N*K*L, H
+        h = h.reshape((-1, h.shape[3]))
+        x = x.reshape((-1, full_shape[3]))
+
+        log_prob = self.nf.log_prob(x, h).reshape(
+            [full_shape[0], -1]
+        )  # *full_shape[1]*full_shape[2]
+        log_prob = log_prob.mean(dim=1)
+
+        return log_prob
+
+    def locate(self, x, A):
+        # x: N X K X L X D
+        full_shape = x.shape
+
+        # reshape: N*K, L, D
+        x = x.reshape((x.shape[0] * x.shape[1], x.shape[2], x.shape[3]))
+        h, _ = self.rnn(x)
+
+        # reshape: N, K, L, H
+        h = h.reshape((full_shape[0], full_shape[1], h.shape[1], h.shape[2]))
+
+        h = self.gcn(h, A)
+
+        # reshape: N*K*L, H
+        h = h.reshape((-1, h.shape[3]))
+        x = x.reshape((-1, full_shape[3]))
+
+        log_prob = self.nf.log_prob(x, h).reshape(
+            [full_shape[0], full_shape[1], -1]
+        )  # *full_shape[1]*full_shape[2]
+        log_prob = log_prob.mean(dim=2)
+
+        return log_prob
+
+    def predict_prob(self, test_iterator, window_labels=None):
+        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
+        self.to(device)
+
+        model_file = os.path.join(self.model_root, "model.pt")
+        graph_file = os.path.join(self.model_root, "graph.pt")
+        self.load_state_dict(torch.load(model_file, map_location=device))
+        A = torch.load(graph_file, map_location=device)  # map_location lets a GPU-saved graph load on a CPU-only host
+        self.eval()
+
+        loss_test = []
+        with torch.no_grad():
+            for x in test_iterator:
+                x = x.unsqueeze(-1).transpose(1, 2)
+                x = x.to(device)
+                loss = -self.test(x, A.data).cpu().numpy()
+                loss_test.append(loss)
+        loss_test = np.concatenate(loss_test)
+        anomaly_score = loss_test
+        if window_labels is not None:
+            anomaly_label = (window_labels.sum(axis=1) > 0).astype(int)
+            return anomaly_score, anomaly_label
+        else:
+            return anomaly_score
+
+    def fit(
+        self,
+        train_iterator,
+        valid_iterator=None,
+        n_sensor=None,
+        weight_decay=5e-4,
+        n_epochs=1,
+        lr=2e-3,
+        h_tol=1e-4,
+        rho_max=1e16,
+        lambda1=0.0,
+        rho_init=1.0,
+        alpha_init=0.0,
+    ):
+        self.to(self.device)
+
+        logging.info("Loading dataset")
+
+        rho = rho_init
+        alpha = alpha_init
+        lambda1 = lambda1
+        h_A_old = np.inf
+
+        # initialize A
+        init = torch.zeros([n_sensor, n_sensor])
+        init = xavier_uniform_(init).abs()
+        init = init.fill_diagonal_(0.0)
+        # Create A as a leaf tensor on the target device so the optimizer can update it
+        A = init.clone().detach().to(self.device).requires_grad_(True)
+
+        optimizer = torch.optim.Adam(
+            [
+                {"params": self.parameters(), "weight_decay": weight_decay},
+                {"params": [A]},
+            ],
+            lr=lr,
+            weight_decay=weight_decay,
+        )
+
+        loss_best = np.inf  # any first validation loss counts as an improvement
+        epoch = 0
+        for _ in range(n_epochs):
+            loss_train = []
+            epoch += 1
+            self.train()
+            for x in train_iterator:
+                x = x.unsqueeze(-1).transpose(1, 2)
+                x = x.to(self.device)
+
+                optimizer.zero_grad()
+                loss = -self(x, A)
+                h = torch.trace(torch.matrix_exp(A * A)) - n_sensor
+                total_loss = loss + 0.5 * rho * h * h + alpha * h
+
+                total_loss.backward()
+                clip_grad_value_(self.parameters(), 1)
+                optimizer.step()
+                loss_train.append(loss.item())
+                A.data.copy_(torch.clamp(A.data, min=0, max=1))
+            logging.info(
+                "Epoch: {}, train loss: {:.2f}".format(epoch, np.mean(loss_train))
+            )
+            # eval
+            self.eval()
+            loss_val = []
+            if valid_iterator is not None:
+                with torch.no_grad():
+                    for x in valid_iterator:
+                        x = x.unsqueeze(-1).transpose(1, 2)
+                        x = x.to(self.device)
+                        loss = -self.test(x, A.data).cpu().numpy()
+                        loss_val.append(loss)
+                loss_val = np.concatenate(loss_val)
+                loss_val = np.nan_to_num(loss_val)
+                if np.mean(loss_val) < loss_best:
+                    loss_best = np.mean(loss_val)
+                    logging.info("save model {} epoch".format(epoch))
+                    torch.save(A.data, os.path.join(self.model_root, "graph.pt"))
+                    torch.save(
+                        self.state_dict(),
+                        os.path.join(self.model_root, "model.pt"),
+                    )
+            else:
+                torch.save(A.data, os.path.join(self.model_root, "graph.pt"))
+                torch.save(self.state_dict(), os.path.join(self.model_root, "model.pt"))
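Review note on the penalty used in `GANF.fit`: the term `h = trace(matrix_exp(A * A)) - n_sensor` is zero when the weighted graph `A` has no directed cycles and positive otherwise, and it enters the objective as `loss + 0.5 * rho * h * h + alpha * h`. The sketch below reproduces that arithmetic in plain Python so it can be checked without PyTorch. It uses a truncated Taylor series in place of `torch.matrix_exp`, and the helper names (`matmul`, `trace_expm`, `acyclicity`, `augmented_loss`) and example matrices are hypothetical, not part of this patch.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace_expm(m, terms=30):
    """tr(exp(m)) via a truncated Taylor series: sum_k tr(m^k) / k!."""
    n = len(m)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # m^0 = I
    total, factorial = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            power = matmul(power, m)
            factorial *= k
        total += sum(power[i][i] for i in range(n)) / factorial
    return total


def acyclicity(a):
    """h(A) = tr(exp(A * A)) - K, where * is the elementwise product."""
    squared = [[v * v for v in row] for row in a]
    return trace_expm(squared) - len(a)


def augmented_loss(nll, h, rho, alpha):
    """Negative log-likelihood plus the quadratic and linear penalty terms."""
    return nll + 0.5 * rho * h * h + alpha * h


# Strictly upper-triangular weights describe a DAG; the second graph has a 3-cycle.
dag = [[0.0, 0.8, 0.3], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]
cyclic = [[0.0, 0.8, 0.0], [0.0, 0.0, 0.5], [0.6, 0.0, 0.0]]

h_dag = acyclicity(dag)
h_cyclic = acyclicity(cyclic)
```

Because the penalty vanishes only for acyclic graphs, raising `rho` pushes the learned adjacency toward a DAG. In this patch `rho` and `alpha` stay at their initial values throughout `fit`.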
diff --git a/networks/ganf/NF.py b/networks/ganf/NF.py
new file mode 100644
index 0000000..2292b81
--- /dev/null
+++ b/networks/ganf/NF.py
@@ -0,0 +1,426 @@
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+import torch.distributions as D
+import math
+import copy
+
+
+# --------------------
+# Model layers and helpers
+# --------------------
+
+def create_masks(input_size, hidden_size, n_hidden, input_order='sequential', input_degrees=None):
+    # MADE paper sec 4:
+    # degrees of connections between layers -- ensure at most in_degree - 1 connections
+    degrees = []
+
+    # set input degrees to what is provided in args (the flipped order of the previous layer in a stack of mades);
+    # else init input degrees based on strategy in input_order (sequential or random)
+    if input_size>1:
+        if input_order == 'sequential':
+            degrees += [torch.arange(input_size)] if input_degrees is None else [input_degrees]
+            for _ in range(n_hidden + 1):
+                degrees += [torch.arange(hidden_size) % (input_size - 1)]
+            degrees += [torch.arange(input_size) % input_size - 1] if input_degrees is None else [input_degrees % input_size - 1]
+
+        elif input_order == 'random':
+            degrees += [torch.randperm(input_size)] if input_degrees is None else [input_degrees]
+            for _ in range(n_hidden + 1):
+                min_prev_degree = min(degrees[-1].min().item(), input_size - 1)
+                degrees += [torch.randint(min_prev_degree, input_size, (hidden_size,))]
+            min_prev_degree = min(degrees[-1].min().item(), input_size - 1)
+            degrees += [torch.randint(min_prev_degree, input_size, (input_size,)) - 1] if input_degrees is None else [input_degrees - 1]
+    else:
+        degrees += [torch.zeros([1]).long()]
+        for _ in range(n_hidden+1):
+            degrees += [torch.zeros([hidden_size]).long()]
+        degrees += [torch.zeros([input_size]).long()]
+    # construct masks
+    masks = []
+    for (d0, d1) in zip(degrees[:-1], degrees[1:]):
+        masks += [(d1.unsqueeze(-1) >= d0.unsqueeze(0)).float()]
+
+    return masks, degrees[0]
+
+#%%
+
+def create_masks_pmu(input_size, hidden_size, n_hidden, input_order='sequential', input_degrees=None):
+    # MADE paper sec 4:
+    # degrees of connections between layers -- ensure at most in_degree - 1 connections
+    degrees = []
+
+    # set input degrees to what is provided in args (the flipped order of the previous layer in a stack of mades);
+    # else init input degrees based on strategy in input_order (sequential or random)
+    if input_order == 'sequential':
+        degrees += [torch.arange(input_size)] if input_degrees is None else [input_degrees]
+        for _ in range(n_hidden + 1):
+            degrees += [torch.arange(hidden_size) % (input_size - 1)]
+        degrees += [torch.arange(input_size) % input_size - 1] if input_degrees is None else [input_degrees % input_size - 1]
+
+    # construct masks
+    masks = []
+    for (d0, d1) in zip(degrees[:-1], degrees[1:]):
+        masks += [(d1.unsqueeze(-1) >= d0.unsqueeze(0)).float()]
+    masks[0] = masks[0].repeat_interleave(3, dim=1)
+    masks[-1] = masks[-1].repeat_interleave(3, dim=0)
+
+    return masks, degrees[0]
+#%%
+class MaskedLinear(nn.Linear):
+    """ MADE building block layer """
+    def __init__(self, input_size, n_outputs, mask, cond_label_size=None):
+        super().__init__(input_size, n_outputs)
+
+        self.register_buffer('mask', mask)
+
+        self.cond_label_size = cond_label_size
+        if cond_label_size is not None:
+            self.cond_weight = nn.Parameter(torch.rand(n_outputs, cond_label_size) / math.sqrt(cond_label_size))
+
+    def forward(self, x, y=None):
+        out = F.linear(x, self.weight * self.mask, self.bias)
+        if y is not None:
+            out = out + F.linear(y, self.cond_weight)
+        return out
+
+    def extra_repr(self):
+        return 'in_features={}, out_features={}, bias={}'.format(
+            self.in_features, self.out_features, self.bias is not None
+        ) + (self.cond_label_size is not None) * ', cond_features={}'.format(self.cond_label_size)
+
+
+class LinearMaskedCoupling(nn.Module):
+    """ Modified RealNVP Coupling Layers per the MAF paper """
+    def __init__(self, input_size, hidden_size, n_hidden, mask, cond_label_size=None):
+        super().__init__()
+
+        self.register_buffer('mask', mask)
+
+        # scale function
+        s_net = [nn.Linear(input_size + (cond_label_size if cond_label_size is not None else 0), hidden_size)]
+        for _ in range(n_hidden):
+            s_net += [nn.Tanh(), nn.Linear(hidden_size, hidden_size)]
+        s_net += [nn.Tanh(), nn.Linear(hidden_size, input_size)]
+        self.s_net = nn.Sequential(*s_net)
+
+        # translation function
+        self.t_net = copy.deepcopy(self.s_net)
+        # replace Tanh with ReLU's per MAF paper
+        for i in range(len(self.t_net)):
+            if not isinstance(self.t_net[i], nn.Linear): self.t_net[i] = nn.ReLU()
+
+    def forward(self, x, y=None):
+        # apply mask
+        mx = x * self.mask
+
+        # run through model
+        s = self.s_net(mx if y is None else torch.cat([y, mx], dim=1))
+        t = self.t_net(mx if y is None else torch.cat([y, mx], dim=1))
+        u = mx + (1 - self.mask) * (x - t) * torch.exp(-s)  # cf RealNVP eq 8 where u corresponds to x (here we're modeling u)
+
+        log_abs_det_jacobian = - (1 - self.mask) * s  # log det du/dx; cf RealNVP 8 and 6; note, sum over input_size done at model log_prob
+
+        return u, log_abs_det_jacobian
+
+    def inverse(self, u, y=None):
+        # apply mask
+        mu = u * self.mask
+
+        # run through model
+        s = self.s_net(mu if y is None else torch.cat([y, mu], dim=1))
+        t = self.t_net(mu if y is None else torch.cat([y, mu], dim=1))
+        x = mu + (1 - self.mask) * (u * s.exp() + t)  # cf RealNVP eq 7
+
+        log_abs_det_jacobian = (1 - self.mask) * s  # log det dx/du
+
+        return x, log_abs_det_jacobian
+
+
+class BatchNorm(nn.Module):
+    """ RealNVP BatchNorm layer """
+    def __init__(self, input_size, momentum=0.9, eps=1e-5):
+        super().__init__()
+        self.momentum = momentum
+        self.eps = eps
+
+        self.log_gamma = nn.Parameter(torch.zeros(input_size))
+        self.beta = nn.Parameter(torch.zeros(input_size))
+
+        self.register_buffer('running_mean', torch.zeros(input_size))
+        self.register_buffer('running_var', torch.ones(input_size))
+
+    def forward(self, x, cond_y=None):
+        if self.training:
+            self.batch_mean = x.mean(0)
+            self.batch_var = x.var(0) # note MAF paper uses biased variance estimate; ie x.var(0, unbiased=False)
+
+            # update running mean
+            self.running_mean.mul_(self.momentum).add_(self.batch_mean.data * (1 - self.momentum))
+            self.running_var.mul_(self.momentum).add_(self.batch_var.data * (1 - self.momentum))
+
+            mean = self.batch_mean
+            var = self.batch_var
+        else:
+            mean = self.running_mean
+            var = self.running_var
+
+        # compute normalized input (cf original batch norm paper algo 1)
+        x_hat = (x - mean) / torch.sqrt(var + self.eps)
+        y = self.log_gamma.exp() * x_hat + self.beta
+
+        # compute log_abs_det_jacobian (cf RealNVP paper)
+        log_abs_det_jacobian = self.log_gamma - 0.5 * torch.log(var + self.eps)
+#        print('in sum log var {:6.3f} ; out sum log var {:6.3f}; sum log det {:8.3f}; mean log_gamma {:5.3f}; mean beta {:5.3f}'.format(
+#            (var + self.eps).log().sum().data.numpy(), y.var(0).log().sum().data.numpy(), log_abs_det_jacobian.mean(0).item(), self.log_gamma.mean(), self.beta.mean()))
+        return y, log_abs_det_jacobian.expand_as(x)
+
+    def inverse(self, y, cond_y=None):
+        if self.training:
+            mean = self.batch_mean
+            var = self.batch_var
+        else:
+            mean = self.running_mean
+            var = self.running_var
+
+        x_hat = (y - self.beta) * torch.exp(-self.log_gamma)
+        x = x_hat * torch.sqrt(var + self.eps) + mean
+
+        log_abs_det_jacobian = 0.5 * torch.log(var + self.eps) - self.log_gamma
+
+        return x, log_abs_det_jacobian.expand_as(x)
+
+
+class FlowSequential(nn.Sequential):
+    """ Container for layers of a normalizing flow """
+    def forward(self, x, y):
+        sum_log_abs_det_jacobians = 0
+        for module in self:
+            x, log_abs_det_jacobian = module(x, y)
+            sum_log_abs_det_jacobians = sum_log_abs_det_jacobians + log_abs_det_jacobian
+        return x, sum_log_abs_det_jacobians
+
+    def inverse(self, u, y):
+        sum_log_abs_det_jacobians = 0
+        for module in reversed(self):
+            u, log_abs_det_jacobian = module.inverse(u, y)
+            sum_log_abs_det_jacobians = sum_log_abs_det_jacobians + log_abs_det_jacobian
+        return u, sum_log_abs_det_jacobians
+
+# --------------------
+# Models
+# --------------------
+
+class MADE(nn.Module):
+    def __init__(self, input_size, hidden_size, n_hidden, cond_label_size=None, activation='relu', input_order='sequential', input_degrees=None):
+        """
+        Args:
+            input_size -- scalar; dim of inputs
+            hidden_size -- scalar; dim of hidden layers
+            n_hidden -- scalar; number of hidden layers
+            activation -- str; activation function to use
+            input_order -- str; variable order for creating the autoregressive masks (sequential|random)
+            input_degrees -- tensor or None; the order flipped from the previous layer in a stack of mades
+            cond_label_size -- int or None; dim of the conditioning input (None for an unconditional model)
+        """
+        super().__init__()
+        # base distribution for calculation of log prob under the model
+        self.register_buffer('base_dist_mean', torch.zeros(input_size))
+        self.register_buffer('base_dist_var', torch.ones(input_size))
+
+        # create masks
+        masks, self.input_degrees = create_masks(input_size, hidden_size, n_hidden, input_order, input_degrees)
+
+        # setup activation
+        if activation == 'relu':
+            activation_fn = nn.ReLU()
+        elif activation == 'tanh':
+            activation_fn = nn.Tanh()
+        else:
+            raise ValueError('Check activation function.')
+
+        # construct model
+        self.net_input = MaskedLinear(input_size, hidden_size, masks[0], cond_label_size)
+        self.net = []
+        for m in masks[1:-1]:
+            self.net += [activation_fn, MaskedLinear(hidden_size, hidden_size, m)]
+        self.net += [activation_fn, MaskedLinear(hidden_size, 2 * input_size, masks[-1].repeat(2,1))]
+        self.net = nn.Sequential(*self.net)
+
+    @property
+    def base_dist(self):
+        return D.Normal(self.base_dist_mean, self.base_dist_var)
+
+    def forward(self, x, y=None):
+        # MAF eq 4 -- return mean and log std
+        m, loga = self.net(self.net_input(x, y)).chunk(chunks=2, dim=1)
+        u = (x - m) * torch.exp(-loga)
+        # MAF eq 5
+        log_abs_det_jacobian = - loga
+        return u, log_abs_det_jacobian
+
+    def inverse(self, u, y=None, sum_log_abs_det_jacobians=None):
+        # MAF eq 3
+        D = u.shape[1]
+        x = torch.zeros_like(u)
+        # run through reverse model
+        for i in self.input_degrees:
+            m, loga = self.net(self.net_input(x, y)).chunk(chunks=2, dim=1)
+            x[:,i] = u[:,i] * torch.exp(loga[:,i]) + m[:,i]
+        log_abs_det_jacobian = loga
+        return x, log_abs_det_jacobian
+
+    def log_prob(self, x, y=None):
+        u, log_abs_det_jacobian = self.forward(x, y)
+        return torch.sum(self.base_dist.log_prob(u) + log_abs_det_jacobian, dim=1)
+
+
+class MADE_Full(nn.Module):
+    def __init__(self, input_size, hidden_size, n_hidden, cond_label_size=None, activation='relu', input_order='sequential', input_degrees=None):
+        """
+        Args:
+            input_size -- scalar; dim of inputs
+            hidden_size -- scalar; dim of hidden layers
+            n_hidden -- scalar; number of hidden layers
+            activation -- str; activation function to use
+            input_order -- str; variable order for creating the autoregressive masks (sequential|random)
+            input_degrees -- tensor or None; the order flipped from the previous layer in a stack of mades
+            cond_label_size -- int or None; dim of the conditioning input (None for an unconditional model)
+        """
+        super().__init__()
+        # base distribution for calculation of log prob under the model
+        self.register_buffer('base_dist_mean', torch.zeros(input_size))
+        self.register_buffer('base_dist_var', torch.ones(input_size))
+
+        # create masks
+        masks, self.input_degrees = create_masks_pmu(int(input_size/3), hidden_size, n_hidden, input_order, input_degrees)
+
+        # setup activation
+        if activation == 'relu':
+            activation_fn = nn.ReLU()
+        elif activation == 'tanh':
+            activation_fn = nn.Tanh()
+        else:
+            raise ValueError('Check activation function.')
+
+        # construct model
+        self.net_input = MaskedLinear(input_size, hidden_size, masks[0], cond_label_size)
+        self.net = []
+        for m in masks[1:-1]:
+            self.net += [activation_fn, MaskedLinear(hidden_size, hidden_size, m)]
+        self.net += [activation_fn, MaskedLinear(hidden_size, 2 * input_size, masks[-1].repeat(2,1))]
+        self.net = nn.Sequential(*self.net)
+
+    @property
+    def base_dist(self):
+        return D.Normal(self.base_dist_mean, self.base_dist_var)
+
+    def forward(self, x, y=None):
+        # MAF eq 4 -- return mean and log std
+        m, loga = self.net(self.net_input(x, y)).chunk(chunks=2, dim=1)
+        u = (x - m) * torch.exp(-loga)
+        # MAF eq 5
+        log_abs_det_jacobian = - loga
+        return u, log_abs_det_jacobian
+
+    def log_prob(self, x, y=None):
+        u, log_abs_det_jacobian = self.forward(x, y)
+        return torch.sum(self.base_dist.log_prob(u) + log_abs_det_jacobian, dim=1)
+
+
+class MAF(nn.Module):
+    def __init__(self, n_blocks, input_size, hidden_size, n_hidden, cond_label_size=None, activation='relu', input_order='sequential', batch_norm=True):
+        super().__init__()
+        # base distribution for calculation of log prob under the model
+        self.register_buffer('base_dist_mean', torch.zeros(input_size))
+        self.register_buffer('base_dist_var', torch.ones(input_size))
+
+        # construct model
+        modules = []
+        self.input_degrees = None
+        for i in range(n_blocks):
+            modules += [MADE(input_size, hidden_size, n_hidden, cond_label_size, activation, input_order, self.input_degrees)]
+            self.input_degrees = modules[-1].input_degrees.flip(0)
+            modules += batch_norm * [BatchNorm(input_size)]
+
+        self.net = FlowSequential(*modules)
+
+    @property
+    def base_dist(self):
+        return D.Normal(self.base_dist_mean, self.base_dist_var)
+
+    def forward(self, x, y=None):
+        return self.net(x, y)
+
+    def inverse(self, u, y=None):
+        return self.net.inverse(u, y)
+
+    def log_prob(self, x, y=None):
+        u, sum_log_abs_det_jacobians = self.forward(x, y)
+        return torch.sum(self.base_dist.log_prob(u) + sum_log_abs_det_jacobians, dim=1)
+
+
+class MAF_Full(nn.Module):
+    def __init__(self, n_blocks, input_size, hidden_size, n_hidden, cond_label_size=None, activation='relu', input_order='sequential', batch_norm=True):
+        super().__init__()
+        # base distribution for calculation of log prob under the model
+        self.register_buffer('base_dist_mean', torch.zeros(input_size))
+        self.register_buffer('base_dist_var', torch.ones(input_size))
+
+        # construct model
+        modules = []
+        self.input_degrees = None
+        for i in range(n_blocks):
+            modules += [MADE_Full(input_size, hidden_size, n_hidden, cond_label_size, activation, input_order, self.input_degrees)]
+            self.input_degrees = modules[-1].input_degrees.flip(0)
+            modules += batch_norm * [BatchNorm(input_size)]
+
+        self.net = FlowSequential(*modules)
+
+    @property
+    def base_dist(self):
+        return D.Normal(self.base_dist_mean, self.base_dist_var)
+
+    def forward(self, x, y=None):
+        return self.net(x, y)
+
+    def inverse(self, u, y=None):
+        return self.net.inverse(u, y)
+
+    def log_prob(self, x, y=None):
+        u, sum_log_abs_det_jacobians = self.forward(x, y)
+        return torch.sum(self.base_dist.log_prob(u) + sum_log_abs_det_jacobians, dim=1)
+
+
+
+class RealNVP(nn.Module):
+    def __init__(self, n_blocks, input_size, hidden_size, n_hidden, cond_label_size=None, batch_norm=True):
+        super().__init__()
+
+        # base distribution for calculation of log prob under the model
+        self.register_buffer('base_dist_mean', torch.zeros(input_size))
+        self.register_buffer('base_dist_var', torch.ones(input_size))
+
+        # construct model
+        modules = []
+        mask = torch.arange(input_size).float() % 2
+        for i in range(n_blocks):
+            modules += [LinearMaskedCoupling(input_size, hidden_size, n_hidden, mask, cond_label_size)]
+            mask = 1 - mask
+            modules += batch_norm * [BatchNorm(input_size)]
+
+        self.net = FlowSequential(*modules)
+
+    @property
+    def base_dist(self):
+        return D.Normal(self.base_dist_mean, self.base_dist_var)
+
+    def forward(self, x, y=None):
+        return self.net(x, y)
+
+    def inverse(self, u, y=None):
+        return self.net.inverse(u, y)
+
+    def log_prob(self, x, y=None):
+        u, sum_log_abs_det_jacobians = self.forward(x, y)
+        return torch.sum(self.base_dist.log_prob(u) + sum_log_abs_det_jacobians, dim=1)
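Review note on the `log_prob` methods above: each flow maps `x` to `u` and scores it as `base_dist.log_prob(u) + log_abs_det_jacobian`. For the affine step used in `MADE.forward`, `u = (x - m) * exp(-loga)` with log-determinant `-loga`, this is exactly the log-density of a Gaussian with mean `m` and standard deviation `exp(loga)`. The one-dimensional sketch below checks that identity with the standard library only; the function names and sample values are hypothetical and not part of this patch.

```python
import math


def standard_normal_log_pdf(u: float) -> float:
    """Log-density of N(0, 1) at u."""
    return -0.5 * (u * u + math.log(2.0 * math.pi))


def affine_flow_log_prob(x: float, m: float, loga: float) -> float:
    """log p(x) for u = (x - m) * exp(-loga) with a standard normal base."""
    u = (x - m) * math.exp(-loga)
    log_abs_det_jacobian = -loga  # du/dx = exp(-loga)
    return standard_normal_log_pdf(u) + log_abs_det_jacobian


def gaussian_log_pdf(x: float, mean: float, std: float) -> float:
    """Log-density of N(mean, std^2) at x, written out directly."""
    z = (x - mean) / std
    return -0.5 * (z * z + math.log(2.0 * math.pi)) - math.log(std)


# Arbitrary sample values for the comparison.
x, m, loga = 1.3, 0.4, -0.7
flow_lp = affine_flow_log_prob(x, m, loga)
direct_lp = gaussian_log_pdf(x, m, math.exp(loga))
```

The two values agree up to floating-point error, which is why summing the per-dimension terms over `dim=1` gives the joint log-likelihood used as the anomaly score.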
diff --git a/networks/ganf/RNN.py b/networks/ganf/RNN.py
new file mode 100644
index 0000000..2801595
--- /dev/null
+++ b/networks/ganf/RNN.py
@@ -0,0 +1,101 @@
+import torch
+import torch.nn as nn
+from functools import partial
+
+class RecurrentEncoder(nn.Module):
+    """Recurrent encoder"""
+
+    def __init__(self, n_features, latent_dim, rnn):
+        super().__init__()
+
+        self.rec_enc1 = rnn(n_features, latent_dim, batch_first=True)
+
+    def forward(self, x):
+        _, h_n = self.rec_enc1(x)
+
+        return h_n
+
+class RecurrentDecoder(nn.Module):
+    """Recurrent decoder for RNN and GRU"""
+
+    def __init__(self, latent_dim, n_features, rnn_cell, device):
+        super().__init__()
+
+        self.n_features = n_features
+        self.device = device
+        self.rec_dec1 = rnn_cell(n_features, latent_dim)
+        self.dense_dec1 = nn.Linear(latent_dim, n_features)
+
+    def forward(self, h_0, seq_len):
+        # Initialize output
+        x = torch.tensor([], device = self.device)
+
+        # Drop only the leading num_layers dimension so a batch of size 1 keeps its batch axis
+        h_i = h_0.squeeze(0)
+
+        # Reconstruct first element with encoder output
+        x_i = self.dense_dec1(h_i)
+
+        # Reconstruct remaining elements
+        for i in range(0, seq_len):
+            h_i = self.rec_dec1(x_i, h_i)
+            x_i = self.dense_dec1(h_i)
+            x = torch.cat([x, x_i], axis=1)
+
+        return x.view(-1, seq_len, self.n_features)
+
+
+class RecurrentDecoderLSTM(nn.Module):
+    """Recurrent decoder LSTM"""
+
+    def __init__(self, latent_dim, n_features, rnn_cell, device):
+        super().__init__()
+
+        self.n_features = n_features
+        self.device = device
+        self.rec_dec1 = rnn_cell(n_features, latent_dim)
+        self.dense_dec1 = nn.Linear(latent_dim, n_features)
+
+    def forward(self, h_0, seq_len):
+        # Initialize output
+        x = torch.tensor([], device = self.device)
+
+        # Drop only the leading num_layers dimension of (h, c) so a batch of size 1 keeps its batch axis
+        h_i = [h.squeeze(0) for h in h_0]
+
+        # Reconstruct first element with encoder output
+        x_i = self.dense_dec1(h_i[0])
+
+        # Reconstruct remaining elements
+        for i in range(0, seq_len):
+            h_i = self.rec_dec1(x_i, h_i)
+            x_i = self.dense_dec1(h_i[0])
+            x = torch.cat([x, x_i], axis = 1)
+
+        return x.view(-1, seq_len, self.n_features)
+
+
+class RecurrentAE(nn.Module):
+    """Recurrent autoencoder"""
+
+    def __init__(self, n_features, latent_dim, device):
+        super().__init__()
+
+        # Encoder and decoder configuration
+        self.rnn, self.rnn_cell = nn.LSTM, nn.LSTMCell
+        self.decoder = RecurrentDecoderLSTM
+        self.latent_dim = latent_dim
+        self.n_features = n_features
+        self.device = device
+
+        # Encoder and decoder
+        self.encoder = RecurrentEncoder(self.n_features, self.latent_dim, self.rnn)
+        self.decoder = self.decoder(self.latent_dim, self.n_features, self.rnn_cell, self.device)
+
+    def forward(self, x):
+        # x: N X K X L X D 
+        seq_len = x.shape[1]
+        h_n = self.encoder(x)
+        out = self.decoder(h_n, seq_len)
+
+        return torch.flip(out, [1])
diff --git a/networks/ganf/dataset.py b/networks/ganf/dataset.py
new file mode 100644
index 0000000..10756ac
--- /dev/null
+++ b/networks/ganf/dataset.py
@@ -0,0 +1,61 @@
+import pandas as pd
+import torch
+from torch.utils.data import Dataset
+import numpy as np
+
+from torch.utils.data import DataLoader
+
+def load_SMD(data, label, batch_size):
+
+    #mean_df = data.mean(axis=0)
+    #std_df = data.std(axis=0)
+
+    #data = pd.DataFrame((data-mean_df)/std_df)
+    n_sensor = data.shape[1]
+    #data = data.dropna(axis=1)
+    data = np.array(data)
+
+    train_df = data[:int(0.5*len(data))]
+    train_label = label[:int(0.5*len(data))]
+
+    val_df = data[int(0.5*len(data)):int(0.7*len(data))]
+    val_label = label[int(0.5*len(data)):int(0.7*len(data))]
+    
+    test_df = data[int(0.7*len(data)):]
+    test_label = label[int(0.7*len(data)):]
+
+    train_loader = DataLoader(SMD(train_df, train_label), batch_size=batch_size, shuffle=True)
+
+    val_loader = DataLoader(SMD(val_df,val_label), batch_size=batch_size, shuffle=False)
+    test_loader = DataLoader(SMD(test_df,test_label), batch_size=batch_size, shuffle=False)
+
+    return train_loader, val_loader, test_loader, n_sensor
+
+
+class SMD(Dataset):
+    def __init__(self, data, label, window_size=60, stride_size=1):
+        super(SMD, self).__init__()
+        self.data = data
+        self.window_size = window_size
+        self.stride_size = stride_size
+
+        self.data, self.idx, self.label = self.preprocess(data, label)
+
+    def preprocess(self, data, label):
+        start_idx = np.arange(0, len(data) - self.window_size, self.stride_size)
+        end_idx = np.arange(self.window_size, len(data), self.stride_size)
+
+        return data, start_idx, label[end_idx]
+
+    def __len__(self):
+        length = len(self.idx)
+
+        return length
+
+    def __getitem__(self, index):
+        #  N X K X L X D
+        start = self.idx[index]
+        end = start + self.window_size
+        data = self.data[start:end].reshape([self.window_size, -1, 1])
+
+        return torch.FloatTensor(data).transpose(0, 1)
diff --git a/networks/ganf/fit.py b/networks/ganf/fit.py
new file mode 100644
index 0000000..fc7a261
--- /dev/null
+++ b/networks/ganf/fit.py
@@ -0,0 +1,16 @@
+import os
+import argparse
+import torch
+from torch.utils.data import DataLoader
+from networks.ganf.GANF import GANF
+from sklearn.metrics import roc_auc_score
+import sys
+import random
+import numpy as np
+from torch.nn.utils import clip_grad_value_
+import seaborn as sns
+import matplotlib.pyplot as plt
+import logging
+sys.path.append("../")
+
+
diff --git a/networks/ganf/graph_layer.py b/networks/ganf/graph_layer.py
new file mode 100644
index 0000000..0421b6e
--- /dev/null
+++ b/networks/ganf/graph_layer.py
@@ -0,0 +1,122 @@
+import torch
+from torch.nn import Parameter, Linear, Sequential, BatchNorm1d, ReLU
+import torch.nn.functional as F
+from torch_geometric.nn.conv import MessagePassing
+from torch_geometric.utils import remove_self_loops, add_self_loops, softmax
+
+from torch_geometric.nn.inits import glorot, zeros
+
+class GraphLayer(MessagePassing):
+    def __init__(self, in_channels, out_channels, heads=1, concat=True,
+                 negative_slope=0.2, dropout=0, bias=True, inter_dim=-1,**kwargs):
+        super(GraphLayer, self).__init__(aggr='add', **kwargs)
+
+        self.in_channels = in_channels
+        self.out_channels = out_channels
+        self.heads = heads
+        self.concat = concat
+        self.negative_slope = negative_slope
+        self.dropout = dropout
+
+        self.__alpha__ = None
+
+        self.lin = Linear(in_channels, heads * out_channels, bias=False)
+
+        self.att_i = Parameter(torch.Tensor(1, heads, out_channels))
+        self.att_j = Parameter(torch.Tensor(1, heads, out_channels))
+        self.att_em_i = Parameter(torch.Tensor(1, heads, out_channels))
+        self.att_em_j = Parameter(torch.Tensor(1, heads, out_channels))
+
+        if bias and concat:
+            self.bias = Parameter(torch.Tensor(heads * out_channels))
+        elif bias and not concat:
+            self.bias = Parameter(torch.Tensor(out_channels))
+        else:
+            self.register_parameter('bias', None)
+
+        self.reset_parameters()
+
+    def reset_parameters(self):
+        glorot(self.lin.weight)
+        glorot(self.att_i)
+        glorot(self.att_j)
+        
+        zeros(self.att_em_i)
+        zeros(self.att_em_j)
+
+        zeros(self.bias)
+
+
+
+    def forward(self, x, edge_index, embedding, return_attention_weights=False):
+        """"""
+        if torch.is_tensor(x):
+            x = self.lin(x)
+            x = (x, x)
+        else:
+            x = (self.lin(x[0]), self.lin(x[1]))
+
+        edge_index, _ = remove_self_loops(edge_index)
+        edge_index, _ = add_self_loops(edge_index,
+                                       num_nodes=x[1].size(self.node_dim))
+
+        out = self.propagate(edge_index, x=x, embedding=embedding, edges=edge_index,
+                             return_attention_weights=return_attention_weights)
+
+        if self.concat:
+            out = out.view(-1, self.heads * self.out_channels)
+        else:
+            out = out.mean(dim=1)
+
+        if self.bias is not None:
+            out = out + self.bias
+
+        if return_attention_weights:
+            alpha, self.__alpha__ = self.__alpha__, None
+            return out, (edge_index, alpha)
+        else:
+            return out
+
+    def message(self, x_i, x_j, edge_index_i, size_i,
+                embedding,
+                edges,
+                return_attention_weights):
+
+        x_i = x_i.view(-1, self.heads, self.out_channels)
+        x_j = x_j.view(-1, self.heads, self.out_channels)
+
+        if embedding is not None:
+            embedding_i, embedding_j = embedding[edge_index_i], embedding[edges[0]]
+            embedding_i = embedding_i.unsqueeze(1).repeat(1,self.heads,1)
+            embedding_j = embedding_j.unsqueeze(1).repeat(1,self.heads,1)
+            key_i = torch.cat((x_i, embedding_i), dim=-1)
+            key_j = torch.cat((x_j, embedding_j), dim=-1)
+            cat_att_i = torch.cat((self.att_i, self.att_em_i), dim=-1)
+            cat_att_j = torch.cat((self.att_j, self.att_em_j), dim=-1)
+        else:
+            # Without node embeddings, attend over the transformed features only
+            key_i, key_j = x_i, x_j
+            cat_att_i, cat_att_j = self.att_i, self.att_j
+
+        alpha = (key_i * cat_att_i).sum(-1) + (key_j * cat_att_j).sum(-1)
+
+
+        alpha = alpha.view(-1, self.heads, 1)
+
+
+        alpha = F.leaky_relu(alpha, self.negative_slope)
+        alpha = softmax(alpha, edge_index_i, num_nodes=size_i)
+
+        if return_attention_weights:
+            self.__alpha__ = alpha
+
+        alpha = F.dropout(alpha, p=self.dropout, training=self.training)
+        
+        return x_j * alpha.view(-1, self.heads, 1)
+
+
+
+    def __repr__(self):
+        return '{}({}, {}, heads={})'.format(self.__class__.__name__,
+                                             self.in_channels,
+                                             self.out_channels, self.heads)
diff --git a/networks/ganf/predict.py b/networks/ganf/predict.py
new file mode 100644
index 0000000..0516322
--- /dev/null
+++ b/networks/ganf/predict.py
@@ -0,0 +1,27 @@
+import os
+import argparse
+import torch
+from networks.ganf.GANF import GANF
+import numpy as np
+from sklearn.metrics import roc_auc_score
+
+
+def predict_prob(model, test_iterator, evaluate_dir, window_labels):
+    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
+    model = model.to(device)
+    model.load_state_dict(torch.load(os.path.join(evaluate_dir, "GANF_SMD_best.pt"), map_location=device))
+    A = torch.load(os.path.join(evaluate_dir, "graph_best.pt"), map_location=device).to(device)
+    model.eval()
+
+    loss_test = []
+    with torch.no_grad():
+        for x in test_iterator:
+            x = x.unsqueeze(-1).transpose(1, 2)
+            x = x.to(device)
+            loss = -model.test(x, A.data).cpu().numpy()
+            loss_test.append(loss)
+    loss_test = np.concatenate(loss_test)
+    anomaly_score = loss_test
+    anomaly_label = window_labels[-len(anomaly_score) :]
+
+    return anomaly_score, anomaly_label
diff --git a/networks/ganf/utils.py b/networks/ganf/utils.py
new file mode 100644
index 0000000..1979818
--- /dev/null
+++ b/networks/ganf/utils.py
@@ -0,0 +1,74 @@
+#%%
+import torch
+
+def h(A):
+    return torch.trace(torch.matrix_exp(A*A)) - A.shape[0]
+
+def normalize(A):
+    D = A.sum(dim=0)
+    D_inv = D.pow_(-1)
+    D_inv.masked_fill_(D_inv == float('inf'), 0)
+    
+    return A * D_inv
+    
+def thresholding(A, thre):
+    return torch.where(A.abs()>thre, A, torch.scalar_tensor(0.0, dtype=torch.float32, device=A.device))
+
+def binarize(A, thre):
+    return torch.where(A.abs()>thre, 1.0, 0.0)
+# %%
+import pandas as pd
+def get_timestamp(stamps):
+    return (stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
+# %%
+import numpy as np
+from sklearn.metrics import auc
+def roc_auc(label_time, pred, negative_sample, sigma):
+    negative_sample = np.sort(negative_sample)[::-1]
+    thresholds = list(negative_sample[::max(1, len(negative_sample)//50)])  # step >= 1 even for fewer than 50 samples
+    thresholds.append(negative_sample[-1])
+    tps=[]
+    fps=[]
+
+    for thre in thresholds:
+        pred_pos = pred[pred>thre]
+
+        tp = 0
+        for i in range(len(label_time)):
+            start_time = label_time[i] - pd.Timedelta(30, unit='min')
+            end_time = label_time[i] + pd.Timedelta(30, unit='min')
+
+            detected_event = pred_pos[str(start_time): str(end_time)]
+            if len(detected_event)>0:
+                timestamps = get_timestamp(detected_event.index)
+                delta_t = np.min(np.abs(timestamps.values - get_timestamp(label_time[i])))
+                tp += np.exp(-np.power(delta_t/sigma,2))
+        tp = tp/len(label_time)
+        tps.append(tp)
+
+        fp = (negative_sample>thre).sum()/len(negative_sample)
+        fps.append(fp)
+    return auc(fps,tps), (fps,tps)
+# %%
+def roc_auc_all(loss_np, delta_t, sigma):
+
+    ground_truth = np.exp(-np.power((delta_t.values)/sigma,2))
+
+    loss_sort = np.sort(loss_np)[::-1]
+    thresholds = list(loss_sort[::max(1, len(loss_sort)//50)])  # step >= 1 even for fewer than 50 samples
+    thresholds.append(loss_sort[-1])
+
+    n_pos = ground_truth.sum()
+    n_neg = (1-ground_truth).sum()
+    tps = []
+    fps = []
+    for thre in thresholds:
+        pred_pos = loss_np>thre
+
+        tp = ground_truth[pred_pos].sum()/n_pos
+        fp = (1-ground_truth[pred_pos]).sum()/n_neg
+        tps.append(tp)
+        fps.append(fp)
+
+    auc_score = auc(fps, tps)
+    return auc_score, fps, tps
\ No newline at end of file
diff --git a/networks/lstm/__init__.py b/networks/lstm/__init__.py
new file mode 100644
index 0000000..1a86963
--- /dev/null
+++ b/networks/lstm/__init__.py
@@ -0,0 +1 @@
+from .lstm import LSTM
\ No newline at end of file
diff --git a/networks/lstm/lstm.py b/networks/lstm/lstm.py
new file mode 100644
index 0000000..b74ee07
--- /dev/null
+++ b/networks/lstm/lstm.py
@@ -0,0 +1,75 @@
+## Unit test only start
+# import torch
+# import sys
+import torch
+from torch import nn
+from networks.lstm.wrappers import TimeSeriesEncoder
+
+
+class LSTM(TimeSeriesEncoder):
+    """
+    Encoder of a time series using an LSTM, computing a linear transformation
+    of the last output of the LSTM to predict the final steps of each window.
+
+    Takes as input a three-dimensional tensor (`B`, `L`, `C`) where `B` is the
+    batch size, `L` is the window length, and `C` is the number of input channels.
+    Returns a dict with the summed loss, the prediction, the per-element score and the target.
+    """
+
+    def __init__(
+        self,
+        in_channels,
+        hidden_size=64,
+        num_layers=1,
+        dropout=0,
+        prediction_length=1,
+        prediction_dims=None,
+        **kwargs,
+    ):
+        super().__init__(architecture="LSTM", **kwargs)
+
+        self.prediction_dims = (
+            prediction_dims if prediction_dims else list(range(in_channels))
+        )
+        self.prediction_length = prediction_length
+
+        self.lstm = nn.LSTM(
+            input_size=in_channels,
+            hidden_size=hidden_size,
+            num_layers=num_layers,
+            batch_first=True,
+        )
+        clf_input_dim = hidden_size
+        final_output_dim = prediction_length * len(self.prediction_dims)
+
+        self.predcitor = nn.Linear(clf_input_dim, final_output_dim)
+
+        self.dropout = nn.Dropout(dropout)
+        self.loss_fn = nn.MSELoss(reduction="none")
+
+        self.compile()
+
+    def forward(self, batch_window):
+        # batch_window = batch_window.permute(0, 2, 1)  # b x win x ts_dim
+        self.batch_size = batch_window.size(0)
+        x, y = (
+            batch_window[:, 0 : -self.prediction_length, :],
+            batch_window[:, -self.prediction_length :, self.prediction_dims],
+        )
+
+        lstm_out, _ = self.lstm(x)
+        lstm_out = self.dropout(lstm_out[:, -1, :])
+
+        recst = self.predcitor(lstm_out).view(
+            self.batch_size, self.prediction_length, len(self.prediction_dims)
+        )
+
+        loss = self.loss_fn(recst, y)
+        return_dict = {
+            "loss": loss.sum(),
+            "recst": recst,
+            "score": loss,
+            "y": y,
+        }
+
+        return return_dict
diff --git a/networks/lstm/wrappers.py b/networks/lstm/wrappers.py
new file mode 100644
index 0000000..1214720
--- /dev/null
+++ b/networks/lstm/wrappers.py
@@ -0,0 +1,149 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+
+#   http://www.apache.org/licenses/LICENSE-2.0
+
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import os
+import logging
+import time
+import torch
+from common.utils import set_device
+from collections import defaultdict
+
+
+class TimeSeriesEncoder(torch.nn.Module):
+    def __init__(
+        self,
+        save_path,
+        nb_epoch,
+        lr,
+        device="cpu",
+        architecture="base",
+        **kwargs,
+    ):
+        super().__init__()
+        self.device = set_device(device)
+        self.nb_epoch = nb_epoch
+        self.lr = lr
+        self.best_metric = float("inf")
+        self.time_tracker = {}
+        self.model_save_file = os.path.join(save_path, f"{architecture}_model.ckpt")
+
+    def compile(self):
+        self.optimizer = torch.optim.Adam(
+            self.parameters(), lr=self.lr, weight_decay=0.001
+        )
+        self.to(self.device)
+        logging.info("Compiling finished.")
+
+    def save_encoder(self):
+        logging.info("Saving model to {}".format(self.model_save_file))
+        try:
+            torch.save(
+                self.state_dict(),
+                self.model_save_file,
+                _use_new_zipfile_serialization=False,
+            )
+        except Exception:
+            torch.save(self.state_dict(), self.model_save_file)
+
+    def load_encoder(self, model_save_path=""):
+        logging.info("Loading model from {}".format(self.model_save_file))
+        self.load_state_dict(torch.load(self.model_save_file, map_location=self.device))
+
+    def fit(
+        self,
+        train_iterator,
+        patience=10,
+        **kwargs,
+    ):
+        num_batches = len(train_iterator)
+        logging.info("Start training for {} batches.".format(num_batches))
+        train_start = time.time()
+        # Encoder training
+        for epoch in range(1, self.nb_epoch + 1):
+            running_loss = 0
+            for idx, batch in enumerate(train_iterator):
+                # batch: b x d x dim
+                batch = batch.to(self.device).float()
+                return_dict = self(batch)
+                self.optimizer.zero_grad()
+                loss = return_dict["loss"]
+                loss.backward()
+                self.optimizer.step()
+                running_loss += loss.item()
+            avg_loss = running_loss / num_batches
+            logging.info("Epoch: {}, loss: {:.5f}".format(epoch, avg_loss))
+            stop_training = self.__on_epoch_end(avg_loss, patience=patience)
+            if stop_training:
+                logging.info("Early stop at epoch {}.".format(epoch))
+                break
+        train_end = time.time()
+
+        self.time_tracker["train"] = train_end - train_start
+        return self
+
+    def __on_epoch_end(self, monitor_value, patience):
+        if monitor_value < self.best_metric:
+            self.best_metric = monitor_value
+            logging.info("Saving model for performance: {:.3f}".format(monitor_value))
+            self.save_encoder()
+            self.worse_count = 0
+        else:
+            self.worse_count = getattr(self, "worse_count", 0) + 1
+        if self.worse_count >= patience:
+            return True
+        return False
+
+    def encode(self, iterator):
+        # Run the model over the iterator and collect its outputs
+        save_dict = defaultdict(list)
+        self = self.eval()
+
+        used_keys = ["recst", "y", "diff"]
+        with torch.no_grad():
+            for batch in iterator:
+                batch = batch.to(self.device).float()
+                return_dict = self(batch)
+                for k in [k for k in used_keys if k in return_dict]:  # skip keys the model does not return
+                    save_dict[k].append(return_dict[k])
+        self = self.train()
+        return {k: torch.cat(v) for k, v in save_dict.items()}
+
+    def predict_prob(self, iterator, window_labels=None):
+        logging.info("Evaluating")
+        self = self.eval()
+        test_start = time.time()
+        with torch.no_grad():
+            score_list = []
+            for batch in iterator:
+                batch = batch.to(self.device).float()
+                return_dict = self(batch)
+                score = (
+                    # average all dimension
+                    return_dict["score"]
+                    .mean(dim=-1)
+                    .sigmoid()  # b x prediction_length
+                )
+                # mean all timestamp
+                score_list.append(score.mean(dim=-1))
+        test_end = time.time()
+        self.time_tracker["test"] = test_end - test_start
+
+        anomaly_score = torch.cat(score_list, dim=0).cpu().numpy()
+        if window_labels is not None:
+            anomaly_label = (window_labels.sum(axis=1) > 0).astype(int)
+            return anomaly_score, anomaly_label
+        return anomaly_score
diff --git a/networks/mscred/__init__.py b/networks/mscred/__init__.py
new file mode 100644
index 0000000..3a806dc
--- /dev/null
+++ b/networks/mscred/__init__.py
@@ -0,0 +1 @@
+from .models import MSCRED
diff --git a/networks/mscred/dlutils.py b/networks/mscred/dlutils.py
new file mode 100644
index 0000000..96e4276
--- /dev/null
+++ b/networks/mscred/dlutils.py
@@ -0,0 +1,228 @@
+import torch.nn as nn
+import torch
+import torch.nn.functional as F
+from torch.autograd import Variable
+import numpy as np
+
+
+class ConvLSTMCell(nn.Module):
+    def __init__(self, input_dim, hidden_dim, kernel_size, bias):
+        """
+        Initialize ConvLSTM cell.
+
+        Parameters
+        ----------
+        input_dim: int
+            Number of channels of input tensor.
+        hidden_dim: int
+            Number of channels of hidden state.
+        kernel_size: (int, int)
+            Size of the convolutional kernel.
+        bias: bool
+            Whether or not to add the bias.
+        """
+
+        super(ConvLSTMCell, self).__init__()
+
+        self.input_dim = input_dim
+        self.hidden_dim = hidden_dim
+
+        self.kernel_size = kernel_size
+        self.padding = kernel_size[0] // 2, kernel_size[1] // 2
+        self.bias = bias
+
+        self.conv = nn.Conv2d(
+            in_channels=self.input_dim + self.hidden_dim,
+            out_channels=4 * self.hidden_dim,
+            kernel_size=self.kernel_size,
+            padding=self.padding,
+            bias=self.bias,
+        )
+
+    def forward(self, input_tensor, cur_state):
+        h_cur, c_cur = cur_state
+
+        combined = torch.cat(
+            [input_tensor, h_cur], dim=1
+        )  # concatenate along channel axis
+
+        combined_conv = self.conv(combined)
+        cc_i, cc_f, cc_o, cc_g = torch.split(combined_conv, self.hidden_dim, dim=1)
+        i = torch.sigmoid(cc_i)
+        f = torch.sigmoid(cc_f)
+        o = torch.sigmoid(cc_o)
+        g = torch.tanh(cc_g)
+
+        c_next = f * c_cur + i * g
+        h_next = o * torch.tanh(c_next)
+
+        return h_next, c_next
+
+    def init_hidden(self, batch_size, image_size):
+        height, width = image_size
+        return (
+            torch.zeros(
+                batch_size,
+                self.hidden_dim,
+                height,
+                width,
+                device=self.conv.weight.device,
+            ),
+            torch.zeros(
+                batch_size,
+                self.hidden_dim,
+                height,
+                width,
+                device=self.conv.weight.device,
+            ),
+        )
+
+
+class ConvLSTM(nn.Module):
+
+    """
+
+    Parameters:
+        input_dim: Number of channels in input
+        hidden_dim: Number of hidden channels
+        kernel_size: Size of kernel in convolutions
+        num_layers: Number of LSTM layers stacked on each other
+        batch_first: Whether dimension 0 is the batch dimension
+        bias: Bias or no bias in Convolution
+        return_all_layers: Return the list of computations for all layers
+        Note: Will do same padding.
+
+    Input:
+        A tensor of size B, T, C, H, W or T, B, C, H, W
+    Output:
+        A tuple of two lists of length num_layers (or length 1 if return_all_layers is False).
+            0 - layer_output_list is the list of lists of length T of each output
+            1 - last_state_list is the list of last states
+                    each element of the list is a tuple (h, c) for hidden state and memory
+    Example:
+        >> x = torch.rand((32, 10, 64, 128, 128))
+        >> convlstm = ConvLSTM(64, 16, (3, 3), 1, True, True, False)
+        >> _, last_states = convlstm(x)
+        >> h = last_states[0][0]  # 0 for layer index, 0 for h index
+    """
+
+    def __init__(
+        self,
+        input_dim,
+        hidden_dim,
+        kernel_size,
+        num_layers,
+        batch_first=False,
+        bias=True,
+        return_all_layers=False,
+    ):
+        super(ConvLSTM, self).__init__()
+
+        self._check_kernel_size_consistency(kernel_size)
+
+        # Make sure that both `kernel_size` and `hidden_dim` are lists having len == num_layers
+        kernel_size = self._extend_for_multilayer(kernel_size, num_layers)
+        hidden_dim = self._extend_for_multilayer(hidden_dim, num_layers)
+        if not len(kernel_size) == len(hidden_dim) == num_layers:
+            raise ValueError("Inconsistent list length.")
+
+        self.input_dim = input_dim
+        self.hidden_dim = hidden_dim
+        self.kernel_size = kernel_size
+        self.num_layers = num_layers
+        self.batch_first = batch_first
+        self.bias = bias
+        self.return_all_layers = return_all_layers
+
+        cell_list = []
+        for i in range(0, self.num_layers):
+            cur_input_dim = self.input_dim if i == 0 else self.hidden_dim[i - 1]
+
+            cell_list.append(
+                ConvLSTMCell(
+                    input_dim=cur_input_dim,
+                    hidden_dim=self.hidden_dim[i],
+                    kernel_size=self.kernel_size[i],
+                    bias=self.bias,
+                )
+            )
+
+        self.cell_list = nn.ModuleList(cell_list)
+
+    def forward(self, input_tensor, hidden_state=None):
+        """
+
+        Parameters
+        ----------
+        input_tensor: torch.Tensor
+            5-D tensor of shape (t, b, c, h, w), or (b, t, c, h, w) if batch_first is True
+        hidden_state: None
+            Must be None; stateful operation is not implemented yet
+
+        Returns
+        -------
+        layer_output_list, last_state_list
+        """
+        if not self.batch_first:
+            # (t, b, c, h, w) -> (b, t, c, h, w)
+            input_tensor = input_tensor.permute(1, 0, 2, 3, 4)
+
+        b, _, _, h, w = input_tensor.size()
+
+        # Stateful ConvLSTM is not implemented yet
+        if hidden_state is not None:
+            raise NotImplementedError()
+        else:
+            # Since the init is done in forward. Can send image size here
+            hidden_state = self._init_hidden(batch_size=b, image_size=(h, w))
+
+        layer_output_list = []
+        last_state_list = []
+
+        seq_len = input_tensor.size(1)
+        cur_layer_input = input_tensor
+
+        for layer_idx in range(self.num_layers):
+
+            h, c = hidden_state[layer_idx]
+            output_inner = []
+            for t in range(seq_len):
+                h, c = self.cell_list[layer_idx](
+                    input_tensor=cur_layer_input[:, t, :, :, :], cur_state=[h, c]
+                )
+                output_inner.append(h)
+
+            layer_output = torch.stack(output_inner, dim=1)
+            cur_layer_input = layer_output
+
+            layer_output_list.append(layer_output)
+            last_state_list.append([h, c])
+
+        if not self.return_all_layers:
+            layer_output_list = layer_output_list[-1:]
+            last_state_list = last_state_list[-1:]
+
+        return layer_output_list, last_state_list
+
+    def _init_hidden(self, batch_size, image_size):
+        init_states = []
+        for i in range(self.num_layers):
+            init_states.append(self.cell_list[i].init_hidden(batch_size, image_size))
+        return init_states
+
+    @staticmethod
+    def _check_kernel_size_consistency(kernel_size):
+        if not (
+            isinstance(kernel_size, tuple)
+            or (
+                isinstance(kernel_size, list)
+                and all([isinstance(elem, tuple) for elem in kernel_size])
+            )
+        ):
+            raise ValueError("`kernel_size` must be tuple or list of tuples")
+
+    @staticmethod
+    def _extend_for_multilayer(param, num_layers):
+        if not isinstance(param, list):
+            param = [param] * num_layers
+        return param
diff --git a/networks/mscred/models.py b/networks/mscred/models.py
new file mode 100644
index 0000000..ef242b2
--- /dev/null
+++ b/networks/mscred/models.py
@@ -0,0 +1,93 @@
+import logging
+import os
+import torch
+import torch.nn as nn
+import numpy as np
+
+from common.utils import set_device
+from .dlutils import ConvLSTM
+
+## MSCRED Model (AAAI 19)
+class MSCRED(nn.Module):
+    def __init__(self, feats, window_size, lr, model_root, device):
+        super(MSCRED, self).__init__()
+        # Model name
+        self.name = "MSCRED"
+        self.n_feats = feats
+        self.n_window = window_size
+        self.lr = lr
+        self.device = set_device(device)
+        self.encoder = nn.ModuleList(
+            [
+                ConvLSTM(1, 32, (3, 3), 1, True, True, False),
+                ConvLSTM(32, 64, (3, 3), 1, True, True, False),
+                ConvLSTM(64, 128, (3, 3), 1, True, True, False),
+            ]
+        )
+        self.decoder = nn.Sequential(
+            nn.ConvTranspose2d(128, 64, (3, 3), 1, 1),
+            nn.ReLU(True),
+            nn.ConvTranspose2d(64, 32, (3, 3), 1, 1),
+            nn.ReLU(True),
+            nn.ConvTranspose2d(32, 1, (3, 3), 1, 1),
+            nn.Sigmoid(),
+        )
+        self.init_model(lr, model_root)
+
+    def forward(self, g):
+        batch_size = g.shape[0]
+        ## Encode
+        z = g.view(batch_size, 1, self.n_window, self.n_feats)
+        for cell in self.encoder:
+            _, z = cell(z.unsqueeze(1))
+            z = z[0][0]
+        ## Decode
+        x = self.decoder(z)
+        x = x.view(batch_size, self.n_window, self.n_feats)
+        return x
+
+    def init_model(self, lr, model_root, retrain=True, test=False):
+        optimizer = torch.optim.AdamW(self.parameters(), lr=lr, weight_decay=1e-5)
+
+        if os.path.exists(model_root) and (not retrain or test):
+            logging.info("Loading pre-trained model")
+            checkpoint = torch.load(os.path.join(model_root, "model.pt"), map_location=self.device)
+            self.load_state_dict(checkpoint["model_state_dict"])
+            optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
+        else:
+            logging.info("Creating new model: MSCRED")
+
+        self.optimizer = optimizer
+        logging.info("Finish model initialization.")
+
+    def fit(self, nb_epoch, dataloader, training=True):
+        self.to(self.device)
+        for epoch in range(1, nb_epoch + 1):
+            mse_func = nn.MSELoss(reduction="none")
+            if training:
+                logging.info("Training epoch: {}".format(epoch))
+                for _, d in enumerate(dataloader):
+                    d = d.to(self.device)
+                    x = self(d)
+                    loss = torch.mean(mse_func(x, d))
+                    self.optimizer.zero_grad()
+                    loss.backward()
+                    self.optimizer.step()
+                logging.info("Epoch: {} finished.".format(epoch))
+
+    def predict_prob(self, test_iterator, label_windows=None):
+        with torch.no_grad():
+            self.eval()
+            mse_func = nn.MSELoss(reduction="none")
+            loss_steps = []
+            for d in test_iterator:
+                d = d.to(self.device)
+                x = self(d)
+                loss = mse_func(x, d).view(-1, self.n_window, self.n_feats)
+                loss_steps.append(loss.detach().cpu().numpy())
+            anomaly_score = np.concatenate(loss_steps).mean(axis=(2, 1))
+            if label_windows is None:
+                return anomaly_score
+            else:
+                anomaly_label = (np.sum(label_windows, axis=1) >= 1).astype(int)
+                return anomaly_score, anomaly_label
diff --git a/networks/mtad_gat/__init__.py b/networks/mtad_gat/__init__.py
new file mode 100644
index 0000000..162e027
--- /dev/null
+++ b/networks/mtad_gat/__init__.py
@@ -0,0 +1 @@
+from .mtad_gat import MTAD_GAT
diff --git a/networks/mtad_gat/modules.py b/networks/mtad_gat/modules.py
new file mode 100644
index 0000000..6470d10
--- /dev/null
+++ b/networks/mtad_gat/modules.py
@@ -0,0 +1,339 @@
+import torch
+import torch.nn as nn
+
+
+class ConvLayer(nn.Module):
+    """1-D Convolution layer to extract high-level features of each time-series input
+    :param n_features: Number of input features/nodes
+    :param kernel_size: size of kernel to use in the convolution operation; use an odd
+        value so that the symmetric padding preserves the sequence length
+    """
+
+    def __init__(self, n_features, kernel_size=7):
+        super(ConvLayer, self).__init__()
+        self.padding = nn.ConstantPad1d((kernel_size - 1) // 2, 0.0)
+        self.conv = nn.Conv1d(
+            in_channels=n_features, out_channels=n_features, kernel_size=kernel_size
+        )
+        self.relu = nn.ReLU()
+
+    def forward(self, x):
+        x = x.permute(0, 2, 1)
+        x = self.padding(x)
+        x = self.relu(self.conv(x))
+        return x.permute(0, 2, 1)  # Permute back
+
+
+class FeatureAttentionLayer(nn.Module):
+    """Single Graph Feature/Spatial Attention Layer
+    :param n_features: Number of input features/nodes
+    :param window_size: length of the input sequence
+    :param dropout: dropout probability applied to the attention weights
+    :param alpha: negative slope used in the leaky ReLU activation function
+    :param embed_dim: embedding dimension (output dimension of linear transformation)
+    :param use_gatv2: whether to use the modified attention mechanism of GATv2 instead of standard GAT
+    :param use_bias: whether to include a bias term in the attention layer
+    """
+
+    def __init__(
+        self,
+        n_features,
+        window_size,
+        dropout,
+        alpha,
+        embed_dim=None,
+        use_gatv2=True,
+        use_bias=True,
+    ):
+        super(FeatureAttentionLayer, self).__init__()
+        self.n_features = n_features
+        self.window_size = window_size
+        self.dropout = dropout
+        self.embed_dim = embed_dim if embed_dim is not None else window_size
+        self.use_gatv2 = use_gatv2
+        self.num_nodes = n_features
+        self.use_bias = use_bias
+
+        # Because linear transformation is done after concatenation in GATv2
+        if self.use_gatv2:
+            self.embed_dim *= 2
+            lin_input_dim = 2 * window_size
+            a_input_dim = self.embed_dim
+        else:
+            lin_input_dim = window_size
+            a_input_dim = 2 * self.embed_dim
+
+        self.lin = nn.Linear(lin_input_dim, self.embed_dim)
+        self.a = nn.Parameter(torch.zeros((a_input_dim, 1)))
+        nn.init.xavier_uniform_(self.a.data, gain=1.414)
+
+        if self.use_bias:
+            self.bias = nn.Parameter(torch.zeros(n_features, n_features))
+
+        self.leakyrelu = nn.LeakyReLU(alpha)
+        self.sigmoid = nn.Sigmoid()
+
+    def forward(self, x):
+        # x shape (b, n, k): b - batch size, n - window size, k - number of features
+        # For feature attention we represent a node as the values of a particular feature across all timestamps
+
+        x = x.permute(0, 2, 1)
+
+        # 'Dynamic' GAT attention
+        # Proposed by Brody et al., 2021 (https://arxiv.org/pdf/2105.14491.pdf)
+        # Linear transformation applied after concatenation and attention layer applied after leakyrelu
+        if self.use_gatv2:
+            a_input = self._make_attention_input(x)  # (b, k, k, 2*window_size)
+            a_input = self.leakyrelu(self.lin(a_input))  # (b, k, k, embed_dim)
+            e = torch.matmul(a_input, self.a).squeeze(3)  # (b, k, k)
+
+        # Original GAT attention
+        else:
+            Wx = self.lin(x)  # (b, k, embed_dim)
+            a_input = self._make_attention_input(Wx)  # (b, k, k, 2*embed_dim)
+            e = self.leakyrelu(torch.matmul(a_input, self.a)).squeeze(3)  # (b, k, k)
+
+        if self.use_bias:
+            e += self.bias
+
+        # Attention weights
+        attention = torch.softmax(e, dim=2)
+        attention = torch.dropout(attention, self.dropout, train=self.training)
+
+        # Computing new node features using the attention
+        h = self.sigmoid(torch.matmul(attention, x))
+
+        return h.permute(0, 2, 1)
+
+    def _make_attention_input(self, v):
+        """Preparing the feature attention mechanism.
+        Creates a matrix with all pairwise concatenations of nodes.
+        Each node consists of all values of that node within the window
+            v1 || v1,
+            ...
+            v1 || vK,
+            v2 || v1,
+            ...
+            v2 || vK,
+            ...
+            ...
+            vK || v1,
+            ...
+            vK || vK,
+        """
+
+        K = self.num_nodes
+        blocks_repeating = v.repeat_interleave(K, dim=1)  # Left-side of the matrix
+        blocks_alternating = v.repeat(1, K, 1)  # Right-side of the matrix
+        combined = torch.cat(
+            (blocks_repeating, blocks_alternating), dim=2
+        )  # (b, K*K, 2*window_size)
+
+        if self.use_gatv2:
+            return combined.view(v.size(0), K, K, 2 * self.window_size)
+        else:
+            return combined.view(v.size(0), K, K, 2 * self.embed_dim)
+
+
+class TemporalAttentionLayer(nn.Module):
+    """Single Graph Temporal Attention Layer
+    :param n_features: number of input features/nodes
+    :param window_size: length of the input sequence
+    :param dropout: dropout probability applied to the attention weights
+    :param alpha: negative slope used in the leaky ReLU activation function
+    :param embed_dim: embedding dimension (output dimension of linear transformation)
+    :param use_gatv2: whether to use the modified attention mechanism of GATv2 instead of standard GAT
+    :param use_bias: whether to include a bias term in the attention layer
+
+    """
+
+    def __init__(
+        self,
+        n_features,
+        window_size,
+        dropout,
+        alpha,
+        embed_dim=None,
+        use_gatv2=True,
+        use_bias=True,
+    ):
+        super(TemporalAttentionLayer, self).__init__()
+        self.n_features = n_features
+        self.window_size = window_size
+        self.dropout = dropout
+        self.use_gatv2 = use_gatv2
+        self.embed_dim = embed_dim if embed_dim is not None else n_features
+        self.num_nodes = window_size
+        self.use_bias = use_bias
+
+        # Because linear transformation is performed after concatenation in GATv2
+        if self.use_gatv2:
+            self.embed_dim *= 2
+            lin_input_dim = 2 * n_features
+            a_input_dim = self.embed_dim
+        else:
+            lin_input_dim = n_features
+            a_input_dim = 2 * self.embed_dim
+
+        self.lin = nn.Linear(lin_input_dim, self.embed_dim)
+        self.a = nn.Parameter(torch.zeros((a_input_dim, 1)))
+        nn.init.xavier_uniform_(self.a.data, gain=1.414)
+
+        if self.use_bias:
+            self.bias = nn.Parameter(torch.zeros(window_size, window_size))
+
+        self.leakyrelu = nn.LeakyReLU(alpha)
+        self.sigmoid = nn.Sigmoid()
+
+    def forward(self, x):
+        # x shape (b, n, k): b - batch size, n - window size, k - number of features
+        # For temporal attention a node is represented as all feature values at a specific timestamp
+
+        # 'Dynamic' GAT attention
+        # Proposed by Brody et al., 2021 (https://arxiv.org/pdf/2105.14491.pdf)
+        # Linear transformation applied after concatenation and attention layer applied after leakyrelu
+        if self.use_gatv2:
+            a_input = self._make_attention_input(x)  # (b, n, n, 2*n_features)
+            a_input = self.leakyrelu(self.lin(a_input))  # (b, n, n, embed_dim)
+            e = torch.matmul(a_input, self.a).squeeze(3)  # (b, n, n)
+
+        # Original GAT attention
+        else:
+            Wx = self.lin(x)  # (b, n, embed_dim)
+            a_input = self._make_attention_input(Wx)  # (b, n, n, 2*embed_dim)
+            e = self.leakyrelu(torch.matmul(a_input, self.a)).squeeze(3)  # (b, n, n)
+
+        if self.use_bias:
+            e += self.bias  # (b, n, n)
+
+        # Attention weights
+        attention = torch.softmax(e, dim=2)
+        attention = torch.dropout(attention, self.dropout, train=self.training)
+
+        h = self.sigmoid(torch.matmul(attention, x))  # (b, n, k)
+
+        return h
+
+    def _make_attention_input(self, v):
+        """Preparing the temporal attention mechanism.
+        Creating matrix with all possible combinations of concatenations of node values:
+            (v1, v2..)_t1 || (v1, v2..)_t1
+            (v1, v2..)_t1 || (v1, v2..)_t2
+
+            ...
+            ...
+
+            (v1, v2..)_tn || (v1, v2..)_t1
+            (v1, v2..)_tn || (v1, v2..)_t2
+
+        """
+
+        K = self.num_nodes
+        blocks_repeating = v.repeat_interleave(K, dim=1)  # Left-side of the matrix
+        blocks_alternating = v.repeat(1, K, 1)  # Right-side of the matrix
+        combined = torch.cat((blocks_repeating, blocks_alternating), dim=2)
+
+        if self.use_gatv2:
+            return combined.view(v.size(0), K, K, 2 * self.n_features)
+        else:
+            return combined.view(v.size(0), K, K, 2 * self.embed_dim)
+
+
+class GRULayer(nn.Module):
+    """Gated Recurrent Unit (GRU) Layer
+    :param in_dim: number of input features
+    :param hid_dim: hidden size of the GRU
+    :param n_layers: number of layers in GRU
+    :param dropout: dropout rate
+    """
+
+    def __init__(self, in_dim, hid_dim, n_layers, dropout):
+        super(GRULayer, self).__init__()
+        self.hid_dim = hid_dim
+        self.n_layers = n_layers
+        self.dropout = 0.0 if n_layers == 1 else dropout
+        self.gru = nn.GRU(
+            in_dim, hid_dim, num_layers=n_layers, batch_first=True, dropout=self.dropout
+        )
+
+    def forward(self, x):
+        out, h = self.gru(x)
+        out, h = out[:, -1, :], h[-1, :, :]  # Last time step of output; hidden state of last layer
+        return out, h
+
+
+class RNNDecoder(nn.Module):
+    """GRU-based Decoder network that converts latent vector into output
+    :param in_dim: number of input features
+    :param n_layers: number of layers in RNN
+    :param hid_dim: hidden size of the RNN
+    :param dropout: dropout rate
+    """
+
+    def __init__(self, in_dim, hid_dim, n_layers, dropout):
+        super(RNNDecoder, self).__init__()
+        self.in_dim = in_dim
+        self.dropout = 0.0 if n_layers == 1 else dropout
+        self.rnn = nn.GRU(
+            in_dim, hid_dim, n_layers, batch_first=True, dropout=self.dropout
+        )
+
+    def forward(self, x):
+        decoder_out, _ = self.rnn(x)
+        return decoder_out
+
+
+class ReconstructionModel(nn.Module):
+    """Reconstruction Model
+    :param window_size: length of the input sequence
+    :param in_dim: number of input features
+    :param n_layers: number of layers in RNN
+    :param hid_dim: hidden size of the RNN
+    :param out_dim: number of output features
+    :param dropout: dropout rate
+    """
+
+    def __init__(self, window_size, in_dim, hid_dim, out_dim, n_layers, dropout):
+        super(ReconstructionModel, self).__init__()
+        self.window_size = window_size
+        self.decoder = RNNDecoder(in_dim, hid_dim, n_layers, dropout)
+        self.fc = nn.Linear(hid_dim, out_dim)
+
+    def forward(self, x):
+        # x will be last hidden state of the GRU layer
+        h_end = x
+        h_end_rep = h_end.repeat_interleave(self.window_size, dim=1).view(
+            x.size(0), self.window_size, -1
+        )
+
+        decoder_out = self.decoder(h_end_rep)
+        out = self.fc(decoder_out)
+        return out
+
+
+class Forecasting_Model(nn.Module):
+    """Forecasting model (fully-connected network)
+    :param in_dim: number of input features
+    :param hid_dim: hidden size of the FC network
+    :param out_dim: number of output features
+    :param n_layers: number of FC layers
+    :param dropout: dropout rate
+    """
+
+    def __init__(self, in_dim, hid_dim, out_dim, n_layers, dropout):
+        super(Forecasting_Model, self).__init__()
+        layers = [nn.Linear(in_dim, hid_dim)]
+        for _ in range(n_layers - 1):
+            layers.append(nn.Linear(hid_dim, hid_dim))
+
+        layers.append(nn.Linear(hid_dim, out_dim))
+
+        self.layers = nn.ModuleList(layers)
+        self.dropout = nn.Dropout(dropout)
+        self.relu = nn.ReLU()
+
+    def forward(self, x):
+        for i in range(len(self.layers) - 1):
+            x = self.relu(self.layers[i](x))
+            x = self.dropout(x)
+        return self.layers[-1](x)
diff --git a/networks/mtad_gat/mtad_gat.py b/networks/mtad_gat/mtad_gat.py
new file mode 100644
index 0000000..b1c6258
--- /dev/null
+++ b/networks/mtad_gat/mtad_gat.py
@@ -0,0 +1,228 @@
+import os
+import time
+import logging
+import torch
+import torch.nn as nn
+import numpy as np
+from common.utils import set_device
+
+from .modules import (
+    ConvLayer,
+    FeatureAttentionLayer,
+    TemporalAttentionLayer,
+    GRULayer,
+    Forecasting_Model,
+    ReconstructionModel,
+)
+
+
+class MTAD_GAT(nn.Module):
+    """MTAD-GAT model class.
+
+    :param n_features: Number of input features
+    :param window_size: Length of the input sequence (the layers are built for window_size - 1 time steps)
+    :param out_dim: Number of features to output
+    :param kernel_size: size of kernel to use in the 1-D convolution
+    :param feat_gat_embed_dim: embedding dimension (output dimension of linear transformation)
+           in feat-oriented GAT layer
+    :param time_gat_embed_dim: embedding dimension (output dimension of linear transformation)
+           in time-oriented GAT layer
+    :param use_gatv2: whether to use the modified attention mechanism of GATv2 instead of standard GAT
+    :param gru_n_layers: number of layers in the GRU layer
+    :param gru_hid_dim: hidden dimension in the GRU layer
+    :param forecast_n_layers: number of layers in the FC-based Forecasting Model
+    :param forecast_hid_dim: hidden dimension in the FC-based Forecasting Model
+    :param recon_n_layers: number of layers in the GRU-based Reconstruction Model
+    :param recon_hid_dim: hidden dimension in the GRU-based Reconstruction Model
+    :param dropout: dropout rate
+    :param alpha: negative slope used in the leaky ReLU activation function
+    :param device: device specifier passed to set_device (default "cpu")
+    """
+
+    def __init__(
+        self,
+        n_features,
+        window_size,
+        out_dim,
+        kernel_size=7,
+        feat_gat_embed_dim=None,
+        time_gat_embed_dim=None,
+        use_gatv2=True,
+        gru_n_layers=1,
+        gru_hid_dim=150,
+        forecast_n_layers=1,
+        forecast_hid_dim=150,
+        recon_n_layers=1,
+        recon_hid_dim=150,
+        dropout=0.2,
+        alpha=0.2,
+        device="cpu",
+    ):
+        super(MTAD_GAT, self).__init__()
+
+        window_size = window_size - 1
+        self.n_features = n_features
+        self.device = set_device(device)
+        self.conv = ConvLayer(n_features, kernel_size)
+        self.feature_gat = FeatureAttentionLayer(
+            n_features, window_size, dropout, alpha, feat_gat_embed_dim, use_gatv2
+        )
+        self.temporal_gat = TemporalAttentionLayer(
+            n_features, window_size, dropout, alpha, time_gat_embed_dim, use_gatv2
+        )
+        self.gru = GRULayer(3 * n_features, gru_hid_dim, gru_n_layers, dropout)
+        self.forecasting_model = Forecasting_Model(
+            gru_hid_dim, forecast_hid_dim, out_dim, forecast_n_layers, dropout
+        )
+        self.recon_model = ReconstructionModel(
+            window_size, gru_hid_dim, recon_hid_dim, out_dim, recon_n_layers, dropout
+        )
+
+    def forward(self, x):
+        # x shape (b, n, k): b - batch size, n - window size, k - number of features
+
+        x = self.conv(x)
+        h_feat = self.feature_gat(x)
+        h_temp = self.temporal_gat(x)
+
+        h_cat = torch.cat([x, h_feat, h_temp], dim=2)  # (b, n, 3k)
+
+        _, h_end = self.gru(h_cat)
+        h_end = h_end.view(x.shape[0], -1)  # Hidden state for last timestamp
+
+        predictions = self.forecasting_model(h_end)
+        recons = self.recon_model(h_end)
+
+        return predictions, recons
+
+    def fit(
+        self,
+        train_loader,
+        val_loader=None,
+        n_epochs=200,
+        batch_size=256,
+        init_lr=0.001,
+        model_root="output/",
+        print_every=1,
+    ):
+        self.n_epochs = n_epochs
+        self.batch_size = batch_size
+        self.init_lr = init_lr
+        self.forecast_criterion = nn.MSELoss()
+        self.recon_criterion = nn.MSELoss()
+        self.print_every = print_every
+
+        logging.info(f"Training model for {self.n_epochs} epochs..")
+        self.to(self.device)
+        self.optimizer = torch.optim.Adam(self.parameters(), lr=init_lr)
+        train_start = time.time()
+        for epoch in range(self.n_epochs):
+            epoch_start = time.time()
+            self.train()
+            forecast_b_losses = []
+            recon_b_losses = []
+
+            for x, y in train_loader:
+                x = x.to(self.device)
+                y = y.to(self.device)
+                self.optimizer.zero_grad()
+
+                preds, recons = self(x)
+
+                if preds.ndim == 3:
+                    preds = preds.squeeze(1)
+                if y.ndim == 3:
+                    y = y.squeeze(1)
+
+                forecast_loss = torch.sqrt(self.forecast_criterion(y, preds))
+                recon_loss = torch.sqrt(self.recon_criterion(x, recons))
+                loss = forecast_loss + recon_loss
+
+                loss.backward()
+                self.optimizer.step()
+
+                forecast_b_losses.append(forecast_loss.item())
+                recon_b_losses.append(recon_loss.item())
+
+            forecast_b_losses = np.array(forecast_b_losses)
+            recon_b_losses = np.array(recon_b_losses)
+
+            forecast_epoch_loss = np.sqrt((forecast_b_losses ** 2).mean())
+            recon_epoch_loss = np.sqrt((recon_b_losses ** 2).mean())
+
+            total_epoch_loss = forecast_epoch_loss + recon_epoch_loss
+
+            # No validation pass is run here; val_loader only gates saving below
+            epoch_time = time.time() - epoch_start
+
+            if epoch % self.print_every == 0:
+                s = (
+                    f"[Epoch {epoch + 1}] "
+                    f"forecast_loss = {forecast_epoch_loss:.5f}, "
+                    f"recon_loss = {recon_epoch_loss:.5f}, "
+                    f"total_loss = {total_epoch_loss:.5f}"
+                )
+                s += f" [{epoch_time:.1f}s]"
+                logging.info(s)
+
+        if val_loader is None:
+            self.save(os.path.join(model_root, "model.pt"))
+
+        train_time = int(time.time() - train_start)
+        logging.info(f"-- Training done in {train_time}s.")
+
+    def predict_prob(self, data_loader, gamma, window_labels=None):
+        self.gamma = gamma
+        self.eval()
+        preds = []
+        recons = []
+        actual = []
+        self.to(self.device)
+        with torch.no_grad():
+            for x, y in data_loader:
+                x = x.to(self.device)
+                y = y.to(self.device)
+
+                y_hat, _ = self(x)
+
+                # Shifting input to include the observed value (y) when doing the reconstruction
+                recon_x = torch.cat((x[:, 1:, :], y), dim=1)
+                _, window_recon = self(recon_x)
+
+                preds.append(y_hat.detach().cpu().numpy())
+                # Extract last reconstruction only
+                recons.append(window_recon[:, -1, :].detach().cpu().numpy())
+                actual.append(
+                    x[:, -1, :].detach().cpu().numpy()
+                )  # take the last observation as actual
+
+        preds = np.concatenate(preds, axis=0)
+        recons = np.concatenate(recons, axis=0)
+        actual = np.concatenate(actual, axis=0)
+
+        anomaly_scores = np.zeros_like(actual)
+        for i in range(preds.shape[1]):
+            a_score = np.sqrt((preds[:, i] - actual[:, i]) ** 2) + self.gamma * np.sqrt(
+                (recons[:, i] - actual[:, i]) ** 2
+            )
+            anomaly_scores[:, i] = a_score
+        anomaly_scores = np.mean(anomaly_scores, 1)
+        if window_labels is not None:
+            anomaly_label = (window_labels.sum(axis=1) > 0).astype(int)
+            return anomaly_scores, anomaly_label
+        else:
+            return anomaly_scores
+
+    def save(self, file_path):
+        """
+        Saves the model parameters (state dict) so they can be loaded later
+        :param file_path: path of the file to write the state dict to
+        """
+        torch.save(self.state_dict(), file_path)
+
+    def load(self, file_path):
+        """
+        Loads the model's parameters from the given path
+        :param file_path: path to a state dict file written by save()
+        """
+        self.load_state_dict(torch.load(file_path, map_location=self.device))
diff --git a/networks/mtad_gat/plotting.py b/networks/mtad_gat/plotting.py
new file mode 100644
index 0000000..e94ca50
--- /dev/null
+++ b/networks/mtad_gat/plotting.py
@@ -0,0 +1,625 @@
+from utils import get_data_dim, get_series_color, get_y_height
+import pandas as pd
+import numpy as np
+import os
+import json
+import logging
+from datetime import datetime
+import plotly as py
+import matplotlib.pyplot as plt
+import plotly.graph_objs as go
+from plotly.subplots import make_subplots
+import cufflinks as cf
+
+cf.go_offline()
+
+
+class Plotter:
+
+    """
+    Class for visualizing results of anomaly detection.
+    Includes visualization of forecasts, reconstructions, anomaly scores, predicted and actual anomalies
+    Plotter-class inspired by TelemAnom (https://github.com/khundman/telemanom)
+    """
+
+    def __init__(self, result_path, model_id="-1"):
+        self.result_path = result_path
+        self.model_id = model_id
+        self.train_output = None
+        self.test_output = None
+        self.labels_available = True
+        self.pred_cols = None
+        self._load_results()
+        self.train_output["timestamp"] = self.train_output.index
+        self.test_output["timestamp"] = self.test_output.index
+
+        config_path = f"{self.result_path}/config.txt"
+        with open(config_path) as f:
+            self.lookback = json.load(f)["lookback"]
+
+        if "SMD" in self.result_path:
+            self.pred_cols = [f"feat_{i}" for i in range(get_data_dim("machine"))]
+        elif "SMAP" in self.result_path or "MSL" in self.result_path:
+            self.pred_cols = ["feat_1"]
+
+    def _load_results(self):
+        if self.model_id.startswith("-"):
+            dir_content = os.listdir(self.result_path)
+            datetimes = [
+                datetime.strptime(subf, "%d%m%Y_%H%M%S")
+                for subf in dir_content
+                if os.path.isdir(f"{self.result_path}/{subf}") and subf not in ["logs"]
+            ]
+            datetimes.sort()
+            model_id = datetimes[int(self.model_id)].strftime("%d%m%Y_%H%M%S")
+            self.result_path = f"{self.result_path}/{model_id}"
+
+        logging.info(f"Loading results of {self.result_path}")
+        train_output = pd.read_pickle(f"{self.result_path}/train_output.pkl")
+        train_output.to_pickle(f"{self.result_path}/train_output.pkl")
+        train_output["A_True_Global"] = 0
+        test_output = pd.read_pickle(f"{self.result_path}/test_output.pkl")
+
+        # Because for SMAP and MSL only one feature is predicted
+        if "SMAP" in self.result_path or "MSL" in self.result_path:
+            train_output["A_Pred_0"] = train_output["A_Pred_Global"]
+            train_output["A_Score_0"] = train_output["A_Score_Global"]
+            train_output["Thresh_0"] = train_output["Thresh_Global"]
+
+            test_output["A_Pred_0"] = test_output["A_Pred_Global"]
+            test_output["A_Score_0"] = test_output["A_Score_Global"]
+            test_output["Thresh_0"] = test_output["Thresh_Global"]
+
+        self.train_output = train_output
+        self.test_output = test_output
+
+    def result_summary(self):
+        path = f"{self.result_path}/summary.txt"
+        if not os.path.exists(path):
+            logging.info(f"Folder {self.result_path} does not have a summary.txt file")
+            return
+        try:
+            logging.info("Result summary:")
+            with open(path) as f:
+                result_dict = json.load(f)
+                epsilon_result = result_dict["epsilon_result"]
+                pot_result = result_dict["pot_result"]
+                bf_results = result_dict["bf_result"]
+                logging.info("Epsilon:")
+                logging.info(
+                    f'\t\tprecision: {epsilon_result["precision"]:.2f}, recall: {epsilon_result["recall"]:.2f}, F1: {epsilon_result["f1"]:.2f}'
+                )
+                logging.info("POT:")
+                logging.info(
+                    f'\t\tprecision: {pot_result["precision"]:.2f}, recall: {pot_result["recall"]:.2f}, F1: {pot_result["f1"]:.2f}'
+                )
+                logging.info("Brute-Force:")
+                logging.info(
+                    f'\t\tprecision: {bf_results["precision"]:.2f}, recall: {bf_results["recall"]:.2f}, F1: {bf_results["f1"]:.2f}'
+                )
+
+        except FileNotFoundError as e:
+            logging.info(e)
+
+    def create_shapes(
+        self,
+        ranges,
+        sequence_type,
+        _min,
+        _max,
+        plot_values,
+        is_test=True,
+        xref=None,
+        yref=None,
+    ):
+        """
+        Create shapes for regions to highlight in plotly (true and predicted anomaly sequences).
+
+        :param ranges: list of [start, end] index pairs, one per anomaly sequence of a feature
+        :param sequence_type: "true" for labelled anomalies (red); any other value, e.g. "predicted", is drawn in blue
+        :param _min: min y value of series
+        :param _max: max y value of series; if None, the maximum of plot_values["errors"] is used
+        :param plot_values: dictionary of different series to be plotted
+
+        :return: list of shapes specifications for plotly
+        """
+
+        if _max is None:
+            _max = max(plot_values["errors"])
+
+        if sequence_type is None:
+            color = "blue"
+        else:
+            color = "red" if sequence_type == "true" else "blue"
+        shapes = []
+
+        for r in ranges:
+            w = 5
+            x0 = r[0] - w
+            x1 = r[1] + w
+            shape = {
+                "type": "rect",
+                "x0": x0,
+                "y0": _min,
+                "x1": x1,
+                "y1": _max,
+                "fillcolor": color,
+                "opacity": 0.08,
+                "line": {
+                    "width": 0,
+                },
+            }
+            if xref is not None:
+                shape["xref"] = xref
+                shape["yref"] = yref
+
+            shapes.append(shape)
+
+        return shapes
+
+    @staticmethod
+    def get_anomaly_sequences(values):
+        splits = np.where(values[1:] != values[:-1])[0] + 1
+        if values[0] == 1:
+            splits = np.insert(splits, 0, 0)
+
+        a_seqs = []
+        for i in range(0, len(splits) - 1, 2):
+            a_seqs.append([splits[i], splits[i + 1] - 1])
+
+        if len(splits) % 2 == 1:
+            a_seqs.append([splits[-1], len(values) - 1])
+
+        return a_seqs
+
+    def plot_feature(
+        self,
+        feature,
+        plot_train=False,
+        plot_errors=True,
+        plot_feature_anom=False,
+        start=None,
+        end=None,
+    ):
+        """
+        Plot forecasting, reconstruction, true value of a specific feature (feature),
+        along with the anomaly score for that feature
+        """
+
+        test_copy = self.test_output.copy()
+
+        if start is not None and end is not None:
+            assert start < end
+        if start is not None:
+            test_copy = test_copy.iloc[start:, :]
+        if end is not None:
+            start = 0 if start is None else start
+            test_copy = test_copy.iloc[: end - start, :]
+
+        plot_data = [test_copy]
+
+        if plot_train:
+            train_copy = self.train_output.copy()
+            plot_data.append(train_copy)
+
+        for nr, data_copy in enumerate(plot_data):
+            is_test = nr == 0
+
+            if feature < 0 or f"Forecast_{feature}" not in data_copy.columns:
+                raise ValueError(f"Feature {feature} not present in data.")
+
+            i = feature
+            plot_values = {
+                "timestamp": data_copy["timestamp"].values,
+                "y_forecast": data_copy[f"Forecast_{i}"].values,
+                "y_recon": data_copy[f"Recon_{i}"].values,
+                "y_true": data_copy[f"True_{i}"].values,
+                "errors": data_copy[f"A_Score_{i}"].values,
+                "threshold": data_copy[f"Thresh_{i}"],
+            }
+
+            anomaly_sequences = {
+                "pred": self.get_anomaly_sequences(data_copy[f"A_Pred_{i}"].values),
+                "true": self.get_anomaly_sequences(data_copy["A_True_Global"].values),
+            }
+
+            if is_test and start is not None:
+                anomaly_sequences["pred"] = [
+                    [s + start, e + start] for [s, e] in anomaly_sequences["pred"]
+                ]
+                anomaly_sequences["true"] = [
+                    [s + start, e + start] for [s, e] in anomaly_sequences["true"]
+                ]
+
+            y_min = 1.1 * plot_values["y_true"].min()
+            y_max = 1.1 * plot_values["y_true"].max()
+            e_max = 1.5 * plot_values["errors"].max()
+
+            y_shapes = self.create_shapes(
+                anomaly_sequences["pred"],
+                "predicted",
+                y_min,
+                y_max,
+                plot_values,
+                is_test=is_test,
+            )
+            e_shapes = self.create_shapes(
+                anomaly_sequences["pred"],
+                "predicted",
+                0,
+                e_max,
+                plot_values,
+                is_test=is_test,
+            )
+            if self.labels_available and (
+                "SMAP" in self.result_path or "MSL" in self.result_path
+            ):
+                y_shapes += self.create_shapes(
+                    anomaly_sequences["true"],
+                    "true",
+                    y_min,
+                    y_max,
+                    plot_values,
+                    is_test=is_test,
+                )
+                e_shapes += self.create_shapes(
+                    anomaly_sequences["true"],
+                    "true",
+                    0,
+                    e_max,
+                    plot_values,
+                    is_test=is_test,
+                )
+
+            y_df = pd.DataFrame(
+                {
+                    "timestamp": plot_values["timestamp"].reshape(
+                        -1,
+                    ),
+                    "y_forecast": plot_values["y_forecast"].reshape(
+                        -1,
+                    ),
+                    "y_recon": plot_values["y_recon"].reshape(
+                        -1,
+                    ),
+                    "y_true": plot_values["y_true"].reshape(
+                        -1,
+                    ),
+                }
+            )
+
+            e_df = pd.DataFrame(
+                {
+                    "timestamp": plot_values["timestamp"],
+                    "e_s": plot_values["errors"].reshape(
+                        -1,
+                    ),
+                    "threshold": plot_values["threshold"],
+                }
+            )
+
+            data_type = "Test data" if is_test else "Train data"
+            y_layout = {
+                "title": f"{data_type} | Forecast & reconstruction vs true value for {self.pred_cols[i] if self.pred_cols is not None else ''} ",
+                "showlegend": True,
+                "height": 400,
+                "width": 1100,
+            }
+
+            e_layout = {
+                "title": f"{data_type} | Error for {self.pred_cols[i] if self.pred_cols is not None else ''}",
+                # "yaxis": dict(range=[0, e_max]),
+                "height": 400,
+                "width": 1100,
+            }
+
+            if plot_feature_anom:
+                y_layout["shapes"] = y_shapes
+                e_layout["shapes"] = e_shapes
+
+            lines = [
+                go.Scatter(
+                    x=y_df["timestamp"],
+                    y=y_df["y_true"],
+                    line_color="rgba(0, 204, 150, 0.5)",
+                    name="y_true",
+                    line=dict(width=2),
+                ),
+                go.Scatter(
+                    x=y_df["timestamp"],
+                    y=y_df["y_forecast"],
+                    line_color="rgba(255, 127, 14, 1)",
+                    name="y_forecast",
+                    line=dict(width=2),
+                ),
+                go.Scatter(
+                    x=y_df["timestamp"],
+                    y=y_df["y_recon"],
+                    line_color="rgba(31, 119, 180, 1)",
+                    name="y_recon",
+                    line=dict(width=2),
+                ),
+            ]
+
+            fig = go.Figure(data=lines, layout=y_layout)
+            py.offline.iplot(fig)
+
+            e_lines = [
+                go.Scatter(
+                    x=e_df["timestamp"],
+                    y=e_df["e_s"],
+                    name="Error",
+                    line=dict(color="red", width=1),
+                )
+            ]
+            if plot_feature_anom:
+                e_lines.append(
+                    go.Scatter(
+                        x=e_df["timestamp"],
+                        y=e_df["threshold"],
+                        name="Threshold",
+                        line=dict(color="black", width=1, dash="dash"),
+                    )
+                )
+
+            if plot_errors:
+                e_fig = go.Figure(data=e_lines, layout=e_layout)
+                py.offline.iplot(e_fig)
+
+    def plot_all_features(self, start=None, end=None, type="test"):
+        """
+        Plotting all features, using the following order:
+            - forecasting for feature i
+            - reconstruction for feature i
+            - true value for feature i
+            - anomaly score (error) for feature i
+        """
+        if type == "train":
+            data_copy = self.train_output.copy()
+        elif type == "test":
+            data_copy = self.test_output.copy()
+
+        data_copy = data_copy.drop(
+            columns=["timestamp", "A_Score_Global", "Thresh_Global"]
+        )
+        cols = [
+            c
+            for c in data_copy.columns
+            if not (c.startswith("Thresh_") or c.startswith("A_Pred_"))
+        ]
+        data_copy = data_copy[cols]
+
+        if start is not None and end is not None:
+            assert start < end
+        if start is not None:
+            data_copy = data_copy.iloc[start:, :]
+        if end is not None:
+            start = 0 if start is None else start
+            data_copy = data_copy.iloc[: end - start, :]
+
+        num_cols = data_copy.shape[1]
+        colors = ["gray", "gray", "gray", "r"] * (num_cols // 4) + ["b", "g"]
+        data_copy.plot(
+            subplots=True, figsize=(20, num_cols), ylim=(0, 1.5), style=colors
+        )
+        plt.tight_layout()  # apply after the subplots exist, not to an empty figure
+        plt.show()
+
+    def plot_anomaly_segments(
+        self, type="test", num_aligned_segments=None, show_boring_series=False
+    ):
+        """
+        Finds collective anomalies, i.e. feature-wise anomalies that occur at the same time, and visualizes them.
+        """
+        is_test = True
+        if type == "train":
+            data_copy = self.train_output.copy()
+            is_test = False
+        elif type == "test":
+            data_copy = self.test_output.copy()
+
+        def get_pred_cols(df):
+            pred_cols_to_remove = []
+            col_names_to_remove = []
+            for i, col in enumerate(self.pred_cols):
+                y = df[f"True_{i}"].values
+                if np.average(y) >= 0.95 or np.average(y) == 0.0:
+                    pred_cols_to_remove.append(col)
+                    cols = list(df.columns[4 * i : 4 * i + 4])
+                    col_names_to_remove.extend(cols)
+
+            df.drop(col_names_to_remove, axis=1, inplace=True)
+            return [x for x in self.pred_cols if x not in pred_cols_to_remove]
+
+        non_constant_pred_cols = (
+            self.pred_cols if show_boring_series else get_pred_cols(data_copy)
+        )
+
+        fig = make_subplots(
+            rows=len(non_constant_pred_cols),
+            cols=1,
+            vertical_spacing=0.4 / len(non_constant_pred_cols),
+            shared_xaxes=True,
+        )
+
+        timestamps = None
+        shapes = []
+        annotations = []
+        for i in range(len(non_constant_pred_cols)):
+            new_idx = int(data_copy.columns[4 * i].split("_")[-1])
+            values = data_copy[f"True_{new_idx}"].values
+
+            anomaly_sequences = self.get_anomaly_sequences(
+                data_copy[f"A_Pred_{new_idx}"].values
+            )
+
+            y_min = -0.1
+            y_max = 2  # 0.5 * y_max
+
+            j = i + 1
+            xref = f"x{j}" if i > 0 else "x"
+            yref = f"y{j}" if i > 0 else "y"
+            anomaly_shape = self.create_shapes(
+                anomaly_sequences,
+                None,
+                y_min,
+                y_max,
+                None,
+                xref=xref,
+                yref=yref,
+                is_test=is_test,
+            )
+            shapes.extend(anomaly_shape)
+
+            fig.append_trace(
+                go.Scatter(
+                    x=timestamps,
+                    y=values,
+                    line=dict(color=get_series_color(values), width=1),
+                ),
+                row=i + 1,
+                col=1,
+            )
+            fig.update_yaxes(range=[-0.1, get_y_height(values)], row=i + 1, col=1)
+
+            annotations.append(
+                dict(
+                    # xref="paper",
+                    xanchor="left",
+                    yref=yref,
+                    text=f"<b>{non_constant_pred_cols[i].upper()}</b>",
+                    font=dict(size=10),
+                    showarrow=False,
+                    yshift=35,
+                    xshift=(-523),
+                )
+            )
+
+        colors = ["blue", "green", "red", "black", "orange", "brown", "aqua", "hotpink"]
+        taken_shapes_i = []
+        keep_segments_i = []
+        corr_segments_count = 0
+        for i in range(len(shapes)):
+            corr_shapes = [i]
+            shape = shapes[i]
+            shape["opacity"] = 0.3
+            shape_x = shape["x0"]
+
+            for j in range(i + 1, len(shapes)):
+                if j not in taken_shapes_i and shapes[j]["x0"] == shape_x:
+                    corr_shapes.append(j)
+
+            if num_aligned_segments is not None:
+                if num_aligned_segments[0] == ">":
+                    num = int(num_aligned_segments[1:])
+                    keep_segment = len(corr_shapes) >= num
+                else:
+                    num = int(num_aligned_segments)
+                    keep_segment = len(corr_shapes) == num
+
+                if keep_segment:
+                    keep_segments_i.extend(corr_shapes)
+                    taken_shapes_i.extend(corr_shapes)
+                    if len(corr_shapes) != 1:
+                        for shape_i in corr_shapes:
+                            shapes[shape_i]["fillcolor"] = colors[
+                                corr_segments_count % len(colors)
+                            ]
+                        corr_segments_count += 1
+
+        if num_aligned_segments is not None:
+            shapes = np.array(shapes)
+            shapes = shapes[keep_segments_i].tolist()
+
+        fig.update_layout(
+            height=1800,
+            width=1200,
+            shapes=shapes,
+            template="simple_white",
+            annotations=annotations,
+            showlegend=False,
+        )
+
+        fig.update_yaxes(ticks="", showticklabels=False, showline=True, mirror=True)
+        fig.update_xaxes(ticks="", showticklabels=False, showline=True, mirror=True)
+        py.offline.iplot(fig)
+
+    def plot_global_predictions(self, type="test"):
+        if type == "test":
+            data_copy = self.test_output.copy()
+        else:
+            data_copy = self.train_output.copy()
+
+        fig, axs = plt.subplots(
+            3,
+            figsize=(30, 10),
+            sharex=True,
+        )
+        axs[0].plot(data_copy["A_Score_Global"], c="r", label="anomaly scores")
+        axs[0].plot(
+            data_copy["Thresh_Global"], linestyle="dashed", c="black", label="threshold"
+        )
+        axs[1].plot(data_copy["A_Pred_Global"], label="predicted anomalies", c="orange")
+        if self.labels_available and type == "test":
+            axs[2].plot(
+                data_copy["A_True_Global"],
+                label="actual anomalies",
+            )
+        axs[0].set_ylim([0, 5 * np.mean(data_copy["Thresh_Global"].values)])
+        fig.legend(prop={"size": 20})
+        plt.show()
+
+    def plotly_global_predictions(self, type="test"):
+        is_test = True
+        if type == "train":
+            data_copy = self.train_output.copy()
+            is_test = False
+        elif type == "test":
+            data_copy = self.test_output.copy()
+
+        tot_anomaly_scores = data_copy["A_Score_Global"].values
+        pred_anomaly_sequences = self.get_anomaly_sequences(
+            data_copy["A_Pred_Global"].values
+        )
+        threshold = data_copy["Thresh_Global"].values
+        y_min = -0.1
+        y_max = 5 * np.mean(threshold)  # np.max(tot_anomaly_scores)
+        shapes = self.create_shapes(
+            pred_anomaly_sequences, "pred", y_min, y_max, None, is_test=is_test
+        )
+        if self.labels_available and is_test:
+            true_anomaly_sequences = self.get_anomaly_sequences(
+                data_copy["A_True_Global"].values
+            )
+            shapes2 = self.create_shapes(
+                true_anomaly_sequences, "true", y_min, y_max, None, is_test=is_test
+            )
+            shapes.extend(shapes2)
+
+        layout = {
+            "title": f"{type} set | Total error, predicted anomalies in blue, true anomalies in red if available "
+            f"(making correctly predicted in purple)",
+            "shapes": shapes,
+            "yaxis": dict(range=[0, y_max]),
+            "height": 400,
+            "width": 1500,
+        }
+
+        fig = go.Figure(
+            data=[
+                go.Scatter(
+                    x=data_copy["timestamp"],
+                    y=tot_anomaly_scores,
+                    name="Error",
+                    line=dict(width=1, color="red"),
+                ),
+                go.Scatter(
+                    x=data_copy["timestamp"],
+                    y=threshold,
+                    name="Threshold",
+                    line=dict(color="black", width=1, dash="dash"),
+                ),
+            ],
+            layout=layout,
+        )
+        py.offline.iplot(fig)
diff --git a/networks/mtad_gat/predict.py b/networks/mtad_gat/predict.py
new file mode 100644
index 0000000..12da9b1
--- /dev/null
+++ b/networks/mtad_gat/predict.py
@@ -0,0 +1,204 @@
+import argparse
+import json
+import datetime
+
+from args import get_parser, str2bool
+from utils import *
+from mtad_gat import MTAD_GAT
+from prediction import Predictor
+
+if __name__ == "__main__":
+
+    parser = get_parser()
+    parser.add_argument(
+        "--model_id",
+        type=str,
+        default=None,
+        help="ID (datetime, format ddmmYYYY_HHMMSS) of pretrained model to use; defaults to the most recent model when omitted",
+    )
+    parser.add_argument(
+        "--load_scores",
+        type=str2bool,
+        default=False,
+        help="To use already computed anomaly scores",
+    )
+    parser.add_argument("--save_output", type=str2bool, default=False)
+    args = parser.parse_args()
+    logging.info(args)
+
+    dataset = args.dataset
+    if args.model_id is None:
+        if dataset == "SMD":
+            dir_path = f"./output/{dataset}/{args.group}"
+        else:
+            dir_path = f"./output/{dataset}"
+        dir_content = os.listdir(dir_path)
+        subfolders = [
+            subf
+            for subf in dir_content
+            if os.path.isdir(f"{dir_path}/{subf}") and subf != "logs"
+        ]
+        date_times = [
+            datetime.datetime.strptime(subf, "%d%m%Y_%H%M%S") for subf in subfolders
+        ]
+        date_times.sort()
+        model_datetime = date_times[-1]
+        model_id = model_datetime.strftime("%d%m%Y_%H%M%S")
+
+    else:
+        model_id = args.model_id
+
+    if dataset == "SMD":
+        model_path = f"./output/{dataset}/{args.group}/{model_id}"
+    elif dataset in ["MSL", "SMAP"]:
+        model_path = f"./output/{dataset}/{model_id}"
+    else:
+        raise Exception(f'Dataset "{dataset}" not available.')
+
+    # Check that the model exists
+    if not os.path.isfile(f"{model_path}/model.pt"):
+        raise Exception(f"<{model_path}/model.pt> does not exist.")
+
+    # Get configs of model
+    logging.info(f"Using model from {model_path}")
+    model_parser = argparse.ArgumentParser()
+    model_args, unknown = model_parser.parse_known_args()
+    model_args_path = f"{model_path}/config.txt"
+
+    with open(model_args_path, "r") as f:
+        model_args.__dict__ = json.load(f)
+    window_size = model_args.lookback
+
+    # Check that model is trained on specified dataset
+    if args.dataset.lower() != model_args.dataset.lower():
+        raise Exception(
+            f"Model trained on {model_args.dataset}, but asked to predict {args.dataset}."
+        )
+
+    elif args.dataset == "SMD" and args.group != model_args.group:
+        logging.warning(
+            f"Model trained on SMD group {model_args.group}, but asked to predict SMD group {args.group}."
+        )
+
+    window_size = model_args.lookback
+    normalize = model_args.normalize
+    n_epochs = model_args.epochs
+    batch_size = model_args.bs
+    init_lr = model_args.init_lr
+    val_split = model_args.val_split
+    shuffle_dataset = model_args.shuffle_dataset
+    use_cuda = model_args.use_cuda
+    print_every = model_args.print_every
+    group_index = model_args.group[0]
+    index = model_args.group[2:]
+    args_summary = str(model_args.__dict__)
+
+    if dataset == "SMD":
+        (x_train, _), (x_test, y_test) = get_data(
+            f"machine-{group_index}-{index}", normalize=normalize
+        )
+    else:
+        (x_train, _), (x_test, y_test) = get_data(args.dataset, normalize=normalize)
+
+    x_train = torch.from_numpy(x_train).float()
+    x_test = torch.from_numpy(x_test).float()
+    n_features = x_train.shape[1]
+
+    target_dims = get_target_dims(args.dataset)
+    if target_dims is None:
+        out_dim = n_features
+    elif isinstance(target_dims, int):
+        out_dim = 1
+    else:
+        out_dim = len(target_dims)
+
+    train_dataset = SlidingWindowDataset(x_train, window_size, target_dims)
+    test_dataset = SlidingWindowDataset(x_test, window_size, target_dims)
+
+    train_loader, val_loader, test_loader = create_data_loaders(
+        train_dataset, batch_size, val_split, shuffle_dataset, test_dataset=test_dataset
+    )
+
+    train_dataset = SlidingWindowDataset(x_train, window_size, target_dims)
+    test_dataset = SlidingWindowDataset(x_test, window_size, target_dims)
+
+    model = MTAD_GAT(
+        n_features,
+        window_size,
+        out_dim,
+        kernel_size=model_args.kernel_size,
+        use_gatv2=model_args.use_gatv2,
+        feat_gat_embed_dim=model_args.feat_gat_embed_dim,
+        time_gat_embed_dim=model_args.time_gat_embed_dim,
+        gru_n_layers=model_args.gru_n_layers,
+        gru_hid_dim=model_args.gru_hid_dim,
+        forecast_n_layers=model_args.fc_n_layers,
+        forecast_hid_dim=model_args.fc_hid_dim,
+        recon_n_layers=model_args.recon_n_layers,
+        recon_hid_dim=model_args.recon_hid_dim,
+        dropout=model_args.dropout,
+        alpha=model_args.alpha,
+    )
+
+    device = "cuda" if args.use_cuda and torch.cuda.is_available() else "cpu"
+    load(model, f"{model_path}/model.pt", device=device)
+    model.to(device)
+
+    # Some suggestions for POT args
+    level_q_dict = {
+        "SMAP": (0.90, 0.005),
+        "MSL": (0.90, 0.001),
+        "SMD-1": (0.9950, 0.001),
+        "SMD-2": (0.9925, 0.001),
+        "SMD-3": (0.9999, 0.001),
+    }
+    key = "SMD-" + args.group[0] if args.dataset == "SMD" else args.dataset
+    level, q = level_q_dict[key]
+    if args.level is not None:
+        level = args.level
+    if args.q is not None:
+        q = args.q
+
+    # Some suggestions for Epsilon args
+    reg_level_dict = {"SMAP": 0, "MSL": 0, "SMD-1": 1, "SMD-2": 1, "SMD-3": 1}
+    key = "SMD-" + args.group[0] if dataset == "SMD" else dataset
+    reg_level = reg_level_dict[key]
+
+    prediction_args = {
+        "dataset": dataset,
+        "target_dims": target_dims,
+        "scale_scores": args.scale_scores,
+        "level": level,
+        "q": q,
+        "dynamic_pot": args.dynamic_pot,
+        "use_mov_av": args.use_mov_av,
+        "gamma": args.gamma,
+        "reg_level": reg_level,
+        "save_path": f"{model_path}",
+    }
+
+    # Create a new summary file each time new predictions are made with a pre-trained model
+    count = 0
+    for filename in os.listdir(model_path):
+        if filename.startswith("summary"):
+            count += 1
+    if count == 0:
+        summary_file_name = "summary.txt"
+    else:
+        summary_file_name = f"summary_{count}.txt"
+
+    label = y_test[window_size:] if y_test is not None else None
+    predictor = Predictor(
+        model,
+        window_size,
+        n_features,
+        prediction_args,
+        summary_file_name=summary_file_name,
+    )
+    predictor.predict_anomalies(
+        x_train,
+        x_test,
+        label,
+        load_scores=args.load_scores,
+        save_output=args.save_output,
+    )
diff --git a/networks/mtad_gat/prediction.py b/networks/mtad_gat/prediction.py
new file mode 100644
index 0000000..e641f48
--- /dev/null
+++ b/networks/mtad_gat/prediction.py
@@ -0,0 +1,251 @@
+import json
+from tqdm import tqdm
+from eval_methods import *
+from utils import *
+
+
+class Predictor:
+    """MTAD-GAT predictor class.
+
+    :param model: MTAD-GAT model (pre-trained) used to forecast and reconstruct
+    :param window_size: Length of the input sequence
+    :param n_features: Number of input features
+    :param pred_args: params for thresholding and predicting anomalies
+
+    """
+
+    def __init__(
+        self, model, window_size, n_features, pred_args, summary_file_name="summary.txt"
+    ):
+        self.model = model
+        self.window_size = window_size
+        self.n_features = n_features
+        self.dataset = pred_args["dataset"]
+        self.target_dims = pred_args["target_dims"]
+        self.scale_scores = pred_args["scale_scores"]
+        self.q = pred_args["q"]
+        self.level = pred_args["level"]
+        self.dynamic_pot = pred_args["dynamic_pot"]
+        self.use_mov_av = pred_args["use_mov_av"]
+        self.gamma = pred_args["gamma"]
+        self.reg_level = pred_args["reg_level"]
+        self.save_path = pred_args["save_path"]
+        self.batch_size = 256
+        self.use_cuda = True
+        self.pred_args = pred_args
+        self.summary_file_name = summary_file_name
+
+    def get_score(self, values):
+        """Method that calculates anomaly score using given model and data
+        :param values: 2D array of multivariate time series data, shape (N, k)
+        :return: dataframe with forecast, reconstruction, true value and anomaly score per channel, plus the global anomaly score
+        """
+
+        logging.info("Predicting and calculating anomaly scores..")
+        data = SlidingWindowDataset(values, self.window_size, self.target_dims)
+        loader = torch.utils.data.DataLoader(
+            data, batch_size=self.batch_size, shuffle=False
+        )
+        device = "cuda" if self.use_cuda and torch.cuda.is_available() else "cpu"
+
+        self.model.eval()
+        preds = []
+        recons = []
+        with torch.no_grad():
+            for x, y in tqdm(loader):
+                x = x.to(device)
+                y = y.to(device)
+
+                y_hat, _ = self.model(x)
+
+                # Shifting input to include the observed value (y) when doing the reconstruction
+                recon_x = torch.cat((x[:, 1:, :], y), dim=1)
+                _, window_recon = self.model(recon_x)
+
+                preds.append(y_hat.detach().cpu().numpy())
+                # Extract last reconstruction only
+                recons.append(window_recon[:, -1, :].detach().cpu().numpy())
+
+        preds = np.concatenate(preds, axis=0)
+        recons = np.concatenate(recons, axis=0)
+        actual = values.detach().cpu().numpy()[self.window_size :]
+
+        if self.target_dims is not None:
+            actual = actual[:, self.target_dims]
+
+        anomaly_scores = np.zeros_like(actual)
+        df = pd.DataFrame()
+        for i in range(preds.shape[1]):
+            df[f"Forecast_{i}"] = preds[:, i]
+            df[f"Recon_{i}"] = recons[:, i]
+            df[f"True_{i}"] = actual[:, i]
+            a_score = np.sqrt((preds[:, i] - actual[:, i]) ** 2) + self.gamma * np.sqrt(
+                (recons[:, i] - actual[:, i]) ** 2
+            )
+
+            if self.scale_scores:
+                q75, q25 = np.percentile(a_score, [75, 25])
+                iqr = q75 - q25
+                median = np.median(a_score)
+                a_score = (a_score - median) / (1 + iqr)
+
+            anomaly_scores[:, i] = a_score
+            df[f"A_Score_{i}"] = a_score
+
+        anomaly_scores = np.mean(anomaly_scores, 1)
+        df["A_Score_Global"] = anomaly_scores
+
+        return df
+
+    def predict_anomalies(
+        self,
+        train,
+        test,
+        true_anomalies,
+        load_scores=False,
+        save_output=True,
+        scale_scores=False,
+    ):
+        """Predicts anomalies
+
+        :param train: 2D array of train multivariate time series data
+        :param test: 2D array of test multivariate time series data
+        :param true_anomalies: true anomalies of test set, None if not available
+        :param load_scores: Whether to load anomaly scores instead of calculating them
+        :param save_output: Whether to save output dataframe
+        :param scale_scores: Whether to feature-wise scale anomaly scores (unused here;
+            scaling is controlled by pred_args["scale_scores"] in get_score)
+        """
+
+        if load_scores:
+            logging.info("Loading anomaly scores")
+
+            train_pred_df = pd.read_pickle(f"{self.save_path}/train_output.pkl")
+            test_pred_df = pd.read_pickle(f"{self.save_path}/test_output.pkl")
+
+            train_anomaly_scores = train_pred_df["A_Score_Global"].values
+            test_anomaly_scores = test_pred_df["A_Score_Global"].values
+
+        else:
+            train_pred_df = self.get_score(train)
+            test_pred_df = self.get_score(test)
+
+            train_anomaly_scores = train_pred_df["A_Score_Global"].values
+            test_anomaly_scores = test_pred_df["A_Score_Global"].values
+
+            train_anomaly_scores = adjust_anomaly_scores(
+                train_anomaly_scores, self.dataset, True, self.window_size
+            )
+            test_anomaly_scores = adjust_anomaly_scores(
+                test_anomaly_scores, self.dataset, False, self.window_size
+            )
+
+            # Update df
+            train_pred_df["A_Score_Global"] = train_anomaly_scores
+            test_pred_df["A_Score_Global"] = test_anomaly_scores
+
+        if self.use_mov_av:
+            smoothing_window = int(self.batch_size * self.window_size * 0.05)
+            train_anomaly_scores = (
+                pd.DataFrame(train_anomaly_scores)
+                .ewm(span=smoothing_window)
+                .mean()
+                .values.flatten()
+            )
+            test_anomaly_scores = (
+                pd.DataFrame(test_anomaly_scores)
+                .ewm(span=smoothing_window)
+                .mean()
+                .values.flatten()
+            )
+
+        # Find threshold and predict anomalies at feature-level (for plotting and diagnosis purposes)
+        out_dim = self.n_features if self.target_dims is None else len(self.target_dims)
+        all_preds = np.zeros((len(test_pred_df), out_dim))
+        for i in range(out_dim):
+            train_feature_anom_scores = train_pred_df[f"A_Score_{i}"].values
+            test_feature_anom_scores = test_pred_df[f"A_Score_{i}"].values
+            epsilon = find_epsilon(train_feature_anom_scores, reg_level=2)
+
+            train_feature_anom_preds = (train_feature_anom_scores >= epsilon).astype(
+                int
+            )
+            test_feature_anom_preds = (test_feature_anom_scores >= epsilon).astype(int)
+
+            train_pred_df[f"A_Pred_{i}"] = train_feature_anom_preds
+            test_pred_df[f"A_Pred_{i}"] = test_feature_anom_preds
+
+            train_pred_df[f"Thresh_{i}"] = epsilon
+            test_pred_df[f"Thresh_{i}"] = epsilon
+
+            all_preds[:, i] = test_feature_anom_preds
+
+        # Global anomalies (entity-level) are predicted using aggregation of anomaly scores across all features
+        # These predictions are used to evaluate performance, as true anomalies are labeled at entity-level
+        # Evaluate using different threshold methods: brute-force, epsilon and peaks-over-threshold
+        e_eval = epsilon_eval(
+            train_anomaly_scores,
+            test_anomaly_scores,
+            true_anomalies,
+            reg_level=self.reg_level,
+        )
+        p_eval = pot_eval(
+            train_anomaly_scores,
+            test_anomaly_scores,
+            true_anomalies,
+            q=self.q,
+            level=self.level,
+            dynamic=self.dynamic_pot,
+        )
+        if true_anomalies is not None:
+            bf_eval = bf_search(
+                test_anomaly_scores,
+                true_anomalies,
+                start=0.01,
+                end=2,
+                step_num=100,
+                verbose=False,
+            )
+        else:
+            bf_eval = {}
+
+        logging.info(f"Results using epsilon method:\n {e_eval}")
+        logging.info(f"Results using peak-over-threshold method:\n {p_eval}")
+        logging.info(f"Results using best f1 score search:\n {bf_eval}")
+
+        for k, v in e_eval.items():
+            if not isinstance(v, list):
+                e_eval[k] = float(v)
+        for k, v in p_eval.items():
+            if not isinstance(v, list):
+                p_eval[k] = float(v)
+        for k, v in bf_eval.items():
+            bf_eval[k] = float(v)
+
+        # Save
+        summary = {"epsilon_result": e_eval, "pot_result": p_eval, "bf_result": bf_eval}
+        with open(f"{self.save_path}/{self.summary_file_name}", "w") as f:
+            json.dump(summary, f, indent=2)
+
+        # Save anomaly predictions made using epsilon method (could be changed to pot or bf-method)
+        if save_output:
+            global_epsilon = e_eval["threshold"]
+            test_pred_df["A_True_Global"] = true_anomalies
+            train_pred_df["Thresh_Global"] = global_epsilon
+            test_pred_df["Thresh_Global"] = global_epsilon
+            train_pred_df["A_Pred_Global"] = (
+                train_anomaly_scores >= global_epsilon
+            ).astype(int)
+            test_preds_global = (test_anomaly_scores >= global_epsilon).astype(int)
+            # Adjust predictions according to evaluation strategy
+            if true_anomalies is not None:
+                test_preds_global = adjust_predicts(
+                    None, true_anomalies, global_epsilon, pred=test_preds_global
+                )
+            test_pred_df["A_Pred_Global"] = test_preds_global
+
+            logging.info(f"Saving output to {self.save_path}/<train/test>_output.pkl")
+            train_pred_df.to_pickle(f"{self.save_path}/train_output.pkl")
+            test_pred_df.to_pickle(f"{self.save_path}/test_output.pkl")
+
+        logging.info("-- Done.")
diff --git a/networks/mtad_gat/train.py b/networks/mtad_gat/train.py
new file mode 100644
index 0000000..6a183b6
--- /dev/null
+++ b/networks/mtad_gat/train.py
@@ -0,0 +1,178 @@
+import json
+from datetime import datetime
+import torch.nn as nn
+
+from args import get_parser
+from utils import *
+from mtad_gat import MTAD_GAT
+from prediction import Predictor
+from training import Trainer
+
+
+if __name__ == "__main__":
+
+    id = datetime.now().strftime("%d%m%Y_%H%M%S")
+
+    parser = get_parser()
+    args = parser.parse_args()
+
+    dataset = args.dataset
+    window_size = args.lookback
+    spec_res = args.spec_res
+    normalize = args.normalize
+    n_epochs = args.epochs
+    batch_size = args.bs
+    init_lr = args.init_lr
+    val_split = args.val_split
+    shuffle_dataset = args.shuffle_dataset
+    use_cuda = args.use_cuda
+    print_every = args.print_every
+    log_tensorboard = args.log_tensorboard
+    group_index = args.group[0]
+    index = args.group[2:]
+    args_summary = str(args.__dict__)
+    logging.info(args_summary)
+
+    if dataset == "SMD":
+        output_path = f"output/SMD/{args.group}"
+        (x_train, _), (x_test, y_test) = get_data(
+            f"machine-{group_index}-{index}", normalize=normalize
+        )
+    elif dataset in ["MSL", "SMAP"]:
+        output_path = f"output/{dataset}"
+        (x_train, _), (x_test, y_test) = get_data(dataset, normalize=normalize)
+    else:
+        raise Exception(f'Dataset "{dataset}" not available.')
+
+    log_dir = f"{output_path}/logs"
+    if not os.path.exists(output_path):
+        os.makedirs(output_path)
+    if not os.path.exists(log_dir):
+        os.makedirs(log_dir)
+    save_path = f"{output_path}/{id}"
+
+    x_train = torch.from_numpy(x_train).float()
+    x_test = torch.from_numpy(x_test).float()
+    n_features = x_train.shape[1]
+
+    target_dims = get_target_dims(dataset)
+    if target_dims is None:
+        out_dim = n_features
+        logging.info(f"Will forecast and reconstruct all {n_features} input features")
+    elif isinstance(target_dims, int):
+        logging.info(f"Will forecast and reconstruct input feature: {target_dims}")
+        out_dim = 1
+    else:
+        logging.info(f"Will forecast and reconstruct input features: {target_dims}")
+        out_dim = len(target_dims)
+
+    train_dataset = SlidingWindowDataset(x_train, window_size, target_dims)
+    test_dataset = SlidingWindowDataset(x_test, window_size, target_dims)
+
+    train_loader, val_loader, test_loader = create_data_loaders(
+        train_dataset, batch_size, val_split, shuffle_dataset, test_dataset=test_dataset
+    )
+
+    logging.info(next(iter(train_loader))[0].shape)
+    # logging.info(next(iter(val_loader)).shape)
+    # logging.info(next(iter(test_loader)).shape)
+
+    model = MTAD_GAT(
+        n_features,
+        window_size,
+        out_dim,
+        kernel_size=args.kernel_size,
+        use_gatv2=args.use_gatv2,
+        feat_gat_embed_dim=args.feat_gat_embed_dim,
+        time_gat_embed_dim=args.time_gat_embed_dim,
+        gru_n_layers=args.gru_n_layers,
+        gru_hid_dim=args.gru_hid_dim,
+        forecast_n_layers=args.fc_n_layers,
+        forecast_hid_dim=args.fc_hid_dim,
+        recon_n_layers=args.recon_n_layers,
+        recon_hid_dim=args.recon_hid_dim,
+        dropout=args.dropout,
+        alpha=args.alpha,
+    )
+
+    optimizer = torch.optim.Adam(model.parameters(), lr=args.init_lr)
+    forecast_criterion = nn.MSELoss()
+    recon_criterion = nn.MSELoss()
+
+    trainer = Trainer(
+        model,
+        optimizer,
+        window_size,
+        n_features,
+        target_dims,
+        n_epochs,
+        batch_size,
+        init_lr,
+        forecast_criterion,
+        recon_criterion,
+        use_cuda,
+        save_path,
+        log_dir,
+        print_every,
+        log_tensorboard,
+        args_summary,
+    )
+
+    trainer.fit(train_loader, val_loader)
+
+    plot_losses(trainer.losses, save_path=save_path, plot=False)
+
+    # Check test loss
+    test_loss = trainer.evaluate(test_loader)
+    logging.info(f"Test forecast loss: {test_loss[0]:.5f}")
+    logging.info(f"Test reconstruction loss: {test_loss[1]:.5f}")
+    logging.info(f"Test total loss: {test_loss[2]:.5f}")
+
+    # Some suggestions for POT args
+    level_q_dict = {
+        "SMAP": (0.90, 0.005),
+        "MSL": (0.90, 0.001),
+        "SMD-1": (0.9950, 0.001),
+        "SMD-2": (0.9925, 0.001),
+        "SMD-3": (0.9999, 0.001),
+    }
+    key = "SMD-" + args.group[0] if args.dataset == "SMD" else args.dataset
+    level, q = level_q_dict[key]
+    if args.level is not None:
+        level = args.level
+    if args.q is not None:
+        q = args.q
+
+    # Some suggestions for Epsilon args
+    reg_level_dict = {"SMAP": 0, "MSL": 0, "SMD-1": 1, "SMD-2": 1, "SMD-3": 1}
+    key = "SMD-" + args.group[0] if dataset == "SMD" else dataset
+    reg_level = reg_level_dict[key]
+
+    trainer.load(f"{save_path}/model.pt")
+    prediction_args = {
+        "dataset": dataset,
+        "target_dims": target_dims,
+        "scale_scores": args.scale_scores,
+        "level": level,
+        "q": q,
+        "dynamic_pot": args.dynamic_pot,
+        "use_mov_av": args.use_mov_av,
+        "gamma": args.gamma,
+        "reg_level": reg_level,
+        "save_path": save_path,
+    }
+    best_model = trainer.model
+    predictor = Predictor(
+        best_model,
+        window_size,
+        n_features,
+        prediction_args,
+    )
+
+    label = y_test[window_size:] if y_test is not None else None
+    predictor.predict_anomalies(x_train, x_test, label)
+
+    # Save config
+    args_path = f"{save_path}/config.txt"
+    with open(args_path, "w") as f:
+        json.dump(args.__dict__, f, indent=2)
diff --git a/networks/mtad_gat/training.py b/networks/mtad_gat/training.py
new file mode 100644
index 0000000..fd6721d
--- /dev/null
+++ b/networks/mtad_gat/training.py
@@ -0,0 +1,258 @@
+import logging
+import os
+import time
+import numpy as np
+import torch
+import torch.nn as nn
+from torch.utils.tensorboard import SummaryWriter
+
+
+class Trainer:
+    """Trainer class for MTAD-GAT model.
+
+    :param model: MTAD-GAT model
+    :param optimizer: Optimizer used to minimize the loss function
+    :param window_size: Length of the input sequence
+    :param n_features: Number of input features
+    :param target_dims: dimension of input features to forecast and reconstruct
+    :param n_epochs: Number of iterations/epochs
+    :param batch_size: Number of windows in a single batch
+    :param init_lr: Initial learning rate of the module
+    :param forecast_criterion: Loss to be used for forecasting.
+    :param recon_criterion: Loss to be used for reconstruction.
+    :param boolean use_cuda: To be run on GPU or not
+    :param dload: Download directory where models are to be dumped
+    :param log_dir: Directory where SummaryWriter logs are written to
+    :param print_every: At what epoch interval to print losses
+    :param log_tensorboard: Whether to log losses and run metadata to TensorBoard
+    :param args_summary: Summary of args that will also be written to tensorboard if log_tensorboard
+    """
+
+    def __init__(
+        self,
+        model,
+        optimizer,
+        window_size,
+        n_features,
+        target_dims=None,
+        n_epochs=200,
+        batch_size=256,
+        init_lr=0.001,
+        forecast_criterion=nn.MSELoss(),
+        recon_criterion=nn.MSELoss(),
+        use_cuda=True,
+        dload="",
+        log_dir="output/",
+        print_every=1,
+        log_tensorboard=True,
+        args_summary="",
+    ):
+
+        self.model = model
+        self.optimizer = optimizer
+        self.window_size = window_size
+        self.n_features = n_features
+        self.target_dims = target_dims
+        self.n_epochs = n_epochs
+        self.batch_size = batch_size
+        self.init_lr = init_lr
+        self.forecast_criterion = forecast_criterion
+        self.recon_criterion = recon_criterion
+        self.device = "cuda" if use_cuda and torch.cuda.is_available() else "cpu"
+        self.dload = dload
+        self.log_dir = log_dir
+        self.print_every = print_every
+        self.log_tensorboard = log_tensorboard
+
+        self.losses = {
+            "train_total": [],
+            "train_forecast": [],
+            "train_recon": [],
+            "val_total": [],
+            "val_forecast": [],
+            "val_recon": [],
+        }
+        self.epoch_times = []
+
+        if self.device == "cuda":
+            self.model.cuda()
+
+        if self.log_tensorboard:
+            self.writer = SummaryWriter(f"{log_dir}")
+            self.writer.add_text("args_summary", args_summary)
+
+    def fit(self, train_loader, val_loader=None):
+        """Train model for self.n_epochs.
+        Train and validation (if validation loader given) losses stored in self.losses
+
+        :param train_loader: train loader of input data
+        :param val_loader: validation loader of input data
+        """
+
+        init_train_loss = self.evaluate(train_loader)
+        logging.info(f"Init total train loss: {init_train_loss[2]:.5f}")
+
+        if val_loader is not None:
+            init_val_loss = self.evaluate(val_loader)
+            logging.info(f"Init total val loss: {init_val_loss[2]:.5f}")
+
+        logging.info(f"Training model for {self.n_epochs} epochs..")
+        train_start = time.time()
+        for epoch in range(self.n_epochs):
+            epoch_start = time.time()
+            self.model.train()
+            forecast_b_losses = []
+            recon_b_losses = []
+
+            for x, y in train_loader:
+                x = x.to(self.device)
+                y = y.to(self.device)
+
+                logging.debug("x: %s, y: %s", x.shape, y.shape)
+                self.optimizer.zero_grad()
+
+                preds, recons = self.model(x)
+
+                if self.target_dims is not None:
+                    x = x[:, :, self.target_dims]
+                    y = y[:, :, self.target_dims].squeeze(-1)
+
+                if preds.ndim == 3:
+                    preds = preds.squeeze(1)
+                if y.ndim == 3:
+                    y = y.squeeze(1)
+
+                forecast_loss = torch.sqrt(self.forecast_criterion(y, preds))
+                recon_loss = torch.sqrt(self.recon_criterion(x, recons))
+                loss = forecast_loss + recon_loss
+
+                loss.backward()
+                self.optimizer.step()
+
+                forecast_b_losses.append(forecast_loss.item())
+                recon_b_losses.append(recon_loss.item())
+
+            forecast_b_losses = np.array(forecast_b_losses)
+            recon_b_losses = np.array(recon_b_losses)
+
+            forecast_epoch_loss = np.sqrt((forecast_b_losses ** 2).mean())
+            recon_epoch_loss = np.sqrt((recon_b_losses ** 2).mean())
+
+            total_epoch_loss = forecast_epoch_loss + recon_epoch_loss
+
+            self.losses["train_forecast"].append(forecast_epoch_loss)
+            self.losses["train_recon"].append(recon_epoch_loss)
+            self.losses["train_total"].append(total_epoch_loss)
+
+            # Evaluate on validation set
+            forecast_val_loss, recon_val_loss, total_val_loss = "NA", "NA", "NA"
+            if val_loader is not None:
+                forecast_val_loss, recon_val_loss, total_val_loss = self.evaluate(
+                    val_loader
+                )
+                self.losses["val_forecast"].append(forecast_val_loss)
+                self.losses["val_recon"].append(recon_val_loss)
+                self.losses["val_total"].append(total_val_loss)
+
+                if total_val_loss <= min(self.losses["val_total"]):
+                    self.save("model.pt")
+
+            if self.log_tensorboard:
+                self.write_loss(epoch)
+
+            epoch_time = time.time() - epoch_start
+            self.epoch_times.append(epoch_time)
+
+            if epoch % self.print_every == 0:
+                s = (
+                    f"[Epoch {epoch + 1}] "
+                    f"forecast_loss = {forecast_epoch_loss:.5f}, "
+                    f"recon_loss = {recon_epoch_loss:.5f}, "
+                    f"total_loss = {total_epoch_loss:.5f}"
+                )
+
+                if val_loader is not None:
+                    s += (
+                        f" ---- val_forecast_loss = {forecast_val_loss:.5f}, "
+                        f"val_recon_loss = {recon_val_loss:.5f}, "
+                        f"val_total_loss = {total_val_loss:.5f}"
+                    )
+
+                s += f" [{epoch_time:.1f}s]"
+                logging.info(s)
+
+        if val_loader is None:
+            self.save("model.pt")
+
+        train_time = int(time.time() - train_start)
+        if self.log_tensorboard:
+            self.writer.add_text("total_train_time", str(train_time))
+        logging.info(f"-- Training done in {train_time}s.")
+
+    def evaluate(self, data_loader):
+        """Evaluate model
+
+        :param data_loader: data loader of input data
+        :return forecasting loss, reconstruction loss, total loss
+        """
+
+        self.model.eval()
+
+        forecast_losses = []
+        recon_losses = []
+
+        with torch.no_grad():
+            for x, y in data_loader:
+                x = x.to(self.device)
+                y = y.to(self.device)
+
+                preds, recons = self.model(x)
+
+                if self.target_dims is not None:
+                    x = x[:, :, self.target_dims]
+                    y = y[:, :, self.target_dims].squeeze(-1)
+
+                if preds.ndim == 3:
+                    preds = preds.squeeze(1)
+                if y.ndim == 3:
+                    y = y.squeeze(1)
+
+                forecast_loss = torch.sqrt(self.forecast_criterion(y, preds))
+                recon_loss = torch.sqrt(self.recon_criterion(x, recons))
+
+                forecast_losses.append(forecast_loss.item())
+                recon_losses.append(recon_loss.item())
+
+        forecast_losses = np.array(forecast_losses)
+        recon_losses = np.array(recon_losses)
+
+        forecast_loss = np.sqrt((forecast_losses ** 2).mean())
+        recon_loss = np.sqrt((recon_losses ** 2).mean())
+
+        total_loss = forecast_loss + recon_loss
+
+        return forecast_loss, recon_loss, total_loss
+
+    def save(self, file_name):
+        """
+        Pickles the model parameters to be retrieved later
+        :param file_name: the filename to save as; `dload` serves as the download directory
+        """
+        PATH = self.dload + "/" + file_name
+        if os.path.exists(self.dload):
+            pass
+        else:
+            os.makedirs(self.dload, exist_ok=True)
+        torch.save(self.model.state_dict(), PATH)
+
+    def load(self, PATH):
+        """
+        Loads the model's parameters from the path mentioned
+        :param PATH: Should contain pickle file
+        """
+        self.model.load_state_dict(torch.load(PATH, map_location=self.device))
+
+    def write_loss(self, epoch):
+        for key, value in self.losses.items():
+            if len(value) != 0:
+                self.writer.add_scalar(key, value[-1], epoch)
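Review note on checkpoint selection: train.py reloads `model.pt` after training and treats it as the best model, so the checkpoint should be overwritten only when the validation loss is at least as low as in every earlier epoch. A minimal sketch of that rule in plain Python follows. The helper name `best_epochs` is hypothetical and not part of this repository; it only records at which epochs a save would happen.

```python
def best_epochs(val_losses):
    """Return the epoch indices at which a checkpoint would be written.

    Hypothetical helper for illustration: a checkpoint is written whenever
    the current validation loss is at or below the lowest loss seen so far.
    """
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses):
        if loss <= best:
            best = loss
            saved.append(epoch)
    return saved


print(best_epochs([0.9, 0.7, 0.8, 0.6, 0.65]))  # [0, 1, 3]
```

Epochs 2 and 4 are skipped because their loss is above the running minimum, so the file on disk keeps the weights from epoch 3.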
diff --git a/networks/mtad_gat/utils.py b/networks/mtad_gat/utils.py
new file mode 100644
index 0000000..f8f04ad
--- /dev/null
+++ b/networks/mtad_gat/utils.py
@@ -0,0 +1,280 @@
+import logging
+import os
+import pickle
+import matplotlib.pyplot as plt
+import pandas as pd
+import numpy as np
+import torch
+from sklearn.preprocessing import MinMaxScaler, RobustScaler
+from torch.utils.data import DataLoader, Dataset, SubsetRandomSampler
+
+
+def normalize_data(data, scaler=None):
+    data = np.asarray(data, dtype=np.float32)
+    if np.isnan(data).any():
+        data = np.nan_to_num(data)
+
+    if scaler is None:
+        scaler = MinMaxScaler()
+        scaler.fit(data)
+    data = scaler.transform(data)
+    logging.info("Data normalized")
+
+    return data, scaler
+
+
+def get_data_dim(dataset):
+    """
+    :param dataset: Name of dataset
+    :return: Number of dimensions in data
+    """
+    if dataset == "SMAP":
+        return 25
+    elif dataset == "MSL":
+        return 55
+    elif str(dataset).startswith("machine"):
+        return 38
+    else:
+        raise ValueError("unknown dataset " + str(dataset))
+
+
+def get_target_dims(dataset):
+    """
+    :param dataset: Name of dataset
+    :return: list of indices of the data dimensions that should be modeled (forecasted and
+                     reconstructed); None if all input dimensions should be modeled
+    """
+    if dataset == "SMAP":
+        return [0]
+    elif dataset == "MSL":
+        return [0]
+    elif dataset == "SMD":
+        return None
+    else:
+        raise ValueError("unknown dataset " + str(dataset))
+
+
+def get_data(
+    dataset,
+    max_train_size=None,
+    max_test_size=None,
+    normalize=False,
+    spec_res=False,
+    train_start=0,
+    test_start=0,
+):
+    """
+    Get data from pkl files
+
+    return shape: (([train_size, x_dim], [train_size] or None), ([test_size, x_dim], [test_size]))
+    Method from OmniAnomaly (https://github.com/NetManAIOps/OmniAnomaly)
+    """
+    prefix = "datasets"
+    if str(dataset).startswith("machine"):
+        prefix += "/ServerMachineDataset/processed"
+    elif dataset in ["MSL", "SMAP"]:
+        prefix += "/data/processed"
+    if max_train_size is None:
+        train_end = None
+    else:
+        train_end = train_start + max_train_size
+    if max_test_size is None:
+        test_end = None
+    else:
+        test_end = test_start + max_test_size
+    logging.info("load data of: %s", dataset)
+    logging.info("train: %s %s", train_start, train_end)
+    logging.info("test: %s %s", test_start, test_end)
+    x_dim = get_data_dim(dataset)
+    f = open(os.path.join(prefix, dataset + "_train.pkl"), "rb")
+    train_data = pickle.load(f).reshape((-1, x_dim))[train_start:train_end, :]
+    f.close()
+    try:
+        f = open(os.path.join(prefix, dataset + "_test.pkl"), "rb")
+        test_data = pickle.load(f).reshape((-1, x_dim))[test_start:test_end, :]
+        f.close()
+    except (KeyError, FileNotFoundError):
+        test_data = None
+    try:
+        f = open(os.path.join(prefix, dataset + "_test_label.pkl"), "rb")
+        test_label = pickle.load(f).reshape((-1))[test_start:test_end]
+        f.close()
+    except (KeyError, FileNotFoundError):
+        test_label = None
+
+    if normalize:
+        train_data, scaler = normalize_data(train_data, scaler=None)
+        test_data, _ = normalize_data(test_data, scaler=scaler)
+
+    logging.info("train set shape: %s", train_data.shape)
+    logging.info("test set shape: %s", test_data.shape)
+    logging.info(
+        "test set label shape: %s", None if test_label is None else test_label.shape
+    )
+    return (train_data, None), (test_data, test_label)
+
+
+class SlidingWindowDataset(Dataset):
+    def __init__(self, data, window, target_dim=None, horizon=1):
+        self.data = data
+        self.window = window
+        self.target_dim = target_dim
+        self.horizon = horizon
+
+    def __getitem__(self, index):
+        x = self.data[index : index + self.window]
+        y = self.data[index + self.window : index + self.window + self.horizon]
+        return x, y
+
+    def __len__(self):
+        return len(self.data) - self.window
+
+
+def create_data_loaders(
+    train_dataset, batch_size, val_split=0.1, shuffle=True, test_dataset=None
+):
+    train_loader, val_loader, test_loader = None, None, None
+    if val_split == 0.0:
+        logging.info(f"train_size: {len(train_dataset)}")
+        train_loader = torch.utils.data.DataLoader(
+            train_dataset, batch_size=batch_size, shuffle=shuffle
+        )
+
+    else:
+        dataset_size = len(train_dataset)
+        indices = list(range(dataset_size))
+        split = int(np.floor(val_split * dataset_size))
+        if shuffle:
+            np.random.shuffle(indices)
+        train_indices, val_indices = indices[split:], indices[:split]
+
+        train_sampler = SubsetRandomSampler(train_indices)
+        valid_sampler = SubsetRandomSampler(val_indices)
+
+        train_loader = torch.utils.data.DataLoader(
+            train_dataset, batch_size=batch_size, sampler=train_sampler
+        )
+        val_loader = torch.utils.data.DataLoader(
+            train_dataset, batch_size=batch_size, sampler=valid_sampler
+        )
+
+        logging.info(f"train_size: {len(train_indices)}")
+        logging.info(f"validation_size: {len(val_indices)}")
+
+    if test_dataset is not None:
+        test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
+        logging.info(f"test_size: {len(test_dataset)}")
+
+    return train_loader, val_loader, test_loader
+
+
+def plot_losses(losses, save_path="", plot=True):
+    """
+    :param losses: dict with losses
+    :param save_path: path where plots get saved
+    """
+
+    plt.plot(losses["train_forecast"], label="Forecast loss")
+    plt.plot(losses["train_recon"], label="Recon loss")
+    plt.plot(losses["train_total"], label="Total loss")
+    plt.title("Training losses during training")
+    plt.xlabel("Epoch")
+    plt.ylabel("RMSE")
+    plt.legend()
+    plt.savefig(f"{save_path}/train_losses.png", bbox_inches="tight")
+    if plot:
+        plt.show()
+    plt.close()
+
+    plt.plot(losses["val_forecast"], label="Forecast loss")
+    plt.plot(losses["val_recon"], label="Recon loss")
+    plt.plot(losses["val_total"], label="Total loss")
+    plt.title("Validation losses during training")
+    plt.xlabel("Epoch")
+    plt.ylabel("RMSE")
+    plt.legend()
+    plt.savefig(f"{save_path}/validation_losses.png", bbox_inches="tight")
+    if plot:
+        plt.show()
+    plt.close()
+
+
+def load(model, PATH, device="cpu"):
+    """
+    Loads the model's parameters from the path mentioned
+    :param PATH: Should contain pickle file
+    """
+    model.load_state_dict(torch.load(PATH, map_location=device))
+
+
+def get_series_color(y):
+    if np.average(y) >= 0.95:
+        return "black"
+    elif np.average(y) == 0.0:
+        return "black"
+    else:
+        return "black"
+
+
+def get_y_height(y):
+    if np.average(y) >= 0.95:
+        return 1.5
+    elif np.average(y) == 0.0:
+        return 0.1
+    else:
+        return max(y) + 0.1
+
+
+def adjust_anomaly_scores(scores, dataset, is_train, lookback):
+    """
+    Method for MSL and SMAP where channels have been concatenated as part of the preprocessing
+    :param scores: anomaly_scores
+    :param dataset: name of dataset
+    :param is_train: if scores is from train set
+    :param lookback: lookback (window size) used in model
+    """
+
+    # Zero out errors at time steps where the data transitions to a new channel (the model cannot predict these)
+    if dataset.upper() not in ["SMAP", "MSL"]:
+        return scores
+
+    adjusted_scores = scores.copy()
+    if is_train:
+        md = pd.read_csv(f"./datasets/data/{dataset.lower()}_train_md.csv")
+    else:
+        md = pd.read_csv("./datasets/data/labeled_anomalies.csv")
+        md = md[md["spacecraft"] == dataset.upper()]
+
+    md = md[md["chan_id"] != "P-2"]
+
+    # Sort values by channel
+    md = md.sort_values(by=["chan_id"])
+
+    # Getting the cumulative start index for each channel
+    sep_cuma = np.cumsum(md["num_values"].values) - lookback
+    sep_cuma = sep_cuma[:-1]
+    buffer = np.arange(1, 20)
+    i_remov = np.sort(
+        np.concatenate(
+            (
+                sep_cuma,
+                np.array([i + buffer for i in sep_cuma]).flatten(),
+                np.array([i - buffer for i in sep_cuma]).flatten(),
+            )
+        )
+    )
+    i_remov = i_remov[(i_remov < len(adjusted_scores)) & (i_remov >= 0)]
+    i_remov = np.sort(np.unique(i_remov))
+    if len(i_remov) != 0:
+        adjusted_scores[i_remov] = 0
+
+    # Normalize each concatenated part individually
+    sep_cuma = np.cumsum(md["num_values"].values) - lookback
+    s = [0] + sep_cuma.tolist()
+    for c_start, c_end in [(s[i], s[i + 1]) for i in range(len(s) - 1)]:
+        e_s = adjusted_scores[c_start : c_end + 1]
+
+        e_s = (e_s - np.min(e_s)) / (np.max(e_s) - np.min(e_s))
+        adjusted_scores[c_start : c_end + 1] = e_s
+
+    return adjusted_scores
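Review note on window indexing: `SlidingWindowDataset` pairs each window of `window` rows with the `horizon` rows that follow it, and its length is `len(data) - window`. The sketch below mirrors that indexing on a plain list. The generator name `sliding_windows` is hypothetical and only illustrates the slicing used in `__getitem__`.

```python
def sliding_windows(series, window, horizon=1):
    """Yield (x, y) pairs using the same slices as SlidingWindowDataset.

    Hypothetical stand-in for illustration; it works on any sequence that
    supports slicing and does not need torch.
    """
    for i in range(len(series) - window):
        x = series[i : i + window]
        y = series[i + window : i + window + horizon]
        yield x, y


pairs = list(sliding_windows(list(range(6)), window=3))
for x, y in pairs:
    print(x, y)
# [0, 1, 2] [3]
# [1, 2, 3] [4]
# [2, 3, 4] [5]
```

The first target sits at index `window`, which is why train.py aligns the test labels with `y_test[window_size:]`.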
diff --git a/networks/omni_anomaly/__init__.py b/networks/omni_anomaly/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/networks/omni_anomaly/detector.py b/networks/omni_anomaly/detector.py
new file mode 100644
index 0000000..0743ddc
--- /dev/null
+++ b/networks/omni_anomaly/detector.py
@@ -0,0 +1,190 @@
+import numpy as np
+import tensorflow as tf
+from tfsnippet import VariableSaver
+from tfsnippet.examples.utils import MLResults
+from tfsnippet.utils import get_variables_as_dict, Config
+from .model import OmniAnomaly
+from .prediction import Predictor
+from .training import Trainer
+from tensorflow.python.keras.utils import Sequence
+
+
+class DataGenerator(Sequence):
+    def __init__(
+        self,
+        data_array,
+        batch_size=32,
+        shuffle=False,
+    ):
+        self.darray = data_array
+        self.batch_size = batch_size
+        self.shuffle = shuffle
+        self.index_pool = list(range(self.darray.shape[0]))
+        self.length = int(np.ceil(len(self.index_pool) * 1.0 / self.batch_size))
+        self.on_epoch_end()
+
+    def __len__(self):
+        return self.length
+
+    def __getitem__(self, index):
+        indexes = self.index_pool[
+            index * self.batch_size : (index + 1) * self.batch_size
+        ]
+        X = self.darray[indexes]
+
+        # in case on_epoch_end is not called automatically
+        if index == self.length - 1:
+            self.on_epoch_end()
+        return X
+
+    def on_epoch_end(self):
+        if self.shuffle:
+            np.random.shuffle(self.index_pool)
+
+
+class ExpConfig(Config):
+    # model architecture configuration
+    use_connected_z_q = True
+    use_connected_z_p = True
+
+    # model parameters
+    z_dim = 3
+    rnn_cell = "GRU"  # 'GRU', 'LSTM' or 'Basic'
+    rnn_num_hidden = 500
+    window_length = 100
+    dense_dim = 500
+    posterior_flow_type = "nf"  # 'nf' or None
+    nf_layers = 20  # for nf
+    max_epoch = 1
+    train_start = 0
+    max_train_size = None  # `None` means full train set
+    batch_size = 256
+    l2_reg = 0.0001
+    initial_lr = 0.001
+    lr_anneal_factor = 0.5
+    lr_anneal_epoch_freq = 40
+    lr_anneal_step_freq = None
+    std_epsilon = 1e-4
+
+    # evaluation parameters
+    test_n_z = 1
+    test_batch_size = 50
+    test_start = 0
+    max_test_size = None  # `None` means full test set
+
+    # range and step size of the score threshold search for best F1;
+    # may vary between datasets
+    bf_search_min = -400.0
+    bf_search_max = 400.0
+    bf_search_step_size = 1.0
+
+    valid_step_freq = 100
+    gradient_clip_norm = 10.0
+
+    early_stop = False  # whether to apply early stop method
+
+    # pot parameters
+    # recommended values for `level`:
+    # SMAP: 0.07
+    # MSL: 0.01
+    # SMD group 1: 0.0050
+    # SMD group 2: 0.0075
+    # SMD group 3: 0.0001
+    level = 0.07
+
+    # outputs config
+    save_z = False  # whether to save sampled z in hidden space
+    get_score_on_dim = False  # whether to get score on dim. If `True`, the score will be a 2-dim ndarray
+    save_dir = "model"
+    restore_dir = None  # If not None, restore variables from this dir
+    result_dir = "result"  # Where to save the result file
+    train_score_filename = "train_score.pkl"
+    test_score_filename = "test_score.pkl"
+
+
+class OmniDetector:
+    def __init__(self, dim, model_root, window_size, initial_lr, l2_reg):
+        self.config = self.__init_config(dim, model_root, window_size, initial_lr, l2_reg)
+        self.time_tracker = {}
+        self.__init_model()
+
+    def __init_config(self, dim, model_root, window_size, initial_lr, l2_reg):
+        config = ExpConfig()
+        config.x_dim = dim
+        config.result_dir = model_root
+        config.window_length = window_size
+        config.initial_lr = initial_lr
+        config.l2_reg = l2_reg
+
+        results = MLResults(config.result_dir)
+        results.save_config(config)
+        results.make_dirs(config.save_dir, exist_ok=True)
+        return config
+
+    def __init_model(self):
+        tf.reset_default_graph()
+        with tf.variable_scope("model") as model_vs:
+            model = OmniAnomaly(config=self.config, name="model")
+            # construct the trainer
+            self.trainer = Trainer(
+                model=model,
+                model_vs=model_vs,
+                max_epoch=self.config.max_epoch,
+                batch_size=self.config.batch_size,
+                valid_batch_size=self.config.test_batch_size,
+                initial_lr=self.config.initial_lr,
+                lr_anneal_epochs=self.config.lr_anneal_epoch_freq,
+                lr_anneal_factor=self.config.lr_anneal_factor,
+                grad_clip_norm=self.config.gradient_clip_norm,
+                valid_step_freq=self.config.valid_step_freq,
+            )
+
+            # construct the predictor
+            self.predictor = Predictor(
+                model,
+                batch_size=self.config.batch_size,
+                n_z=self.config.test_n_z,
+                last_point_only=True,
+            )
+
+    def fit(self, iterator):
+        tf_config = tf.ConfigProto(allow_soft_placement=True)
+        tf_config.gpu_options.allow_growth = True
+        with tf.variable_scope("model") as model_vs:
+            with tf.Session(config=tf_config).as_default():
+                if self.config.restore_dir is not None:
+                    # Restore variables from `restore_dir`.
+                    saver = VariableSaver(
+                        get_variables_as_dict(model_vs), self.config.restore_dir
+                    )
+                    saver.restore()
+
+                best_valid_metrics = self.trainer.fit(iterator)
+
+                self.time_tracker["train"] = best_valid_metrics["total_train_time"]
+                if self.config.save_dir is not None:
+                    # save the variables
+                    var_dict = get_variables_as_dict(model_vs)
+                    saver = VariableSaver(var_dict, self.config.save_dir)
+                    saver.save()
+                print("=" * 30 + "result" + "=" * 30)
+
+    def predict_prob(self, iterator, label_windows=None):
+        tf_config = tf.ConfigProto(allow_soft_placement=True)
+        tf_config.gpu_options.allow_growth = True
+        with tf.variable_scope("model") as model_vs:
+            with tf.Session(config=tf_config).as_default():
+                if self.config.save_dir is not None:
+                    # Restore variables from `save_dir`.
+                    saver = VariableSaver(
+                        get_variables_as_dict(model_vs), self.config.save_dir
+                    )
+                    saver.restore()
+
+                score, z, pred_time = self.predictor.get_score(iterator)
+                self.time_tracker["test"] = pred_time
+        if label_windows is not None:
+            anomaly_label = (np.sum(label_windows, axis=1) >= 1) + 0
+            return score, anomaly_label
+        else:
+            return score
diff --git a/networks/omni_anomaly/model.py b/networks/omni_anomaly/model.py
new file mode 100644
index 0000000..95afc6d
--- /dev/null
+++ b/networks/omni_anomaly/model.py
@@ -0,0 +1,242 @@
+# -*- coding: utf-8 -*-
+from functools import partial
+
+import tensorflow as tf
+import tfsnippet as spt
+from tensorflow.python.ops.linalg.linear_operator_identity import LinearOperatorIdentity
+from tensorflow_probability.python.distributions import LinearGaussianStateSpaceModel, MultivariateNormalDiag
+from tfsnippet.distributions import Normal
+from tfsnippet.utils import VarScopeObject, reopen_variable_scope
+from tfsnippet.variational import VariationalInference
+
+from .recurrent_distribution import RecurrentDistribution
+from .vae import Lambda, VAE
+from .wrapper import TfpDistribution, softplus_std, rnn, wrap_params_net
+
+
+class OmniAnomaly(VarScopeObject):
+    def __init__(self, config, name=None, scope=None):
+        self.config = config
+        super(OmniAnomaly, self).__init__(name=name, scope=scope)
+        with reopen_variable_scope(self.variable_scope):
+            if config.posterior_flow_type == "nf":
+                self._posterior_flow = spt.layers.planar_normalizing_flows(
+                    config.nf_layers, name="posterior_flow"
+                )
+            else:
+                self._posterior_flow = None
+            self._window_length = config.window_length
+            self._x_dims = config.x_dim
+            self._z_dims = config.z_dim
+            self._vae = VAE(
+                p_z=TfpDistribution(
+                    LinearGaussianStateSpaceModel(
+                        num_timesteps=config.window_length,
+                        transition_matrix=LinearOperatorIdentity(config.z_dim),
+                        transition_noise=MultivariateNormalDiag(
+                            scale_diag=tf.ones([config.z_dim])
+                        ),
+                        observation_matrix=LinearOperatorIdentity(config.z_dim),
+                        observation_noise=MultivariateNormalDiag(
+                            scale_diag=tf.ones([config.z_dim])
+                        ),
+                        initial_state_prior=MultivariateNormalDiag(
+                            scale_diag=tf.ones([config.z_dim])
+                        ),
+                    )
+                )
+                if config.use_connected_z_p
+                else Normal(mean=tf.zeros([config.z_dim]), std=tf.ones([config.z_dim])),
+                p_x_given_z=Normal,
+                q_z_given_x=partial(
+                    RecurrentDistribution,
+                    mean_q_mlp=partial(
+                        tf.layers.dense,
+                        units=config.z_dim,
+                        name="z_mean",
+                        reuse=tf.AUTO_REUSE,
+                    ),
+                    std_q_mlp=partial(
+                        softplus_std,
+                        units=config.z_dim,
+                        epsilon=config.std_epsilon,
+                        name="z_std",
+                    ),
+                    z_dim=config.z_dim,
+                    window_length=config.window_length,
+                )
+                if config.use_connected_z_q
+                else Normal,
+                h_for_p_x=Lambda(
+                    partial(
+                        wrap_params_net,
+                        h_for_dist=lambda x: rnn(
+                            x=x,
+                            window_length=config.window_length,
+                            rnn_num_hidden=config.rnn_num_hidden,
+                            hidden_dense=2,
+                            dense_dim=config.dense_dim,
+                            name="rnn_p_x",
+                        ),
+                        mean_layer=partial(
+                            tf.layers.dense,
+                            units=config.x_dim,
+                            name="x_mean",
+                            reuse=tf.AUTO_REUSE,
+                        ),
+                        std_layer=partial(
+                            softplus_std,
+                            units=config.x_dim,
+                            epsilon=config.std_epsilon,
+                            name="x_std",
+                        ),
+                    ),
+                    name="p_x_given_z",
+                ),
+                h_for_q_z=Lambda(
+                    lambda x: {
+                        "input_q": rnn(
+                            x=x,
+                            window_length=config.window_length,
+                            rnn_num_hidden=config.rnn_num_hidden,
+                            hidden_dense=2,
+                            dense_dim=config.dense_dim,
+                            name="rnn_q_z",
+                        )
+                    },
+                    name="q_z_given_x",
+                )
+                if config.use_connected_z_q
+                else Lambda(
+                    partial(
+                        wrap_params_net,
+                        h_for_dist=lambda x: rnn(
+                            x=x,
+                            window_length=config.window_length,
+                            rnn_num_hidden=config.rnn_num_hidden,
+                            hidden_dense=2,
+                            dense_dim=config.dense_dim,
+                            name="rnn_q_z",
+                        ),
+                        mean_layer=partial(
+                            tf.layers.dense,
+                            units=config.z_dim,
+                            name="z_mean",
+                            reuse=tf.AUTO_REUSE,
+                        ),
+                        std_layer=partial(
+                            softplus_std,
+                            units=config.z_dim,
+                            epsilon=config.std_epsilon,
+                            name="z_std",
+                        ),
+                    ),
+                    name="q_z_given_x",
+                ),
+            )
+
+    @property
+    def x_dims(self):
+        """Get the number of `x` dimensions."""
+        return self._x_dims
+
+    @property
+    def z_dims(self):
+        """Get the number of `z` dimensions."""
+        return self._z_dims
+
+    @property
+    def vae(self):
+        """
+        Get the VAE object of this :class:`OmniAnomaly` model.
+
+        Returns:
+            VAE: The VAE object of this model.
+        """
+        return self._vae
+
+    @property
+    def window_length(self):
+        return self._window_length
+
+    def get_training_loss(self, x, n_z=None):
+        """
+        Get the training loss for `x`.
+
+        Args:
+            x (tf.Tensor): 3-D `float32` :class:`tf.Tensor` of shape ``(batch_size,
+                window_length, x_dims)``, the windows of KPI observations.
+            n_z (int or None): Number of `z` samples to take for each `x`.
+                (default :obj:`None`, one sample without explicit sampling
+                dimension)
+
+        Returns:
+            tf.Tensor: 0-d tensor, the training loss, which can be optimized
+                by gradient descent algorithms.
+        """
+        with tf.name_scope("training_loss"):
+            chain = self.vae.chain(x, n_z=n_z, posterior_flow=self._posterior_flow)
+            x_log_prob = chain.model["x"].log_prob(group_ndims=0)
+            log_joint = tf.reduce_sum(x_log_prob, -1)
+            chain.vi.training.sgvb()
+            vi = VariationalInference(
+                log_joint=log_joint,
+                latent_log_probs=chain.vi.latent_log_probs,
+                axis=chain.vi.axis,
+            )
+            loss = tf.reduce_mean(vi.training.sgvb())
+            return loss
+
+    def get_score(self, x, n_z=None, last_point_only=True):
+        """
+        Get the reconstruction probability for `x`.
+
+        The larger the `reconstruction probability`, the less likely a point
+        is anomalous.  You may take the negative of the score if you want
+        something that directly indicates the severity of an anomaly.
+
+        Args:
+            x (tf.Tensor): 3-D `float32` :class:`tf.Tensor` of shape ``(batch_size,
+                window_length, x_dims)``, the windows of KPI observations.
+            n_z (int or None): Number of `z` samples to take for each `x`.
+                (default :obj:`None`, one sample without explicit sampling
+                dimension)
+            last_point_only (bool): Whether to return the reconstruction
+                probability of only the last point in each window.
+                (default :obj:`True`)
+
+        Returns:
+            (tf.Tensor, tf.Tensor): The pair ``(r_prob, z)``.  `r_prob` is the
+                reconstruction log-probability of `x` under the model, sliced
+                with ``[:, -1]`` if `last_point_only` is :obj:`True`; it is
+                summed over the `x` dimensions unless
+                ``config.get_score_on_dim`` is set.  `z` concatenates the mean
+                and standard deviation of the `z` samples along the last axis.
+        """
+        with tf.name_scope("get_score"):
+            x_r = x
+
+            # get the reconstruction probability
+            print("-" * 30, "testing", "-" * 30)
+            q_net = self.vae.variational(
+                x=x_r, n_z=n_z, posterior_flow=self._posterior_flow
+            )  # notice: x=x_r
+            p_net = self.vae.model(z=q_net["z"], x=x, n_z=n_z)  # notice: x=x
+            z_samples = q_net["z"].tensor
+            z_mean = tf.reduce_mean(z_samples, axis=0) if n_z is not None else z_samples
+            z_std = (
+                tf.sqrt(
+                    tf.reduce_sum(tf.square(z_samples - z_mean), axis=0) / (n_z - 1)
+                )
+                if n_z is not None and n_z > 1
+                else tf.zeros_like(z_mean)
+            )
+            z = tf.concat((z_mean, z_std), axis=-1)
+
+            r_prob = p_net["x"].log_prob(
+                group_ndims=int(not self.config.get_score_on_dim)
+            )
+
+            if last_point_only:
+                r_prob = r_prob[:, -1]
+            return r_prob, z
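Reviewer note on `OmniAnomaly.get_score`: the `z` samples are summarised by their mean and by a standard deviation that divides by `n_z - 1`, with zeros used when fewer than two samples are available. The sketch below mirrors that summary for a single latent coordinate in plain Python. The helper name `summarize_z` and the list-of-floats input are assumptions made for illustration; this is not the TensorFlow code from the patch.

```python
import math


def summarize_z(samples):
    """Return (mean, std) for a list of scalar z samples.

    `samples` is a hypothetical stand-in for the n_z sampling axis.
    """
    n_z = len(samples)
    mean = sum(samples) / n_z
    if n_z > 1:
        # Divide by (n_z - 1), matching the expression in get_score.
        std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n_z - 1))
    else:
        # A single sample carries no spread information.
        std = 0.0
    return mean, std


mean, std = summarize_z([1.0, 2.0, 3.0])
```

In the patch the two statistics are concatenated along the last axis, so the returned `z` has twice the latent width.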
diff --git a/networks/omni_anomaly/prediction.py b/networks/omni_anomaly/prediction.py
new file mode 100644
index 0000000..f7544a4
--- /dev/null
+++ b/networks/omni_anomaly/prediction.py
@@ -0,0 +1,132 @@
+# -*- coding: utf-8 -*-
+import time
+
+import numpy as np
+import six
+import tensorflow as tf
+from tfsnippet.utils import (
+    VarScopeObject,
+    get_default_session_or_error,
+    reopen_variable_scope,
+)
+
+
+__all__ = ["Predictor"]
+
+
+class Predictor(VarScopeObject):
+    """
+    OmniAnomaly predictor.
+
+    Args:
+        model (OmniAnomaly): The :class:`OmniAnomaly` model instance.
+        n_z (int or None): Number of `z` samples to take for each `x`.
+            If :obj:`None`, one sample without explicit sampling dimension.
+            (default 1024)
+        batch_size (int): Size of each mini-batch for prediction.
+            (default 32)
+        feed_dict (dict[tf.Tensor, any]): User provided feed dict for
+            prediction. (default :obj:`None`)
+        last_point_only (bool): Whether to obtain the reconstruction
+            probability of only the last point in each window?
+            (default :obj:`True`)
+        name (str): Optional name of this predictor
+            (argument of :class:`tfsnippet.utils.VarScopeObject`).
+        scope (str): Optional scope of this predictor
+            (argument of :class:`tfsnippet.utils.VarScopeObject`).
+    """
+
+    def __init__(
+        self,
+        model,
+        n_z=1024,
+        batch_size=32,
+        feed_dict=None,
+        last_point_only=True,
+        name=None,
+        scope=None,
+    ):
+        super(Predictor, self).__init__(name=name, scope=scope)
+        self._model = model
+        self._n_z = n_z
+        self._batch_size = batch_size
+        if feed_dict is not None:
+            self._feed_dict = dict(six.iteritems(feed_dict))
+        else:
+            self._feed_dict = {}
+        self._last_point_only = last_point_only
+
+        with reopen_variable_scope(self.variable_scope):
+            # input placeholders
+            self._input_x = tf.placeholder(
+                dtype=tf.float32,
+                shape=[None, model.window_length, model.x_dims],
+                name="input_x",
+            )
+            self._input_y = tf.placeholder(
+                dtype=tf.int32, shape=[None, model.window_length], name="input_y"
+            )
+
+            # outputs of interest
+            self._score = self._score_without_y = self._q_net_z = None
+
+    def _get_score_without_y(self):
+        if self._score_without_y is None:
+            with reopen_variable_scope(self.variable_scope), tf.name_scope(
+                "score_without_y"
+            ):
+                self._score_without_y, self._q_net_z = self.model.get_score(
+                    x=self._input_x,
+                    n_z=self._n_z,
+                    last_point_only=self._last_point_only,
+                )
+                # print ('\t_get_score_without_y ',type(self._q_net_z))
+        return self._score_without_y, self._q_net_z
+
+    @property
+    def model(self):
+        """
+        Get the :class:`OmniAnomaly` model instance.
+
+        Returns:
+            OmniAnomaly: The :class:`OmniAnomaly` model instance.
+        """
+        return self._model
+
+    def get_score(self, test_iterator):
+        """
+        Get the `reconstruction probability` of specified KPI observations.
+
+        The larger the `reconstruction probability`, the less likely a point
+        is anomalous.  You may take the negative of the score if you want
+        something that directly indicates the severity of an anomaly.
+
+        Args:
+            test_iterator: Iterable of mini-batches, each fed to `input_x`.
+
+        Returns:
+            (np.ndarray, np.ndarray, float): The scores and the `z` statistics,
+                each concatenated over mini-batches along axis 0, and the total
+                prediction time in seconds.
+        """
+        with tf.name_scope("Predictor.get_score"):
+            sess = get_default_session_or_error()
+            collector = []
+            collector_z = []
+            pred_time = []
+
+            for b_x in test_iterator:
+                start_iter_time = time.time()
+                feed_dict = dict(six.iteritems(self._feed_dict))
+                feed_dict[self._input_x] = b_x
+                b_r, q_net_z = sess.run(
+                    self._get_score_without_y(), feed_dict=feed_dict
+                )
+                collector.append(b_r)
+                pred_time.append(time.time() - start_iter_time)
+                collector_z.append(q_net_z)
+
+            # merge the results of mini-batches
+            result = np.concatenate(collector, axis=0)
+            result_z = np.concatenate(collector_z, axis=0)
+            return result, result_z, np.sum(pred_time)
diff --git a/networks/omni_anomaly/recurrent_distribution.py b/networks/omni_anomaly/recurrent_distribution.py
new file mode 100644
index 0000000..59acefb
--- /dev/null
+++ b/networks/omni_anomaly/recurrent_distribution.py
@@ -0,0 +1,162 @@
+# -*- coding: utf-8 -*-
+import tensorflow as tf
+from tfsnippet import Distribution, Normal
+
+
+class RecurrentDistribution(Distribution):
+    """
+    A multi-variable distribution integrated with recurrent structure.
+    """
+
+    @property
+    def dtype(self):
+        return self._dtype
+
+    @property
+    def is_continuous(self):
+        return self._is_continuous
+
+    @property
+    def is_reparameterized(self):
+        return self._is_reparameterized
+
+    @property
+    def value_shape(self):
+        return self.normal.value_shape
+
+    def get_value_shape(self):
+        return self.normal.get_value_shape()
+
+    @property
+    def batch_shape(self):
+        return self.normal.batch_shape
+
+    def get_batch_shape(self):
+        return self.normal.get_batch_shape()
+
+    def sample_step(self, a, t):
+        z_previous, mu_q_previous, std_q_previous = a
+        noise_n, input_q_n = t
+        input_q_n = tf.broadcast_to(input_q_n,
+                                    [tf.shape(z_previous)[0], tf.shape(input_q_n)[0], input_q_n.shape[1]])
+        input_q = tf.concat([input_q_n, z_previous], axis=-1)
+        mu_q = self.mean_q_mlp(input_q, reuse=tf.AUTO_REUSE)  # n_sample * batch_size * z_dim
+
+        std_q = self.std_q_mlp(input_q)  # n_sample * batch_size * z_dim
+
+        temp = tf.einsum('ik,ijk->ijk', noise_n, std_q)  # n_sample * batch_size * z_dim
+        mu_q = tf.broadcast_to(mu_q, tf.shape(temp))
+        std_q = tf.broadcast_to(std_q, tf.shape(temp))
+        z_n = temp + mu_q
+
+        return z_n, mu_q, std_q
+
+    # NOTE: concatenates [given_n, input_q_n], while sample_step uses [input_q_n, z_previous].
+    def log_prob_step(self, _, t):
+
+        given_n, input_q_n = t
+        if len(given_n.shape) > 2:
+            input_q_n = tf.broadcast_to(input_q_n,
+                                        [tf.shape(given_n)[0], tf.shape(input_q_n)[0], input_q_n.shape[1]])
+        input_q = tf.concat([given_n, input_q_n], axis=-1)
+        mu_q = self.mean_q_mlp(input_q, reuse=tf.AUTO_REUSE)
+
+        std_q = self.std_q_mlp(input_q)
+        logstd_q = tf.log(std_q)
+        precision = tf.exp(-2 * logstd_q)
+        if self._check_numerics:
+            precision = tf.check_numerics(precision, "precision")
+        # 0.9189385332046727 == 0.5 * log(2 * pi); the residual is capped at 1e8.
+        log_prob_n = -0.9189385332046727 - logstd_q - 0.5 * precision * tf.square(tf.minimum(tf.abs(given_n - mu_q), 1e8))
+        return log_prob_n
+
+    def __init__(self, input_q, mean_q_mlp, std_q_mlp, z_dim, window_length=100, is_reparameterized=True,
+                 check_numerics=True):
+        super(RecurrentDistribution, self).__init__()
+        self.normal = Normal(mean=tf.zeros([window_length, z_dim]), std=tf.ones([window_length, z_dim]))
+        self.std_q_mlp = std_q_mlp
+        self.mean_q_mlp = mean_q_mlp
+        self._check_numerics = check_numerics
+        self.input_q = tf.transpose(input_q, [1, 0, 2])
+        self._dtype = input_q.dtype
+        self._is_reparameterized = is_reparameterized
+        self._is_continuous = True
+        self.z_dim = z_dim
+        self.window_length = window_length
+        self.time_first_shape = tf.convert_to_tensor([self.window_length, tf.shape(input_q)[0], self.z_dim])
+
+    def sample(self, n_samples=1024, is_reparameterized=None, group_ndims=0, compute_density=False,
+               name=None):
+
+        from tfsnippet.stochastic import StochasticTensor
+        if n_samples is None:
+            n_samples = 1
+            n_samples_is_none = True
+        else:
+            n_samples_is_none = False
+        with tf.name_scope(name=name, default_name='sample'):
+            noise = self.normal.sample(n_samples=n_samples)
+
+            noise = tf.transpose(noise, [1, 0, 2])  # window_length * n_samples * z_dim
+            noise = tf.truncated_normal(tf.shape(noise))  # redrawn; only the shape above is reused
+
+            time_indices_shape = tf.convert_to_tensor([n_samples, tf.shape(self.input_q)[1], self.z_dim])
+
+            samples = tf.scan(fn=self.sample_step,
+                              elems=(noise, self.input_q),
+                              initializer=(tf.zeros(time_indices_shape),
+                                           tf.zeros(time_indices_shape),
+                                           tf.ones(time_indices_shape)),
+                              back_prop=False
+                              )[0]  # time_step * n_samples * batch_size * z_dim
+
+            samples = tf.transpose(samples, [1, 2, 0, 3])  # n_samples * batch_size * time_step *  z_dim
+
+            if n_samples_is_none:
+                t = StochasticTensor(
+                    distribution=self,
+                    tensor=tf.reduce_mean(samples, axis=0),
+                    n_samples=1,
+                    group_ndims=group_ndims,
+                    is_reparameterized=self.is_reparameterized
+                )
+            else:
+                t = StochasticTensor(
+                    distribution=self,
+                    tensor=samples,
+                    n_samples=n_samples,
+                    group_ndims=group_ndims,
+                    is_reparameterized=self.is_reparameterized
+                )
+            if compute_density:
+                with tf.name_scope('compute_prob_and_log_prob'):
+                    log_p = t.log_prob()
+                    t._self_prob = tf.exp(log_p)
+            return t
+
+    def log_prob(self, given, group_ndims=0, name=None):
+        with tf.name_scope(name=name, default_name='log_prob'):
+            if len(given.shape) > 3:
+                time_indices_shape = tf.convert_to_tensor([tf.shape(given)[0], tf.shape(self.input_q)[1], self.z_dim])
+                given = tf.transpose(given, [2, 0, 1, 3])
+            else:
+                time_indices_shape = tf.convert_to_tensor([tf.shape(self.input_q)[1], self.z_dim])
+                given = tf.transpose(given, [1, 0, 2])
+            log_prob = tf.scan(fn=self.log_prob_step,
+                               elems=(given, self.input_q),
+                               initializer=tf.zeros(time_indices_shape),
+                               back_prop=False
+                               )
+            if len(given.shape) > 3:
+                log_prob = tf.transpose(log_prob, [1, 2, 0, 3])
+            else:
+                log_prob = tf.transpose(log_prob, [1, 0, 2])
+
+            if group_ndims == 1:
+                log_prob = tf.reduce_sum(log_prob, axis=-1)
+            return log_prob
+
+    def prob(self, given, group_ndims=0, name=None):
+        with tf.name_scope(name=name, default_name='prob'):
+            log_prob = self.log_prob(given, group_ndims, name)
+            return tf.exp(log_prob)
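Reviewer note on `log_prob_step`: the expression is the element-wise log-density of a diagonal Gaussian, where the literal `0.9189385332046727` is `0.5 * log(2 * pi)` and the absolute residual is capped at `1e8`. A scalar sketch in plain Python follows. The names `HALF_LOG_2PI`, `normal_log_prob`, and `max_abs_residual` are hypothetical and chosen only for this illustration.

```python
import math

# The constant that appears as a literal in log_prob_step.
HALF_LOG_2PI = 0.5 * math.log(2.0 * math.pi)


def normal_log_prob(x, mu, std, max_abs_residual=1e8):
    """Log-density of N(mu, std**2) at x, with the residual capped."""
    log_std = math.log(std)
    precision = math.exp(-2.0 * log_std)  # equals 1 / std**2
    residual = min(abs(x - mu), max_abs_residual)
    return -HALF_LOG_2PI - log_std - 0.5 * precision * residual ** 2


print(HALF_LOG_2PI)
print(normal_log_prob(1.0, 0.0, 2.0))
```

Capping the residual keeps the squared term finite for extreme inputs; it leaves the density unchanged for any residual below the cap.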
diff --git a/networks/omni_anomaly/requirements.txt b/networks/omni_anomaly/requirements.txt
new file mode 100644
index 0000000..54e6ce9
--- /dev/null
+++ b/networks/omni_anomaly/requirements.txt
@@ -0,0 +1,15 @@
+six == 1.11.0
+matplotlib == 3.0.2
+numpy == 1.15.4
+pandas == 0.23.4
+scipy == 1.2.0
+scikit_learn == 0.24.1
+# tensorflow == 1.12.0
+tensorflow-gpu == 1.12.0
+tensorflow_probability == 0.5.0
+tqdm == 4.28.1
+imageio == 2.4.1
+fs == 2.3.0
+click == 7.0
+git+https://github.com/thu-ml/zhusuan.git
+git+https://github.com/haowen-xu/tfsnippet.git@v0.2.0-alpha1
\ No newline at end of file
diff --git a/networks/omni_anomaly/training.py b/networks/omni_anomaly/training.py
new file mode 100644
index 0000000..5cd5cd1
--- /dev/null
+++ b/networks/omni_anomaly/training.py
@@ -0,0 +1,275 @@
+# -*- coding: utf-8 -*-
+import time
+
+import numpy as np
+import six
+import tensorflow as tf
+from tfsnippet.scaffold import TrainLoop
+from tfsnippet.shortcuts import VarScopeObject
+from tfsnippet.utils import (
+    reopen_variable_scope,
+    get_default_session_or_error,
+    ensure_variables_initialized,
+    get_variables_as_dict,
+)
+
+from .utils import BatchSlidingWindow
+
+
+__all__ = ["Trainer"]
+
+
+class Trainer(VarScopeObject):
+    """
+    OmniAnomaly trainer.
+
+    Args:
+        model (OmniAnomaly): The :class:`OmniAnomaly` model instance.
+        model_vs (str or tf.VariableScope): If specified, will collect
+            trainable variables only from this scope.  If :obj:`None`,
+            will collect all trainable variables within current graph.
+            (default :obj:`None`)
+        n_z (int or None): Number of `z` samples to take for each `x`.
+            (default :obj:`None`, one sample without explicit sampling
+            dimension)
+        feed_dict (dict[tf.Tensor, any]): User provided feed dict for
+            training. (default :obj:`None`, indicating no feeding)
+        valid_feed_dict (dict[tf.Tensor, any]): User provided feed dict for
+            validation.  If :obj:`None`, follow `feed_dict` of training.
+            (default :obj:`None`)
+        use_regularization_loss (bool): Whether or not to add regularization
+            loss from `tf.GraphKeys.REGULARIZATION_LOSSES` to the training
+            loss? (default :obj:`True`)
+        max_epoch (int or None): Maximum epochs to run.  If :obj:`None`,
+            will not stop at any particular epoch. (default 256)
+        max_step (int or None): Maximum steps to run.  If :obj:`None`,
+            will not stop at any particular step.  At least one of `max_epoch`
+            and `max_step` should be specified. (default :obj:`None`)
+        batch_size (int): Size of mini-batches for training. (default 256)
+        valid_batch_size (int): Size of mini-batches for validation.
+            (default 1024)
+        valid_step_freq (int): Run validation after every `valid_step_freq`
+            number of training steps. (default 100)
+        initial_lr (float): Initial learning rate. (default 0.001)
+        lr_anneal_epochs (int): Anneal the learning rate after every
+            `lr_anneal_epochs` number of epochs. (default 10)
+        lr_anneal_factor (float): Anneal the learning rate with this
+            discount factor, i.e., ``learning_rate = learning_rate
+            * lr_anneal_factor``. (default 0.75)
+        optimizer (Type[tf.train.Optimizer]): The class of TensorFlow
+            optimizer. (default :class:`tf.train.AdamOptimizer`)
+        optimizer_params (dict[str, any] or None): The named arguments
+            for constructing the optimizer. (default :obj:`None`)
+        grad_clip_norm (float or None): Clip gradient by this norm.
+            If :obj:`None`, disable gradient clip by norm. (default 50.0)
+        check_numerics (bool): Whether or not to add TensorFlow assertions
+            for numerical issues? (default :obj:`True`)
+        name (str): Optional name of this trainer
+            (argument of :class:`tfsnippet.utils.VarScopeObject`).
+        scope (str): Optional scope of this trainer
+            (argument of :class:`tfsnippet.utils.VarScopeObject`).
+    """
+
+    def __init__(
+        self,
+        model,
+        model_vs=None,
+        n_z=None,
+        feed_dict=None,
+        valid_feed_dict=None,
+        use_regularization_loss=True,
+        max_epoch=256,
+        max_step=None,
+        batch_size=256,
+        valid_batch_size=1024,
+        valid_step_freq=100,
+        initial_lr=0.001,
+        lr_anneal_epochs=10,
+        lr_anneal_factor=0.75,
+        optimizer=tf.train.AdamOptimizer,
+        optimizer_params=None,
+        grad_clip_norm=50.0,
+        check_numerics=True,
+        name=None,
+        scope=None,
+    ):
+        super(Trainer, self).__init__(name=name, scope=scope)
+
+        # memorize the arguments
+        self._model = model
+        self._n_z = n_z
+        if feed_dict is not None:
+            self._feed_dict = dict(six.iteritems(feed_dict))
+        else:
+            self._feed_dict = {}
+        if valid_feed_dict is not None:
+            self._valid_feed_dict = dict(six.iteritems(valid_feed_dict))
+        else:
+            self._valid_feed_dict = self._feed_dict
+        if max_epoch is None and max_step is None:
+            raise ValueError(
+                "At least one of `max_epoch` and `max_step` should be specified"
+            )
+        self._max_epoch = max_epoch
+        self._max_step = max_step
+        self._batch_size = batch_size
+        self._valid_batch_size = valid_batch_size
+        self._valid_step_freq = valid_step_freq
+        self._initial_lr = initial_lr
+        self._lr_anneal_epochs = lr_anneal_epochs
+        self._lr_anneal_factor = lr_anneal_factor
+
+        # build the trainer
+        with reopen_variable_scope(self.variable_scope):
+            # the global step for this model
+            self._global_step = tf.get_variable(
+                dtype=tf.int64,
+                name="global_step",
+                trainable=False,
+                initializer=tf.constant(0, dtype=tf.int64),
+                # reuse=True,
+            )
+
+            # input placeholders
+            self._input_x = tf.placeholder(
+                dtype=tf.float32,
+                shape=[None, model.window_length, model.x_dims],
+                name="input_x",
+            )
+            self._learning_rate = tf.placeholder(
+                dtype=tf.float32, shape=(), name="learning_rate"
+            )
+
+            # compose the training loss
+            with tf.name_scope("loss"):
+                loss = model.get_training_loss(x=self._input_x, n_z=n_z)
+                if use_regularization_loss:
+                    loss += tf.losses.get_regularization_loss()
+                self._loss = loss
+
+            # get the training variables
+            train_params = get_variables_as_dict(
+                scope=model_vs, collection=tf.GraphKeys.TRAINABLE_VARIABLES
+            )
+            self._train_params = train_params
+
+            # create the trainer
+            if optimizer_params is None:
+                optimizer_params = {}
+            else:
+                optimizer_params = dict(six.iteritems(optimizer_params))
+            optimizer_params["learning_rate"] = self._learning_rate
+            self._optimizer = optimizer(**optimizer_params)
+
+            # derive the training gradient
+            origin_grad_vars = self._optimizer.compute_gradients(
+                self._loss, list(six.itervalues(self._train_params))
+            )
+            grad_vars = []
+            for grad, var in origin_grad_vars:
+                if grad is not None and var is not None:
+                    if grad_clip_norm:
+                        grad = tf.clip_by_norm(grad, grad_clip_norm)
+                    if check_numerics:
+                        grad = tf.check_numerics(
+                            grad, "gradient for {} has numeric issue".format(var.name)
+                        )
+                    grad_vars.append((grad, var))
+
+            # build the training op
+            with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
+                self._train_op = self._optimizer.apply_gradients(
+                    grad_vars, global_step=self._global_step
+                )
+
+            # the training summary in case `summary_dir` is specified
+            with tf.name_scope("summary"):
+                self._summary_op = tf.summary.merge(
+                    [
+                        tf.summary.histogram(v.name.rsplit(":", 1)[0], v)
+                        for v in six.itervalues(self._train_params)
+                    ]
+                )
+
+            # initializer for the variables
+            self._trainer_initializer = tf.variables_initializer(
+                list(
+                    six.itervalues(
+                        get_variables_as_dict(
+                            scope=self.variable_scope,
+                            collection=tf.GraphKeys.GLOBAL_VARIABLES,
+                        )
+                    )
+                )
+            )
+
+    @property
+    def model(self):
+        """
+        Get the :class:`OmniAnomaly` model instance.
+
+        Returns:
+            OmniAnomaly: The :class:`OmniAnomaly` model instance.
+        """
+        return self._model
+
+    def fit(self, train_iterator, summary_dir=None):
+        """
+        Train the :class:`OmniAnomaly` model with given data.
+
+        Args:
+            train_iterator: Sized iterable yielding mini-batches of windows,
+                each of shape ``(batch_size, window_length, x_dims)``.
+            summary_dir (str): Optional summary directory for
+                :class:`tf.summary.FileWriter`. (default :obj:`None`,
+                summary is disabled)
+        """
+        sess = get_default_session_or_error()
+
+        # initialize the variables of the trainer, and the model
+        sess.run(self._trainer_initializer)
+        ensure_variables_initialized(self._train_params)
+
+        # training loop
+        lr = self._initial_lr
+        with TrainLoop(
+            param_vars=self._train_params,
+            summary_dir=summary_dir,
+            max_epoch=self._max_epoch,
+            max_step=self._max_step,
+        ) as loop:  # type: TrainLoop
+            # loop.print_training_summary()
+
+            train_batch_time = []
+            valid_batch_time = []
+
+            time_train_start = time.time()
+            for epoch in loop.iter_epochs():
+                start_time = time.time()
+                for step, idx in loop.iter_steps(range(len(train_iterator))):
+                    # run a training step
+                    # index by step so that every batch is visited once per epoch
+                    batch_x = train_iterator[idx]
+                    start_batch_time = time.time()
+                    feed_dict = dict(six.iteritems(self._feed_dict))
+                    feed_dict[self._learning_rate] = lr
+                    feed_dict[self._input_x] = batch_x
+                    loss, _ = sess.run(
+                        [self._loss, self._train_op], feed_dict=feed_dict
+                    )
+                    loop.collect_metrics({"loss": loss})
+                    train_batch_time.append(time.time() - start_batch_time)
+
+                # anneal the learning rate
+                if self._lr_anneal_epochs and epoch % self._lr_anneal_epochs == 0:
+                    lr *= self._lr_anneal_factor
+                    loop.println(
+                        "Learning rate decreased to {}".format(lr), with_tag=True
+                    )
+
+            time_train_end = time.time()
+            return {
+                # "best_valid_loss": float(loop.best_valid_metric),
+                "train_time": np.sum(train_batch_time),
+                "total_train_time": time_train_end - time_train_start,
+            }
diff --git a/networks/omni_anomaly/utils.py b/networks/omni_anomaly/utils.py
new file mode 100644
index 0000000..5677c36
--- /dev/null
+++ b/networks/omni_anomaly/utils.py
@@ -0,0 +1,415 @@
+# -*- coding: utf-8 -*-
+import os
+import pickle
+import copy
+
+import numpy as np
+from sklearn.preprocessing import MinMaxScaler
+from glob import glob
+from collections import defaultdict
+from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
+from tensorflow.python.keras.utils import Sequence
+
+prefix = "processed"
+
+
+class DataGenerator(Sequence):
+    def __init__(
+        self,
+        data_array,
+        batch_size=32,
+        shuffle=False,
+    ):
+        self.darray = data_array
+        self.batch_size = batch_size
+        self.shuffle = shuffle
+        self.index_pool = list(range(self.darray.shape[0]))
+        self.length = int(np.ceil(len(self.index_pool) * 1.0 / self.batch_size))
+        self.on_epoch_end()
+
+    def __len__(self):
+        return self.length
+
+    def __getitem__(self, index):
+        indexes = self.index_pool[
+            index * self.batch_size : (index + 1) * self.batch_size
+        ]
+        X = self.darray[indexes]
+
+        # reshuffle here in case on_epoch_end is not called automatically
+        if index == self.length - 1:
+            self.on_epoch_end()
+        return X
+
+    def on_epoch_end(self):
+        if self.shuffle:
+            np.random.shuffle(self.index_pool)
+
+
+def get_data_dim(dataset):
+    if dataset == "SMAP":
+        return 25
+    elif dataset == "MSL":
+        return 55
+    elif dataset == "SMD" or str(dataset).startswith("machine"):
+        return 38
+    else:
+        raise ValueError("unknown dataset " + str(dataset))
+
+
+def save_z(z, filename="z"):
+    """
+    Save the sampled `z` to text files, one per selected index along axis 1.
+    """
+    for i in range(0, z.shape[1], 20):
+        with open(filename + "_" + str(i) + ".txt", "w") as file:
+            for j in range(0, z.shape[0]):
+                for k in range(0, z.shape[2]):
+                    file.write("%f " % (z[j][i][k]))
+                file.write("\n")
+    i = z.shape[1] - 1
+    with open(filename + "_" + str(i) + ".txt", "w") as file:
+        for j in range(0, z.shape[0]):
+            for k in range(0, z.shape[2]):
+                file.write("%f " % (z[j][i][k]))
+            file.write("\n")
+
+
+# def get_data_dim(dataset):
+#     if dataset == "SMAP":
+#         return 25
+#     elif dataset == "MSL":
+#         return 55
+#     elif str(dataset).startswith("machine"):
+#         return 38
+#     else:
+#         raise ValueError("unknown dataset " + str(dataset))
+
+
+def preprocess(df):
+    """Return `df` as a float32 array, min-max scaled per column to [0, 1]."""
+
+    df = np.asarray(df, dtype=np.float32)
+
+    if len(df.shape) == 1:
+        raise ValueError("Data must be a 2-D array")
+
+    if np.isnan(df).any():
+        print("Data contains null values. Will be replaced with 0")
+        df = np.nan_to_num(df)
+
+    # normalize data
+    df = MinMaxScaler().fit_transform(df)
+    print("Data normalized")
+
+    return df
+
+
+def minibatch_slices_iterator(length, batch_size, ignore_incomplete_batch=False):
+    """
+    Iterate through all the mini-batch slices.
+
+    Args:
+        length (int): Total length of data in an epoch.
+        batch_size (int): Size of each mini-batch.
+        ignore_incomplete_batch (bool): If :obj:`True`, discard the final
+            batch if it contains fewer than `batch_size` items.
+            (default :obj:`False`)
+
+    Yields:
+        slice: Slices of each mini-batch.  The last mini-batch may contain
+               fewer indices than `batch_size`.
+    """
+    start = 0
+    stop1 = (length // batch_size) * batch_size
+    while start < stop1:
+        yield slice(start, start + batch_size, 1)
+        start += batch_size
+    if not ignore_incomplete_batch and start < length:
+        yield slice(start, length, 1)
+
+
+class BatchSlidingWindow(object):
+    """
+    Class for obtaining mini-batch iterators of sliding windows.
+
+    Each mini-batch will have `batch_size` windows.  If the final batch
+    contains fewer than `batch_size` windows, it will be discarded if
+    `ignore_incomplete_batch` is :obj:`True`.
+
+    Args:
+        array_size (int): Size of the arrays to be iterated.
+        window_size (int): The size of the windows.
+        batch_size (int): Size of each mini-batch.
+        excludes (np.ndarray): 1-D `bool` array, indicators of whether
+            or not to totally exclude a point.  If a point is excluded,
+            any window which contains that point is excluded.
+            (default :obj:`None`, no point is totally excluded)
+        shuffle (bool): If :obj:`True`, the windows will be iterated in
+            shuffled order. (default :obj:`False`)
+        ignore_incomplete_batch (bool): If :obj:`True`, discard the final
+            batch if it contains fewer than `batch_size` windows.
+            (default :obj:`False`)
+    """
+
+    def __init__(
+        self,
+        array_size,
+        window_size,
+        batch_size,
+        excludes=None,
+        shuffle=False,
+        ignore_incomplete_batch=False,
+    ):
+        # check the parameters
+        if window_size < 1:
+            raise ValueError("`window_size` must be at least 1")
+        if array_size < window_size:
+            raise ValueError(
+                "`array_size` must be at least as large as `window_size`"
+            )
+        if excludes is not None:
+            excludes = np.asarray(excludes, dtype=bool)
+            expected_shape = (array_size,)
+            if excludes.shape != expected_shape:
+                raise ValueError(
+                    "The shape of `excludes` is expected to be "
+                    "{}, but got {}".format(expected_shape, excludes.shape)
+                )
+
+        # compute which points are not excluded
+        if excludes is None:
+            excludes = np.zeros([array_size], dtype=bool)
+        mask = np.logical_not(excludes)
+        # windows ending before index `window_size - 1` would be incomplete
+        mask[: window_size - 1] = False
+        where_excludes = np.where(excludes)[0]
+        for k in range(1, window_size):
+            also_excludes = where_excludes + k
+            also_excludes = also_excludes[also_excludes < array_size]
+            mask[also_excludes] = False
+
+        # generate the indices of window endings
+        indices = np.arange(array_size)[mask]
+        self._indices = indices.reshape([-1, 1])
+
+        # the offset array to generate the windows
+        self._offsets = np.arange(-window_size + 1, 1)
+
+        # memorize arguments
+        self._array_size = array_size
+        self._window_size = window_size
+        self._batch_size = batch_size
+        self._shuffle = shuffle
+        self._ignore_incomplete_batch = ignore_incomplete_batch
+
+    def get_iterator(self, arrays):
+        """
+        Iterate through the sliding windows of each array in `arrays`.
+
+        This method is not re-entrant, i.e., calling :meth:`get_iterator`
+        would invalidate any previously obtained iterator.
+
+        Args:
+            arrays (Iterable[np.ndarray]): 1-D or 2-D arrays to be iterated.
+
+        Yields:
+            tuple[np.ndarray]: The windows of arrays of each mini-batch.
+        """
+        # check the parameters
+        arrays = tuple(np.asarray(a) for a in arrays)
+        if not arrays:
+            raise ValueError("`arrays` must not be empty")
+
+        # shuffle if required
+        if self._shuffle:
+            np.random.shuffle(self._indices)
+
+        # iterate through the mini-batches
+        for s in minibatch_slices_iterator(
+            length=len(self._indices),
+            batch_size=self._batch_size,
+            ignore_incomplete_batch=self._ignore_incomplete_batch,
+        ):
+            idx = self._indices[s] + self._offsets
+            yield tuple(a[idx] if len(a.shape) == 1 else a[idx, :] for a in arrays)
+
+
+def iter_thresholds(score, label):
+    best_f1 = -float("inf")
+    best_theta = None
+    best_adjust = None
+    best_raw = None
+    for anomaly_ratio in np.linspace(1e-3, 1, 500)[0:1]:
+        # NOTE: the `[0:1]` slice above limits the search to the first ratio (1e-3)
+        adjusted_anomaly, raw_predict, threshold = score2pred(
+            score, label, percent=100 * (1 - anomaly_ratio), adjust=False
+        )
+
+        f1 = f1_score(label, adjusted_anomaly)
+        if f1 > best_f1:
+            best_f1 = f1
+            best_adjust = adjusted_anomaly
+            best_raw = raw_predict
+            best_theta = threshold
+    return best_f1, best_theta, best_adjust, best_raw
+
+
+def score2pred(
+    score,
+    label,
+    percent=None,
+    threshold=None,
+    pred=None,
+    calc_latency=False,
+    adjust=True,
+):
+    """Calculate adjusted predicted labels from `score` and a threshold (or a given `pred`) and `label`.
+    Args:
+        score (np.ndarray): The anomaly score.
+        label (np.ndarray): The ground-truth label.
+        percent (float): If given, the percentile of `score` used as the threshold.
+        threshold (float): Points scoring above this are labeled anomalous; used when `percent` is None.
+        pred (np.ndarray or None): If not None, adjust `pred` and ignore `score` and `threshold`.
+        calc_latency (bool): If True (with `adjust`), return `(predict, latency)` instead.
+        adjust (bool): If False, skip the adjustment and return the raw prediction twice.
+    Returns:
+        tuple: `(predict, raw_predict, threshold)` by default.
+    """
+    if score is not None:
+        if len(score) != len(label):
+            raise ValueError("score and label must have the same length")
+        score = np.asarray(score)
+    label = np.asarray(label)
+    latency = 0
+    if pred is None:
+        if percent is not None:
+            threshold = np.percentile(score, percent)
+            # print("Threshold for {} percent is: {:.2f}".format(percent, threshold))
+            predict = score > threshold
+        elif threshold is not None:
+            predict = score > threshold
+    else:
+        predict = pred
+
+    if not adjust:
+        return predict, predict, threshold
+
+    raw_predict = copy.deepcopy(predict)
+
+    actual = label == 1
+    anomaly_state = False
+    anomaly_count = 0
+    for i in range(len(predict)):
+        if actual[i] and predict[i] and not anomaly_state:
+            anomaly_state = True
+            anomaly_count += 1
+            for j in range(i, -1, -1):
+                if not actual[j]:
+                    break
+                else:
+                    if not predict[j]:
+                        predict[j] = True
+                        latency += 1
+        elif not actual[i]:
+            anomaly_state = False
+        if anomaly_state:
+            predict[i] = True
+
+    if calc_latency:
+        return predict, latency / (anomaly_count + 1e-4)
+    else:
+        return predict, raw_predict, threshold
+
+
+entities = {
+    "SMD": ["machine-1-{}".format(i) for i in range(1, 9)]
+    + ["machine-2-{}".format(i) for i in range(1, 10)]
+    + ["machine-3-{}".format(i) for i in range(1, 12)],
+    "SMAP": [
+        "P-1",
+        "S-1",
+        "E-1",
+        "E-2",
+        "E-3",
+        "E-4",
+        "E-5",
+        "E-6",
+        "E-7",
+        "E-8",
+        "E-9",
+        "E-10",
+        "E-11",
+        "E-12",
+        "E-13",
+        "A-1",
+        "D-1",
+        "P-2",
+        "P-3",
+        "D-2",
+        "D-3",
+        "D-4",
+        "A-2",
+        "A-3",
+        "A-4",
+        "G-1",
+        "G-2",
+        "D-5",
+        "D-6",
+        "D-7",
+        "F-1",
+        "P-4",
+        "G-3",
+        "T-1",
+        "T-2",
+        "D-8",
+        "D-9",
+        "F-2",
+        "G-4",
+        "T-3",
+        "D-11",
+        "D-12",
+        "B-1",
+        "G-6",
+        "G-7",
+        "P-7",
+        "R-1",
+        "A-5",
+        "A-6",
+        "A-7",
+        "D-13",
+        "P-2",
+        "A-8",
+        "A-9",
+        "F-3",
+    ],
+    "MSL": [
+        "M-6",
+        "M-1",
+        "M-2",
+        "S-2",
+        "P-10",
+        "T-4",
+        "T-5",
+        "F-7",
+        "M-3",
+        "M-4",
+        "M-5",
+        "P-15",
+        "C-1",
+        "C-2",
+        "T-12",
+        "T-13",
+        "F-4",
+        "F-5",
+        "D-14",
+        "T-9",
+        "P-14",
+        "T-8",
+        "P-11",
+        "D-15",
+        "D-16",
+        "M-7",
+        "F-8",
+    ],
+}
diff --git a/networks/omni_anomaly/vae.py b/networks/omni_anomaly/vae.py
new file mode 100644
index 0000000..6f53e1e
--- /dev/null
+++ b/networks/omni_anomaly/vae.py
@@ -0,0 +1,518 @@
+# -*- coding: utf-8 -*-
+import tensorflow as tf
+from tfsnippet.bayes import BayesianNet
+from tfsnippet.distributions import Distribution
+from tfsnippet.stochastic import StochasticTensor, validate_n_samples_arg
+from tfsnippet.utils import (instance_reuse, is_tensor_object,
+                             reopen_variable_scope, VarScopeObject)
+from tfsnippet.variational import VariationalChain
+
+
+class VAE(VarScopeObject):
+    """
+    A general implementation of variational auto-encoder as module.
+
+    The variational auto-encoder ("Auto-Encoding Variational Bayes",
+    Kingma, D.P. and Welling, M.) is a deep Bayesian network, with observed
+    variable `x` and latent variable `z`.  The generative process
+    starts from `z` with prior distribution :math:`p(z)`, following a
+    hidden network :math:`h(z)`, then comes to `x` with distribution
+    :math:`p(x|h(z))`.  To do posterior inference of :math:`p(z|x)`,
+    variational inference techniques are adopted, to train a separate
+    distribution :math:`q(z|h(x))` (:math:`h(x)` denoting the hidden network)
+    to approximate :math:`p(z|x)`.
+
+    This class provides a general implementation of variational auto-encoder,
+    with customizable :math:`p(z)`, :math:`p(x|h(z))`, :math:`q(z|h(x))`,
+    as well as the hidden networks :math:`h(z)` and :math:`h(x)`.
+
+    For example, to construct a VAE with diagonal Normal `z` and `x`:
+
+    .. code-block:: python
+
+        from tensorflow import keras as K
+        from tfsnippet.modules import VAE, DictMapper, Sequential
+        from tfsnippet.distributions import Normal
+
+        batch_size = 128
+        x_dims, z_dims = 100, 10
+        vae = VAE(
+            p_z=Normal(mean=tf.zeros([batch_size, z_dims]),
+                       std=tf.ones([batch_size, z_dims])),
+            p_x_given_z=Normal,
+            q_z_given_x=Normal,
+            h_for_p_x=Sequential([
+                K.layers.Dense(100, activation=tf.nn.relu),
+                DictMapper({'mean': K.layers.Dense(x_dims),
+                            'logstd': K.layers.Dense(x_dims)})
+            ]),
+            h_for_q_z=Sequential([
+                K.layers.Dense(100, activation=tf.nn.relu),
+                DictMapper({'mean': K.layers.Dense(z_dims),
+                            'logstd': K.layers.Dense(z_dims)})
+            ])
+        )
+
+    To train the `vae`:
+
+    .. code-block:: python
+
+        # Automatically derive a single-sample loss.
+        # Depending on ``z.is_reparameterized``, it might be derived by
+        # `sgvb` (is_reparameterized == True) or `reinforce` (otherwise).
+        loss = vae.get_training_loss(x)
+
+        # Automatically derive a multi-sample loss.
+        # Depending on ``z.is_reparameterized``, it might be derived by
+        # `iwae` (is_reparameterized == True) or `vimco` (otherwise).
+        loss = vae.get_training_loss(x, n_z=10)
+
+        # Or manually derive a reweighted wake-sleep training loss.
+        # Note the `VariationalTrainingObjectives` produce per-data
+        # training objectives, instead of a 0-d scalar loss as the
+        # `VAE.get_training_loss` does.
+        chain = vae.chain(x, n_z=10)
+        loss = tf.reduce_mean(chain.vi.training.rws_wake())
+
+    To map from `x` to `z`:
+
+    .. code-block:: python
+
+        # use the :class:`Module` interface for one-to-one mapping
+        z = vae(x)
+
+        # use the :class:`Module` interface for multiple `z` samples
+        z = vae(x, n_z=10)
+
+        # or obtain the variational :class:`BayesianNet` with observed `z`
+        q_net = vae.variational(x, z=observed_z)
+        z_log_prob = q_net['z'].log_prob()
+
+    To reconstruct `x`:
+
+    .. code-block:: python
+
+        # use the :meth:`VAE.reconstruct` for obtaining one `x` sample
+        x_reconstructed = vae.reconstruct(x)
+
+        # to obtain multiple `z` samples, and further multiple `x` samples
+        # (this results in 100 `x` samples for each input `x`)
+        x_reconstructed = vae.reconstruct(x, n_z=10, n_x=10)
+
+    To sample `x` from prior `z` or observed `z`:
+
+    .. code-block:: python
+
+        # sample multiple prior `z`, then one `x` for each `z`
+        x = vae.model(n_z=10)['x']
+
+        # sample multiple `x` for each observed `z`
+        x = vae.model(z=observed_z, n_x=10)['x']
+    """
+
+    def __init__(self, p_z, p_x_given_z, q_z_given_x, h_for_p_x, h_for_q_z,
+                 z_group_ndims=1, x_group_ndims=1, is_reparameterized=None,
+                 name=None, scope=None):
+        """
+        Construct the :class:`VAE`.
+
+        Args:
+            p_z (Distribution): :math:`p(z)`, the distribution instance.
+            p_x_given_z: :math:`p(x|h(z))`, a distribution class or
+                a :class:`DistributionFactory` object.
+            q_z_given_x: :math:`q(z|h(x))`, a distribution class or
+                a :class:`DistributionFactory` object.
+            h_for_p_x (Module): :math:`h(z)`, the hidden network module for
+                :math:`p(x|h(z))`. The output of `h_for_p_x` must be a
+                ``dict[str, any]``, the parameters for `p_x_given_z`.
+            h_for_q_z (Module): :math:`h(x)`, the hidden network module for
+                :math:`q(z|h(x))`. The output of `h_for_q_z` must be a
+                ``dict[str, any]``, the parameters for `q_z_given_x`.
+            z_group_ndims (int or tf.Tensor): `group_ndims` for `z`. (default 1)
+            x_group_ndims (int or tf.Tensor): `group_ndims` for `x`. (default 1)
+            is_reparameterized (bool or None): Whether or not `z` should be
+                re-parameterized? (default :obj:`None`, following the settings
+                of z distributions.)
+            name (str): Optional name of this module
+                (argument of :class:`~tfsnippet.utils.VarScopeObject`).
+            scope (str): Optional scope of this module
+                (argument of :class:`~tfsnippet.utils.VarScopeObject`).
+
+        See Also:
+            :meth:`tfsnippet.distributions.Distribution.log_prob` for
+                contents about `group_ndims`.
+        """
+        if not isinstance(p_z, Distribution):
+            raise TypeError('`p_z` must be an instance of `Distribution`')
+        if not callable(h_for_p_x):
+            raise TypeError('`h_for_p_x` must be an instance of `Module` or '
+                            'a callable object')
+        if not callable(h_for_q_z):
+            raise TypeError('`h_for_q_z` must be an instance of `Module` or '
+                            'a callable object')
+        super(VAE, self).__init__(name=name, scope=scope)
+
+        # Defensive coding: wrap `h_for_p_x` and `h_for_q_z` in reused scope.
+        if not isinstance(h_for_p_x, VarScopeObject):
+            with reopen_variable_scope(self.variable_scope):
+                h_for_p_x = Lambda(h_for_p_x, name='h_for_p_x')
+        if not isinstance(h_for_q_z, VarScopeObject):
+            with reopen_variable_scope(self.variable_scope):
+                h_for_q_z = Lambda(h_for_q_z, name='h_for_q_z')
+
+        self._p_z = p_z
+        self._p_x_given_z = p_x_given_z
+        self._q_z_given_x = q_z_given_x
+        self._h_for_p_x = h_for_p_x
+        self._h_for_q_z = h_for_q_z
+        self._z_group_ndims = z_group_ndims
+        self._x_group_ndims = x_group_ndims
+        self._is_reparameterized = is_reparameterized
+
+    def __call__(self, inputs, **kwargs):
+        with reopen_variable_scope(self.variable_scope):
+            # Here `reopen_name_scope` is set to True, so that multiple
+            # calls to the same Module instance will always generate operations
+            # within the original name scope.
+            # However, in order for ``tf.variable_scope(default_name=...)``
+            # to work properly with variable reusing, we must generate a nested
+            # unique name scope.
+            with tf.name_scope('forward'):
+                return self._forward(inputs, **kwargs)
+
+    @property
+    def p_z(self):
+        """
+        Get :math:`p(z)`, the prior distribution of `z`.
+
+        Returns:
+            Distribution: The distribution instance.
+        """
+        return self._p_z
+
+    @property
+    def p_x_given_z(self):
+        """
+        Get the factory for :math:`p(x|h(z))`.
+
+        Returns:
+            DistributionFactory: The distribution factory.
+        """
+        return self._p_x_given_z
+
+    @property
+    def q_z_given_x(self):
+        """
+        Get the factory for :math:`q(z|h(x))`.
+
+        Returns:
+            DistributionFactory: The distribution factory.
+        """
+        return self._q_z_given_x
+
+    @property
+    def h_for_p_x(self):
+        """
+        Get :math:`h(z)`, the hidden network for :math:`p(x|h(z))`.
+
+        Returns:
+            Module: The hidden network.
+        """
+        return self._h_for_p_x
+
+    @property
+    def h_for_q_z(self):
+        """
+        Get :math:`h(x)`, the hidden network for :math:`q(z|h(x))`.
+
+        Returns:
+            Module: The hidden network.
+        """
+        return self._h_for_q_z
+
+    @property
+    def z_group_ndims(self):
+        """Get the `group_ndims` for `z`."""
+        return self._z_group_ndims
+
+    @property
+    def x_group_ndims(self):
+        """Get the `group_ndims` for `x`."""
+        return self._x_group_ndims
+
+    @property
+    def is_reparameterized(self):
+        """Whether or not `z` is re-parameterized?"""
+        return self._is_reparameterized
+
+    @instance_reuse
+    def variational(self, x, z=None, n_z=None, posterior_flow=None):
+        """
+        Derive an instance of :math:`q(z|h(x))`, the variational net.
+
+        Args:
+            x: The observation `x` for the variational net.
+            z: If specified, observe `z` in the variational net. (default :obj:`None`)
+            n_z: The number of `z` samples to take for each `x`, if `z`
+                is not observed. (default :obj:`None`, one sample for
+                each `x`, without dedicated sampling dimension)
+
+                It is recommended to specify this argument even if `z`
+                is observed, to make explicit how many samples there are
+                in the observation.
+            posterior_flow: Optional flow passed as `flow` when adding `z`. (default :obj:`None`)
+
+        Returns:
+            BayesianNet: The variational net.
+        """
+        observed = {}
+        if z is not None:
+            observed['z'] = z
+        net = BayesianNet(observed=observed)
+        with tf.variable_scope('h_for_q_z'):
+            z_params = self.h_for_q_z(x)
+        with tf.variable_scope('q_z_given_x'):
+            q_z_given_x = self.q_z_given_x(**z_params)
+            assert (isinstance(q_z_given_x, Distribution))
+        with tf.name_scope('z'):
+            z = net.add('z', q_z_given_x, n_samples=n_z,
+                        group_ndims=self.z_group_ndims,
+                        is_reparameterized=self.is_reparameterized,
+                        flow=posterior_flow)
+        return net
+
+    @instance_reuse
+    def model(self, z=None, x=None, n_z=None, n_x=None):
+        """
+        Derive an instance of :math:`p(x|h(z))`, the model net.
+
+        Args:
+            z: If specified, observe `z` in the model net. (default :obj:`None`)
+            x: If specified, observe `x` in the model net. (default :obj:`None`)
+            n_z: The number of `z` samples to take for each `x`, if `z`
+                is not observed. (default :obj:`None`, one `z` sample for
+                each `x`, without dedicated sampling dimension)
+
+                It is recommended to specify this argument even if `z`
+                is observed, to make explicit how many samples are there
+                in the observation.
+            n_x: The number of `x` samples to take for each `z`, if `x`
+                is not observed. (default :obj:`None`, one `x` sample for
+                each `z`, without dedicated sampling dimension)
+
+                It is recommended to specify this argument even if `x`
+                is observed, to make explicit how many samples are there
+                in the observation.
+
+        Returns:
+            BayesianNet: The model net.
+        """
+        observed = {k: v for k, v in [('z', z), ('x', x)] if v is not None}
+        net = BayesianNet(observed=observed)
+        with tf.name_scope('z'):
+            z = net.add('z', self.p_z, n_samples=n_z,
+                        group_ndims=self.z_group_ndims,
+                        is_reparameterized=self.is_reparameterized)
+        with tf.variable_scope('h_for_p_x'):
+            x_params = self.h_for_p_x(z)
+        with tf.variable_scope('p_x_given_z'):
+            p_x_given_z = self.p_x_given_z(**x_params)
+            assert (isinstance(p_x_given_z, Distribution))
+        with tf.name_scope('x'):
+            x = net.add('x', p_x_given_z, n_samples=n_x,
+                        group_ndims=self.x_group_ndims)
+        return net
+
+    def chain(self, x, n_z=None, posterior_flow=None):
+        """
+        Chain :math:`q(z|h(x))` and :math:`p(x,z|h(x))` together.
+
+        This method chains the variational net :math:`q(z|h(x))` and the
+        model net :math:`p(x,z|h(x))` together, with specified observation
+        `x`.  It is typically used to derive the training objectives of VAE.
+        It can also be used to calculate the `reconstruction probability`
+        ("Variational Autoencoder based Anomaly Detection using Reconstruction
+        Probability", An, J. and Cho, S. 2015) of `x`.
+
+        Notes:
+            The constructed :class:`~tfsnippet.variational.VariationalChain`
+            has `x` observed in its `model` net, so this method cannot
+            be used to get reconstructed samples.  Use :meth:`reconstruct`
+            instead to obtain `x` samples.
+
+        Args:
+            x: The input observation `x`.
+            n_z: Number of `z` samples to take. (default :obj:`None`)
+
+        Returns:
+            VariationalChain: The variational chain.
+        """
+        with tf.name_scope('VAE.chain'):
+            q_net = self.variational(x, n_z=n_z, posterior_flow=posterior_flow)
+
+            # automatically detect the `latent_axis` for this chain
+            if n_z is not None:
+                latent_axis = 0
+            else:
+                latent_axis = None
+
+            chain = q_net.variational_chain(
+                lambda observed: self.model(n_z=n_z, n_x=None, **observed),
+                latent_axis=latent_axis,
+                observed={'x': x}
+            )
+        return chain
+
+    def get_training_loss(self, x, n_z=None):
+        """
+        Get the training loss for this VAE.
+
+        The variational solver is automatically chosen according to
+        `z.is_reparameterized`, and the argument `n_z`, by the following rules:
+
+        1. If `z.is_reparameterized` is :obj:`True`, then:
+
+            1. If `n_z` > 1, use `iwae`.
+            2. If `n_z` == 1 or `n_z` is :obj:`None`, use `sgvb`.
+
+        2. If `z.is_reparameterized` is :obj:`False`, then:
+
+            1. If `n_z` > 1, use `vimco`.
+            2. If `n_z` == 1 or `n_z` is :obj:`None`, use `reinforce`.
+
+        Dynamic `n_z` is not supported by this method.  Also, the Reweighted
+        Wake-Sleep algorithm is not offered by this method.  To derive
+        the training loss for either situation, use :meth:`chain`
+        to obtain a :class:`~tfsnippet.variational.VariationalChain`,
+        and further obtain the loss by `chain.vi.training.[algorithm]`.
+
+        Args:
+            x: The input observation `x`.
+            n_z (int or None): Number of `z` samples to take.  Must be
+                :obj:`None` or a constant integer.  Dynamic tensors are not
+                accepted, since we cannot automatically choose a variational
+                solver for non-deterministic `n_z`. (default :obj:`None`)
+
+        Returns:
+            tf.Tensor: A 0-d tensor, the training loss which can be optimized
+                by gradient descent.
+
+        See Also:
+            :class:`tfsnippet.variational.VariationalChain`,
+            :class:`tfsnippet.variational.VariationalTrainingObjectives`
+        """
+        with tf.name_scope('VAE.get_training_loss'):
+            if n_z is not None:
+                if is_tensor_object(n_z):
+                    raise TypeError('Cannot choose the variational solver '
+                                    'automatically for dynamic `n_z`')
+                n_z = validate_n_samples_arg(n_z, 'n_z')
+
+            # derive the variational chain
+            chain = self.chain(x, n_z)
+            z = chain.variational['z']
+
+            # auto choose a variational solver for training loss
+            if n_z is not None and n_z > 1:
+                if z.is_reparameterized:
+                    solver = chain.vi.training.iwae
+                else:
+                    solver = chain.vi.training.vimco
+            else:
+                if z.is_reparameterized:
+                    solver = chain.vi.training.sgvb
+                else:
+                    solver = chain.vi.training.reinforce
+
+            # derive the training loss
+            return tf.reduce_mean(solver())
+
+    def reconstruct(self, x, n_z=None, n_x=None, posterior_flow=None):
+        """
+        Sample reconstructed `x` from :math:`p(x|h(z))`, where `z` is (are)
+        sampled from :math:`q(z|h(x))` using the specified observation `x`.
+
+        Args:
+            x: The observation `x` for :math:`q(z|h(x))`.
+            n_z: Number of intermediate `z` samples to take for each input `x`.
+            n_x: Number of reconstructed `x` samples to take for each `z`.
+
+        Returns:
+            StochasticTensor: The reconstructed samples `x`.
+        """
+        with tf.name_scope('VAE.reconstruct'):
+            q_net = self.variational(x, n_z=n_z, posterior_flow=posterior_flow)
+            model = self.model(z=q_net['z'], n_z=n_z, n_x=n_x)
+            return model['x']
+
+    def _forward(self, inputs, n_z=None, **kwargs):
+        """
+        Get a `z` sample from :math:`q(z|h(x))`, using the variational net.
+
+        Args:
+            inputs: The input `x`.
+            n_z: Number of samples to take for `z`. (default :obj:`None`)
+            \**kwargs: Capturing and ignoring all other parameters.  This is
+                the default behavior of a :class:`Module`.
+
+        Returns:
+            StochasticTensor: The `z` samples.
+        """
+        q_net = self.variational(inputs, z=None, n_z=n_z, **kwargs)
+        return q_net['z']
+
+
+class Lambda(VarScopeObject):
+    """
+    Wrap an arbitrary function into a neural network :class:`Module`.
+
+    This class wraps an arbitrary function or lambda expression into
+    a neural network :class:`Module`, reusing the variables created
+    within the specified function.
+
+    For example, one may wrap :func:`tensorflow.contrib.layers.fully_connected`
+    into a reusable module with :class:`Lambda` component as follows:
+
+    .. code-block:: python
+
+        import functools
+        from tensorflow.contrib import layers
+
+        dense = Lambda(
+            functools.partial(
+                layers.fully_connected,
+                num_outputs=100,
+                activation_fn=tf.nn.relu
+            )
+        )
+    """
+
+    def __init__(self, f, name=None, scope=None):
+        """
+        Construct the :class:`Lambda`.
+
+        Args:
+            f ((inputs, \**kwargs) -> outputs): The function or lambda
+                expression which derives the outputs.
+            name (str): Optional name of this module
+                (argument of :class:`~tfsnippet.utils.VarScopeObject`).
+            scope (str): Optional scope of this module
+                (argument of :class:`~tfsnippet.utils.VarScopeObject`).
+        """
+        super(Lambda, self).__init__(name=name, scope=scope)
+        self._factory = f
+
+    def _forward(self, inputs, **kwargs):
+        return self._factory(inputs, **kwargs)
+
+    def __call__(self, inputs, **kwargs):
+        with reopen_variable_scope(self.variable_scope):
+            # Here `reopen_name_scope` is set to True, so that multiple
+            # calls to the same Module instance will always generate operations
+            # within the original name scope.
+            # However, in order for ``tf.variable_scope(default_name=...)``
+            # to work properly with variable reusing, we must generate a nested
+            # unique name scope.
+            with tf.name_scope('forward'):
+                return self._forward(inputs, **kwargs)
+
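Review note on `get_training_loss` above: the solver is chosen from two facts only, whether more than one `z` sample is requested and whether `z` is reparameterized. The sketch below restates that branch as a plain decision table. The function name `choose_training_solver` and the string return values are hypothetical and exist only for illustration; they are not part of tfsnippet.

```python
def choose_training_solver(n_z, is_reparameterized):
    """Return the name of the objective picked for the given settings.

    Mirrors the if/else structure in the patch; names are illustrative.
    """
    multi_sample = n_z is not None and n_z > 1
    if multi_sample:
        # Multi-sample objectives: IWAE needs reparameterized samples,
        # VIMCO is the fallback when they are not available.
        return "iwae" if is_reparameterized else "vimco"
    # Single-sample objectives: SGVB with reparameterization, else REINFORCE.
    return "sgvb" if is_reparameterized else "reinforce"


for n_z in (None, 1, 8):
    for reparam in (True, False):
        print(n_z, reparam, choose_training_solver(n_z, reparam))
```

Note that `n_z=1` falls into the single-sample branch, exactly like `n_z=None`.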
diff --git a/networks/omni_anomaly/wrapper.py b/networks/omni_anomaly/wrapper.py
new file mode 100644
index 0000000..c374d90
--- /dev/null
+++ b/networks/omni_anomaly/wrapper.py
@@ -0,0 +1,153 @@
+# -*- coding: utf-8 -*-
+import logging
+
+import tensorflow as tf
+import tensorflow_probability as tfp
+from tfsnippet.distributions import Distribution
+
+
+class TfpDistribution(Distribution):
+    """
+    A wrapper class for `tfp.distributions.Distribution`
+    """
+
+    @property
+    def is_continuous(self):
+        return self._is_continuous
+
+    def __init__(self, distribution):
+        if not isinstance(distribution, tfp.distributions.Distribution):
+            raise TypeError(
+                "`distribution` is not an instance of `tfp."
+                "distributions.Distribution`"
+            )
+        super(TfpDistribution, self).__init__()
+        self._distribution = distribution
+        self._is_continuous = True
+        self._is_reparameterized = (
+            self._distribution.reparameterization_type
+            is tfp.distributions.FULLY_REPARAMETERIZED
+        )
+
+    def __repr__(self):
+        return "Distribution({!r})".format(self._distribution)
+
+    @property
+    def dtype(self):
+        return self._distribution.dtype
+
+    @property
+    def is_reparameterized(self):
+        return self._is_reparameterized
+
+    @property
+    def value_shape(self):
+        return self._distribution.event_shape
+
+    def get_value_shape(self):
+        return self._distribution.event_shape
+
+    @property
+    def batch_shape(self):
+        return self._distribution.batch_shape
+
+    def get_batch_shape(self):
+        return self._distribution.batch_shape_tensor()
+
+    def sample(
+        self,
+        n_samples=None,
+        is_reparameterized=None,
+        group_ndims=0,
+        compute_density=False,
+        name=None,
+    ):
+        from tfsnippet.stochastic import StochasticTensor
+
+        if n_samples is None or n_samples < 2:
+            n_samples = 2
+        with tf.name_scope(name=name, default_name="sample"):
+            samples = self._distribution.sample(n_samples)
+            samples = tf.reduce_mean(samples, axis=0)
+            t = StochasticTensor(
+                distribution=self,
+                tensor=samples,
+                n_samples=n_samples,
+                group_ndims=group_ndims,
+                is_reparameterized=self.is_reparameterized,
+            )
+            if compute_density:
+                with tf.name_scope("compute_prob_and_log_prob"):
+                    log_p = t.log_prob()
+                    t._self_prob = tf.exp(log_p)
+            return t
+
+    def log_prob(self, given, group_ndims=0, name=None):
+        with tf.name_scope(name=name, default_name="log_prob"):
+            log_prob, _, _, _, _, _, _ = self._distribution.forward_filter(given)
+            return log_prob
+
+
+def softplus_std(inputs, units, epsilon, name):
+    return (
+        tf.nn.softplus(tf.layers.dense(inputs, units, name=name, reuse=tf.AUTO_REUSE))
+        + epsilon
+    )
+
+
+def rnn(
+    x,
+    window_length,
+    rnn_num_hidden,
+    rnn_cell="GRU",
+    hidden_dense=2,
+    dense_dim=200,
+    time_axis=1,
+    name="rnn",
+):
+    from tensorflow.contrib import rnn
+
+    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
+        if len(x.shape) == 4:
+            x = tf.reduce_mean(x, axis=0)
+        elif len(x.shape) != 3:
+            logging.error("rnn input shape error")
+        x = tf.unstack(x, window_length, time_axis)
+
+        if rnn_cell == "LSTM":
+            # Define lstm cells with TensorFlow
+            # Forward direction cell
+            fw_cell = rnn.BasicLSTMCell(rnn_num_hidden, forget_bias=1.0)
+        elif rnn_cell == "GRU":
+            fw_cell = tf.nn.rnn_cell.GRUCell(rnn_num_hidden)
+        elif rnn_cell == "Basic":
+            fw_cell = tf.nn.rnn_cell.BasicRNNCell(rnn_num_hidden)
+        else:
+            raise ValueError("rnn_cell must be LSTM, GRU or Basic")
+
+        # Get lstm cell output
+
+        try:
+            outputs, _ = rnn.static_rnn(fw_cell, x, dtype=tf.float32)
+        except Exception:  # Old TensorFlow version only returns outputs not states
+            outputs = rnn.static_rnn(fw_cell, x, dtype=tf.float32)
+        outputs = tf.stack(outputs, axis=time_axis)
+        for i in range(hidden_dense):
+            outputs = tf.layers.dense(outputs, dense_dim)
+        return outputs
+    # return size: (batch_size, window_length, dense_dim) when hidden_dense > 0, else (batch_size, window_length, rnn_num_hidden)
+
+
+def wrap_params_net(inputs, h_for_dist, mean_layer, std_layer):
+    with tf.variable_scope("hidden", reuse=tf.AUTO_REUSE):
+        h = h_for_dist(inputs)
+    return {
+        "mean": mean_layer(h),
+        "std": std_layer(h),
+    }
+
+
+def wrap_params_net_srnn(inputs, h_for_dist):
+    with tf.variable_scope("hidden", reuse=tf.AUTO_REUSE):
+        h = h_for_dist(inputs)
+    return {"input_q": h}
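Review note on `softplus_std` above: applying softplus and then adding `epsilon` keeps the predicted standard deviation strictly positive and bounded away from zero. The scalar sketch below illustrates this without TensorFlow. The helper names and the default `epsilon=1e-4` are assumptions made for the example, not values taken from this patch.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)): avoids overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_std_scalar(pre_activation, epsilon=1e-4):
    # Hypothetical scalar analogue of softplus_std: the dense layer output
    # is replaced by a plain float, epsilon is the lower bound on the std.
    return softplus(pre_activation) + epsilon


for value in (-50.0, 0.0, 3.0):
    print(value, softplus_std_scalar(value))
```

Even for a very negative pre-activation, the result stays at or above `epsilon`, which avoids division by zero or log of zero in downstream density computations.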
diff --git a/networks/tranad/__init__.py b/networks/tranad/__init__.py
new file mode 100644
index 0000000..7830ccb
--- /dev/null
+++ b/networks/tranad/__init__.py
@@ -0,0 +1 @@
+from .models import TranAD
\ No newline at end of file
diff --git a/networks/tranad/dlutils.py b/networks/tranad/dlutils.py
new file mode 100644
index 0000000..f2df907
--- /dev/null
+++ b/networks/tranad/dlutils.py
@@ -0,0 +1,146 @@
+import torch.nn as nn
+import torch
+from torch.autograd import Variable
+import math
+import numpy as np
+
+
+class PositionalEncoding(nn.Module):
+    def __init__(self, d_model, dropout=0.1, max_len=5000):
+        super(PositionalEncoding, self).__init__()
+        self.dropout = nn.Dropout(p=dropout)
+
+        pe = torch.zeros(max_len, d_model)
+        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
+        div_term = torch.exp(torch.arange(0, d_model).float() * (-math.log(10000.0) / d_model))
+        pe += torch.sin(position * div_term)
+        pe += torch.cos(position * div_term)
+        pe = pe.unsqueeze(0).transpose(0, 1)
+        self.register_buffer('pe', pe)
+
+    def forward(self, x, pos=0):
+        x = x + self.pe[pos:pos+x.size(0), :]
+        return self.dropout(x)
+
+class TransformerEncoderLayer(nn.Module):
+    def __init__(self, d_model, nhead, dim_feedforward=16, dropout=0):
+        super(TransformerEncoderLayer, self).__init__()
+        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
+        self.linear1 = nn.Linear(d_model, dim_feedforward)
+        self.dropout = nn.Dropout(dropout)
+        self.linear2 = nn.Linear(dim_feedforward, d_model)
+        self.dropout1 = nn.Dropout(dropout)
+        self.dropout2 = nn.Dropout(dropout)
+
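+        self.activation = nn.LeakyReLU(True)  # NOTE: the positional argument is negative_slope, so True acts as slope 1.0 (identity), not inplace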
+
+    def forward(self, src,src_mask=None, src_key_padding_mask=None):
+        src2 = self.self_attn(src, src, src)[0]
+        src = src + self.dropout1(src2)
+        src2 = self.linear2(self.dropout(self.activation(self.linear1(src))))
+        src = src + self.dropout2(src2)
+        return src
+
+class TransformerDecoderLayer(nn.Module):
+    def __init__(self, d_model, nhead, dim_feedforward=16, dropout=0):
+        super(TransformerDecoderLayer, self).__init__()
+        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
+        self.multihead_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
+        self.linear1 = nn.Linear(d_model, dim_feedforward)
+        self.dropout = nn.Dropout(dropout)
+        self.linear2 = nn.Linear(dim_feedforward, d_model)
+        self.dropout1 = nn.Dropout(dropout)
+        self.dropout2 = nn.Dropout(dropout)
+        self.dropout3 = nn.Dropout(dropout)
+
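+        self.activation = nn.LeakyReLU(True)  # NOTE: the positional argument is negative_slope, so True acts as slope 1.0 (identity), not inplace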
+
+    def forward(self, tgt, memory, tgt_mask=None, memory_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None):
+        tgt2 = self.self_attn(tgt, tgt, tgt)[0]
+        tgt = tgt + self.dropout1(tgt2)
+        tgt2 = self.multihead_attn(tgt, memory, memory)[0]
+        tgt = tgt + self.dropout2(tgt2)
+        tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt))))
+        tgt = tgt + self.dropout3(tgt2)
+        return tgt
+
+class ComputeLoss:
+    def __init__(self, model, lambda_energy, lambda_cov, device, n_gmm):
+        self.model = model
+        self.lambda_energy = lambda_energy
+        self.lambda_cov = lambda_cov
+        self.device = device
+        self.n_gmm = n_gmm
+    
+    def forward(self, x, x_hat, z, gamma):
+        """Computing the loss function for DAGMM."""
+        reconst_loss = torch.mean((x-x_hat).pow(2))
+
+        sample_energy, cov_diag = self.compute_energy(z, gamma)
+
+        loss = reconst_loss + self.lambda_energy * sample_energy + self.lambda_cov * cov_diag
+        return Variable(loss, requires_grad=True)
+    
+    def compute_energy(self, z, gamma, phi=None, mu=None, cov=None, sample_mean=True):
+        """Computing the sample energy function"""
+        if (phi is None) or (mu is None) or (cov is None):
+            phi, mu, cov = self.compute_params(z, gamma)
+
+        z_mu = (z.unsqueeze(1)- mu.unsqueeze(0))
+
+        eps = 1e-12
+        cov_inverse = []
+        det_cov = []
+        cov_diag = 0
+        for k in range(self.n_gmm):
+            cov_k = cov[k] + (torch.eye(cov[k].size(-1))*eps).to(self.device)
+            cov_inverse.append(torch.inverse(cov_k).unsqueeze(0))
+            det_cov.append((Cholesky.apply(cov_k.cpu() * (2*np.pi)).diag().prod()).unsqueeze(0))
+            cov_diag += torch.sum(1 / cov_k.diag())
+        
+        cov_inverse = torch.cat(cov_inverse, dim=0)
+        det_cov = torch.cat(det_cov).to(self.device)
+
+        E_z = -0.5 * torch.sum(torch.sum(z_mu.unsqueeze(-1) * cov_inverse.unsqueeze(0), dim=-2) * z_mu, dim=-1)
+        E_z = torch.exp(E_z)
+        E_z = -torch.log(torch.sum(phi.unsqueeze(0)*E_z / (torch.sqrt(det_cov)).unsqueeze(0), dim=1) + eps)
+        if sample_mean:
+            E_z = torch.mean(E_z)
+        return E_z, cov_diag
+
+    def compute_params(self, z, gamma):
+        """Compute the parameters phi, mu and cov for the sample energy function."""
+        # K: number of Gaussian mixture components
+        # N: Number of samples
+        # D: Latent dimension
+        # z = NxD
+        # gamma = NxK
+
+        # phi = K
+        phi = torch.sum(gamma, dim=0)/gamma.size(0) 
+
+        #mu = KxD
+        mu = torch.sum(z.unsqueeze(1) * gamma.unsqueeze(-1), dim=0)
+        mu /= torch.sum(gamma, dim=0).unsqueeze(-1)
+
+        z_mu = (z.unsqueeze(1) - mu.unsqueeze(0))
+        z_mu_z_mu_t = z_mu.unsqueeze(-1) * z_mu.unsqueeze(-2)
+        
+        #cov = K x D x D
+        cov = torch.sum(gamma.unsqueeze(-1).unsqueeze(-1) * z_mu_z_mu_t, dim=0)
+        cov /= torch.sum(gamma, dim=0).unsqueeze(-1).unsqueeze(-1)
+
+        return phi, mu, cov
+        
+class Cholesky(torch.autograd.Function):
+    def forward(ctx, a):
+        l = torch.cholesky(a, False)
+        ctx.save_for_backward(l)
+        return l
+    def backward(ctx, grad_output):
+        l, = ctx.saved_tensors
+        linv = l.inverse()
+        inner = torch.tril(torch.mm(l.t(), grad_output)) * torch.tril(
+            1.0 - Variable(l.data.new(l.size(1)).fill_(0.5).diag()))
+        s = torch.mm(linv.t(), torch.mm(inner, linv))
+        return s
\ No newline at end of file
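Review note on `compute_params` above: the shapes are easy to get wrong (`phi` has length K, `mu` is K x D, `cov` is K x D x D). The pure-Python sketch below recomputes the same weighted statistics on nested lists so the shapes can be checked by hand. The function name `gmm_params` and the toy data are hypothetical; the patch itself uses batched tensor operations.

```python
def gmm_params(z, gamma):
    """z: N x D samples, gamma: N x K soft assignments (nested lists)."""
    n, d, k = len(z), len(z[0]), len(gamma[0])
    # Total responsibility received by each mixture component.
    gamma_sum = [sum(gamma[i][j] for i in range(n)) for j in range(k)]
    # phi: mixture weights, length K.
    phi = [s / n for s in gamma_sum]
    # mu: responsibility-weighted means, K x D.
    mu = [
        [sum(gamma[i][j] * z[i][c] for i in range(n)) / gamma_sum[j]
         for c in range(d)]
        for j in range(k)
    ]
    # cov: responsibility-weighted covariances, K x D x D.
    cov = [
        [
            [sum(gamma[i][j] * (z[i][a] - mu[j][a]) * (z[i][b] - mu[j][b])
                 for i in range(n)) / gamma_sum[j]
             for b in range(d)]
            for a in range(d)
        ]
        for j in range(k)
    ]
    return phi, mu, cov


# Toy data: four 2-D points, hard-assigned to two components.
z = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
gamma = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
phi, mu, cov = gmm_params(z, gamma)
print(phi, mu, cov)
```

With hard assignments, each component reduces to the ordinary mean and (biased) covariance of its own points, which makes the result easy to verify.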
diff --git a/networks/tranad/models.py b/networks/tranad/models.py
new file mode 100644
index 0000000..813a84e
--- /dev/null
+++ b/networks/tranad/models.py
@@ -0,0 +1,124 @@
+import logging
+import math
+import os
+import torch
+import torch.nn as nn
+import numpy as np
+from torch.nn import TransformerEncoder
+from torch.nn import TransformerDecoder
+from .dlutils import (
+    PositionalEncoding,
+    TransformerEncoderLayer,
+    TransformerDecoderLayer,
+)
+from common.utils import set_device
+
+
+class TranAD(nn.Module):
+    def __init__(self, feats, window_size, lr, model_root, device):
+        super(TranAD, self).__init__()
+        self.name = "TranAD"
+        self.n_feats = feats
+        self.n_window = window_size
+        self.device = set_device(device)
+        self.n = self.n_feats * self.n_window
+        self.pos_encoder = PositionalEncoding(2 * feats, 0.1, self.n_window)
+        encoder_layers = TransformerEncoderLayer(
+            d_model=2 * feats, nhead=feats, dim_feedforward=16, dropout=0.1
+        )
+        self.transformer_encoder = TransformerEncoder(encoder_layers, 1)
+        decoder_layers1 = TransformerDecoderLayer(
+            d_model=2 * feats, nhead=feats, dim_feedforward=16, dropout=0.1
+        )
+        self.transformer_decoder1 = TransformerDecoder(decoder_layers1, 1)
+        decoder_layers2 = TransformerDecoderLayer(
+            d_model=2 * feats, nhead=feats, dim_feedforward=16, dropout=0.1
+        )
+        self.transformer_decoder2 = TransformerDecoder(decoder_layers2, 1)
+        self.fcn = nn.Sequential(nn.Linear(2 * feats, feats), nn.Sigmoid())
+
+        self.init_model(lr, model_root)
+
+    def encode(self, src, c, tgt):
+        src = torch.cat((src, c), dim=2)
+        src = src * math.sqrt(self.n_feats)
+        src = self.pos_encoder(src)
+        memory = self.transformer_encoder(src)
+        tgt = tgt.repeat(1, 1, 2)
+        return tgt, memory
+
+    def forward(self, src, tgt):
+        # Phase 1 - Without anomaly scores
+        c = torch.zeros_like(src)
+        x1 = self.fcn(self.transformer_decoder1(*self.encode(src, c, tgt)))
+        # Phase 2 - With anomaly scores
+        c = (x1 - src) ** 2
+        x2 = self.fcn(self.transformer_decoder2(*self.encode(src, c, tgt)))
+        return x1, x2
+
+    def init_model(self, lr, model_root, retrain=True, test=False):
+        optimizer = torch.optim.AdamW(self.parameters(), lr=lr, weight_decay=1e-5)
+        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 5, 0.9)
+
+        if os.path.exists(model_root) and (not retrain or test):
+            logging.info("Loading pre-trained model")
+            checkpoint = torch.load(os.path.join(model_root, "model.pt"))
+            self.load_state_dict(checkpoint["model_state_dict"])
+            optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
+            scheduler.load_state_dict(checkpoint["scheduler_state_dict"])
+        else:
+            logging.info("Creating new model: TranAD")
+
+        self.optimizer = optimizer
+        self.scheduler = scheduler
+        logging.info("Finish model initialization.")
+
+    def fit(self, nb_epoch, dataloader, training=True):
+        self.to(self.device)
+        for epoch in range(1, nb_epoch + 1):
+            mse_func = nn.MSELoss(reduction="none")
+            n = epoch + 1
+            l1s = []
+            if training:
+                logging.info("Training epoch: {}".format(epoch))
+                for d in dataloader:
+                    d = d.to(self.device)
+                    local_bs = d.shape[0]
+                    window = d.permute(1, 0, 2)
+                    elem = window[-1, :, :].view(1, local_bs, self.n_feats)
+                    z = self(window, elem)
+                    l1 = (
+                        mse_func(z, elem)
+                        if not isinstance(z, tuple)
+                        else (1 / n) * mse_func(z[0], elem)
+                        + (1 - 1 / n) * mse_func(z[1], elem)
+                    )
+                    if isinstance(z, tuple):
+                        z = z[1]
+                    l1s.append(torch.mean(l1).item())
+                    loss = torch.mean(l1)
+                    self.optimizer.zero_grad()
+                    loss.backward(retain_graph=True)
+                    self.optimizer.step()
+                self.scheduler.step()
+                logging.info("Epoch: {} finished.".format(epoch))
+
+    def predict_prob(self, test_iterator, label_windows=None):
+        mse_func = nn.MSELoss(reduction="none")
+        loss_steps = []
+        for d in test_iterator:
+            d = d.to(self.device)
+            bs = d.shape[0]
+            window = d.permute(1, 0, 2)
+            elem = window[-1, :, :].view(1, bs, self.n_feats)
+            z = self(window, elem)
+            if isinstance(z, tuple):
+                z = z[1]
+            loss = mse_func(z, elem)[0]
+            loss_steps.append(loss.detach().cpu().numpy())
+        anomaly_score = np.concatenate(loss_steps).mean(axis=1)
+        if label_windows is None:
+            return anomaly_score
+        else:
+            anomaly_label = (np.sum(label_windows, axis=1) >= 1) + 0
+            return anomaly_score, anomaly_label
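Review note on `TranAD.fit` above: the two reconstruction losses are blended with weights `1 / n` and `1 - 1 / n`, where `n = epoch + 1` and `epoch` starts at 1 in this patch. The sketch below isolates that schedule on plain floats. The function names are hypothetical, and the scalar losses stand in for the element-wise MSE tensors used in the actual training loop.

```python
def phase_weights(epoch):
    # Same schedule as in fit(): n grows with the epoch, so the weight on the
    # first-phase output shrinks and the second-phase weight grows.
    n = epoch + 1
    return 1.0 / n, 1.0 - 1.0 / n


def combined_loss(mse_phase1, mse_phase2, epoch):
    w1, w2 = phase_weights(epoch)
    return w1 * mse_phase1 + w2 * mse_phase2


for epoch in range(1, 6):
    print(epoch, phase_weights(epoch))
```

Because `epoch` starts at 1 here, the first epoch already uses equal weights of 0.5 rather than putting all weight on the first-phase reconstruction.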
diff --git a/networks/usad/__init__.py b/networks/usad/__init__.py
new file mode 100644
index 0000000..78ac536
--- /dev/null
+++ b/networks/usad/__init__.py
@@ -0,0 +1 @@
+from .usad import *
diff --git a/networks/usad/gdrivedl.py b/networks/usad/gdrivedl.py
new file mode 100644
index 0000000..b5a0327
--- /dev/null
+++ b/networks/usad/gdrivedl.py
@@ -0,0 +1,214 @@
+#!/usr/bin/env python
+from __future__ import unicode_literals
+import json
+import os
+import re
+import sys
+import unicodedata
+
+try:
+    #Python3
+    from urllib.request import Request, urlopen
+except ImportError:
+    #Python2
+    from urllib2 import Request, urlopen
+
+ITEM_URL = 'https://drive.google.com/open?id={id}'
+FILE_URL = 'https://docs.google.com/uc?export=download&id={id}&confirm={confirm}'
+FOLDER_URL = 'https://drive.google.com/drive/folders/{id}'
+
+ID_PATTERNS = [
+    re.compile('/file/d/([0-9A-Za-z_-]{10,})(?:/|$)', re.IGNORECASE),
+    re.compile('id=([0-9A-Za-z_-]{10,})(?:&|$)', re.IGNORECASE),
+    re.compile('([0-9A-Za-z_-]{10,})', re.IGNORECASE)
+]
+FILE_PATTERN = re.compile(r"itemJson: (\[.*?)};</script>",
+                          re.DOTALL | re.IGNORECASE)
+FOLDER_PATTERN = re.compile(r"window\['_DRIVE_ivd'\] = '(.*?)';",
+                            re.DOTALL | re.IGNORECASE)
+CONFIRM_PATTERN = re.compile("download_warning[0-9A-Za-z_-]+=([0-9A-Za-z_-]+);",
+                             re.IGNORECASE)
+FOLDER_TYPE = 'application/vnd.google-apps.folder'
+
+def output(text):
+    try:
+        sys.stdout.write(text)
+    except UnicodeEncodeError:
+        sys.stdout.write(text.encode('utf8'))
+
+# Big thanks to leo_wallentin for below sanitize function (modified slightly for this script)
+# https://gitlab.com/jplusplus/sanitize-filename/-/blob/master/sanitize_filename/sanitize_filename.py
+def sanitize(filename):
+    blacklist = ["\\", "/", ":", "*", "?", "\"", "<", ">", "|", "\0"]
+    reserved = [
+        "CON", "PRN", "AUX", "NUL", "COM1", "COM2", "COM3", "COM4", "COM5",
+        "COM6", "COM7", "COM8", "COM9", "LPT1", "LPT2", "LPT3", "LPT4", "LPT5",
+        "LPT6", "LPT7", "LPT8", "LPT9",
+    ]
+
+    filename = "".join(c for c in filename if c not in blacklist)
+    filename = "".join(c for c in filename if 31 < ord(c))
+    filename = unicodedata.normalize("NFKD", filename)
+    filename = filename.rstrip(". ")
+    filename = filename.strip()
+
+    if all([x == "." for x in filename]):
+        filename = "_" + filename
+    if filename in reserved:
+        filename = "_" + filename
+    if len(filename) == 0:
+        filename = "_"
+    if len(filename) > 255:
+        parts = re.split(r"/|\\", filename)[-1].split(".")
+        if len(parts) > 1:
+            ext = "." + parts.pop()
+            filename = filename[:-len(ext)]
+        else:
+            ext = ""
+        if filename == "":
+            filename = "_"
+        if len(ext) > 254:
+            ext = ext[254:]
+        maxl = 255 - len(ext)
+        filename = filename[:maxl]
+        filename = filename + ext
+        filename = filename.rstrip(". ")
+        if len(filename) == 0:
+            filename = "_"
+
+    return filename
+
+
+def process_item(id, directory):
+    url = ITEM_URL.format(id=id)
+    resp = urlopen(url)
+    url = resp.geturl()
+    html = resp.read().decode('utf-8')
+
+    if '/file/' in url:
+        match = FILE_PATTERN.search(html)
+        data = match.group(1).replace('\\/', '/')
+        data = data.replace(r'\x5b', '[').replace(r'\x22', '"').replace(r'\x5d', ']').replace(r'\n','')
+        data = json.loads(data)
+
+        file_name = sanitize(data[1])
+        file_size = int(data[25][2])
+        file_path = os.path.join(directory, file_name)
+
+        process_file(id, file_path, file_size)
+    elif '/folders/' in url:
+        process_folder(id, directory, html=html)
+    elif 'ServiceLogin' in url:
+        sys.stderr.write('Id {} does not have link sharing enabled'.format(id))
+        sys.exit(1)
+    else:
+        sys.stderr.write('That id {} returned an unknown url'.format(id))
+        sys.exit(1)
+
+
+def process_folder(id, directory, html=None):
+    if not html:
+        url = FOLDER_URL.format(id=id)
+        html = urlopen(url).read().decode('utf-8')
+
+    match = FOLDER_PATTERN.search(html)
+    data = match.group(1).replace('\\/', '/')
+    data = data.replace(r'\x5b', '[').replace(r'\x22', '"').replace(r'\x5d', ']').replace(r'\n','')
+    data = json.loads(data)
+
+    if not os.path.exists(directory):
+        os.mkdir(directory)
+        output('Directory: {directory} [Created]\n'.format(directory=directory))
+    else:
+        output('Directory: {directory} [Exists]\n'.format(directory=directory))
+
+    if not data[0]:
+        return
+
+    for item in sorted(data[0], key=lambda i: i[3] == FOLDER_TYPE):
+        item_id = item[0]
+        item_name = sanitize(item[2])
+        item_type = item[3]
+        item_size = item[13]
+        item_path = os.path.join(directory, item_name)
+
+        if item_type == FOLDER_TYPE:
+            process_folder(item_id, item_path)
+        else:
+            process_file(item_id, item_path, int(item_size))
+
+
+def process_file(id, file_path, file_size, confirm='', cookies=''):
+    if os.path.exists(file_path):
+        output('{file_path} [Exists]\n'.format(file_path=file_path))
+        return
+
+    url = FILE_URL.format(id=id, confirm=confirm)
+    req = Request(url, headers={'Cookie': cookies,
+                                'User-Agent': 'Mozilla/5.0'})
+    resp = urlopen(req)
+    cookies = resp.headers.get('Set-Cookie') or ''
+
+    if not confirm and 'download_warning' in cookies:
+        confirm = CONFIRM_PATTERN.search(cookies)
+        return process_file(id, file_path, file_size, confirm.group(1), cookies)
+
+    output(file_path + '\n')
+
+    try:
+        with open(file_path, 'wb') as f:
+            dl = 0
+            while True:
+                chunk = resp.read(4096)
+                if not chunk:
+                    break
+
+                if b'Too many users have viewed or downloaded this file recently' in chunk:
+                    raise Exception('Quota exceeded for this file')
+
+                dl += len(chunk)
+                f.write(chunk)
+                done = int(50 * dl / file_size)
+                output("\r[{}{}] {:.2f}MB/{:.2f}MB".format(
+                    '=' * done,
+                    ' ' *
+                    (50 - done),
+                    dl / 1024 / 1024,
+                    file_size / 1024 / 1024
+                ))
+                sys.stdout.flush()
+    except:
+        if os.path.exists(file_path):
+            os.remove(file_path)
+        raise
+
+    output('\n')
+
+
+def get_arg(pos, default=None):
+    try:
+        return sys.argv[pos]
+    except IndexError:
+        return default
+
+
+if __name__ == '__main__':
+    url = get_arg(1, '').strip()
+    directory = get_arg(2, './').strip()
+    id = ''
+
+    if not url:
+        sys.stderr.write('A Google Drive URL is required')
+        sys.exit(1)
+
+    for pattern in ID_PATTERNS:
+        match = pattern.search(url)
+        if match:
+            id = match.group(1)
+            break
+
+    if not id:
+        sys.stderr.write('Unable to get ID from {}'.format(url))
+        sys.exit(1)
+
+    process_item(id, directory)
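Review note on the `__main__` block above: the ID is extracted by trying `ID_PATTERNS` in order, from the most specific URL shape to a bare ID. The sketch below copies those three patterns (written as raw strings) into a small helper so the fallback order can be tried offline. The helper name `extract_id` and the sample IDs are made up for illustration and do not point to real files.

```python
import re

# Same three patterns as in the patch, tried from most to least specific.
ID_PATTERNS = [
    re.compile(r'/file/d/([0-9A-Za-z_-]{10,})(?:/|$)', re.IGNORECASE),
    re.compile(r'id=([0-9A-Za-z_-]{10,})(?:&|$)', re.IGNORECASE),
    re.compile(r'([0-9A-Za-z_-]{10,})', re.IGNORECASE),
]


def extract_id(url):
    """Return the first ID matched by ID_PATTERNS, or '' if none matches."""
    for pattern in ID_PATTERNS:
        match = pattern.search(url)
        if match:
            return match.group(1)
    return ''


# Hypothetical inputs; the ID below is a placeholder.
print(extract_id('https://drive.google.com/file/d/1AbCdEfGhIjKlMnOp/view'))
print(extract_id('https://drive.google.com/open?id=1AbCdEfGhIjKlMnOp'))
print(extract_id('1AbCdEfGhIjKlMnOp'))
```

The last pattern accepts any run of ten or more ID characters, so a bare ID on the command line works, while very short strings yield an empty result and trigger the error path.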
diff --git a/networks/usad/usad.py b/networks/usad/usad.py
new file mode 100644
index 0000000..d0d8d04
--- /dev/null
+++ b/networks/usad/usad.py
@@ -0,0 +1,223 @@
+import logging
+import numpy as np
+import torch
+import torch.nn as nn
+from common.utils import set_device
+import torch.utils.data as data_utils
+
+
+def to_device(data, device):
+    """Move tensor(s) to chosen device"""
+    if isinstance(data, (list, tuple)):
+        return [to_device(x, device) for x in data]
+    return data.to(device, non_blocking=True)
+
+
+class Encoder(nn.Module):
+    def __init__(self, in_size, latent_size):
+        super().__init__()
+        self.linear1 = nn.Linear(in_size, int(in_size / 2))
+        self.linear2 = nn.Linear(int(in_size / 2), int(in_size / 4))
+        self.linear3 = nn.Linear(int(in_size / 4), latent_size)
+        self.relu = nn.ReLU(True)
+
+    def forward(self, w):
+        out = self.linear1(w)
+        out = self.relu(out)
+        out = self.linear2(out)
+        out = self.relu(out)
+        out = self.linear3(out)
+        z = self.relu(out)
+        return z
+
+
+class Decoder(nn.Module):
+    def __init__(self, latent_size, out_size):
+        super().__init__()
+        self.linear1 = nn.Linear(latent_size, int(out_size / 4))
+        self.linear2 = nn.Linear(int(out_size / 4), int(out_size / 2))
+        self.linear3 = nn.Linear(int(out_size / 2), out_size)
+        self.relu = nn.ReLU(True)
+        self.sigmoid = nn.Sigmoid()
+
+    def forward(self, z):
+        out = self.linear1(z)
+        out = self.relu(out)
+        out = self.linear2(out)
+        out = self.relu(out)
+        out = self.linear3(out)
+        w = self.sigmoid(out)
+        return w
+
+
+class UsadModel(nn.Module):
+    def __init__(self, w_size, z_size, device):
+        super().__init__()
+        self.w_size = w_size
+        self.z_size = z_size
+        self.encoder = Encoder(w_size, z_size)
+        self.decoder1 = Decoder(z_size, w_size)
+        self.decoder2 = Decoder(z_size, w_size)
+        self.device = set_device(device)
+
+    def training_step(self, batch, n):
+        z = self.encoder(batch)
+        w1 = self.decoder1(z)
+        w2 = self.decoder2(z)
+        w3 = self.decoder2(self.encoder(w1))
+        loss1 = 1 / n * torch.mean((batch - w1) ** 2) + (1 - 1 / n) * torch.mean(
+            (batch - w3) ** 2
+        )
+        loss2 = 1 / n * torch.mean((batch - w2) ** 2) - (1 - 1 / n) * torch.mean(
+            (batch - w3) ** 2
+        )
+        return loss1, loss2
+
+    def validation_step(self, batch, n):
+        z = self.encoder(batch)
+        w1 = self.decoder1(z)
+        w2 = self.decoder2(z)
+        w3 = self.decoder2(self.encoder(w1))
+        loss1 = 1 / n * torch.mean((batch - w1) ** 2) + (1 - 1 / n) * torch.mean(
+            (batch - w3) ** 2
+        )
+        loss2 = 1 / n * torch.mean((batch - w2) ** 2) - (1 - 1 / n) * torch.mean(
+            (batch - w3) ** 2
+        )
+        return {"val_loss1": loss1, "val_loss2": loss2}
+
+    def validation_epoch_end(self, outputs):
+        batch_losses1 = [x["val_loss1"] for x in outputs]
+        epoch_loss1 = torch.stack(batch_losses1).mean()
+        batch_losses2 = [x["val_loss2"] for x in outputs]
+        epoch_loss2 = torch.stack(batch_losses2).mean()
+        return {"val_loss1": epoch_loss1.item(), "val_loss2": epoch_loss2.item()}
+
+    def epoch_end(self, epoch, result):
+        print(
+            "Epoch [{}], val_loss1: {:.4f}, val_loss2: {:.4f}".format(
+                epoch, result["val_loss1"], result["val_loss2"]
+            )
+        )
+
+    def fit(
+        self, windows_train, windows_val, epochs, batch_size, opt_func=torch.optim.Adam
+    ):
+        self.to(self.device)
+        train_loader = torch.utils.data.DataLoader(
+            data_utils.TensorDataset(
+                torch.from_numpy(windows_train)
+                .float()
+                .view(([windows_train.shape[0], self.w_size]))
+            ),
+            batch_size=batch_size,
+            shuffle=False,
+            num_workers=0,
+        )
+
+        if windows_val is not None:
+            val_loader = torch.utils.data.DataLoader(
+                data_utils.TensorDataset(
+                    torch.from_numpy(windows_val)
+                    .float()
+                    .view(([windows_val.shape[0], self.w_size]))
+                ),
+                batch_size=batch_size,
+                shuffle=False,
+                num_workers=0,
+            )
+        else:
+            val_loader = None
+
+        training(
+            epochs,
+            self,
+            train_loader,
+            val_loader,
+            opt_func=opt_func,
+            device=self.device,
+        )
+
+    def predict_prob(self, windows_test, batch_size, windows_label=None):
+        self.to(self.device)
+        test_loader = torch.utils.data.DataLoader(
+            data_utils.TensorDataset(
+                torch.from_numpy(windows_test)
+                .float()
+                .view(([windows_test.shape[0], self.w_size]))
+            ),
+            batch_size=batch_size,
+            shuffle=False,
+            num_workers=0,
+        )
+        results = testing(self, test_loader, device=self.device)
+        if len(results) >= 2:
+            y_pred = np.concatenate(
+                [
+                    torch.stack(results[:-1]).flatten().detach().cpu().numpy(),
+                    results[-1].flatten().detach().cpu().numpy(),
+                ]
+            )
+        else:
+            y_pred = results[-1].flatten().detach().cpu().numpy()
+        if windows_label is not None:
+            windows_label = (np.sum(windows_label, axis=1) >= 1) + 0
+            return y_pred, windows_label
+        else:
+            return y_pred
+
+
+def evaluate(model, val_loader, n, device="cpu"):
+    outputs = [
+        model.validation_step(to_device(batch, device), n) for [batch] in val_loader
+    ]
+    return model.validation_epoch_end(outputs)
+
+
+def training(
+    epochs, model, train_loader, val_loader, opt_func=torch.optim.Adam, device="cpu"
+):
+    history = []
+    optimizer1 = opt_func(
+        list(model.encoder.parameters()) + list(model.decoder1.parameters())
+    )
+    optimizer2 = opt_func(
+        list(model.encoder.parameters()) + list(model.decoder2.parameters())
+    )
+    for epoch in range(epochs):
+        logging.info(f"Training epoch: {epoch}..")
+        for [batch] in train_loader:
+            batch = to_device(batch, device)
+            # Train AE1
+            loss1, loss2 = model.training_step(batch, epoch + 1)
+            loss1.backward()
+            optimizer1.step()
+            model.zero_grad()  # also clears decoder2 grads produced by loss1
+
+            # Train AE2
+            loss1, loss2 = model.training_step(batch, epoch + 1)
+            loss2.backward()
+            optimizer2.step()
+            model.zero_grad()  # also clears decoder1 grads produced by loss2
+
+        if val_loader is not None:
+            result = evaluate(model, val_loader, epoch + 1, device)
+            model.epoch_end(epoch, result)
+            history.append(result)
+        logging.info(f"Training epoch: {epoch} done.")
+    return history
+
+
+def testing(model, test_loader, alpha=0.5, beta=0.5, device="cpu"):
+    with torch.no_grad():
+        model.eval()
+        results = []
+        for [batch] in test_loader:
+            batch = to_device(batch, device)
+            w1 = model.decoder1(model.encoder(batch))
+            w2 = model.decoder2(model.encoder(w1))
+            results.append(
+                alpha * torch.mean((batch - w1) ** 2, axis=1)
+                + beta * torch.mean((batch - w2) ** 2, axis=1)
+            )
+    return results
diff --git a/requirements/RANSyncoders.txt b/requirements/RANSyncoders.txt
new file mode 100644
index 0000000..9c1285e
--- /dev/null
+++ b/requirements/RANSyncoders.txt
@@ -0,0 +1,10 @@
+# pip install -r requirements/RANSyncoders.txt
+joblib == 1.0.0
+jupyter == 1.0.0
+keras == 2.3.1
+numpy == 1.19.2
+pandas == 1.1.5
+scikit-learn == 0.23.2
+scipy == 1.5.2
+spectrum == 0.7.5
+tensorflow == 2.1.0
\ No newline at end of file
diff --git a/requirements/anomaly_transformer.txt b/requirements/anomaly_transformer.txt
new file mode 100644
index 0000000..b9eceaa
--- /dev/null
+++ b/requirements/anomaly_transformer.txt
@@ -0,0 +1,7 @@
+hydra-core==1.1.1
+numpy==1.21.3
+omegaconf==2.1.1
+torch==1.10.0
+tqdm==4.62.3
+transformers==4.11.3
+wandb==0.12.10
\ No newline at end of file
diff --git a/requirements/interfusion.txt b/requirements/interfusion.txt
new file mode 100644
index 0000000..14420ff
--- /dev/null
+++ b/requirements/interfusion.txt
@@ -0,0 +1,22 @@
+# python 3.6.6
+more_itertools
+numpy==1.17.0
+tensorflow-gpu==1.12.0 # for gpu
+# tensorflow==1.12.0
+typing-extensions==3.7.4.1
+typing-inspect==0.5.0
+tqdm==4.31.1
+pickleshare==0.7.5
+scikit-learn==0.20.3
+scipy==1.2.1
+pandas==0.24.2
+matplotlib==2.0.2
+seaborn==0.9.0
+dataclasses==0.7
+dataclasses-json==0.3.5
+Click==7.0
+fs==2.4.4
+six==1.11.0
+git+https://github.com/thu-ml/zhusuan.git@48c0f4e
+git+https://github.com/haowen-xu/tfsnippet.git@v0.2.0-alpha4
+git+https://github.com/haowen-xu/ml-essentials.git
\ No newline at end of file
diff --git a/requirements/omnianomaly.txt b/requirements/omnianomaly.txt
new file mode 100644
index 0000000..98041bc
--- /dev/null
+++ b/requirements/omnianomaly.txt
@@ -0,0 +1,14 @@
+six == 1.11.0
+matplotlib == 3.0.2
+numpy == 1.15.4
+pandas == 0.23.4
+scipy == 1.2.0
+scikit_learn == 0.20.2
+tensorflow-gpu == 1.12.0
+tensorflow_probability == 0.5.0
+tqdm == 4.28.1
+imageio == 2.4.1
+fs == 2.3.0
+click == 7.0
+git+https://github.com/thu-ml/zhusuan.git
+git+https://github.com/haowen-xu/tfsnippet.git@v0.2.0-alpha1
\ No newline at end of file