Commit

init pus
JinYang88 committed Jun 8, 2022
0 parents commit 6151747
Showing 127 changed files with 18,541 additions and 0 deletions.
146 changes: 146 additions & 0 deletions .gitignore
@@ -0,0 +1,146 @@
# MTSBenchmark
**/checkpoints/
**/dev
**/data/
**/.vscode

.DS_Store
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
*.pkl

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/
95 changes: 95 additions & 0 deletions README.md
@@ -0,0 +1,95 @@
## MTAD: Tools and Benchmark for Multivariate Time Series Anomaly Detection

This repository is a **M**ultivariate **T**ime Series **A**nomaly **D**etection toolkit named ***MTAD***. It provides a comprehensive benchmarking protocol and implements state-of-the-art methods behind a unified, easy-to-use interface.

A multivariate time series (MTS) is a group of inherently correlated time series. For example, in manufacturing and Information Technology (IT) systems, an entity (e.g., a physical machine or a software service) is generally equipped with a monitoring mechanism to ensure its security or reliability.

In contrast to anomaly detection on a single time series, extensive recent studies indicate that the dependencies hidden in an MTS are of great importance for accurate anomaly detection; that is, the anomaly detector should consider the MTS as a whole. To this end, state-of-the-art methods resort to deep learning to capture these dependencies.

The ultimate goal of MTAD is to unify the various evaluation protocols currently in use, so as to reveal the best performance of recently proposed models.



### Our evaluation protocol

**Threshold Selection:**

- EVT-based method (Extreme Value Theory, e.g., POT)
- Searching for the threshold that yields the optimal F1-score (see the sketch below)
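
To make the two strategies concrete, here is a minimal sketch of both. Neither function is part of this repository: the EVT variant follows the POT estimator of Siffer et al. (KDD'17), the search variant scans quantile-spaced candidates for the best F1-score, and `scipy`/`scikit-learn` are assumed to be available.

```python
import numpy as np
from scipy.stats import genpareto
from sklearn.metrics import f1_score

def pot_threshold(train_scores, q=1e-3, init_level=0.98):
    """EVT-based threshold via Peaks-Over-Threshold (POT)."""
    t = np.quantile(train_scores, init_level)       # initial threshold
    excesses = train_scores[train_scores > t] - t   # exceedances over t
    # fit a Generalized Pareto Distribution to the exceedances
    shape, _, scale = genpareto.fit(excesses, floc=0)
    n, n_t = len(train_scores), len(excesses)
    if abs(shape) < 1e-8:                           # GPD degenerates to exponential
        return t + scale * np.log(n_t / (q * n))
    return t + (scale / shape) * ((q * n / n_t) ** (-shape) - 1)

def best_f1_threshold(scores, labels, n_candidates=200):
    """Brute-force search over candidate thresholds for the best F1."""
    candidates = np.quantile(scores, np.linspace(0.0, 1.0, n_candidates))
    f1s = [f1_score(labels, (scores >= c).astype(int)) for c in candidates]
    best = int(np.argmax(f1s))
    return candidates[best], f1s[best]
```

The POT route needs no test labels (the threshold is fitted on training scores alone), while the search route reports an upper bound on what a model can achieve given an oracle threshold.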

**Metrics:**

- Precision, Recall, F1-score: *how accurate is a model?*
  - with **or** without point adjustment (see the sketch after this list)
- Delay: *how timely can a model report an anomaly?*
- Efficiency: *how fast can a model be trained and perform anomaly detection?*
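
Point adjustment is the convention used by OmniAnomaly and many follow-ups: if any point inside a ground-truth anomaly segment is flagged, the whole segment counts as detected. Below is a minimal sketch of it, plus the delay metric, assuming NumPy arrays of 0/1; the function names are illustrative, not this repository's API.

```python
import numpy as np

def point_adjust(pred, label):
    """Point adjustment: a hit anywhere inside a ground-truth anomaly
    segment counts the whole segment as detected."""
    pred, label = pred.astype(bool).copy(), label.astype(bool)
    i, n = 0, len(label)
    while i < n:
        if label[i]:
            j = i
            while j < n and label[j]:
                j += 1                    # [i, j) is one true segment
            if pred[i:j].any():
                pred[i:j] = True          # expand the hit to the segment
            i = j
        else:
            i += 1
    return pred.astype(int)

def detection_delay(pred, label):
    """Average offset (in points) of the first alert inside each detected
    ground-truth segment; measures how timely a model is."""
    delays, i, n = [], 0, len(label)
    while i < n:
        if label[i]:
            j = i
            while j < n and label[j]:
                j += 1
            hits = np.flatnonzero(pred[i:j])
            if hits.size:                 # segment was detected
                delays.append(int(hits[0]))
            i = j
        else:
            i += 1
    return float(np.mean(delays)) if delays else float("nan")
```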



### Requirements

> cd ./requirements
>
> pip install -r \<requirement file\>

### Run our benchmark

The following example runs the benchmark for LSTM, whose configuration files are stored in the `./benchmark_config/` folder.

```
cd benchmark
python lstm_benchmark.py
```

### Models integrated in this tool

**General Machine Learning-based Models**

| Model | Paper reference |
| :------ | :----------------------------------------------------------- |
| PCA | **[2003]** Shyu M L, Chen S C, Sarinnapakorn K, et al. A novel anomaly detection scheme based on principal component classifier |
| iForest | **[ICDM'2008]** Fei Tony Liu, Kai Ming Ting, Zhi-Hua Zhou: Isolation Forest |
| LODA    | **[Machine Learning'2016]** Tomás Pevný. Loda: Lightweight online detector of anomalies |

**Deep Learning-based Models**

| Model | Paper reference |
| :---------- | :----------------------------------------------------------- |
| AE | **[AAAI'2019]** Andrea Borghesi, Andrea Bartolini, Michele Lombardi, Michela Milano, Luca Benini. Anomaly Detection Using Autoencoders in High Performance Computing Systems |
| LSTM | **[KDD'2018]** Kyle Hundman, Valentino Constantinou, Christopher Laporte, Ian Colwell, Tom Söderström. Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding |
| LSTM-VAE | **[Arxiv'2017]** A Multimodal Anomaly Detector for Robot-Assisted Feeding Using an LSTM-based Variational Autoencoder |
| DAGMM | **[ICLR'2018]** Bo Zong, Qi Song, Martin Renqiang Min, Wei Cheng, Cristian Lumezanu, Dae-ki Cho, Haifeng Chen. Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection |
| MSCRED      | **[AAAI'2019]** Chuxu Zhang, Dongjin Song, Yuncong Chen, Xinyang Feng, Cristian Lumezanu, Wei Cheng, Jingchao Ni, Bo Zong, Haifeng Chen, Nitesh V. Chawla. A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data |
| OmniAnomaly | **[KDD'2019]** Ya Su, Youjian Zhao, Chenhao Niu, Rong Liu, Wei Sun, Dan Pei. Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network |
| MTAD-GAT | **[ICDM'2020]** Multivariate Time-series Anomaly Detection via Graph Attention Networks |
| USAD | **[KDD'2020]** USAD: UnSupervised Anomaly Detection on Multivariate Time Series. |
| InterFusion | **[KDD'2021]** Zhihan Li, Youjian Zhao, Jiaqi Han, Ya Su, Rui Jiao, Xidao Wen, Dan Pei. Multivariate Time Series Anomaly Detection and Interpretation using Hierarchical Inter-Metric and Temporal Embedding |
| TranAD | **[VLDB'2022]** TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data |
| RANSynCoders | **[KDD'2021]** Practical Approach to Asynchronous Multivariate Time Series Anomaly Detection and Localization |
| AnomalyTransformer | **[ICLR'2022]** Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy |
| GANF | **[ICLR'2022]** Graph-Augmented Normalizing Flows for Anomaly Detection of Multiple Time Series |



### Datasets

The following datasets were kindly released by different institutions or schools. The raw datasets can be downloaded from, or applied for via, the link behind each dataset name. The processed datasets can be found [here](https://drive.google.com/drive/folders/1NEGyB4y8CvUB8TX2Wh83Eas_QHtufGPR?usp=sharing)⬇️ (SMD, SMAP, and MSL).

- Server Machine Dataset (**SMD**) [Download raw datasets⬇️](https://github.com/NetManAIOps/OmniAnomaly.git)

> Collected from a large Internet company, containing 5-week-long monitoring KPIs of 28 machines. The meaning of each KPI can be found [here](https://github.com/NetManAIOps/OmniAnomaly/issues/22).
- Soil Moisture Active Passive satellite (**SMAP**) and Mars Science Laboratory rover (**MSL**) [Download raw datasets⬇️](link)

> Both are collected from running spacecraft and contain a set of telemetry anomalies corresponding to actual spacecraft issues involving various subsystems and channel types.
- Secure Water Treatment (**SWaT**) [Apply here\*](https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/)

> SWaT is collected from a real-world industrial water treatment plant and contains 11-day-long multivariate KPIs. In particular, the system is in a normal state for the first seven days and under attack for the following four days.
- Water Distribution (**WADI**) [Apply here\*](https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/)

> An extension of the SWaT testbed. 14-day-long operation KPIs are collected while the system runs normally, and 2-day-long KPIs are obtained under attack scenarios.
\* The SWaT and WADI datasets were released by iTrust and must be applied for individually. One can request the raw datasets and preprocess them with our preprocessing scripts.

92 changes: 92 additions & 0 deletions benchmark/AutoEncoder_benchmark.py
@@ -0,0 +1,92 @@
import sys
from pyod.models.auto_encoder import AutoEncoder

sys.path.append("../")

import logging
import argparse
from common import data_preprocess
from common.dataloader import load_dataset
from common.utils import seed_everything, load_config, set_logger, print_to_json
from common.evaluation import Evaluator, TimeTracker
from common.exp import store_entity

seed_everything()
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--config",
        type=str,
        default="./benchmark_config/",
        help="The config directory.",
    )
    parser.add_argument("--expid", type=str, default="autoencoder_SMD")
    parser.add_argument("--gpu", type=int, default=-1)
    args = vars(parser.parse_args())

    config_dir = args["config"]
    experiment_id = args["expid"]

    # load hyperparameters for this experiment id and set up logging
    params = load_config(config_dir, experiment_id)
    set_logger(params, args)
    logging.info(print_to_json(params))

    data_dict = load_dataset(
        data_root=params["data_root"],
        entities=params["entities"],
        dim=params["dim"],
        valid_ratio=params["valid_ratio"],
        test_label_postfix=params["test_label_postfix"],
        test_postfix=params["test_postfix"],
        train_postfix=params["train_postfix"],
        nrows=params["nrows"],
    )

    # preprocessing: normalize each entity's train/test series
    pp = data_preprocess.preprocessor(model_root=params["model_root"])
    data_dict = pp.normalize(data_dict, method=params["normalize"])

    # train and test on each entity
    evaluator = Evaluator(**params["eval"])
    for entity in params["entities"]:
        logging.info("Fitting dataset: {}".format(entity))

        train = data_dict[entity]["train"]
        test = data_dict[entity]["test"]

        model = AutoEncoder(
            hidden_neurons=params["hidden_neurons"],
            batch_size=params["batch_size"],
            epochs=params["nb_epoch"],
            l2_regularizer=params["l2_regularizer"],
            verbose=1,
        )

        tt = TimeTracker(nb_epoch=params["nb_epoch"])

        tt.train_start()
        model.fit(train)
        tt.train_end()

        # anomaly scores on the training set, used later for thresholding
        train_anomaly_score = model.decision_function(train)

        tt.test_start()
        anomaly_score = model.decision_function(test)
        tt.test_end()

        anomaly_label = data_dict[entity]["test_label"]

        # persist per-entity scores, labels, and timing for evaluation
        store_entity(
            params,
            entity,
            train_anomaly_score,
            anomaly_score,
            anomaly_label,
            time_tracker=tt.get_data(),
        )

    # aggregate evaluation across all entities
    evaluator.eval_exp(
        exp_folder=params["model_root"],
        entities=params["entities"],
        merge_folder=params["benchmark_dir"],
        extra_params=params,
    )
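
For reference, the script reads every hyperparameter from a config selected by `--expid`. The sketch below lists the keys the script actually accesses; the dict form and all values are illustrative assumptions, not taken from the repository (the real configs live under `./benchmark_config/`).

```python
# Hypothetical result of load_config(config_dir, experiment_id);
# keys mirror the params the script reads, values are placeholders.
params = {
    "data_root": "../data/SMD",              # where the entity files live
    "entities": ["machine-1-1"],             # one model per entity
    "dim": 38,                               # number of KPIs per entity
    "valid_ratio": 0.0,
    "train_postfix": "train.pkl",
    "test_postfix": "test.pkl",
    "test_label_postfix": "test_label.pkl",
    "nrows": None,                           # optionally truncate rows
    "normalize": "minmax",                   # preprocessing method
    "model_root": "./checkpoints/autoencoder_SMD",
    "benchmark_dir": "./benchmark_results",
    "hidden_neurons": [64, 32, 32, 64],      # AutoEncoder layer sizes
    "batch_size": 32,
    "nb_epoch": 10,
    "l2_regularizer": 0.1,
    "eval": {},                              # kwargs for Evaluator(**...)
}
```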