Bus Lines Categorization

This is the second assignment of the "Data Mining" course (spring 2018).

Requirements:

Dataset format:

[Image: dataset format (not available)]

  • Part 1: The purpose of this part is to become familiar with the gmplot Python library by visualizing 5 different bus lines (i.e., journeyPatternIDs).

  • Part 2: For every bus line (i.e., trajectory) in test_set_a1.csv, we find its 5 neighbors* in the train_set.csv file. We use Dynamic Time Warping (DTW) as the similarity measure between two trajectories.

  • Part 3: This part repeats Part 2, except that Longest Common Subsequence (LCS) is used as the similarity measure.

  • Part 4: The main task of this part is to predict the bus line that each trajectory in test_set_a2.csv belongs to. For this purpose we build a standard KNN classifier.
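
The two similarity measures from Parts 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the code used in the assignment. It assumes each trajectory is a list of `(lat, lon)` pairs in degrees, that point-to-point distance is the haversine distance, and that two points "match" for LCS when they are within a hypothetical threshold `eps_km` (0.2 km here, chosen only for the example). The names `haversine`, `dtw_distance`, and `lcs_length` are illustrative.

```python
import math


def haversine(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))  # 6371 km: mean Earth radius


def dtw_distance(a, b, dist=haversine):
    """Dynamic Time Warping cost between trajectories a and b (lower = more similar)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def lcs_length(a, b, eps_km=0.2, dist=haversine):
    """Length of the longest common subsequence (higher = more similar).

    Two points are treated as equal when they lie within eps_km of each
    other; the threshold is an assumption made for this sketch.
    """
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if dist(a[i - 1], b[j - 1]) <= eps_km:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]
```

Note the opposite orientation of the two measures: DTW is a cost, so the 5 neighbors are the trajectories with the smallest values, while LCS is a match count, so the neighbors are those with the largest values. Both run in time proportional to the product of the two trajectory lengths.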

Multiprocessing is achieved using Python's multiprocessing module.
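
How the work is split across processes is not stated here. One plausible arrangement, sketched below under that assumption, is to hand each test trajectory to a worker from a `multiprocessing.Pool`. The function names and the toy distance `point_gap` are hypothetical stand-ins; in practice the distance would be DTW or LCS.

```python
from functools import partial
from multiprocessing import Pool


def point_gap(a, b):
    """Toy stand-in distance: sum of absolute differences of paired values."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_ids(query, train, k=5):
    """Return the ids of the k training trajectories closest to query.

    train is a list of (trajectory_id, trajectory) pairs.
    """
    ranked = sorted(train, key=lambda item: point_gap(query, item[1]))
    return [traj_id for traj_id, _ in ranked[:k]]


def nearest_ids_parallel(queries, train, k=5, processes=None):
    """Run nearest_ids for every query, one query per task, in a process pool."""
    # partial binds the shared arguments; the worker must be a module-level
    # function so it can be pickled and sent to the child processes.
    worker = partial(nearest_ids, train=train, k=k)
    with Pool(processes=processes) as pool:
        return pool.map(worker, queries)
```

When used from a script, `nearest_ids_parallel` should be called under an `if __name__ == "__main__":` guard, which is required on platforms that start worker processes by spawning a fresh interpreter.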

* By "neighbors" we mean the 5 most similar trajectories to the one currently being tested.
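
Given any pairwise distance, selecting the 5 neighbors and turning them into a Part 4 prediction can be sketched as below. This is an illustrative outline rather than the assignment's classifier: it assumes the training set is available as `(label, trajectory)` pairs, that `distance` is a function where smaller means more similar, and that the prediction is a plain majority vote. The names `k_nearest` and `knn_predict` are hypothetical.

```python
from collections import Counter


def k_nearest(query, train, distance, k=5):
    """Return the k (distance, label) pairs closest to query, nearest first.

    train is an iterable of (label, trajectory) pairs.
    """
    scored = [(distance(query, traj), label) for label, traj in train]
    return sorted(scored, key=lambda pair: pair[0])[:k]


def knn_predict(query, train, distance, k=5):
    """Predict the label of query by majority vote among its k nearest neighbors."""
    neighbors = k_nearest(query, train, distance, k)
    votes = Counter(label for _, label in neighbors)
    # On a tie, the label seen first (i.e., belonging to the nearer neighbor) wins.
    return votes.most_common(1)[0][0]
```

A similarity such as LCS length, where larger means more similar, can be plugged in by negating it, for example `lambda a, b: -similarity(a, b)`.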