This project compares solutions to the Traveling Salesman Problem (TSP) using both Q-Learning and a Policy Gradient approach with a neural network. The Q-Learning algorithm is based on tabular Q-values, while the Policy Gradient approach utilizes a neural network to learn a policy.
- Python (>=3.6)
- NumPy
- TensorFlow (for Q-Learning)
- PyTorch (for Policy Gradient)
- Matplotlib
Clone the repository:
```bash
git clone https://github.com/your-username/traveling-salesman.git
cd traveling-salesman
```
Run the script with the following command:
```bash
python QL_PG.py
```
Change the parameters in the script and observe how the graphs change.
- Q-Learning: `num_cities`, `num_episodes`, `epsilon`, `alpha`, `gamma`
- Policy Gradient: `num_cities`, `input_size`, `hidden_size`, `output_size`, `learning_rate`, `num_episodes`
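To show how `epsilon`, `alpha`, and `gamma` typically enter a tabular Q-Learning update for TSP, here is a minimal sketch. It is not the code in `QL_PG.py`: the function names `q_learning_tsp` and `tour_length`, the negative-distance reward, and the use of plain Python instead of TensorFlow are all assumptions made for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def tour_length(tour, coords):
    """Length of the closed tour that visits coords in the given order."""
    return sum(dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def q_learning_tsp(coords, num_episodes=500, epsilon=0.1, alpha=0.1,
                   gamma=0.9, seed=0):
    """Hypothetical tabular Q-Learning loop; returns (best_tour, best_length)."""
    rng = random.Random(seed)
    n = len(coords)
    # q[i][j]: estimated value of moving from city i to city j.
    q = [[0.0] * n for _ in range(n)]
    best_tour, best_len = None, float("inf")

    for _ in range(num_episodes):
        current = 0
        tour = [current]
        unvisited = set(range(1, n))
        while unvisited:
            choices = sorted(unvisited)
            if rng.random() < epsilon:
                # Explore: pick a random unvisited city.
                nxt = rng.choice(choices)
            else:
                # Exploit: pick the unvisited city with the highest Q-value.
                nxt = max(choices, key=lambda j: q[current][j])
            # Assumed reward: negative travel distance (shorter is better).
            reward = -dist(coords[current], coords[nxt])
            unvisited.remove(nxt)
            if unvisited:
                future = max(q[nxt][j] for j in unvisited)
            else:
                # Last move: charge the trip back to the start city.
                reward -= dist(coords[nxt], coords[0])
                future = 0.0
            # Standard Q-Learning update with step size alpha and discount gamma.
            q[current][nxt] += alpha * (reward + gamma * future - q[current][nxt])
            tour.append(nxt)
            current = nxt
        length = tour_length(tour, coords)
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len
```

For example, `q_learning_tsp([(0, 0), (0, 1), (1, 1), (1, 0)], num_episodes=50)` returns a tour over the four corners of a unit square along with its length. Raising `epsilon` increases exploration, while `alpha` and `gamma` control how quickly and how far ahead the Q-values adapt.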
View the resulting plots of loss over episodes for Q-Learning and Policy Gradient, respectively.
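For the Policy Gradient side, the per-episode loss is commonly the REINFORCE surrogate, the negative log-probability of the sampled tour weighted by its advantage. The sketch below illustrates that idea under explicit assumptions: it swaps the project's PyTorch neural network for a table of softmax logits in plain Python, and the names `reinforce_tsp` and `masked_softmax` as well as the moving-average baseline are hypothetical, not taken from `QL_PG.py`.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def masked_softmax(logits, allowed):
    """Softmax over the allowed indices only; returns {index: probability}."""
    m = max(logits[j] for j in allowed)
    exps = {j: math.exp(logits[j] - m) for j in allowed}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


def reinforce_tsp(coords, num_episodes=500, learning_rate=0.05, seed=0):
    """Hypothetical REINFORCE loop; returns (theta, losses)."""
    rng = random.Random(seed)
    n = len(coords)
    # theta[i][j]: logit for moving from city i to city j (stand-in for a network).
    theta = [[0.0] * n for _ in range(n)]
    baseline = 0.0
    losses = []

    for episode in range(num_episodes):
        current, unvisited = 0, list(range(1, n))
        steps = []  # (city, chosen next city, action probabilities)
        length = 0.0
        while unvisited:
            probs = masked_softmax(theta[current], unvisited)
            weights = [probs[j] for j in unvisited]
            nxt = rng.choices(unvisited, weights=weights)[0]
            steps.append((current, nxt, probs))
            length += dist(coords[current], coords[nxt])
            unvisited.remove(nxt)
            current = nxt
        length += dist(coords[current], coords[0])  # close the tour

        if episode == 0:
            baseline = length
        # Positive advantage means the tour was shorter than the running average.
        advantage = baseline - length
        log_prob = sum(math.log(p[a]) for _, a, p in steps)
        losses.append(-advantage * log_prob)

        # Gradient ascent on advantage * log pi; for softmax logits,
        # d log pi(a) / d theta_j = 1[j == a] - pi(j).
        for city, chosen, probs in steps:
            for j, p in probs.items():
                grad_log = (1.0 if j == chosen else 0.0) - p
                theta[city][j] += learning_rate * advantage * grad_log
        baseline = 0.9 * baseline + 0.1 * length
    return theta, losses
```

The returned `losses` list holds one value per episode, so it can be passed directly to a plotting call to produce a loss-over-episodes curve. This surrogate loss is noisy and can be negative, so a jagged curve is not necessarily a sign of a bug.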