This repository contains the code I wrote for the Kaggle Diabetic Retinopathy challenge, which placed 20th out of 661 competitors. Read more about the challenge and the creation of this repository here.
While I competed in the challenge, this project evolved into my personal general-purpose framework for training deep convolutional neural networks. It is built on top of Theano, Lasagne, pylearn2 and many other general-purpose Python libraries.
Features in this framework:
- network architecture via json
  - nonlinearities, initialization, CONV filters, FC size, CONV+FC dropout+(pooling size+stride+overlap), and padding
- data augmentation (cmd line options)
  - color casting
  - flipping
  - jittering
- data handling
  - image normalize/standardization (cmd line option)
  - automated creation of image conversion batch files
  - streaming and caching multiple minibatches in GPU (i.e. macrobatches) (cmd line option)
  - easy restoring of network parameters: easy to pause then continue training at a later time (cmd line option)
  - control over class balance per minibatch (cmd line option)
    - classes automatically evenly distributed throughout minibatches
    - custom class (im)balance specifiable
  - consistent training/validation sets by default (cmd line option)
  - minibatch shuffling (cmd line option)
- evaluation
  - automatic autosave of best results during training
  - plotting CONV layer weights
  - occlusion heatmap studies
  - plotting network results
  - prediction from 1 or more models (cmd line option)
- misc cmd line options
  - multiple error functions
    - cross+relative entropy
    - nnrank
    - mse
    - error function code is easy to swap
  - learning rate, flipping noise decay controllable
  - color/grayscale switching
  - number output classes configurable
  - label/image source
  - CPU or GPU
  - c01b or bc01
  - detects if fundus image was taken with indirect or direct ophthalmoscope with 90% accuracy (using identification tab from indirect ophthalmoscopes)
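To make the "control over class balance per minibatch" feature more concrete, here is a minimal sketch of one way to spread classes evenly across minibatches. It is not the framework's actual code: the function name `balanced_minibatches`, its parameters, and the choice to stop at the rarest class instead of oversampling are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def balanced_minibatches(labels, batch_size, seed=0):
    """Yield lists of sample indices with an equal count of every class.

    Hypothetical helper: the real framework may balance classes differently.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)

    classes = sorted(by_class)
    per_class = batch_size // len(classes)
    if per_class == 0:
        raise ValueError("batch_size must be at least the number of classes")

    # Without oversampling, the rarest class limits how many batches fit.
    n_batches = min(len(by_class[c]) for c in classes) // per_class
    for b in range(n_batches):
        batch = []
        for c in classes:
            batch.extend(by_class[c][b * per_class:(b + 1) * per_class])
        rng.shuffle(batch)  # avoid a fixed class order inside the batch
        yield batch


example_labels = [0] * 6 + [1] * 4
example_batches = list(balanced_minibatches(example_labels, batch_size=4))
```

With these example labels, each minibatch holds two samples of class 0 and two of class 1. A custom imbalance could be expressed by replacing the single `per_class` count with a per-class count.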
sudo apt-get install -y git python-pip python-yaml python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose libfreetype6-dev libpng-dev
sudo pip install theano scikit-learn scikit-image nyanbar natsort
- If skimage.io has issues, try: sudo pip install -U scikit-image
- I used theano version '0.7.0'
git clone https://github.com/Lasagne/Lasagne.git && cd Lasagne/ && git checkout 4e4f2f4fdefdab6c2634c7ba080dc3e036782378 && pip install -r requirements.txt && sudo python setup.py install && cd ..
git clone git://github.com/lisa-lab/pylearn2.git && cd pylearn2/ && git checkout 04c77eb9998c9dad1f2efa020736989005cd9c98 && python setup.py develop && sudo python setup.py develop && cd ..
Create a ~/.theanorc file. Example contents:
[global]
floatX = float32
device = gpu0
Override with another device by prefixing your command with THEANO_FLAGS='device=gpu0' (substituting the device you want). Get a list of GPUs via: nvidia-smi -L
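Since ~/.theanorc uses INI syntax, a quick way to catch typos before launching a long training run is to parse it with Python's standard `configparser`. The sketch below parses the example contents from a string; the helper name `read_theanorc` is hypothetical, and this only checks syntax, it does not confirm what Theano itself will do with the values.

```python
import configparser

THEANORC_TEXT = """\
[global]
floatX = float32
device = gpu0
"""


def read_theanorc(text):
    """Parse .theanorc-style text and return the [global] section as a dict."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names such as 'floatX' case-sensitive
    parser.read_string(text)
    return dict(parser["global"])


settings = read_theanorc(THEANORC_TEXT)
print(settings)
```

To check a real file, read its text first, for example with `open(os.path.expanduser("~/.theanorc")).read()`, and pass it to the same function.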
Also ensure that something like the following lines are in your ~/.bashrc:
export PATH=/usr/local/cuda-7.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.0/lib64:$LD_LIBRARY_PATH
sudo apt-get install -y p7zip-full graphicsmagick
Download the training archives from Kaggle (maybe with w3m) and place them in data/train. To unpack, run the following from within data/train:
7z e -oorig/ train.zip.001
This will place the images into data/train/orig.
After placing the test zip files into data/test, you can run a similar command there, 7z e -oorig/ test.zip.001, to place the images into data/test/orig.
Place trainLabels.csv into data/train.
Example: the following creates 3 batchfiles for graphicsmagick to output 128x128 pngs:
mkdir data/train/centered_crop
python my_code/create_resize_batchfiles.py data/train/orig/ data/train/centered_crop/ 2 128 3
Then follow the on-screen directions, which list the commands to run to process the images cataloged in the generated batchfiles.
Depending on how your CPU schedules work, using more than 1 batchfile may not give any speedup (3 works best for me, however).
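The idea of splitting the conversion work over several batchfiles can be sketched as a simple round-robin partition. This is only an illustration of the partitioning step, not the logic of create_resize_batchfiles.py; the function name `split_round_robin` and the file names are made up.

```python
def split_round_robin(paths, n_batchfiles):
    """Deal paths into n_batchfiles lists so each gets a near-equal share."""
    return [paths[i::n_batchfiles] for i in range(n_batchfiles)]


# Hypothetical image names, used only to show the split.
example_paths = ["a.jpeg", "b.jpeg", "c.jpeg", "d.jpeg", "e.jpeg", "f.jpeg", "g.jpeg"]
example_split = split_round_robin(example_paths, 3)
print(example_split)
```

Each resulting list would become one batchfile, and each batchfile can then be processed by its own graphicsmagick process.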
It is possible to run the network on a CPU, though keep in mind that training a single batch of size 128 is about 15 times slower!* There are two things to keep in mind:
- prefix your python command with: THEANO_FLAGS='device=cpu'
- you must use the command line options -cc 0 -fs bc01 when you run the network.
*On an Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz with the default network on 128px images, it takes 54.8 mins per epoch on the CPU versus 3.7 mins/epoch on the GPU.
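The two CPU requirements above can be combined into a single launch from Python. This is a minimal sketch that only builds the environment and argument list; the helper name `cpu_run` is hypothetical, and the `-x 160` argument is borrowed from the training example elsewhere in this README. It also reproduces the "about 15 times slower" figure from the footnote's timings.

```python
import os


def cpu_run(extra_args=()):
    """Return (env, argv) for running the network on the CPU."""
    # THEANO_FLAGS='device=cpu' is the prefix described above.
    env = dict(os.environ, THEANO_FLAGS="device=cpu")
    # -cc 0 -fs bc01 are the options required for CPU runs.
    argv = ["python", "-m", "my_code.VGGNet", "-cc", "0", "-fs", "bc01"]
    argv.extend(extra_args)
    return env, argv


env, argv = cpu_run(["-x", "160"])
# import subprocess; subprocess.call(argv, env=env)  # uncomment to launch

# Minutes per epoch on CPU versus GPU, taken from the footnote.
slowdown = 54.8 / 3.7
print(round(slowdown, 1))
```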
python -m my_code.VGGNet -x 160
python -m my_code.VGGNet -d data/train/cent_crop_192/ -n vgg_mini7b_leak_sig_ecp -x 200
python -m my_code.VGGNet -d data/train/cent_crop_256/ -n vgg_mini7b_leak_sig_ecp -x 200
python -m my_code.plot_results -f results/best_results.pkl
python -m my_code.predict -M models/modelfile.pkl -D data/test/cent_crop_192/
This command prints where it saves a .csv file submittable to Kaggle, as well as a .pkl file containing the network's raw outputs, ready to be ensembled with other raw outputs.
python -m my_code.avg_raw_ouputs results/my_2nd_best.pkl,results/my_1st_best.pkl
Combine "My 2nd best Network" with "My best Network" to get a Kappa ~0.76
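Conceptually, averaging raw outputs means taking the element-wise mean of each model's per-image outputs before deciding on a class. The sketch below assumes each model's raw output is a list of per-class score rows, one row per image, and that the class is chosen by argmax. Both are assumptions: the .pkl layout is not documented here, and an error function such as nnrank would need a different decoding step.

```python
def average_raw_outputs(outputs):
    """Element-wise mean of several models' raw outputs.

    outputs: one entry per model, each a list of score rows (one per image).
    """
    n_models = len(outputs)
    averaged = []
    for rows in zip(*outputs):  # the same image's row from every model
        averaged.append([sum(col) / n_models for col in zip(*rows)])
    return averaged


def predict_classes(averaged):
    """Pick the highest-scoring class per image (assumes argmax decoding)."""
    return [max(range(len(row)), key=row.__getitem__) for row in averaged]


# Made-up scores for two images and two classes.
model_a = [[0.75, 0.25], [0.25, 0.75]]
model_b = [[0.5, 0.5], [0.25, 0.75]]
ensemble = average_raw_outputs([model_a, model_b])
classes = predict_classes(ensemble)
```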
python -m my_code.compare_csv data/train/trainLabels.csv results/result1.csv
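The Kappa quoted above refers to quadratic weighted kappa, the metric used to score this Kaggle challenge. What compare_csv computes internally is not described here, so the following is an independent minimal sketch of the metric itself; the function name and signature are my own.

```python
def quadratic_weighted_kappa(actual, predicted, n_classes):
    """Quadratic weighted kappa between two lists of integer labels."""
    n = len(actual)
    # observed[i][j]: number of samples with true label i and prediction j
    observed = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        observed[a][p] += 1
    hist_actual = [sum(row) for row in observed]
    hist_predicted = [sum(col) for col in zip(*observed)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            expected = hist_actual[i] * hist_predicted[j] / n
            numerator += weight * observed[i][j]
            denominator += weight * expected
    # denominator is zero when both lists contain a single identical class
    return 1.0 - numerator / denominator


perfect = quadratic_weighted_kappa([0, 1, 2], [0, 1, 2], 3)
chance = quadratic_weighted_kappa([0, 0, 2, 2], [0, 2, 2, 0], 3)
```

Perfect agreement gives 1.0, and agreement no better than chance gives 0.0, which is what the two toy calls above produce.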
python -m my_code.plot_occluded_activations -M models/mymodel.pkl -D data/train/centered_crop/41188_right.png
make test
To reduce noise in the training dataset, detect which images are inverted (taken with an indirect ophthalmoscope) and which are left/right, and invert the images until the optic nerve is on the right side of the image.
python my_code/batch_align.py data/train/orig/ n i
This processes the ith of n partitions and creates a csv listing which inversions to perform on the images in that partition. For example, you could run:
python my_code/batch_align.py data/train/orig/ 3 1
python my_code/batch_align.py data/train/orig/ 3 2
python my_code/batch_align.py data/train/orig/ 3 3
Run these in three different screen sessions for parallel processing. Each will report having created a csv file. You can join these multiple csvs into one with: awk 'FNR==1 && NR!=1{next;}{print}' *.csv > my.csv
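The awk one-liner keeps the header of the first csv and drops the header of every later one. For anyone who prefers Python, the same idea can be sketched as below; the helper name `merge_csvs` and the demo file names and columns are made up for illustration.

```python
import os
import tempfile


def merge_csvs(paths, out_path):
    """Concatenate csv files, keeping only the first file's header line."""
    with open(out_path, "w") as out:
        for file_no, path in enumerate(paths):
            with open(path) as src:
                for line_no, line in enumerate(src):
                    if line_no == 0 and file_no > 0:
                        continue  # skip repeated header
                    out.write(line if line.endswith("\n") else line + "\n")


# Small demonstration with two throwaway files.
demo_dir = tempfile.mkdtemp()
part_paths = []
for name, body in [("part1.csv", "a,b\n1,2\n"), ("part2.csv", "a,b\n3,4")]:
    part_path = os.path.join(demo_dir, name)
    with open(part_path, "w") as handle:
        handle.write(body)
    part_paths.append(part_path)

merged_path = os.path.join(demo_dir, "my.csv")
merge_csvs(part_paths, merged_path)
with open(merged_path) as handle:
    merged_text = handle.read()
```

Passing an explicit list of input paths also avoids a pitfall of the shell glob: if my.csv already exists in the directory, `*.csv` would match the output file as well.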
A 90% accurate alignment of the training set is made available here. horizontal_flip == 1 means the image should be flipped on its horizontal axis (upside-down). vertical_flip == 1 means the image should be flipped on its vertical axis (left-right). Doing both of the flips as specified in the csv file will lead to 90% of the training images having the optic nerve on the right, slightly above the horizontal.
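Applying the flips from the csv could look roughly like the sketch below. The horizontal_flip and vertical_flip column names come from the description above, but the `image` column name and the rows shown are assumptions for illustration, and a nested list stands in for real pixel data so the example has no dependencies.

```python
import csv
import io


def apply_flips(pixels, horizontal_flip, vertical_flip):
    """Return a flipped copy of pixels, given as a list of rows."""
    if horizontal_flip:  # flip about the horizontal axis: upside-down
        pixels = pixels[::-1]
    if vertical_flip:  # flip about the vertical axis: left-right
        pixels = [row[::-1] for row in pixels]
    return [list(row) for row in pixels]


# Made-up rows; the 'image' column name is an assumption.
ALIGNMENT_CSV = """image,horizontal_flip,vertical_flip
example_left,0,1
example_right,1,1
"""

flips = {
    row["image"]: (row["horizontal_flip"] == "1", row["vertical_flip"] == "1")
    for row in csv.DictReader(io.StringIO(ALIGNMENT_CSV))
}

tiny_image = [[1, 2], [3, 4]]
left_aligned = apply_flips(tiny_image, *flips["example_left"])
right_aligned = apply_flips(tiny_image, *flips["example_right"])
```

With real images held as NumPy arrays, `numpy.flipud` and `numpy.fliplr` perform the same two operations.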