ClassificationData::reformatAsRegressionData() indexing bug #157

Open
jamiebullock opened this issue Mar 8, 2019 · 0 comments
Contributor

jamiebullock commented Mar 8, 2019

I have found a bug in ClassificationData::reformatAsRegressionData().

Basically, it only works if the user adds training examples with contiguous class labels starting from 1. If class labels do not start from 1 or are non-contiguous, the method crashes at line 1137:

    regressionData.addSample(data[i].getSample(),targetVector);

The reason for this is that getNumClasses() is used to set the size of targetVector but then targetVector is indexed with targetVector[ classLabel-1 ] = 1.

So if we have 2 classes with labels 3 and 4, targetVector is given a size of 2 (valid indices 0 and 1) and is then indexed with 2 and 3, both of which are out of bounds.
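To make the mismatch concrete, here is a minimal standalone sketch of the bounds condition implied by `targetVector[ classLabel-1 ]`. It does not use the library; `labelFitsTargetVector` and `checkReportedCase` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the library): returns true only if
// classLabel - 1 is a valid index into a vector of size numClasses.
bool labelFitsTargetVector(std::size_t numClasses, unsigned int classLabel) {
    if (classLabel == 0) {
        return false; // classLabel - 1 would wrap around for an unsigned label
    }
    return static_cast<std::size_t>(classLabel - 1) < numClasses;
}

// The case from this report: 2 classes, labelled 3 and 4.
inline void checkReportedCase() {
    assert(!labelFitsTargetVector(2, 3)); // index 2 into a vector of size 2
    assert(!labelFitsTargetVector(2, 4)); // index 3 into a vector of size 2
    assert(labelFitsTargetVector(2, 1));  // labels 1 and 2 would be fine
    assert(labelFitsTargetVector(2, 2));
}
```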

I can see some possible solutions to this:

  1. Change the behaviour of MLP::init(), MLP::train(), or both so that non-contiguous classification data is not accepted. For example, have MLP::train() return false if the ClassificationData passed in does not have contiguous class labels.
  2. Use the index of the class label, not the class label itself, as the index into targetVector, then store the class labels in the MLP instance so that a reverse lookup can be performed. This introduces a coupling between MLP and ClassificationData, but MLP is the only class that calls reformatAsRegressionData(), so maybe this is OK.
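Both options can be sketched outside the library as follows. This is only an illustration under stated assumptions: it works on plain `std::vector`s rather than ClassificationData or MLP, and all function names (`collectClassLabels`, `hasContiguousLabelsFromOne`, `makeOneHotTarget`, `labelFromOutput`) are hypothetical, not existing API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the distinct class labels, in ascending order.
std::vector<unsigned int> collectClassLabels(const std::vector<unsigned int>& labels) {
    std::vector<unsigned int> distinct(labels);
    std::sort(distinct.begin(), distinct.end());
    distinct.erase(std::unique(distinct.begin(), distinct.end()), distinct.end());
    return distinct;
}

// Option 1: a check that train() could use to reject unsupported data.
// Expects sorted, distinct labels; true only for exactly 1, 2, ..., N
// (and, vacuously, for an empty list).
bool hasContiguousLabelsFromOne(const std::vector<unsigned int>& classLabels) {
    for (std::size_t i = 0; i < classLabels.size(); ++i) {
        if (classLabels[i] != i + 1) return false;
    }
    return true;
}

// Option 2: build the one-hot target from the label's position in
// classLabels rather than from the label value.
// Returns an empty vector if the label is not in classLabels.
std::vector<double> makeOneHotTarget(const std::vector<unsigned int>& classLabels,
                                     unsigned int classLabel) {
    auto it = std::lower_bound(classLabels.begin(), classLabels.end(), classLabel);
    if (it == classLabels.end() || *it != classLabel) return {};
    std::vector<double> target(classLabels.size(), 0.0);
    target[static_cast<std::size_t>(it - classLabels.begin())] = 1.0;
    return target;
}

// Option 2, reverse lookup: map the index of the largest output back to
// the original class label. Requires a non-empty output with one entry
// per stored class label.
unsigned int labelFromOutput(const std::vector<unsigned int>& classLabels,
                             const std::vector<double>& output) {
    assert(!output.empty() && output.size() == classLabels.size());
    auto maxIt = std::max_element(output.begin(), output.end());
    return classLabels[static_cast<std::size_t>(maxIt - output.begin())];
}
```

With labels 3 and 4, `collectClassLabels` yields {3, 4}, label 3 maps to index 0 and label 4 to index 1, so the target vector of size 2 is never indexed out of bounds.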

I might attempt 2, but I'd appreciate any thoughts on this!
