Error of multiple of replacement in perf for plsda and possible solution #303

Closed
guannan-yang opened this issue Feb 19, 2024 · 4 comments · Fixed by #349
@guannan-yang

guannan-yang commented Feb 19, 2024

Hi mixOmics team and users,


🐞 Describe the bug:
When running the perf function on a plsda/splsda object, this error can occur:

Error in ncomp_opt[measure, ijk] <- which(t(rowMeans(mat.error.rate[[measure_i]][[ijk]])) == :
number of items to replace is not a multiple of replacement length

The error comes from the step that returns the optimal number of components based on the error rates, and it occurs when more than one component shares the same minimum error rate. It happens with leave-one-out cross-validation (validation = "loo", which always uses nrepeat = 1) or with Mfold when nrepeat < 3, and more often when the sample size is very small.
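
The failure can be reproduced in base R without mixOmics. The sketch below uses made-up error rates (the matrix `err` and its labels are assumptions for illustration, not mixOmics internals) in which components 2 and 3 tie for the minimum mean error rate, so `which()` returns two indices and the assignment into a single matrix cell fails.

```r
# Toy error rates: rows are components, columns are repeats (hypothetical values).
err <- matrix(c(0.50, 0.25, 0.25,
                0.50, 0.25, 0.25),
              nrow = 3,
              dimnames = list(paste0("comp", 1:3), c("rep1", "rep2")))

mean_err <- rowMeans(err)                 # comp2 and comp3 share the minimum
ties <- which(mean_err == min(mean_err))  # two indices instead of one

# A single cell to fill, mimicking the shape of the failing assignment in perf.
ncomp_opt <- matrix(NA_integer_, nrow = 1, ncol = 1,
                    dimnames = list("overall", "max.dist"))

# Assigning two values into one matrix cell raises the error quoted above.
msg <- tryCatch({
  ncomp_opt["overall", "max.dist"] <- ties
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```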


💡 Possible solution:
Until an updated version of mixOmics is released, here is my workaround in case you also encounter it:

  1. Download the source code of perf.
  2. Find the function perf.mixo_plsda (line 740 to the end) and copy it.
  3. Paste it into a new R source file and define it as a new function (e.g. named 'mod_perf.mixo_plsda'). Change line 297 in the new file (or line 1036 in the perf source code) to: ncomp_opt[measure, ijk] = which(t(rowMeans(mat.error.rate[[measure_i]][[ijk]])) == min(t(rowMeans(mat.error.rate[[measure_i]][[ijk]]))))[1] (this just adds a [1] at the end of the original line, so that only the first of the tied components is assigned). Then run the code to define the new function.
  4. Set the function's environment so that it can call the other hidden (non-exported) functions in mixOmics: environment(mod_perf.mixo_plsda) <- asNamespace("mixOmics").
  5. Run the function on your splsda or plsda object, e.g. splsda.perf <- mod_perf.mixo_plsda(splsda, validation = "loo"). It should work now!
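
The effect of the one-line change in step 3 can be checked on toy data in base R. This is only a sketch: `mean_err` and the matrix labels are hypothetical stand-ins for the values perf computes internally.

```r
# Hypothetical mean error rates per component; components 2 and 3 tie.
mean_err <- c(comp1 = 0.50, comp2 = 0.25, comp3 = 0.25)

ncomp_opt <- matrix(NA_integer_, nrow = 1, ncol = 1,
                    dimnames = list("overall", "max.dist"))

# Same shape as the modified line: keep only the first tied index.
ncomp_opt["overall", "max.dist"] <- which(mean_err == min(mean_err))[1]

# which.min() also returns the first minimum, so it agrees on this input.
same_as_which_min <- ncomp_opt["overall", "max.dist"] == which.min(mean_err)
```

Taking the first index means that, among tied components, the smallest number of components is reported.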

Hope it helps!

Guan

@guannan-yang guannan-yang added the bug Something isn't working label Feb 19, 2024
@iandanilevicz

Hi @esheeep, I tried validation = 'Mfold', nrepeat = 50 and it displayed the same error, so it is not restricted to small nrepeat. I also tried the proposed change, but without success. The bug persists whether I use loo or Mfold.

@iandanilevicz

Hi all,
Some additional details from my case: the bug persists for plsda with validation = "loo" when there are 3 or more principal components and dist = "max.dist". When I select dist = "centroids.dist" or dist = "mahalanobis.dist", everything is fine. As expected, dist = "all" also doesn't run properly.

@evaham1
Collaborator

evaham1 commented Oct 31, 2024

Hi @guannan-yang and @iandanilevicz! Thanks for flagging this issue, I will look into it. Could you please share a reproducible example that illustrates the bug? Because the error occurs during cross-validation, the example needs a fixed random seed to be reproducible; you can do this by setting RNGseed in the BPPARAM argument of perf. Thanks very much!

@evaham1
Collaborator

evaham1 commented Nov 12, 2024

Add a warning to this fix so it's clear to the user that two solutions were found.
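
One way such a warning could look is sketched below in base R. The helper `pick_ncomp` is hypothetical and is not the code merged into mixOmics; it only illustrates returning the first tied component while telling the user that a tie occurred.

```r
# Hypothetical helper, not part of mixOmics: pick the component with the lowest
# mean error rate and warn when several components tie for the minimum.
pick_ncomp <- function(mean_err) {
  best <- which(mean_err == min(mean_err))
  if (length(best) > 1) {
    warning("Components ", paste(best, collapse = ", "),
            " share the minimum error rate; returning the smallest.")
  }
  unname(best[1])
}

# Toy input (made-up values) where components 2 and 3 tie.
tied <- c(comp1 = 0.50, comp2 = 0.25, comp3 = 0.25)

# Capture the warning as a message so the result is still returned.
ncomp <- withCallingHandlers(
  pick_ncomp(tied),
  warning = function(w) {
    message("caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```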

@evaham1 evaham1 assigned evaham1 and unassigned aljabadi Nov 18, 2024