Refactoring: streamline perf() and tune() functions to do performance assessment on just input model #343

Open · 4 tasks done
evaham1 opened this issue Nov 12, 2024 · 1 comment · Fixed by #344 · May be fixed by #348

evaham1 commented Nov 12, 2024

Describe the issue
The functions perf() and tune() currently have overlapping roles that do not exactly match their names.

Expected behaviour
Based on the function names, one would expect tune() to build and assess multiple models with different parameters to aid parameter selection, whilst perf() would assess the performance of just one final model.

Solution
Refactor the code for perf() and tune() so that they can still be run as before, BUT can also be run in this new way:

# perform tuning for component selection across all distances
tune(X, Y, ncomp = 10, keepX = NULL, dist = NULL)
# perform tuning for variable selection using chosen ncomp and distance
tune(X, Y, ncomp = 3, keepX = c(1, 10, 100), dist = "max.dist")
# once the final model is built, assess performance using just the ncomp it was fitted with
perf(model)
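
For context, the model passed to perf() above would be the final model fitted with the chosen parameters, e.g. (splsda() and the values below are purely illustrative):

# fit the final model with the parameters chosen during tuning, then assess only that model
model <- splsda(X, Y, ncomp = 3, keepX = c(1, 10, 100))
perf(model, validation = "Mfold", folds = 5, nrepeat = 10)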

The old way of running these methods also needs to remain unchanged, i.e.

# tuning for keepX only
tune(X, Y, ncomp = 3, keepX = c(1, 10, 100), dist = "max.dist")
# using perf() to identify optimal number of components
perf(model, ncomp = 10)

Update: for this to be possible we will need to add a new function called perf.assess() rather than edit perf(), as it would not be possible to both add the new functionality and keep the old function mechanism.

Functions to add:

  • perf.assess.mixo.plsda() == perf.assess.mixo.splsda()
  • perf.assess.mixo.pls() == perf.assess.mixo.spls()
  • perf.assess.sgccda()
  • perf.assess.mint.plsda() == perf.assess.mint.splsda()
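
A minimal sketch of how the proposed perf.assess() generic could dispatch (argument names and defaults are illustrative only; the method below is named after the mixo_splsda S3 class):

perf.assess <- function(object, ...) {
  UseMethod("perf.assess")
}

perf.assess.mixo_splsda <- function(object, validation = "Mfold", folds = 10,
                                    nrepeat = 1, ...) {
  # unlike perf(), do not sweep over 1:ncomp - assess only the model as fitted,
  # i.e. with its own ncomp and keepX (delegating to perf() here is just a
  # placeholder for that restricted cross-validation loop)
  perf(object, validation = validation, folds = folds, nrepeat = nrepeat, ...)
}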
@evaham1 added the refactorisation label Nov 12, 2024
@evaham1 self-assigned this Nov 12, 2024
@evaham1 linked a pull request Nov 13, 2024 that will close this issue
@evaham1 changed the title from "Refactoring: streamline perf() and tune() functions so they perform different roles" to "Refactoring: streamline perf() functions to do performance assessment on just input model" Nov 18, 2024
@evaham1 removed their assignment Nov 18, 2024
@evaham1 changed the title from "Refactoring: streamline perf() functions to do performance assessment on just input model" to "Refactoring: streamline perf() and tune() functions to do performance assessment on just input model" Nov 19, 2024
evaham1 commented Nov 19, 2024

Going through tune() functions to see which need to be updated:

  • tune.pca() is fine - tunes ncomp

  • tune.rcc() is fine - tunes lambda1 and lambda2

  • tune.spca() is fine - tunes ncomp and test.keepX simultaneously, but, as described in the sPCA Multidrug Case Study, users should use tune.pca() to choose ncomp and then tune.spca() to choose which variables to keep; see the sketch below
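
That two-step workflow looks roughly like this (values illustrative):

# step 1: choose ncomp from the explained variance reported by tune.pca()
tune.pca.res <- tune.pca(X, ncomp = 10)
# step 2: with ncomp fixed, tune how many variables to keep per component
tune.spca.res <- tune.spca(X, ncomp = 3, test.keepX = c(5, 15, 25),
                           folds = 5, nrepeat = 3)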

  • tune.splslevel() does not need to be edited - only for variable selection

  • tune.splsda() tunes ncomp and test.keepX but doesn't work when test.keepX = NULL

library(mixOmics)
data(liver.toxicity)

plot(tune(method = "splsda", liver.toxicity$gene, liver.toxicity$treatment$Treatment.Group,
          test.keepX = c(15, 25), ncomp = 3, nrep = 3)) # works
plot(tune(method = "splsda", liver.toxicity$gene, liver.toxicity$treatment$Treatment.Group,
          test.keepX = NULL, ncomp = 3, nrep = 3)) # fails
  • tune.plsda(), which would only tune ncomp, doesn't exist (it should essentially do the same as perf()); see the sketch below
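A rough sketch of what a tune.plsda() that only tunes ncomp could look like, assuming it simply fits a PLS-DA model at the maximum ncomp and hands cross-validation over to perf() (argument names and defaults are illustrative):

tune.plsda <- function(X, Y, ncomp = 2, validation = "Mfold", folds = 10,
                       nrepeat = 1, ...) {
  # fit a PLS-DA model with the largest candidate ncomp, then let perf()
  # cross-validate each component count to pick the optimal one
  model <- plsda(X, Y, ncomp = ncomp)
  perf(model, validation = validation, folds = folds, nrepeat = nrepeat, ...)
}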
  • tune.spls() tunes ncomp, test.keepX and test.keepY, but fails if both test.keepX = NULL and test.keepY = NULL
plot(tune.spls(liver.toxicity$gene, liver.toxicity$clinic, ncomp = 3,
               test.keepX = c(5, 10, 15), test.keepY = c(3, 6, 8),
               folds = 5, nrepeat = 3, progressBar = TRUE)) # works
plot(tune.spls(liver.toxicity$gene, liver.toxicity$clinic, ncomp = 3,
               test.keepX = NULL, test.keepY = c(3, 6, 8),
               folds = 5, nrepeat = 3, progressBar = TRUE)) # works
plot(tune.spls(liver.toxicity$gene, liver.toxicity$clinic, ncomp = 3,
               test.keepX = c(5, 10, 15), test.keepY = NULL,
               folds = 5, nrepeat = 3, progressBar = TRUE)) # works
plot(tune.spls(liver.toxicity$gene, liver.toxicity$clinic, ncomp = 3,
               test.keepX = NULL, test.keepY = NULL,
               folds = 5, nrepeat = 3, progressBar = TRUE)) # fails
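One possible way to handle the failing case, sketched under the assumption that an empty grid should simply fall back to a plain performance assessment (not the actual implementation):

# inside tune.spls(): if no keepX/keepY grid is supplied, skip the grid search
# and just cross-validate the full model, as perf() would
if (is.null(test.keepX) && is.null(test.keepY)) {
  model <- spls(X, Y, ncomp = ncomp)
  return(perf(model, validation = validation, folds = folds, nrepeat = nrepeat))
}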
  • tune.spls1(), which is called inside tune.spls(), can stay the same, but update the examples and make sure spls1 models can be tuned in the new way using tune.spls()

  • tune.pls(), which would only tune ncomp, doesn't exist (it should essentially do the same as perf()); also make sure this runs for pls1 models

  • tune.block.splsda() tunes test.keepX for each data block and component; needs editing so it can just run perf() if test.keepX = NULL

  • tune.block.plsda(), which would only tune ncomp, doesn't exist (it should basically call perf())

  • tune.mint.splsda() tunes test.keepX; needs editing so it can just run perf() if test.keepX = NULL

  • tune.mint.plsda(), which would only tune ncomp, doesn't exist (it should basically call perf())

  • put all new tuning functions into the tune() wrapper and add unit tests to make sure they run the same in and out of the wrapper. Also check how all new and adapted tune functions work with a variety of parameters such as modes, distances, validation methods, multilevel, etc.

  • add unit tests for all new tune functions to check they give the same outputs as perf() on the same models; see the sketch below
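
An illustrative testthat sketch of the wrapper-consistency check (the dataset, seed handling, and compared output elements are assumptions, not the final tests):

library(testthat)
library(mixOmics)
data(liver.toxicity)

test_that("tune() wrapper matches tune.splsda()", {
  set.seed(42)
  direct <- tune.splsda(liver.toxicity$gene, liver.toxicity$treatment$Treatment.Group,
                        ncomp = 2, test.keepX = c(5, 10), folds = 3, nrepeat = 1)
  set.seed(42)
  wrapped <- tune(method = "splsda", liver.toxicity$gene,
                  liver.toxicity$treatment$Treatment.Group,
                  ncomp = 2, test.keepX = c(5, 10), folds = 3, nrepeat = 1)
  # assumes the wrapper forwards arguments unchanged and the seed fixes the CV folds
  expect_equal(wrapped$error.rate, direct$error.rate)
})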

@evaham1 reopened this Nov 22, 2024
@evaham1 self-assigned this Nov 26, 2024
@evaham1 linked a pull request Nov 26, 2024 that will close this issue