-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Systematic framework tests for all machines #4712
base: develop
Are you sure you want to change the base?
Conversation
} | ||
} | ||
|
||
TEST(AllMachines, cv_thread_consistency) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@theartful these are the kind of tests I was talking about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are very nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think separating the data generating code to the public interface would be a good idea? examples would benefit from them for instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I want to move all that into an environment similar to those used by trained_model_serialization
so the data can be re-used for other instances. Will add that to the list, thanks
8c4fb25
to
123757c
Compare
@@ -1,206 +0,0 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are subsumed by the new ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant and inconsistent
|
||
TEST(AllMachines, train_thread_consistency) | ||
{ | ||
std::set<std::string> ignores = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy-paste all around the source... plz avoid redundancy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the ignores are test-specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah 3x times the very same ignore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is because the number of tested machines is small and not yet automated. Although there are already differences here, see random forest. For more machines tested, there will be more diversity. If not, i can of course drop this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you copy paste 3 times the value... you could create a value of it instead of copy-pasting the same string around. 1 copy-paste is one too many imo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, i will look into this once the coverage is bigger
cv->put("seed", 1); | ||
auto result_single = cv->evaluate(); | ||
|
||
get_global_parallel()->set_num_threads(4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a by-chance checking of the consistency... depends on many aspects. for example on the machine that the code is being run on etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, but this already revealed bugs. open for ways to improve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's one thing that you have a test locally to have this tested... its another thing that have this formalised in a test that is part of the codebase and the CI that is going to be run on many different arch/distro etc...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for example:
- do not use global_parallel - that is going to be dropped eventually - the sooner the better
- why 4, why not 8, or 3 or 11 or 42? you could actually use a function to get the runtime env supported number of max threads and use that number, and if it's 1 then say that foobar.
....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the number is of course arbitrary, good idea to use the available number of threads, i will do that
init_machine(machine); | ||
auto data = generate_data(machine); | ||
|
||
machine->set_labels(data.second); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the triplet: set_labels, train, apply sprayed around this source again... while it could be a function that returns a tuple that you nicely tie and get it back
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for further info: https://en.cppreference.com/w/cpp/language/structured_binding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but in this case you would actually just return the predictions... so you dont even need that... just pass the rvalue of the tuple (labels, data)
123757c
to
79d7838
Compare
} | ||
|
||
// TODO, generate this automatically, like in trained_model_serialization | ||
std::set<string> all_machines = {"LibSVM", "Perceptron", "LibLinear", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at least make a function that returns this set instead of having it here a global var....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do once this extraction is automated
|
||
get_global_parallel()->set_num_threads(1); | ||
machine->set_labels(data.second); | ||
if (machine->has("seed")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's actually a function for this: seed(machine, 1);
auto result_single = machine->apply(data.first); | ||
|
||
init_machine(machine); | ||
get_global_parallel()->set_num_threads(4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
total random number....
get_global_parallel()->set_num_threads(4); | ||
machine2->set_labels(data.second); | ||
if (machine2->has("seed")) | ||
machine2->put("seed", 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess the seed should be the same as above, right? in that case again just copy-pasting the same number around... refactoring is a pain.. use a variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
machine2->set_labels(data.second); | ||
if (machine2->has("seed")) | ||
machine2->put("seed", 1); | ||
machine2->train(data.first); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as below: set_labels, train, apply. could you use a function for this?
the seeding part could be fixed by having that function a 4th arg that has a default value of some special number that means no seeding...
machine_subsampled->put("seed", 1); | ||
} | ||
|
||
machine_subset->set_labels(labels_subset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set_labels, train, apply... all over again
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
testing
trained_model_serialization.cc.py
tests