Documentation #16

gabays · 2021-12-27T09:52:57Z

1. State clearly at the beginning which steps are compulsory and which are not.
2. State clearly what kind of data one will need:
a. a reference and a test set?
b. one file per author? Or are multiple files OK if their names start with the same string?
3. Number the three steps to make the order clear (and the fact that there are three steps, the second being optional).
4. Give an example of debug_authors.csv, feature_list.json, feats_tests.csv, langcert_revised.csv… so that we know what kind of data you expect (what is a column, what is a row…).
5. Move "Alternatively, you can choose to do not specific split, but to use a leave-one-out approach." to just under the title of that part, so that it is clear that it is not a compulsory step.
6. Add a couple of lines on how to choose the --sampling options.
7. Provide an example to play with, so that people can check whether everything works fine and observe the structure of the data.

With that you should solve a lot of problems (and avoid a lot of emails like mine)


EtienneFerrandi · 2022-05-30T20:42:31Z

Here is my script:

python main.py -s train/* -t chars -n 3 
mv feats_tests_n3_k_5000.csv train.csv
python main.py -s test/* -t chars -n 3 -f feature_list_chars3grams5000mf.json
mv feats_tests_n3_k_5000.csv test.csv
python train_svm.py train.csv --test_path test.csv --norms --final

Notice that, for the first main.py, I get "K Limit ignored because the size of the list is lower (3302 < 5000)".

Then I get this error from svm.py, l. 190:

myclasses = pipe.classes_
decs = pipe.decision_function(test)
dists = {}
for myclass in enumerate(myclasses):
    dists[myclass[1]] = [d[myclass[0]] for d in decs]

-->

dists[myclass[1]] = [d[myclass[0]] for d in decs]
IndexError: invalid index to scalar variable.
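A possible explanation, assuming pipe is a scikit-learn pipeline and the training set contains exactly two authors: in that case decision_function returns a 1-D array with one signed score per sample (positive values favour classes_[1]) rather than one column per class, so each d is a scalar and d[myclass[0]] fails. The sketch below uses NumPy only and hypothetical scores; the helper name decisions_by_class and the choice to negate the score for the first class are assumptions made for illustration, not the project's code.

```python
import numpy as np


def decisions_by_class(classes, decs):
    """Map each class label to its list of decision scores, one per sample.

    Hypothetical helper: accepts either a 2-D array (one column per class)
    or the 1-D array returned for a two-class problem.
    """
    decs = np.asarray(decs)
    if decs.ndim == 1:
        # Two-class case: a single signed score per sample.
        if len(classes) != 2:
            raise ValueError("1-D decision scores require exactly two classes")
        # Assumption: report the negated score for the first class so that
        # both labels get a column; positive scores favour classes[1].
        return {classes[0]: (-decs).tolist(), classes[1]: decs.tolist()}
    # Multi-class case: column i holds the scores for classes[i].
    return {label: decs[:, i].tolist() for i, label in enumerate(classes)}


# Hypothetical scores, standing in for pipe.decision_function(test).
binary = decisions_by_class(["A", "B"], np.array([0.7, -1.2]))
multi = decisions_by_class(
    ["A", "B", "C"],
    np.array([[0.1, 0.5, -0.3], [0.9, -0.2, 0.0]]),
)
print(binary)
print(multi)
```

If this is indeed the cause, checking how many distinct authors end up in train.csv would confirm it.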
