Documentation #16

Open
gabays opened this issue Dec 27, 2021 · 1 comment

Comments

@gabays

gabays commented Dec 27, 2021

  1. State clearly at the beginning which steps are compulsory and which are not
  2. State clearly what kind of data one will need:
    a. a reference and a test set?
    b. one file per author? Or are multiple files per author OK if their filenames start with the same name?
  3. Give a number to the three steps to be clear about the order (and the fact that there are three steps, the second being optional)
  4. Give an example of debug_authors.csv, feature_list.json, feats_tests.csv, langcert_revised.csv… so that we know what kind of data you expect (what is a column, what is a row…)
  5. Move the sentence "Alternatively, you can choose to do not specific split, but to use a leave-one-out approach." to just under the title of that part, so that it is clear that it is not a compulsory step
  6. Drop a couple of lines on how to choose the --sampling options
  7. Provide an example to play with, so that people can check that everything works fine and observe the structure of the data

With that, you should solve a lot of problems (and avoid a lot of emails like mine).

@EtienneFerrandi

Here is my script:

python main.py -s train/* -t chars -n 3 
mv feats_tests_n3_k_5000.csv train.csv
python main.py -s test/* -t chars -n 3 -f feature_list_chars3grams5000mf.json
mv feats_tests_n3_k_5000.csv test.csv
python train_svm.py train.csv --test_path test.csv --norms --final

Note that, for the first main.py run, I get the message "K Limit ignored because the size of the list is lower (3302 < 5000)".

Then I get this error from svm.py, line 190:

        myclasses = pipe.classes_
        decs = pipe.decision_function(test)
        dists = {}
        for myclass in enumerate(myclasses):
            dists[myclass[1]] = [d[myclass[0]] for d in decs]

-->

dists[myclass[1]] = [d[myclass[0]] for d in decs]
IndexError: invalid index to scalar variable.
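
A possible explanation, offered as a guess rather than a diagnosis: if the final estimator in pipe is a scikit-learn SVM and the training set contains exactly two classes, decision_function returns a one-dimensional array with one signed score per sample (positive values favour classes_[1]) instead of one column per class. Each d is then a scalar, and d[myclass[0]] fails with exactly this IndexError. The sketch below shows one way to handle both shapes. The function name scores_by_class is hypothetical and not part of the project, and plain lists stand in for the arrays so that the example runs without any dependency.

```python
from numbers import Real


def scores_by_class(classes, decs):
    """Map each class label to its list of per-sample decision scores.

    `decs` is either a list of rows with one score per class (multiclass
    case) or a flat list of signed scalars (binary case).
    """
    decs = list(decs)
    if decs and isinstance(decs[0], Real):
        # Binary case: a single signed score per sample.
        # Assumption: a positive score favours classes[1], so the score
        # for classes[0] is taken as its negation.
        if len(classes) != 2:
            raise ValueError("flat scores are only meaningful for two classes")
        return {
            classes[0]: [-d for d in decs],
            classes[1]: list(decs),
        }
    # Multiclass case: column i of each row belongs to classes[i].
    return {label: [row[i] for row in decs] for i, label in enumerate(classes)}


# Binary input: each element is a scalar, as in the failing run.
binary = scores_by_class(["A", "B"], [0.5, -1.25])

# Multiclass input: each element is a row with one score per class.
multi = scores_by_class(["A", "B", "C"], [[1, 2, 3], [4, 5, 6]])
```

Negating the score for the first class is only a convention chosen for this sketch; whether it suits the distances computed in svm.py is for the maintainers to decide.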
