The Python Graphical Authorship Attribution Program (PyGAAP) is a text classification tool written in Python and a port of JGAAP, the original Java version. It follows the typical steps of a text classification task: pre-processing, feature extraction, feature culling, text embedding, and analysis.
- Please see the developer manual if you would like to extend PyGAAP.
- Launch PyGAAP.
- Click on `Files` $\rightarrow$ `AAAC Problems` $\rightarrow$ `Problem A`.
- Switch to the `Event drivers` tab, click on `Character NGrams`, then `Add`.
- Switch to the `Embeddings` tab and click on `Frequency`, then `Add`.
- Switch to the `Analysis Methods` tab and click on `Linear SVM (sklearn)`, then `Add`.
- Finally, switch to the `Review & Process` tab, press `Process`, and wait for the results!
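
To make the stages chosen above more concrete, here is a minimal standard-library sketch of the same idea: character n-grams as events, relative frequencies as the embedding, and a classifier on top. It is not PyGAAP's code or API. All function names (`character_ngrams`, `frequency_embedding`, `cosine_distance`, `attribute`) are hypothetical, the toy texts are made up, and a nearest-neighbor rule with cosine distance stands in for the Linear SVM so that the example needs no third-party packages.

```python
from collections import Counter
import math


def character_ngrams(text, n=2):
    """Event driver stand-in: slide a window of n characters over the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def frequency_embedding(events, vocabulary):
    """Embedding stand-in: relative frequency of each vocabulary event."""
    counts = Counter(events)
    total = len(events) or 1  # avoid division by zero on empty input
    return [counts[event] / total for event in vocabulary]


def cosine_distance(a, b):
    """Distance function stand-in: 1 - cosine similarity (1.0 if a vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def attribute(known_docs, unknown_text, n=2):
    """Analysis stand-in (NOT an SVM): return the author of the nearest known document.

    known_docs is a list of (author, text) pairs.
    """
    known_events = [(author, character_ngrams(text, n)) for author, text in known_docs]
    # The vocabulary is built from the known documents only.
    vocabulary = sorted({event for _, events in known_events for event in events})
    unknown_vec = frequency_embedding(character_ngrams(unknown_text, n), vocabulary)
    scored = [
        (cosine_distance(frequency_embedding(events, vocabulary), unknown_vec), author)
        for author, events in known_events
    ]
    return min(scored)[1]


# Toy data, purely illustrative.
known = [("A", "aaaa aaaa aaab"), ("B", "zzzz zzzz zzzy")]
guess = attribute(known, "aaa aab")
```

In the GUI walkthrough, each of these roles is filled by the module picked in the corresponding tab; the sketch only shows how the output of one stage feeds the next.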
The module types in PyGAAP have alternative names one may have seen in academic papers or other machine learning programs. Here are some of the names one may encounter.
- `Canonicizers`: Pre-processing; Text normalization.
- `Event Drivers`: Feature (set) extraction, Characteristics extraction, "Write-print" (a particular feature extraction method).
- `Event Culling`: Feature (set) culling/filtering.
- `Embeddings`: Feature embedding.
- `Analysis Methods`: Classifiers, Algorithms.
- `Distance Functions`: Distances, Metrics.
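
As a rough illustration of two of the less familiar terms, the sketch below shows what a canonicizer and an event culler might do in plain Python. The function names `canonicize` and `cull_most_common` are hypothetical and are not PyGAAP module names; the specific choices (lowercasing, whitespace collapsing, keeping the most frequent events) are assumptions picked for the example.

```python
from collections import Counter


def canonicize(text):
    """Canonicizer stand-in: lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def cull_most_common(event_lists, keep=3):
    """Event culling stand-in: keep only the `keep` most frequent events corpus-wide.

    event_lists holds one list of events (features) per document.
    """
    totals = Counter(event for events in event_lists for event in events)
    kept = {event for event, _ in totals.most_common(keep)}
    # Filter each document's events, preserving their original order.
    return [[event for event in events if event in kept] for events in event_lists]


cleaned = canonicize("  Hello\n  WORLD ")
culled = cull_most_common([["a", "b", "a"], ["a", "c", "b"]], keep=2)
```

Canonicizing happens before events are extracted, and culling happens after, so the embedding only sees the events that survive the cull.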
parameters
PyGAAP is compatible with the JGAAP experiment CSV format. Simply run the CLI as you would with a PyGAAP CSV. Note that because JGAAP's experiment CSV format does not specify an embedding, the default will always be `Frequency`.