Author: Antoine Amarilli <firstname.lastname@example.org>
Date: Sat, 17 Aug 2019 18:32:10 +0200
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/README b/README
@@ -58,10 +58,11 @@ the approach here to other linguistic features, read on.
The master script is make.sh which accepts French text on stdin and a
list of exceptions files as arguments. Included exception files are
-datasets/additions and datasets/wikipedia. These exceptions are just like
-training data and are not stored as-is; they are just piped later on in the
-training phase. make.sh produces on stdout the json trie. Thus, you would run
-something like the following, where corpus is your corpus:
+datasets/additions and datasets/wikipedia. These exceptions are just
+like training data and are not stored as-is; they are just piped later
+on in the training phase. make.sh produces on stdout the json trie.
+Thus, you would run something like the following, where corpus is your
+corpus:
cat corpus | ./make.sh ../datasets/additions ../datasets/wikipedia > ../haspirater/haspirater.json
@@ -111,8 +112,9 @@ confidence values. We also drop useless leaf nodes there.
You can use trie2dot.py to convert the output of buildtrie.py or
compresstrie.py in the dot format which can be used to render a drawing
of the trie ("trie2dot.py h 0 1"). The result of such a drawing is given
-as haspirater.pdf (before majoritytrie.py: contains frequency info, but
-more nodes) and haspirater_majority.pdf (no frequency, less nodes).
+as plots/haspirater.pdf (before majoritytrie.py: contains frequency
+info, but more nodes) and plots/haspirater_majority.pdf (no frequency,
+less nodes).
You can use leavestrie.py to get the leaves of a trie.