haspirater

detect aspirated 'h' in French words (local mirror of https://gitlab.com/a3nm/haspirater)
git clone https://a3nm.net/git/haspirater/

commit f00ce2b41963037a63ea8c85c039f5044a495cf6
parent f2e5f523cca1bc4ef8d37da49dae9ba9aa263959
Author: Antoine Amarilli <a3nm@a3nm.net>
Date:   Sat, 17 Aug 2019 18:32:10 +0200

fix paths

Diffstat:
README | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/README b/README
@@ -58,10 +58,11 @@ the approach here to other linguistic features, read on.
 
 The master script is make.sh which accepts French text on stdin and a
 list of exceptions files as arguments. Included exception files are
-datasets/additions and datasets/wikipedia. These exceptions are just like
-training data and are not stored as-is; they are just piped later on in the
-training phase. make.sh produces on stdout the json trie. Thus, you would run
-something like the following, where corpus is your corpus:
+datasets/additions and datasets/wikipedia. These exceptions are just
+like training data and are not stored as-is; they are just piped later
+on in the training phase. make.sh produces on stdout the json trie.
+Thus, you would run something like the following, where corpus is your
+corpus:
 
   cd training
   cat corpus | ./make.sh ../datasets/additions ../datasets/wikipedia > ../haspirater/haspirater.json
@@ -111,8 +112,9 @@ confidence values. We also drop useless leaf nodes there.
 You can use trie2dot.py to convert the output of buildtrie.py or
 compresstrie.py in the dot format which can be used to render a drawing
 of the trie ("trie2dot.py h 0 1"). The result of such a drawing is given
-as haspirater.pdf (before majoritytrie.py: contains frequency info, but
-more nodes) and haspirater_majority.pdf (no frequency, less nodes).
+as plots/haspirater.pdf (before majoritytrie.py: contains frequency
+info, but more nodes) and plots/haspirater_majority.pdf (no frequency,
+less nodes).
 
 You can use leavestrie.py to get the leaves of a trie.
 