haspirater

detect aspirated 'h' in French words (local mirror of https://gitlab.com/a3nm/haspirater)
git clone https://a3nm.net/git/haspirater/

commit fd8a4d40f9e1be23ce43272c3059c5d6493481e1
parent ef17fb57020d7726cf6bf9730e84bee18b4f8a5f
Author: Antoine Amarilli <a3nm@a3nm.net>
Date:   Fri, 16 Aug 2019 00:25:46 +0200

consistent reflowing

Diffstat:
README | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/README b/README
@@ -45,14 +45,16 @@ one possibility is returned even when both are attested.
 
 == 4. Training ==
 
-The training data used by haspirater/haspirater.py is loaded at runtime from the
-haspirater/haspirater.json file which has been trained from French texts taken
-from Project Gutenberg <www.gutenberg.org>, from the list in the Wikipedia
-article <http://fr.wikipedia.org/wiki/H_aspir%C3%A9>, from the categories in the
-French Wiktionary <http://fr.wiktionary.org/wiki/Catégorie:Mots_à_h_muet> and
-<http://fr.wiktionary.org/wiki/Catégorie:Mots_à_h_aspiré> and from a custom set
-of exceptions. If you want to create your own data, or adapt the approach here
-to other linguistic features, read on.
+The training data used by haspirater/haspirater.py is loaded at runtime
+from the haspirater/haspirater.json file which has been trained from
+French texts taken from Project Gutenberg <www.gutenberg.org>, from the
+list in the Wikipedia article
+<http://fr.wikipedia.org/wiki/H_aspir%C3%A9>, from the categories in the
+French Wiktionary
+<http://fr.wiktionary.org/wiki/Catégorie:Mots_à_h_muet> and
+<http://fr.wiktionary.org/wiki/Catégorie:Mots_à_h_aspiré> and from a
+custom set of exceptions. If you want to create your own data, or adapt
+the approach here to other linguistic features, read on.
 
 The master script is make.sh which accepts French text on stdin and a list of
 exceptions files as arguments. Included exception files are
@@ -94,8 +96,8 @@ trie carrying the value count for each occurrence having a given prefix.
 === 5.5. Trie compression (compresstrie.py) ===
 
 The trie is then compressed by removing branches which are not needed to
-infer a value, because the only possible value is already determined at
-that stage.
+infer a value, because the only possible value is already determined at
+that stage.
 
 === 5.6. Trie majority relabeling (majoritytrie.py) ===
 
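
Note on section 5.5 of the README touched by this diff: the compression
step it describes can be illustrated with a minimal sketch. This is not
the code of compresstrie.py. The node layout (a dict with the
hypothetical keys "counts" and "children"), the value labels "aspirated"
and "mute", and the function name compress are all assumptions made for
illustration.

```python
def compress(node):
    """Return a copy of the trie without branches that cannot change the value.

    Assumed (hypothetical) node layout, not taken from compresstrie.py:
      {"counts": {value: occurrences}, "children": {letter: node}}
    """
    # Values that were actually observed for words sharing this prefix.
    seen = [value for value, n in node["counts"].items() if n > 0]
    if len(seen) <= 1:
        # Only one value is possible here, so deeper letters add nothing.
        return {"counts": dict(node["counts"]), "children": {}}
    # Several values remain possible: keep the children and recurse.
    return {
        "counts": dict(node["counts"]),
        "children": {
            letter: compress(child)
            for letter, child in node["children"].items()
        },
    }


# Hypothetical toy trie: the value is settled after the first letter.
example = {
    "counts": {"aspirated": 1, "mute": 2},
    "children": {
        "a": {
            "counts": {"aspirated": 1},
            "children": {"r": {"counts": {"aspirated": 1}, "children": {}}},
        },
        "o": {
            "counts": {"mute": 2},
            "children": {
                "m": {"counts": {"mute": 1}, "children": {}},
                "n": {"counts": {"mute": 1}, "children": {}},
            },
        },
    },
}

compressed = compress(example)
```

In this toy input the root still distinguishes "a" from "o", but
everything below those two nodes is dropped because each of them
already determines a single value.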