initial commit - frhyme - guess the last phonemes of a French word

commit 7bbc79a5ef1dee811df235f8d8f43703c6a74fa1
Author: Antoine Amarilli <a3nm@a3nm.net>
Date:   Mon, 30 May 2011 04:41:16 -0400

initial commit

Diffstat:
README  | 105 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
additions  | 29 +++++++++++++++++++++++++++++
buildtrie.py  | 42 ++++++++++++++++++++++++++++++++++++++++++
compresstrie.py  | 22 ++++++++++++++++++++++
detect.pl  | 22 ++++++++++++++++++++++
haspirater.json  | 1 +
haspirater.py  | 40 ++++++++++++++++++++++++++++++++++++++++
majoritytrie.py  | 24 ++++++++++++++++++++++++
make.sh  | 13 +++++++++++++
prepare.sh  | 6 ++++++
trie2dot.py  | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
wikipedia  | 593 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

12 files changed, 957 insertions(+), 0 deletions(-)
diff --git a/README b/README
@@ -0,0 +1,105 @@
+haspirater -- a toolkit to detect aspirated 'h' in French words
+Copyright (C) 2011 by Antoine Amarilli
+
+== 0. Licence ==
+
+Permission is hereby granted, free of charge, to any person obtaining a
+copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be included
+in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+== 1. Features ==
+
+haspirater is a tool to detect if a French word starts with an aspirated
+'h' or not. It is not based on a list of words but on a trie trained
+from a corpus, which ensures that it should do a reasonable job for
+unseen words which are similar to known ones, without carrying a big
+exceptions list. The json trie used is less than 5 Kio, and the lookup
+script is 40 lines of Python.
+
+== 2. Usage ==
+
+If you just want to use the included training data, you can either run
+haspirater.py, giving one word per line in stdin and getting the
+annotation on stdout, or you can import it in a Python file and call
+haspirater.lookup(word) which returns True if the leading 'h' is
+aspirated, False if it isn't, and raises ValueError if there is no
+leading 'h'.
+
+Please report any errors in the training data, keeping in mind that only
+one possibility is returned even when both are attested.
+
+== 3. Training ==
+
+The training data used by haspirater.py is loaded at runtime from the
+haspirater.json file which has been trained from French texts taken from
+Project Gutenberg <www.gutenberg.org>, from the list in the Wikipedia
+article <http://fr.wikipedia.org/wiki/H_aspir%C3%A9>, and from a custom
+set of exceptions. If you want to create your own data, or adapt the
+approach here to other linguistic features, read on.
+
+The master script is make.sh which accepts French text on stdin and a
+list of exceptions files as arguments. Included exception files are
+additions and wikipedia. These exceptions are just like training data
+and are not stored as-is; they are just piped later on in the training
+phase. make.sh produces on stdout the json trie. Thus, you would run
+something like:
+
+  $ cat corpus | ./make.sh exceptions > haspirater.json
+
+== 4. Training details ==
+
+=== 4.1. Corpus preparation (prepare.sh) ===
+
+This script removes useless characters, and separates words (one per
+line).
+
+=== 4.2. Property inference (detect.pl) ===
+
+This script examines the output of prepare.sh, notices occurrences of
+words for which the preceding word indicates the aspirated or
+non-aspirated status, and outputs them.
+
+=== 4.3. Removing leading 'h' ===
+
+make.sh strips the leading 'h', shared by all keys, to shrink the trie.
+
+=== 4.4. Trie construction (buildtrie.py) ===
+
+The occurrences are read one after the other and are used to populate a
+trie carrying the value count for each occurrence having a given prefix.
+
+=== 4.5. Trie compression (compresstrie.py) ===
+
+The trie is then compressed by removing branches which are not needed to
+infer a value. This step could be followed by a removal of branches with
+very little dissent from the majority value if we wanted to reduce the
+trie size at the expense of accuracy: for aspirated h, this isn't
+needed.
+
+=== 4.6. Trie majority relabeling (majoritytrie.py) ===
+
+Instead of the list of values with their counts, nodes are relabeled to
+carry the most common value. This step could be skipped to keep
+confidence values.
+
+== 5. Additional stuff ==
+
+You can use trie2dot.py to convert the output of buildtrie.py or
+compresstrie.py to the dot format, which Graphviz can use to render a
+drawing of the trie.
+
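
Note on sections 4.4 to 4.6 of the README above: the three trie steps can be sketched in a single Python file, as below. This is only an illustration under assumptions, not the code from this commit: it uses `collections.Counter` for the per-node counts and a four-word toy sample (labels taken from the `additions` and `wikipedia` files), whereas the real scripts exchange the trie as JSON over stdin/stdout.

```python
from collections import Counter

def empty_node():
    # [value counts for this prefix, children keyed by next letter]
    return [Counter(), {}]

def insert(node, key, val):
    """Count val at every node along the path spelled by key."""
    node[0][val] += 1
    if key:
        child = node[1].setdefault(key[0], empty_node())
        insert(child, key[1:], val)

def compress(node):
    """Drop the children of any node where only one value was seen."""
    if len(node[0]) <= 1:
        node[1] = {}
    for child in node[1].values():
        compress(child)

def majority(node):
    """Replace the counts at each node by the most frequent value."""
    node[0] = max(node[0], key=node[0].get)
    for child in node[1].values():
        majority(child)

# Toy sample (assumption: a tiny subset of the labels in this commit).
samples = [("1", "hache"), ("1", "hibou"), ("0", "heure"), ("1", "heurt")]

trie = empty_node()
for val, word in samples:
    # drop the leading 'h' and append ' ' as the end-of-word marker
    insert(trie, word[1:] + " ", val)
compress(trie)
majority(trie)
```

After these steps the branch for "ha" is a single leaf, because every sample under it agrees, while the branch for "heur" keeps distinct children for "heure" and "heurt".
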
diff --git a/additions b/additions
@@ -0,0 +1,29 @@
+1 heaume
+1 hè1ment
+1 hertz
+1 héraut
+1 hit-parade
+1 high-five
+1 hlm
+1 hobby
+1 hongrois
+1 homard
+1 hoquet
+1 hors-piste
+1 hors-bord
+1 huée
+1 hildegarde
+1 hiroshima
+1 heimatlos
+0 Haÿ-1s-Roses
+0 heur
+0 heure
+0 h
+1 have1r
+0 hallucination
+0 hallucine
+0 halène
+0 halèner
+0 hadopisme
+1 hadopi
+0 hellénisme
diff --git a/buildtrie.py b/buildtrie.py
@@ -0,0 +1,42 @@
+#!/usr/bin/env python3
+
+"""From a list of values (arbitrary) and keys (words), create a trie
+representing this mapping"""
+
+import json
+import sys
+
+# first item is a dictionary from values to an int indicating the
+# number of occurrences with this prefix having this value
+# second item is a dictionary from letters to descendant nodes
+def empty_node():
+  return [{}, {}]
+
+trie = empty_node()
+
+def insert(trie, key, val):
+  """Insert val for key in trie"""
+  values, children = trie
+  # create a new value, if needed
+  if val not in values.keys():
+    values[val] = 0
+  # increment count for val
+  values[val] += 1
+  if len(key) > 0:
+    # create a new node if needed
+    if key[0] not in children.keys():
+      children[key[0]] = empty_node()
+    # recurse
+    return insert(children[key[0]], key[1:], val)
+
+for line in sys.stdin.readlines():
+  line = line.split()
+  value = line[0]
+  word = line[1] if len(line) == 2 else ''
+  # a trailing space is used to mark termination of the word
+  # this is useful in cases where a prefix of a word is a complete,
+  # different word with a different value
+  insert(trie, word+' ', value)
+
+print(json.dumps(trie))
+
diff --git a/compresstrie.py b/compresstrie.py
@@ -0,0 +1,22 @@
+#!/usr/bin/env python3
+
+"""Read json trie in stdin, trim unneeded branches and output json dump
+to stdout"""
+
+import json
+import sys
+
+trie = json.load(sys.stdin)
+
+def compress(trie):
+  """Compress the trie"""
+  if len(trie[0].keys()) <= 1:
+    # no need for children, there is no more doubt
+    trie[1] = {}
+  for child in trie[1].values():
+    compress(child)
+
+compress(trie)
+
+print(json.dumps(trie))
+
diff --git a/detect.pl b/detect.pl
@@ -0,0 +1,22 @@
+#!/usr/bin/perl
+
+# From a list of '\n'-separated words, output occurrences of words
+# starting with 'h' when it can be inferred whether the word is aspirated
+# or not. The format is "0 word" for non-aspirated and "1 word" for
+# aspirated.
+
+my $asp = -1; # will the next word be aspirated? (-1 means unknown)
+
+while (<>) {
+  $_ = lc($_);
+  print "$asp $_" if (/^h/i && $asp >= 0);
+  chop;
+  # we store in asp what the current word indicates about the next word
+  $asp = -1; # default is unknown
+  $asp = 0 if /^[lj]'$/;
+  $asp = 0 if /^cet$/;
+  $asp = 1 if /^ce$/;
+  # of the matches, only "je", "de", "le" and "la" are meaningful
+  $asp = 1 if /^[jdl][ea]$/;
+}
+
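
Note on detect.pl above: the labeling rule may be easier to follow as a small Python generator. This is a hypothetical re-expression for illustration, not part of the commit; the function name `label_occurrences` is an assumption, and it takes an iterable of words instead of reading stdin.

```python
import re

def label_occurrences(words):
    """Yield (label, word) for each h-word whose previous word decides it.

    label is 0 for a non-aspirated 'h' and 1 for an aspirated one.
    """
    asp = None  # what the previous word implies about the next one
    for word in words:
        word = word.lower()
        if word.startswith("h") and asp is not None:
            yield (asp, word)
        # reset, then record what the current word says about the next
        asp = None
        if re.fullmatch(r"[lj]'", word) or word == "cet":
            asp = 0  # elision or "cet": the following 'h' is mute
        elif word == "ce" or re.fullmatch(r"[jdl][ea]", word):
            asp = 1  # no elision: the following 'h' is aspirated

pairs = list(label_occurrences(["l'", "heure", "le", "hibou", "un", "homme"]))
print(pairs)
```

Here "homme" is skipped because "un" gives no evidence either way.
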
diff --git a/haspirater.json b/haspirater.json
@@ -0,0 +1 @@
+["0", {"a": ["1", {" ": ["1", {}], "c": ["1", {}], "b": ["0", {"i": ["0", {}], "a": ["1", {}], "e": ["0", {}]}], "d": ["1", {"a": ["1", {}], "d": ["1", {}], "j": ["1", {}], "o": ["1", {"p": ["1", {"i": ["1", {" ": ["1", {}], "s": ["0", {}]}]}]}], "\u00ee": ["1", {}], "r": ["0", {}]}], "g": ["1", {}], "i": ["1", {}], "h": ["1", {}], "m": ["1", {}], "l": ["1", {"a": ["1", {}], "b": ["1", {}], "e": ["1", {"i": ["0", {}], "c": ["1", {}], "r": ["1", {}], "t": ["1", {}]}], "d": ["1", {}], "i": ["0", {}], "\u00e8": ["0", {"t": ["1", {}], "n": ["0", {}]}], "l": ["1", {"a": ["0", {}], " ": ["1", {}], "e": ["1", {}], "i": ["1", {}], "s": ["1", {}], "u": ["0", {}]}], "o": ["1", {}], "t": ["1", {}]}], "\u00ef": ["1", {}], "n": ["1", {}], "q": ["1", {}], "p": ["1", {}], "s": ["1", {}], "r": ["1", {"a": ["1", {}], "c": ["1", {}], "e": ["1", {}], "d": ["1", {}], "g": ["1", {}], "f": ["1", {}], "i": ["1", {}], "k": ["1", {}], "m": ["0", {}], "l": ["1", {}], "o": ["1", {}], "n": ["1", {}], "p": ["1", {}], "r": ["1", {}], "t": ["1", {}], "v": ["1", {}]}], "u": ["1", {}], "v": ["1", {}], "y": ["1", {}]}], " ": ["0", {}], "\u00e2": ["1", {}], "e": ["0", {"a": ["1", {"r": ["1", {}], "u": ["1", {"m": ["1", {}], "t": ["0", {}]}]}], "i": ["1", {}], "m": ["1", {}], "l": ["0", {"l": ["0", {"\u00e9": ["0", {}], "e": ["0", {}], "o": ["1", {}]}], "v": ["0", {}]}], "n": ["1", {}], "p": ["1", {}], "s": ["1", {}], "r": ["0", {"c": ["1", {"h": ["1", {}], "u": ["0", {}]}], "b": ["0", {}], "m": ["0", {"\u00e9": ["0", {}], "i": ["0", {"t": ["1", {"a": ["0", {}], "i": ["1", {}]}], "n": ["0", {}]}]}], "n": ["1", {}], "s": ["1", {}], "t": ["1", {}]}], "u": ["0", {"s": ["1", {}], "r": ["0", {" ": ["0", {}], "e": ["0", {}], "t": ["1", {}]}], "l": ["1", {}], "/": ["1", {}]}], "x": ["0", {}]}], "i": ["0", {"a": ["1", {}], " ": ["1", {}], "c": ["1", {}], "b": ["1", {}], "e": ["1", {}], "d": ["1", {}], "g": ["1", {}], "f": ["1", {}], "\u00e9": ["1", {}], "h": ["1", {}], "l": ["1", {"a": ["0", {"i": ["1", 
{}], "r": ["0", {}]}], "b": ["1", {}], "e": ["1", {}], "d": ["1", {}], "o": ["1", {}]}], "n": ["1", {"d": ["1", {"i": ["1", {}], "o": ["0", {}]}]}], "p": ["0", {"p": ["0", {"i": ["1", {}], "o": ["0", {}]}], "h": ["1", {}], " ": ["1", {}]}], "s": ["0", {"s": ["1", {}], "t": ["0", {}]}], "r": ["1", {"a": ["1", {}], "o": ["1", {"s": ["1", {}], "n": ["0", {}]}]}], "t": ["1", {}], "v": ["0", {}]}], "\u00e8": ["1", {"1": ["1", {}], "r": ["1", {}], "b": ["0", {}], "l": ["1", {}]}], "\u00ea": ["1", {}], "l": ["1", {}], "o": ["0", {" ": ["1", {}], "c": ["1", {}], "b": ["1", {}], "d": ["1", {}], "g": ["1", {}], "f": ["1", {}], "m": ["0", {"\u00e9": ["0", {}], "a": ["1", {}], "m": ["0", {}], "e": ["1", {}], "o": ["0", {}]}], "l": ["1", {"\u00e0": ["1", {}], "l": ["1", {}], "o": ["0", {}], "d": ["1", {}]}], "o": ["1", {}], "n": ["0", {" ": ["1", {}], "d": ["1", {}], "g": ["1", {}], "o": ["0", {}], "n": ["0", {"i": ["1", {}], "\u00ea": ["0", {}], "e": ["0", {}]}], "s": ["1", {}], "u": ["0", {}], "t": ["1", {}]}], "q": ["1", {}], "p": ["1", {}], "s": ["0", {"a": ["1", {}], "p": ["0", {}], "t": ["0", {}]}], "r": ["0", {"d": ["1", {}], "i": ["0", {"z": ["0", {}], "o": ["1", {}]}], "m": ["1", {}], "l": ["0", {}], "n": ["1", {}], "s": ["1", {}], "r": ["0", {}]}], "u": ["1", {}], "t": ["1", {}], "y": ["1", {}]}], "\u00e9": ["0", {"s": ["0", {}], "r": ["1", {"i": ["0", {"s": ["1", {}], "t": ["0", {}]}], "a": ["1", {"u": ["1", {}], "l": ["0", {}]}], "\u00e9": ["0", {}], "o": ["1", {"s": ["1", {}], "\u00ef": ["0", {}], "n": ["1", {}]}]}], "m": ["0", {}], "l": ["0", {"i": ["0", {}], "a": ["1", {" ": ["1", {}], "s": ["0", {}]}], "e": ["1", {}]}], "b": ["0", {}]}], "u": ["0", {"a": ["1", {}], "c": ["1", {}], "b": ["1", {}], "e": ["1", {}], "d": ["0", {}], "g": ["1", {}], "p": ["1", {}], "i": ["1", {"s": ["0", {}], "t": ["1", {}], "l": ["0", {}]}], "m": ["0", {"a": ["0", {"i": ["0", {"s": ["1", {}], "n": ["0", {}]}], "g": ["1", {}], "n": ["0", {}]}], " ": ["1", {}], "b": ["0", {"l": ["0", 
{}], "o": ["1", {}]}], "e": ["0", {" ": ["1", {}], "r": ["1", {}], "u": ["0", {"x": ["1", {}], "r": ["0", {}]}], "m": ["1", {}]}], "i": ["0", {}], "o": ["1", {}], "p": ["1", {}], "u": ["0", {}]}], "l": ["1", {}], "\u00ee": ["0", {}], "q": ["1", {}], "\u00e9": ["1", {}], "s": ["1", {}], "r": ["1", {}], "t": ["1", {}], "n": ["1", {}]}], "\u00f4": ["0", {"p": ["0", {}], "t": ["0", {}], "l": ["1", {}]}], "H": ["0", {}], "y": ["0", {}], "\u00d4": ["0", {}]}]
diff --git a/haspirater.py b/haspirater.py
@@ -0,0 +1,40 @@
+#!/usr/bin/python3
+
+"""Determine if a word starts with an aspirated 'h' or not, by a lookup in
+a precompiled trie"""
+
+import os
+import json
+import sys
+
+f = open(os.path.join(os.path.dirname(
+  os.path.realpath(__file__)), 'haspirater.json'))
+trie = json.load(f)
+f.close()
+
+def do_lookup(trie, key):
+  if len(key) == 0 or (key[0] not in trie[1].keys()):
+    return trie[0]
+  return do_lookup(trie[1][key[0]], key[1:])
+
+def lookup(key):
+  """Return True iff key starts with an aspirated 'h'"""
+  if key == '' or key[0] != 'h':
+    raise ValueError
+  return do_lookup(trie, key[1:] + ' ') == '1'
+
+if __name__ == '__main__':
+  while True:
+    line = sys.stdin.readline()
+    if not line:
+      break
+    line = line.lower().lstrip().rstrip()
+    try:
+      result = lookup(line)
+      if result:
+        print("%s: aspirated" % line)
+      else:
+        print("%s: not aspirated" % line)
+    except ValueError:
+      print("%s: no leading 'h'" % line)
+
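
Note on haspirater.py above: the lookup walks the trie as far as the word allows and answers with the label of the last node reached, which is how unseen words inherit the label of their longest known prefix. The sketch below illustrates this on a hand-written toy trie. The toy data is an assumption and is not haspirater.json. Unlike the script, this `lookup` takes the trie as a parameter and iterates instead of recursing.

```python
def do_lookup(node, key):
    """Follow key letter by letter; stop where the trie has no branch."""
    while key and key[0] in node[1]:
        node = node[1][key[0]]
        key = key[1:]
    return node[0]

def lookup(trie, word):
    """Return True if word starts with an aspirated 'h' per the trie."""
    if not word.startswith("h"):
        raise ValueError("no leading 'h'")
    # the leading 'h' is not stored; ' ' marks the end of the word
    return do_lookup(trie, word[1:] + " ") == "1"

# Toy trie in the [label, children] format (hypothetical data).
toy = ["0", {
    "a": ["1", {}],
    "e": ["0", {"u": ["0", {"r": ["0", {"t": ["1", {}]}]}]}],
}]

print(lookup(toy, "hache"))    # reaches the "a" leaf
print(lookup(toy, "heurter"))  # unseen word, falls back to "heurt"
print(lookup(toy, "heure"))    # stops at "heur"
```
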
diff --git a/majoritytrie.py b/majoritytrie.py
@@ -0,0 +1,24 @@
+#!/usr/bin/env python3
+
+"""Read json trie in stdin, keep majority value at each node and output
+trie to stdout"""
+
+import json
+import sys
+
+trie = json.load(sys.stdin)
+
+def get_majority(d):
+  """What is the most probable value?"""
+  return max(d, key=d.get)
+
+def majority(trie):
+  """Keep only the most probable value at each node"""
+  trie[0] = get_majority(trie[0])
+  for child in trie[1].values():
+    majority(child)
+
+majority(trie)
+
+print(json.dumps(trie))
+
diff --git a/make.sh b/make.sh
@@ -0,0 +1,13 @@
+#!/bin/bash
+
+# From a French text input and an exceptions dictionary, prepare the
+# trie.
+
+./prepare.sh | # reformat the text
+  ./detect.pl | # identify and label occurrences
+  cat - $* | # add in exceptions
+  sed 's/ h/ /g' | # we don't keep the useless leading 'h' in the trie
+  ./buildtrie.py  | # prepare the trie
+  ./compresstrie.py | # compress the trie
+  ./majoritytrie.py # keep only the most frequent information
+
diff --git a/prepare.sh b/prepare.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+
+# Prepare a text for piping into detect.pl
+
+tr ' ' '\n' | tr -dc "a-zA-ZÀ-Ÿà-ÿ \n'-"  | sed "s/'/'\n/"
+
diff --git a/trie2dot.py b/trie2dot.py
@@ -0,0 +1,60 @@
+#!/usr/bin/env python3
+
+"""Read json trie in stdin and output a Graphviz dot rendering of it
+to stdout"""
+
+import json
+import sys
+from math import log
+
+trie = json.load(sys.stdin)
+
+free_id = 0
+
+def cget(d, k):
+  if k in d.keys():
+    return d[k]
+  else:
+    return 0
+
+def int2strbyte(i):
+  s = hex(i).split('x')[1]
+  if len(s) == 1:
+    return '0' + s
+  else:
+    return s
+
+def fraction2rgb(fraction):
+  n = int(255*fraction)
+  return int2strbyte(n)+'00'+int2strbyte(255 - n)
+
+def total(x):
+  key, node = x
+  return sum(node[0].values())
+
+def to_dot(trie, prefix=''):
+  global free_id
+
+  values, children = trie
+  my_id = free_id
+  free_id += 1
+  count = cget(values, "0") + cget(values, "1")
+  fraction = cget(values, "1") / count
+
+  # TODO illustrate count
+  print("%d [label=\"%s\",color=\"#%s\",penwidth=%d]" % (my_id, prefix,
+    fraction2rgb(fraction), 1+int(log(count))))
+
+  for (key, child) in sorted(children.items(), key=total, reverse=True):
+    i = to_dot(child, prefix+key)
+    print("%d -> %d [label=\"%s\",penwidth=%d]" % (my_id, i,
+      key, 1+int(log(total((None, child))))))
+
+  return my_id
+
+# TODO aspect causes graphviz crash?
+# TODO check if nodes don't get removed with aspect, it seems too good
+# to be true
+print("digraph G {\naspect=\"3\"\n")
+to_dot(trie, 'h')
+print("}")
diff --git a/wikipedia b/wikipedia
@@ -0,0 +1,593 @@
+1 habanera
+1 hâbler
+1 hâblerie
+1 hâbleur
+1 hache
+1 hacheécorce
+1 hacheécorces
+1 hachefourrage
+1 hachelégumes
+1 hachemaïs
+1 hachepaille
+1 hacher
+1 hachereau
+1 hachesarment
+1 hachesarments
+1 hachette
+1 hacheviande
+1 hachage
+1 hacheur
+1 hachis
+1 hachich
+1 hachisch
+1 hachoir
+1 hachure
+1 hack
+1 hackeur
+1 hacquebute
+1 hacquebutier
+1 hadal
+1 haddock
+1 hadîth
+1 hadj
+1 hadji
+1 hadopi
+1 haguais
+1 haguais
+1 hague
+1 hagard
+1 ha
+1 haha
+1 hahé
+1 haie
+1 haïe
+1 haïr
+1 haïk
+1 haillon
+1 haillonneux
+1 haine
+1 haineux
+1 haineusement
+1 haïr
+1 haïssable
+1 halage
+1 halbran
+1 halde
+1 hâle
+1 halecret
+1 haler
+1 hâler
+1 haleter
+1 halètement
+1 hall
+1 halle
+1 hallebarde
+1 hallebardier
+1 hallier
+1 hallstatien
+1 halo
+1 haloir
+1 halophile
+1 halot
+1 halte
+1 hamac
+1 hamada
+1 hamal
+1 hambourg
+1 hamburger
+1 hameau
+1 hammal
+1 hammam
+1 hammerfest
+1 hammerless
+1 hampe
+1 hamster
+1 han
+1 hanap
+1 hanche
+1 hanchement
+1 hancher
+1 hand
+1 handball
+1 handballeur
+1 handicap
+1 handicaper
+1 hangar
+1 hanneton
+1 hannetonner
+1 hanse
+1 hanséatique
+1 hanter
+1 hantise
+1 happe
+1 happelourde
+1 happer
+1 happening
+1 happement
+1 happyend
+1 haquebute
+1 haquebutier
+1 haquenée
+1 haquet
+1 harakiri
+1 harangue
+1 haranguer
+1 harangueur
+1 haras
+1 harassant
+1 harasser
+1 harassement
+1 harceler
+1 harcèlement
+1 harceleur
+1 hachich
+1 harald
+1 harde
+1 harder
+1 hardes
+1 hardi
+1 hardiesse
+1 hardiment
+1 hardware
+1 harem
+1 hareng
+1 harengère
+1 haret
+1 harfang
+1 hargne
+1 hargneux
+1 hargneusement
+1 haricot
+1 haricoter
+1 haridelle
+1 harissa
+1 harka
+1 harki
+1 harle
+1 harlou
+1 harnacher
+1 harnacheur
+1 harnachement
+1 harnais
+1 harnois
+1 harold
+1 haro
+1 harpailler
+1 harpe
+1 harper
+1 harpie
+1 harpiste
+1 harpon
+1 harponner
+1 harponneur
+1 harponnage
+1 harry
+1 hart
+1 harvard
+1 hasard
+1 hasarder
+1 hasardeux
+1 hasardeusement
+1 hasbeen
+1 haschich
+1 hase
+1 hast
+1 hastaire
+1 haste
+1 hastings
+1 hâte
+1 hâtelet
+1 hâtelette
+1 hâter
+1 hâtier
+1 hâtif
+1 hâtiveau
+1 hâtivement
+1 hauban
+1 haubanner
+1 haubanneur
+1 haubergeon
+1 haubert
+1 hausse
+1 haussecol
+1 haussement
+1 haussepied
+1 hausser
+1 hausseur
+1 hausseusement
+1 haussier
+1 haussier
+1 haut
+1 hautain
+1 hautain
+1 hautbois
+1 hautdechausses
+1 hautdeforme
+1 hautecontre
+1 hauteforme
+1 hautement
+1 hautesse
+1 hauteur
+1 hautescontre
+1 hautesformes
+1 hautfond
+1 hautin
+1 hautlecœur
+1 hautlecorps
+1 hautlepied
+1 hautparleur
+1 hautparleurs
+1 hautrelief
+1 hautsdechausses
+1 hautsdeforme
+1 hautsfonds
+1 hautsreliefs
+1 hauturier
+1 havage
+1 havanais
+1 havanais
+1 havane
+1 havane
+1 hâve
+1 haveneau
+1 havenet
+1 haver
+1 haveur
+1 havir
+1 havrais
+1 havrais
+1 havre
+1 havre
+1 havresac
+1 havresacs
+1 hayon
+1 heaume
+1 heaumier
+1 heimatlos
+1 hein
+1 héler
+1 héleur
+1 hèlement
+1 hello
+1 hem
+1 hemloc
+1 henné
+1 hennir
+1 hennissant
+1 hennissement
+1 hennisseur
+1 henri
+1 henry
+1 henry
+1 hep
+1 héraut
+1 herchage
+1 hercher
+1 hercheur
+1 hère
+1 hérissement
+1 hérisser
+1 hérisseur
+1 hérisson
+1 hérissonner
+1 hermitique
+1 herniaire
+1 hernie
+1 hernieux
+1 héron
+1 héronnier
+1 héros
+1 herschage
+1 herscher
+1 herscheur
+1 herse
+1 herser
+1 hertz
+1 hertzien
+1 hesse
+1 hêtraie
+1 hêtre
+1 heu/heux
+1 heulandite
+1 heurt
+1 heurtement
+1 heurtequin
+1 heurter
+1 heurteur
+1 heurtoir
+0 hélas
+1 hi
+1 hiatal
+1 hibou
+1 hic
+1 hic
+1 hickory
+1 hideur
+1 hideusement
+1 hideux
+1 hie
+1 hiement
+1 hier
+1 hiéracocéphale
+1 hiérarchie
+1 hiérarchique
+1 hiérarchiquement
+1 hiérarchiser
+1 hiérarchisation
+1 hiératique
+1 hiératiquement
+1 hiératisant
+1 hiératisé
+1 hiératisme
+1 hiérochromie
+1 hiérocrate
+1 hiérocratisme
+1 hiérodrame
+1 hiérogamie
+1 hiérogamique
+1 hiéroglyphe
+1 hiéroglyphé
+1 hiéroglyphie
+1 hiéroglyphié
+1 hiéroglyphique
+1 hiéroglyphiquement
+1 hiéroglyphisme
+1 hiéroglyphite
+1 hiérogramme
+1 hiérogrammate
+1 hiérogrammatisme
+1 hiérographe
+1 hiéromancie
+1 hiéromoine
+1 hiérophanie
+1 hiéroscopie
+1 hiéroscopique
+1 hifi
+1 highlandais
+1 highlander
+1 highlands
+1 highlife
+1 highlifer
+1 highlifeur
+1 hihan
+1 hilaire
+1 hile
+1 hiloire
+1 hilbert
+1 hildegarde
+1 hindi
+1 hip
+1 hiphop
+1 hippie
+1 hiragana
+1 hiroshima
+1 hissage
+1 hisser
+1 hissement
+1 hisseur
+1 hit
+1 hitparade
+1 hitparades
+1 hittite
+1 hittite
+1 ho
+1 hobart
+1 hobby
+1 hobereau
+1 hobereautaille
+1 hoberelle
+1 hoc
+1 hoca
+1 hocco
+1 hoche
+1 hochement
+1 hochepot
+1 hochequeue
+1 hochequeues
+1 hocher
+1 hochet
+1 hockey
+1 hockeyeur
+1 hocko
+1 hodja
+1 hoffmannesque
+1 hoffmannien
+1 hognement
+1 hogner
+1 holà
+1 holding
+1 hôler
+1 holdup
+1 hollandais
+1 hollandais
+1 hollandaisement
+1 hollande
+1 hollande
+1 hollandé
+1 hollandiser
+1 hollandobelge
+1 hollandofrançais
+1 hollandonorvégien
+1 hollandosaxon
+1 hollywood
+1 hollywoodesque
+1 hollywoodien
+1 homard
+1 homarderie
+1 homardier
+1 home
+1 homecinema
+1 homespun
+1 hon
+1 honduras
+1 hondurien
+1 hondurien
+1 hongkong
+1 hongkongais
+1 hongre
+1 hongreline
+1 hongrer
+1 hongreur
+1 hongrie
+1 hongrois
+1 hongrois
+1 hongroyage
+1 hongroyer
+1 hongroyeur
+1 honnir
+1 honnissement
+1 honshu
+1 honte
+1 honteux
+1 honteusement
+1 hooligan
+1 hop
+1 hoquet
+1 hoqueter
+1 hoquètement
+1 hoqueton
+1 horde
+1 horion
+1 hormis
+1 hornblende
+1 hors
+1 horsain
+1 horsbord
+1 horsbords
+1 horscaste
+1 horscastes
+1 horsd’œuvre
+1 horseguard
+1 horseguards
+1 horsepox
+1 horsjeu
+1 horslaloi
+1 horssérie
+1 horst
+1 horstexte
+1 hosanna
+1 hosannière
+1 hotdog
+1 hotdogs
+1 hotte
+1 hottée
+1 hotter
+1 hottentot
+1 hotteur
+1 hou
+1 houp
+1 houblon
+1 houblonner
+1 houblonneur
+1 houblonnière
+1 houdan
+1 houdan
+1 houe
+1 houhou
+1 houille
+1 houiller
+1 houillère
+1 houilleux
+1 houka
+1 houle
+1 houler
+1 houlette
+1 houleux
+1 houleusement
+1 houlier
+1 houlque
+1 hoummous
+1 houp
+1 houppe
+1 houppelande
+1 houppette
+1 houppier
+1 houque
+1 houraillis
+1 hourd
+1 hourdage
+1 hourder
+1 hourdi
+1 hourdis
+1 houret
+1 houri
+1 hourque
+1 hourra
+1 hourvari
+1 houseau
+1 houspiller
+1 houspilleur
+1 houspillement
+1 houssage
+1 houssaie
+1 housse
+1 housser
+1 houssine
+1 houssoir
+1 houston
+1 houx
+1 hoyau
+1 huard
+1 hublot
+1 huche
+1 huchée
+1 hucher
+1 huchet
+1 huchier
+1 hue
+1 huée
+1 huer
+1 huerta
+1 huehau
+1 hugo
+1 hugolâtre
+1 hugolâtrie
+1 hugolien
+1 hugotique
+1 hugotisme
+1 hugues
+1 huguenot
+1 huit
+1 huitain
+1 huitaine
+1 huitante
+1 huitième
+1 hulotte
+1 hululation
+1 hululer
+1 hum
+1 humage
+1 humement
+1 humer
+1 humeux
+1 huns
+1 humoter
+1 hune
+1 hunier
+1 hunter
+1 huppe
+1 huppé
+1 huque
+1 hure
+1 hurlade
+1 hurlée
+1 hurlement
+1 hurler
+1 hurleur
+1 huroiroquois
+1 huroiroquois
+1 huron
+1 huron
+1 huronien
+1 huronien
+1 hurricane
+1 husky
+1 hussard
+1 hussite
+1 hussitisme
+1 hutin
+1 hutinet
+1 hutte
+1 hutteau
+1 hutter
+1 huttier
