frhyme

guess the last phonemes of a French word (local mirror of https://gitlab.com/a3nm/frhyme)
git clone https://a3nm.net/git/frhyme/

frhyme -- a toolkit to guess the last phonemes of a French word
Repository URL: https://gitlab.com/a3nm/frhyme
Python package name: frhyme

== 0. Author and license ==

frhyme is copyright (C) 2011-2019 by Antoine Amarilli

frhyme is free software, distributed under an MIT license: see the
file LICENSE for details of the licensing terms that apply to frhyme.

Many thanks to Julien Romero, who maintains the PyPI package for
frhyme.

The file "frhyme.json" in the directory "frhyme" is a derivative work of
the French lexical database Lexique <http://www.lexique.org/>, version
3.83, by Boris New <http://psycho-usmb.fr/boris.new/> and Christophe
Pallier <http://www.pallier.org/>. Hence, this file is under the same
license as Lexique, namely, the license CC BY-SA 4.0 (according to the
file README-Lexique.txt in the downloadable archive of Lexique). The
license in LICENSE does *not* apply to this file "frhyme/frhyme.json".

== 1. Features ==

frhyme is a tool to guess what the last phonemes of a French word are.
It is trained on a list of words with their pronunciations. For unseen
words, it infers a few likely possibilities from the known words that
share the longest common suffix with the input, using a trie as its
internal representation.
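
As a rough illustration of the idea described above (and not frhyme's
actual code), the sketch below indexes a toy word list in a trie keyed
on reversed spellings, so that a shared path in the trie corresponds to
a shared word ending. A query follows the reversed word as deep as it
can and ranks the endings counted at the deepest node reached. The
names build_trie and guess, the node layout, and the toy data are all
assumptions made for this example.

```python
from collections import Counter


def new_node():
    # Each node counts the phoneme endings of every word passing through it.
    return {"counts": Counter(), "children": {}}


def build_trie(entries):
    """Index (spelling, ending) pairs in a trie keyed on reversed spellings."""
    root = new_node()
    for word, ending in entries:
        node = root
        node["counts"][ending] += 1
        for letter in reversed(word):
            node = node["children"].setdefault(letter, new_node())
            node["counts"][ending] += 1
    return root


def guess(trie, word, nbest=5):
    """Walk the reversed word as far as the trie allows, then rank endings."""
    node = trie
    for letter in reversed(word):
        child = node["children"].get(letter)
        if child is None:
            break  # no known word shares a longer ending
        node = child
    return node["counts"].most_common(nbest)


# Hypothetical toy data: spellings with their last two phonemes.
toy = [("bateau", "to"), ("plateau", "to"), ("manteau", "to"), ("chapeau", "po")]
trie = build_trie(toy)
print(guess(trie, "marteau"))  # [('to', 3)]
print(guess(trie, "drapeau"))  # [('po', 1)]
```

Here "marteau" is not in the toy list, but it shares the ending "teau"
with three known words, so their common ending is returned along with
its count.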

== 2. Installation ==

You need a working Python 3 environment to run frhyme.

You can install frhyme directly with pip by doing:

  pip3 install frhyme

You can also manually clone the project repository and use frhyme
directly from there.

== 3. Usage ==

You can either run

  ./frhyme/frhyme.py [NBEST]

giving one word per line on stdin and getting the NBEST top
pronunciations on stdout (default is 5), or you can import frhyme in a
Python program and call frhyme.lookup(word, NBEST) which returns the
NBEST top pronunciations (default is 5).
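
A minimal sketch of the library route follows. The only frhyme call it
relies on is frhyme.lookup(word, NBEST) as described above; the wrapper
top_pronunciations and its "lookup" parameter are hypothetical
conveniences added here so that the function can be tried with a
stand-in when frhyme is not installed. The exact shape of each returned
item is not assumed.

```python
def top_pronunciations(word, nbest=5, lookup=None):
    """Return the nbest candidate pronunciations for word as a list."""
    if lookup is None:
        # Imported lazily so that a stand-in lookup can be passed instead.
        import frhyme
        lookup = frhyme.lookup
    return list(lookup(word, nbest))
```

With frhyme installed, top_pronunciations("bonjour", 3) simply calls
frhyme.lookup("bonjour", 3) and collects the results in a list.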

The pronunciations returned are annotated with a confidence score (the
number of occurrences in the training data). They should be sensible for
the longest suffix of the input word that occurs in the training data,
but the phonemes before that may be garbage.

The pronunciations are given in a variant of X-SAMPA which ensures that
each phoneme is mapped to exactly one ASCII character: the substitutions
are "A~" => "#", "O~" => "$", "E~" => ")", "9~" => "(".
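
The four substitutions above can be applied in either direction with a
small lookup table. This is only a sketch: the function names are made
up for the example, and it assumes the input contains no other use of
the characters "#", "$", ")" and "(".

```python
# Substitutions listed above: X-SAMPA nasal vowel -> single ASCII character.
TO_ASCII = {"A~": "#", "O~": "$", "E~": ")", "9~": "("}
FROM_ASCII = {char: seq for seq, char in TO_ASCII.items()}


def to_frhyme(xsampa):
    """Rewrite standard X-SAMPA nasal vowels as one-character codes."""
    for seq, char in TO_ASCII.items():
        xsampa = xsampa.replace(seq, char)
    return xsampa


def to_xsampa(text):
    """Expand the one-character codes back into standard X-SAMPA."""
    return "".join(FROM_ASCII.get(char, char) for char in text)


print(to_frhyme("bO~ZuR"))  # b$ZuR
print(to_xsampa("m#to"))    # mA~to
```

Because every phoneme is then a single character, the last N phonemes
of a pronunciation are simply its last N characters.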

== 4. Training ==

This section explains how the file "frhyme.json" can be prepared. You do
not need to do this to use frhyme, but it can be useful if you want to
create a pronunciation database from a different source.

The provided "frhyme.json" file was trained on a custom variant of the
database Lexique <http://www.lexique.org/>, with some additions. You can
regenerate it as follows:

  git clone 'https://a3nm.net/git/lexique'
  cd scripts
  ./make.sh 4 <(cut -f 1,2 ../lexique/lexique_my_format | uniq) additions > ../frhyme/frhyme.json

The value "4" indicates the number of trailing phonemes to keep, and can
be changed. Beware, this process can use several hundred megabytes
of RAM. The resulting file should be accurate on the French words of
Lexique.
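
To make the role of that value concrete, here is a small sketch of
truncating pronunciations to their trailing phonemes and counting how
often each (word, ending) pair occurs. It does not reproduce what
make.sh does internally, which is not shown here; the function names
and sample pairs are assumptions, and it relies on the one character
per phoneme notation described in the Usage section.

```python
from collections import Counter


def trailing_phonemes(pronunciation, keep=4):
    """Keep only the last `keep` phonemes (one character per phoneme)."""
    return pronunciation[-keep:] if keep > 0 else ""


def ending_counts(pairs, keep=4):
    """Count occurrences of each (word, truncated pronunciation) pair."""
    return Counter(
        (word, trailing_phonemes(pron, keep)) for word, pron in pairs
    )


# Hypothetical (spelling, pronunciation) pairs.
pairs = [("plateau", "plato"), ("plateau", "plato"), ("bateau", "bato")]
print(ending_counts(pairs, keep=2))
```

A larger value keeps more context per word at the cost of a larger
resulting file; a smaller one merges more pronunciations together.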