frhyme -- a toolkit to guess the last phonemes of a French word
Repository URL: https://gitlab.com/a3nm/frhyme
Python package name: frhyme

== 0. Author and license ==

frhyme is copyright (C) 2011-2019 by Antoine Amarilli

frhyme is free software, distributed under an MIT license: see the
file LICENSE for details of the licensing terms that apply to frhyme.

Many thanks to Julien Romero who maintains the PyPI package for
frhyme.

The file "frhyme.json" in the directory "frhyme" is a derivative work of
the French lexical database Lexique <http://www.lexique.org/>, version
3.83, by Boris New <http://psycho-usmb.fr/boris.new/> and Christophe
Pallier <http://www.pallier.org/>. Hence, this file is under the same
license as Lexique, namely, the license CC BY SA 4.0 (according to the
file README-Lexique.txt in the downloadable archive of Lexique). The
license in LICENSE does *not* apply to this file "frhyme/frhyme.json".

== 1. Features ==

frhyme is a tool to guess what the last phonemes of a French word are.
It is trained on a list of words with associated pronunciation, and will
infer a few likely possibilities for unseen words using known words with
the longest common prefix, using a trie for internal representation.

== 2. Installation ==

You need a working Python3 environment to run frhyme.

You can install frhyme directly with pip by doing:

  pip3 install frhyme

You can also manually clone the project repository and use frhyme
directly from there.

== 3. Usage ==

You can either run

  ./frhyme/frhyme.py [NBEST]

giving one word per line in stdin and getting the NBEST top
pronunciations on stdout (default is 5), or you can import frhyme in a
Python program and call frhyme.lookup(word, NBEST) which returns the
NBEST top pronunciations (default is 5).
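
To make the trie-based guessing described in section 1 more concrete,
here is a minimal toy sketch. It is not frhyme's actual code: the names
Node, insert and guess are hypothetical, the training pairs are
hand-written, and storing the words reversed (so that a shared trie
prefix corresponds to a shared word ending) is an assumption made for
this illustration only.

```python
from collections import Counter


class Node:
    """One trie node: child links plus pronunciation counts."""

    def __init__(self):
        self.children = {}
        # How often each pronunciation was seen for training keys
        # that pass through this node.
        self.counts = Counter()


def insert(root, key, pron):
    """Record pron on every node along the path spelled by key."""
    node = root
    node.counts[pron] += 1
    for ch in key:
        node = node.children.setdefault(ch, Node())
        node.counts[pron] += 1


def guess(root, key, nbest=5):
    """Follow key as deep as the trie allows, then return the nbest
    (pronunciation, count) pairs stored at the deepest node reached."""
    node = root
    for ch in key:
        if ch not in node.children:
            break
        node = node.children[ch]
    return node.counts.most_common(nbest)


# Hand-written toy data: (word, final phonemes), with "$" standing for
# the nasal vowel "O~". Illustration only, not taken from frhyme.json.
TRAINING = [
    ("maison", "z$"),
    ("raison", "z$"),
    ("saison", "z$"),
    ("poisson", "s$"),
    ("garçon", "s$"),
]

root = Node()
for word, pron in TRAINING:
    # Assumption: keys are reversed so the trie matches word endings.
    insert(root, word[::-1], pron)

print(guess(root, "liaison"[::-1]))     # [('z$', 3)]
print(guess(root, "chanson"[::-1], 2))  # [('z$', 3), ('s$', 1)]
```

In this toy version the count attached to each guess is the number of
training words that share the longest matched part of the key. The
second call shows why several candidates are returned: the best-scored
guess is not necessarily the right one.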

The pronunciations returned are annotated with a confidence score (the
number of occurrences in the training data). They should be sensible up
to the longest prefix of the input word that occurs in the training
data, but they may be prefixed by garbage.

The pronunciations are given in a variant of X-SAMPA which ensures that
each phoneme is mapped to exactly one ASCII character: the substitutions
are "A~" => "#", "O~" => "$", "E~" => ")", "9~" => "(".

== 4. Training ==

This section explains how the file "frhyme.json" can be prepared. You do
not need to do this to use frhyme, but it can be useful if you want to
create a pronunciation database from a different source.

The provided "frhyme.json" file was trained on a custom variant of the
database Lexique <http://www.lexique.org/>, with some additions. You can
regenerate it as follows:

  git clone 'https://a3nm.net/git/lexique'
  cd scripts
  ./make.sh 4 <(cut -f 1,2 ../lexique/lexique_my_format | uniq) additions > ../frhyme/frhyme.json

The value "4" indicates the number of trailing phonemes to keep, and can
be changed. Beware, this process can take up several hundred megabytes
of RAM. The resulting file should be accurate on the French words of
Lexique.
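
As a small illustration of the one-character-per-phoneme encoding from
section 3 and of the "number of trailing phonemes" parameter above, the
following sketch applies the four listed substitutions and then keeps
the last phonemes by slicing. The function names encode, decode and
keep_trailing are hypothetical, and this is not the code of make.sh; the
transcription of "bonjour" is written by hand.

```python
# Substitutions listed in section 3: nasal vowel -> one ASCII character.
SUBSTITUTIONS = {"A~": "#", "O~": "$", "E~": ")", "9~": "("}


def encode(xsampa):
    """Replace each two-character nasal vowel by its one-character code."""
    for nasal, code in SUBSTITUTIONS.items():
        xsampa = xsampa.replace(nasal, code)
    return xsampa


def decode(encoded):
    """Undo encode(): expand each one-character code back to X-SAMPA."""
    for nasal, code in SUBSTITUTIONS.items():
        encoded = encoded.replace(code, nasal)
    return encoded


def keep_trailing(encoded, n=4):
    """Keep the last n phonemes. Slicing by characters is enough here
    because the encoding uses exactly one character per phoneme."""
    return encoded[-n:] if n > 0 else ""


# Hand-written transcription of "bonjour", for illustration only.
word = encode("bO~ZuR")
print(word)                 # b$ZuR
print(keep_trailing(word))  # $ZuR
print(decode(word))         # bO~ZuR
```

Without the substitutions, "O~" would count as two characters and the
same slice could cut a nasal vowel in half.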