commit ebe63afe36c813a2cba408a37a3a8d3570b865c3
parent 6e9af935a279923df039026980c92b979e26947a
Author: Antoine Amarilli <a3nm@a3nm.net>
Date: Fri, 16 Aug 2019 00:30:32 +0200
go over README
Diffstat:
 README | 93 +++++++++++++++++++++++++++++++++++++++++++------------------------------------
1 file changed, 51 insertions(+), 42 deletions(-)
diff --git a/README b/README
@@ -1,27 +1,16 @@
frhyme -- a toolkit to guess the last phonemes of a French word
-Copyright (C) 2011-2019 by Antoine Amarilli
Repository URL: https://gitlab.com/a3nm/frhyme
+Python package name: frhyme
-== 0. Licence ==
+== 0. Author and license ==
-Permission is hereby granted, free of charge, to any person obtaining a
-copy of this software and associated documentation files (the
-"Software"), to deal in the Software without restriction, including
-without limitation the rights to use, copy, modify, merge, publish,
-distribute, sublicense, and/or sell copies of the Software, and to
-permit persons to whom the Software is furnished to do so, subject to
-the following conditions:
+frhyme is copyright (C) 2011-2019 by Antoine Amarilli
-The above copyright notice and this permission notice shall be included
-in all copies or substantial portions of the Software.
+frhyme is free software, distributed under an MIT license: see the
+file LICENSE for details of the licensing terms.
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
-OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
-MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
-IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
-CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
-TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
-SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+Many thanks to Julien Romero who maintains the PyPI package for
+frhyme.
== 1. Features ==
@@ -30,40 +19,60 @@ It is trained on a list of words with associated pronunciation, and will
infer a few likely possibilities for unseen words using known words with
the longest common prefix, using a trie for internal representation.
-== 2. Usage ==
+== 2. Installation ==
-To avoid licensing headaches, and because the data file is quite big, no
-pronunciation data is included, you have to generate it yourself. See section 3.
+You need a working Python 3 environment to run frhyme.
-Once you have pronunciation data ready in frhyme.json, you can either run
-frhyme.py [NBEST], giving one word per line in stdin and getting the NBEST top
-pronunciations on stdout (default is 5), or you can import it in a Python file
-and call frhyme.lookup(word, NBEST) which returns the NBEST top pronunciations
-(default is 5).
+You can install frhyme directly with pip by doing:
-The pronunciations returned are annotated with a confidence score (the number of
-occurrences in the training data). They should be sensible up to the longest
-prefix occurring in the training data, but may be prefixed by garbage.
+ pip3 install frhyme
-== 3. Training ==
+You can also manually clone the project repository and use frhyme
+directly from there, but you must then follow the instructions in
+Section 4 below to prepare the file frhyme.json for frhyme. (By
+contrast, if you install frhyme using pip, a file frhyme.json is
+provided, which has been trained using the Lexique database:
+http://www.lexique.org/.)
-First, make sure that you have a working python3 installation and that you have
-unzip (Debian packages: python3, unzip).
+== 3. Usage ==
-The data used by frhyme.py is loaded at runtime from the fryme.json file which
-should be trained from a pronunciation database. The recommended way to do so is
-to use a tweaked Lexique <http://lexique.org> along with a provided bugfix file,
-as follows:
+You can either run
+
+ frhyme.py [NBEST]
+
+giving one word per line on stdin and getting the NBEST top
+pronunciations on stdout (default is 5), or you can import frhyme in a
+Python program and call frhyme.lookup(word, NBEST), which returns the
+NBEST top pronunciations (default is 5).
+
+The pronunciations returned are annotated with a confidence score (the
+number of occurrences in the training data). They should be sensible up
+to the longest prefix of the input word that occurs in the training
+data, but they may be prefixed by garbage.
+
+== 4. Training ==
+
+If you have cloned this repository, you need to prepare the file
+frhyme.json.
+
+First, make sure that you have a working python3 installation and that
+you have unzip (Debian packages: python3, unzip).
+
+frhyme.py loads its data at runtime from the file frhyme.json, which
+must be trained from a pronunciation database. The recommended way
+to do so is to use Lexique <http://lexique.org>, applying some tweaks
+and the modifications from the provided files. To do this, run the
+following:
cd scripts
lexique/lexique_retrieve.sh > lexique.txt
./make.sh NPHON lexique.txt additions > ../frhyme/frhyme.json
cd ..
-where NPHON is the number of trailing phonemes to keep (suggested value: 4).
-Beware, this may take up several hundred megabytes of RAM. The resulting file
-should be accurate on the French words of Lexique, and will return
-pronunciations in a variant of X-SAMPA which ensures that each phoneme is mapped
-to exactly one ASCII character: the substitutions are "A~" => "#", "O~" => "$",
-"E~" => ")", "9~" => "(".
+where NPHON is the number of trailing phonemes to keep (suggested value:
+4). Beware, this may take up several hundred megabytes of RAM. The
+resulting file should be accurate on the French words of Lexique, and
+will return pronunciations in a variant of X-SAMPA which ensures that
+each phoneme is mapped to exactly one ASCII character: the substitutions
+are "A~" => "#", "O~" => "$", "E~" => ")", "9~" => "(".
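
The substitution table above can be applied mechanically in either direction. The following is a minimal sketch of such a conversion, not code taken from frhyme: the helper names to_single_char and to_xsampa are hypothetical, and the sketch assumes that the four replacement characters do not otherwise occur in the pronunciation strings.

```python
# Substitutions listed in the README: standard X-SAMPA nasal vowel on the
# left, single-character replacement on the right.
SUBSTITUTIONS = {"A~": "#", "O~": "$", "E~": ")", "9~": "("}

# Reverse table, built from the one above.
REVERSE = {single: multi for multi, single in SUBSTITUTIONS.items()}


def to_single_char(pronunciation):
    """Hypothetical helper: standard X-SAMPA -> one character per phoneme."""
    for multi, single in SUBSTITUTIONS.items():
        pronunciation = pronunciation.replace(multi, single)
    return pronunciation


def to_xsampa(pronunciation):
    """Hypothetical helper: one-character variant -> standard X-SAMPA.

    Assumes "#", "$", ")" and "(" appear only as the replacements above.
    """
    return "".join(REVERSE.get(char, char) for char in pronunciation)


if __name__ == "__main__":
    print(to_single_char("bO~"))  # nasal vowel collapsed to "$"
    print(to_xsampa("b$"))        # and expanded back
```

Working on the single-character form means that the last N phonemes of a pronunciation are simply its last N characters, which is convenient when comparing word endings.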