== Ongoing ==

- add "scientifique" and "scientifiques" to additions, and understand why this
  breaks the test suite
- pretty-print the JSON in the diaeresis.json file to make debugging clearer
  with git diff

- fix pytest-3 plint to make it work
- migrate the readme to markdown
- turn should_be_accepted into a test
- expand the corpus of classical poetry: more Racine, more other authors
  (Boileau, Corneille, Prudhomme, etc.)

- fix problems in the new works
- train diaeresis.json on new works
- check that diaeresis:permissive is indeed more permissive
- check for duplicates in additions.txt
- check xmllitre again

- download in bulk all possible sources to train on them in an error-tolerant
  way, and to be able to check easily the usage of a given word

== Ideas ==

- Use the latest Lexique (with our corrections) to generate a file of known
  words with their length (in syllables), and when we have exactly one of these
  words ensure that we do not allow fewer syllables than indicated (but it can
  be more, because of diérèse)
- Ensure that, on words known in Lexique, frhyme returns exactly the known
  pronunciation(s), so that we can use it confidently, e.g. to predict elision
  of the ending for rhyme gender and number of syllables
- remove kludge for invalid characters, split them into specific chunks
- Improve performance with profiling
- Only indicate hemistiche status when there is a problem with hemistiches
- Clean up the code to the extent possible
- Look at dicollecte
  <http://grammalecte.net/download/fr/lexique-dicollecte-fr-v6.4.1.zip>, which
  also features pronunciation, and see how it differs from Lexique

== Low priority ==

=== Error reporting ===

- When reporting hemistiche errors, highlight possible positions where a
  hemistiche could have been placed

=== Diérèses/synérèses ===

- When training, take into consideration the contexts where we haven't been able
  to infer the number of syllables, and only learn at each step from the
  contexts where we are the most certain (including the unknown occurrences),
  instead of having a hardcoded default threshold
- Formally evaluate the performance of the approach without additions

=== Other approaches ===

- Learn rhyme and gender agnostically by clustering: prepare an undirected graph
  of rhyming verses, factor out suffixes, do SCC, prepare a trie

=== Misc ===

- Fuzz testing: try giving random input to plint and check that it behaves
- Better exception logging for the Web frontend

=== Problems ===

- Loanwords from English ("crumble", "single", etc.) shouldn't be elidable
- Loanwords from Latin ("ad patres") shouldn't be elidable

== Other possible sources ==

The following could be easily integrated, either from
https://dramacode.github.io/ or from the indicated URL:

corneille_surena https://fr.wikisource.org/wiki/Sur%C3%A9na
corneille_pulcherie https://fr.wikisource.org/wiki/Pulch%C3%A9rie
corneille_tite_et_berenice https://fr.wikisource.org/wiki/Tite_et_B%C3%A9r%C3%A9nice
corneille_attila https://fr.wikisource.org/wiki/Attila
corneille_othon https://fr.wikisource.org/wiki/Othon/Texte_entier
corneille_sophonisbe https://fr.wikisource.org/wiki/Sophonisbe_(Corneille)
corneille_sertorius https://fr.wikisource.org/wiki/Sertorius
corneille_toison_dor https://fr.wikisource.org/wiki/La_Toison_d%E2%80%99or_(Corneille)
corneille_nicomede https://fr.wikisource.org/wiki/Nicom%C3%A8de
corneille_don_sanche_daragon https://fr.wikisource.org/wiki/Don_Sanche_d%E2%80%99Aragon
corneille_heraclius https://fr.wikisource.org/wiki/H%C3%A9raclius_empereur_d%E2%80%99Orient
corneille_theodore https://fr.wikisource.org/wiki/Th%C3%A9odore_vierge_et_martyre
corneille_rodogune https://fr.wikisource.org/wiki/Th%C3%A9odore_vierge_et_martyre
corneille_menteur_suite https://fr.wikisource.org/wiki/La_Suite_du_Menteur
corneille_menteur https://fr.wikisource.org/wiki/Le_Menteur
corneille_pompee https://fr.wikisource.org/wiki/Pomp%C3%A9e
corneille_polyeucte https://fr.wikisource.org/wiki/Polyeucte/%C3%89dition_Masson,_1887
corneille_cinna https://fr.wikisource.org/wiki/Cinna_ou_la_Cl%C3%A9mence_d%E2%80%99Auguste
corneille_horace https://fr.wikisource.org/wiki/Horace_(Corneille)
corneille_cid https://fr.wikisource.org/wiki/Le_Cid
corneille_comedie_des_tuileries https://fr.wikisource.org/wiki/La_Com%C3%A9die_des_Tuileries

Other ideas (trickier):

- https://fr.wikisource.org/wiki/Imitation_de_J%C3%A9sus-Christ/Texte_entier
- https://fr.wikisource.org/wiki/Po%C3%A9sies_diverses_(Corneille)
- corneille_andromede https://fr.wikisource.org/wiki/Androm%C3%A8de (but much free verse)
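
== Sketches ==

Two of the Ongoing items (pretty-printing diaeresis.json for git diff, and
checking additions.txt for duplicates) are small enough to sketch here. This is
only a possible starting point, not existing plint code: the helper names
pretty_json and duplicate_lines are hypothetical, and the sketch assumes that
diaeresis.json is plain JSON and that additions.txt holds one entry per line.

```python
import json
from collections import Counter


def pretty_json(text: str) -> str:
    """Re-serialize JSON text in a diff-friendly form (hypothetical helper)."""
    data = json.loads(text)
    # sort_keys gives a stable key order, indent puts one entry per line so
    # that git diff shows only the entries that changed, and
    # ensure_ascii=False keeps accented French characters readable.
    return json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def duplicate_lines(lines):
    """Return, sorted, the non-empty stripped lines seen more than once.

    Assumes one entry per line, which is a guess about additions.txt.
    """
    counts = Counter(line.strip() for line in lines if line.strip())
    return sorted(entry for entry, n in counts.items() if n > 1)


if __name__ == "__main__":
    # In-memory demonstration; no project file is read or written here.
    print(pretty_json('{"ui": 2, "ie": 1}'), end="")
    print(duplicate_lines(["roi\n", "loi\n", "roi\n"]))
```

To apply this to the real files, one would read each file as UTF-8 text, pass
its content (or its lines) to the matching helper, and write the result back
only for the JSON case. Sorting keys changes the on-disk order once, so the
first commit after conversion will show a large diff.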