== Ongoing ==

- check "Blablabla subjuguent, blablabla blablabla"

- add "scientifique" "scientifiques" to additions, and understand why it breaks
  the test suite
- pretty-print the JSON in the diaeresis.json file to make debugging clearer
  with git diff

- fix pytest-3 plint to make it work
- migrate the readme to markdown
- turn should_be_accepted into a test
- expand the corpus of classical poetry: more Racine, more other authors
  (Boileau, Corneille, Prudhomme, etc.)

- fix problems in the new works
- train diaeresis.json on new works
- check that diaeresis:permissive is indeed more permissive
- check for duplicates in additions.txt
- check xmllitre again

- download in bulk all possible sources to train on them in an error-tolerant
  way, and to be able to check easily the usage of a given word

== Ideas ==

- Use the latest Lexique (with our corrections) to generate a file of known
  words with their length, and when we have exactly one of these words ensure
  that we do not allow fewer syllables than indicated (but it can be more,
  because of diaeresis)
- Ensure that, on words known in Lexique, frhyme returns exactly the known
  pronunciation(s), so that we can use it confidently, e.g. to predict elision
  of the ending for rhyme genre and number of syllables
- Remove the kludge for invalid characters, split them in specific chunks
- Improve performance with profiling
- Only indicate hemistich status when there is a problem with hemistichs
- Clean up the code to the extent possible
- Look at dicollecte
  <http://grammalecte.net/download/fr/lexique-dicollecte-fr-v6.4.1.zip>, which
  also features pronunciation, and see how it differs from Lexique

== Low priority ==

=== Error reporting ===

- When reporting hemistich errors, highlight the positions where a hemistich
  could have been placed

=== Diaeresis/synaeresis ===

- When training, take into consideration the contexts where we haven't been able
  to infer the number of syllables, and only learn at each step from the
  contexts where we are the most certain (including the unknown occurrences),
  instead of having a hardcoded default threshold
- Formally evaluate the performance of the approach without additions

=== Other approaches ===

- Learn rhyme and gender agnostically by clustering: prepare an undirected graph
  of rhyming verses, factor out suffixes, do SCC, prepare a trie

=== Misc ===

- Fuzz testing: try giving random input to plint and check that it behaves
- Better exception logging for the Web frontend

=== Problems ===

- Loanwords from English ("crumble", "single", etc.) shouldn't be elidable
- Loanwords from Latin ("ad patres") shouldn't be elidable

== Other possible sources ==

The following could be easily integrated, either from
https://dramacode.github.io/ or from the indicated URL:

corneille_surena                https://fr.wikisource.org/wiki/Sur%C3%A9na
corneille_pulcherie             https://fr.wikisource.org/wiki/Pulch%C3%A9rie
corneille_tite_et_berenice      https://fr.wikisource.org/wiki/Tite_et_B%C3%A9r%C3%A9nice
corneille_attila                https://fr.wikisource.org/wiki/Attila
corneille_othon                 https://fr.wikisource.org/wiki/Othon/Texte_entier
corneille_sophonisbe            https://fr.wikisource.org/wiki/Sophonisbe_(Corneille)
corneille_sertorius             https://fr.wikisource.org/wiki/Sertorius
corneille_toison_dor            https://fr.wikisource.org/wiki/La_Toison_d%E2%80%99or_(Corneille)
corneille_nicomede              https://fr.wikisource.org/wiki/Nicom%C3%A8de
corneille_don_sanche_daragon    https://fr.wikisource.org/wiki/Don_Sanche_d%E2%80%99Aragon
corneille_heraclius             https://fr.wikisource.org/wiki/H%C3%A9raclius_empereur_d%E2%80%99Orient
corneille_theodore              https://fr.wikisource.org/wiki/Th%C3%A9odore_vierge_et_martyre
corneille_rodogune              https://fr.wikisource.org/wiki/Th%C3%A9odore_vierge_et_martyre
corneille_menteur_suite         https://fr.wikisource.org/wiki/La_Suite_du_Menteur
corneille_menteur               https://fr.wikisource.org/wiki/Le_Menteur
corneille_pompee                https://fr.wikisource.org/wiki/Pomp%C3%A9e
corneille_polyeucte             https://fr.wikisource.org/wiki/Polyeucte/%C3%89dition_Masson,_1887
corneille_cinna                 https://fr.wikisource.org/wiki/Cinna_ou_la_Cl%C3%A9mence_d%E2%80%99Auguste
corneille_horace                https://fr.wikisource.org/wiki/Horace_(Corneille)
corneille_cid                   https://fr.wikisource.org/wiki/Le_Cid
corneille_comedie_des_tuileries https://fr.wikisource.org/wiki/La_Com%C3%A9die_des_Tuileries

Other ideas (trickier):

- https://fr.wikisource.org/wiki/Imitation_de_J%C3%A9sus-Christ/Texte_entier
- https://fr.wikisource.org/wiki/Po%C3%A9sies_diverses_(Corneille)
- corneille_andromede https://fr.wikisource.org/wiki/Androm%C3%A8de (but it
  contains a lot of free verse)
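
Two of the Ongoing items (pretty-printing diaeresis.json so that git diff is
readable, and checking additions.txt for duplicates) could be sketched as
below. This is only a rough sketch: the helper names pretty_json and
find_duplicates are hypothetical, and it assumes that additions.txt holds one
entry per line and that two entries are duplicates when the stripped lines are
identical. The real file layout may call for a different comparison.

```python
import json
from collections import Counter


def pretty_json(text):
    """Re-serialize JSON text with indentation and a stable key order."""
    data = json.loads(text)
    # sort_keys keeps the key order stable between runs, so a diff only
    # shows entries that actually changed; ensure_ascii=False keeps accented
    # characters readable instead of turning them into \uXXXX escapes.
    return json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def find_duplicates(lines):
    """Return entries occurring more than once, in first-seen order.

    Assumption: one entry per line; blank lines are ignored.
    """
    counts = Counter(line.strip() for line in lines if line.strip())
    return [entry for entry, n in counts.items() if n > 1]
```

To use it, read diaeresis.json as UTF-8, pass the contents to pretty_json and
write the result back; for additions.txt, pass the open file object (or a list
of its lines) to find_duplicates and print whatever it returns.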