plint

French poetry validator (local mirror of https://gitlab.com/a3nm/plint)
git clone https://a3nm.net/git/plint/

TODO (4647B)


== Ongoing ==

- check "Blablabla subjuguent, blablabla blablabla"

- add "scientifique" and "scientifiques" to additions, and understand why it
  breaks the test suite
- pretty-print the JSON in the diaeresis.json file to make debugging clearer
  with git diff
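
A minimal sketch of the pretty-printing item above, assuming diaeresis.json
holds a single JSON document. The function name and the formatting choices
(indent, sorted keys) are illustrative assumptions, not plint's own code.

```python
import json

def pretty_print_json(path):
    """Rewrite the JSON file at `path` with stable, line-oriented formatting."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        # indent=1 puts each key on its own line, sort_keys keeps the order
        # stable between runs, ensure_ascii=False keeps accented letters
        # readable in the diff.
        json.dump(data, f, indent=1, sort_keys=True, ensure_ascii=False)
        f.write("\n")
```

With one key per line and a stable key order, retraining should only touch the
lines of the entries that actually changed.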

- fix "pytest-3 plint" to make it work
- migrate the README to Markdown
- turn should_be_accepted into a test
- expand the corpus of classical poetry: more Racine, more other authors
  (Boileau, Corneille, Prudhomme, etc.)

- fix problems in the new works
- train diaeresis.json on new works
- check that diaeresis:permissive is indeed more permissive
- check for duplicates in additions.txt
- check xmllitre again
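
A possible sketch of the duplicate check on additions.txt. It assumes one entry
per line and treats two lines as duplicates when they are identical after
stripping whitespace; the actual file format may call for comparing only part
of each line. The function name is hypothetical.

```python
from collections import Counter

def find_duplicate_lines(path):
    """Return the non-empty lines that occur more than once in `path`, sorted."""
    with open(path, encoding="utf-8") as f:
        # Count each stripped, non-empty line.
        counts = Counter(line.strip() for line in f if line.strip())
    return sorted(entry for entry, n in counts.items() if n > 1)
```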

- download in bulk all possible sources to train on them in an error-tolerant
  way, and to be able to easily check the usage of a given word

== Ideas ==

- Use the latest Lexique (with our corrections) to generate a file of known
  words with their length, and when we have exactly one of these words ensure
  that we do not allow fewer syllables than indicated (but it can be more,
  because of diaeresis)
- Ensure that, on words known in Lexique, frhyme returns exactly the known
  pronunciation(s), so we can use it confidently, e.g. to predict elision of the
  ending for rhyme gender and number of syllables
- Remove the kludge for invalid characters, split them into specific chunks
- Improve performance with profiling
- Only indicate hemistiche status when there is a problem with hemistiches
- Clean up the code to the extent possible
- Look at dicollecte
  <http://grammalecte.net/download/fr/lexique-dicollecte-fr-v6.4.1.zip>, which
  also features pronunciation, and see how it differs from Lexique

== Low priority ==

=== Error reporting ===

- When reporting hemistiche errors, highlight the positions where a hemistiche
  could have been placed

=== Diaereses/synereses ===

- When training, take into consideration the contexts where we haven't been able
  to infer the number of syllables, and only learn at each step from the
  contexts where we are the most certain (including the unknown occurrences),
  instead of having a hardcoded default threshold
- Formally evaluate the performance of the approach without additions

=== Other approaches ===

- Learn rhyme and gender agnostically by clustering: prepare an undirected graph
  of rhyming verses, factor out suffixes, compute connected components, prepare
  a trie
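
A rough sketch of the graph step of the clustering idea only (suffix factoring
and the trie are left out). It assumes the input is a list of pairs of verse
endings known to rhyme, e.g. taken from couplets in the corpus, and groups them
with a union-find structure. All names here are hypothetical.

```python
def rhyme_classes(rhyming_pairs):
    """Group verse endings into the connected components of the rhyme graph."""
    parent = {}

    def find(x):
        # Return the representative of x's class, registering x if new.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in rhyming_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two classes

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return sorted(sorted(c) for c in classes.values())
```

Endings that never rhyme directly still land in the same class when a chain of
rhymes connects them, which is what makes the approach agnostic to spelling.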

=== Misc ===

- Fuzz testing: try giving random input to plint and check that it behaves
- Better exception logging for the Web frontend
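
A minimal sketch of what the fuzz testing item could look like. The `check`
argument is a hypothetical stand-in for whatever plint entry point validates a
piece of text; the alphabet and the length bound are arbitrary assumptions.

```python
import random

# Letters, accented letters, punctuation, whitespace and a NUL byte.
ALPHABET = "abcdefghijklmnopqrstuvwxyzàâçéèêëîïôùûüÿœ' -,.;:!?\n\t\x00"

def fuzz(check, runs=1000, max_len=80, seed=0):
    """Call `check` on random strings; return the (input, exception) pairs that failed."""
    rng = random.Random(seed)  # fixed seed so that failures are reproducible
    failures = []
    for _ in range(runs):
        length = rng.randint(0, max_len)
        text = "".join(rng.choice(ALPHABET) for _ in range(length))
        try:
            check(text)
        except Exception as exc:  # any uncaught exception counts as a failure
            failures.append((text, exc))
    return failures
```

This only catches uncaught exceptions; detecting hangs would need a timeout
around each call.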

=== Problems ===

- Loanwords from English ("crumble", "single", etc.) shouldn't be elidable
- Loanwords from Latin ("ad patres") shouldn't be elidable

== Other possible sources ==

The following could be easily integrated, either from
https://dramacode.github.io/ or from the indicated URL:

corneille_surena https://fr.wikisource.org/wiki/Sur%C3%A9na
corneille_pulcherie https://fr.wikisource.org/wiki/Pulch%C3%A9rie
corneille_tite_et_berenice https://fr.wikisource.org/wiki/Tite_et_B%C3%A9r%C3%A9nice
corneille_attila https://fr.wikisource.org/wiki/Attila
corneille_othon https://fr.wikisource.org/wiki/Othon/Texte_entier
corneille_sophonisbe https://fr.wikisource.org/wiki/Sophonisbe_(Corneille)
corneille_sertorius https://fr.wikisource.org/wiki/Sertorius
corneille_toison_dor https://fr.wikisource.org/wiki/La_Toison_d%E2%80%99or_(Corneille)
corneille_nicomede https://fr.wikisource.org/wiki/Nicom%C3%A8de
corneille_don_sanche_daragon https://fr.wikisource.org/wiki/Don_Sanche_d%E2%80%99Aragon
corneille_heraclius https://fr.wikisource.org/wiki/H%C3%A9raclius_empereur_d%E2%80%99Orient
corneille_theodore https://fr.wikisource.org/wiki/Th%C3%A9odore_vierge_et_martyre
corneille_rodogune https://fr.wikisource.org/wiki/Th%C3%A9odore_vierge_et_martyre
corneille_menteur_suite https://fr.wikisource.org/wiki/La_Suite_du_Menteur
corneille_menteur https://fr.wikisource.org/wiki/Le_Menteur
corneille_pompee https://fr.wikisource.org/wiki/Pomp%C3%A9e
corneille_polyeucte https://fr.wikisource.org/wiki/Polyeucte/%C3%89dition_Masson,_1887
corneille_cinna https://fr.wikisource.org/wiki/Cinna_ou_la_Cl%C3%A9mence_d%E2%80%99Auguste
corneille_horace https://fr.wikisource.org/wiki/Horace_(Corneille)
corneille_cid https://fr.wikisource.org/wiki/Le_Cid
corneille_comedie_des_tuileries https://fr.wikisource.org/wiki/La_Com%C3%A9die_des_Tuileries

Other ideas (trickier):

- https://fr.wikisource.org/wiki/Imitation_de_J%C3%A9sus-Christ/Texte_entier
- https://fr.wikisource.org/wiki/Po%C3%A9sies_diverses_(Corneille)
- corneille_andromede https://fr.wikisource.org/wiki/Androm%C3%A8de (but it contains much free verse)