plint

French poetry validator (local mirror of https://gitlab.com/a3nm/plint)
git clone https://a3nm.net/git/plint/

TODO (4594B)


      1 == Ongoing ==
      2 
      3 - add "scientifique" "scientifiques" to additions, and understand why it breaks
      4   the test suite
      5 - pretty-print the json in the diaeresis.json file to make debugging clearer
      6   with git diff
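A minimal sketch of the pretty-printing step, assuming diaeresis.json is plain JSON that the standard json module can round-trip and that key order carries no meaning. The function name pretty_print_json is hypothetical and not part of plint.

```python
import json


def pretty_print_json(text):
    """Return `text` re-serialized in a stable, diff-friendly layout."""
    data = json.loads(text)
    # indent puts one entry per line, sort_keys makes the order deterministic,
    # ensure_ascii=False keeps accented characters readable in the diff.
    return json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
```

With one entry per line and a deterministic key order, git diff should show only the entries that actually changed after retraining.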
      7 
      8 - get "pytest-3 plint" to work
      9 - migrate the readme to markdown
     10 - turn should_be_accepted into a test
     11 - expand the corpus of classical poetry: more Racine, more other authors
     12   (Boileau, Corneille, Prudhomme, etc.)
     13 
     14 - fix problems in the new works
     15 - train diaeresis.json on new works
     16 - check that diaeresis:permissive is indeed more permissive
     17 - check for duplicates in additions.txt
     18 - re-check xmllitre
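The duplicate check could be sketched as follows, assuming additions.txt holds one entry per line (the actual file format is not described here). The helper name find_duplicate_lines is hypothetical.

```python
from collections import defaultdict


def find_duplicate_lines(lines):
    """Map each repeated non-empty line to the 1-based line numbers where it occurs."""
    seen = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        entry = line.strip()
        if entry:  # ignore blank lines
            seen[entry].append(number)
    return {entry: numbers for entry, numbers in seen.items() if len(numbers) > 1}
```

It accepts any iterable of lines, so an open file object can be passed directly. Reporting line numbers rather than just the entries makes it easier to decide which copy to delete.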
     19 
     20 - download in bulk all possible sources to train on them in an error-tolerant
     21   way, and to be able to check easily the usage of a given word
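One possible shape for an error-tolerant bulk download, as a sketch only. The names fetch_all and http_fetch are hypothetical, the (name, url) pair format is an assumption, and the fetcher is passed in as a parameter so that the loop can be exercised without network access.

```python
import urllib.request


def http_fetch(url, timeout=30):
    """Download `url` and decode it leniently as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(sources, fetch=http_fetch):
    """Fetch every (name, url) pair, collecting failures instead of stopping."""
    texts, errors = {}, {}
    for name, url in sources:
        try:
            texts[name] = fetch(url)
        except Exception as exc:  # deliberately broad: one bad source must not abort the run
            errors[name] = repr(exc)
    return texts, errors
```

Keeping the failures in a separate dictionary lets a later pass retry or inspect them without re-downloading the sources that worked.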
     22 
     23 == Ideas ==
     24 
     25 - Use the latest lexique (with our corrections) to generate a file of known
     26   words with their length, and when we have exactly one of these words ensure
     27   that we do not allow fewer syllables than indicated (but it can be more,
     28   because of diérèse)
     29 - Ensure that, on words known in Lexique, frhyme returns exactly the known
     30   pronunciation(s); so we can use it confidently, e.g. to predict elision of the
     31   ending for rhyme genre and number of syllables
     32 - remove kludge for invalid characters, split them into specific chunks
     33 - Improve performance with profiling
     34 - Only indicate hemistiche status when there is a problem with hemistiches
     35 - Clean up the code to the extent possible
     36 - Look at dicollecte
     37   <http://grammalecte.net/download/fr/lexique-dicollecte-fr-v6.4.1.zip>, which
     38   also features pronunciation, and see how it differs from Lexique
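The first idea above (a lower bound on syllable counts from a table of known words) could look roughly like this. It is a sketch under assumptions: the (word, syllables) input format is invented for illustration and is not the real Lexique layout, and "exactly one of these words" is read here as "the word is listed with a single syllable count". Both function names are hypothetical.

```python
def build_min_syllables(entries):
    """Keep only the words that are listed with exactly one syllable count."""
    counts = {}
    for word, syllables in entries:
        counts.setdefault(word, set()).add(syllables)
    return {word: next(iter(found))
            for word, found in counts.items() if len(found) == 1}


def is_plausible(word, candidate, min_syllables):
    """Reject a candidate count below the known one; more is allowed (diérèse)."""
    known = min_syllables.get(word)
    return known is None or candidate >= known
```

Words absent from the table, or listed with several counts, are never rejected, so the check can only tighten results for unambiguous known words.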
     39 
     40 == Low priority ==
     41 
     42 === Error reporting ===
     43 
     44 - When reporting hemistiche errors, highlight the positions where a
     45   hemistiche could have been placed
     46 
     47 === Diérèses/synérèses ===
     48 
     49 - When training, take into consideration the contexts where we haven't been able
     50   to infer the number of syllables, and only learn at each step from the
     51   contexts where we are the most certain (including the unknown occurrences),
     52   instead of having a hardcoded default threshold
     53 - Formally evaluate the performance of the approach without additions
     54 
     55 === Other approaches ===
     56 
     57 - Learn rhyme and gender agnostically by clustering: prepare an undirected graph
     58   of rhyming verses, factor out suffixes, do SCC, prepare a trie
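A sketch of the graph step only, assuming the input is a list of pairs of verse endings observed to rhyme. On an undirected graph the components are plain connected components, computed here with union-find. Suffix factoring and the trie are not covered, and the name rhyme_classes is hypothetical.

```python
def rhyme_classes(pairs):
    """Group items into classes, given pairs known to rhyme (undirected edges)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two classes

    classes = {}
    for item in list(parent):
        classes.setdefault(find(item), set()).add(item)
    return list(classes.values())
```

Each returned set is one candidate rhyme class. A single wrong edge merges two classes, so noisy input would probably need filtering before this step.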
     59 
     60 === Misc ===
     61 
     62 - Fuzz testing: try giving random input to plint and check that it behaves
     63 - Better exception logging for the Web frontend
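A minimal fuzzing harness could look like the sketch below. The check argument stands for a hypothetical wrapper that runs plint on a string; plint's real entry point is not assumed here. The alphabet and size limits are arbitrary choices.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyzéèêàçùœ '-,.!?\t"


def random_text(rng, max_lines=5, max_len=40, alphabet=ALPHABET):
    """Build a random multi-line string from a small French-flavoured alphabet."""
    lines = []
    for _ in range(rng.randint(0, max_lines)):
        length = rng.randint(0, max_len)
        lines.append("".join(rng.choice(alphabet) for _ in range(length)))
    return "\n".join(lines)


def fuzz(check, iterations=1000, seed=0):
    """Call `check` on random inputs; return (input, exception) for each crash."""
    rng = random.Random(seed)  # seeded, so that failures are reproducible
    failures = []
    for _ in range(iterations):
        text = random_text(rng)
        try:
            check(text)
        except Exception as exc:
            failures.append((text, exc))
    return failures
```

Here "behaves" is taken to mean "does not raise". A stricter harness could also compare the output against invariants, for instance that every input line is reported exactly once.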
     64 
     65 === Problems ===
     66 
     67 - Loanwords from English ("crumble", "single", etc.) shouldn't be elidable
     68 - Loanwords from Latin ("ad patres") shouldn't be elidable
     69 
     70 == Other possible sources ==
     71 
     72 The following could be easily integrated, either from
     73 https://dramacode.github.io/ or from the indicated URL:
     74 
     75 corneille_surena https://fr.wikisource.org/wiki/Sur%C3%A9na
     76 corneille_pulcherie https://fr.wikisource.org/wiki/Pulch%C3%A9rie
     77 corneille_tite_et_berenice https://fr.wikisource.org/wiki/Tite_et_B%C3%A9r%C3%A9nice
     78 corneille_attila https://fr.wikisource.org/wiki/Attila
     79 corneille_othon https://fr.wikisource.org/wiki/Othon/Texte_entier
     80 corneille_sophonisbe https://fr.wikisource.org/wiki/Sophonisbe_(Corneille)
     81 corneille_sertorius https://fr.wikisource.org/wiki/Sertorius
     82 corneille_toison_dor https://fr.wikisource.org/wiki/La_Toison_d%E2%80%99or_(Corneille)
     83 corneille_nicomede https://fr.wikisource.org/wiki/Nicom%C3%A8de
     84 corneille_don_sanche_daragon https://fr.wikisource.org/wiki/Don_Sanche_d%E2%80%99Aragon
     85 corneille_heraclius https://fr.wikisource.org/wiki/H%C3%A9raclius_empereur_d%E2%80%99Orient
     86 corneille_theodore https://fr.wikisource.org/wiki/Th%C3%A9odore_vierge_et_martyre
     87 corneille_rodogune https://fr.wikisource.org/wiki/Rodogune
     88 corneille_menteur_suite https://fr.wikisource.org/wiki/La_Suite_du_Menteur
     89 corneille_menteur https://fr.wikisource.org/wiki/Le_Menteur
     90 corneille_pompee https://fr.wikisource.org/wiki/Pomp%C3%A9e
     91 corneille_polyeucte https://fr.wikisource.org/wiki/Polyeucte/%C3%89dition_Masson,_1887
     92 corneille_cinna https://fr.wikisource.org/wiki/Cinna_ou_la_Cl%C3%A9mence_d%E2%80%99Auguste
     93 corneille_horace https://fr.wikisource.org/wiki/Horace_(Corneille)
     94 corneille_cid https://fr.wikisource.org/wiki/Le_Cid
     95 corneille_comedie_des_tuileries https://fr.wikisource.org/wiki/La_Com%C3%A9die_des_Tuileries
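A list in this "name URL" format is easy to check mechanically before integration. The sketch below parses such lines, decodes the percent-encoded page titles, and reports any URL listed under more than one name. All three function names are hypothetical.

```python
from urllib.parse import unquote


def parse_sources(lines):
    """Parse 'name url' lines into (name, url) pairs, skipping malformed lines."""
    pairs = []
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) == 2:
            pairs.append((parts[0], parts[1].strip()))
    return pairs


def duplicate_urls(pairs):
    """Return {url: [names]} for every URL listed under more than one name."""
    by_url = {}
    for name, url in pairs:
        by_url.setdefault(url, []).append(name)
    return {url: names for url, names in by_url.items() if len(names) > 1}


def page_title(url):
    """Decode the percent-encoded page title at the end of a wiki URL."""
    return unquote(url.rsplit("/wiki/", 1)[-1]).replace("_", " ")
```

Comparing page_title(url) with the short name on each line is a cheap way to spot a copy-paste slip, since the two should normally refer to the same play.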
     96 
     97 Other ideas (trickier):
     98 
     99 - https://fr.wikisource.org/wiki/Imitation_de_J%C3%A9sus-Christ/Texte_entier
    100 - https://fr.wikisource.org/wiki/Po%C3%A9sies_diverses_(Corneille)
    101 - corneille_andromede https://fr.wikisource.org/wiki/Androm%C3%A8de (but contains much free verse)
    102