nlsplit

split natural language text in chunks at reasonable language boundaries
git clone https://a3nm.net/git/nlsplit/

commit 7c7bfe58211fdb3026ad582a2dfe4b13ee360231
parent 7335282a996b413385259ca6e5d4951258d5d49c
Author: Antoine Amarilli <a3nm@a3nm.net>
Date:   Sat, 15 Oct 2011 18:01:16 +0200

new limitation in readme: nlsplit assumes LF

Diffstat:
README | 4++++
1 file changed, 4 insertions(+), 0 deletions(-)

diff --git a/README b/README
@@ -76,6 +76,10 @@ memory usage is O(SIZE).
 == 4. Limitations ==
+nlsplit assumes that newlines are encoded with LF, not CR, CR+LF or
+something else. You will have to perform the conversion using another
+tool if this can be an issue.
+
 nlsplit's heuristics are not bulletproof. They can be fooled and perform
 bad splits, or miss good ones.
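
The limitation added in this commit leaves newline conversion to an external tool. As a rough illustration only, the following sketch shows one way such a conversion could be done in Python before handing text to nlsplit. The function name `normalize_newlines` is hypothetical and is not part of nlsplit; the sketch assumes the input is available as bytes and only handles CR+LF and lone CR.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CR+LF and lone CR line endings to LF.

    Hypothetical helper, not part of nlsplit.
    """
    # Replace CR+LF pairs first so each pair becomes a single LF,
    # then turn any remaining lone CR into LF.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

Replacing CR+LF before lone CR matters: doing it the other way round would turn each CR+LF pair into two LF characters and introduce spurious blank lines.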