nlsplit

split natural language text in chunks at reasonable language boundaries
git clone https://a3nm.net/git/nlsplit/

commit ea41916de5a2389084a6b28b77ef1203d332dedb
parent 887b457af2a5a8df7c9ea2faf323a799caf77a26
Author: Antoine Amarilli <a3nm@a3nm.net>
Date:   Thu, 28 Jun 2012 15:48:30 +0200

clarify

Diffstat:
 README | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/README b/README
@@ -87,11 +87,11 @@ nlsplit can produce arbitrarily small chunks and will do nothing to
 avoid that. It's up to you to regroup chunks if you don't find this
 acceptable.
 
-nlsplit is not Unicode-aware. It will not perform splits according to
-extended characters, and could theoretically split an extended
-character. However, as long as you are using ASCII whitespace regularly
-enough, these splits should not be favoured and that bad situation
-should not happen.
+nlsplit is not Unicode-aware. It will not take extended characters into
+account when performing splits, and could theoretically split an
+extended character. However, as long as you are using ASCII whitespace
+regularly enough, these splits should not be favoured and that bad
+situation should not happen.
 
 nlsplit keeps whitespace at the beginning or at the end of chunks to
 avoid losing any information. Depending on your application, you might
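
The reasoning in the changed paragraph can be illustrated with a small sketch. It is not nlsplit's code: the function name split_at_ascii_whitespace is hypothetical, and the example assumes the input is UTF-8, which the README does not state. In UTF-8, every byte of a multi-byte character is 0x80 or above, so a byte-level split made right after ASCII whitespace cannot fall inside an extended character, whereas a split at an arbitrary byte offset can.

```python
def split_at_ascii_whitespace(data: bytes) -> list[bytes]:
    """Split a byte string after each run of ASCII whitespace.

    Hypothetical helper for illustration only. The whitespace stays
    attached to the end of the chunk it follows, so joining the chunks
    gives back the input unchanged.
    """
    ascii_ws = b" \t\n\r"
    chunks = []
    start = 0
    i = 0
    n = len(data)
    while i < n:
        if data[i] in ascii_ws:
            # Consume the whole whitespace run, then cut right after it.
            while i < n and data[i] in ascii_ws:
                i += 1
            chunks.append(data[start:i])
            start = i
        else:
            i += 1
    if start < n:
        # Trailing text with no whitespace after it.
        chunks.append(data[start:])
    return chunks


# Assumption: the text is encoded as UTF-8.
text = "café naïve résumé".encode("utf-8")
chunks = split_at_ascii_whitespace(text)

# Every chunk is still valid UTF-8, and no information is lost.
decoded = [chunk.decode("utf-8") for chunk in chunks]
lossless = b"".join(chunks) == text

# By contrast, a cut at an arbitrary byte offset can land inside "é".
try:
    text[:4].decode("utf-8")
    blind_cut_is_valid = True
except UnicodeDecodeError:
    blind_cut_is_valid = False
```

The sketch keeps whitespace at the end of each chunk, which matches the README's remark that whitespace is preserved to avoid losing information. How nlsplit actually chooses its split points is not shown in this diff.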