This crate provides a subword tokenizer. A subword tokenizer splits a token into several smaller pieces, so-called word pieces. Word pieces were popularized by, and are used in, the BERT natural language encoder.
danieldk / wordpieces
Split tokens into word pieces
License: Other
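To illustrate how word-piece splitting works, here is a minimal sketch of the greedy longest-match-first algorithm used by BERT-style tokenizers: starting at the beginning of the token, the longest vocabulary entry that matches is taken as the next piece, with continuation pieces looked up under a "##" prefix. The function name, vocabulary representation, and return type below are illustrative assumptions, not this crate's actual API.

```rust
use std::collections::HashSet;

// Greedy longest-match-first word-piece splitting (BERT-style).
// Pieces after the first are looked up with a "##" prefix.
// Returns None when some remainder of the token cannot be matched.
fn split_word_pieces(token: &str, vocab: &HashSet<&str>) -> Option<Vec<String>> {
    // All character boundaries in the token, plus the end position.
    let mut boundaries: Vec<usize> = token.char_indices().map(|(i, _)| i).collect();
    boundaries.push(token.len());

    let mut pieces = Vec::new();
    let mut start_idx = 0;
    while start_idx < boundaries.len() - 1 {
        let start = boundaries[start_idx];
        // Try the longest candidate first, shrinking until a vocab hit.
        let mut matched = None;
        for end_idx in (start_idx + 1..boundaries.len()).rev() {
            let end = boundaries[end_idx];
            let candidate = if start == 0 {
                token[start..end].to_string()
            } else {
                format!("##{}", &token[start..end])
            };
            if vocab.contains(candidate.as_str()) {
                matched = Some((candidate, end_idx));
                break;
            }
        }
        match matched {
            Some((piece, end_idx)) => {
                pieces.push(piece);
                start_idx = end_idx;
            }
            // No vocabulary entry matches: the token cannot be split.
            None => return None,
        }
    }
    Some(pieces)
}
```

For example, with a vocabulary containing "un", "##believ", and "##able", the token "unbelievable" splits into the three pieces "un", "##believ", "##able", while a token with no matching pieces yields None.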