You must log in or # to comment.
Why 51? If you assign a binary number to a split, so g|i|r|af|fe is 111010 and gira|f|fe is 000110, that would be 2^7-1. I didn’t get how this is counted.
Dominated by popular culture, brands, platform-speak, and non-words. Tokenization is, therefore, a political act: it defines what can be represented and how the world becomes computationally representable.
Welcome to 1984
This was a very interesting talk. Thanks for sharing it.


