Which of the following is *not* a mapping rule for token canonicalization?
1) Removing characters such as hyphens, periods, and accents
2) Reducing all letters to lower case (case-folding)
3) Collapsing alternate spellings (colour → color)
4) Keeping synonyms as different classes to include more diverse tokens
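
For reference, the character-level and spelling-level mappings named in the options can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `canonicalize` and the one-entry `SPELLING_VARIANTS` table are assumptions made up for this example, not part of any particular library.

```python
import unicodedata

# Hypothetical lookup table of alternate spellings (an assumption for this
# sketch); a real system would need a far larger list.
SPELLING_VARIANTS = {"colour": "color"}


def canonicalize(token: str) -> str:
    """Map a raw token to one possible canonical form."""
    # Strip accents: decompose characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", token)
    no_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Remove hyphens and periods.
    stripped = no_accents.replace("-", "").replace(".", "")
    # Reduce all letters to lower case (case-folding).
    folded = stripped.casefold()
    # Collapse alternate spellings where the table has an entry.
    return SPELLING_VARIANTS.get(folded, folded)


if __name__ == "__main__":
    for raw in ["U.S.A.", "Naïve", "Colour", "anti-discriminatory"]:
        print(raw, "->", canonicalize(raw))
```

Each step maps several surface forms onto a single token, so variants that differ only in punctuation, accents, case, or spelling end up in the same equivalence class.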