TL;DR
In Java, do:
String normalizedString = Normalizer.normalize(originalString, Normalizer.Form.NFKD)
        .replaceAll("[^\\p{ASCII}]", "").toLowerCase().replaceAll("\\s{2,}", " ").trim();
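To try the one-liner in isolation, here is a minimal, self-contained sketch. The class name `NormalizeDemo` and the helper `normalize` are hypothetical names chosen for illustration; the chain of calls is the same as in the snippet above, and the sample strings are my own.

```java
import java.text.Normalizer;

public class NormalizeDemo {

    // Hypothetical helper wrapping the one-liner from the TL;DR.
    static String normalize(String original) {
        return Normalizer.normalize(original, Normalizer.Form.NFKD) // decompose accents and compatibility characters
                .replaceAll("[^\\p{ASCII}]", "")                    // drop everything that is not ASCII (e.g. combining marks)
                .toLowerCase()                                      // fold case (uses the default locale)
                .replaceAll("\\s{2,}", " ")                         // collapse runs of whitespace into one space
                .trim();                                            // remove leading/trailing whitespace
    }

    public static void main(String[] args) {
        // "  Crème   Brûlée  " written with escapes to avoid source-encoding issues
        System.out.println(normalize("  Cr\u00E8me   Br\u00FBl\u00E9e  ")); // prints "creme brulee"
        // "ﬁancé", starting with the single-character ligature U+FB01
        System.out.println(normalize("\uFB01anc\u00E9"));                   // prints "fiance"
    }
}
```

Note that this approach only keeps what decomposes to ASCII: a character with no decomposition, such as æ, is simply removed by the `[^\\p{ASCII}]` step rather than replaced.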
Nowadays, most strings are Unicode-encoded, and we can work with many different native characters that carry diacritical marks/accents (like ö, é, À) or ligatures (like æ or ʥ). Characters can be stored in UTF-8 (for instance), and the associated glyphs are displayed properly if the font supports them.