Like many string problems, this can be solved simply with a regex:
>>> word = 'Llanfairpwllgwyn|gyllgogerychwyrndrobwllllantysiliogogogoch'
>>> import re
>>> pattern = re.compile(r'ch|dd|ff|ng|ll|ph|rh|th|[^\W\d_]', flags=re.IGNORECASE)
>>> len(pattern.findall(word))
51
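Here is a sketch of the same approach as a reusable function. The name count_welsh_letters is hypothetical. It also checks two details of the pattern. First, the | in the example word changes the count. Second, the digraphs have to come before the single-letter class.

```python
import re

# The digraphs come before the single-letter class: at each position the
# alternation tries its branches left to right, so "ch" is consumed as one
# letter before [^\W\d_] gets a chance to match the "c" alone.
WELSH_LETTER = re.compile(r'ch|dd|ff|ng|ll|ph|rh|th|[^\W\d_]', flags=re.IGNORECASE)


def count_welsh_letters(text):
    """Count Welsh letters in text (hypothetical helper); non-letters such as '|' are skipped."""
    return len(WELSH_LETTER.findall(text))


word = 'Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch'

# With '|' between n and g, "gwyn|gyll" yields n and g as separate letters.
print(count_welsh_letters(word.replace('gwyngyll', 'gwyn|gyll')))  # 51
# Without it, the "ng" in "gwyngyll" is matched as one digraph.
print(count_welsh_letters(word))  # 50

# Putting the class first means every letter matches on its own and the
# digraph branches are never used: one match per character.
single_first = re.compile(r'[^\W\d_]|ch|dd|ff|ng|ll|ph|rh|th', flags=re.IGNORECASE)
print(len(single_first.findall(word)))  # 58
```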
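The character class [^\W\d_] (from here) matches word characters that are not digits or underscores, i.e. letters, including those with diacritics such as ŵ and ŷ. The digraphs are listed before it in the alternation, so at each position they are tried first. Anything that is not a letter, such as the | in the example word, is not matched and so is not counted. There, the | keeps the n and g in "gwyngyll" from being matched as the digraph ng, so they count as two letters.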
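Here is a small check of the character class on its own. It uses Python 3 str patterns, where \w, \W and \d are Unicode-aware by default. The sample string is arbitrary and chosen for illustration. It compares the class with \w, which also matches digits and the underscore, and with [a-z], which misses ŵ and ŷ.

```python
import re

sample = 'tŷ_2 dŵr!'  # arbitrary test string: accented letters, '_', a digit, punctuation

# [^\W\d_] = "not a non-word character, not a digit, not '_'",
# i.e. a word character that is neither a digit nor the underscore.
print(re.findall(r'[^\W\d_]', sample))  # ['t', 'ŷ', 'd', 'ŵ', 'r']

# \w alone also keeps the underscore and the digit.
print(re.findall(r'\w', sample))        # ['t', 'ŷ', '_', '2', 'd', 'ŵ', 'r']

# An ASCII range misses the accented letters entirely.
print(re.findall(r'[a-z]', sample))     # ['t', 'd', 'r']
```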