non-ascii-characters – Tarik Billa

How do I get accented letters to actually work on bash?

December 28, 2023 by Tarik

To get accented letters on bash via Cygwin using Mintty 1.1.2 just do the following: Go to the menu (if you don’t see any menu, right click on your Terminal). Click Options…. Click Text. Change the Locale to C. Change the Character set to ISO-8859-1 (Western European). Then test it:

matching unicode characters in python regular expressions

December 21, 2023 by Tarik

You need to specify the re.UNICODE flag and pass your string as a Unicode string by using the u prefix:
    >>> re.match(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$', u'/by_tag/påske/øyfjell.jpg', re.UNICODE).groupdict()
    {'tag': u'p\xe5ske', 'filename': u'\xf8yfjell.jpg'}
This is in Python 2. In Python 3 the u prefix is unnecessary because all strings are Unicode, and you can leave off the re.UNICODE flag because it is the default for string patterns.
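For Python 3, the same match can be sketched as follows. This is a minimal illustration, not the original answer's code: the NFC normalization step and the re.ASCII comparison are additions made here to show what changes when Unicode matching is switched off.

```python
import re
import unicodedata

# Same pattern as in the answer; in Python 3, \w is Unicode-aware for str patterns.
PATTERN = r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$'

# Assumption: normalize to NFC so each accented letter is a single code point.
path = unicodedata.normalize('NFC', '/by_tag/påske/øyfjell.jpg')

# Default (Unicode) matching: accented letters count as word characters.
unicode_match = re.match(PATTERN, path)

# re.ASCII restricts \w to [a-zA-Z0-9_], so the accented path no longer matches.
ascii_match = re.match(PATTERN, path, re.ASCII)

print(unicode_match.groupdict())
print(ascii_match)
```

The first print shows the tag and filename groups with their accented letters intact; the second prints None because å falls outside the ASCII word class.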

R on Windows: character encoding hell

December 17, 2023 by Tarik

How to account for accent characters for regex in Python?

December 13, 2023 by Tarik

Try the following: hashtags = re.findall(r'#(\w+)', str1, re.UNICODE). See the Regex101 demo. Edit: check the useful comment below from Martijn Pieters.
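A small Python 3 sketch of the hashtag extraction, where re.UNICODE is already the default for string patterns. The helper name find_hashtags and the sample sentence are made up for illustration, and the NFC normalization is an added assumption so that letters typed as base letter plus combining mark are still treated as word characters.

```python
import re
import unicodedata

def find_hashtags(text):
    """Return hashtag bodies (without '#'), including accented letters."""
    # Assumption: compose accents first; a bare combining mark is not matched by \w.
    text = unicodedata.normalize('NFC', text)
    return re.findall(r'#(\w+)', text)

# Hypothetical sample input.
str1 = 'God #påske fra #øyfjell! #ski2023'
hashtags = find_hashtags(str1)
print(hashtags)
```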

Replace accented characters in R with non-accented counterpart (UTF-8 encoding) [duplicate]

December 10, 2023 by Tarik

Why does wprintf transliterate Russian text in Unicode into Latin on Linux?

September 13, 2023 by Tarik

Because conversion of wide characters is done according to the currently set locale. By default a C program always starts with a “C” locale, which only supports ASCII characters. You have to switch to any Russian or UTF-8 locale first:
    setlocale(LC_ALL, "ru_RU.utf8"); // Russian Unicode
    setlocale(LC_ALL, "en_US.utf8"); // English US Unicode
Or to a current …

How do I write non-ASCII characters using echo?

August 20, 2023 by Tarik

If you care about portability, you’ll drop echo and use printf(1): printf '\012'
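printf(1) interprets backslash-octal escapes as raw bytes, so a non-ASCII character can be written as the octal values of its encoded bytes. The sketch below is only a helper for building such an argument. The function name to_printf_octal is hypothetical, and it assumes the terminal expects UTF-8.

```python
def to_printf_octal(text, encoding='utf-8'):
    """Build a printf(1)-style string of octal escapes for the encoded bytes."""
    # Each byte becomes a backslash followed by three octal digits.
    return ''.join('\\%03o' % byte for byte in text.encode(encoding))

# The newline from the answer above: one byte, octal 012.
print(to_printf_octal('\n'))
# U+00E9 (e with acute accent) is two bytes in UTF-8.
print(to_printf_octal('\u00e9'))
```

The second result can be passed to printf in single quotes, for example printf '\303\251', to emit the accented letter on a UTF-8 terminal.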

Finding the Values of the Arrow Keys in Python: Why are they triples?

August 9, 2023 by Tarik

I think I figured it out. I learned from here that each arrow key is represented by a unique ANSI escape code. Then I learned that the ANSI escape codes vary by system and application: in my terminal, running cat and pressing the up-arrow gives ^[[A, in C it seems to be \033[A, etc. The …
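The two notations in the excerpt describe the same three characters: ^[ is caret notation for ESC (octal 033, hex 1b), followed by [ and A. The sketch below spells that out. The ARROW_KEYS table is an illustrative assumption covering the common normal-mode sequences; some terminals send different sequences (for example in application cursor mode).

```python
# Common ANSI sequences for the arrow keys (assumed normal cursor mode).
ARROW_KEYS = {
    '\x1b[A': 'up',
    '\x1b[B': 'down',
    '\x1b[C': 'right',
    '\x1b[D': 'left',
}

# '\033' (octal) and '\x1b' (hex) are the same ESC character.
up = '\033[A'

print(len(up))                       # number of characters in the sequence
print([hex(ord(ch)) for ch in up])   # code point of each character
print(ARROW_KEYS[up])
```

Reading a key one byte at a time therefore yields a triple: ESC, then [, then the letter that identifies the direction.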

Removing unicode \u2026 like characters in a string in python2.7 [duplicate]

July 24, 2023 by Tarik

Python 2.x
    >>> s
    'This is some \\u03c0 text that has to be cleaned\\u2026! it\\u0027s annoying!'
    >>> print(s.decode('unicode_escape').encode('ascii','ignore'))
    This is some  text that has to be cleaned! it's annoying!
Python 3.x
    >>> s="This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying!"
    >>> s.encode('ascii', 'ignore')
    b"This is some  text that has to be …
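In Python 3 the two cases (literal backslash-u sequences versus real non-ASCII characters) can be handled along these lines. This is a sketch under assumptions: the helper names are made up, and decode_escapes_then_strip assumes its input holds only ASCII characters plus escape sequences.

```python
def strip_non_ascii(text):
    """Drop every character that cannot be encoded as ASCII."""
    return text.encode('ascii', 'ignore').decode('ascii')

def decode_escapes_then_strip(text):
    """Turn literal \\uXXXX sequences into characters, then drop non-ASCII ones."""
    # Assumption: text is pure ASCII, so the strict 'ascii' encode cannot fail.
    decoded = text.encode('ascii').decode('unicode_escape')
    return strip_non_ascii(decoded)

escaped = 'This is some \\u03c0 text that has to be cleaned\\u2026! it\\u0027s annoying!'
real = 'This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying!'

print(decode_escapes_then_strip(escaped))
print(strip_non_ascii(real))
```

Both calls return the same string. Note that removing the π leaves two consecutive spaces behind, and that \u0027 survives because it is the ASCII apostrophe.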

How to fetch a non-ascii url with urlopen?

April 22, 2023 by Tarik

Strictly speaking, URIs can’t contain non-ASCII characters; what you have there is an IRI. To convert an IRI to a plain ASCII URI: non-ASCII characters in the hostname part of the address have to be encoded using the Punycode-based IDNA algorithm; non-ASCII characters in the path, and most of the other parts of the address …
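A rough Python 3 sketch of such a conversion, using only the standard library. The function name iri_to_uri is hypothetical. Because the excerpt is cut off before it covers the path, the percent-encoding of UTF-8 bytes used here is an assumption that follows RFC 3987 rather than the text above. The sketch also assumes an absolute IRI with a hostname and drops any user:password part.

```python
import unicodedata
from urllib.parse import quote, urlsplit, urlunsplit

def iri_to_uri(iri):
    """Convert an absolute IRI to an ASCII-only URI (simplified sketch)."""
    # Assumption: normalize to NFC so accented letters encode consistently.
    parts = urlsplit(unicodedata.normalize('NFC', iri))

    # Hostname: Punycode-based IDNA encoding (Python's built-in 'idna' codec).
    host = parts.hostname.encode('idna').decode('ascii')
    netloc = host if parts.port is None else '{}:{}'.format(host, parts.port)

    # Other parts: percent-encode UTF-8 bytes, leaving existing escapes alone.
    path = quote(parts.path, safe='/%')
    query = quote(parts.query, safe='=&%+')
    fragment = quote(parts.fragment, safe='%')

    return urlunsplit((parts.scheme, netloc, path, query, fragment))

uri = iri_to_uri('http://b\u00fccher.example/p\u00e4th')
print(uri)
```

The resulting string contains only ASCII characters and can be handed to urllib.request.urlopen.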