Character reading from file in Python

Ref: http://docs.python.org/howto/unicode

Reading Unicode from a file is therefore simple:

    import codecs
    with codecs.open('unicode.rst', encoding='utf-8') as f:
        for line in f:
            print repr(line)

It's also possible to open files in update mode, allowing both reading and writing:

    with codecs.open('test', encoding='utf-8', mode="w+") as f:
        f.write(u'\u4500 blah blah blah\n')
        f.seek(0)
        print repr(f.readline()[:1])

EDIT: I'm assuming that your …
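The excerpt above is written for Python 2. As a rough sketch of the same idea on Python 3, assuming the built-in open() with an encoding argument is acceptable in place of codecs.open, something like the following should work. The file name and temporary directory are hypothetical and exist only for illustration.

```python
import os
import tempfile

# Hypothetical file path in a temporary directory (illustration only).
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# "w+" opens the file for both writing and reading, truncating it first.
with open(path, mode="w+", encoding="utf-8") as f:
    f.write("\u4500 blah blah blah\n")
    f.seek(0)
    first_char = f.readline()[:1]

# ascii() escapes non-ASCII characters, much like repr() did on Python 2.
print(ascii(first_char))

# Reading the file back line by line; each line is already a str.
with open(path, encoding="utf-8") as f:
    lines = [line for line in f]
```

On Python 3 every str is Unicode, so the u'' prefix is not needed and decoding happens inside open().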

How do I get a list of all the ASCII characters using Python?

The constants in the string module may be what you want.

All ASCII capital letters:

    >>> import string
    >>> string.ascii_uppercase
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

All printable ASCII characters:

    >>> string.printable
    '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'

For every single character defined in the ASCII standard, use chr:

    >>> ''.join(chr(i) for i in range(128))
    '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f'
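To see how these constants relate to each other, here is a small sketch. The variable names are made up for illustration, and the check reflects how the string module is defined in CPython.

```python
import string

# All 128 code points defined by ASCII, control characters included.
all_ascii = "".join(chr(i) for i in range(128))

# string.printable is assembled from four smaller constants.
printable_parts = (
    string.digits
    + string.ascii_letters
    + string.punctuation
    + string.whitespace
)

# The printable set is a strict subset of the full ASCII range.
non_printable = set(all_ascii) - set(string.printable)

print(len(all_ascii), len(string.printable), len(non_printable))
```

The characters left over in non_printable are the control characters that string.whitespace does not cover, plus DEL (\x7f).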

How can I remove non-ASCII characters but leave periods and spaces?

You can filter all characters from the string that are not printable using string.printable, like this:

    >>> s = "some\x00string. with\x15 funny characters"
    >>> import string
    >>> printable = set(string.printable)
    >>> filter(lambda x: x in printable, s)
    'somestring. with funny characters'

string.printable on my machine contains:

    0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c

EDIT: On Python 3, filter will …
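On Python 3, filter() returns a lazy iterator rather than a string, so the result has to be joined back together. A minimal sketch under that assumption follows. The helper name keep_printable is hypothetical and is not taken from the original answer.

```python
import string

printable = set(string.printable)


def keep_printable(s):
    """Return s with every character outside string.printable removed."""
    # filter() yields the surviving characters; join() rebuilds the str.
    return "".join(filter(lambda ch: ch in printable, s))


s = "some\x00string. with\x15 funny characters"
cleaned = keep_printable(s)
print(cleaned)
```

Periods and spaces survive because both are members of string.printable. Non-ASCII letters such as é are dropped along with the control characters.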

How to check if a String contains only ASCII?

From Guava 19.0 onward, you may use:

    boolean isAscii = CharMatcher.ascii().matchesAllOf(someString);

This calls matchesAllOf(someString) on the matcher returned by the factory method ascii(), which replaces the now-deprecated ASCII singleton. Here ASCII means all ASCII characters, including the non-printable characters below 0x20 (space) such as tab, line feed and carriage return, but also BEL with code …

Why does Python print unicode characters when the default encoding is ASCII?

Thanks to bits and pieces from various replies, I think we can stitch up an explanation. When you print a unicode string such as u'\xe9', Python implicitly tries to encode that string using the encoding scheme currently stored in sys.stdout.encoding. Python actually picks up this setting from the environment it was started from. If it can't …
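The encoding step that print performs can be made explicit. The sketch below assumes Python 3, whereas the answer discusses Python 2, and the variable names are illustrative only. It encodes the same character by hand with two different codecs to show why the choice of sys.stdout.encoding matters.

```python
import sys

text = "\xe9"  # LATIN SMALL LETTER E WITH ACUTE

# The encoding print() uses for this stream; it depends on the environment.
print(sys.stdout.encoding)

# With a UTF-8 stream the character becomes two bytes.
utf8_bytes = text.encode("utf-8")

# The ASCII codec cannot represent code points above 0x7F.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(utf8_bytes, ascii_failed)
```

If the stream's encoding can represent the character, printing works regardless of what the default encoding is.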