Question: Unicode Regex in Python 3 (from Python 2 Code)


Added at 2016-12-16 10:12

I'm trying to convert my Python 2 script to Python 3. How do we do Regex with Unicode?

This is what I had in Python 2, which works. It replaces straight double quotes with « and »:

text = re.sub(ur'"(.*?)"', ur'«\1»', text)
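In Python 3 this substitution only needs the u dropped. ur'' is a syntax error there, r'' still keeps \1 as a backreference for re, and str is already Unicode. A minimal sketch with a made-up sample sentence:

```python
import re

# Python 3: drop the "u" from "ur"; the "r" (raw string) stays so that
# \1 reaches the re module as a backreference instead of chr(1).
text = 'He said "hello" and then "goodbye".'  # hypothetical sample input
text = re.sub(r'"(.*?)"', r'«\1»', text)
print(text)  # He said «hello» and then «goodbye».
```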

I have some really complex ones for which the "ur" prefix made things easy, but it doesn't work in Python 3:

text = re.sub(ur'ه\sایم([\]\.،\:»\)\s])', ur'ه\u200cایم\1', text)
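A minimal Python 3 port of this second substitution, assuming Python 3.7 or later. The pattern can simply become a raw string, but the replacement cannot. In Python 2, ur'' still expanded \u200c into the zero-width non-joiner, whereas a Python 3 raw string keeps the backslash, and re.sub() rejects \u in a replacement template. The sketch therefore writes the replacement as a normal string with \\1 for the backreference. The sample phrase is made up for illustration.

```python
import re

# Pattern: a plain raw string is fine; re itself handles \s, \], \., \:, \).
pattern = r'ه\sایم([\]\.،\:»\)\s])'

# Replacement: NOT raw, so Python turns \u200c into the zero-width
# non-joiner; '\\1' reaches re as \1, a backreference to group 1.
replacement = 'ه\u200cایم\\1'

text = 'رفته ایم.'  # hypothetical sample with a plain space before ایم
text = re.sub(pattern, replacement, text)
print(ascii(text))

# A raw replacement keeps the literal backslash-u, which re rejects
# on Python 3.7+ with a "bad escape" error.
try:
    re.sub(pattern, r'ه\u200cایم\1', 'رفته ایم.')
    raw_failed = False
except re.error:
    raw_failed = True
print(raw_failed)
```

An equivalent option is concatenation, 'ه\u200cایم' + r'\1', which keeps the backreference part raw.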

Answers

Answer #1, added 2016-12-16 10:12

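All strings in Python 3 are Unicode by default. Just remove the u (keeping the r) and you should be fine. The one catch is that Python 2's ur'' also expanded \u escapes such as \u200c, while a Python 3 raw string does not, so write that replacement as a normal string: 'ه\u200cایم\\1'.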

In Python 2, plain string literals are byte strings (str) by default, so we use u to mark them as Unicode strings.
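A quick way to check the literal rules this answer relies on (a sketch; the helper name literal_compiles is just for illustration): in Python 3, '' is already str, bytes need b'', u'' was re-allowed in Python 3.3 (PEP 414) as a no-op, and the combined ur'' prefix is rejected outright.

```python
def literal_compiles(src: str) -> bool:
    """Return True if src is a valid Python 3 expression."""
    try:
        compile(src, '<literal>', 'eval')
        return True
    except SyntaxError:
        return False

# '' literals are already Unicode text (str); bytes need an explicit b prefix.
print(type('«»').__name__, type(b'abc').__name__)  # str bytes

# u'' means the same as '', r'' still gives a raw string,
# but ur'' is a syntax error in Python 3.
for src in ["u'x'", "r'x'", "ur'x'"]:
    print(src, literal_compiles(src))
```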

Answer #2, added 2016-12-16 11:12

Since Python 3.0, the language features a str type that contains Unicode characters, meaning any string created using "unicode rocks!", 'unicode rocks!', or the triple-quoted string syntax is stored as Unicode.
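To see what "stored as Unicode" means in practice: a Python 3 str is a sequence of code points, and bytes only appear when you encode explicitly. A short sketch using the word from the question (ه + zero-width non-joiner + ایم), spelled with escapes so each code point is visible; UTF-8 is an assumed encoding choice here.

```python
# ه + ZWNJ + ایم, spelled with escapes so the code points are explicit.
word = '\u0647\u200c\u0627\u06cc\u0645'
encoded = word.encode('utf-8')

print(type(word).__name__, len(word))        # str 5 (code points)
print(type(encoded).__name__, len(encoded))  # bytes 11 (UTF-8 bytes)
print(encoded.decode('utf-8') == word)       # True
```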

The Unicode HOWTO in the Python documentation will help you.

So you can do everything you did in Python 2 and it will work with no extra steps, apart from changing the ur prefix to r.
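One place where Python 3 actually needs less than Python 2: for str patterns, re matches Unicode by default, so \w and \s cover Persian letters and Unicode whitespace without the re.UNICODE flag Python 2 required. A minimal sketch with a made-up sample string; re.ASCII opts back into the ASCII-only behaviour.

```python
import re

sample = 'سلام دنیا hello'  # hypothetical mixed Persian/English input

# str patterns are Unicode-aware by default in Python 3.
unicode_words = re.findall(r'\w+', sample)
# re.ASCII restricts \w to [a-zA-Z0-9_], like Python 2 without re.UNICODE.
ascii_words = re.findall(r'\w+', sample, re.ASCII)

print(unicode_words)  # ['سلام', 'دنیا', 'hello']
print(ascii_words)    # ['hello']
```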
