Replacing multiple text patterns in Python

In Python, str.replace() or re.sub() can be used to replace one sub-string with another. Sometimes you need to replace several different sub-strings at once. This article explains how to do that without traversing the target text multiple times. The following code shows one way to replace multiple sub-strings in the target text: chain one str.replace() call per sub-string. Although this is easy, it is not efficient, since every call traverses the text again and builds a new string. Calling re.sub() multiple times has the same issue.

>>> target = '''
... In another moment down went Alice after it, never once
... considering how in the world she was to get out again. The
... rabbit-hole went straight on like a tunnel for some way, and
... then dipped suddenly down, so suddenly that Alice had not a
... moment to think about stopping herself before she found herself
... falling down a very deep well.
... '''
>>>
>>> result = target.replace('Alice', 'ALICE')
>>> result = result.replace('down', 'DOWN')
>>> result = result.replace('suddenly', 'SUDDENLY')
>>> result = result.replace('she', 'SHE')
>>> result = result.replace('herself', 'HERSELF')
>>> print(result)

In another moment DOWN went ALICE after it, never once
considering how in the world SHE was to get out again. The
rabbit-hole went straight on like a tunnel for some way, and
then dipped SUDDENLY DOWN, so SUDDENLY that ALICE had not a
moment to think about stopping HERSELF before SHE found HERSELF
falling DOWN a very deep well.
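
Chaining has a second consequence worth knowing about: each call operates on the output of the previous one, so a later replacement can rewrite text that an earlier one produced. The following is a minimal sketch of that effect. The text and the word pair are made up for illustration and are not part of the example above.

```python
# Hypothetical example: try to swap two words with chained str.replace() calls.
text = 'the cat chased the dog'

step1 = text.replace('cat', 'dog')   # 'the dog chased the dog'
step2 = step1.replace('dog', 'cat')  # also rewrites the 'dog' produced by step 1

print(step2)  # the cat chased the cat
```

The intended result was 'the dog chased the cat', but the second call cannot tell the original 'dog' from the one the first call inserted.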

The following code replaces multiple sub-strings while traversing the text only once. The argument of re.compile() is all the keys of rdict joined by pipe (|) characters, which builds a pattern that matches any one of the keys. The first argument of robj.sub() is a lambda function. It receives the match object for each match, so m.group(0) is the matched sub-string that we want to replace. The function looks that sub-string up in the rdict dictionary and returns the corresponding replacement. Note that the keys in this example contain only letters. If your keys contain characters that are special in regular expressions, such as . * ? |, you must escape them in advance, for example with re.escape().

>>> target = '''
... In another moment down went Alice after it, never once
... considering how in the world she was to get out again. The
... rabbit-hole went straight on like a tunnel for some way, and
... then dipped suddenly down, so suddenly that Alice had not a
... moment to think about stopping herself before she found herself
... falling down a very deep well.
... '''
>>>
>>> rdict = {
...     'Alice': 'ALICE',
...     'down': 'DOWN',
...     'suddenly': 'SUDDENLY',
...     'she': 'SHE',
...     'herself': 'HERSELF',
... }
>>>
>>> import re
>>> robj = re.compile('|'.join(rdict.keys()))
>>> result = robj.sub(lambda m: rdict[m.group(0)], target)
>>> print(result)

In another moment DOWN went ALICE after it, never once
considering how in the world SHE was to get out again. The
rabbit-hole went straight on like a tunnel for some way, and
then dipped SUDDENLY DOWN, so SUDDENLY that ALICE had not a
moment to think about stopping HERSELF before SHE found HERSELF
falling DOWN a very deep well.
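
The note about escaping can be sketched as a small helper. This is one possible way to do it, not the only one: the function name replace_all and the sample keys are hypothetical, and sorting the keys longest first is an extra assumption added here so that a key which is a prefix of another key does not shadow the longer one.

```python
import re

def replace_all(text, mapping):
    """Replace every key of mapping found in text with its value, in one pass."""
    if not mapping:
        return text  # an empty pattern would match everywhere, so stop early
    # Longest keys first, so that 'a.b' is tried before its prefix 'a'.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape() makes characters such as . * ? | match literally.
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)

# Hypothetical keys containing regular expression metacharacters.
symbols = {'a.b': 'X', 'a': 'Y', '1+1': 'two', 'c|d': 'Z'}
print(replace_all('a.b a axb 1+1 c|d', symbols))  # X Y Yxb two Z
```

Without re.escape(), the key 'a.b' would also match 'axb', and the key 'c|d' would be read as two separate alternatives, 'c' and 'd'.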