Summary of Commonly Used Python Regular Expression Functions

Date: 2021-05-22

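This article summarizes, with examples, the commonly used functions of Python's regular expression module. It is shared here for your reference; the details are as follows: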

re.match()

Function signature:

match(pattern, string, flags=0)
Try to apply the pattern at the start of the string,
returning a match object, or None if no match was found.

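Purpose:

re.match tries to match the pattern at the beginning of the string. If it succeeds, it returns a match object; otherwise it returns None.

Parameters:

pattern: the regular expression to match
string: the string to match against
flags: flags that control how the regular expression is matched, e.g. whether matching is case-sensitive or multiline.

The group() and groups() methods of the match object retrieve the matched results.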

group()

group(...)
group([group1, ...]) -> str or tuple.
Return subgroup(s) of the match by indices or names.
For 0 returns the entire match.

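Returns the substring(s) captured by one or more groups. When several arguments are given, the results come back as a tuple. Each argument can be a group number or a group name. Group 0 stands for the entire match, and group() with no arguments is the same as group(0). A group that captured nothing returns None, and a group that captured several times returns the last captured substring.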

groups()

groups(...)
groups([default=None]) -> tuple.
Return a tuple containing all the subgroups of the match, from 1.
The default argument is used for groups
that did not participate in the match

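Returns the substrings captured by all groups as a tuple, equivalent to calling group(1, 2, …, last). Groups that captured nothing are represented by the default value, None.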
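The group()/groups() rules above are easier to see in a small example. The name pattern and the input 'Ada Lovelace' are made up for illustration. The sketch exercises numbered and named groups, a group that did not participate in the match, the default argument of groups(), and a repeated group.

```python
import re

# Hypothetical pattern: two named groups plus an optional third group
m = re.match(r'(?P<first>\w+) (?P<last>\w+)( \w+)?', 'Ada Lovelace')

print(m.group())          # Ada Lovelace   (same as m.group(0): the whole match)
print(m.group('first'))   # Ada            (a group can be addressed by name...)
print(m.group(1, 2))      # ('Ada', 'Lovelace')   (...or by number; several args give a tuple)
print(m.group(3))         # None           (the optional group did not participate)
print(m.groups())         # ('Ada', 'Lovelace', None)
print(m.groups(''))       # ('Ada', 'Lovelace', '')   ('' replaces None for missing groups)

# A group that matches repeatedly keeps only its last capture
print(re.match(r'(\w)+', 'abc').group(1))   # c
```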

Example

import re

line = "This is the last one"
res = re.match(r'(.*) is (.*?) .*', line, re.M | re.I)
if res:
    print("res.group() : ", res.group())
    print("res.group(1) : ", res.group(1))
    print("res.group(2) : ", res.group(2))
    print("res.groups() : ", res.groups())
else:
    print("No match!!")

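re.M | re.I: these two flags turn on multiline mode and case-insensitive matching respectively. Combining them with | makes both take effect at once.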
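In the example above, re.M has no visible effect: it only changes how ^ and $ behave, and that pattern uses neither. The sketch below shows what re.I and re.M actually change. The two-line input string is made up for illustration.

```python
import re

text = "Python is fun\npython is popular"

# re.I: case-insensitive, so 'PYTHON' matches 'Python'
print(re.match(r'PYTHON', text, re.I).group())      # Python

# Without re.M, ^ matches only at the very start of the string
print(re.findall(r'^\w+', text))                    # ['Python']

# With re.M, ^ also matches right after each newline
print(re.findall(r'^\w+', text, re.M))              # ['Python', 'python']

# Flags combine with |
print(re.findall(r'^python', text, re.M | re.I))    # ['Python', 'python']
```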

Examples of the finer details:

>>> re.match(r'.*', '.*g3jl\nok').group()
'.*g3jl'

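. (dot) matches any single character except a newline, and * (asterisk) matches the preceding item zero, one, or more times. Combined, .* matches any number of characters other than a newline, which is why the match above stops right before \n.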
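If . should also match newlines, pass the re.S flag (also spelled re.DOTALL). A small sketch using the same string as above:

```python
import re

s = '.*g3jl\nok'

# By default . does not match '\n', so .* stops before the newline
print(repr(re.match(r'.*', s).group()))          # '.*g3jl'

# re.S (re.DOTALL) lets . match '\n' too, so .* consumes the whole string
print(repr(re.match(r'.*', s, re.S).group()))    # '.*g3jl\nok'
```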

1. >>> re.match(r'.*..', '..').group()
'..'
2. >>> re.match(r'.*g.', '.*g3jlok').group()
'.*g3'
3. >>> re.match(r'.*...', '..').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

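Why do the first two examples produce a result? In the first, the .* in .*.. matches zero characters, and the two remaining dots match the two '.' characters of the string. In the second, .* matches the two characters '.*' at the start of the string, g matches the literal g, and the final . matches '3'.
Why does the third example fail? Even if .* matches zero characters, the three dots that follow need three characters, but the string has only two, so the match fails.
These examples show that a match is only possible when the string has at least as many characters as the pattern requires at minimum. They also show that .* is greedy. It first consumes as many characters as it can, then backtracks one character at a time until the rest of the pattern can match. The result is the longest match that still lets the whole expression succeed. This is the greediness of ".*".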
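Greedy behavior is easiest to see next to the lazy quantifier .*?, which the first example in this article also uses. A sketch with a made-up HTML-like string:

```python
import re

html = '<b>bold</b>'

# Greedy: .* grabs everything, then backtracks until the final '>' can match
greedy = re.match(r'<.*>', html).group()

# Lazy: .*? takes as little as possible, stopping at the first '>'
lazy = re.match(r'<.*?>', html).group()

print(greedy)   # <b>bold</b>
print(lazy)     # <b>
```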

Matching Python identifiers:

>>> re.match(r'^[a-zA-Z_]\w*', '_1name1').group()
'_1name1'
>>> re.match(r'^[a-zA-Z_]\w*', '_name1').group()
'_name1'
>>> re.match(r'^[a-zA-Z_]\w*', 'num').group()
'num'
>>> re.match(r'^[a-zA-Z_]\w*', '1num').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
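Two pitfalls are worth knowing when writing such a pattern:

- Inside a character class, | is an ordinary character rather than alternation, so a class like [a-zA-Z|_] also accepts '|'.
- match() only requires a prefix of the string to match.

The sketch below shows both points. is_identifier() is a hypothetical helper that only approximates Python's rules: the first character is restricted to ASCII letters or _, and keywords are not checked. For the exact rules, str.isidentifier() is available.

```python
import re

# '|' is literal inside [...], so this class accepts it as a first character
print(re.match(r'^[a-zA-Z|_][\w_]*', '|abc').group())   # |abc

# Hypothetical helper: no stray '|', and fullmatch() so that the entire
# string must be an identifier, not just a prefix of it
IDENT = re.compile(r'[A-Za-z_]\w*')

def is_identifier(s):
    return IDENT.fullmatch(s) is not None

print(re.match(r'[A-Za-z_]\w*', 'name-x').group())   # name  (match() accepts a prefix)
print(is_identifier('name-x'))    # False
print(is_identifier('_1name1'))   # True
print(is_identifier('1num'))      # False
```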

re.search()

Function signature:

search(pattern, string, flags=0)
Scan through string looking for a match to the pattern,
returning a match object, or None if no match was found.

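Purpose:

Scans through the entire string and returns a match object for the first location where the pattern matches. If there is no match anywhere, it returns None.

Parameters:

pattern: the regular expression to match
string: the string to search
flags: flags that control how the regular expression is matched, e.g. whether matching is case-sensitive or multiline.

As with re.match, use the group() and groups() methods to retrieve the results.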

>>> re.search(r'[abc]\*\d{2}', '12a*23Gb*12ad').group()
'a*23'

The result shows that re.search returns only the first successful match, 'a*23', even though 'b*12' later in the string would also match the pattern.
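To get the position of the first match, use the match object's span() method. To get every match rather than just the first, use re.finditer(). A sketch on the same input:

```python
import re

pattern = r'[abc]\*\d{2}'
text = '12a*23Gb*12ad'

# search() stops at the first match; span() gives its (start, end) indices
m = re.search(pattern, text)
print(m.group(), m.span())    # a*23 (2, 6)

# re.finditer() yields a match object for every non-overlapping match
print([mo.group() for mo in re.finditer(pattern, text)])   # ['a*23', 'b*12']
```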

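The difference between re.match and re.search

re.match only matches at the beginning of the string: if the start of the string does not fit the pattern, the match fails and the function returns None. re.search scans the whole string until it finds a match, and returns None only if there is no match anywhere.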

>>> re.match(r'(.*)(are)', "Cats are smarter than dogs").group(2)
'are'
>>> re.search(r'(are)+', "Cats are smarter than dogs").group()
'are'

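The two examples above give the same result. Because re.match is anchored at the start of the string, it needs the leading (.*) to skip over 'Cats '. re.search finds 'are' wherever it occurs.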
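The difference shows most clearly with a plain pattern and no leading (.*). A minimal sketch on the same sentence:

```python
import re

s = "Cats are smarter than dogs"

print(re.match(r'are', s))            # None: match() only tries position 0
print(re.search(r'are', s).group())   # are: search() scans the whole string
print(re.search(r'^are', s))          # None: ^ anchors search() to the start
print(re.match(r'Cats', s).group())   # Cats: the string does start with 'Cats'
```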

re.sub()

Python's re module provides the re.sub() function for replacing matches in a string. If nothing matches, the string is returned unchanged.

Function signature:

sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl. repl can be either a string or a callable;
if a string, backslash escapes in it are processed. If it is
a callable, it's passed the match object and must return
a replacement string to be used.

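Parameters:

pattern: the regular expression to match
repl: the replacement, either a string or a function that receives the match object and returns the replacement string
string: the string in which to make the replacements
count: the maximum number of replacements. 0 (the default) replaces all matches, 1 replaces only the first, and so on. It must be a non-negative integer.
flags: flags that control how the regular expression is matched, e.g. whether matching is case-sensitive or multiline.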

Examples

Replace the last four digits of a phone number with 0000

>>> re.sub(r'\d{4}$', '0000', '13549876489')
'13549870000'

Remove a trailing comment from a line of code

>>> re.sub(r'#.*$', '', 'num = 0 #a number')
'num = 0 '
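The sketch below covers three features of re.sub(): backreferences in repl, the count parameter, and a callable repl as described in the function signature. The masking pattern and the doubling function are illustrative assumptions, not part of the original examples.

```python
import re

phone = '13549876489'

# Backreferences \1 and \2 in repl reuse the captured groups
print(re.sub(r'(\d{3})\d{4}(\d{4})', r'\1****\2', phone))   # 135****6489

# count=1 replaces only the first match
print(re.sub(r'\d', '0', 'a1b2c3', count=1))   # a0b2c3

# repl can be a function: it receives the match object and returns the replacement
print(re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1b22c333'))   # a2b44c666
```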

re.split()

Function signature:

split(pattern, string, maxsplit=0, flags=0)
Split the source string by the occurrences of the pattern,
returning a list containing the resulting substrings.

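Purpose:

Splits the string at every substring matched by the given regular expression and returns the pieces as a list.

Parameters:

pattern: the regular expression to match
string: the string to split
maxsplit: the maximum number of splits; 0 (the default) means no limit
flags: flags that control how the regular expression is matched, e.g. whether matching is case-sensitive or multiline.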
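Here is a minimal re.split() sketch with a made-up input string. It also shows a documented detail: if the pattern contains capturing parentheses, the matched separators are included in the result.

```python
import re

s = 'a, b;c  d'

# Split on any run of commas, semicolons, or whitespace
print(re.split(r'[,;\s]+', s))               # ['a', 'b', 'c', 'd']

# maxsplit=1: split at most once and keep the rest intact
print(re.split(r'[,;\s]+', s, maxsplit=1))   # ['a', 'b;c  d']

# A capturing group in the pattern keeps the separators in the result
print(re.split(r'(-)', 'a-b-c'))             # ['a', '-', 'b', '-', 'c']
```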

re.findall()

Function signature:

findall(pattern, string, flags=0)
Return a list of all non-overlapping matches in the string.
If one or more groups are present in the pattern, return a
list of groups; this will be a list of tuples if the pattern
has more than one group.
Empty matches are included in the result.

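Purpose:

Finds all substrings that match the pattern and returns them as a list. The list elements take one of the following forms:

When the pattern contains several pairs of (capturing) parentheses, each element is a tuple of strings. The tuple has one string per pair of parentheses, in the order the parentheses appear (judged by the opening parenthesis '('). Each string is the text matched by the corresponding parenthesized subexpression.
When the pattern contains exactly one pair of parentheses, each element is a string holding the text matched inside the parentheses. (Note: this is only the content of the group, not the text matched by the whole regular expression.)
When the pattern contains no parentheses, each element is the text matched by the entire regular expression.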

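Parameters:

pattern: the regular expression to match
string: the string to search
flags: flags that control how the regular expression is matched, e.g. whether matching is case-sensitive or multiline.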

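Examples:

1. Match all words in the string that contain 'oo'

# No parentheses in the pattern
>>> re.findall(r'\w*oo\w*', 'woo this foo is too')
['woo', 'foo', 'too']

The result shows that when the pattern has no parentheses, each string in the list is the text matched by the entire regular expression.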

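2. Get all runs of digits in the string

# Exactly one pair of parentheses in the pattern
>>> re.findall(r'.*?(\d+).*?', 'adsd12343.jl34d5645fd789')
['12343', '34', '5645', '789']

The result shows that when the pattern has exactly one pair of parentheses, each element is a string containing the text matched by the parenthesized subexpression.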

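3. Extract all valid domain names from a string

# When the pattern contains multiple parentheses
>>> add = 'https://…

>>> p = re.compile(r'\w+')  # compile the regular expression into a pattern object
>>> res = p.findall(s)  # use the pattern object to find all matches
>>> print(res)
['this', 'is', 'a', 'python', 'test']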
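The multiple-parentheses case of findall() can be illustrated with a small sketch. The e-mail-style input is made up, and the pattern is a rough illustration, not a complete validator. The second part shows re.compile(): the compiled pattern object's findall() takes only the string, which is convenient when one pattern is applied many times.

```python
import re

# Hypothetical input: the pattern has two groups, so each match is a tuple
text = 'alice@example.com, bob@test.org'
pairs = re.findall(r'(\w+)@(\w+\.\w+)', text)
print(pairs)   # [('alice', 'example.com'), ('bob', 'test.org')]

# Compile once, then reuse the pattern object's findall()
word = re.compile(r'\w+')
words = word.findall('this is a python test')
print(words)   # ['this', 'is', 'a', 'python', 'test']
```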



I hope this article is helpful for your Python programming.

