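Request headers copied from a web page are plain "Name: value" lines, not a Python dict, so after pasting them into PyCharm I had to fix the format by hand every time. I later switched to a regular expression (Python's re module) to convert them quickly.
For example, the original headers look like this: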
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
Cache-Control: no-cache
Connection: keep-alive
Cookie: BIDUPSID=CE2731xUjJ4RGNlYTJaWW9razah058k25vo1ggcdpl0q
DNT: 1
Host: www.baidu.com
Pragma: no-cache
sec-ch-ua: "Chromium";v="92", " Not A;Brand";v="99", "Microsoft Edge";v="92"
sec-ch-ua-mobile: ?0
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36 Edg/92.0.902.62
Running them through the following regular expression converts them:
import re
# Paste your request headers between the triple quotes below, replacing the sample
headers_str = """
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
Cache-Control: no-cache
Connection: keep-alive
Cookie: BIDUPSID=CE2731xUjJ4RGNlYTJaWW9razah058k25vo1ggcdpl0q
DNT: 1
"""
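# Split each line at the first colon: group 1 is the header name, group 2 the value
pattern = r'^(.*?):(.*)$'
for line in headers_str.splitlines():
    # Quote the name and value, then strip every space (including spaces inside values)
    print(re.sub(pattern, r"'\1':'\2',", line).replace(' ', ''))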
Final result (note that all spaces are removed, including those inside the values):
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Accept-Encoding':'gzip,deflate,br',
'Accept-Language':'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
'Cache-Control':'no-cache',
'Connection':'keep-alive',
'Cookie':'BIDUPSID=CE2731xUjJ4RGNlYTJaWW9razah058k25vo1ggcdpl0q',
'DNT':'1',
'Host':'www.baidu.com',
'Pragma':'no-cache',
'sec-ch-ua':'"Chromium";v="92","NotA;Brand";v="99","MicrosoftEdge";v="92"',
'sec-ch-ua-mobile':'?0',
'Sec-Fetch-Dest':'document',
'Sec-Fetch-Mode':'navigate',
'Sec-Fetch-Site':'none',
'Sec-Fetch-User':'?1',
'Upgrade-Insecure-Requests':'1',
'User-Agent':'Mozilla/5.0(WindowsNT10.0;Win64;x64)AppleWebKit/537.36(KHTML,likeGecko)Chrome/92.0.4515.107Safari/537.36Edg/92.0.902.62',
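The script above strips every space, which also changes values such as User-Agent and sec-ch-ua. As a rough sketch, assuming the goal is a dict whose values keep their internal spaces, the variant below trims only the whitespace around each name and value. The names HEADER_LINE and headers_to_dict are hypothetical and do not appear in the original script.

```python
import re

# Hypothetical helper, not part of the original script.
# Everything before the first colon is the header name, the rest is the value.
HEADER_LINE = re.compile(r'^([^:]+):(.*)$')


def headers_to_dict(headers_str):
    """Parse copied 'Name: value' lines into a dict, keeping spaces inside values."""
    headers = {}
    for line in headers_str.splitlines():
        match = HEADER_LINE.match(line)
        if match is None:
            # Blank lines and lines without a leading name are skipped
            continue
        name = match.group(1).strip()
        value = match.group(2).strip()
        headers[name] = value
    return headers


sample = """
Accept-Encoding: gzip, deflate, br
Host: www.baidu.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
"""

headers = headers_to_dict(sample)
for name, value in headers.items():
    # repr() adds the quotes, so each line can be pasted into a dict literal
    print(f"{name!r}: {value!r},")
```

Because the result is already a dict, it can be used directly instead of pasting the printed lines back into the source. Lines that begin with a colon (for example HTTP/2 pseudo-headers such as :authority) do not match this pattern and are skipped.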