Run the following on the command line:
    # requests-html supports Python 3 only (3.6 or later)
    $ pip install requests-html
    # or, if pip points to a Python 2 installation:
    $ pip3 install requests-html
This installs the module.
Here is an example:
    # coding=utf-8
    from bs4 import BeautifulSoup
    from requests_html import HTMLSession

    url = 'https://xcx.xzlzq.net/#/liftDetail?registerCode=31103301042010020002'

    # Fetch the page through an HTMLSession so that it can be rendered afterwards
    session = HTMLSession()
    first_page = session.get(url)
    # Run the page's JavaScript and wait 5 seconds so the content can load
    first_page.html.render(sleep=5)

    # Pass the rendered HTML source to BeautifulSoup for parsing
    soup = BeautifulSoup(first_page.html.html, 'lxml')
    links = soup.find_all('div', class_='content')

    # Each link is a div; the text of its span is the content we want
    for link in links:
        if link.span is not None:  # skip divs that have no span child
            print(link.span.get_text())
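One likely reason the example needs `render()` rather than a plain HTTP request is the shape of the URL itself: everything after `#` is a fragment, which the client keeps to itself and never sends to the server. The sketch below uses only the standard library to split the URL from the example. The variable names are illustrative, and the conclusion that the page is assembled by JavaScript is an inference from the URL, not something confirmed by the site.

```python
from urllib.parse import urlsplit, urldefrag

url = 'https://xcx.xzlzq.net/#/liftDetail?registerCode=31103301042010020002'

parts = urlsplit(url)

# Path and query are the parts of the URL that reach the server
print(repr(parts.path))
print(repr(parts.query))

# The fragment stays on the client; here it carries the route and the register code
print(repr(parts.fragment))

# The URL with the fragment stripped, i.e. what is actually requested over HTTP
requested_url = urldefrag(url).url
print(requested_url)
```

Because the route and the `registerCode` parameter sit inside the fragment, a request without JavaScript execution would presumably return only the application's shell page, which is why the rendered `first_page.html.html` is handed to BeautifulSoup.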