深海游弋的鱼 – 默默的点滴

Run the following on the command line:

# requests-html supports Python 3.6 and later only
$ pip3 install requests-html

# or, if pip already points at a Python 3 interpreter
$ pip install requests-html

This installs the module.
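To confirm that the installation worked, a quick check from Python can help. This is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name introduced here for illustration. Note that the pip package is called `requests-html` while the import name is `requests_html`. The example later in this post also imports `bs4` and uses the `lxml` parser, so those are checked as well.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found for import."""
    return importlib.util.find_spec(module_name) is not None


# The pip package is requests-html, but the import name uses an underscore.
for name in ('requests_html', 'bs4', 'lxml'):
    print(name, 'OK' if is_installed(name) else 'missing')
```

The first call to `render()` in requests-html also downloads a copy of Chromium, so expect that first run to take noticeably longer.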

An example follows:

# coding=utf-8
from bs4 import BeautifulSoup
from requests_html import HTMLSession

url = 'https://xcx.xzlzq.net/#/liftDetail?registerCode=31103301042010020002'

# Fetch the page through an HTMLSession so that it can be rendered afterwards
session = HTMLSession()
first_page = session.get(url)

# Run the page's JavaScript in headless Chromium and wait 5 seconds
# so that the dynamically generated content has time to appear
first_page.html.render(sleep=5)

# first_page.html.html now holds the rendered page source;
# pass it to the BeautifulSoup class for parsing
soup = BeautifulSoup(first_page.html.html, 'lxml')
links = soup.find_all('div', class_='content')

# Each match is a div; the text of its span is the content we want
for link in links:
    if link.span is not None:  # skip divs that contain no span
        print(link.span.get_text())
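The parsing step above can be separated from the fetch-and-render step, which makes it possible to check the parsing logic offline against a static HTML string. The following is only a sketch under stated assumptions: `fetch_rendered_html` and `extract_span_texts` are hypothetical helper names (not part of requests-html or BeautifulSoup), and the built-in `html.parser` backend is used here in place of `lxml` so the parsing helper needs nothing beyond `bs4`.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, sleep=5):
    """Return the page source after its JavaScript has run (needs network access)."""
    # Imported lazily so the parsing helper below works without requests-html.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        page = session.get(url)
        page.html.render(sleep=sleep)  # the first call downloads Chromium
        return page.html.html
    finally:
        session.close()


def extract_span_texts(html_source, css_class='content'):
    """Collect the span text of every <div class=css_class> that contains a span."""
    soup = BeautifulSoup(html_source, 'html.parser')
    texts = []
    for div in soup.find_all('div', class_=css_class):
        if div.span is not None:  # skip divs that contain no span
            texts.append(div.span.get_text(strip=True))
    return texts


if __name__ == '__main__':
    # Static sample standing in for a rendered page
    sample = '<div class="content"><span> hello </span></div><div class="content"></div>'
    print(extract_span_texts(sample))
```

For a live page, the two helpers would be chained as `extract_span_texts(fetch_rendered_html(url))`.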

Handling JavaScript-rendered content before parsing HTML with BeautifulSoup

Posted by: 默默
