Python: my first scraper, written with BeautifulSoup
# Version 1: read the link with .get('href')
import requests
from bs4 import BeautifulSoup

link = "http://www.xiangmingshan.com/zhishi/"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"}

# Fetch the list page and collect every article heading
r = requests.get(link, headers=headers)
soup = BeautifulSoup(r.text, "lxml")
titles = soup.find_all("h3", class_="f-16 mb15 f-bold")

i = 0
for title in titles:
    i += 1
    print(i, title.text)
    # The <a> tag inside the heading holds the article URL
    urlnames = title.a
    url = urlnames.get('href')
    # Fetch the article page and print its body text
    res = requests.get(url, headers=headers)
    conaa = BeautifulSoup(res.text, "lxml")
    neirong = conaa.find(class_="info-con")
    print("Body content:", neirong.text)

# Version 2: read the link with title.a['href']
import requests
from bs4 import BeautifulSoup

link = "http://www.xiangmingshan.com/zhishi/"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"}

# Fetch the list page and collect every article heading
r = requests.get(link, headers=headers)
soup = BeautifulSoup(r.text, "lxml")
titles = soup.find_all("h3", class_="f-16 mb15 f-bold")

i = 0
for title in titles:
    i += 1
    print(i, title.text)
    # Index the <a> tag directly to get the article URL
    url = title.a['href']
    # Fetch the article page and print its body text
    res = requests.get(url, headers=headers)
    conaa = BeautifulSoup(res.text, "lxml")
    neirong = conaa.find(class_="info-con")
    print("Body content:", neirong.text)

The two versions differ only in how they get the link. The two ways are
url=urlnames.get('href')
and
url=title.a['href']
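
The two forms return the same value when the attribute exists, but they behave differently when it is missing. Here is a minimal sketch of that difference. The HTML snippet and the names `html`, `first` and `second` are made up for illustration, and `html.parser` is used instead of `lxml` only so the example needs no extra install.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second heading's <a> tag has no href attribute.
html = """
<h3><a href="/zhishi/1.html">First article</a></h3>
<h3><a>Heading without a link</a></h3>
"""

soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("h3")

# When href exists, both forms give the same string.
print(first.a.get("href"))
print(first.a["href"])

# When href is missing, .get() returns None ...
print(second.a.get("href"))

# ... while indexing raises KeyError.
try:
    second.a["href"]
except KeyError:
    print("KeyError: this <a> tag has no href")
```

So `.get('href')` may be the safer choice when some headings might not carry a link, as long as the result is checked for `None` before it is passed to `requests.get`.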