A Simple Crawler for Magnet-Link Search Engines

Author: Jarett  Date: March 15, 2014

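These days there are plenty of magnet-link search sites, big and small. Type in a keyword and they dig up the matching magnet links. Truly sites with a conscience, and a blessing for fellow enthusiasts. The one I usually use is: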
shousibaocai.com

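But when you need to grab the magnet links for a certain series (read: actress) in bulk, it turns out to be a real pain. So I wrote a small Python script that collects the magnet links for a given keyword. Hope you like it, haha, and if you don't, that's not my problem. Oh, right: the script needs BeautifulSoup, a library for parsing HTML. You could do without it and just search the page source for the relevant keywords yourself. The library is easy to install, but versions 3 and 4 differ slightly, and I've marked the differences in the comments. Now, eyes on the big screen.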

# -*- coding: utf-8 -*-
import urllib2
import sys
# No change needed for BeautifulSoup 3; for BeautifulSoup 4, use: from bs4 import BeautifulSoup
from BeautifulSoup import BeautifulSoup

reload(sys) 
sys.setdefaultencoding( "utf-8" )

def getcontent(url):
    print url
    req = urllib2.Request(url)
    res = urllib2.urlopen(req)
    magnetlist=[]
    html = res.read()
    res.close()
    soup = BeautifulSoup(html)
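    # No change needed for BeautifulSoup 3; for BeautifulSoup 4, use soup.find_all('a')
    allentry=soup.findAll('a')
    for link in allentry:
        href=link.get('href')
        # Skip anchors that have no href at all; keep only magnet links
        if href and href.startswith("magnet:"):
            magnetlist.append(href)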
    magnetlist = [line+'\n' for line in magnetlist]
    # Append, so results from every page accumulate in the same file
    with open("magnet.txt", "a") as f:
        f.writelines(magnetlist)

def main():
    site="http://bt.shousibaocai.com/search/"
    keyword="地心引力"
    keyword=urllib2.quote(keyword)
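    # Number of result pages to crawl, starting from page 1
    page=3
    # range() excludes its upper bound, so add 1 to include the last page
    for i in range(1,page+1):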
        searchurl=site+keyword+"/"+str(i)
        getcontent(searchurl)

if __name__ == '__main__':
    main()
    #end Jarett
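
As mentioned earlier, BeautifulSoup is not strictly required. Here is a minimal sketch of the same two steps (building the paged search URLs and picking out the magnet links) using only the standard library. Note the assumptions: it targets Python 3 rather than the Python 2 used in the script, and the names MagnetLinkParser, extract_magnets and build_search_urls, along with the sample HTML and its placeholder hash, are made up for illustration. It does no network access; you would feed it the HTML you downloaded yourself.

```python
# Python 3, standard library only (an assumption; the script above is Python 2).
from html.parser import HTMLParser
from urllib.parse import quote


class MagnetLinkParser(HTMLParser):
    """Collects the href of every <a> tag whose href starts with 'magnet:'."""

    def __init__(self):
        super().__init__()
        self.magnets = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; href may be missing or None.
        href = dict(attrs).get("href")
        if href and href.startswith("magnet:"):
            self.magnets.append(href)


def extract_magnets(html):
    """Returns the magnet links found in an HTML string, in document order."""
    parser = MagnetLinkParser()
    parser.feed(html)
    parser.close()
    return parser.magnets


def build_search_urls(site, keyword, pages):
    """Returns the URLs for result pages 1..pages, following the
    site + keyword + "/" + page pattern used in main() above."""
    encoded = quote(keyword)  # percent-encode the keyword for use in a URL
    return [site + encoded + "/" + str(i) for i in range(1, pages + 1)]


if __name__ == "__main__":
    # Hypothetical sample page; the hash below is a placeholder, not a real torrent.
    sample = (
        '<a href="/about">about</a>'
        '<a name="top">no href here</a>'
        '<a href="magnet:?xt=urn:btih:0123456789abcdef">sample</a>'
    )
    print(extract_magnets(sample))
    print(build_search_urls("http://bt.shousibaocai.com/search/", "gravity", 2))
```

The parser skips anchors that have no href, so a page with plain named anchors will not trip it up. Whether the site still serves pages in this URL layout is something you would have to check for yourself.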