http://code.google.com/p/pyfan/
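Pyfan (派饭?) is a Python script for fanfou. Its main feature is that it can fetch messages older than the most recent 20.
- When moving to a new version, always remember to record the version number. Otherwise, when someone reports a bug later, you will not know where to look for it.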
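One way to follow the version-number rule is to keep the version in a single string and print it with every crash report. The sketch below is written for Python 3 and is not Pyfan's code; `__version__`'s value and the helper `format_bug_report` are hypothetical placeholders.

```python
import sys
import traceback

# Hypothetical: bump this string on every release so that a bug report
# can be matched to the exact code that produced it.
__version__ = "0.0.0"  # placeholder, not Pyfan's real version number


def format_bug_report(exc: BaseException) -> str:
    """Return the full traceback text, prefixed with program and Python versions."""
    lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
    return "pyfan %s (Python %s)\n%s" % (
        __version__,
        sys.version.split()[0],
        "".join(lines),
    )


if __name__ == "__main__":
    try:
        [][0]  # deliberately raise IndexError to demonstrate the report
    except IndexError as exc:
        print(format_bug_report(exc))
```

Asking users to paste this output, rather than a bare traceback, tells you which version's line numbers the traceback refers to.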
Url:
http://fanfou.com/message/ppip/p.2
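Reaching messages older than the first 20 appears to come down to walking these numbered pages (the log below prints "Get page 1", "Get page 2", and so on). A minimal Python 3 sketch, assuming every page follows the `p.<n>` pattern of the example URL; `page_urls` is a hypothetical helper name:

```python
# Assumption: every page of a user's message list follows the pattern of
# the example URL, http://fanfou.com/message/<user>/p.<n>
BASE = "http://fanfou.com/message/%s/p.%d"


def page_urls(user: str, pages: int) -> list:
    """Build the URLs of pages 1..pages of a user's message list."""
    return [BASE % (user, n) for n in range(1, pages + 1)]


print(page_urls("ppip", 3)[1])  # http://fanfou.com/message/ppip/p.2
```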
Structure:
The message list starts at line 46 of the page source, with an <ol class="wa">. Each message is an <li> laid out like this:
<a class="avatar" title="刘丹" href="/PorkFat">
</a>
<a class="author" href="/PorkFat">刘丹</a>
<span class="content">
@
<a class="former" href="http://fanfou.com/xiaoyo311">小野田</a>
谢谢,还没有吃过啊,听起来很新奇,我想去吃了
</span>
<span class="stamp">
<a class="time" title="2008-02-22 13:39" href="/statuses/sy9A4NqWjYo">约 1 小时前</a>
<span class="method">
通过
<a target="_blank" href="http://help.fanfou.com/im.html">MSN</a>
</span>
</span>
<span class="op">
</span>
</li>
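The fields in this layout can be pulled out with a lenient HTML parser rather than an XML parser. A minimal Python 3 sketch using the standard `html.parser` module, assuming the class names shown above (`author`, `content`, `time`) stay as captured here; `MessageParser` and `parse_messages` are hypothetical names, not Pyfan's code:

```python
from html.parser import HTMLParser


class MessageParser(HTMLParser):
    """Collect one dict per message from the list markup shown above."""

    def __init__(self):
        super().__init__()
        self.messages = []
        self._in_author = False    # True while inside <a class="author">
        self._content_depth = 0    # >0 while inside <span class="content">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class")
        if tag == "a" and cls == "author":
            # An author link marks the start of a new message.
            self.messages.append({"author": "", "content": "",
                                  "href": attrs.get("href")})
            self._in_author = True
        elif tag == "span" and cls == "content":
            self._content_depth = 1
        elif tag == "span" and self._content_depth:
            self._content_depth += 1   # nested span inside the content
        elif tag == "a" and cls == "time" and self.messages:
            # The exact timestamp is in the title attribute.
            self.messages[-1]["time"] = attrs.get("title")
            self.messages[-1]["status"] = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "a" and self._in_author:
            self._in_author = False
        elif tag == "span" and self._content_depth:
            self._content_depth -= 1

    def handle_data(self, data):
        if not self.messages:
            return
        msg = self.messages[-1]
        if self._in_author:
            msg["author"] += data.strip()
        elif self._content_depth:
            # Includes the text of nested links such as <a class="former">.
            msg["content"] += data.strip()


def parse_messages(html: str) -> list:
    parser = MessageParser()
    parser.feed(html)
    parser.close()
    return parser.messages
```

For the message above this would yield author `刘丹`, href `/PorkFat`, time `2008-02-22 13:39` and status `/statuses/sy9A4NqWjYo`. Unlike `minidom`, `HTMLParser` does not require well-formed XML, so unclosed tags or a stray `&` in the page do not stop it.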
A scraping error:
Get page 26
Traceback (most recent call last):
File "/home/ppip/pyfan/pyfan.py", line 798, in <module>
main(sys.argv[1:])
File "/home/ppip/pyfan/pyfan.py", line 715, in main
work.update()
File "/home/ppip/pyfan/pyfan.py", line 595, in update
self.__friendPostList__(pageurl, lists, lock)
File "/home/ppip/pyfan/pyfan.py", line 308, in __friendPostList__
node = minidom.parse(urllib2.urlopen(pageurl))
File "/usr/lib/python2.5/site-packages/_xmlplus/dom/minidom.py", line 1915, in parse
return expatbuilder.parse(file)
File "/usr/lib/python2.5/site-packages/_xmlplus/dom/expatbuilder.py", line 930, in parse
result = builder.parseFile(file)
File "/usr/lib/python2.5/site-packages/_xmlplus/dom/expatbuilder.py", line 207, in parseFile
parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 39, column 1952
Exception exceptions.TypeError: "'NoneType' object is unsubscriptable" in <bound method fanfou.__del__ of <__main__.fanfou instance at 0x8306fcc>> ignored
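The `ExpatError` means `minidom` was given a page that is not well-formed XML (at line 39, column 1952 of page 26). What sits at that position is not recorded here; in HTML, a bare `&` or an unclosed tag such as `<br>` is enough to cause this. The `TypeError` in `fanfou.__del__` appears to be a follow-on error raised during cleanup after the crash. A minimal Python 3 sketch of catching the parse error per page instead of aborting the whole run; `parse_page` is a hypothetical helper:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_page(text):
    """Parse page text with minidom; return None if it is not well-formed XML.

    Hypothetical helper: the caller can log and skip the page (or fall back
    to a lenient HTML parser) instead of letting the whole run crash.
    """
    try:
        return minidom.parseString(text)
    except ExpatError as err:
        print("skipping malformed page: %s" % err)
        return None
```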
http://fanfou.com/statuses/GRiiURofTU0
E:\>pyfan -l lazylorna--max=2000
IOError, no data file?
Get page 1
Get page 2
Get page 3
Get page 4
Get page 5
Get page 6
Unhandled exception in thread started by
Traceback (most recent call last):
File "pyfan.py", line 310, in __friendPostList__
IndexError: list index out of range
[The same traceback is printed six times in total.]
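"Unhandled exception in thread started by" is what Python 2's low-level `thread` module prints when a function started with `thread.start_new_thread` raises: the traceback is printed, the thread ends, and the caller gets no result and no page number. The `IndexError` suggests that line 310 takes the first element of a list that came back empty, but the code is not shown here, so that is an assumption. A minimal Python 3 sketch using `concurrent.futures` that keeps each page's exception together with its page number; `first` and `fetch_all` are hypothetical helpers:

```python
from concurrent.futures import ThreadPoolExecutor


def first(items, default=None):
    """Return items[0], or default when the list is empty (no IndexError)."""
    return items[0] if items else default


def fetch_all(pages, worker, max_workers=4):
    """Run worker(page) for every page in a thread pool.

    Returns (results, errors): results maps page -> return value and
    errors maps page -> the exception worker raised, so a failing page
    is reported by number instead of silently ending its thread.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {page: pool.submit(worker, page) for page in pages}
        for page, future in futures.items():
            try:
                results[page] = future.result()
            except Exception as exc:
                errors[page] = exc
    return results, errors
```

Because results are collected in the main thread, this design also removes the need to pass a shared list and a lock into each worker, as `__friendPostList__(pageurl, lists, lock)` does.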
