python - Scrapy print to json file
I run a spider against Craigslist and want to save the results to a JSON file using Scrapy. The spider displays the results in the console, but the .json file is empty. The command I am using is:
scrapy runspider detroit.py -o detroit.json
Can anyone shed a little light? Thanks!
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from craigslist_sample.items import CraigslistSampleItem

class MySpider(BaseSpider):
    name = "craig"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://detroit.craigslist.org/search/sof"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("//span[@class='pl']")
        for titles in titles:
            title = titles.select("a/text()").extract()[0]
            link = titles.select("a/@href").extract()[0]
            print title, link
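That's because you are only printing the results. Scrapy writes to the output file only the items that the callback returns, so you need to instantiate items and return (or yield) them: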
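def parse(self, response):
    # Iterate over every link inside a span with class "pl"
    for elm in response.xpath("//span[@class='pl']//a"):
        item = CraigslistSampleItem()
        item["title"] = elm.xpath("text()").extract_first()
        # "@href" selects the attribute; a bare "href" would look for a child element
        item["link"] = elm.xpath("@href").extract_first()
        yield item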
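To see why printing leaves the file without items, here is a minimal standard-library sketch. It is a toy model and not Scrapy's actual export code: `export_json`, `parse_printing`, `parse_yielding`, and `SAMPLE_ROWS` are hypothetical names made up for illustration, and the sample rows are invented data. The only point is that an exporter can serialize what the callback hands back, never what it prints.

```python
import io
import json


def parse_printing(rows):
    """Mimics the question's callback: prints each row and returns nothing."""
    for title, link in rows:
        print(title, link)


def parse_yielding(rows):
    """Mimics the answer's callback: yields one dict-like item per row."""
    for title, link in rows:
        yield {"title": title, "link": link}


def export_json(callback, rows):
    """Toy stand-in for a feed export: serializes only what the callback returns."""
    result = callback(rows)
    # A callback that only prints returns None, so there is nothing to export.
    items = list(result) if result is not None else []
    buffer = io.StringIO()
    json.dump(items, buffer)
    return buffer.getvalue()


# Hypothetical sample data, not real Craigslist listings.
SAMPLE_ROWS = [("Python developer", "/sof/1.html"), ("QA engineer", "/sof/2.html")]

printed_only = export_json(parse_printing, SAMPLE_ROWS)  # rows show up on the console only
yielded = export_json(parse_yielding, SAMPLE_ROWS)       # rows end up in the JSON text
```

In this sketch `printed_only` contains no items even though both rows were printed, while `yielded` contains both. The same reasoning applies to the spider: once `parse` yields items, the `-o detroit.json` option has something to write.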