Python search loop slow
I am running a search on a list of ads (adscrape). Each ad is a dict within adscrape (e.g. ad below). The search goes through a list of ids (database_ids) that is between 200,000 and 1,000,000 items long. I want to find the ads in adscrape that don't have their id in database_ids.
My current code is below. It takes a long time: multiple seconds for each ad to scan through database_ids. Is there a more efficient/faster way of running this (finding items in a big list, in a big list)?
database_ids = ['id1','id2','id3'...]

ad = {'body': u'\xa0suv', 'loc': u'sa', 'last scan': '06/02/16',
      'eng': u'\xa06cyl 2.7l ', 'make': u'hyundai', 'year': u'2006',
      'id': u'oag-ad-12371713', 'first scan': '06/02/16', 'odo': u'168911',
      'active': 'y', 'adtype': u'dealer: used car',
      'model': u'tucson auto 4x4 ', 'trans': u'\xa0automatic', 'price': u'9990'}

for ad in adscrape:
    ad['last scan'] = date
    ad['active'] = 'y'
    adscrape_ids.append(ad['id'])
    if ad['id'] not in database_ids:
        ad['first scan'] = date
        print 'new ad:',ad
        newads.append(ad)
You can use a list comprehension, as in the code below. It uses the existing database_ids list and the adscrape list of ad dicts given above.
Code: `new_adds_ids = [ad for ad in adscrape if ad['id'] not in database_ids]`
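A further option, not part of the answer above: the `not in database_ids` test still scans the list for every ad, so the comprehension alone is unlikely to change the running time much. A minimal sketch, assuming the ids are hashable strings as in the question, is to convert database_ids to a set once and test membership against that. The sample data and the name known_ids are hypothetical, for illustration only.

```python
# Hypothetical sample data standing in for the real lists from the question.
database_ids = ['id1', 'id2', 'id3']
adscrape = [
    {'id': 'id2', 'make': 'hyundai'},
    {'id': 'oag-ad-12371713', 'make': 'hyundai'},
]

# Build the set once, outside the loop. Each membership test is then a
# hash lookup (constant time on average) instead of a scan of the list.
known_ids = set(database_ids)

# Keep only the ads whose id is not already in the database.
newads = [ad for ad in adscrape if ad['id'] not in known_ids]
```

Building the set costs one pass over database_ids, after which each ad is checked without walking the full list. The same change should apply to the original for loop by replacing database_ids with known_ids in the `if` test.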