Python search loop slow


I am running a search on a list of ads (adscrape). Each ad is a dict within adscrape (e.g. ad below). The search goes through a list of ids (database_ids) that is between 200,000 and 1,000,000 items long. I want to find the ads in adscrape that don't have their id in database_ids.

My current code is below. It takes a long time, multiple seconds for each ad, to scan through database_ids. Is there a more efficient/faster way of running this (finding the items of one big list in another big list)?

    database_ids = ['id1','id2','id3'...]

    ad = {'body': u'\xa0suv', 'loc': u'sa', 'last scan': '06/02/16',
          'eng': u'\xa06cyl 2.7l ', 'make': u'hyundai', 'year': u'2006',
          'id': u'oag-ad-12371713', 'first scan': '06/02/16', 'odo': u'168911',
          'active': 'y', 'adtype': u'dealer: used car',
          'model': u'tucson auto 4x4 ', 'trans': u'\xa0automatic', 'price': u'9990'}

    for ad in adscrape:
        ad['last scan'] = date
        ad['active'] = 'y'
        adscrape_ids.append(ad['id'])
        # membership test against a list scans the whole list each time
        if ad['id'] not in database_ids:
            ad['first scan'] = date
            print 'new ad:',ad
            newads.append(ad)

You can use a list comprehension, as in the code given below. It uses the existing database_ids list and the adscrape list given above.

Code:

    new_adds_ids = [ad for ad in adscrape if ad['id'] not in database_ids]
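Note that the list comprehension still checks each id against a list, so every lookup scans database_ids from the start. One possible way to speed this up, sketched here as a suggestion rather than as part of the original answer, is to convert database_ids to a set once and test membership against that, since set lookups take constant time on average. The function name find_new_ads, the variable known_ids, and the sample data are made up for illustration.

```python
def find_new_ads(adscrape, database_ids):
    """Return the ads whose 'id' is not present in database_ids."""
    # Build the set once; each lookup afterwards is O(1) on average,
    # instead of a full scan of the list for every ad.
    known_ids = set(database_ids)
    return [ad for ad in adscrape if ad['id'] not in known_ids]


# Small made-up data, for illustration only.
database_ids = ['id1', 'id2', 'id3']
adscrape = [
    {'id': 'id2', 'make': 'hyundai'},
    {'id': 'id9', 'make': 'toyota'},
]

newads = find_new_ads(adscrape, database_ids)
```

If the ids are needed as a list elsewhere, the set can be kept alongside it; only the membership test has to use the set.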

