python - Displaying contents of web scrape


The code below displays the fields on screen. Is there a way to get the fields to appear "alongside" each other, as they would in a database or a spreadsheet? In the source code, the fields track, date, datetime, grade, distance and prizes are found in the resultsblockheader div class, and fin (finishing position), greyhound, trap, sp, timesec and timedistance are found in the resultsblock div. I am trying to get them displayed as track, date, datetime, grade, distance, prizes, fin, greyhound, trap, sp, timesec, timedistance on one line. Any help is appreciated.

```python
from urllib.request import urlopen  # Python 2: from urllib import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.gbgb.org.uk/resultsmeeting.aspx?id=135754")
bsobj = BeautifulSoup(html, 'lxml')

namelist = bsobj.find_all("div", {"class": "track"})
for name in namelist:
    print(name.get_text())

namelist = bsobj.find_all("div", {"class": "date"})
for name in namelist:
    print(name.get_text())

namelist = bsobj.find_all("div", {"class": "datetime"})
for name in namelist:
    print(name.get_text())

namelist = bsobj.find_all("div", {"class": "grade"})
for name in namelist:
    print(name.get_text())

namelist = bsobj.find_all("div", {"class": "distance"})
for name in namelist:
    print(name.get_text())

namelist = bsobj.find_all("div", {"class": "prizes"})
for name in namelist:
    print(name.get_text())

namelist = bsobj.find_all("li", {"class": "first essential fin"})
for name in namelist:
    print(name.get_text())

namelist = bsobj.find_all("li", {"class": "essential greyhound"})
for name in namelist:
    print(name.get_text())

namelist = bsobj.find_all("li", {"class": "trap"})
for name in namelist:
    print(name.get_text())

namelist = bsobj.find_all("li", {"class": "sp"})
for name in namelist:
    print(name.get_text())

namelist = bsobj.find_all("li", {"class": "timesec"})
for name in namelist:
    print(name.get_text())

namelist = bsobj.find_all("li", {"class": "timedistance"})
for name in namelist:
    print(name.get_text())

namelist = bsobj.find_all("li", {"class": "essential trainer"})
for name in namelist:
    print(name.get_text())

namelist = bsobj.find_all("li", {"class": "first essential comment"})
for name in namelist:
    print(name.get_text())

namelist = bsobj.find_all("div", {"class": "resultsblockfooter"})
for name in namelist:
    print(name.get_text())

namelist = bsobj.find_all("li", {"class": "first essential"})
for name in namelist:
    print(name.get_text())
```

First of all, make sure you are not violating the website's terms of use; stay on the legal side.

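The markup is not easy to scrape, but you can iterate over the race headers and, for every header, get the desired information about the race. Then get the sibling results block and extract the rows from it. Here is sample code to get you started; it extracts the track and the greyhound: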

```python
from pprint import pprint
from urllib.request import urlopen  # Python 2: from urllib2 import urlopen

from bs4 import BeautifulSoup

html = urlopen("http://www.gbgb.org.uk/resultsmeeting.aspx?id=135754")
soup = BeautifulSoup(html, 'lxml')

rows = []
for header in soup.find_all("div", class_="resultsblockheader"):
    # Race-level fields live in the header div
    track = header.find("div", class_="track").get_text(strip=True)

    # The results for this race are in the next sibling "resultsblock" div;
    # each runner starts with a <ul> carrying the "line1" class
    results = header.find_next_sibling("div", class_="resultsblock").find_all("ul", class_="line1")
    for result in results:
        greyhound = result.find("li", class_="greyhound").get_text(strip=True)

        rows.append({
            "track": track,
            "greyhound": greyhound
        })

pprint(rows)
```
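Since the question asks for the fields side by side, as in a spreadsheet, a list of dicts like `rows` above can be written to a CSV file, which Excel or LibreOffice opens directly. This is a minimal sketch using only the standard library's `csv` module; the `FIELDS` column order mirrors the one requested in the question, while the helper names `rows_to_csv`/`rows_to_csv_text`, the file name `results.csv` and the sample row are hypothetical.

```python
import csv
import io

# Column order requested in the question
FIELDS = ["track", "date", "datetime", "grade", "distance", "prizes",
          "fin", "greyhound", "trap", "sp", "timesec", "timedistance"]


def rows_to_csv(rows, out, fieldnames=FIELDS):
    """Write a list of dicts to the text stream `out`, one dict per CSV line."""
    # restval="" leaves a cell empty when a row lacks that key
    # (e.g. rows that so far only have "track" and "greyhound")
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


def rows_to_csv_text(rows, fieldnames=FIELDS):
    """Same as rows_to_csv, but return the CSV content as a string."""
    buf = io.StringIO()
    rows_to_csv(rows, buf, fieldnames)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical row, shaped like the dicts appended in the loop above
    rows = [{"track": "Example Track", "greyhound": "Example Dog"}]
    # newline="" is what the csv module expects for files it writes to
    with open("results.csv", "w", newline="", encoding="utf-8") as f:
        rows_to_csv(rows, f)
```

By default `DictWriter` raises `ValueError` if a row contains a key that is not in `fieldnames`, which helps catch typos in field names early.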

Note that every row you see in the tables is represented by 3 lines in the markup:

```html
<ul class="contents line1">
   ...
</ul>
<ul class="contents line2">
   ...
</ul>
<ul class="contents line3">
   ...
</ul>
```

The greyhound value is inside the first `ul` (the one with the `line1` class), but you may need to get to `line2` and `line3` as well, using `result.find_next_sibling("ul", class_="line2")` and `result.find_next_sibling("ul", class_="line3")`.
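The header-then-sibling approach above can be extended to produce the one-line-per-runner layout asked for in the question. This is a minimal sketch run against a small hypothetical HTML fragment (`SAMPLE`) rather than the live page: the fragment's layout, the helper names `text_of` and `extract_rows`, and the assumption that each runner's `line2` and `line3` follow its own `line1` are all illustrative. Because the answer does not say which of the three lines holds each result field, the sketch looks for every field in all three `ul`s and keeps the first non-empty match. It uses the `html.parser` backend so that `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Class names as given in the question
HEADER_FIELDS = ["track", "date", "datetime", "grade", "distance", "prizes"]
RESULT_FIELDS = ["fin", "greyhound", "trap", "sp", "timesec", "timedistance"]

# Hypothetical fragment imitating the structure described above
SAMPLE = """
<div class="resultsblockheader">
  <div class="track">Example Track</div><div class="date">01/01/16</div>
  <div class="datetime">19:00</div><div class="grade">A1</div>
  <div class="distance">480m</div><div class="prizes">100</div>
</div>
<div class="resultsblock">
  <ul class="contents line1">
    <li class="first essential fin">1st</li><li class="essential greyhound">Example Dog</li>
    <li class="trap">3</li><li class="sp">2/1</li>
    <li class="timesec">28.50</li><li class="timedistance">1 1/2</li>
  </ul>
  <ul class="contents line2"><li class="essential trainer">Example Trainer</li></ul>
  <ul class="contents line3"><li class="first essential comment">Led</li></ul>
</div>
"""


def text_of(tag, name, cls):
    """Stripped text of the first <name> with class cls inside tag, or ""."""
    found = tag.find(name, class_=cls) if tag is not None else None
    return found.get_text(strip=True) if found is not None else ""


def extract_rows(soup):
    rows = []
    for header in soup.find_all("div", class_="resultsblockheader"):
        # Race-level values, shared by every runner in this race
        race = {field: text_of(header, "div", field) for field in HEADER_FIELDS}
        block = header.find_next_sibling("div", class_="resultsblock")
        if block is None:
            continue
        for line1 in block.find_all("ul", class_="line1"):
            # One runner is spread over three sibling <ul>s (line1, line2, line3)
            lines = [line1,
                     line1.find_next_sibling("ul", class_="line2"),
                     line1.find_next_sibling("ul", class_="line3")]
            row = dict(race)
            for field in RESULT_FIELDS:
                values = [text_of(ul, "li", field) for ul in lines]
                # First non-empty value across the three lines, else ""
                row[field] = next((v for v in values if v), "")
            rows.append(row)
    return rows


if __name__ == "__main__":
    for row in extract_rows(BeautifulSoup(SAMPLE, "html.parser")):
        print(row)
```

To try it on the real page, pass `BeautifulSoup(urlopen(...), 'lxml')` instead of the sample; each returned dict is one flat row in the requested column order, ready for `csv.DictWriter`.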
