python - Reformatting scraped Selenium table


I'm scraping a table that displays info for a sporting league. Here is what I have so far, as a Selenium beginner:

    from selenium import webdriver
    import re
    import pandas as pd

    driver = webdriver.PhantomJS(executable_path=r'c:/.../bin/phantomjs.exe')
    driver.get("http://www.oddsportal.com/hockey/usa/nhl-2014-2015/results/#/page/2.html")

    infotable = driver.find_elements_by_class_name("table-main")
    matches = driver.find_elements_by_class_name("table-participant")
    ilist, match = [], []

    # Text of the whole results table (dates, events and matches mixed together)
    for i in infotable:
        ilist.append(i.text)
        infolist = ilist[0]

    # Text of each "home - away" cell
    for i in matches:
        match.append(i.text)

    driver.close()

    home = pd.Series([item.split(' - ')[0] for item in match])
    away = pd.Series([item.strip().split(' - ')[1] for item in match])

    df = pd.DataFrame({'home': home, 'away': away})

    # Finds every date in the table text, but loses which match it belongs to
    date = re.findall(r"\d\d\s\w\w\w\s\d\d\d\d", infolist)

In the last line, date scrapes the dates in the table, but I can't link them to the corresponding game.

My thinking is: for each child/element "under a date", date = last_found_date.
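That "last found date" idea can be sketched without Selenium. This is only a minimal illustration: it assumes the table has already been flattened into an ordered list of `(kind, text)` pairs, and both that representation and the name `attach_dates` are hypothetical. The sample strings mirror the site's "date - event" and "home - away" text.

```python
def attach_dates(rows):
    """Pair each match row with the most recent date header seen above it."""
    last_found_date = None
    paired = []
    for kind, text in rows:
        if kind == "date":
            # A new date header: remember it for the rows that follow
            last_found_date = text
        else:
            # A match row: it belongs to the last date header seen
            paired.append((last_found_date, text))
    return paired


# Hypothetical flattened rows, in the order they appear in the table
rows = [
    ("date", "25 apr 2015 - play offs"),
    ("match", "new york islanders - washington capitals"),
    ("match", "st.louis blues - minnesota wild"),
    ("date", "24 apr 2015 - play offs"),
    ("match", "vancouver canucks - calgary flames"),
]

for date, match in attach_dates(rows):
    print(date, "|", match)
```

This only works if the rows are read in document order, which is why the order of the elements matters.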

My ultimate goal is to have two more columns in df: one with the date of the match, and the next with any text found beside the date, for example 'play offs' (I can figure that out myself if I can get the date issue sorted).

Should I be incorporating another program/method to retain the order of the tags/elements of the table?

You need to change the way you extract the match information. Instead of extracting the home and away teams separately, do it in one loop that also extracts the dates and events:

    from selenium import webdriver
    import pandas as pd

    driver = webdriver.PhantomJS()
    driver.get("http://www.oddsportal.com/hockey/usa/nhl-2014-2015/results/#/page/2.html")

    data = []
    # Each match is a row with the "deactivate" class inside the results table
    for match in driver.find_elements_by_css_selector("div#tournamentTable tr.deactivate"):
        home, away = match.find_element_by_class_name("table-participant").text.split(" - ")

        # Nearest date header that precedes this row in document order
        date = match.find_element_by_xpath(".//preceding::th[contains(@class, 'first2')][1]").text

        # The header reads either "date" or "date - event"
        if " - " in date:
            date, event = date.split(" - ")
        else:
            event = "not specified"

        data.append({
            "home": home.strip(),
            "away": away.strip(),
            "date": date.strip(),
            "event": event.strip()
        })

    driver.close()

    df = pd.DataFrame(data)
    print(df)
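The date/event split can also be pulled out into a small helper that is testable without a browser. This is a sketch only: `split_header` is a hypothetical name, and it swaps `split` for `str.partition`, so that a header containing more than one " - " would not raise on unpacking (an assumption about possible headers, not something observed on the page).

```python
def split_header(header_text, default_event="not specified"):
    """Split a header such as '25 apr 2015 - play offs' into (date, event)."""
    # partition() splits on the first " - " only and never raises
    date, separator, event = header_text.partition(" - ")
    if not separator:
        # No event text beside the date
        event = default_event
    return date.strip(), event.strip()


print(split_header("25 apr 2015 - play offs"))  # ('25 apr 2015', 'play offs')
print(split_header("12 apr 2015"))              # ('12 apr 2015', 'not specified')
```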

Prints:

                         away         date          event                 home
    0     washington capitals  25 apr 2015      play offs   new york islanders
    1          minnesota wild  25 apr 2015      play offs       st.louis blues
    2         ottawa senators  25 apr 2015      play offs   montreal canadiens
    3     pittsburgh penguins  25 apr 2015      play offs     new york rangers
    4          calgary flames  24 apr 2015      play offs    vancouver canucks
    5      chicago blackhawks  24 apr 2015      play offs  nashville predators
    6     tampa bay lightning  24 apr 2015      play offs    detroit red wings
    7      new york islanders  24 apr 2015      play offs  washington capitals
    8          st.louis blues  23 apr 2015      play offs       minnesota wild
    9           anaheim ducks  23 apr 2015      play offs        winnipeg jets
    10     montreal canadiens  23 apr 2015      play offs      ottawa senators
    11       new york rangers  23 apr 2015      play offs  pittsburgh penguins
    12      vancouver canucks  22 apr 2015      play offs       calgary flames
    13    nashville predators  22 apr 2015      play offs   chicago blackhawks
    14    washington capitals  22 apr 2015      play offs   new york islanders
    15    tampa bay lightning  22 apr 2015      play offs    detroit red wings
    16          anaheim ducks  21 apr 2015      play offs        winnipeg jets
    17         st.louis blues  21 apr 2015      play offs       minnesota wild
    18       new york rangers  21 apr 2015      play offs  pittsburgh penguins
    19      vancouver canucks  20 apr 2015      play offs       calgary flames
    20     montreal canadiens  20 apr 2015      play offs      ottawa senators
    21    nashville predators  19 apr 2015      play offs   chicago blackhawks
    22    washington capitals  19 apr 2015      play offs   new york islanders
    23          winnipeg jets  19 apr 2015      play offs        anaheim ducks
    24    pittsburgh penguins  19 apr 2015      play offs     new york rangers
    25         minnesota wild  18 apr 2015      play offs       st.louis blues
    26      detroit red wings  18 apr 2015      play offs  tampa bay lightning
    27         calgary flames  18 apr 2015      play offs    vancouver canucks
    28     chicago blackhawks  18 apr 2015      play offs  nashville predators
    29        ottawa senators  18 apr 2015      play offs   montreal canadiens
    30     new york islanders  18 apr 2015      play offs  washington capitals
    31          winnipeg jets  17 apr 2015      play offs        anaheim ducks
    32         minnesota wild  17 apr 2015      play offs       st.louis blues
    33      detroit red wings  17 apr 2015      play offs  tampa bay lightning
    34    pittsburgh penguins  17 apr 2015      play offs     new york rangers
    35         calgary flames  16 apr 2015      play offs    vancouver canucks
    36     chicago blackhawks  16 apr 2015      play offs  nashville predators
    37        ottawa senators  16 apr 2015      play offs   montreal canadiens
    38     new york islanders  16 apr 2015      play offs  washington capitals
    39        edmonton oilers  12 apr 2015  not specified    vancouver canucks
    40          anaheim ducks  12 apr 2015  not specified      arizona coyotes
    41     chicago blackhawks  12 apr 2015  not specified   colorado avalanche
    42    nashville predators  12 apr 2015  not specified         dallas stars
    43          boston bruins  12 apr 2015  not specified  tampa bay lightning
    44    pittsburgh penguins  12 apr 2015  not specified       buffalo sabres
    45      detroit red wings  12 apr 2015  not specified  carolina hurricanes
    46      new jersey devils  12 apr 2015  not specified     florida panthers
    47  columbus blue jackets  12 apr 2015  not specified   new york islanders
    48     montreal canadiens  12 apr 2015  not specified  toronto maple leafs
    49         calgary flames  11 apr 2015  not specified        winnipeg jets
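The date column holds plain strings, so if you want to sort or filter by date it may help to parse them first. A minimal sketch using only the standard library, assuming the strings always look like "25 apr 2015" with English month abbreviations and that the process runs under an English or C locale (`parse_match_date` is a hypothetical helper name):

```python
from datetime import datetime


def parse_match_date(text):
    """Parse a string like '25 apr 2015' into a datetime.date."""
    # %b matches the abbreviated month name, case-insensitively
    return datetime.strptime(text.strip(), "%d %b %Y").date()


dates = ["12 apr 2015", "25 apr 2015", "16 apr 2015"]

# Sort chronologically rather than alphabetically
print(sorted(dates, key=parse_match_date))
```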
