Getting a table from an HTML file with Python
    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    game_link = "http://espn.go.com/nba/playbyplay?gameid=400579510&period=0"
    game_source = urlopen(game_link)
    game_html = game_source.read()
    game_source.close()

    row = BeautifulSoup(game_html, "html.parser")
    pieces = list(row.children)

I need the game log rows from the link above. The code above gives me the whole HTML text. How can I extract the tables and turn them into single rows (pieces)?
You can try BeautifulSoup's find_all (findAll in older code) and supply the tag name plus any other attributes you know about the tags you are looking for. After looking at the page, it looks like you're looking for <tr> tags with the class "even". Use soup.find_all("tr", attrs={"class": "even"}). For example:
    import urllib.request
    from bs4 import BeautifulSoup

    game_link = "http://espn.go.com/nba/playbyplay?gameid=400579510&period=0"
    game_source = urllib.request.urlopen(game_link)
    game_html = game_source.read()
    game_source.close()

    soup = BeautifulSoup(game_html, "html.parser")

    # Find all instances of a row with class "even"
    rows = soup.find_all("tr", attrs={"class": "even"})
    for row in rows:
        # Do your work here
        print(row)

You still need to parse the HTML of each row. The following is a "crude" example.
    def parse_row(row):
        # Each <td> is one column in the row
        cols = row.find_all("td")
        # Ignore rows with fewer than four columns (timeouts, for example)
        if len(cols) < 4:
            return None
        return {
            "time": cols[0].get_text(),
            "team1": cols[1].get_text(),
            "score": cols[2].get_text(),
            "team2": cols[3].get_text(),
        }

    parsed_rows = []
    for row in rows:
        parsed = parse_row(row)
        if parsed:
            parsed_rows.append(parsed)
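If you want to try the row parsing without hitting the network, here is a minimal sketch that runs the same idea against a small inline snippet. The markup in SAMPLE_HTML is made up for illustration and is not copied from the ESPN page. The "odd" class is also an assumption: if the real page alternates row classes, filtering on "even" alone would skip every other row, and passing a list to class_ matches either value.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may be structured differently.
SAMPLE_HTML = """
<table>
  <tr class="even"><td>12:00</td><td>Jump ball</td><td>0-0</td><td></td></tr>
  <tr class="odd"><td>11:45</td><td></td><td>0-2</td><td>Makes layup</td></tr>
  <tr class="even"><td>11:30</td><td colspan="3">Timeout</td></tr>
</table>
"""


def parse_row(row):
    """Turn one <tr> into a dict, or return None for rows with fewer than four cells."""
    cols = row.find_all("td")
    if len(cols) < 4:
        return None
    keys = ("time", "team1", "score", "team2")
    # strip=True trims surrounding whitespace from each cell's text
    return {key: col.get_text(strip=True) for key, col in zip(keys, cols)}


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A list value matches rows whose class is either "even" or "odd" (assumed class names)
rows = soup.find_all("tr", class_=["even", "odd"])

# Keep only the rows that parse_row could turn into a dict
parsed_rows = [parsed for parsed in map(parse_row, rows) if parsed]

for parsed in parsed_rows:
    print(parsed)
```

The timeout row has only two cells, so parse_row returns None for it and it is left out of parsed_rows. Against the real page you would swap SAMPLE_HTML for game_html and check the class names in the page source first.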