Web scraping: scraping an alternate version/hidden item of a webpage using Python and BeautifulSoup
I am trying to scrape information (points scored, tackles made, time played, position, etc.) about Top 14 rugby players from a website. Each player has an info page, for example: http://www.lnr.fr/rugby-top-14/joueurs/nicholas-abendanon. For each player, I can get the info for the 2015-2016 season easily, but I also need the info for the 2014-2015 season.
The problem is that when I open the corresponding link (http://www.lnr.fr/rugby-top-14/joueurs/nicholas-abendanon#season=14535), the source code is the same and the info my program scrapes is the 2015-2016 data. I can't seem to find a way to get the info for previous seasons, even though it appears on the webpage. Does anyone know how to solve this?
Here is my code for the player I gave as an example.
    import bs4
    from lxml import html
    import requests
    import string
    import _pickle as pickle
    from bs4 import BeautifulSoup

    dic = {}
    url_player = 'http://www.lnr.fr/rugby-top-14/joueurs/nicholas-abendanon'
    page = requests.get(url_player)
    html = page.content
    parsed_html = BeautifulSoup(html, 'html.parser')
    body = parsed_html.body

    # Follow the "2014-2015" link found on the player page
    saison14_15 = body.find('a', attrs={'data-title': 'saison 2014-2015'})
    link = saison14_15['href']
    url_season = 'http://www.lnr.fr/rugby-top-14/joueurs/nicholas-abendanon' + link
    page_season = requests.get(url_season)
    html_season = page_season.content
    parsed_html_season = BeautifulSoup(html_season, 'html.parser')
    body_season = parsed_html_season.body

    dic['nom'] = body_season.find('h1', attrs={'id': 'page-title'}).text
    dic[body_season.find('span', attrs={'class': 'title'}).text] = body_season.find('span', attrs={'class': 'text'}).text
    info1 = body_season.find('ul', attrs={'class': 'infos-list small-bold'})
    try:
        # Each <li> holds a title/text pair of spans
        for item in info1.find_all('li'):
            dic[item.find('span', attrs={'class': 'title'}).text] = item.find('span', attrs={'class': 'text'}).text
        info2 = body_season.find('ul', attrs={'class': 'fluid-block-grid-3 team-stats'})
        if info2 is not None:
            for item in info2.find_all('li'):
                dic[item.find('span', attrs={'class': 'title'}).text] = item.find('span', attrs={'class': 'text'}).text
        info3 = body_season.find('ul', attrs={'class': 'number-list small-block-grid-2'})
        if info3 is not None:
            for item in info3.find_all('li'):
                dic[item.find('span', attrs={'class': 'title'}).text] = item.find('span', attrs={'class': 'text'}).text
    except Exception:
        # Missing elements leave dic as it is
        pass
When you choose the 2014-2015 season, the page makes an AJAX request to
http://www.lnr.fr/ajax_player_stats_detail?player=33249&compet_type=1&=undefined&season=14535&_filter_current_tab_id=panel-filter-season&ajax-target-selector=%23player_stats_detail_block
If you switch to 2015-2016, it makes an AJAX request to
http://www.lnr.fr/ajax_player_stats_detail?player=33249&compet_type=1&=undefined&season=18505&_filter_current_tab_id=panel-filter-season&ajax-target-selector=%23player_stats_detail_block
Each request returns a chunk of HTML that gets inserted into the page.
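To see which parameters change between the two requests, the query strings can be compared with the standard library. This is only a small sketch: the helper names `query_params` and `differing_params` are made up for illustration, and the two URLs are the ones quoted above.

```python
from urllib.parse import urlsplit, parse_qsl

URL_2014_2015 = (
    "http://www.lnr.fr/ajax_player_stats_detail?player=33249&compet_type=1"
    "&=undefined&season=14535&_filter_current_tab_id=panel-filter-season"
    "&ajax-target-selector=%23player_stats_detail_block"
)
URL_2015_2016 = (
    "http://www.lnr.fr/ajax_player_stats_detail?player=33249&compet_type=1"
    "&=undefined&season=18505&_filter_current_tab_id=panel-filter-season"
    "&ajax-target-selector=%23player_stats_detail_block"
)


def query_params(url):
    """Return the query-string parameters of a URL as a dict (hypothetical helper)."""
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def differing_params(url_a, url_b):
    """Map each parameter whose value differs to its (value_a, value_b) pair."""
    a, b = query_params(url_a), query_params(url_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    print(differing_params(URL_2014_2015, URL_2015_2016))
```

For these two URLs only `season` differs (14535 versus 18505), while `player` stays at 33249.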
If you can figure out the parameters needed for each player (player, season), I suggest you request this data directly (without loading the parent page at all).
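A minimal sketch of requesting the data directly might look like the following. It uses only the standard library; the function names `build_stats_url` and `fetch_stats_html` are hypothetical, and it assumes the endpoint accepts the request without the `&=undefined` parameter and without any extra headers or cookies, which has not been verified here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

AJAX_ENDPOINT = "http://www.lnr.fr/ajax_player_stats_detail"


def build_stats_url(player_id, season_id, compet_type=1):
    """Build the AJAX URL for one player and one season (hypothetical helper).

    The parameter names are copied from the requests observed above;
    the empty-named "=undefined" parameter is left out on the assumption
    that the server ignores it.
    """
    params = {
        "player": player_id,
        "compet_type": compet_type,
        "season": season_id,
        "_filter_current_tab_id": "panel-filter-season",
        "ajax-target-selector": "#player_stats_detail_block",
    }
    # urlencode percent-encodes the "#" as %23, matching the observed URL
    return AJAX_ENDPOINT + "?" + urlencode(params)


def fetch_stats_html(player_id, season_id, timeout=10):
    """Download the HTML fragment for one player and season as text."""
    url = build_stats_url(player_id, season_id)
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

With the IDs from the example, `fetch_stats_html(33249, 14535)` would request the 2014-2015 fragment, and the returned text can be passed to `BeautifulSoup` in the same way as the page content in the question.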