web scraping - Scraping an alternate version/hidden item from a webpage using Python BeautifulSoup

I am trying to scrape information (points scored, tackles made, time played, position, etc.) about Top 14 rugby players from a website. Each player has an info page such as this one: http://www.lnr.fr/rugby-top-14/joueurs/nicholas-abendanon For each player, I can get the info for the 2015-2016 season easily, but I also need the info for the 2014-2015 season.

The problem is that when I open the corresponding link (http://www.lnr.fr/rugby-top-14/joueurs/nicholas-abendanon#season=14535), the source code is the same, and the info my program scrapes is the 2015-2016 data. I can't seem to find a way to get the info for previous seasons, even though it appears on the webpage. Does anyone know how to solve this?

Here is my code for the player I gave as an example.

    import requests
    from bs4 import BeautifulSoup

    dic = {}
    url_player = 'http://www.lnr.fr/rugby-top-14/joueurs/nicholas-abendanon'
    page = requests.get(url_player)
    parsed_html = BeautifulSoup(page.content, 'html.parser')
    body = parsed_html.body

    # Find the link to the 2014-2015 season and request that URL
    saison14_15 = body.find('a', attrs={'data-title': 'saison 2014-2015'})
    link = saison14_15['href']
    url_season = url_player + link
    page_season = requests.get(url_season)
    parsed_html_season = BeautifulSoup(page_season.content, 'html.parser')
    body_season = parsed_html_season.body

    def add_items(block):
        # Store each title/text pair of a stats list in dic
        for item in block.find_all('li'):
            title = item.find('span', attrs={'class': 'title'}).text
            dic[title] = item.find('span', attrs={'class': 'text'}).text

    dic['nom'] = body_season.find('h1', attrs={'id': 'page-title'}).text
    first_title = body_season.find('span', attrs={'class': 'title'}).text
    dic[first_title] = body_season.find('span', attrs={'class': 'text'}).text

    info1 = body_season.find('ul', attrs={'class': 'infos-list small-bold'})
    info2 = body_season.find('ul', attrs={'class': 'fluid-block-grid-3 team-stats'})
    info3 = body_season.find('ul', attrs={'class': 'number-list small-block-grid-2'})
    try:
        add_items(info1)
        if info2 is not None:
            add_items(info2)
        if info3 is not None:
            add_items(info3)
    except AttributeError:
        # A list or span was missing; keep whatever was collected so far
        pass

When you choose the 2014-2015 season, the page makes an AJAX request to

http://www.lnr.fr/ajax_player_stats_detail?player=33249&compet_type=1&=undefined&season=14535&_filter_current_tab_id=panel-filter-season&ajax-target-selector=%23player_stats_detail_block

If you switch back to 2015-2016, it makes an AJAX request to

http://www.lnr.fr/ajax_player_stats_detail?player=33249&compet_type=1&=undefined&season=18505&_filter_current_tab_id=panel-filter-season&ajax-target-selector=%23player_stats_detail_block

Each request returns a chunk of HTML that gets inserted into the page.

If you can figure out the parameters needed for each player and season, I suggest you request the data directly (without loading the parent page at all).
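
As a rough sketch of that suggestion, the AJAX URL can be rebuilt from a player ID and a season ID and fetched on its own. The function names build_stats_url and fetch_stats_fragment are hypothetical, and the sketch assumes the endpoint accepts the same parameters as the two requests quoted above (including the odd empty-named "=undefined" one) and that the response is UTF-8 text. The only IDs known from those requests are player 33249 and seasons 14535 (2014-2015) and 18505 (2015-2016); how to find the ID of another player is not covered here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the AJAX requests quoted above.
AJAX_ENDPOINT = "http://www.lnr.fr/ajax_player_stats_detail"


def build_stats_url(player_id, season_id, compet_type=1):
    """Rebuild the AJAX URL for one player and one season (hypothetical helper)."""
    params = [
        ("player", player_id),
        ("compet_type", compet_type),
        ("", "undefined"),  # empty-named parameter copied from the observed request
        ("season", season_id),
        ("_filter_current_tab_id", "panel-filter-season"),
        ("ajax-target-selector", "#player_stats_detail_block"),
    ]
    # urlencode escapes '#' as %23, matching the observed URL
    return AJAX_ENDPOINT + "?" + urlencode(params)


def fetch_stats_fragment(player_id, season_id, timeout=10):
    """Download the HTML fragment for a player and season (network request).

    Assumption: the response body is UTF-8 encoded text.
    """
    url = build_stats_url(player_id, season_id)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

If the endpoint behaves as observed, fetch_stats_fragment(33249, 14535) should return the 2014-2015 fragment, which can then be handed to BeautifulSoup and searched for the title/text spans in the same way as the page body in the question.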
