Parsing HTML in Python - some pages work and some don't?

I'm using the following script:

    from lxml import html
    import requests

    gameurl = 'http://store.401games.ca/catalog/2415520/caylus'
    page = requests.get(gameurl)
    tree = html.fromstring(page.content)

    # Take the first text node under the stock element
    stock = tree.xpath('//*[@id="stock"]/span[1]/div/*/text()')[0]

    print(stock)

It correctly displays the stock level listed on the page (1 at the time of writing).

However, with

    gameurl = 'http://store.401games.ca/catalog/2415324/ticket-to-ride'

it displays a stock of 68, which is incorrect (I have no idea where 68 is coming from).

I tried a lot of pages on the site, and 90% of them work correctly with the script. The other 10% fail and give seemingly random numbers: some are quite different (68 instead of 30, or 1100 instead of 30), others are closer (12 instead of 9). I have no idea what is happening.

Does anyone have an idea of what the problem may be?

If you open the page in a browser, you can see "Quantity: 68" flash briefly before it changes to "Quantity: 30".
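To see why the two values can differ, here is a minimal sketch using made-up markup. The real page structure is not shown in this post, so the HTML, the "availability" class and the XPath below are hypothetical. It shows that lxml returns whatever text the server sent, even when a script on the page would change that text in a browser:

```python
from lxml import html

# Hypothetical markup: the server sends "68", and an inline script
# would rewrite it to "30" once a browser runs the page.
PAGE = b"""
<html><body>
  <div class="availability"><span>68</span></div>
  <script>$('div.availability span').text("30");</script>
</body></html>
"""

tree = html.fromstring(PAGE)
# lxml only parses the markup; it never executes the <script>,
# so the span still holds the server-side value.
raw = tree.xpath('//div[@class="availability"]/span/text()')[0]
print(raw)  # 68
```

A browser runs the script and shows 30; lxml never runs it, so it reports 68.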

At first, I thought there was an XHR request that dynamically fetches the product availability from an endpoint after the page is loaded, and I was about to give the usual answer (browser automation), but the problem here is different.

If you open the Network tab in the browser developer tools, you can see the store.js JavaScript file being loaded. At the beginning of that script, you can see:

    if(stock>30) {
        $('div.availability span').text( "30" );
    }
    var instock = $('div.availability').text();
    instock = instock.replace("in-stock", "quantity");

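This means that if the quantity is more than 30, the script "manually" sets the displayed value to 30. Your script, on the other hand, reads the raw HTML returned by the server, where the real stock (68 here) is still present, because neither requests nor lxml executes JavaScript.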
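If you want your scraper to report the same number the browser shows, one option is to apply the same cap yourself after scraping. A minimal sketch, assuming the scraped text is a bare number and that the cap stays at 30 as in the store.js snippet (the store could change this at any time); the function name is hypothetical:

```python
# Assumption: cap value copied from the store.js snippet; not an official API.
DISPLAY_CAP = 30

def displayed_quantity(raw_stock: str) -> int:
    """Mirror the store.js rule: any stock above 30 is displayed as 30."""
    stock = int(raw_stock.strip())  # scraped text may carry whitespace
    return min(stock, DISPLAY_CAP)

for raw in ["1", "68", "1100"]:
    print(raw, "->", displayed_quantity(raw))
# 1 -> 1
# 68 -> 30
# 1100 -> 30
```

Keeping the raw value and capping only for display lets you keep the more precise stock figure that the HTML actually contains. Note that this rule only changes values above 30.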
