python - Scrapy: how to import a table that uses tr rows as headers


I want to import a table in Scrapy that is organized like this:

<tr class="header1">
<tr class="row1">
<tr class="row2">
<tr class="row3">
<tr class="header2">
<tr class="row4">

The number of rows between headers varies. How can I import it so that each item carries its header's name or text as the first attribute? For example:

header1, row1
header1, row2
header1, row3
header2, row4

You can iterate over the "row" nodes and, for every node, look up the nearest preceding "header" sibling.

Imagine you have the following input HTML:

<table>
    <tr class="header1">header 1</tr>
    <tr class="row1">row 1</tr>
    <tr class="row2">row 2</tr>
    <tr class="row3">row 3</tr>
    <tr class="header2">header 2</tr>
    <tr class="row4">row 4</tr>
</table>

Now, here is how you can parse it:

>>> for row in response.css("tr[class^=row]"):
...     # [1] on the reverse preceding-sibling axis picks the nearest header row
...     header_text = row.xpath("preceding-sibling::tr[starts-with(@class, 'header')][1]/text()").extract_first()
...     row_text = row.xpath("text()").extract_first()
...     print(header_text, row_text)
...
(u'header 1', u'row 1')
(u'header 1', u'row 2')
(u'header 1', u'row 3')
(u'header 2', u'row 4')
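
An alternative to looking backwards from each row is a single forward pass that remembers the last header seen. The sketch below is not part of the original answer: it uses the standard-library xml.etree.ElementTree instead of Scrapy so it runs on its own, and the names SAMPLE and pair_rows_with_headers are made up for illustration. It assumes the markup is well-formed enough to parse as XML, which holds for the sample above but not for arbitrary HTML.

```python
import xml.etree.ElementTree as ET

# The sample table from the answer; well-formed, so an XML parser accepts it.
SAMPLE = """
<table>
    <tr class="header1">header 1</tr>
    <tr class="row1">row 1</tr>
    <tr class="row2">row 2</tr>
    <tr class="row3">row 3</tr>
    <tr class="header2">header 2</tr>
    <tr class="row4">row 4</tr>
</table>
"""


def pair_rows_with_headers(table):
    """Return (header_text, row_text) pairs in document order.

    Hypothetical helper: walks every <tr> once and keeps track of the
    most recent header row instead of querying preceding siblings.
    """
    pairs = []
    current_header = None  # stays None for rows that appear before any header
    for tr in table.iter("tr"):
        css_class = tr.get("class", "")
        text = (tr.text or "").strip()
        if css_class.startswith("header"):
            current_header = text
        elif css_class.startswith("row"):
            pairs.append((current_header, text))
    return pairs


pairs = pair_rows_with_headers(ET.fromstring(SAMPLE.strip()))
for header_text, row_text in pairs:
    print(header_text, row_text)
```

In a spider, the same loop could presumably run over response.css("tr") and read each row's class attribute; the single pass avoids one sibling lookup per row, which may matter on large tables.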
