python - Scrapy: how to import a table that uses tr rows as headers
I want to import a table in Scrapy that is organized like this:
<tr class="header1">
<tr class="row1">
<tr class="row2">
<tr class="row3">
<tr class="header2">
<tr class="row4">
and so on, with a different number of rows between the headers. How can I import it so that each item has the header name or text as its first attribute? Like this:
header1, row1
header1, row2
header1, row3
header2, row4
You can iterate over the "row" nodes and, for every node, get the first preceding "header" sibling.
Imagine you have the following input HTML:
<table>
  <tr class="header1">header 1</tr>
  <tr class="row1">row 1</tr>
  <tr class="row2">row 2</tr>
  <tr class="row3">row 3</tr>
  <tr class="header2">header 2</tr>
  <tr class="row4">row 4</tr>
</table>
Now, here is how you can parse it:
>>> for row in response.css("tr[class^=row]"):
...     header_text = row.xpath("preceding-sibling::tr[starts-with(@class, 'header')][1]/text()").extract_first()
...     row_text = row.xpath("text()").extract_first()
...     print(header_text, row_text)
...
(u'header 1', u'row 1')
(u'header 1', u'row 2')
(u'header 1', u'row 3')
(u'header 2', u'row 4')
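If you would rather not run a preceding-sibling lookup for every row, a different approach is to walk the rows once in document order and remember the most recent header. The following is only a minimal sketch of that idea using the standard library, assuming the markup is well-formed enough for xml.etree.ElementTree (real pages often are not, which is one reason to stick with Scrapy selectors). The function name pair_rows_with_headers and the dict keys "header" and "row" are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Same sample table as in the answer above.
HTML = """
<table>
  <tr class="header1">header 1</tr>
  <tr class="row1">row 1</tr>
  <tr class="row2">row 2</tr>
  <tr class="row3">row 3</tr>
  <tr class="header2">header 2</tr>
  <tr class="row4">row 4</tr>
</table>
"""


def pair_rows_with_headers(markup):
    """Return one dict per data row, tagged with the nearest header above it."""
    table = ET.fromstring(markup.strip())
    items = []
    current_header = None  # text of the most recent header row seen so far
    for tr in table.iter("tr"):  # iter() visits elements in document order
        css_class = tr.get("class", "")
        text = (tr.text or "").strip()
        if css_class.startswith("header"):
            current_header = text
        elif css_class.startswith("row"):
            items.append({"header": current_header, "row": text})
    return items


items = pair_rows_with_headers(HTML)
for item in items:
    print(item["header"], item["row"], sep=", ")
```

Each dict has the header text as its first field, which matches the "header1, row1" layout asked for in the question. A row that appears before any header gets None as its header.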