Python re.findall did not find all matches


    import re

    content='<tr><td style="text-align:center;" height="30">12090043</td>'+\
            '<td style="text-align:left;">coursea</td>'+\
            '<td style="text-align:center;">3</td>'+\
            '<td style="text-align:left;">86</td><td>2013-summer</td></tr>'+\
            '<tr><td style="text-align:center;" height="30">10420844</td>'+\
            '<td style="text-align:left;">courseb</td>'+\
            '<td style="text-align:center;">4</td>'+\
            '<td style="text-align:left;">98</td><td>2013-autumn</td></tr>'
    pattern=re.compile('<tr>.*"30">(.*)</td>.*"text-align:left;">(.*)</td>.*"text-align:center;">(.*)</td>.*"text-align:left;">(.*)</td><td>(.*)</td></tr>')
    items=re.findall(pattern,content)
    print(items)

the output is:

[('10420844', 'courseb', '4', '98', '2013-autumn')] 

but the expected result is:

[('12090043', 'coursea', '3', '86', '2013-summer'),('10420844', 'courseb', '4', '98', '2013-autumn')] 

Actually, the code returns only the last match when there are two or more matches. Can anyone tell me why this is happening? Sorry for the long code, and thanks in advance!

You can use BeautifulSoup, as below:

    >>> from bs4 import BeautifulSoup
    >>> content = """
    ... <tr>
    ...     <td style="text-align:center;" height="30">12090043</td>
    ...     <td style="text-align:left;">coursea</td>
    ...     <td style="text-align:center;">3</td>
    ...     <td style="text-align:left;">86</td><td>2013-summer</td>
    ... </tr>
    ...
    ... <tr>
    ...     <td style="text-align:center;" height="30">10420844</td>
    ...     <td style="text-align:left;">courseb</td>
    ...     <td style="text-align:center;">4</td>
    ...     <td style="text-align:left;">98</td><td>2013-autumn</td>
    ... </tr>
    ... """
    >>>
    >>> soup = BeautifulSoup(content, "html.parser")
    >>> [i.get_text(' ').split() for i in soup.find_all('tr')]
    [['12090043', 'coursea', '3', '86', '2013-summer'], ['10420844', 'courseb', '4', '98', '2013-autumn']]
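If installing bs4 is not an option, roughly the same idea can be sketched with the standard library's `html.parser` module (Python 3). This is a swapped-in alternative, not the BeautifulSoup approach above, and the class name `RowCollector` is made up for illustration. It assumes every `<td>` sits inside a `<tr>` and that cells are not nested.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Hypothetical helper: collects the text of each <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False   # True while we are between <td> and </td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
            self._in_cell = False
        elif tag == "td" and self.rows:
            # Start a new, empty cell in the current row.
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Ignore whitespace between tags; only keep text inside a cell.
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


content = """
<tr>
    <td style="text-align:center;" height="30">12090043</td>
    <td style="text-align:left;">coursea</td>
    <td style="text-align:center;">3</td>
    <td style="text-align:left;">86</td><td>2013-summer</td>
</tr>

<tr>
    <td style="text-align:center;" height="30">10420844</td>
    <td style="text-align:left;">courseb</td>
    <td style="text-align:center;">4</td>
    <td style="text-align:left;">98</td><td>2013-autumn</td>
</tr>
"""

parser = RowCollector()
parser.feed(content)
parser.close()
print(parser.rows)
```

BeautifulSoup is still the shorter and more forgiving option; this sketch only shows that a parser-based approach does not strictly need a third-party package.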

Regex isn't the correct tool to parse HTML. Don't try to debug that code; instead, drop it entirely and use an HTML parser, as in the example above (BeautifulSoup).
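For anyone still curious why the pattern in the question returned only the last row: `.*` is greedy, so the first `.*` after `<tr>` swallows as much as it can and backtracks only as far as the last `"30">` in the string. The whole input therefore counts as a single match whose groups come from the second row. The sketch below compares that pattern with a non-greedy (`.*?`) variant. It assumes the rows sit on one line exactly as in the question, and it is meant to illustrate the behaviour, not to recommend regex over a parser.

```python
import re

# The two rows from the question, joined into one string with no newlines.
content = (
    '<tr><td style="text-align:center;" height="30">12090043</td>'
    '<td style="text-align:left;">coursea</td>'
    '<td style="text-align:center;">3</td>'
    '<td style="text-align:left;">86</td><td>2013-summer</td></tr>'
    '<tr><td style="text-align:center;" height="30">10420844</td>'
    '<td style="text-align:left;">courseb</td>'
    '<td style="text-align:center;">4</td>'
    '<td style="text-align:left;">98</td><td>2013-autumn</td></tr>'
)

# Original pattern: every ".*" is greedy.
greedy = re.compile(
    r'<tr>.*"30">(.*)</td>'
    r'.*"text-align:left;">(.*)</td>'
    r'.*"text-align:center;">(.*)</td>'
    r'.*"text-align:left;">(.*)</td><td>(.*)</td></tr>'
)

# Same pattern with ".*?" so each piece stops at the first possible place.
lazy = re.compile(
    r'<tr>.*?"30">(.*?)</td>'
    r'.*?"text-align:left;">(.*?)</td>'
    r'.*?"text-align:center;">(.*?)</td>'
    r'.*?"text-align:left;">(.*?)</td><td>(.*?)</td></tr>'
)

greedy_items = greedy.findall(content)
lazy_items = lazy.findall(content)

print(greedy_items)
# [('10420844', 'courseb', '4', '98', '2013-autumn')]
print(lazy_items)
# [('12090043', 'coursea', '3', '86', '2013-summer'), ('10420844', 'courseb', '4', '98', '2013-autumn')]
```

The non-greedy version still breaks as soon as the markup changes (attribute order, extra whitespace, nested tags), which is why a parser is the safer choice.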

