Read Excel file without signature in PHP
Question: how can I read or modify an Excel file without a signature so that PHP can parse it properly?
For a project, I want to automatically download and read an Excel file from the national volleyball association (Nevobo) using PHP. Downloading goes fine; reading does not. The issue seems related to the fact that there's no signature in the first 8 bytes to tell PHPExcel it is an OLE file, so PHPExcel identifies it as a CSV file, which it is not. Excel can open the file, but forces me to save it in a different format.
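To confirm that the signature really is missing, the first 8 bytes can be compared against the OLE2 compound-document magic number (D0 CF 11 E0 A1 B1 1A E1). This is a minimal sketch; `hasOleSignature()` is a hypothetical helper name, not something PHPExcel provides, and the file name in the usage comment is an assumption.

```php
<?php
// OLE2 compound-document signature that classic .xls files start with.
const OLE_SIGNATURE = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

/**
 * Return true when the given bytes begin with the OLE2 signature.
 * Hypothetical helper for illustration; not part of PHPExcel.
 */
function hasOleSignature(string $bytes): bool
{
    // strncmp() is binary safe, so it can compare raw bytes directly.
    return strncmp($bytes, OLE_SIGNATURE, 8) === 0;
}

// Usage with a file on disk (the path is an assumption):
// $head = file_get_contents('excelgen.xls', false, null, 0, 8);
// var_dump(hasOleSignature($head));
```

If this returns false for the downloaded file, an OLE-based reader will not recognize it, and the file has to be handled as some other format.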
I have downloaded other files from the same source (different content, though) that also lack the signature. On those files I managed to filter out the control characters (\x00 thru \xff) in PHP and automatically create a new row whenever it sees a date (since that is in column A), but unfortunately that didn't work for this file.
function cleanpart ( $part ) {
    // strip most control and high bytes, keeping \x02, \x0b and \x0c as markers
    $part = trim(preg_replace('/[\x00\x01\x03-\x0a\x0d-\x1f\x80-\xff]/', '', trim($part, ' ')), ' ');
    // normalise \x0b and double quotes to \x0c, then collapse runs
    $part = preg_replace('/\x0b/', "\x0c", $part);
    $part = preg_replace('/\"/', "\x0c", $part);
    $part = preg_replace('/\x0c+/', "\x0c", $part);
    $part = preg_replace('/\x0c\x02/', "\x0c", $part);
    if ( $part == "\x02\x0c" || $part == "\x02\x0b" ) return false;
    $part = trim(preg_replace('/[\x00-\x1f\x80-\xff]/', "\x02", $part), ' ');
    $part = trim(preg_replace('/\x02+/', "\x02", $part), ' ');
    $part = trim(preg_replace('/[\x00\x01\x03-\x1f\x80-\xff]/', '', $part), ' ');
    if ( strlen($part) == 0 ) return false;
    $part = trim(preg_replace('/\x02/', "", $part), ' ');
    return $part;
}

foreach ( explode("\x04", preg_replace('!\x04+!', "\x04", $data)) as $part ) {
    if ( ! ( $part = cleanpart($part) ) ) {
        continue;
    }
    // create array
}
LibreOffice does read the file as an Excel file, so it must be a format known to LibreOffice, even if file magic identifies it as Apple BASIC (!) and other utilities as Targa (which means little more than "binary data whose length is a multiple of three").
However, this is a delimited text format. Possibly a word processor format, where the strange characters are control characters for tabulation and typefacing?
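One way to look at those control characters is to dump the start of the file as hex next to its printable text. The sketch below is only an inspection aid; `hexDump()` is a hypothetical helper, and the row width is a free parameter.

```php
<?php
/**
 * Render bytes as rows of hex pairs plus a printable-ASCII column.
 * Hypothetical helper for inspection only.
 */
function hexDump(string $bytes, int $width = 12): string
{
    $lines = [];
    foreach (str_split($bytes, $width) as $i => $chunk) {
        // "0402..." becomes "04 02 ..."
        $hex = implode(' ', str_split(bin2hex($chunk), 2));
        // Replace everything outside printable ASCII with a dot.
        $text = preg_replace('/[^\x20-\x7e]/', '.', $chunk);
        $lines[] = sprintf(
            '%06x  %-' . ($width * 3 - 1) . 's  %s',
            $i * $width,
            $hex,
            $text
        );
    }
    return implode("\n", $lines);
}

// Usage (the file name is an assumption):
// echo hexDump(substr(file_get_contents('excelgen.xls'), 0, 96)), "\n";
```

Repeating byte patterns between the readable field values show up quickly in such a dump.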
To convert it more reliably into a CSV-like format, you can replace the control sequences with tabulations, skipping the first 12 characters. The control sequences appear to be 12 bytes long, prefixed by \x04 \x02, so:
$clean = preg_replace('#\\x04\\x02..........#ms', "\t", substr($dirty, 24));
(I have skipped the first control sequence too, giving a 12+12 = 24 byte skip.)
You can then split the fields into chunks, and the PHP CSV parse functions should be able to work, with 20 fields per row.
I cannot use the CSV parse functions with the sequences as delimiter, because the sequences differ throughout the file. They also include carriage returns, which forces the use of the line modifiers (`ms`) in the regex.
This parser appears to work:
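As a rough illustration of the chunking step, the sketch below splits a tab-separated string into fixed-size rows with `array_chunk()`. The sample data is synthetic and uses 3 fields per row to stay short; the real file would use 20, and `$fieldsPerRow` is a name introduced here for the example.

```php
<?php
// 20 in the real file; 3 keeps the demonstration short.
$fieldsPerRow = 3;

// Synthetic stand-in for the cleaned data: every field is followed by a
// tab, and there is no separate row delimiter.
$clean = implode("\t", [
    'datum', 'tijd', 'zaal',
    '2016-04-23', '19:30', 'de veste',
]);

// Split on tabs, then group consecutive fields into fixed-size rows.
$rows = array_chunk(explode("\t", $clean), $fieldsPerRow);

print_r($rows);
```

This only holds as long as no field contains a tab itself and every row really has the same number of fields.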
<?php
$clean = preg_split(
    '#\\x04\\x02..........#ms',
    substr(file_get_contents('excelgen.xls'), 24)
);
$rows = array();
while (!empty($clean)) {
    $rows[] = array_splice($clean, 0, 20);
}
// $header = array_shift($rows);
print_r($rows);
Yields:
array
(
    [0] => array
        (
            [0] => datum
            [1] => tijd
            [2] => team thuis
            [3] => team uit
            [4] => locatie
            [5] => veld
            [6] => regio
            [7] => poule
            [8] => code
            [9] => zaal code
            [10] => zaal
            [11] => plaats
            [12] => eerste scheidsrechter
            [13] => tweede scheidsrechter
            [14] => rapporteur / begeleider / jurylid
            [15] => lijnrechter 1
            [16] => lijnrechter 2
            [17] => lijnrechter 3
            [18] => lijnrechter 4
            [19] => reserve
    ...
    ...
    [54] => array
        (
            [0] => 2016-04-23
            [1] => 19:30
            [2] => ecare apollo 8 hs 1
            [3] => lycurgus hs 2
            [4] => de veste, borne
            [5] => 1
            [6] => nationaal
            [7] => 1ah
            [8] => al
            [9] => bneve
            [10] => de veste
            [11] => borne
            ...
)
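Since the first row holds the column names, the rows could be keyed by header, along the lines of the commented-out `array_shift()` in the parser. This is a sketch under that assumption; `rowsToRecords()` is a hypothetical helper, and it simply skips rows whose field count differs from the header (for example a truncated final chunk).

```php
<?php
/**
 * Turn a header row plus data rows into associative arrays.
 * Hypothetical helper; rows that do not match the header length are skipped.
 */
function rowsToRecords(array $rows): array
{
    // The first row is taken to be the header; null means no rows at all.
    $header = array_shift($rows);
    if ($header === null) {
        return [];
    }

    $records = [];
    foreach ($rows as $row) {
        if (count($row) !== count($header)) {
            continue; // array_combine() needs equal lengths
        }
        $records[] = array_combine($header, $row);
    }
    return $records;
}

// Usage with the $rows array built by the parser:
// $records = rowsToRecords($rows);
// echo $records[0]['datum'];
```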