Haskell - Using pipes-csv to parse Latin-1 encoded content?
I'd like to use pipes-csv to parse large CSV files. It turns out these CSV files are Latin-1 encoded, and it also turns out that pipes-csv, and the cassava library it depends on, assume UTF-8. This ends up producing parsing errors that I need to handle.
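To see why Latin-1 data trips up a UTF-8 decoder, here is a minimal sketch that uses only the `text` and `bytestring` packages, not pipes-csv or cassava themselves. The byte 0xE9 is "é" in Latin-1, but in UTF-8 it starts a multi-byte sequence, so `decodeUtf8'` rejects it. `decodeLatin1`, by contrast, maps every byte to the code point with the same value and never fails. The sample bytes are an assumption chosen for illustration.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1, decodeUtf8')

-- "café" as it would be stored in a Latin-1 file: 'é' is the single byte 0xE9.
latin1Cafe :: BS.ByteString
latin1Cafe = BS.pack [0x63, 0x61, 0x66, 0xE9]

-- Decoding as UTF-8 fails: 0xE9 starts a multi-byte sequence
-- that is never completed.
asUtf8 :: Either String T.Text
asUtf8 = either (Left . show) Right (decodeUtf8' latin1Cafe)

-- Decoding as Latin-1 always succeeds: each byte becomes the
-- code point with the same numeric value.
asLatin1 :: T.Text
asLatin1 = decodeLatin1 latin1Cafe

main :: IO ()
main = do
  putStrLn (either (const "UTF-8 decode failed") (const "UTF-8 decode succeeded") asUtf8)
  print (asLatin1 == T.pack "caf\233") -- True
```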
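The way I've approached this is to duplicate my records: the final record holds the CSV data in Text fields, and a duplicate holds it in ByteString fields. I decode into the duplicate, manually translate the Latin-1 strings to UTF-8, and create the final record. Inelegant, to say the least.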
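The duplicate-record workaround boils down to a raw record with `ByteString` fields and a conversion into the final `Text`-based record. The sketch below shows only that conversion step. The record and field names are hypothetical. In real code cassava would fill in `RawPerson` through a `FromNamedRecord` instance, which is omitted here so the example builds with only the `text` and `bytestring` packages.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1)

-- Hypothetical "dup" record: cassava would decode the CSV columns
-- into raw bytes here, so no UTF-8 validation takes place.
data RawPerson = RawPerson
  { rawName :: BS.ByteString
  , rawCity :: BS.ByteString
  }

-- Hypothetical final record with Text fields.
data Person = Person
  { personName :: T.Text
  , personCity :: T.Text
  } deriving (Eq, Show)

-- Translate each Latin-1 field by hand to build the final record.
fromRaw :: RawPerson -> Person
fromRaw (RawPerson n c) = Person (decodeLatin1 n) (decodeLatin1 c)

main :: IO ()
main = do
  -- BC.pack truncates each Char to one byte, so "\233" becomes 0xE9
  -- and "\252" becomes 0xFC, as they would appear in a Latin-1 file.
  let raw = RawPerson (BC.pack "Ren\233") (BC.pack "M\252nchen")
  print (fromRaw raw == Person (T.pack "Ren\233") (T.pack "M\252nchen")) -- True
```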
Is there a better way?
Per Daniel's suggestion, here's what I have so far:
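    import Control.Monad (void)
    import Pipes
    import qualified Pipes.ByteString as PB
    import Pipes.Csv (decodeByName)
    import qualified Pipes.Text.Encoding as PTE
    import System.IO

    withFile "file.csv" ReadMode $ \h -> do
      -- decode the Latin-1 bytes to Text, then re-encode that Text as UTF-8
      let transcode = PTE.decodeIso8859_1 . PB.fromHandle ~> PTE.encodeUtf8
      runEffect $ decodeByName (void . transcode $ h) >-> process

It trades the unnecessary duplicate records for unnecessary re-encoding of the text, but it's an improvement. I don't suppose there's a way to do this without either of these unnecessary steps?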
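For a single strict chunk, the `transcode` step amounts to decoding Latin-1 and immediately re-encoding as UTF-8, which is the extra work being traded for the duplicate records. This minimal sketch uses the plain `Data.Text.Encoding` functions rather than the streaming `pipes-text` versions, and `transcodeChunk` is a hypothetical name. ASCII bytes pass through unchanged, while every non-ASCII Latin-1 byte (0x80 to 0xFF) grows to two UTF-8 bytes.

```haskell
import qualified Data.ByteString as BS
import Data.Text.Encoding (decodeLatin1, encodeUtf8)

-- Hypothetical helper: Latin-1 bytes in, UTF-8 bytes out.
-- decodeLatin1 is total, so no error handling is needed.
transcodeChunk :: BS.ByteString -> BS.ByteString
transcodeChunk = encodeUtf8 . decodeLatin1

main :: IO ()
main = do
  let input = BS.pack [0x63, 0x61, 0x66, 0xE9] -- "café" in Latin-1
  print (BS.unpack (transcodeChunk input)) -- [99,97,102,195,169]
```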