How to generate a schema from a CSV for a PostgreSQL COPY
Given a CSV with several dozen or more columns, how can a 'schema' be created that can be used in a CREATE TABLE SQL expression in PostgreSQL, for use with the COPY tool?
I see plenty of examples for the COPY tool and for basic CREATE TABLE expressions, but nothing that goes into detail about cases where the number of columns is potentially prohibitive for manual creation of the schema.
If the CSV is not excessively large and is available on your local machine, then csvkit is the simplest solution. It also contains a number of other utilities for working with CSVs, so it is a useful tool to know in general.
At its simplest, typing into the shell
$ csvsql myfile.csv
will print out the required CREATE TABLE SQL command, which can be saved to a file using output redirection.
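Once the table exists, the matching COPY statement can be built from the CSV header in the same spirit. The following is only a minimal sketch in Python, assuming the table's column names match the header row exactly; `quote_ident`, `copy_statement`, the sample data and the table name `myfile` are hypothetical names chosen for illustration, not part of csvkit or PostgreSQL.

```python
import csv
import io


def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def copy_statement(csv_text, table_name):
    """Build a COPY ... FROM STDIN statement that lists the CSV header columns."""
    # Only the first row (the header) is needed to name the columns.
    header = next(csv.reader(io.StringIO(csv_text)))
    columns = ", ".join(quote_ident(col) for col in header)
    return (
        f"COPY {quote_ident(table_name)} ({columns}) "
        "FROM STDIN WITH (FORMAT csv, HEADER true)"
    )


# Hypothetical sample data standing in for the first lines of myfile.csv.
sample = "id,first name,price\n1,Ann,9.99\n"
print(copy_statement(sample, "myfile"))
```

Listing the columns explicitly means the statement does not depend on the table's column order, which helps when there are dozens of columns. The statement would then be run with the CSV file supplied on standard input.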
If you provide a connection string, csvsql will create the table and upload the file in one go:
$ csvsql --db "$my_db_uri" --insert myfile.csv
There are also options to specify the flavor of SQL and CSV you are working with. These are documented in the builtin help:
$ csvsql -h
usage: csvsql [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
              [-p ESCAPECHAR] [-z MAXFIELDSIZE] [-e ENCODING] [-S] [-H] [-v]
              [--zero] [-y SNIFFLIMIT]
              [-i {access,sybase,sqlite,informix,firebird,mysql,oracle,maxdb,postgresql,mssql}]
              [--db CONNECTION_STRING] [--query QUERY] [--insert]
              [--tables TABLE_NAMES] [--no-constraints] [--no-create]
              [--blanks] [--no-inference] [--db-schema DB_SCHEMA]
              [FILE [FILE ...]]

Generate SQL statements for one or more CSV files, or execute those
statements directly on a database, and execute one or more SQL queries.

positional arguments:
  FILE                  The CSV file(s) to operate on. If omitted, will accept
                        input on STDIN.

optional arguments:
  -h, --help            show this help message and exit
  -d DELIMITER, --delimiter DELIMITER
                        Delimiting character of the input CSV file.
  -t, --tabs            Specifies that the input CSV file is delimited with
                        tabs. Overrides "-d".
  -q QUOTECHAR, --quotechar QUOTECHAR
                        Character used to quote strings in the input CSV file.
  -u {0,1,2,3}, --quoting {0,1,2,3}
                        Quoting style used in the input CSV file. 0 = Quote
                        Minimal, 1 = Quote All, 2 = Quote Non-numeric, 3 =
                        Quote None.
  -b, --doublequote     Whether or not double quotes are doubled in the input
                        CSV file.
  -p ESCAPECHAR, --escapechar ESCAPECHAR
                        Character used to escape the delimiter if --quoting 3
                        ("Quote None") is specified and to escape the
                        QUOTECHAR if --doublequote is not specified.
  -z MAXFIELDSIZE, --maxfieldsize MAXFIELDSIZE
                        Maximum length of a single field in the input CSV
                        file.
  -e ENCODING, --encoding ENCODING
                        Specify the encoding of the input CSV file.
  -S, --skipinitialspace
                        Ignore whitespace following the delimiter.
  -H, --no-header-row   Specifies that the input CSV file has no header row.
                        Will create default headers.
  -v, --verbose         Print detailed tracebacks when errors occur.
  --zero                When interpreting or displaying column numbers, use
                        zero-based numbering instead of the default 1-based
                        numbering.
  -y SNIFFLIMIT, --snifflimit SNIFFLIMIT
                        Limit CSV dialect sniffing to the specified number of
                        bytes. Specify "0" to disable sniffing entirely.
  -i {access,sybase,sqlite,informix,firebird,mysql,oracle,maxdb,postgresql,mssql}, --dialect {access,sybase,sqlite,informix,firebird,mysql,oracle,maxdb,postgresql,mssql}
                        Dialect of SQL to generate. Only valid when --db is
                        not specified.
  --db CONNECTION_STRING
                        If present, a sqlalchemy connection string to use to
                        directly execute generated SQL on a database.
  --query QUERY         Execute one or more SQL queries delimited by ";" and
                        output the result of the last query as CSV.
  --insert              In addition to creating the table, also insert the
                        data into the table. Only valid when --db is
                        specified.
  --tables TABLE_NAMES  Specify one or more names for the tables to be
                        created. If omitted, the filename (minus extension) or
                        "stdin" will be used.
  --no-constraints      Generate a schema without length limits or null
                        checks. Useful when sampling big tables.
  --no-create           Skip creating a table. Only valid when --insert is
                        specified.
  --blanks              Do not coerce empty strings to NULL values.
  --no-inference        Disable type inference when parsing the input.
  --db-schema DB_SCHEMA
                        Optional name of database schema to create table(s)
                        in.

There are several other tools that also do schema inference, including:
- Apache Spark
- Pandas (Python)
- Blaze (Python)
- read.csv + your favorite db package in R
Each of these has the functionality to read a CSV (and other formats) into a tabular data structure, usually called a DataFrame or similar, inferring the column types in the process. They then have other commands to either write out an equivalent SQL schema or upload the DataFrame directly into a specified database. The choice of tool will depend on the volume of data, how it is stored, the idiosyncrasies of your CSV, the target database, and the language you prefer to work in.
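To make the idea of type inference concrete, here is a rough sketch using only the Python standard library. It is not how csvkit, Pandas or Spark implement inference; it simply tries progressively looser parsers per column and falls back to text. The function names, the sample data and the table name `sales` are hypothetical, and the sketch assumes the whole CSV fits in memory and that every row has as many fields as the header.

```python
import csv
import io
from datetime import date


def infer_type(values):
    """Return a PostgreSQL type name that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"  # nothing to infer from, so fall back to text

    def all_parse(parser):
        # True only if the parser accepts every non-empty value.
        try:
            for v in non_empty:
                parser(v)
            return True
        except ValueError:
            return False

    # Try the strictest type first, then progressively looser ones.
    if all_parse(int):
        return "bigint"
    if all_parse(float):
        return "double precision"
    if all_parse(date.fromisoformat):
        return "date"
    return "text"


def create_table_sql(csv_text, table_name):
    """Build a CREATE TABLE statement from a CSV string with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        pg_type = infer_type([row[i] for row in data])
        quoted = '"' + name.replace('"', '""') + '"'  # quote the identifier
        columns.append(f"    {quoted} {pg_type}")
    return f'CREATE TABLE "{table_name}" (\n' + ",\n".join(columns) + "\n);"


# Hypothetical sample data; empty fields are treated as missing values.
sample = "id,price,sold_on,note\n1,9.99,2024-01-31,first\n2,12.5,2024-02-01,\n"
print(create_table_sql(sample, "sales"))
```

For the sample above this infers bigint, double precision, date and text for the four columns. Real tools handle many more cases (booleans, timestamps, length limits, NULL constraints, sampling of large files), which is the main reason to prefer them over a hand-rolled script.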