How to generate a schema from a CSV for a PostgreSQL COPY


Given a CSV with several dozen or more columns, how can a 'schema' be created that can be used in a CREATE TABLE SQL expression in PostgreSQL, so that the data can then be loaded with the COPY tool?

I see plenty of examples of the COPY tool and of basic CREATE TABLE expressions, but nothing that goes into detail for cases where you have a potentially prohibitive number of columns for manual creation of the schema.

If the CSV is not excessively large and is available on your local machine, then csvkit is the simplest solution. It also contains a number of other utilities for working with CSVs, so it is a useful tool to know in general.

At its simplest, typing this into the shell:

$ csvsql myfile.csv 

will print out the required CREATE TABLE SQL command, which can be saved to a file using output redirection.
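To make the type inference a little less of a black box, here is a minimal sketch of the general idea: sample the first rows, pick the narrowest type that fits every non-empty value in each column, and emit a CREATE TABLE statement. This is not csvkit's actual algorithm. The helper names (quote_ident, infer_pg_type, create_table_sql) and the small set of target types (BIGINT, NUMERIC, DOUBLE PRECISION, BOOLEAN, TEXT) are assumptions chosen for illustration, and dates and NOT NULL constraints are deliberately ignored.

```python
import csv
import io
import re
from typing import Iterable, List

# Hypothetical helpers for illustration only; not part of csvkit.
INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")
BIGINT_MIN, BIGINT_MAX = -(2**63), 2**63 - 1


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def infer_pg_type(values: Iterable[str]) -> str:
    """Return the narrowest type that fits every non-empty sampled value."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "TEXT"  # nothing to go on, so fall back to TEXT
    if all(INT_RE.fullmatch(v) for v in non_empty):
        # Integers that overflow BIGINT can still be stored as NUMERIC.
        if all(BIGINT_MIN <= int(v) <= BIGINT_MAX for v in non_empty):
            return "BIGINT"
        return "NUMERIC"
    if all(FLOAT_RE.fullmatch(v) for v in non_empty):
        return "DOUBLE PRECISION"
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "BOOLEAN"
    return "TEXT"


def create_table_sql(csv_text: str, table: str, sample_rows: int = 1000) -> str:
    """Build a CREATE TABLE statement from a CSV header plus sampled rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    samples: List[List[str]] = [[] for _ in header]
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        # Ignore any extra trailing fields beyond the header width.
        for col, value in enumerate(row[: len(header)]):
            samples[col].append(value)
    columns = ",\n".join(
        f"    {quote_ident(name)} {infer_pg_type(vals)}"
        for name, vals in zip(header, samples)
    )
    return f"CREATE TABLE {quote_ident(table)} (\n{columns}\n);"
```

Because only sample_rows rows are inspected, a later row can still violate an inferred type; widening that column to TEXT is the usual fallback. Once the table exists, psql's \copy meta-command (for example \copy myfile FROM 'myfile.csv' WITH (FORMAT csv, HEADER true)) loads the file from the client machine.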

If you also provide a connection string, csvsql will create the table and upload the file in one go:

$ csvsql --db "$my_db_uri" --insert myfile.csv 
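Since the question is specifically about COPY, another route is to create the table from the generated schema and then load the data with a COPY statement yourself. The sketch below only builds that statement; the function name copy_from_stdin_sql is hypothetical, and it assumes the CSV has a header row whose names match the table's columns.

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def copy_from_stdin_sql(table: str, columns: Sequence[str], delimiter: str = ",") -> str:
    """Build a COPY ... FROM STDIN statement for a CSV file that has a header row."""
    if len(delimiter) != 1:
        raise ValueError("COPY requires a single-character delimiter")
    col_list = ", ".join(quote_ident(c) for c in columns)
    # Escape a single quote in the delimiter by doubling it, as SQL string literals require.
    delim_literal = "'" + delimiter.replace("'", "''") + "'"
    return (
        f"COPY {quote_ident(table)} ({col_list}) FROM STDIN "
        f"WITH (FORMAT csv, HEADER true, DELIMITER {delim_literal})"
    )
```

The column names can be taken from the file's first row, e.g. next(csv.reader(f)). With psycopg2, a statement like this can be passed to cursor.copy_expert together with the open file object; from psql, the equivalent \copy meta-command does the same job.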

There are also options to specify the flavor of SQL and the CSV dialect you are working with. They are documented in the built-in help:

$ csvsql -h
usage: csvsql [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
              [-p ESCAPECHAR] [-z MAXFIELDSIZE] [-e ENCODING] [-S] [-H] [-v]
              [--zero] [-y SNIFFLIMIT]
              [-i {access,sybase,sqlite,informix,firebird,mysql,oracle,maxdb,postgresql,mssql}]
              [--db CONNECTION_STRING] [--query QUERY] [--insert]
              [--tables TABLE_NAMES] [--no-constraints] [--no-create]
              [--blanks] [--no-inference] [--db-schema DB_SCHEMA]
              [FILE [FILE ...]]

Generate SQL statements for one or more CSV files, create and execute those
statements directly on a database, and execute one or more SQL queries.

positional arguments:
  FILE                  The CSV file(s) to operate on. If omitted, will accept
                        input on STDIN.

optional arguments:
  -h, --help            show this help message and exit
  -d DELIMITER, --delimiter DELIMITER
                        Delimiting character of the input CSV file.
  -t, --tabs            Specifies that the input CSV file is delimited with
                        tabs. Overrides "-d".
  -q QUOTECHAR, --quotechar QUOTECHAR
                        Character used to quote strings in the input CSV file.
  -u {0,1,2,3}, --quoting {0,1,2,3}
                        Quoting style used in the input CSV file. 0 = Quote
                        Minimal, 1 = Quote All, 2 = Quote Non-numeric, 3 =
                        Quote None.
  -b, --doublequote     Whether or not double quotes are doubled in the input
                        CSV file.
  -p ESCAPECHAR, --escapechar ESCAPECHAR
                        Character used to escape the delimiter if --quoting 3
                        ("Quote None") is specified and to escape the
                        QUOTECHAR if --doublequote is not specified.
  -z MAXFIELDSIZE, --maxfieldsize MAXFIELDSIZE
                        Maximum length of a single field in the input CSV
                        file.
  -e ENCODING, --encoding ENCODING
                        Specify the encoding of the input CSV file.
  -S, --skipinitialspace
                        Ignore whitespace immediately following the delimiter.
  -H, --no-header-row   Specifies that the input CSV file has no header row.
                        Will create default headers.
  -v, --verbose         Print detailed tracebacks when errors occur.
  --zero                When interpreting or displaying column numbers, use
                        zero-based numbering instead of the default 1-based
                        numbering.
  -y SNIFFLIMIT, --snifflimit SNIFFLIMIT
                        Limit CSV dialect sniffing to the specified number of
                        bytes. Specify "0" to disable sniffing entirely.
  -i {access,sybase,sqlite,informix,firebird,mysql,oracle,maxdb,postgresql,mssql}, --dialect {access,sybase,sqlite,informix,firebird,mysql,oracle,maxdb,postgresql,mssql}
                        Dialect of SQL to generate. Only valid when --db is
                        not specified.
  --db CONNECTION_STRING
                        If present, a sqlalchemy connection string to use to
                        directly execute generated SQL on a database.
  --query QUERY         Execute one or more SQL queries delimited by ";" and
                        output the result of the last query as CSV.
  --insert              In addition to creating the table, also insert the
                        data into the table. Only valid when --db is
                        specified.
  --tables TABLE_NAMES  Specify one or more names for the tables to be
                        created. If omitted, the filename (minus extension) or
                        "stdin" will be used.
  --no-constraints      Generate a schema without length limits or null
                        checks. Useful when sampling big tables.
  --no-create           Skip creating a table. Only valid when --insert is
                        specified.
  --blanks              Do not coerce empty strings to NULL values.
  --no-inference        Disable type inference when parsing the input.
  --db-schema DB_SCHEMA
                        Optional name of database schema to create table(s)
                        in.
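csvkit is written in Python, and the dialect flags above (-d, -q, -u, -S) appear to correspond to parameters of the standard library's csv module; in particular the 0 to 3 codes for -u equal the values of csv.QUOTE_MINIMAL, QUOTE_ALL, QUOTE_NONNUMERIC and QUOTE_NONE. That correspondence is an observation, not something the help text states. The sketch below uses a hypothetical read_rows helper to show what those options control when parsing.

```python
import csv
import io
from typing import List

# The -u/--quoting codes line up with these csv module constants.
QUOTING_CODES = {
    0: csv.QUOTE_MINIMAL,
    1: csv.QUOTE_ALL,
    2: csv.QUOTE_NONNUMERIC,
    3: csv.QUOTE_NONE,
}


def read_rows(text: str, delimiter: str = ",", quotechar: str = '"',
              quoting: int = 0, skipinitialspace: bool = False) -> List[list]:
    """Parse CSV text with options analogous to csvsql's -d, -q, -u and -S.

    Note: with quoting=2 the csv module converts unquoted fields to float,
    so rows are not guaranteed to contain only strings.
    """
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        quotechar=quotechar,
        quoting=QUOTING_CODES[quoting],
        skipinitialspace=skipinitialspace,
    )
    return list(reader)
```

For example, a semicolon-delimited file with a space after each delimiter and a quoted field containing the delimiter, 'id; name\n1; "Smith; John"\n', only parses into two clean columns when both delimiter=";" and skipinitialspace=True are given, which is what -d ";" -S would express on the csvsql command line.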

There are several other tools for schema inference, including:

  • Apache Spark
  • pandas (Python)
  • Blaze (Python)
  • read.csv + your favorite DB package in R

Each of these has functionality to read a CSV (and other formats) into a tabular data structure called a DataFrame or similar, inferring the column types in the process. They also have other commands to either write out an equivalent SQL schema or upload the DataFrame directly into a specified database. The choice of tool will depend on the volume of data, how it is stored, the idiosyncrasies of your CSV, the target database and the language you prefer to work in.
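As an illustration of the DataFrame route with pandas, here is a minimal sketch, assuming pandas is installed. schema_from_csv is a hypothetical helper; it relies on pandas.io.sql.get_schema, which renders a CREATE TABLE statement from the DataFrame's inferred dtypes. With con=None pandas falls back to its built-in SQLite type mapping; passing a SQLAlchemy engine connected to PostgreSQL is assumed to be needed to get PostgreSQL-specific types.

```python
import io
from typing import Optional

import pandas as pd
from pandas.io.sql import get_schema


def schema_from_csv(source, table: str, con=None,
                    sample_rows: Optional[int] = 10000) -> str:
    """Infer column dtypes with pandas and render them as CREATE TABLE SQL."""
    # nrows limits how much of a large file is read just to infer types.
    df = pd.read_csv(source, nrows=sample_rows)
    # con=None uses pandas' SQLite fallback mapping (INTEGER, REAL, TEXT);
    # a SQLAlchemy engine would make pandas emit that database's types instead.
    return get_schema(df, table, con=con)


if __name__ == "__main__":
    print(schema_from_csv(io.StringIO("id,price,name\n1,9.5,foo\n2,10.0,bar\n"), "myfile"))
```

The same DataFrame can then be uploaded with DataFrame.to_sql, although for large files a COPY-based load is typically faster than row inserts.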

