powershell - Fastest way to combine multiple csv files based on 1st column value -


so lets have 5 csv files (created in order 1 5) 8-10 columns each. each file has 300,000 (give or take) rows each.

each file should match value (unique) first column in every file, , combine records + column title(s). if files 2 through 5 not have it's value column 1 found in file1 (from column 1), entire row should excluded merging.

example below of 2 (out of 5) csv files...

file1

columntitle1,columntitle2,columntitle3,columntitle4,columntitle5,columntitle6,columntitle7,columntitle8,columntitle9,columntitle10 column1value3145,column2value1,column3value1,column4value1,column5value1,column6value1,column7valu1,column8value1,column9value1,column10value1 column1value573,column2value2,column3value2,column4value2,column5value2,column6value2,column7valu2,column8value2,column9value2,column10value2 column1value62317,column2value3,column3value3,column4value3,column5value3,column6value3,column7valu3,column8value3,column9value3,column10value3 column1value93,column2value4,column3value4,column4value4,column5value4,column6value4,column7valu4,column8value4,column9value4,column10value4 

file2:

columntitle1,columntitle11,columntitle12,columntitle13,columntitle14,columntitle15,columntitle16,columntitle17,columntitle18 column1value752789,column11value1,column12value1,column13value1,column14value1,column15value1,column16value1,column17value1,column18value1 column1value3145,column11value2,column12value2,column13value2,column14value2,column15value2,column16value2,column17value2,column18value2 column1value573,column11value3,column12value3,column13value3,column14value3,column15value3,column16value3,column17value3,column18value3 column1value832657,column11value4,column12value4,column13value4,column14value4,column15value4,column16value4,column17value4,column18value4 column1value62317,column11value5,column12value5,column13value5,column14value5,column15value5,column16value5,column17value5,column18value5 column1value93,column11value6,column12value6,column13value6,column14value6,column15value6,column16value6,column17value6,column18value6 column1value423568,column11value7,column12value7,column13value7,column14value7,column15value7,column16value7,column17value7,column18value7 

if merge these 2 files (2 out of 5) this:

columntitle1,columntitle2,columntitle3,columntitle4,columntitle5,columntitle6,columntitle7,columntitle8,columntitle9,columntitle10,columntitle11,columntitle12,columntitle13,columntitle14,columntitle15,columntitle16,columntitle17,columntitle18 column1value3145,column2value1,column3value1,column4value1,column5value1,column6value1,column7valu1,column8value1,column9value1,column10value1,column11value2,column12value2,column13value2,column14value2,column15value2,column16value2,column17value2,column18value2 column1value573,column2value2,column3value2,column4value2,column5value2,column6value2,column7valu2,column8value2,column9value2,column10value2,column11value3,column12value3,column13value3,column14value3,column15value3,column16value3,column17value3,column18value3 column1value62317,column2value3,column3value3,column4value3,column5value3,column6value3,column7valu3,column8value3,column9value3,column10value3,column11value5,column12value5,column13value5,column14value5,column15value5,column16value5,column17value5,column18value5 column1value93,column2value4,column3value4,column4value4,column5value4,column6value4,column7valu4,column8value4,column9value4,column10value4,column11value6,column12value6,column13value6,column14value6,column15value6,column16value6,column17value6,column18value6 

adding files 3 - 5 increase columns around 50 (give or take).

i'm not sure if quickest method, here logic thinking (which i'm not sure how using powershell):

  • go 1 file @ time match , merge file one
  • store file1 in variable
  • store file2 in variable
  • loop through lines in file1
  • \\\\ value1 in column1 file1 found in column1 file2
  • \\\\ append row file2 row in file1
  • \\\\ remove row file2 (lessen search during next loop iteration)
  • clear variable holding file2
  • store next file in variable
  • repeat loop find , append iterations

all roads lead rome. 1 of them is:

#hashtable store master-objects in $data = @{}  #import-csv -filter "mymasterlist.csv" | foreach-object { $data[$_.columntitle1] = $_ } #sampledata below @" columntitle1,columntitle2,columntitle3,columntitle4,columntitle5,columntitle6,columntitle7,columntitle8,columntitle9,columntitle10 column1value3145,column2value1,column3value1,column4value1,column5value1,column6value1,column7valu1,column8value1,column9value1,column10value1 column1value573,column2value2,column3value2,column4value2,column5value2,column6value2,column7valu2,column8value2,column9value2,column10value2 column1value62317,column2value3,column3value3,column4value3,column5value3,column6value3,column7valu3,column8value3,column9value3,column10value3 column1value93,column2value4,column3value4,column4value4,column5value4,column6value4,column7valu4,column8value4,column9value4,column10value4 "@ | convertfrom-csv | % { $data[$_.columntitle1] = $_ }  get-childitem -path "c:\myothercsvs" -filter "*.csv" | foreach-object { import-csv -path $_.fullname } | foreach-object {     $id = $_.columntitle1      #if row in masterlist     if($data.containskey($id)) {         #get matching object         $obj = $data[$id]         #foreach line in csv         $_.psobject.properties | where-object { $_.name -ne 'columntitle1' } | foreach-object {             #foreach property, add master-object             add-member -inputobject $obj -membertype noteproperty -name $_.name -value $_.value         }         #put modified object hashtable         $data[$id] = $obj     } }  $data.values | export-csv -path "mergedcsv.csv" -notypeinformation 

be sure pack memory large csv-files.


Comments