Friday 13 June 2014

Merge Two CSV Into One CSV

Hello my so lovely friends,

So how are your days going on? Hope you guys doing some experimenting. NOOO.. not an issue.

During my office hours I met with a problem to merge two CSV's into one on the basis of the common header, no need to include those records with mis match headers with another CSV. After experimenting finally able to write a script in python. So here is the code for the same. Hope if any one is facing or will face same issue can use this script.
Link for the script: https://gist.github.com/anupamshakya7/60607c86a7dbe9853843

So here is the script. Enjoy my friends and have a good day :)

import csv
import os

if __name__ == '__main__':

    """This script will merge two CSV with common data as per the ID as a field
    into new CSV.

    ::NOTE:: This program tested on 2.7.6 version of python.

    For an example Test1.csv have this data:
    ----------------------------------------
    ID,name,add,school
    1,anupam,FHG,sch1
    2,aditya,DFR,sch2
    3,urmila,HJTY,sch3
    4,roop,GHG
    5,EASH,HJJ

    Test2.csv have this data:
    ------------------------
    ID,Key,Value
    1,x-jdkj,100
    2,k-djsh,200
    3,j-jdjd,300

    Resultant CSV have this data:
    -----------------------------
    add,school,ID,name,Value,Key
    FHG,sch1,1,anupam,100,x-jdkj
    HJTY,sch3,3,urmila,300,j-jdjd
    DFR,sch2,2,aditya,200,k-djsh


    How to run
    ----------
    On command promt go to the location where you placed your script(python script)
    And issue this command 'python CSV.py'

    After that it will ask for the absolute path where you have placed your two CSV with ID
    as a common field(in header part).

    ::NOTE:: Please provide absolute path for folder like - C:\Users\hv3775\Desktop\TEST

    That's. It will generate new CSV into the same folder that you have provided as a input path.
    """

    input_folder_location = raw_input('Enter location where your required two csv files placed. NOTE:: Please enter absolute path: ')
    infiles = []
   
    for csv_file in os.listdir(input_folder_location):
        if csv_file.endswith('.csv'):
            infiles.append(csv_file)
   
    data = {}
    fields = []

    for fname in infiles:
        with open(fname, 'rb') as df:
            reader = csv.DictReader(df)
            for line in reader:
                # assuming the field is called ID
                if line['ID'] not in data:
                    data[line['ID']] = line
                else:
                    for k,v in line.iteritems():
                        if k not in data[line['ID']]:
                            data[line['ID']][k] = v
                for k in line.iterkeys():
                    if k not in fields:
                        fields.append(k)
            del reader

    data_list = []
   
    for d in data.items():
        if len(d[1].values()) != len(fields):
            continue
        data_list.append(d[1])  
   
    csv_output_location = os.path.join(input_folder_location, 'Result.csv')
    with open(csv_output_location, 'wb') as f:
        w = csv.DictWriter(f, fields)
        w.writeheader()
        w.writerows(data_list)


1 comment:

  1. Clear python script. Really helpful...


    Thank u so much.

    ReplyDelete