How to find items in csv file A and not in file B and vice versa using python

924 views

Let's say I want to compare two csv files: file A and file B. They are both similarly built - the first column has product IDs (one product per row) and the columns provide some stats about the products such as sales in # and $.

I want to compare these files - see which product IDs appear in the first column of file A and not in B, and which in B and not A.
Finally, it would be very great if the result could be written into two new CSV files - one product ID per row in the first column. (no other data in the other columns needed)

This is the script I tried:

import csv

#open CSV's and read first column with product IDs into variables pointing to lists
A = [line.split(',')[0] for line in open('Afile.csv')]
B = [line.split(',')[0] for line in open('Bfile.csv')]

#create variables pointing to lists with unique product IDs in A and B respectively 
inAnotB = list(set(A)-set(B))
inBnotA = list(set(B)-set(A))

print inAnotB
print inBnotA

c = csv.writer(open("inAnotB.csv", "wb"))
c.writerow([inAnotB])

d = csv.writer(open("inBnotA.csv", "wb"))
d.writerow([inBnotA])

print "done!"

But it doesn't produce the required results.
It prints IDs in this format:

247158132n

and nothing to the csv files.

posted Jun 18, 2013 by anonymous

Share this question

1 Answer

Python reads lines from a file with the trailing newline included, and line.split(",") with only one column (i. e. no comma) keeps the whole line. As you already know about the csv module you should use it to read your data, e. g. instead of

> A = [line.split(',')[0] for line in open('Afile.csv')]

try

with open("Afile.csv", "rb") as f:
 a = {row[0] for row in csv.reader(f)}
...

I used {...} instead of [...], so a is already a set and you can proceed:

in_a_not_b = a - b

Finally as a shortcut for

for item in in_a_not_b:
 writer.writerow([item])

use the writerows() method to write your data:

with open("inAnotB.csv", "wb") as f:
 writer = csv.writer(f)
 writer.writerows([item] for item in_a_not_b)

Note that I'm wrapping every item in the set rather than the complete set as a whole. If you wanted to be clever you could spell that even more succinct as

 writer.writerows(zip(in_a_not_b))

answer Jun 18, 2013 by anonymous

Similar Questions

+2 votes

Python: What is dictreader() and reader() function call while reading from a csv file?

+2 votes

How to read CSV file in python?

Is there a better way to do that:

def Read_CSV_File(filename):
 file = open(filename, "r")
 reader = csv.DictReader(file)
 line = 1
 for row in reader:
 if line < 6:
 reader.next()
 line++
# process the CSV

+2 votes

If a=[1,2,3,4,5] and b=[1,2,3,4,5] then,what a and b are same or not;(In Python)

+1 vote

You have an IP address in your network how will you find hostname and vice versa?

+3 votes

Socket send file from android to server and vice versa

I want to send a file (video) from android to server and vice versa. And I want the setup to work for N number of devices (more than 1). I assume we can use socket, yet I am not sure if using Socket for this requirement is the best way. Thanks in advance.

...

How to find items in csv file A and not in file B and vice versa using python

Your comment on this post:

1 Answer

Your comment on this answer:

Your answer

Preview