different = (df1 != df2).any(axis=1)
Let's have a look in iPython. We start by creating two play DataFrames, each with 10 rows and 5 columns (lines 1-3). We change the first two rows on the second DataFrame (line 4). Then we do the above Boolean test for difference (at line 5). This yields a Boolean series (line 6), which we can use to look at the changed rows (line 7).
In [1]: df1 = pd.DataFrame(np.random.randn(10, 5) * 10 + 50).astype(int) # make play data
In [2]: df1 # let's look at the play data
Out[2]:
0 1 2 3 4
0 80 54 36 41 43
1 41 55 66 52 59
2 49 56 61 35 42
3 72 67 53 58 43
4 51 45 76 64 43
5 61 22 33 44 59
6 49 51 52 47 50
7 33 56 56 39 39
8 52 27 57 19 64
9 45 64 53 43 59
In [3]: df2 = df1.copy() # copy the play data
In [4]: df2[0:2] = df2[0:2] + 1 # add 1 to each element in the first two rows
In [5]: different = (df1 != df2).any(axis=1) # test for differences in the rows
In [6]: different
Out[6]:
0 True
1 True
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
dtype: bool
In [7]: df2[different] # these are the rows that were changed
Out[7]:
0 1 2 3 4
0 81 55 37 42 44
1 42 56 67 53 60
I should note, this test assumes that the two DataFrames being compared have identically labelled row index and columns. This will be the case if the DataFrames are being sliced from a pandas Panel.
No comments:
Post a Comment