different = (df1 != df2).any(axis=1)
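To see what each part of that expression does, here is a minimal sketch on a small hand-made pair of DataFrames (the column names and values are made up for illustration). `df1 != df2` compares the frames element by element and returns a DataFrame of booleans. `.any(axis=1)` then collapses each row to True if any of its cells differ.

```python
import pandas as pd

# Two small, identically labelled frames; only row 1, column "b" differs.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 50, 6]})

cell_diff = df1 != df2             # element-wise comparison: a DataFrame of bools
different = cell_diff.any(axis=1)  # per row: True if any cell in that row differs

print(cell_diff)
print(different)
print(df2[different])              # a boolean Series selects the matching rows
```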
Let's have a look in IPython (the session assumes numpy and pandas have already been imported as np and pd). We start by creating two play DataFrames, each with 10 rows and 5 columns (lines 1-3). We change the first two rows of the second DataFrame (line 4). Then we apply the Boolean test for difference from above (line 5). This yields a Boolean Series (line 6), which we can use to select the changed rows (line 7).
In [1]: df1 = pd.DataFrame(np.random.randn(10, 5) * 10 + 50).astype(int) # make play data

In [2]: df1 # let's look at the play data
Out[2]:
    0   1   2   3   4
0  80  54  36  41  43
1  41  55  66  52  59
2  49  56  61  35  42
3  72  67  53  58  43
4  51  45  76  64  43
5  61  22  33  44  59
6  49  51  52  47  50
7  33  56  56  39  39
8  52  27  57  19  64
9  45  64  53  43  59

In [3]: df2 = df1.copy() # copy the play data

In [4]: df2[0:2] = df2[0:2] + 1 # add 1 to each element in the first two rows

In [5]: different = (df1 != df2).any(axis=1) # test for differences in the rows

In [6]: different
Out[6]:
0     True
1     True
2    False
3    False
4    False
5    False
6    False
7    False
8    False
9    False
dtype: bool

In [7]: df2[different] # these are the rows that were changed
Out[7]:
    0   1   2   3   4
0  81  55  37  42  44
1  42  56  67  53  60
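The same Boolean Series can also be applied to df1 to pull out the original values of the changed rows, which makes it easy to view old and new values together. The sketch below is a self-contained script version of the session; the fixed seed and the use of `pd.concat` with `keys` to label the two versions are my own additions for illustration, not part of the session itself.

```python
import numpy as np
import pandas as pd

# Recreate play data like the session; the seed is an added assumption for reproducibility.
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.standard_normal((10, 5)) * 10 + 50).astype(int)
df2 = df1.copy()
df2.iloc[0:2] = df2.iloc[0:2] + 1  # add 1 to each element in the first two rows

different = (df1 != df2).any(axis=1)  # True for each row with at least one changed cell

# Stack the old and new versions of the changed rows under "before"/"after" labels.
side_by_side = pd.concat([df1[different], df2[different]], keys=["before", "after"])
print(side_by_side)
```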
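I should note that this test assumes the two DataFrames being compared have identically labelled row indexes and columns; if they do not, pandas raises a ValueError when you compare them with !=. This will be the case if the DataFrames are being sliced from a pandas Panel (note that Panel was deprecated in pandas 0.20 and removed in 0.25).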
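If the frames might not share the same labels, or might contain missing values, the plain test needs some extra care. Comparing differently labelled frames with != raises a ValueError, and because NaN != NaN evaluates to True, a row with a missing value in the same cell of both frames would be flagged as changed. The helper below is a hypothetical sketch: the name `changed_rows` and the decision to compare only the labels both frames share are assumptions, not part of the approach above. It aligns the frames on their common index and columns with `DataFrame.align(join="inner")` and treats two NaNs in the same cell as equal.

```python
import numpy as np
import pandas as pd


def changed_rows(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: flag rows that differ on the labels both frames share."""
    # Keep only row and column labels present in both frames, in the same order.
    a, b = df1.align(df2, join="inner")
    # A cell counts as different if the values differ and they are not both NaN.
    cell_diff = (a != b) & ~(a.isna() & b.isna())
    return cell_diff.any(axis=1)


# Columns in a different order, an extra row in df2, and a NaN shared by both frames.
df1 = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, 6.0]}, index=[0, 1, 2])
df2 = pd.DataFrame({"y": [4.0, 5.0, 60.0, 7.0], "x": [1.0, np.nan, 3.0, 8.0]},
                   index=[0, 1, 2, 3])

print(changed_rows(df1, df2))  # only row 2 differs (6.0 vs 60.0 in column "y")
```

Whether to use an inner join (compare shared labels only) or an outer join (treat labels missing from one frame as changes) depends on what "changed" should mean for your data.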