This is how I did it.
The first line in the iPython code snippet creates some play data. It is a DataFrame with 5 columns and 10 rows. We have a quick look at our play data on line 2.
I created a function to return the difference between highest value in a Pandas Series and the next or equal highest value. This function is at 3.
Next, I applied that function to each row in the DataFrame, ranked the result, and returned the rank as an integer. Then, I added that Series of ranks as a new column in the DataFrame, This is on 4.
Almost done, at 5 we sort the DataFrame by its rank.
Finally at 6 we have a quick look at the results.You can see the original line position from the index column at the left. You can see that two rows have the rank of 7th place.
In [1]: df = pd.DataFrame(np.random.randn(10, 5) * 10 + 50).astype(int) # play data In [2]: df Out[2]: 0 1 2 3 4 0 48 52 56 38 66 1 60 65 52 38 48 2 50 61 50 38 53 3 67 57 43 50 50 4 54 56 54 60 51 5 55 44 49 62 53 6 41 48 53 51 44 7 48 43 60 42 48 8 38 58 75 62 58 9 30 58 44 57 53 In [3]: def differand(x): ...: y = x.sort_values(ascending=False) ...: return y.iat[0] - y.iat[1] # first position minus second ... ...: In [4]: df['rank'] = df.apply(differand, axis=1).rank(method='min').astype(int) In [5]: df.sort_values('rank', inplace=True) In [6]: df Out[6]: 0 1 2 3 4 rank 9 30 58 44 57 53 1 6 41 48 53 51 44 2 4 54 56 54 60 51 3 1 60 65 52 38 48 4 5 55 44 49 62 53 5 2 50 61 50 38 53 6 0 48 52 56 38 66 7 3 67 57 43 50 50 7 7 48 43 60 42 48 9 8 38 58 75 62 58 10
More concisely, without the iPython, we have:
df = pd.DataFrame(np.random.randn(10, 5) * 10 + 50).astype(int) # play data def differand(x): """Receive a Pandas Series and return the difference between the highest value in the Series and the next (or equal) highest value in that Series.""" y = x.sort_values(ascending=False) return y.iat[0] - y.iat[1] df['rank'] = df.apply(differand, axis=1).rank(method='min').astype(int) df.sort_values('rank', inplace=True)
No comments:
Post a Comment