Question: R: Rank-function with two variables and ties.method random

Added at 2016-12-19 01:12
Question

Is there a way in R to use the rank function (or something similar) with multiple criteria and a ties.method?

Normally, rank is used to rank the values of a single vector, and ties can be resolved with one of the ties methods ("average", "random", "first", ...). But when ranking a column of a matrix, I would like to break ties using other columns first and then apply one of the ties methods.

A minimal example:

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(1, 4, 5, 5, 2, 8, 8, 1, 3, 3)
z <- c(0.2, 0.8, 0.5, 0.4, 0.2, 0.1, 0.1, 0.7, 0.3, 0.3)
m <- cbind(x = x, y = y, z = z)

Imagine I want to rank the y values in the above matrix. If there are ties, I want the function to look at the z values; if ties remain after that, I want them broken with ties.method = "random".
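For comparison, this is what rank() does with y alone: it can resolve ties only through ties.method and never consults z. Rows 3 and 4 (both y = 5) therefore tie, although their z values (0.5 and 0.4) would separate them.

```r
y <- c(1, 4, 5, 5, 2, 8, 8, 1, 3, 3)

# Ranking y alone: z is never looked at, so rows 3 and 4 (y = 5) tie
rank(y, ties.method = "average")  # 1.5 6.0 7.5 7.5 3.0 9.5 9.5 1.5 4.5 4.5
rank(y, ties.method = "min")      # 1 6 7 7 3 9 9 1 4 4
rank(y, ties.method = "first")    # 1 6 7 8 3 9 10 2 4 5
```

With "first", rows 3 and 4 get ranks 7 and 8 purely by position, whereas breaking the tie on z would give row 4 (z = 0.4) rank 7 and row 3 (z = 0.5) rank 8.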

In other words, a possible outcome could be:

       x y   z
 [1,]  1 1 0.2
 [2,]  8 1 0.7
 [3,]  5 2 0.2
 [4,]  9 3 0.3
 [5,] 10 3 0.3
 [6,]  2 4 0.8
 [7,]  4 5 0.4
 [8,]  3 5 0.5
 [9,]  6 8 0.1
[10,]  7 8 0.1

But it could also be this:

       x y   z
 [1,]  1 1 0.2
 [2,]  8 1 0.7
 [3,]  5 2 0.2
 [4,] 10 3 0.3
 [5,]  9 3 0.3
 [6,]  2 4 0.8
 [7,]  4 5 0.4
 [8,]  3 5 0.5
 [9,]  7 8 0.1
[10,]  6 8 0.1

Notice how the fourth and fifth rows differ (as do the ninth and tenth). I've been able to get the above outcome with the order function (i.e. m[order(m[, 2], m[, 3], sample(length(x))), ]), but I'd like to get the rank values, not the indices of a sorted matrix.
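The two are closely related: order() returns the row indices in sorted order, while a rank is each row's position in that order, so one is the inverse permutation of the other. A minimal illustration on y alone (deterministic, because order() keeps tied values in their original order):

```r
y <- c(1, 4, 5, 5, 2, 8, 8, 1, 3, 3)

o <- order(y)          # row indices in sorted order: 1 8 5 9 10 2 3 4 6 7
r <- integer(length(o))
r[o] <- seq_along(o)   # invert the permutation: r[i] is the position of row i
r                      # 1 6 7 8 3 9 10 2 4 5
all(r == rank(y, ties.method = "first"))  # TRUE
```

The same inversion can be applied to the index vector returned by the order() call above.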

If you need elaboration on why I need the rank-values, feel free to ask and I'll edit the question with extra details. For now I think the minimal example will do.

EDIT: Changed dataframe to matrix as @alistaire pointed out.

Answers

Answer #1, added 2016-12-19 01:12

Sorry, I misunderstood your question originally. I think this is what you want. I made one minor change: I stored your data in a data frame, df, instead of a matrix.

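x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(1, 4, 5, 5, 2, 8, 8, 1, 3, 3)
z <- c(0.2, 0.8, 0.5, 0.4, 0.2, 0.1, 0.1, 0.7, 0.3, 0.3)
df <- data.frame(x = x, y = y, z = z)

TM <- "last"  ## Your desired ties method here ("first", "last" or "random").
## Move row i to position rank(z)[i], i.e. sort the rows by z, breaking z ties with TM
df[rank(df$z, ties.method = TM), ] <- df
## order() is stable, so rows with equal y keep their z (and TM) order
df <- df[order(df$y), ]
df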
    x y   z
4   1 1 0.2
9   8 1 0.7
3   5 2 0.2
5  10 3 0.3
6   9 3 0.3
10  2 4 0.8
7   4 5 0.4
8   3 5 0.5
1   7 8 0.1
2   6 8 0.1

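Only the ties methods for which rank returns a permutation ("first", "last" and "random") work reliably here: when z has ties, "average", "min" and "max" produce repeated (or fractional) row indices, so some rows get overwritten. I chose "last" to emphasize that the ties method switches the order of tied rows (compare rows with x = 9/10 and x = 6/7 with the question's first example). Note that, like the order() approach in the question, this returns the rows in sorted order rather than per-row rank values.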
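Because x in this example happens to equal the original row number (1 to 10), rank values in the original row order can be read off the sorted data frame with match(). This is a sketch that relies on that property of x; for arbitrary data you would first add such a row-number column yourself.

```r
y <- c(1, 4, 5, 5, 2, 8, 8, 1, 3, 3)
z <- c(0.2, 0.8, 0.5, 0.4, 0.2, 0.1, 0.1, 0.7, 0.3, 0.3)
df <- data.frame(x = 1:10, y = y, z = z)  # x doubles as the original row number

TM <- "last"
df[rank(df$z, ties.method = TM), ] <- df  # sort rows by z, ties broken by TM
df <- df[order(df$y), ]                   # stable sort by y

# Rank of original row i = position of the row with x == i in the sorted frame
ranks <- match(1:10, df$x)
ranks  # 1 6 8 7 3 10 9 2 5 4
```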

Answer #2, added 2016-12-19 02:12

Since order(order(x)) gives the same result as rank(x, ties.method = "first"), which is the same as rank(x) when x has no ties (see Why does order(order(x)) equal rank(x) in R?), you could just do

order(order(y, z, runif(length(y))))

to get the rank values.
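A quick check on the question's data (the seed is arbitrary and only there for reproducibility). The runif() key only matters among rows whose (y, z) pairs tie, so it plays the role of ties.method = "random"; rows with a unique (y, z) pair always get the same rank.

```r
y <- c(1, 4, 5, 5, 2, 8, 8, 1, 3, 3)
z <- c(0.2, 0.8, 0.5, 0.4, 0.2, 0.1, 0.1, 0.7, 0.3, 0.3)

set.seed(1)  # arbitrary seed, for reproducibility only
r <- order(order(y, z, runif(length(y))))

r[c(1, 8, 5, 2, 4, 3)]  # always 1 2 3 6 7 8
sort(r[c(9, 10)])       # 4 5 (tied pair, assigned in random order)
sort(r[c(6, 7)])        # 9 10 (likewise)

# order(order(x)) matches rank(x, ties.method = "first")
all(order(order(y)) == rank(y, ties.method = "first"))  # TRUE
```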


Here's a more involved approach that mimics some of rank's ties methods (the function below handles "average" and "random"). It requires dplyr:

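library(dplyr)
rank2 <- function(df, key1, key2, ties.method = c("average", "random")) {
  ties.method <- match.arg(ties.method)
  # How to re-rank the rows inside one group of identical (key1, key2) pairs
  break_ties <- switch(ties.method,
    average = function(x) mean(x),
    # x[sample.int(n)] rather than sample(x): sample(6) would draw from 1:6
    random  = function(x) x[sample.int(length(x))])
  # Provisional ranks by key1, then key2 (remaining ties in row order)
  df$r <- order(order(df[[key1]], df[[key2]]))
  df %>%
    group_by(!!!rlang::syms(c(key1, key2))) %>%  # group_by_() is deprecated
    mutate(rr = break_ties(r))
}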

rank2(df, "y", "z", "average")
# Source: local data frame [10 x 5]
# Groups: y, z [8]
#        x     y     z     r    rr
#    <dbl> <dbl> <dbl> <int> <dbl>
# 1      1     1   0.2     1   1.0
# 2      2     4   0.8     6   6.0
# 3      3     5   0.5     8   8.0
# 4      4     5   0.4     7   7.0
# 5      5     2   0.2     3   3.0
# 6      6     8   0.1     9   9.5
# 7      7     8   0.1    10   9.5
# 8      8     1   0.7     2   2.0
# 9      9     3   0.3     4   4.5
# 10    10     3   0.3     5   4.5
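If you'd rather avoid dplyr, here is a base-R sketch (rank_by is a hypothetical helper name, not an existing function). It reduces each (y, z) pair to a single integer key that preserves their lexicographic order, with identical pairs sharing a key, so rank() can then be applied with any of its ties methods and the only ties it sees are the remaining (y, z) ties.

```r
# Hypothetical helper: rank by `primary`, break ties by `secondary`,
# and resolve any remaining ties with rank()'s ties.method
rank_by <- function(primary, secondary, ties.method = "average") {
  n <- length(primary)
  o <- order(primary, secondary)
  p <- primary[o]
  s <- secondary[o]
  # TRUE where the (primary, secondary) pair differs from the previous sorted row
  new_pair <- c(TRUE, p[-1] != p[-n] | s[-1] != s[-n])
  # Dense key: equal pairs share a value, and key order follows (primary, secondary)
  key <- integer(n)
  key[o] <- cumsum(new_pair)
  rank(key, ties.method = ties.method)
}

y <- c(1, 4, 5, 5, 2, 8, 8, 1, 3, 3)
z <- c(0.2, 0.8, 0.5, 0.4, 0.2, 0.1, 0.1, 0.7, 0.3, 0.3)
rank_by(y, z, "average")  # 1.0 6.0 8.0 7.0 3.0 9.5 9.5 2.0 4.5 4.5
rank_by(y, z, "random")   # rows 9/10 get 4 and 5, rows 6/7 get 9 and 10, at random
```

The "average" result matches the rr column above, and any other ties method ("min", "max", "first", ...) can be passed the same way.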