This post is really a note-to-self on the differences between the RANK and ROW_NUMBER analytic functions in Oracle SQL and why you don’t need to fiddle about with the order by clause of ROW_NUMBER to persuade it to generate unique values.
Now, I had planned to start by waxing lyrical about the simplicity inherent in that classic of Italian design, the Coffee Moka then segue into a demonstration of how these functions differ using the Fibonacci Sequence.
Despite it’s name, it turns out that the Fibonacci Sequence isn’t Italian at all, having actually originated in India.
As I’d already come up with the Fibonacci example, I’ll persist with it but, in a “deft and seamless pivot”, I’ll then use the IPL as the basis of further examples.
Full disclosure – the IPL team I follow is Sunrisers Hyderabad, a consequence of having worked with a native of that fair city ( hello Bhargav :)…
ROW_NUMBER vs RANK
As is traditional in posts on this topic, I’ll start by comparing the two functions.
Incidentally, I’ve used a recursive subquery to generate the sequence values ( based on this example ), as I’ll undoubtedly need to lay hands on one the next time I need to use this technique…
Run this and you get :
In the second row, we can see how the output from each function differs. RANK will return the same value for rows that cannot be separated by the criteria in it’s order by clause wheras ROW_NUMBER will always return a unique value for each record.
Some Sensible Examples
We have the following table :
If we want to rank IPL Winning Teams by the number of titles won, then RANK() is perfect for the job :
For a listing of IPL winners ordered by the first year in which they won a title, ROW_NUMBER() is a better fit :
So far, that all seems quite straightforward, but what about…
Deduplicating with ROWID
I have seen ( and, I must confess, perpetrated) examples of RANK() being used for this purpose with ROWID being used as a “tie-breaker” to ensure uniqueness :
Whilst, on this occasion, we get the intended result, this query does have some issues.
For one thing, ROWID is not guaranteed to be a unique value, as explained in the documentation :
Usually, a rowid value uniquely identifies a row in the database. However, rows in different tables that are stored together in the same cluster can have the same rowid.
By contrast, ROW_NUMBER requires no such “tie-breaker” criteria and so is rather more reliable in this context :
Relative Performance
A simple performance test did not reveal any major difference in the performance of each function.
I tested against a comparatively modest dataset ( 52K rows) on an OCI Free Tier 19c Database.
To mitigate any caching effect, each query was run twice in succession with the second runtime being recorded here.
The test code was :
For both test queries the fastest runtime was 0.048 seconds and 43 rows were returned.