In ECDSA, inherent signature malleability is a byproduct of the structure we use to verify a digital signature. It allows anyone to modify the signature in a specific way without access to the private key yet the signature remains perfectly valid. More concretely, for every single signature there are is a perfectly valid second signature. One that is easy to calculate and use. In the Bitcoin world, this can create problems if the signature is part of the data used to generate the transaction identifier.
Bitcoin addresses this in BIP 146. I've always accepted the solutions without spending too much time thinking about it. In reviewing some notes, I started a deeper dive to fully understand why malleability exists.
So without further ado, here is a brain dump for posterity. The one caveat: I am not a cryptographer and it has been 20 years since I've taken a math class, so excuse the layman explanation and feel free to comment, correct, or enlighten in the comments.
Let's first start with a simple discussion of ECDSA to establish a common use of variables throughout the discussion. This will be quick as I'm assuming the reader has some familiarity elliptic curve cryptography. If not, I recommend brushing up in one of the many great articles out there.
First thing we need to consider is that there is a finite set of numbers in use called a finite field. The field is the size of a prime number. For the sake of this discussion, we called the prime number p and define the field is Fp={0,1,...,p-1}.
Next we consider an elliptic curve. Elliptic curves have the equation y2=x^3+ax+b. Elliptic curves reflect over the x-axis. Meaning the same x-coordinate has two possible y-coordinates.
For elliptic curve cryptography, we create an elliptic curve over a finite field and we use point addition. Point addition can be thought of as taking two points of the curve, finding the slope of the line created by those points, finding the third point on the curve, then reflecting it across the x-axis. When the same point is used, we take the tangent of the point, take that line and find the other intersection point on the curve and reflect it. This forms the basic operation of point addition, of which we can repeatedly add the same point to achieve scalar multiplication.
So if we take a point on the curve G, we'll call it our generator point, and perform scalar multiplication on it we end up with the equation:
e is some scalar value
G is our generator point
P is the resulting point on the curve (with x and y coordinate), this is the public key!
The interesting thing about this is that we reach some scalar value n where the resulting value ends up invalid.
As a result we require our scalar to be between 0 and n: 0 <= e < n.
Putting it all together elliptic curve definition for secp256k1, used by Bitcoin, looks like this:
p = 2256-232-977
= 0xfffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f
a = 0
b = 7
Gx = 0x79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798
Gy = 0x483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8
n = 0xfffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141
The private key is simply some large number e. For Bitcoin, we generate a 256-bit value between 1 and n-1.
The corresponding public key is a point on the curve that is generated using our equation eG=P.
ECDSA
So with the basics of ECC discussed, lets briefly refresh ECDSA.
An ECDSA signature consists of two parts, r and s, both of which are 256-bit numbers in the case of secp256k1. This is often denoted as the pair (r, s).
The general signature equation is
where
u = z/s
v = r/s
G is our trusty generator point
P is our public key point, remember P=eG where e is our private key.
k is some arbitrary unguessable and never reused value
z is the 32-byte hash of the message that we're signing
r is pretty easy to generate. It is just the x-coordinate of right half of the equation, kG. So we get point R from the equation R=kG, extract the x-coordinate so that r = Rx. Notice that we're not concerning ourselves with the y-coordinate of this point. Remember that for now.
Next we need s. If we take our signature equation and some basic algebra we end up with
With knowledge of z, k, and e we can generate both pieces of the signature r and s. We can then provide (r, s), z, and P to another party and they can verify it.
Verifying
To verify, we need
z the hash of the message that was signed
P their public key corresponding to the private key used to sign the message
(r,s) the signature
To verify we simply reconstruct u and v and plug values into our equation. Remember that:
u = z/s
v = r/s
We then use the equation uG + vP and end up with some point R. If the x-coordinate of R matches the provided value r in the signature then we have a signature match!
Astute readers will note that we're only comparing the x-coordinate to verify the signature. As we recall from our refresh of elliptic curves there are two y-coordinates for every x-coordinate. This is the source of our malleability.
Signature Malleability Maths
As we just saw, to verify a signature we only verify the x-coordinate of the resulting equation. This means that there are two valid points: (x, y) and (x, p-y).
In the Bitcoin world, BIP146 addresses the malleability by requiring the signature use the low-s value. This means that the s value must be less than n / 2 and we can trivially find the low s value with n-s.
If you're a little lost with these last two paragraphs, you're reached the point that started me on this entire journey! So why is it n-s and not p-s? Why does this translate to the points y and p-y? After working through the maths we'll see the answer. I'll describe it and then we'll show some code output to make it concrete.
If you recall, p is the size of the finite field. So p is ultimately driving the values of x and y. That is we can't have an x or y value that is larger than p.
Meanwhile, n is the size of the finite cyclic group. Remember from our equation eG=P that e must be less than n. So n is controlling how we scalar multiple against the generator point.
So we can see that n is controlling the scalars used in the left side of our equation eG=P and p is controlling the right side of the equation. Skeptical? Let's take a look at some concrete examples.
First thing, let's consider some point x. We know that we have two possible y positions y and p-y.
So using our simple private key/public key equation eG=P let's validate that:
We start by taking a scalar value e as one private key then subtracting it from n to create a second private key.
The public keys for each should share the same x value and the y values should be be y and p-y.
We first validate that both public keys share the same x value. Sure enough, both public keys have an x value of 79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798.
Next we need to verify that y values. Again, verifying y and p-y also checks out 483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8 == fffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f-b7c52588d95c3b9aa25b0403f1eef75702e84bb7597aabe663b82f6f04ef2777!
So this is pretty cool. We can see that we can invert either the scalar or the point side of the equation using n or p respectively. Furthermore, if we invert one side, we know the equation needed to invert the other side.
At this point, it should be starting to make sense why we use n-s, but we'll work through the equations again and show a concrete example.
If we look at the equation for a digital signature, we should be able to verify that using n-s creates the a point p-y as with the following two equations:
Recall that with digital signatures, we're only verifying the x coordinate. More importantly with malleability, we are able to change the signature without knowledge of the private key. As such z, P, and obviously G remain unchanged. So our only option is to modify s which is also conveniently part of the signature data (remember the signature consists of (r, s).
From our previous exercise this should all make sense. Since s is a scalar value being multiplied by the generator point G, we invert it using n. If we do that, we should expect the right side of the equation, the point, to have it's y value flipped equivalently with p-y.
For a concrete example, let's start by generating some value z that we're going to sign along with the private key and public key.
Next we create our first signature
Next we create convert that original signature's s values by subtracting it from n
As you can see, both signatures have the same r value but we have different (inverted) s values. Finally to test our theory of malleability, we should see if we can verify both signatures with the same pubkey P and message digest z.
Final Thoughts
We've shown how you can invert the scalar and point sides of the equation and how signature malleability works. I will leave one final thought as an exercise for the reader.
In our malleability example we saw how the r value of the signature can come from either either point (x, y) or (x, p-y). Naturally we see that using private key e in the signature creation results in point (x, y). What is the natural way to obtain point (x, p-y)? Or more succinctly, what private key do we use to obtain the signature of z that results in (x, p-y)?
If you enjoyed this, feel free to check out the "from scratch" ECC library. The malleability code can be viewed in these tests.