Signed divisions (Al Viro; Linus Torvalds)

Index Home About Blog

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Signed divides vs shifts (Re: [Security] /dev/urandom uses uninit
Date: Mon, 17 Dec 2007 17:31:14 UTC
Message-ID: <fa.LuZQ+MYyY3weu45IA2ZkxhjrNp0@ifi.uio.no>

On Sat, 15 Dec 2007, Herbert Xu wrote:
>
> There ought to be a warning about this sort of thing.

We could add it to sparse. The appended (untested) patch seems to say
there's a lot of those signed divides-by-power-of-twos.

However, the problem with such warnings is that it encourages people to do
the simple fix that may be *wrong*. For example, you fixed it with patches
like

> -		int rsvd = r->limit ? 0 : random_read_wakeup_thresh/4;
> +		int rsvd = r->limit ? 0 : random_read_wakeup_thresh / 4u;

which is really quite dangerous for several reasons:

 - it depends intimately on the type of the thing being divided (try it:
   it will do nothing at all if the thing you divide is larger than
   "unsigned int", since then the "4u" will be turned into a _signed_
   larger type by the C type expansion).

   So in general, the above doesn't even do what it's supposed to do on a
   64-bit architecture if the thing to be divided is 64-bit!

 - it changes behaviour. If that thing really is signed and can be
   negative, that "trivial" patch just changed the divide to be
   fundamentally something totally different.

so I think this patch is horribly wrong.

The *correct* way to fix signed divisions is by doing one of two things:

 - really make the data we divide be unsigned. With all the thinking that
   involves!

   This is the good change, but it does involve making sure that there are
   no tests against zero and that the value really cannot go negative.
   Usually the unsigned types are (a) faster and (b) more robust, but if
   somebody is depending on signs, unsigned types are obviously not
   appropriate.

 - change a divide-by-power-of-2 into a signed shift instead.

   Yes, this also changes the end result for negative values, but it
   changes it in a sane and generally good way (ie it will still be a
   "valid" divide, it will just be a divide that rounds differently, and
   is more likely than turning it into an unsigned divide to generally
   result in working code).

Hmm?

			Linus
---
 simplify.c |   30 ++++++++++++++++++++++++++++++
 1 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/simplify.c b/simplify.c
index 94e14d2..91f1120 100644
--- a/simplify.c
+++ b/simplify.c
@@ -286,6 +286,36 @@ static int simplify_constant_rightside(struct instruction *insn)
 		if (!value)
 			return replace_with_pseudo(insn, insn->src2);
 		return 0;
+
+	case OP_DIVU: case OP_DIVS:
+		if (!value)
+			break;
+		if (value == 1)
+			return replace_with_pseudo(insn, insn->src1);
+		/* Power-of-two? */
+		if (!(value & (value-1))) {
+			int log2 = -1;
+			do {
+				log2++;
+				value >>= 1;
+			} while (value);
+
+			/* Turn unsigned divides into shifts */
+			if (insn->opcode == OP_DIVU) {
+				insn->src2->value = log2;
+				insn->opcode = OP_LSR;
+				return 1;
+			}
+
+			/*
+			 * This is incorrect, but we care more about
+			 * the warning than the code generation
+			 */
+			warning(insn->pos, "expensive signed divide");
+			insn->src2->value = log2;
+			insn->opcode = OP_ASR;
+			return 1;
+		}
 	}
 	return 0;
 }

From: Al Viro <viro@ftp.linux.org.uk>
Newsgroups: fa.linux.kernel
Subject: Re: Signed divides vs shifts (Re: [Security] /dev/urandom uses uninit 
	bytes, leaks user data)
Date: Mon, 17 Dec 2007 17:49:00 UTC
Message-ID: <fa.YRiHFilqtg8eTPK3ejZ+NNEAPq8@ifi.uio.no>

On Mon, Dec 17, 2007 at 09:28:57AM -0800, Linus Torvalds wrote:
>
>
> On Sat, 15 Dec 2007, Herbert Xu wrote:
> >
> > There ought to be a warning about this sort of thing.
>
> We could add it to sparse. The appended (untested) patch seems to say
> there's a lot of those signed divides-by-power-of-twos.

I'm not sure that you are warning about the right things.  If you want
a real nightmare scenario in that area, consider this:

	int x[20];

	int *p = x + n;
	int *q = x + m;

	p - q
	((char *)p - (char *)q)/4
	((char *)p - (char *)q)/sizeof(int)

The first two are equivalent on all targets we care about.  However, an
attempt to make the second one "more portable" silently creates the
code that'll do something entirely different as soon as we get m > n...

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [Security] Signed divides vs shifts (Re: /dev/urandom uses uninit
Date: Mon, 17 Dec 2007 18:32:23 UTC
Message-ID: <fa./4d1IlR9d31FuFHQzroIQ42536I@ifi.uio.no>

On Mon, 17 Dec 2007, Eric Dumazet wrote:
>
> while
>
> long *mid(long *a, long *b)
> {
> 	return ((a - b) / 2u + a);
> }

This is exactly what I'm talking about. That "2u" is TOTALLY POINTLESS.
It's an "unsigned int", but since (a-b) will be of type ptrdiff_t, and is
*wider* on a 64-bit architecture (it's the same as "long" on x86-64), then
the 2u will just be converted to "long", and be signed again!

So you thought that you did an unsigned divide, but you did no such thing.

If you change the "2u" to a "2ul", it works again, and you get

	mid:
	        movq    %rdi, %rax
	        subq    %rsi, %rax
	        sarq    %rax
	        andq    $-8, %rax
	        addq    %rdi, %rax
	        ret

which is the code you wanted. But quite frankly, you could just have
written it with a shift to start with, and avoided the subtle type issue,
although gcc then generates

        movq    %rdi, %rax
        subq    %rsi, %rax
        sarq    $4, %rax
        leaq    (%rdi,%rax,8), %rax
        ret

instead. Of course, this all *does* still have subtle sign issues, because
the "a-b" part implies a signed divide in itself, which is why you see
that "sarq" in he first place (rather than a "shrq").

Signed divides are hard. The "a-b" pointer subtraction is actually cheaper
than a general signed divide by sizeof, since the compiler can then assume
that the two pointers are mutually aligned, which is why gcc can generate
just a single "sarq" instead of having to do an extra "add negative bit"
thing to get the rounding right.

[ So Al, when you said that

	(a-b)

  is equivalent to

	((char *)a-(char *)b)/4

  for a "int *" a and b, you're right in the sense that the *result* is
  the same, but the code generation likely isn't. The "a-b" thing can (and
  does) allow the compiler to avoid the whole "align up for signed
  numbers" thing, and the difference in code generation is clear:

	subq    %rsi, %rdi
	sarq    $2, %rdi

  vs

	subq    %rsi, %rdi
	leaq    3(%rdi), %rax
	testq   %rdi, %rdi
	cmovs   %rax, %rdi
	sarq    $2, %rdi

  exactly because the first case *knows* that the low two bits have to be
  zero, and thus there is no rounding issue. ]

			Linus

From: Al Viro <viro@ftp.linux.org.uk>
Newsgroups: fa.linux.kernel
Subject: Re: [Security] Signed divides vs shifts (Re: /dev/urandom uses uninit 
	bytes, leaks user data)
Date: Mon, 17 Dec 2007 19:09:34 UTC
Message-ID: <fa.jUqDMy45EhE5/h8Lt6KKLjHJTkI@ifi.uio.no>

On Mon, Dec 17, 2007 at 10:28:38AM -0800, Linus Torvalds wrote:
> [ So Al, when you said that
>
> 	(a-b)
>
>   is equivalent to
>
> 	((char *)a-(char *)b)/4
>
>   for a "int *" a and b, you're right in the sense that the *result* is
>   the same, but the code generation likely isn't. The "a-b" thing can (and

Sure.  And yes, I very much do prefer code that uses C as it ought to be
used and doesn't play games with casts from hell, etc.  For a lot of reasons,
both correctness- and efficiency-related.

We _do_ have such turds.  In spades.  And such places are potential timebombs,
since well-intentioned idiotic patch ("I've read in lecture notes that sizeof
is better than explicit constant, so replacement surely can only improve the
things and the best part is, I don't need to understand what I'm doing")
turns an ugly FPOS into equally ugly FPOS that silently doesn't work ;-/

[sorry about the rant, I'm about 3/4 through the drivers/net colonoscopy,
with >300Kb of patches and a pile of assorted bugs so far - and then there's
drivers/scsi to deal with.  Endianness stuff, mostly...]

Index Home About Blog