[PATCH 1/2] track_info_cmp: fix comparison of doubles

Discussion:

Johannes Weißl

2011-04-07 15:52:34 UTC

---
track_info.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/track_info.c b/track_info.c
index 3eb4a74..d7ef92d 100644
--- a/track_info.c
+++ b/track_info.c
@@ -160,11 +160,13 @@ int track_info_matches(const struct track_info *ti, const char *text, unsigned i

static int doublecmp0(double a, double b)
{
+ double x;
if (isnan(a))
return isnan(b) ? 0 : -1;
if (isnan(b))
return 1;
- return (int) (a - b);
+ x = a - b;
+ return (x > 0) - (x < 0);
}

/* this function gets called *alot*, it must be very fast */

--
1.7.4.1

Johannes Weißl

2011-04-07 15:52:35 UTC

Permalink

This way it is way more useful, since music directories often contain
spaces.
---
cmdline.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/cmdline.c b/cmdline.c
index d852923..e79e147 100644
--- a/cmdline.c
+++ b/cmdline.c
@@ -9,7 +9,7 @@
struct cmdline cmdline;

const char cmdline_word_delimiters[] = " ";
-const char cmdline_filename_delimiters[] = " /";
+const char cmdline_filename_delimiters[] = "/";

void cmdline_init(void)
{

--
1.7.4.1

Gregory Petrosyan

2011-04-08 20:25:25 UTC

Permalink

Post by Johannes WeiÃl
-const char cmdline_filename_delimiters[] = " /";
+const char cmdline_filename_delimiters[] = "/";

Merged to master, thanks again!

Gregory

Jason Woofenden

2011-04-07 16:36:36 UTC

Permalink

I'd like to suggest adding a fudge factor to doublecmp0().

My understanding is that floating point registers have more bits
than the memory slots for floating point values. The result of this
is that floating point values get truncated at arbitrary times.

When two doubles have [almost] the same value, you can't reliable
tell if they are the same. x86 in particular seems to be bad about
this. I was very surprised and frustrated to discover (while
working on vor) that when rounding to int, not only did it not
always round to the same int, but it would sometimes later round
the same variable to a _larger_ positive integer (480 instead of
478).

So I fear if we try to use the simple algorithm, of just checking
if a-b==0 it will result in arbitrary and sometimes changing sort
order for values that are actually the same. I suggest something
like: (after the nan checks)

// FIXME figure out how many zeros should be in this
#define VERY_SMALL_DOUBLE = 0.0000001

x = a - b;

if(x < VERY_SMALL_DOUBLE && x > -VERY_SMALL_DOUBLE) {
return 0;
else {
return (x > 0) - (x < 0);
}

I'm not sure what VERY_SMALL_DOUBLE should be, or if something like
this is already defined in the system headers.

Take care, - Jason

Johannes Weißl

2011-04-07 18:28:07 UTC

Permalink

Hello Jason,

Post by Jason Woofenden
I'd like to suggest adding a fudge factor to doublecmp0().
My understanding is that floating point registers have more bits
than the memory slots for floating point values. The result of this
is that floating point values get truncated at arbitrary times.
When two doubles have [almost] the same value, you can't reliable
tell if they are the same. x86 in particular seems to be bad about
this. I was very surprised and frustrated to discover (while
working on vor) that when rounding to int, not only did it not
always round to the same int, but it would sometimes later round
the same variable to a _larger_ positive integer (480 instead of
478).
So I fear if we try to use the simple algorithm, of just checking
if a-b==0 it will result in arbitrary and sometimes changing sort
order for values that are actually the same. I suggest something
like: (after the nan checks)
// FIXME figure out how many zeros should be in this
#define VERY_SMALL_DOUBLE = 0.0000001
x = a - b;
if(x < VERY_SMALL_DOUBLE && x > -VERY_SMALL_DOUBLE) {
return 0;
else {
return (x > 0) - (x < 0);
}
I'm not sure what VERY_SMALL_DOUBLE should be, or if something like
this is already defined in the system headers.

While your point is certainly true and matters for lots of applications,
I think in cmus the simple algorithm is fine: The only source for
doubles are replay gain peak and gain values, which are never so small.

An implementation for doublecmp0() for potentially small numbers would
be:

x = a - b;
if (fabs(x) < DBL_EPSILON)
return 0;
return (x > 0) - (x < 0);

Johannes

Jason Woofenden

2011-04-07 19:22:40 UTC

Permalink

OK, now I get that my sort of solution only works with numbers on a
certain order of magnitude or three.

But I'm still concerned that the current code is going to be
arbitrary with RG values that are the same, depending on how/when
the CPU truncates them.

I've just read lots of people pointing out this problem, but only
one solution that looked like it might work at any magnitude:

http://c-faq.com/fp/fpequal.html

I see that doublecmp0() is only being used to sort by RG values,
and thus it's not very important. I'm not concerned with the order
of the sorting so much as I am concerned that they stay in the same
order every time they are sorted by the same criteria.

But even then, I doubt people will leave their library sorted by RG
values, so... I'll let this rest.

That's enough out of me...

Take care, - Jason

Johannes Weißl

2011-04-07 19:44:33 UTC

Permalink

Hello Jason,

Post by Jason Woofenden
But I'm still concerned that the current code is going to be
arbitrary with RG values that are the same, depending on how/when
the CPU truncates them.

That would indeed be interesting! The numbers we are dealing with are
very "good", i.e. they are always rational numbers (so we could also
store them as two ints). I wonder if it is guaranteed that these numbers
are not truncated (or truncated the same way!). On the other hand, human
readable decimal fractions might be complicate binary fractions...
If you can find evidence (or counter evidence), please post!

Post by Jason Woofenden
I see that doublecmp0() is only being used to sort by RG values,
and thus it's not very important. I'm not concerned with the order
of the sorting so much as I am concerned that they stay in the same
order every time they are sorted by the same criteria.

Yes, I see the problem. But this is more an issue of strtod(). If it is
guaranteed that doublecmp0(strtod("num1"), strtod("num2")) is always the
same, we have no problem...

In extreme cases (all sort keys are equal) the order depends on the
order of the inserts.

Johannes

Johannes Weißl

2011-04-07 18:35:02 UTC

Permalink

more than 3 times faster
---
track_info.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/track_info.c b/track_info.c
index d7ef92d..0859c24 100644
--- a/track_info.c
+++ b/track_info.c
@@ -161,10 +161,10 @@ int track_info_matches(const struct track_info *ti, const char *text, unsigned i
static int doublecmp0(double a, double b)
{
double x;
- if (isnan(a))
- return isnan(b) ? 0 : -1;
- if (isnan(b))
- return 1;
+ /* fast check for NaN */
+ int r = (b != b) - (a != a);
+ if (r)
+ return r;
x = a - b;
return (x > 0) - (x < 0);
}

--
1.7.4.1

Gregory Petrosyan

2011-04-08 20:24:46 UTC

Permalink

Post by Johannes WeiÃl
more than 3 times faster

Merged this version to master, thanks!

Gregory