Finding the two closest factors of an integer in Lisp

I blogged a few months ago about finding the two closest factors of a number. Well, having been bitten by the Lisp bug, and being impressed at how well Racket handles big numbers, I decided to try the same problem in Racket.

The C# I came up with there (wrapped in a method this time) looked like this for regular integers…

int ClosestFactor(int n) =>
  Enumerable.Range((int)Math.Sqrt(n), n - (int)Math.Sqrt(n) + 1)
    .First(a => n % a == 0);

I did a version using BigInteger as well, but we’ll stick with this one here as it’s clearer.

Doing this in Racket seemed straightforward enough until I realised that I didn’t know how to do the equivalent of Linq’s First() method. I have a background task, trying to write Racket versions of Linq methods, mainly to sharpen my Racket skills, as the language already includes a lot of equivalent methods. As part of that, I had written a version of First() that looked like this…

(define (first-match f lst)
  (cond [(empty? lst) #f]
        [(f (first lst)) (first lst)]
        [else (first-match f (rest lst))]))

This function works, but isn’t very elegant. After some playing around and some contemplating (OK, so I was nodding off in front of the PC, but don’t tell anyone!), I came up with what seems to be a much more elegant version…

(define (first-match f lst)
  (first (filter f lst)))

So, armed with that, it was time to write the main function. Working on the principle that the C# version worked, and was reasonably functional, I mirrored it in Racket, and produced something like this (exact code lost as I was changing it as I developed)…

(define (closest-factors n)
  (define (first-match f lst)
    (first (filter f lst)))
  (first-match (λ (a) (= (remainder n a) 0))
               (range (exact-round (sqrt n)) (+ n 1))))

This seemed to work fine, until I passed in a prime number, at which point I got a contract violation.

After a bit of investigation, I realised that the Linq Enumerable.Range method takes a starting value and the number of elements you want in your collection, whereas the Racket range function takes a maximum value as its second function. This meant that the number you gave the closest-factors function wasn’t included in the list produced by range. For a composite number (ie not a prime), this didn’t matter, as there would be a number in the list that was a factor, so first-match would find something. However, if the number were prime, the only factors would be 1 (which is only ever included in the list if that is what you pass in), and the number itself. In such a case, first-match wouldn’t find anything, causing the error.

It took me a bit of fiddling to get this to work, but in the end, the result was actually much simpler than I originally thought…

(define (closest-factors n)
  (define (first-match f lst)
    (first (filter f lst)))
  (first-match (λ (a) (= (remainder n a) 0)) (range (exact-round (sqrt n)) (+ n 1))))

This worked fine, and as Racket is able to handle big numbers without choking, I didn’t need to modify the code for that.

As usual, it was time to compare speed. Instead of running the code against a range of numbers, I just picked one decent-sized number (100000001) and timed it against that.

The C# version took a mere 32ms to compute the factor (5882353), which was pretty quick.

Sadly, the Racket version took about 10s, about 320 times slower. Whilst I wasn’t surprised that the C# version was faster, I didn’t expect the difference to be so obvious.

Looking at the definition of first-match, it occurred to me that maybe this was the issue. Whilst my second version was elegant, it could well be that the filter function would filter the entire list before passing it to first. This would cause an unnecessary delay. By contrast, the Linq First() method only evaluates the list until it finds a match, which can make it much faster.

I tried modifying my code to use the first version of first-match, as this looked to me as if it would only evaluate the list until it found a match. However, this only reduced the execution time down to about 9s, so still about 280 times slower than the C# version.

I’m still not sure why the Racket version is so slow, but maybe that will become clearer as I get better at it. For the meantime, I’m happy with the function I produced. Given that I had to write my own version of Linq’s First() method, it’s not much more code than the C# version, and whilst not quite so clear (to me at least), is still quite elegant.

Update: It seems that I was right in my hunch about my second version of first-match traversing the whole list (see Óscar López’s answer to my question here), whereas my second version would only traverse as far as it needed.

However, predictably, Racket has a built-in function to do just what I want, namely findf. With this, my function was much shorter…

(define (closest-factors n)
  (findf (λ (a) (= (remainder n a) 0))
               (range (exact-round (sqrt n)) (+ n 1))))

Sadly, it wasn’t that much faster, only shaving about 50ms off the time taken using my own function. So, I’m still not sure why my Racket function is so much slower than the C# one, but I guess that’s an investigation for another day. Too late at night to think about that now!

Update in the clear light of day: I had an epiphany early this morning. If you read my previous post on this puzzle, you’ll see that the method I use is to start from the square root of the target number, and work away from it, looking for the first factor. Given that the pairs of factors will always have the property that one factor will be less than the square root, and one will be greater than it (ignoring the degenerate case where they will both be the same), I could search from the square root up to n (which is what I did), or from the square root down to 1. I didn’t really think about this at the time, but for big numbers, going up will mean traversing a lot more potential factors than going down.

To see this, consider an arbitrarily bigg-ish number such as 123456789. The square root of this (to the nearest integer) is 11111. If we start with this and work up to 123456789, we have to check 123445678 potential factors. However, if we work our way down from the square root to 1, we only need to check 11111 potential factors. As the target number increases, the difference between working up and working down increases significantly.

Now my gut feeling was that this shouldn’t make much difference. After all Lisp was designed for list processing (hence the name), so whizzing through a long list looking for factors shouldn’t take that long should it? Spoiler: it did!

The range function takes an optional third argument that specifies the step. By passing -1 in, we can work our way down. I had to set the lower bound to zero, to ensure that 1 was included in the list, as this is needed for prime numbers.

The resulting code was slightly simpler than the previous…

(define (closest-factors n)
  (findf (λ (a) (= (remainder n a) 0))
               (range (exact-round (sqrt n)) 0 -1)))

Using this version to find the closest factors of 5882353 (as before) was so fast that the time function didn’t get chance to register anything! Passing 1000000000000000 in only took 234ms, which is extremely fast.

As is clear when you think about the method used, the time taken will depend very much on the number passed in. For numbers that have a pair of factors close to the square root, a solution should be found quite quickly. For a number whose closest factors are further apart, it will take longer to find a solution. With this in mind, the best way to compare the C# and Racket versions would be with the latter type of number.

With that in mind, I tried both versions out with 100000000000000015 (fairly randomly picked by playing with numbers until I found something that took longer). As the closest pair of factors is (5 20000000000000003), this is a pretty good test. The Racket code took 68s. I don’t know how long the C# code would have taken, as I stopped it after an hour (no, honestly). I then realised that the C# code was still working its way up from the square root. After I modified the code to work downwards, like the revised Racket code, it took around 28s.

So, C# worked out quite a bit faster. However, the Racket code was much cleaner, as it handled large numbers natively, without me needing to write my own versions of standard methods.

Be First to Comment

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.