I was browsing the latest C# questions on StackOverflow, and came across one that caught my eye (now removed). Sadly, it was quickly closed by people who, in my opinion, didn’t read it properly. The question was…
> I am trying to write a program to find the factor of a number with minimum difference. For Ex: for 20 it should be 4×5 or vice versa. How can this be achieved for a really BIG number. lets say 989287498274928743928174192847219347123984723498. Please guide
Ignoring the usual SO rules about showing some effort or research, the question is not simply “How do I find the factors of n?”, which is what all the linked answers were about. This question was how to find the two closest factors. Sure you can do that by finding them all, then sorting them by their difference, but for a big number, that’s hugely inefficient. There are ways of solving this specific question much more efficiently, which is what I decided to do.
Just for once, I did learn from my previous mistakes (see Project Euler – Problem 8 Largest product in a series, or “What I learned in school today”, Project Euler – Problem 14 Longest Collatz sequence, or “What I didn’t learn from problem 8” and more recently, Using Linq to debunk the Feng Shui about the number of days in December), and thought about the problem before diving into code. Before anyone thinks I’ve become smart, I should in all honesty confess that the reason I didn’t jump into coding was because I was using the wife’s lap top while I had my breakfast, and that doesn’t have any coding tools installed.
OK, confession over, let’s take a quick look at the problem first.
Suppose you have a number n, which has two factors a and b. Note that this is always possible, as even if n is a prime, the two factors will just be 1 and n.
A moment’s thought should convince you that there are two possibilities here…
- n is a square number, and a and b are the square root (ie a==b).
- n is not a square number, in which case one of the two factors must be less than the square root of n, and the other must be greater than it.
With that in mind, a simple algorithm springs to mind:
- Start with the square root of n (which we’ll denote by a), and check if this is a factor of n. This is easily done in C# by checking if n % a is zero. If so, we are done.
- If we want to exclude the above case, or if n % a is not zero, meaning that a is not a factor of n, then try a + 1, then a + 2 and so on, until we either find two factors, or reach n, meaning that it’s a prime number.
This will find all pairs of factors of n, albeit in a fairly inefficient way.
Now, as we started from the square root of n and are working our way away from it, any pairs of factors that we find will be increasingly distant form each other. To put it another way, the very first pair that we find must be the closest. Therefore, we only need to run this algorithm until we find a pair. This is why this is so much more efficient than finding all pairs first, and then looking for the closest.
OK, enough thinking, it’s time to open LinqPad and try this. As my regular reader will know, I enjoy trying to find elegant Linq expressions to solve these sorts of problems.
I decided to ignore the fact that the original question was about big integers, and work with plain integers first. It was pretty simple to find all pairs of factors, starting with the closest…
Enumerable.Range((int)Math.Sqrt(n), n - (int)Math.Sqrt(n) + 1) .Where(a => n % a == 0) .Select(a => (a, n / a, Math.Abs(n / a - a)))
The first line just gives us a sequence of integers that starts with the biggest integer less than or equal to the square root of n, and containing numbers up to and including n. Enumerable.Range uses yield under the covers, it only generates the numbers you need, so doesn’t use up any more CPU cycles than necessary. I’m not sure how much difference this would make, but worth noting.
The next line checks for factors. As mentioned above, if n % a is zero, then a is a factor, then n % (n / a), must also be.
Whilst testing, I used Select to dump them all out, and the third line just shows both factors.
In order to find the closest pair, we can change Select to First, and drop the Select (which only works on sequences, not individual items). This gives us the following expression which will find one of the two factors…
Enumerable.Range((int)Math.Sqrt(n), n - (int)Math.Sqrt(n) + 1) .First(a => n % a == 0)
Finding the other factor is just a matter of diving n by this. I didn’t bother fiddling with the Linq to do this, as it wasn’t particularly interesting.
The only thing left is to do this for big integers. We have two problems here. First, Enumerable.Range only works with regular integers, and second, Math.Sqrt doesn’t handle big integers.
The first problem is easy to solve. I linked to the source for Enumerable.Range above, so we can just rewrite it to use big integers…
static IEnumerable<BigInteger> RangeIterator(BigInteger start, BigInteger count) { for (BigInteger i = 0; i < count; i++) { yield return start + i; } }
I called it RangeIterator, as that is the name of the private method that does the actual work, Enumerable.Range is just a wrapper that includes bounds checking.
The second problem was solved with a quick search, which turned up a simple method for getting the square root of a big integer…
static double BigSqrt(BigInteger bi) => Math.Pow(Math.E, BigInteger.Log(bi) / 2);
There are some important caveats to the code (read the comments there), but none of them are relevant to us.
With these two methods defined, the original problem can now be solved as follows…
RangeIterator(new BigInteger(BigSqrt(n)), n - new BigInteger(BigSqrt(n)) + 1) .First(a => n % a == 0)
Wrapping this together as a method gives the following…
static IEnumerable<BigInteger> RangeIterator(BigInteger start, BigInteger count) { for (BigInteger i = 0; i < count; i++) { yield return start + i; } } static double BigSqrt(BigInteger bi) => Math.Pow(Math.E, BigInteger.Log(bi) / 2); BigInteger ClosestFactors(BigInteger n) => RangeIterator(new BigInteger(BigSqrt(n)), n - new BigInteger(BigSqrt(n)) + 1) .First(a => n % a == 0);
As noted before, this is very efficient when the two factors are close to the square root, but can be quite slow for big integers where they are far apart. As is well known, Linq is not always the most efficient way of writing code, even if it is (usually) the most elegant. However, the original question didn’t put any bounds on efficiency, and I’m doing this for the fun, so I didn’t bother trying to optimise it.
Update 25th May ’21
Whilst I was repeating this exercise in Lisp, I realised that my code was very inefficient. I was starting from the square root, and working my way upwards to the target number. For small numbers, this is OK, but for large numbers, it means I was checking a lot more numbers than I needed to.
To understand why, consider a random “big” number like 989287498274928743928174192847219347123984723498 (the one in the original question). The square root of this is 994629327073623331045376. This means that if we work upwards from the square root (as the code above does), we need to check up to 989287498274928769521168860668490353114521534464 potential factors. However, if we work our way down instead, we only need to check 994629327073623331045376 candidates.
It’s hard to get your head round how much smaller this is, but to give you some idea, if you were to check one million candidates every second, going upwards would take 31,348,629,118,656,959,682,346,862,941,807,830 years. By contrast, going upwards would only take 31,517,901,458 years!
So, I modified the code to allow it to work downwards. The only part that needed changing was the RangeIterator
method, which I renamed Range
to be consistent with the Linq method…
static IEnumerable<BigInteger> Range(BigInteger min, BigInteger count, int step = 1) { for (BigInteger i = 0; i <= count; i++) { yield return min + (i * step); } } static double BigSqrt(BigInteger bi) => Math.Pow(Math.E, BigInteger.Log(bi) / 2); BigInteger ClosestFactors(BigInteger n) => Range(new BigInteger(BigSqrt(n)), new BigInteger(BigSqrt(n)), -1) .First(a => n % a == 0);
Using this revised code, I was able to find the solution for 100000000000000015 in 28s, which is something I probably couldn’t have done at all with the old code.
Be First to Comment