Wednesday, August 22, 2012

Problem 6 - Sum of Squares vs. Square of Sum


Problem 6



The sum of the squares of the first ten natural numbers is,
12 + 22 + ... + 102 = 385
The square of the sum of the first ten natural numbers is,
(1 + 2 + ... + 10)2 = 552 = 3025
Hence the difference between the sum of the squares of the first ten natural numbers and the square of the sum is 3025 − 385 = 2640.
Find the difference between the sum of the squares of the first one hundred natural numbers and the square of the sum.
Both the required subroutines are straight forward to implement.  The square of the sum grows much quicker than the sum of the squares.

Solution

#!/usr/bin/python

def sumsquare(val):  #Returns the sum of the squares.
    ss = 1
    for each in range(2,val+1): ss = ss + each**2
    return ss
    
def squaresum(val):  #Returns the squares of the sums.
    ss = 1
    for each in range(2,val+1): ss = ss + each
    return ss**2

if __name__ == '__main__' :
    import sys
    val = int(sys.argv[1]) # Bad Code, assumes input is an integer

    print (sumsquare(val))
    print (squaresum(val))
    print (squaresum(val)-sumsquare(val))


The answer is 25502500-338350=25164150.  

Approach

I first define two functions which compute the desired sums.  The algorithms are clear and concise in python.  I did notice that the for loop, which presumably gets allocated all at once, uses an enormous amount of memory.  

Benchmarks

Python3 really did poorly here and Jython crashed.  Pypy was the most efficient as before.  Java crashed with an out of memory error so I decided to look at memory usage.  Python 3 was actually the most memory efficient followed by Pypy and Python 2.  Jython memory usage was reported despite the crash.  I have read that pypy is implemented on stackless python, and expected to see stack usage with the other versions.  However, none of the python versions used any stack memory.

CPU Usage



Problem #6 Benchmarks
time python factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000

real 0m37.971s
user 0m36.458s
sys 0m1.360s
time python -O factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000

real 0m38.709s
user 0m37.198s
sys 0m1.408s
time python3 factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000

real 1m16.192s
user 1m16.133s
sys 0m0.008s
time python3 -O factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000

real 1m16.864s
user 1m16.805s
sys 0m0.008s
time pypy factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000

real 0m10.756s
user 0m10.729s
sys 0m0.016s
time pypy -O factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000

real 0m10.672s
user 0m10.649s
sys 0m0.012s
time jython factdiff.py 100000000

java.lang.OutOfMemoryError: java.lang.OutOfMemoryError: Java heap space

real 0m38.766s
user 1m48.291s
sys 0m1.680s


Memory Usage


/usr/bin/time -f "%M" python factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000
"12750032"
/usr/bin/time -f "%M" python -O factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000
"12754544"
/usr/bin/time -f "%M" python3 factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000
"30688"
/usr/bin/time -f "%M" python3 -O factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000
"39264"
/usr/bin/time -f "%M" pypy factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000
"76832"
/usr/bin/time -f "%M" pypy -O factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000
"76864"
/usr/bin/time -f "%M" jython factdiff.py 100000000
Command exited with non-zero status 255
17467504








Conclusion

The way I implemented this problem led to large memory usage.  I have 16 GB of memory so I didn't run out, but I noticed a jump of several GB when I ran the benchmarks.  Python 2 used almost 13 GB as reported by the time function.  Python3 was the most memory efficient of the versions, but also the slowest to run.  

The answer was 25164150.

Saturday, August 18, 2012

Problem 5 - Smallest Number Divisible By 1 to 20

Problem 5


2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder.

What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?
The most obvious number which would be divisible by 1-20 would be n-factorial (n!).  Obviously this would not be the smallest possible number or the question would be too simple.  What can be eliminated?  If you have 2 x 5 then the number will be divisible by 10 without having to multiply by 10.  This leads to the idea that only the prime factors are necessary.  But what about 12?  The prime factors are 2 and 3, but a number which includes 2 x 3 only will not necessarily be divisible by 12.  Thus you need all the repeated prime factors as well.  So we need to multiply all the prime factors of each number, including repeated factors for the composite to be divisible by each number.

Solution

#!/usr/bin/python

import sys
from primes import factor

def union(a,b):
    for each in a:
        if a.count(each)>0 and b.count(each)>0:
            b.remove(each)
    return a + b

if __name__ == '__main__':
    #Determines the smallest number divisible by each of a range of numbers
    #smaller than just n!
    val = int(sys.argv[1])#Bad practice, assumes input is an int
    factors = []
    for each in range (2,val+1):
        factors = union(factors,factor(each))
    composite = 1
    for each in factors:
        composite = composite * each
    print (composite)


Prime factors with repeats are [2, 3, 2, 5, 7, 2, 3, 11, 13, 2, 17, 19].
The answer is 232792560.  (n! is 2432902008176640000) 

Approach

I start by defining a "union" function.  In set theory, the union adds two sets together but doesn't repeat the common members of the set.  My approach was to compare the two sets and remove the repeats from one of the sets as I went along.  Then I can simply concatenate the remaining sets without overlap.

Benchmarks

The surprising result here was the poor performance of pypy.  In this case, pypy was slower than either Python 2 or 3.  I even repeated the tests to see if the computer happened to be busy when pypy was running.  With this program, pypy performed almost as slowly as Jython.



Problem #5 Benchmarks
time python primes20.py 2000

151117794877444315...

real 0m0.902s
user 0m0.896s
sys 0m0.004s
time python -O primes20.py 2000

151117794877444315...

real 0m0.945s
user 0m0.928s
sys 0m0.016s
time python3 primes20.py 2000

151117794877444315...

real 0m1.253s
user 0m1.252s
sys 0m0.000s
time python3 -O primes20.py 2000

151117794877444315...

real 0m1.276s
user 0m1.264s
sys 0m0.008s
time pypy primes20.py 2000

151117794877444315...

real 0m2.520s
user 0m2.508s
sys 0m0.004s
time pypy -O primes20.py 2000

151117794877444315...

real 0m2.511s
user 0m2.504s
sys 0m0.004s
time jython primes20.py 2000

151117794877444315...

real 0m3.102s
user 0m5.984s
sys 0m0.104s




Conclusion

This was an interesting twist on the prime factoring problem from earlier.  I had to modify the factoring code I had written not to throw away repeated prime factors.  Pypy had a surprisingly poor performance in this program and the "count" function used in my union subroutine may be implemented poorly in pypy.

Friday, August 17, 2012

Problem 4 - Palindromes

Problem 4

A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91  99.
Find the largest palindrome made from the product of two 3-digit numbers.

Solution

#!/usr/bin/env python

def ispal(num):  #Check if the number is a palindrome
    val = str(num)  #Python just does the right thing here.  Nice.
    for each in range(0,int((len(val)+1)/2)): #Any mismatch return false
        if val[len(val)-1-each]!=val[0+each]: return False
    return True  #otherwise return true

def calc():  #brute force looking for palindromes
    temp = 0
    largest = 0
    for each in range(999,100,-1):
        for other in range(999,100,-1):
            temp = each * other
            if ispal(temp):
                largest = max(largest,temp)
    return largest

if __name__ == '__main__' :
    print (calc())


The answer is 906609.

Approach

This was one of the problems which took almost no time to code.  The seamless casting in Python made this one trivial.  You just work from both ends and look for mismatches.

Converting to a string worked exactly like it should have and python made it trivial to code.  Using some sort of built-in to reverse the string and see if it matches the original sting would probably be faster than a char by char comparison.

Benchmarks

The benchmarks fell out pretty much like the others.  Boring example, Sorry!

Problem #4 Benchmarks
time python palindromes.py
906609

real 0m0.746s
user 0m0.744s
sys 0m0.000s
time python -O palindromes.py
906609

real 0m0.764s
user 0m0.756s
sys 0m0.004s
time python3 palindromes.py
906609

real 0m0.839s
user 0m0.828s
sys 0m0.008s
time python3 -O palindromes.py
906609

real 0m0.875s
user 0m0.868s
sys 0m0.004s
time pypy palindromes.py
906609

real 0m0.090s
user 0m0.080s
sys 0m0.008s
time pypy -O palindromes.py
906609

real 0m0.087s
user 0m0.072s
sys 0m0.012s
time jython palindromes.py
906609

real 0m2.835s
user 0m5.852s
sys 0m0.176s

Conclusion

Python just worked here.  This problem took no time at all to code and worked on the first try.  The benchmarks agree with everything else we've seen.  The answer is 906609.

Thursday, August 16, 2012

Problem 3 - Factoring Primes

Problem 3

The prime factors of 13195 are 5, 7, 13 and 29.

What is the largest prime factor of the number 600851475143 ?

This is the first problem I found with an interesting real world application.  Factoring primes is hard but multiplying out two primes is trivial.  That concept, called a one-way function, underpins modern cryptography.  The generation of large primes is a hard problem, but it is much harder to figure out which two primes were multiplied to make a given number than it is to generate those primes in the first place.  That's because modern computers are very, very good at multiplication and addition, but relatively slow at division.  Likewise, the brute force guessing of prime factors is very time consuming because there are a large number of primes.

All cryptography can be broken given enough time.  The trick is to make the math problem hard enough that the attacker gives up and does something else before he can decrypt your secrets.

Solution

Here's the code for Problem 3:
#!/usr/bin/python

def isprime(val): #Checks to see if a number is prime, brute force
    for x in range(2, int(val**0.5)+1):
        if val % x == 0: return False
    return True

def factor(val):  #Get the prime factors of a number
    if isprime(val): return [val]
    i = 2
    thesefactors = []
    tempval = val
    while (i <= tempval):
        if tempval%i == 0:
            thesefactors += [i]
            tempval = tempval/i
            i = 2
        else:
            i = i + 1
    if len(thesefactors) <= 1:
        return [val]
    else:
        return thesefactors

if __name__ == '__main__':
    import sys
    factors = factor(int(sys.argv[1]))
    print (factors)
This is run using the following command (with the correct number argument for the question asked):

python primes.py 600851475143

The correct answer is 6857.

Approach

Again, this is a brute force approach.  I have included a refinement called Newton's Improvement where you don't need to check if numbers larger than the square root of that number are factors.  This helps improve both subroutines.  These subroutines were both used in the solutions of later problems.

for x in range(2, int(val**0.5)+1):

Benchmarks

The question asked was too easy to make a good benchmark, so I created my own composite using the solution of a later problem involving counting primes.  The two primes I multiplied were 1299379 and 1299709.  The multiplication takes the blink of an eye, but the factoring takes many, many blinks.

As before, pypy is about 7-10x faster than Python2, with Jython being much slower and Python3 somewhere in the middle.

Problem #3 Benchmarks
time python primes.py 1688814580711
[1299379, 1299709]

real 0m0.329s
user 0m0.316s
sys 0m0.012s
time python -O primes.py 1688814580711
[1299379, 1299709]

real 0m0.380s
user 0m0.372s
sys 0m0.004s
time python3 primes.py 1688814580711
[1299379, 1299709]

real 0m0.715s
user 0m0.704s
sys 0m0.008s
time python3 -O primes.py 1688814580711
[1299379, 1299709]

real 0m0.772s
user 0m0.764s
sys 0m0.004s
time pypy primes.py 1688814580711
[1299379, 1299709]

real 0m0.083s
user 0m0.080s
sys 0m0.000s
time pypy -O primes.py 1688814580711
[1299379, 1299709]

real 0m0.083s
user 0m0.076s
sys 0m0.004s





Conclusion

The Problem 3 solution is a program with both real world uses and utility as a library for later problems.  The problem of locating and factoring primes is the holy grail of cryptography and this simple implementation gives some insight into prime numbers and the problem of factoring primes.
  

Wednesday, August 15, 2012

Problem 2 - Sum of Even numbers in the Fibonachi Sequence under 4000000

Problem 2

Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:
1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.

Solution

Here's the code for Problem 2:
#!/usr/bin/env python

def nextfib(a,b): #compute the next Fibonachi term
    c = a + b
    return (b,c) #keep the previous term

if __name__ == '__main__' :

    import sys
    a = 1
    b = 2
    mysum = 0
    val = int(sys.argv[1]) #Bad code, assumes input is an int
    while(b < val):
        if b%2==0: mysum += b #add it up if it's even
        (a,b) = nextfib(a,b)
    print(mysum) #print the answer at the end


You can run this code with the command:

python fibsum.py 4000000

The correct answer is:
4613732

Approach

The Fibonachi sequence is straight forward to compute.  You need to store the last two values in the sequence and add them up the get the next term.  If the value is even, you add it to the sum.

The only real difference between this and the previous problem was the use of a subroutine.  You can see the syntax for a subroutine here:

def nextfib(a,b): #compute the next Fibonachi term
    c = a + b
    return (b,c) #keep the previous term

Benchmarks

There was very little difference between the interpreters for this problem.  The Fibonachi sequence grows very quickly and it was difficult to get the compute time up enough for a benchmark.  As before, Jython was much slower and I believe we're seeing the virtual machine start-up time more than the actual compute time.  Java + Python takes longer to load than Python, which comes as no surprise.

Problem #2 Benchmarks

time python fibsum.py 400000000000000000000000000000000000000000000000000000000000000000000000000000
324236912809324524060519206539762341175704857242759930854194179672563128605028

real 0m0.025s
user 0m0.020s
sys 0m0.004s
time python -O fibsum.py 400000000000000000000000000000000000000000000000000000000000000000000000000000
324236912809324524060519206539762341175704857242759930854194179672563128605028

real 0m0.047s
user 0m0.040s
sys 0m0.004s
time python3 fibsum.py 400000000000000000000000000000000000000000000000000000000000000000000000000000
324236912809324524060519206539762341175704857242759930854194179672563128605028

real 0m0.032s
user 0m0.028s
sys 0m0.000s
time python3 -O fibsum.py 400000000000000000000000000000000000000000000000000000000000000000000000000000
324236912809324524060519206539762341175704857242759930854194179672563128605028

real 0m0.082s
user 0m0.076s
sys 0m0.004s
time pypy fibsum.py 400000000000000000000000000000000000000000000000000000000000000000000000000000
324236912809324524060519206539762341175704857242759930854194179672563128605028

real 0m0.035s
user 0m0.020s
sys 0m0.012s
time pypy -O fibsum.py 400000000000000000000000000000000000000000000000000000000000000000000000000000
324236912809324524060519206539762341175704857242759930854194179672563128605028

real 0m0.035s
user 0m0.016s
sys 0m0.016s
time jython fibsum.py 400000000000000000000000000000000000000000000000000000000000000000000000000000
324236912809324524060519206539762341175704857242759930854194179672563128605028

real 0m1.514s
user 0m3.204s
sys 0m0.164s




Conclusion

The relatively short compute time isolated the start-up time difference between Jython and the various Pythons.  At least Jython ran this time.    This was another easy problem and the code is simple.  The syntax for a subroutine was introduced for any beginners.

Tuesday, August 14, 2012

Problem 1 - Integer Multiples of 3 and 5


Problem 1

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23. 
Find the sum of all the multiples of 3 or 5 below 1000.

This is a real softball just to get people hooked.  The code could easily be a single line, but I have separated things out a bit.  I'll take the time to discuss the various boilerplate in my code in this blog post because the problem itself doesn't require much discussion.

I have also run my first benchmarks on several different interpreters.  I experiment with the -O switch which runs some sort of optimizations before running the script.  Interpreters tried include, Python 2 & 3, Pypy, and Jython.  The only modifications needed to run in Python 3 were the use of the new Python 3 "print ()" syntax which also worked in all Python 2 flavors.

There is an Ubuntu Launchpad project where I will host all my solution code.  You can get your very own copy on a linux box with bzr installed using this command:

bzr branch lp:eulerplay

Or you can browse online at:
http://bazaar.launchpad.net/~brywilharris/eulerplay/trunk/files.

Solution

Here's the code for Problem 1:
#!/usr/bin/env python

#This program adds all the integer numbers divisible by 3 and 5
#below a selected maximum number.  It is a straight forward, 
#brute-force approach.

if __name__ == '__main__' :   #boilerplate

    import sys  #for command line input
    sum = 0;
    for i in range(0,int(sys.argv[1])): #int arg required
        if (i%3==0) or (i%5==0):  #divisible by 3 or 5?
            sum = sum + i #add to the sum
    print sum #spit it out at the end
This is run using the following command (with the correct number argument for the question asked):

python add35.py 1000

The correct answer is 233168.

Approach

This is a simple, brute-force solution.  I loop through the entire range and add up the numbers divisible by 3 and 5.  The modulo (%) operator lets me see which numbers are evenly divisible.

This program uses the simplest form of command line argument (using sys.argv), the boilerplate (__name__ == __main__ ) for use as a library on Windows and a simple for loop (for i in range(x,y)).  These are some of the only things a python programmer needs to learn to start writing code.  I'm  often pleasantly surprised when I don't know how to do something in Python and it just does what I mean on the first try.  Try that in Java some time.

The first line lets you invoke the script from the command line without specifying the interpreter. Instead of python add35.py 1000, you can just type ./add35.py 1000.  This line is not required when the interpreter is run manually, eg pypy add35.py 1000.
#!/usr/bin/env python
The sys library lets you access command line arguments in python. There are much fancier and more bullet proof ways of doing this, but none are quite this simple.
import sys
astring = sys.argv[1] #argv[0] is the script name
This is how python handles for loops:
for i in range(x,y):

Benchmarks

The following are some simple benchmarks using the unix "time" command.  Python can be run on a variety of interpreters without modification.   My 4-core Intel i7 processor positively spoils me for other machines  (Hopefully it's not bragging to say, "Your benchmarks may be slower").   Python 2.7.3 comes with my copy of Ubuntu 12.04.  I also tried Pypy version 2.7.2, Python 3.2.3 and Jython 2.5.1 running on Java 1.6.0_24.

Code runs in a single thread for all benchmarks in this example.  The "real"number is the wall clock time the program took to set up and run.  This includes loading the interpreter into memory, any optimizations, and the reporting of results.  Later programs may be multi-threaded, but this one didn't need that.

This simple example can be timed with the following command:

time python add35.py 1000
or
time python -O add35.py 1000

In this case, the -O (optimze) switch takes longer than running unoptimized:
bryan@myComputer:~/play$ time python add35.py 1000
233168

real 0m0.025s
user 0m0.024s
sys 0m0.000s
bryan@myComputer:~/play$ time python -O add35.py 1000
233168

real 0m0.071s
user 0m0.060s
sys 0m0.004s

Adding a few zeros makes the -O switch (just barely) worthwhile:
bryan@myComputer:~/play$ time python add35.py 100000000
2333333316666668

real 0m19.323s
user 0m18.457s
sys 0m0.772s
bryan@myComputer:~/play$ time python -O add35.py 100000000
2333333316666668

real 0m19.286s
user 0m18.417s
sys 0m0.792s

Python 3 took longer and required a modification to the print statement syntax.  print sum had to become print(sum).  It looks to me like Python 3 has a way to go before it starts getting optimized.
bryan@bryan-Aspire-V3-771:~/play$ time python3 -O add35.py 100000000

2333333316666668

real 0m22.498s
user 0m22.465s
sys 0m0.004s

On a lark I tried this in pypy.   Pypy is a python interpreter written in a version of python instead of c.  It seems like an interpreter written in an interpreted language would be the definition of clunky, but this isn't the case for pypy.  Pypy runs a lot faster in this simple case:
bryan@myComputer:~/play$ pypy -O add35.py 100000000
2333333316666668

real 0m2.833s
user 0m2.804s
sys 0m0.028s


Jython crashed with an out of memory error after 37 seconds for the number 100000000.  Jython fits my preconceived idea of an interpreter inside an interpreter much better than Pypy. I guess my jython benchmarks will be a bit limited.  Jython was able to solve the original problem, also shown below.
java.lang.OutOfMemoryError: java.lang.OutOfMemoryError: Java heap space

real 0m37.852s
user 1m48.999s
sys 0m1.732s

bryan@myComputer:~/play$ time jython add35.py 1000

233168



real 0m1.580s

user 0m3.444s
sys 0m0.136s



Conclusion

This is a straight-forward problem with a straight forward solution.  While a faster algorithm might be found (multiplying by threes and fives until you get to the target number might be faster) but it was not necessary.  The solution to the initial question above is 233168.  

Pypy is the fastest interpreter for this problem by a factor of almost 7x. Python 3 was  slower than Python 2 and Jython was very slow even for this simple problem.  Jython ran out of memory and crashed when working with the bigger number.

Monday, August 13, 2012

Euler Challenge Introduction


Illustration of the Euler Graph Concept from Wikipedia
This is the first in my series of posts on the programming site projecteuler.net. This site currently has a list of 391 interesting programming challenges and mathematical puzzles. The idea is to write a computer program to solve each problem in under an hour. Some of the puzzles can undoubtedly be solved in under an hour including both programming and run time. In my experience thus far, some of the problems will require creative solutions to solve in less than a day.

I plan to first solve all these problems in Python where possible because this is my favorite non-graphical programming language. I then plan to port some of my code to other languages for benchmarking purposes and just to brush up. One of the problems I have already solved needed to be ported to C just to solve in under a week... I don't expect this to be a series of general interest, but if you like python, math puzzles, or just geeky stuff in general, maybe this will be for you.

I plan to post on one problem a day, and I have a few week head start on solving the problems. I will post all my code along with benchmarks for those who are interested. All the code I write is licensed under the GPL. Here is a link if you don't know what that is. It means you can have, copy, keep, share, modify, sell, whatever, as long as you redistribute the code along with the programs and pass these rights on. Go forth and have fun with it.

I hope you enjoy my blog post series on the Euler challenges.