Wednesday, August 22, 2012

Problem 6 - Sum of Squares vs. Square of Sum


Problem 6



The sum of the squares of the first ten natural numbers is,
12 + 22 + ... + 102 = 385
The square of the sum of the first ten natural numbers is,
(1 + 2 + ... + 10)2 = 552 = 3025
Hence the difference between the sum of the squares of the first ten natural numbers and the square of the sum is 3025 − 385 = 2640.
Find the difference between the sum of the squares of the first one hundred natural numbers and the square of the sum.
Both the required subroutines are straight forward to implement.  The square of the sum grows much quicker than the sum of the squares.

Solution

#!/usr/bin/python

def sumsquare(val):  #Returns the sum of the squares.
    ss = 1
    for each in range(2,val+1): ss = ss + each**2
    return ss
    
def squaresum(val):  #Returns the squares of the sums.
    ss = 1
    for each in range(2,val+1): ss = ss + each
    return ss**2

if __name__ == '__main__' :
    import sys
    val = int(sys.argv[1]) # Bad Code, assumes input is an integer

    print (sumsquare(val))
    print (squaresum(val))
    print (squaresum(val)-sumsquare(val))


The answer is 25502500-338350=25164150.  

Approach

I first define two functions which compute the desired sums.  The algorithms are clear and concise in python.  I did notice that the for loop, which presumably gets allocated all at once, uses an enormous amount of memory.  

Benchmarks

Python3 really did poorly here and Jython crashed.  Pypy was the most efficient as before.  Java crashed with an out of memory error so I decided to look at memory usage.  Python 3 was actually the most memory efficient followed by Pypy and Python 2.  Jython memory usage was reported despite the crash.  I have read that pypy is implemented on stackless python, and expected to see stack usage with the other versions.  However, none of the python versions used any stack memory.

CPU Usage



Problem #6 Benchmarks
time python factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000

real 0m37.971s
user 0m36.458s
sys 0m1.360s
time python -O factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000

real 0m38.709s
user 0m37.198s
sys 0m1.408s
time python3 factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000

real 1m16.192s
user 1m16.133s
sys 0m0.008s
time python3 -O factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000

real 1m16.864s
user 1m16.805s
sys 0m0.008s
time pypy factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000

real 0m10.756s
user 0m10.729s
sys 0m0.016s
time pypy -O factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000

real 0m10.672s
user 0m10.649s
sys 0m0.012s
time jython factdiff.py 100000000

java.lang.OutOfMemoryError: java.lang.OutOfMemoryError: Java heap space

real 0m38.766s
user 1m48.291s
sys 0m1.680s


Memory Usage


/usr/bin/time -f "%M" python factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000
"12750032"
/usr/bin/time -f "%M" python -O factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000
"12754544"
/usr/bin/time -f "%M" python3 factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000
"30688"
/usr/bin/time -f "%M" python3 -O factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000
"39264"
/usr/bin/time -f "%M" pypy factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000
"76832"
/usr/bin/time -f "%M" pypy -O factdiff.py 100000000
333333338333333350000000
25000000500000002500000000000000
25000000166666664166666650000000
"76864"
/usr/bin/time -f "%M" jython factdiff.py 100000000
Command exited with non-zero status 255
17467504








Conclusion

The way I implemented this problem led to large memory usage.  I have 16 GB of memory so I didn't run out, but I noticed a jump of several GB when I ran the benchmarks.  Python 2 used almost 13 GB as reported by the time function.  Python3 was the most memory efficient of the versions, but also the slowest to run.  

The answer was 25164150.