Distributed Computing

Join the netWORK Computing revolution

What is distributed computing?

As a concept, Distributed Computing is hardly a new thing. In fact, a kind of distributed computing has been practiced for at least a hundred years, although the computations were made by hand, and later with complex mechanical devices, because computers hadn't been invented yet. Results and assignments had to be exchanged using carrier pigeons, horses, whatever was available - which was both very unreliable and slow. The most significant such work is the effort to factor numbers of the form k^n+1 and k^n-1, that is, powers plus or minus one. The results of over 30 years of work on these, known as the Cunningham Tables, were first published in 1925, and the project is still continuing.

Recently, however, two major developments have made Distributed Computing an extremely viable research tool: Modern operating systems such as MacOS, Linux and Windows are very good at managing system resources at fine resolutions. When you press a key on the keyboard or click a mouse button - such as when typing in a word processor or loading a new web page - your computer suddenly explodes into action, performing thousands of little tasks in the split second it takes to respond to your command. If your computer had a slower processor, it would take longer to respond to that single key-press. After that, it most likely returns to dormancy, just waiting for the next command to execute. What would happen if that time spent waiting was used productively, instead of wasted into the great bit-bucket in the sky?

This is basically the idea behind modern Distributed Computing. Small clients on users' computers get their instructions over the Internet, carry out their assigned tasks during the long periods the computer would otherwise sit doing nothing while waiting for the user's next input, and send the results back to central servers over the Internet. Such computers, based on off-the-shelf commodity equipment and software, are titled Beowulf class supercomputers. Although the technology is relatively new, three major efforts to take advantage of it - major as measured by over 1 teraflop of sustained processing power - already exist. We will take a closer look at them later in this document, but their combined sustained processing power, as of writing this document, was about 26 teraflops according to their own reporting.

A teraflop is a thousand billion floating-point operations per second - a one followed by 12 zeros. For comparison, the world's fastest traditional - ASCI-based - supercomputer, the Intel ASCI Red at Sandia National Laboratories, USA, has a sustained computing speed of 2.4 teraflops. The two other traditional supercomputers to cross the 1 teraflop barrier are the IBM ASCI Blue Pacific at Lawrence Livermore National Laboratory with 2.1 teraflops and the SGI ASCI Blue Mountain at Los Alamos National Laboratory with 1.6 teraflops. Thus even the combined output of the world's three fastest traditional supercomputers, 6.1 teraflops, is less than a fourth of the distributed computing power currently extracted by the three largest projects over the Internet.
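The comparison can be checked with a few lines of Python. This is only a restatement of the figures quoted above; the variable names are illustrative, and the 26 teraflop figure is the projects' own self-reported total.

```python
# Sustained speeds in teraflops, exactly as quoted in the text above,
# keyed by the laboratory hosting each machine.
traditional_tflops = {
    "Sandia": 2.4,
    "Lawrence Livermore": 2.1,
    "Los Alamos": 1.6,
}

# Combined sustained power of the three largest distributed projects,
# according to their own reporting.
distributed_tflops = 26.0

traditional_total = sum(traditional_tflops.values())
share = traditional_total / distributed_tflops

print(f"traditional total: {traditional_total:.1f} teraflops")
print(f"share of distributed total: {share:.1%}")
```

The share comes out a little under 25 percent, which is where "less than a fourth" comes from.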

What can it do?

Having now established what distributed computing is, and not only its potential but also its current superiority to other methods, the big question is what Distributed Computing can be used for, or what it can do for you. So far, the applications of Internet Distributed Computing haven't been that ambitious, settling for long-standing and well-known problems such as factoring large numbers, finding large primes and breaking encrypted messages. The last application has been particularly tempting due to its very straightforward and easily manageable nature, as well as the prizes companies have offered, as a public relations bid, for breaking their own or their competitors' encryption systems.

A good problem to attack by Distributed Computing methods is one that is easily parallelizable - that is, a well-defined "large" problem which can be easily, and often naturally, divided into definite sub-problems that aren't dependent on each other. Relatively small data-set sizes, or at least small communication needs, are also preferable. The above-mentioned applications are all examples of problems that are easy to tackle by means of Distributed Computing. One significant example of a job that by current knowledge would suit this model of work poorly is weather prediction, where almost everything depends on everything else and the data sets are huge and very dynamic, with no clear division into work units.
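As a rough illustration of such a division, the sketch below splits a brute-force key search into independent work units. Everything in it is hypothetical: the function names, the 16-bit key space and the unit size are chosen only to keep the example small, and no real project's client is implied.

```python
def split_keyspace(total_keys, unit_size):
    """Yield half-open (start, end) ranges that together cover [0, total_keys)."""
    for start in range(0, total_keys, unit_size):
        yield (start, min(start + unit_size, total_keys))


def search_unit(unit, is_correct_key):
    """Try every key in one work unit; needs nothing from any other unit."""
    start, end = unit
    for key in range(start, end):
        if is_correct_key(key):
            return key
    return None


# Hypothetical toy problem: find a 16-bit "secret" key.
SECRET_KEY = 48_879
units = list(split_keyspace(2**16, 4096))

# Here the units are processed one after another; in a distributed setting
# each unit could go to a different computer, in any order.
results = [search_unit(unit, lambda key: key == SECRET_KEY) for unit in units]
found = [key for key in results if key is not None]
print(len(units), "work units, key found:", found)
```

Each unit is described by just two numbers and returns at most one, which is the "small communication needs" property in miniature.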

Some projects have recently sprung up to challenge commonly held beliefs about distributed computing. One such application is SETI@home's signal processing project, which looks for extraterrestrial signals in radio-telescope data. By common wisdom, such data-intensive applications with interdependent work units and no clear natural division between them were considered a tough nut to crack. The interdependency of work units is solved in a rather crude manner, by giving out overlapping assignments: the discontinuity of the signal doesn't disturb the total results, but lots of work gets done twice. However, by quickly rising to become the top distributed computing project as measured by sustained processing power, SETI@home has proved such doubts baseless.
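The overlapping-assignment idea can be sketched as follows. This is a minimal illustration, not SETI@home's actual scheme: the function name, the unit length and the overlap are all made up for the example.

```python
def overlapping_units(samples, unit_len, overlap):
    """Cut a sample sequence into units that share `overlap` samples with
    their neighbour, so a signal straddling a boundary is seen whole."""
    if not 0 <= overlap < unit_len:
        raise ValueError("overlap must be non-negative and shorter than a unit")
    step = unit_len - overlap
    units = []
    start = 0
    while True:
        units.append(samples[start:start + unit_len])
        if start + unit_len >= len(samples):
            break
        start += step
    return units


# Hypothetical recording of 10 samples, cut into units of 4 with 2 shared.
recording = list(range(10))
units = overlapping_units(recording, unit_len=4, overlap=2)
processed = sum(len(unit) for unit in units)
print(units)
print(f"{processed} samples processed for {len(recording)} recorded")
```

The gap between samples processed and samples recorded is the duplicated work the paragraph above refers to.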

Distributed.net, the second largest of the current Distributed Computing projects on the Internet, tried out another concept new at least to themselves by adopting an earlier small project for verifying Optimal Golomb Rulers. These number series, which satisfy a certain condition, find their use for example in radio astronomy, and good solutions can be produced for example by genetic algorithms. However, to prove one optimal (or to find an even better one), all possible combinations must be gone through. What is new in this project is that the work unit sizes vary widely. Unfortunately for Distributed.net, their first-generation software for this project had an error in handling exactly that variable size, and the project was suspended and restarted.
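The "certain condition" is, by the usual definition of a Golomb ruler, that no two pairs of marks are the same distance apart. The sketch below checks that condition and finds a shortest ruler by naive exhaustive search; it is an illustration with made-up function names, far simpler than anything a real project would run, and only practical for a handful of marks.

```python
from itertools import combinations


def is_golomb_ruler(marks):
    """True if every pair of marks is a different distance apart."""
    diffs = [b - a for a, b in combinations(sorted(marks), 2)]
    return len(diffs) == len(set(diffs))


def shortest_ruler(n_marks):
    """Exhaustively search for a shortest Golomb ruler with n_marks marks
    (n_marks >= 2), trying every placement at each length in turn."""
    length = n_marks - 1
    while True:
        for inner in combinations(range(1, length), n_marks - 2):
            marks = (0, *inner, length)
            if is_golomb_ruler(marks):
                return marks
        length += 1


print(is_golomb_ruler([0, 1, 4, 6]))  # distances 1, 2, 3, 4, 5, 6
print(is_golomb_ruler([0, 1, 2, 4]))  # distance 1 occurs twice
print(shortest_ruler(4))
```

Because every shorter length has to be ruled out completely before a ruler can be called optimal, the search grows very quickly with the number of marks, and different parts of it take very different amounts of time.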

GIMPS, one of the oldest projects around, has also recently taken steps in an unforeseen direction. The Great Internet Mersenne Prime Search looks for Mersenne primes - prime numbers, ones that aren't evenly divisible by anything but themselves and one, of the form 2^p-1 - which include the largest primes known to humankind. Once considered just a scientific curiosity, large prime numbers today find their application in cryptography and help to ensure your privacy. The study of prime numbers and of factorization, or finding the prime divisors of numbers, which arises from the Fundamental Theorem of Arithmetic, is also important basic research into the nature of numbers. Because large prize money has recently been offered for finding really large prime numbers, some GIMPS members are now participating in computations with work units that can take years to finish.
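The text does not say how candidates are tested; the standard primality test for numbers of the form 2^p-1 is the Lucas-Lehmer test, sketched below under the assumption that p itself is prime. The function name is illustrative, and a real client would use far faster multiplication than Python's built-in integers.

```python
def is_mersenne_prime(p):
    """Lucas-Lehmer test: for a prime exponent p, decide whether 2**p - 1
    is prime. The caller is assumed to pass only prime values of p."""
    if p == 2:
        return True  # 2**2 - 1 = 3; the loop below only covers odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


prime_exponents = [2, 3, 5, 7, 11, 13, 17, 19, 23, 31]
print([p for p in prime_exponents if is_mersenne_prime(p)])
```

Each candidate needs p-2 squarings of a p-bit number, one after another, which gives some feel for why a single work unit on a very large exponent can run for a long time.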

Another interesting new approach to Distributed Computing comes from Dcypher.net, which was formed for the CS Cipher challenge. They have adopted a smaller project that used to run as a JavaScript program on a WWW site. This is a simulation of the gamma-ray flux surrounding radioactive material in a container. By varying the parameters of the container, Dcypher.net expects to empirically test the viability of different nuclear waste storage proposals, eventually to the benefit of the whole of humankind. This project certainly has the appeal to become a major player, but the drama on its home site is thick enough to cut with a knife.

Other possible uses of Distributed Computing that may eventually find their way into practice include, for example, genetic research, or analyzing DNA. With more and more genetic databanks being made available online, the need for an average researcher to be able to process them without reserving supercomputer time is increasing, and some of the more computation-intensive jobs could even be beyond the reach of today's supercomputers. While large datasets hamper most imaginable genetic research applications, SETI@home's success and the increase in bandwidth and memory capacity make this direction of work a very tempting road, if just the commercial issues can be ironed out.

Not all distributed computing has to be incredibly complex and serious work, either. One interesting possible application for distributed computing would be in gaming. For example, a chess computer much more powerful than anything offered at the moment could be constructed this way, by assigning each participating computer a specific set of moves to analyze. Of course, with the advent of an unbeatable computer opponent that isn't confined to research labs, the whole interest in the game would diminish. But perhaps in the future, games that are even more complex from the computer's point of view could be produced, for example ones nearing true AI. Distributed image rendering for producing photorealistic computer movies is also an interesting field, which has already seen some interest, for example among Star Wars fans.

Perhaps a more important question than what Distributed Computing can be used for is what it should be used for. Only for projects that benefit the whole of mankind? The ethics of distributed computing raise a significant number of interesting questions. Who owns the results of thousands or even millions of people co-operating together? So far this problem has been solved by donating much of the profits to a non-profit organization chosen by vote, but the day may not be far away when the finder of the right solution claims that those who helped to eliminate wrong solutions or to manage the project have no right to the solution, or when the project managers decide to keep the discovery of a solution from the participants. Indeed, what right do participants have to know what they're contributing towards? Certainly most projects are already severely limiting participants' ability to find out, in the name of protecting their algorithms and preventing falsified results.

Commercial Potential

What is the commercial potential of Distributed Computing? So far any money changing hands has been mostly for public relations reasons, but the sums can be expected to rise significantly as real problems and real solutions, such as genetic research, start to enter the picture. Processing time on large computers is certainly already being sold to the highest bidder, so a new market it is not. However, processing time on them is scarce and belongs to their owner, creating an easy basis for pricing. How do you price the work of potentially almost unlimited computing power in the hands of private citizens? It's clear that Distributed Computing presents its own problems to commercialization, just as it is clear that commercialization will eventually happen.

Copyright © 9th March, 2000  Jukka Santala.