Neat Story (Piper)

From: [email protected] (Ted Cashel)

Date: Wed Mar 31, 1999 11:10:17 PM US/Eastern

To: [email protected]

Cc: [email protected]

Subject: Fwd: Neat story

Reply-To: [email protected] (Ted Cashin)

Attachments: There is 1 attachment

(Embedded image moved to file: pic09930.jpg)

[email protected] didn’t work so I am trying you here

I got this from a recent Tidbits and it reminded me of your prime number program. This guy has a more complex program (albeit nearly as pointless) but has the background to have tried it on a proud line of mainframes and modern PC’s.

Power Macintosh G3: The Cannonball Express

——————————————

by Rick Holzgrafe

The Cannonball Express was the fabled train that was so fast it

took three men to say “Here she comes,” “Here she is,” and “There

she goes.” Computers are fast too, although unlike trains, most

aren’t self-propelled. What makes a computer fast, and how much

effect does software design have? How much faster are today’s

computers than yesterday’s? Recently I revisited some of these

questions, beginning with a trip down memory lane.


**Back in the Stone Age** — Twenty years ago, I was teaching

myself programming and had access to a DEC PDP-11/60 minicomputer

on evenings and weekends. This beast was bigger than a washing

machine, and during workdays I shared it with two dozen other

technicians and engineers. I found a word puzzle in a magazine and

thought it would be fun to program the PDP to solve it. The puzzle

was as follows.

Given a phrase and a sheet of graph paper, write the phrase on the

graph paper according to these rules:

1. Write one letter per square on the graph paper, like filling in

a crossword puzzle. Ignore case and anything that’s not one of the

26 letters of the alphabet. The phrase “N. 1st Street” is thus

identical to “NSTSTREET”.

2. Put the first letter of the phrase in any square you like.

3. After writing any letter, put the next letter of the phrase in

any adjacent square. Here, “adjacent” means any of the eight

neighboring squares, up, down, left, right, or diagonally. You may

reuse a square if it is adjacent and already holds the letter you

need; otherwise you must use a blank square. You can’t use the

same square twice in a row – no “hopping in place” for double

letters.

The goal is to write the phrase inside a rectangle of the smallest

possible area. (A subtle point: you are not trying to write in a

minimal number of squares.) To score your solution, draw the

smallest enclosing rectangle you can and take its area. The

rectangle may enclose some blank squares; count them, too.

Got it? Tongue-twisters are the most fun because they have lots of

opportunities to reuse whole snaky strings of squares. The 37

letters in “Peter Piper picked a peck of pickled peppers” can be

packed into a 3 by 5 rectangle, like this (view this in a

monospaced font):

OFIPT

KCPER

LEDAS

In those days I knew computers were “fast” but had no idea how

fast. The answer turned out to be “not very.” I wrote a program to

solve these puzzles and called it Piper after the tongue-twister.

I set Piper running on a medium-length phrase on a Friday evening,

and came back on Monday to find it still running. It had found

several less-than-best solutions but hadn’t finished. Way too slow

– I found a better solution myself on paper in about half an hour.

Why did it take so long? Piper was a “brute force” program. It

tried every possible solution to the problem, one after another.

The trouble is that there are too many possible solutions. Exactly

how many depends on the phrase, but for any non-trivial phrase the

number is astronomical. I realized for the first time that “fast”

sometimes isn’t “fast enough.” This point may be obvious today,

when we all use computers and are weary of waiting for them. But

in 1979, that PDP-11 was only the second computer I had ever seen!

**What Part of Fast Don’t You Understand?** I saw that I would

have to make Piper faster. There are two basic ways to speed up a

program. Plan A is to find a better way of solving the problem,

but after twenty years I still haven’t thought of a better

solution. That leaves plan B, the classic efficiency expert’s

solution: eliminate unnecessary steps. For example, Piper created

every possible solution, then calculated the area of each. It

built each solution one letter at a time, so instead of taking the

area only for completed solutions, I changed Piper to check the

area after placing each letter. If placing a letter made the

solution-in-progress take up more space than the smallest complete

solution found so far, Piper could skip the rest of that solution

(and all other solutions that started the same way) and move right

on to the next one. This eliminated a huge amount of work and

greatly improved Piper’s speed. Finding clever ways to track the

area of a growing solution helped too, because it was faster than

calculating the area from scratch after each letter. I also found

a way to calculate a minimum size for the final solution quickly:

I couldn’t guarantee that the best solution would be that small,

but I could guarantee that it wouldn’t be smaller. If Piper got

lucky and found a solution as small as that calculated minimum, it

could stop immediately. Otherwise it would continue on after

finding the best solution, vainly seeking a still better one.

Eventually Piper became clever enough to finish that original

phrase in a reasonably short period of time. But the holy grail

continued to elude me: I wanted a solution for “How much wood

would a woodchuck chuck if a woodchuck could chuck wood?” That PDP

(and, perhaps, my cleverness) were not up to the task. I had run

out of ideas for speeding up Piper, and runs still took longer

than a weekend. But if I couldn’t improve Piper, I could at least

hope to run it on a faster computer.

**Big Iron** — People tend to think of processor speed as the

speed of a computer, but many factors affect overall performance.

Virtual memory lets you work on bigger data sets or on more

problems at a time, but it’s slow, so adding more physical RAM

helps by reducing your reliance on virtual memory. Faster disks

and I/O buses load and save data more quickly. RAM disks and disk

caching replace slow disk operations with lightning-quick RAM

access. Instruction and data caches in special super-fast RAM

offer big improvements for some programs. Well-written operating

systems and toolboxes can outrun poorly written ones.

But in the end, little of this matters to Piper. Piper has always

used only a small amount of data, doesn’t read or write the disk

after it gets going, and does little I/O of any kind. With its

small code and data set Piper can take good advantage of data and

instruction caching, but what it mostly needs is “faster hamsters”

– a faster processor to make the wheels turn more quickly.

As the years rolled on, I ran versions of Piper on my first Macs,

but in the middle 1980’s I worked for Apple Computer, and had

access to a programmer’s dream: Apple’s $15 million Cray Y-MP

supercomputer, one of only two dozen in the world and arguably the

fastest computer in existence at the time. I figured the Cray

would make short work of Piper. But the Cray was not well suited

to the problem. It could barrel through parallel-processing

floating-point matrix calculations like the Cannonball Express,

but Piper was a highly linear, non-mathematical problem. Piper

used only one of the Cray’s four processors and didn’t do the kind

of operations at which the Cray excelled. Piper wasn’t a fair test

of the Cray’s power, but the Cray was still the fastest machine

I’d ever used. The Cray succeeded where all previous machines

(that PDP, my Mac Plus, my Mac II) had failed. It solved

“woodchuck” in less than a day, taking only about 20 hours to

finish its run. I was awestruck – 20 hours?! I’d no idea that

“woodchuck” was _that_ big a problem!

**Young Whippersnappers** — I set Piper aside for many years, but

recently I began to wonder how a modern desktop box compares to

those old minicomputers and mainframes. I rewrote Piper from

memory and ran it on my new 400 MHz ice-blue Power Macintosh G3

with “woodchuck.” The output is below. Piper first reprints the

phrase, then prints solutions and elapsed times as it finds them.

Each solution is the best found so far, culminating in the best of

all. The times are in seconds from the beginning of the run; the

final time is the total run time. (Unfortunately, the best

solution for “woodchuck” is larger than Piper’s calculated

minimum, so Piper continued to run for a bit after finding the

best solution.)

Here are the results. Some intermediate solutions have been left

out for brevity, but you can see Piper finding ever smaller

solutions. In the end, the 57 letters in “woodchuck” are packed

into a 4 by 4 rectangle. Have a look at Piper’s total run time,

and the time needed to find its best solution:

How much wood would a woodchuck chuck if a woodchuck could chuck wood?

0 seconds:

ULD ADLU

HDOAIUCOHWUHOD

UCOWFKHDOMCWOW

K WO

1 seconds:

DLIFADLU

UCKOHWUHOD

OHDWOMCWOW

2 seconds:

ULCHC

HWUHODKUK

OMCWOWAFI

7 seconds:

HWUHOW

OMCWOD

IKAUCW

FLDHK

9 seconds:

HWUH

OMCW

LUOK

HDOI

CWAF

65 seconds:

HWM

OUC

IKH

FWC

ADO

LUO

67 seconds:

HWMU

OOCH

UDWK

LAFI

Total run time: 107 seconds

There you have it: a shade over a minute to find the best

solution, under two minutes to finish its run. Two minutes! So

much for the big iron of the 1980’s. My new G3 Mac finished

“woodchuck” over 600 times faster (and 5,000 times cheaper) than

that 15 megabuck Cray. (For a more realistic comparison, see this

description of UCLA’s Project Appleseed.)

www.apple.com/education/hed/aua0101/appleseed/

If you want to check Piper’s speed on your Mac, I’ve placed the

code in the public domain; it’s a 40K package.

ftp://ftp.tidbits.com/pub/tidbits/misc/piper.hqx

**The Future** — What’s yet to come? 400 MHz already looks a

little pokey. It’s the best Apple offers today, but I’ve seen

claims of 550 MHz or so from third party accelerators and over-

clocking tricks. People are predicting 1 GHz (1,000 MHz) chips for

the near future. Buses are getting faster, and caches hold more

data in less space and are moving onto the processor chip for

still more speed. (Small is fast. Did you know that the speed of

light is a serious limiting factor in modern computer design? The

closer together the components are, the faster they can signal

each other.)

And like the old Cray, multi-processor desktop systems are

starting to appear. They gang up on a problem by having separate

processors work on different parts of the problem simultaneously.

Although I didn’t try to use the Cray’s extra processors, I’ve

done a little thinking lately. Piper doesn’t have to be completely

linear. On an eight-processor system, I bet I could come close to

making Piper run in one-eighth of the time of a single processor.

Are you ready?

Here she comes –

Here she is –

There she GOES!

Leave a Reply

Your email address will not be published. Required fields are marked *