processor An implementation of the Advanced RISC Machine
microprocessor architecture using the micropipeline design
style. In April 1994 the Amulet group in the Computer Science
department of Manchester University took delivery of the
AMULET1 microprocessor. This was their first large scale
asynchronous circuit and the world's first implementation of a
commercial microprocessor architecture (ARM) in asynchronous logic.
Work was begun at the end of 1990 and the design despatched
for fabrication in February 1993. The primary intent was to
demonstrate that an asynchronous microprocessor can consume
less power than a synchronous design.
The design incorporates a number of concurrent units which
cooperate to give instruction level compatibility with the
existing synchronous part. These include an Address unit,
which autonomously generates instruction fetch requests and
interleaves (nondeterministically) data requests from the
Execution unit; a Register file which supplies operands,
queues write destinations and handles data dependencies; an
Execution unit which includes a multiplier, a shifter and an
ALU with data-dependent delay; a Data interface which
performs byte extraction and alignment and includes an
instruction prefetch buffer; and a control path which
performs instruction decode. These units only synchronise
to exchange data.
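The idea of units that run independently and synchronise only to exchange data can be illustrated in software. The following is a minimal sketch, not a model of the AMULET1 circuitry: it assumes that a one-slot blocking queue is an acceptable stand-in for a request/acknowledge handshake between neighbouring units, and the names `stage`, `run_pipeline` and `SENTINEL` are hypothetical.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker; real data must not be None


def stage(func, inbox, outbox):
    """One unit: wait for data, process it, hand it on. There is no shared clock."""
    while True:
        item = inbox.get()        # blocks until the upstream unit delivers data
        if item is SENTINEL:
            outbox.put(SENTINEL)  # pass the shutdown marker downstream
            return
        outbox.put(func(item))    # blocks until the downstream unit has room


def run_pipeline(funcs, items):
    """Chain the stages with one-slot queues and push the items through."""
    links = [queue.Queue(maxsize=1) for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(f, links[i], links[i + 1]))
        for i, f in enumerate(funcs)
    ]
    results = []

    def collect():
        # Drain the last link so the final stage never blocks forever.
        while True:
            out = links[-1].get()
            if out is SENTINEL:
                return
            results.append(out)

    collector = threading.Thread(target=collect)
    for t in workers + [collector]:
        t.start()
    for item in items:
        links[0].put(item)
    links[0].put(SENTINEL)
    for t in workers + [collector]:
        t.join()
    return results


if __name__ == "__main__":
    # Two toy "units": the functions are placeholders, not ARM operations.
    print(run_pipeline([lambda x: x + 1, lambda x: x * 2], [1, 2, 3]))
```

Each stage proceeds at its own pace and blocks only when it needs data from, or room in, a neighbour. The order of results is fixed even though the relative timing of the stages is not, loosely mirroring the distinction between deterministic execution and nondeterministic timing.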
The design demonstrates that all the usual problems of
processor design can be solved in this asynchronous framework:
backward instruction set compatibility, interrupts and
exact exceptions for memory faults are all covered. It
also demonstrates some unusual behaviour, for instance
nondeterministic prefetch depth beyond a branch instruction
(though the instructions which actually get executed are, of
course, deterministic). There are some unusual problems for
compiler optimisation, as the metric which must be used to
compare alternative code sequences is continuous rather than
discrete, and the nondeterminism in external behaviour must
also be taken into account.
The chip was designed using a mixture of custom datapath and
compiled control logic elements, as was the synchronous ARM.
The fabrication technology is the same as that used for one
version of the synchronous part, reducing the number of
variables when comparing the two parts.
Two silicon implementations have been received and preliminary
measurements have been taken from these. The first uses a 0.7 um
process and has achieved about 28 kDhrystones running the
standard benchmark program. The other is a 1 um
implementation and achieves about 20 kDhrystones. For the
faster of the parts this is equivalent to a synchronous ARM6
clocked at around 20MHz; in the case of AMULET1 it is likely
that this speed is limited by the memory system cycle time
(just over 50ns) rather than the processor chip itself.
A fair comparison of devices at the same geometries gives the
AMULET1 performance as about 70% of that of an ARM6 running
at 20MHz. Its power consumption is very similar to that of
the ARM6; the AMULET1 therefore delivers about 80 MIPS/W
(compared with around 120 from a 20MHz ARM6). Multiplication
is several times faster on the AMULET1 owing to the inclusion
of a specialised asynchronous multiplier. This performance is
reasonable considering that the AMULET1 is a first generation
part, whereas the synchronous ARM has undergone several design
iterations. AMULET2 (currently under development) is expected
to be three times faster than AMULET1 - 120 kDhrystones -
and to use less power.
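The quoted figures can be cross-checked with a little arithmetic. The sketch below uses only numbers stated in this entry (a 20 MHz ARM6, 80 and 120 MIPS/W); the function and variable names are hypothetical.

```python
def clock_period_ns(clock_mhz):
    """Length of one clock cycle in nanoseconds for a clock given in MHz."""
    return 1000.0 / clock_mhz


def relative(value, baseline):
    """Ratio of a value to a baseline, e.g. 0.5 means half the baseline."""
    return value / baseline


# A 20 MHz clock has a 50 ns period, matching the memory system cycle time
# of "just over 50ns" that is said to limit the faster part.
arm6_period_ns = clock_period_ns(20)

# Power efficiency of AMULET1 relative to the 20 MHz ARM6 (MIPS/W figures).
efficiency_ratio = relative(80, 120)

print(f"ARM6 clock period at 20 MHz: {arm6_period_ns:.0f} ns")
print(f"AMULET1 / ARM6 power efficiency: {efficiency_ratio:.2f}")
```

The efficiency ratio of roughly two thirds is in line with the statement that AMULET1 delivers about 70% of the ARM6 performance at a very similar power consumption.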
The macrocell size (without pad ring) is 5.5 mm by 4.5 mm
on a 1 micron CMOS process, which is about twice the area of
the synchronous part. Some of the increase can be attributed
to the more sophisticated organisation of the new part: it has
a deeper pipeline than the clocked version and it supports
multiple outstanding memory requests; there is also
specialised circuitry to increase the multiplication speed.
Although there is undoubtedly some overhead attributable to
the asynchronous control logic, this is estimated to be closer
to 20% than to the 100% suggested by the direct comparison.
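As a rough illustration of the area comparison, the sketch below works through the stated dimensions. It assumes that "about twice the area" means exactly 2x and that the estimated 20% overhead is measured against the area of the synchronous part; both are simplifying assumptions, and the variable names are hypothetical.

```python
# Macrocell dimensions quoted for AMULET1 (1 micron CMOS, without pad ring).
WIDTH_MM = 5.5
HEIGHT_MM = 4.5

amulet1_area = WIDTH_MM * HEIGHT_MM      # total macrocell area in mm^2

# Assumption: "about twice the area" is taken as exactly 2x.
sync_area = amulet1_area / 2

# Assumption: the ~20% asynchronous control overhead is relative to sync_area.
async_overhead = 0.20 * sync_area

# Whatever remains would be due to the more sophisticated organisation
# (deeper pipeline, multiple outstanding memory requests, faster multiplier).
organisation_extra = amulet1_area - sync_area - async_overhead

print(f"AMULET1 macrocell:       {amulet1_area:.2f} mm^2")
print(f"Synchronous part (est.): {sync_area:.2f} mm^2")
print(f"Async control overhead:  {async_overhead:.2f} mm^2")
print(f"Organisational extras:   {organisation_extra:.2f} mm^2")
```

Under these assumptions most of the extra area comes from the added organisational features rather than from the asynchronous control logic itself.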
AMULET1 is code compatible with ARM6 and so is capable of
running existing binaries without modification. The
implementation also includes features such as interrupts and
memory aborts.
The work was part of a broad ESPRIT funded investigation
into low-power technologies within the European Open
Microprocessor systems Initiative (OMI) programme, where
there is interest in low-power techniques both for portable
equipment and (in the longer term) to alleviate the problems
of the increasingly high dissipation of high-performance
chips. This initial investigation into the role asynchronous
logic might play has now demonstrated that asynchronous
techniques can be applied to problems of the scale of a
complete microprocessor.
(1994-12-08)