which performs a task in several steps, like an assembly line
in a factory. Each functional unit takes inputs and produces
outputs which are stored in its output buffer. One stage's
output buffer is the next stage's input buffer. This
arrangement allows all the stages to work in parallel thus
giving greater throughput than if each input had to pass
through the whole pipeline before the next input could enter.
The costs are greater latency and complexity due to the need
to synchronise the stages in some way so that different inputs
do not interfere. The pipeline will only work at full
efficiency if it can be filled and emptied at the same rate
at which it processes its inputs.
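The throughput gain described above can be illustrated with a minimal sketch of a clocked pipeline. It assumes every stage takes exactly one cycle; the names run_pipeline and EMPTY are hypothetical and chosen only for this example.

```python
EMPTY = object()  # marks an interstage buffer that holds no item


def run_pipeline(stages, inputs):
    """Clock a pipeline of one-cycle stages; return (outputs, cycles)."""
    buffers = [EMPTY] * len(stages)  # buffers[i] is the output buffer of stage i
    pending = list(inputs)
    outputs = []
    cycles = 0
    while pending or any(b is not EMPTY for b in buffers):
        # Update from the last stage backwards so that each stage reads
        # the value its predecessor produced in the previous cycle.
        for i in range(len(stages) - 1, 0, -1):
            prev = buffers[i - 1]
            buffers[i] = EMPTY if prev is EMPTY else stages[i](prev)
        buffers[0] = stages[0](pending.pop(0)) if pending else EMPTY
        # Empty the pipeline: take the finished item out of the last buffer.
        if buffers[-1] is not EMPTY:
            outputs.append(buffers[-1])
            buffers[-1] = EMPTY
        cycles += 1
    return outputs, cycles


stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
outputs, cycles = run_pipeline(stages, [1, 2, 3, 4])
unpipelined = len(stages) * 4  # each input would occupy all stages in turn
print(outputs, cycles, unpipelined)
```

Under these assumptions, n inputs through k stages take k + n - 1 cycles instead of n * k, although each individual input still needs k cycles to emerge.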
Pipelines may be synchronous or asynchronous. A synchronous
pipeline has a master clock and each stage must complete its
work within one cycle. The minimum clock period is thus
determined by the slowest stage. An asynchronous pipeline
requires handshaking between stages so that a new output is
not written to the interstage buffer before the previous one
has been used.
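Both styles can be sketched briefly. The example below assumes that a single-slot queue is an acceptable stand-in for an interstage buffer with handshaking: a stage blocks on writing until its successor has taken the previous value. The names min_clock_period, run_async_pipeline and DONE are hypothetical.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker


def min_clock_period(stage_delays):
    """Synchronous case: the clock can tick no faster than the slowest stage."""
    return max(stage_delays)


def stage_worker(func, inbox, outbox):
    while True:
        item = inbox.get()      # wait until the predecessor has written a value
        if item is DONE:
            outbox.put(DONE)
            return
        outbox.put(func(item))  # blocks while the previous output is unread


def run_async_pipeline(stages, inputs):
    """Asynchronous case: stages run independently, linked by 1-slot buffers."""
    buffers = [queue.Queue(maxsize=1) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=stage_worker, args=(f, buffers[i], buffers[i + 1]))
        for i, f in enumerate(stages)
    ]

    def feed():
        for x in inputs:
            buffers[0].put(x)
        buffers[0].put(DONE)

    workers.append(threading.Thread(target=feed))
    for t in workers:
        t.start()
    results = []
    while True:
        item = buffers[-1].get()
        if item is DONE:
            break
        results.append(item)
    for t in workers:
        t.join()
    return results


print(min_clock_period([2, 5, 3]))
print(run_async_pipeline([lambda x: x + 1, lambda x: x * 2], [1, 2, 3]))
```

The blocking put and get calls play the role of the handshake here; a hardware pipeline would use request and acknowledge signals instead.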

Many CPUs are arranged as one or more pipelines, with
different stages performing tasks such as fetch instruction,
decode instruction, fetch arguments, arithmetic operations,
store results. For maximum performance, these rely on a
continuous stream of instructions fetched from sequential
locations in memory. Pipelining is often combined with
instruction prefetch in an attempt to keep the pipeline busy.

When a branch is taken, the contents of early stages will
contain instructions from locations after the branch which
should not be executed. The pipeline then has to be flushed
and reloaded.
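The cost of a taken branch can be illustrated with a toy model. It assumes an unconditional jump that is only resolved in the final stage, and a program made of hypothetical ('op', name) and ('jump', target_index) tuples; real CPUs resolve branches earlier and often predict them. A backward jump would loop forever in this sketch.

```python
def run(program, depth=5):
    """Return (executed op names, cycles used, instructions flushed)."""
    stages = [None] * depth  # stages[0] is fetch, stages[-1] completes
    pc = 0
    executed = []
    flushed = 0
    cycles = 0
    while pc < len(program) or any(s is not None for s in stages):
        # Advance every instruction one stage and fetch the next
        # sequential instruction into the first stage.
        fetched = program[pc] if pc < len(program) else None
        stages = [fetched] + stages[:-1]
        if pc < len(program):
            pc += 1
        cycles += 1
        done = stages[-1]
        if done is None:
            continue
        kind, arg = done
        if kind == "jump":
            # Earlier stages hold instructions fetched from after the
            # branch; discard them and restart fetching at the target.
            flushed += sum(s is not None for s in stages[:-1])
            stages = [None] * depth
            pc = arg
        else:
            executed.append(arg)
            stages[-1] = None


    return executed, cycles, flushed


program = [("op", "a"), ("jump", 3), ("op", "b"), ("op", "c")]
print(run(program, depth=3))
```

In this model each taken branch discards up to depth - 1 partially processed instructions, which is why deeper pipelines pay more for a flush.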
(1996-10-13)