From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.arch
Subject: Re: SPARC Vs MIPS (really, embedded control usage)
Date: 22 Apr 1995 03:52:53 GMT
(catching up - most of the dust seems to have settled ... but there were some
potentially-misleading comments in the thread that need to be fixed):
1) INTRODUCTION
a) Most of the structure of the R2000 was designed in the Nov'84-
June '86 time period, and the first chips came back at the end of 1985.
b) There were certainly some features that might have been different,
given the short time. NMI is indeed one of them ... and I might have
liked a little more flexibility in not having fixed addresses
(some of which has since been fixed).
c) However, some features that people have sometimes complained about
as accidents or unintended omissions ... were done on purpose, or in
a few cases were conscious implementation tradeoffs given the limited
die area available.
2) MARKET
a) In 1994, there were 1.67M MIPS chips sold, of which 10% went
into computer systems, and the other 90% went into: avionics, laser
printers, copiers, communications boards, games, telephone switches
and lots of other things, and increasingly, consumer products.
b) This is nice, because it adds volume, and helps amortize the
development costs - many of these chips were derived from R3000s,
which after all, first appeared in an $80K rack-mount system in 1988.
c) Recall that MIPS had a somewhat different model than Sun:
do a design and then provide the design & verification information
to the various chip partners, and encourage them to modify the chip,
or use the information as desired to create new versions for
various markets. Such proliferation is especially necessary at the
lower price points, which are very sensitive to cost and features.
d) As a result, I think there are more different flavors of MIPS-
architecture chips being sold than any other RISC ... or if not,
there are at least enough of them that I lose track myself.
e) Anyway, this wasn't particularly accidental, even if it wasn't
always consistently handled ... and some thought was being given to
this from very early in the architecture's life, as detailed below.
3) SOME ASSUMPTIONS AND FEATURES
a) The architecture was required to be good for running UNIX ...
but not be UNIX-specific. In fact, it was a specific hope that
the same chips would actually see some use in very non-UNIX OS's like
telephone switches (which require low-overhead context-switching),
and high-end embedded applications. Low/mid-range applications would
require different chip variants, as actually happened.
b) Uncached usage was planned from day one, but only in certain ways.
1) Running code uncached/unmapped simplified the hardware
needed when coming out of reset: let the code do it,
then go cached/mapped when it felt like it.
2) The idea of running serious amounts of time-critical code
uncached was simply thought to be irrelevant, because the
speed was dragged down to memory speed. The whole instruction
set design assumed there would always be an I-cache.
c) The exception mechanism was actually spec'd/negotiated between
operating system people and the chip designers ... and (we) OS people
got most of what we asked for. We specifically didn't care much for the
typical exception vectors that we'd seen on many past CPUs, because we
*counted cycles*, not instructions; counting the latter can often
be a fallacy: people have often been misled by the fact that some
microcoded engine does *everything* ... so it looks simple, but
can take 100s of cycles to do it.
a) Each exception vectors to a different location.
b) But the locations are not very far apart, so they
don't have much code ... in fact, a very typical
sequence is shown in (e) below.
c) Hardware saves all the registers in some place ... and this
speed is determined by ability to store to memory ...
which in some microcoded CPUs, was possible to make go faster
than regular instructions ... whereas RISCs generally
expect that a series of store instructions found in the
I-cache will store data as fast as it can be done.
d) Hardware vectors the PC somewhere depending on the CAUSE;
a common thing would be to do this as:
VECTOR-BASE | CAUSE<<n, where n was fixed enough
to allow a small number of instructions at each vector
location.
e) The code at the vector targets would look like:
exception1: reg1 = 1
jump common
exception2: reg1 = 2
jump common
....
common:
manipulate state
set up overall kernel / C environment
reg1 = function-table[reg1]
jump (reg1)
f) Now, on a RISC machine, there's no great benefit to
adding a pile of special hardware that saves state:
1) All of the state is visible.
2) Normal stores can store as fast as anything else.
3) If there are a lot of exceptions, you'd expect the
exception-handling code to live in the I-cache.
(Yes, I understand the hard-real-time issues ... which
is why some chips have lockable cache segments.)
g) In addition, on a machine that *depends* on I-cache for
reasonable performance, and knowing that branches are always
bad for pipeline bubbles anyway, the *last* thing I want to do
is vector one place for a few instructions, and then go off to
common code anyway ... especially if the cache line size
(which may well be different for different implementations)
is larger than the set of instructions at the vector
locations.
h) Hence, for a UNIX or UNIX-like system, we preferred:
1) Hardware saves a CAUSE code, goes to the common
exception address, and arrives with interrupts
masked off ... all of which is simple and quick.
2) The OS saves the registers, then uses the CAUSE
register to vector off as it wishes.
3) Yes, there are a bunch of store instructions ...
but this was deemed trivial.
i) On the other hand, we considered the possibility of other
OS flavors, and observed that one could:
1) Save a few registers, just enough to work with.
2) Use the CAUSE code to vector off right away ...
3) To individual routines that could then save
a lot of state ... or do a little work, and return.
and that cycle-wise, especially with potential cache misses,
we didn't see why this wouldn't be competitive with multiple
vectors ... and it was certainly easier to implement.
d) Summary of this part: having lots of exception vector addresses
was left out on purpose, and with software involvement, and with
people counting *cycles*; the answer we got may or may not be right,
and there may well be circumstances where there are better ways,
but it certainly wasn't accidental.
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: mash@sgi.com
DDD: 415-390-3090 FAX: 415-967-8496
USPS: Silicon Graphics 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311