Home
Book Fire Online by http://bookfire.net Prev Page Prev Page
Table of Contents
Back Cover
The Essentials of Computer Organization and Architecture
Preface
To the Instructor
Chapter 1: Introduction
1.2 The Main Components of a Computer
1.3 An Example System - Wading through the Jargon
1.4 Standards Organizations
1.5 Historical Development
1.6 The Computer Level Hierarchy
1.7 The Von Neumann Model
1.8 Non-Von Neumann Models
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Chapter 2: Data Representation in Computer Systems
2.2 Positional Numbering Systems
2.3 Decimal To Binary Conversions
2.4 Signed Integer Representation
2.5 Floating-Point Representation
2.6 Character Codes
2.7 Codes For Data Recording And Transmission
2.8 Error Detection And Correction
Chapter Summary
Further Reading
References
Review Of Essential Terms And Concepts
Exercises
Chapter 3: Boolean Algebra and Digital Logic
3.2 Boolean Algebra
3.3 Logic Gates
3.4 Digital Components
3.5 Combinational Circuits
3.6 Sequential Circuits
3.7 Designing Circuits
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Focus on Karnaugh Maps
Chapter 4: MARIE : An Introduction to a Simple Computer
4.2 Marie
4.3 Instruction Processing
4.4 A Simple Program
4.5 A Discussion on Assemblers
4.6 Extending Our Instruction Set
4.7 A Discussion on Decoding — Hardwired vs. Microprogrammed Control
4.8 Real World Examples of Computer Architectures
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Chapter 5: A Closer Look at Instruction Set Architectures
5.2 Instruction Formats
5.3 Instruction Types
5.4 Addressing
5.5 Instruction-Level Pipelining
5.6 Real-World Examples of ISAs
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Chapter 6: Memory
6.2 Types of Memory
6.3 The Memory Hierarchy
6.4 Cache Memory
6.5 Virtual Memory
6.6 A Real-World Example of Memory Management
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Chapter 7: Input/Output and Storage Systems
7.2 Amdahl's Law
7.3 I/O Architectures
7.4 Magnetic Disk Technology
7.5 Optical Disks
7.6 Magnetic Tape
7.7 RAID
7.8 Data Compression
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Focus on Selected Disk Storage Implementations
Chapter 8: System Software
8.2 Operating Systems
8.3 Protected Environments
8.4 Programming Tools
8.5 Java — All of the Above
8.6 Database Software
8.7 Transaction Managers
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Chapter 9: Alternative Architectures
9.2 RISC Machines
9.3 Flynn's Taxonomy
9.4 Parallel and Multiprocessor Architectures
9.5 Alternative Parallel Processing Approaches
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Chapter 10: Performance Measurement and Analysis
10.2 The Basic Computer Performance Equation
10.3 Mathematical Preliminaries
10.4 Benchmarking
10.6 Disk Performance
Chapter Summary
Further Reading
References
Review Of Essential Terms And Concepts
Exercises
Chapter 11: Network Organization and Architecture
11.2 Early Business Computer Networks
11.3 Early Academic and Scientific Networks — The Roots and Architecture of the Internet
11.5 Network Protocols II — TCP/IP Network Architecture
11.6 Network Organization
11.7 High-Capacity Digital Links
11.8 A Look at the Internet
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Appendix A: Data Structures and the Computer
A.2 Fundamental Structures
A.3 Trees
A.4 Network Graphs
Summary
Further Reading
References
Exercises
Glossary
Glossary Numbers
Glossary A
Glossary B
Glossary C
Glossary D
Glossary E
Glossary F
Glossary G
Glossary H
Glossary I
Glossary J
Glossary K
Glossary L
Glossary M
Glossary N
Glossary O
Glossary P
Glossary Q
Glossary R
Glossary S
Glossary T
Glossary U
Glossary V
Glossary W
Glossary Z
Answers and Hints for Selected Exercises
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Appendix A
Index
Index A
Index B
Index C
Index D
Index E
Index F
Index G
Index H
Index I
Index J
Index K
Index L
Index M
Index N
Index O
Index P
Index Q
Index R
Index S
Index T
Index U
Index V
Index W
Index X
Index Z
List of Figures
List of Tables
List of Code Examples
List of Sidebars
Team LiB
Previous Section Next Section

4.8 Real World Examples of Computer Architectures

The MARIE architecture is designed to be as simple as possible so that the essential concepts of computer architecture would be easy to understand without being completely overwhelming. Although MARIE's architecture and assembly language are powerful enough to solve any problems that could be carried out on a modern architecture using a high-level language such as C++, Ada, or Java, you probably wouldn't be very happy with the inefficiency of the architecture or with how difficult the program would be to write and to debug! MARIE's performance could be significantly improved if more storage were incorporated into the CPU by adding more registers. Making things easier for the programmer is a different matter. For example, suppose a MARIE programmer wants to use procedures with parameters. Although MARIE allows for subroutines (programs can branch to various sections of code, execute the code, and then return), MARIE has no mechanism to support the passing of parameters. Programs can be written without parameters, but we know that using them not only makes the program more efficient (particularly in the area of reuse), but also makes the program easier to write and debug.

To allow for parameters, MARIE would need a stack, a data structure that maintains a list of items that can be accessed from only one end. A pile of plates in your kitchen cabinet is analogous to a stack: You put plates on the top and you take plates off the top (normally). For this reason, stacks are often called last-in-first-out structures. (Please see Appendix A at the end of this book for a brief overview of the various data structures.)

We can emulate a stack using certain portions of main memory if we restrict the way data is accessed. For example, if we assume memory locations 0000 through 00FF are used as a stack, and we treat 0000 as the top, then pushing (adding) onto the stack must be done from the top, and popping (removing) from the stack must be done from the top. If we push the value 2 onto the stack, it would be placed at location 0000. If we then push the value 6, it would be placed at location 0001. If we then performed a pop operation, the 6 would be removed. A stack pointer keeps track of the location to which items should be pushed or popped.

MARIE shares many features with modern architectures but is not an accurate depiction of them. In the next two sections, we introduce two contemporary computer architectures to better illustrate the features of modern architectures that, in an attempt to follow Leonardo da Vinci's advice, were excluded from MARIE. We begin with the Intel architecture (the x86 and the Pentium families) and then follow with the MIPS architecture. We chose these architectures because, although they are similar in some respects, they are built on fundamentally different philosophies. Each member of the x86 family of Intel architectures is known as a CISC (Complex Instruction Set Computer) machine, whereas the Pentium family and the MIPS architectures are examples of RISC (Reduced Instruction Set Computer) machines.

CISC machines have a large number of instructions, of variable length, with complex layouts. Many of these instructions are quite complicated, performing multiple operations when a single instruction is executed (e.g., it is possible to do loops using a single assembly language instruction). The basic problem with CISC machines is that a small subset of complex CISC instructions slows the systems down considerably. Designers decided to return to a less complicated architecture and to hardwire a small (but complete) instruction set that would execute extremely quickly. This meant it would be the compiler's responsibility to produce efficient code for the ISA. Machines utilizing this philosophy are called RISC machines.

RISC is something of a misnomer. It is true that the number of instructions is reduced. However, the main objective of RISC machines is to simplify instructions so they can execute more quickly. Each instruction performs only one operation, they are all the same size, they have only a few different layouts, and all arithmetic operations must be performed between registers (data in memory cannot be used as operands). Virtually all new instruction sets (for any architectures) since 1982 have been RISC, or some sort of combination of CISC and RISC. We cover CISC and RISC in detail in Chapter 9.

4.8.1 Intel Architectures

The Intel Corporation has produced many different architectures, some of which may be familiar to you. Intel's first popular chip, the 8086, was introduced in 1979 and used in the IBM PC computer. It handled 16-bit data and worked with 20-bit addresses, thus it could address a million bytes of memory. (A close cousin of the 8086, the 8-bit 8088, was used in many PCs to lower the cost.) The 8086 CPU was split into two parts: the execution unit, which included the general registers and the ALU, and the bus interface unit, which included the instruction queue, the segment registers, and the instruction pointer.

The 8086 had four 16-bit general purpose registers named AX (the primary accumulator), BX (the base register used to extend addressing), CX (the count register), and DX (the data register). Each of these registers was divided into two pieces: the most significant half was designated the "high" half (denoted by AH, BH, CH, and DH), and the least significant was designated the "low" half (denoted by AL, BL, CL, and DL). Various 8086 instructions required the use of a specific register, but the registers could be used for other purposes as well. The 8086 also had three pointer registers: the stack pointer (SP), which was used as an offset into the stack; the base pointer (BP), which was used to reference parameters pushed onto the stack; and the instruction pointer (IP), which held the address of the next instruction (similar to MARIE's PC). There were also two index registers: the SI (source index) register, used as a source pointer for string operations, and the DI (destination index) register, used as a destination pointer for string operations. The 8086 also had a status flags register. Individual bits in this register indicated various conditions, such as overflow, parity, carry interrupt, and so on.

An 8086 assembly language program was divided into different segments, special blocks or areas to hold specific types of information. There was a code segment (for holding the program), a data segment (for holding the program's data), and a stack segment (for holding the program's stack). To access information in any of these segments, it was necessary to specify that item's offset from the beginning of the corresponding segment. Therefore, segment pointers were necessary to store the addresses of the segments. These registers included the code segment (CS) register, the data segment (DS) register, and the stack segment (SS) register. There was also a fourth segment register, called the extra segment (ES) register, which was used by some string operations to handle memory addressing. Addresses were specified using segment/offset addressing in the form: xxx:yyy, where xxx was the value in the segment register and yyy was the offset.

In 1980, Intel introduced the 8087, which added floating-point instructions to the 8086 machine set as well as an 80-bit wide stack. Many new chips were introduced that used essentially the same ISA as the 8086, including the 80286 in 1982 (which could address 16 million bytes) and the 80386 in 1985 (which could address up to 4 billion bytes of memory). The 80386 was a 32-bit chip, the first in a family of chips often called IA-32 (for Intel Architecture, 32-bit). When Intel moved from the 16-bit 80286 to the 32-bit 80386, designers wanted these architectures to be backward compatible, which means that programs written for a less powerful and older processor should run on the newer, faster processors. For example, programs that ran on the 80286 should also run on the 80386. Therefore, Intel kept the same basic architecture and register sets. (New features were added to each successive model, so forward compatibility was not guaranteed.)

The naming convention used in the 80386 for the registers, which had gone from 16 to 32 bits, was to include an "E" prefix (which stood for "extended"). So instead of AX, BX, CX, and DX, the registers became EAX, EBX, ECX, and EDX. This same convention was used for all other registers. However, the programmer could still access the original registers, AX, AL, and AH, for example, using the original names. Figure 4.16 illustrates how this worked, using the AX register as an example.


Figure 4.16: The EAX Register, Broken into Parts

The 80386 and 80486 were both 32-bit machines, with 32-bit data buses. The 80486 added a high-speed cache memory (see Chapter 6 for more details on cache and memory), which improved performance significantly.

The Pentium series (Intel changed the name from numbers such as 80486 to "Pentium" because it was unable to trademark the numbers) started with the Pentium processor, which had 32-bit registers and a 64-bit data bus and employed a superscalar design. This means the CPU had multiple ALUs and could issue more than one instruction per clock cycle (i.e., run instructions in parallel). The Pentium Pro added branch prediction, while the Pentium II added MMX technology (which most will agree was not a huge success) to deal with multimedia. The Pentium III added increased support for 3D graphics (using floating point instructions). Historically, Intel used a classic CISC approach throughout its processor series. The more recent Pentium II and III used a combined approach, employing CISC architectures with RISC cores that could translate from CISC to RISC instructions. Intel was conforming to the current trend by moving away from CISC and toward RISC.

The seventh generation family of Intel CPUs introduced the Intel Pentium 4 (P4) processor. This processor differs from its predecessors in several ways, many of which are beyond the scope of this text. Suffice it to say that the Pentium 4 processor has clock rates of 1.4GHz (and higher), uses no less than 42 million transistors for the CPU, and implements something called a "Netburst" microarchitecture. (The processors in the Pentium family, up to this point, had all been based on the same microarchitecture, a term used to describe the architecture below the instruction set.) This new microarchitecture is composed of several innovative technologies, including a hyper-pipeline (we cover pipelines in Chapter 5), a 400MHz (and faster) system bus, and many refinements to cache memory and floating-point operations. This has made the P4 an extremely useful processor for multimedia applications.

The introduction of the Itanium processor in 2001 marked Intel's first 64-bit chip (IA-64). Itanium includes a register-based programming language and a very rich instruction set. It also employs a hardware emulator to maintain backward compatibility with IA-32/x86 instruction sets. This processor has 4 integer units, 2 floating point units, a significant amount of cache memory at 4 different levels (we study cache levels in Chapter 6), 128 floating point registers, 128 integer registers, and multiple miscellaneous registers for dealing with efficient loading of instructions in branching situations. Itanium can address up to 16GB of main memory.

The assembly language of an architecture reveals significant information about that architecture. To compare MARIE's architecture to Intel's architecture, let's return to Example 4.1, the MARIE program that used a loop to add five numbers. Let's rewrite the program in x86 assembly language, as seen in Example 4.4. Note the addition of a Data segment directive and a Code segment directive.

Listing 4.4:
Start example

A program using a loop to add five numbers written to run on a Pentium.

    .DATA
Num1    EQU    10    ; Num1 is initialized to 10
    EQU    15    ; Each word following Num1 is initialized
    EQU    20
    EQU    25
    EQU    30
Num    DB    5    ; Initialize the loop counter
Sum    DB    0    ; Initialize the Sum
 
    .CODE
    LEA    EBX, Num1    ; Load the address of Num1 into EBX
    MOV    ECX, Num    ; Set the loop counter
    MOV    EAX, 0    ; Initialize the sum
    MOV    EDI, 0    ; Initialize the offset (of which number to add)
Start:    ADD    EAX, [EBX+EDI *4]    ; Add the EBXth number to EAX
    INC    EDI    ; Increment the offset by 1
    DEC    ECX    ; Decrement the loop counter by 1
    JG    Start    ; If counter is greater than 0, return to Start
    MOV    Sum, EAX    ; Store the result in Sum

We can make the above program easier to read (which also makes it look less like MARIE's assembly language) by using the loop statement. Syntactically, the loop instruction resembles a jump instruction, in that it requires a label. The above loop can be rewritten as follows:

    MOV    ECX, Num    ; Set the counter
Start: ADD    EAX, [EBX + EDI + 4]
    INC    EDI
    LOOP Start
    MOV    Sum, EAX
End example

The loop statement in x86 assembly is similar to the do...while construct in C, C++, or Java. The difference is that there is no explicit loop variable-the ECX register is assumed to hold the loop counter. Upon execution of the loop instruction, the processor decreases ECX by one, and then tests ECX to see if it is equal to zero. If it is not zero, control jumps to Start; if it is zero, the loop terminates. The loop statement is an example of the types of instructions that can be added to make the programmer's job easier, but which aren't necessary for getting the job done.

4.8.2 MIPS Architectures

The MIPS family of CPUs has been one of the most successful and flexible designs of its class. The MIPS R3000, R4000, R5000, R8000, and R10000 are some of the many registered trademarks belonging to MIPS Technologies, Inc. MIPS chips are used in embedded systems, in addition to computers (such as Silicon Graphics machines) and various computerized toys (Nintendo and Sony use the MIPS CPU in many of their products). Cisco, a very successful manufacturer of Internet routers, uses MIPS CPUs as well.

The first MIPS ISA was MIPS I, followed by MIPS II through MIPS V. The current ISAs are referred to as MIPS32 (for the 32-bit architecture) and MIPS64 (for the 64-bit architecture). Our discussion in this section is focused on MIPS32. It is important to note that MIPS Technologies made a decision similar to that of Intel-as the ISA evolved, backward compatibility was maintained. And like Intel, each new version of the ISA included operations and instructions to improve efficiency and handle floating point values. The new MIPS32 and MIPS64 architectures have significant improvements in VLSI technology and CPU organization. The end result is notable cost and performance benefits over traditional architectures.

Like IA-32 and IA-64, the MIPS ISA embodies a rich set of instructions, including arithmetic, logical, comparison, data transfer, branching, jumping, shifting, and multimedia instructions. MIPS is a load/store architecture, which means that all instructions (other than the load and store instructions) must use registers as operands (no memory operands are allowed). MIPS32 has 168 32-bit instructions, but many are similar. For example, there are six different add instructions, all of which add numbers, but they vary in the operands and registers used. This idea of having multiple instructions for the same operation is common in assembly language instruction sets. Another common instruction is the MIPS NOP (no-op) instruction, which does nothing except eat up time (NOPs are used in pipelining as we see in Chapter 5).

The CPU in a MIPS32 architecture has 32 32-bit general purpose registers numbered r0 through r31. (Two of these have special functions: r0 is hard-wired to a value of 0 and r31 is the default register for use with certain instructions, which means it does not have to be specified in the instruction itself.) In MIPS assembly, these 32 general purpose registers are designated $0, $1, . . . , $31. Register 1 is reserved, and registers 26 and 27 are used by the operating system kernel. Registers 28, 29, and 30 are pointer registers. The remaining registers can be referred to by number, using the naming convention shown in Table 4.8. For example, you can refer to register 8 as $8 or as $t0.

Table 4.8: MIPS32 Register Naming Convention

Naming Convention

Register Number

Value Put in Register

$v0-$v1

2-3

Results, expressions

$a0-$a3

4-7

Arguments

$t0-$t7

8-15

Temporary values

$s0-$s7

16-23

Saved values

$t8-$t9

24-25

More temporary values

There are two special purpose registers, HI and LO, which hold the results of certain integer operations. Of course, there is a PC (program counter) register as well, giving a total of three special purpose registers.

MIPS32 has 32 32-bit floating point registers that can be used in single-precision floating-point operations (with double-precision values being stored in even-odd pairs of these registers). There are 4 special-purpose floating-point control registers for use by the floating-point unit.

Let's continue our comparison by writing the programs from Examples 4.1 and 4.4 in MIPS32 assembly language.

Listing 4.5:
Start example
    . . .
        .data
    # $t0 = sum
    # $t1 = loop counter Ctr
Value:    .word 10, 15,20,25,30
    Sum = 0
    Ctr = 5
    .text
    .global main    # declaration of main as a global variable
main:    lw $t0, Sum    # Initialize register containing sum to zero
    lw $t1, Ctr    # Copy Ctr value to register
    la $t2, value    # $t2 is a pointer to current value
while:    blez #t1, end_while    # Done with loop if counter <= 0
    lw #t3, 0($t2)    # Load value offset of 0 from pointer
    add $t0, $t0, $t3    # Add value to sum
    addi $t2, $t2, 4    # Go to next data value
    sub #t1, $t1, 1    # Decrement Ctr
    b while        # Return to top of loop
    la $t4, sum    # Load the address of sum into register
    sw $t0, 0($t4)    # Write the sum into memory location sum
    . . .
End example

This is similar to the Intel code in that the loop counter is copied into a register, decremented during each interation of the loop, and then checked to see if it is less than or equal to zero. The register names may look formidable, but they are actually easy to work with once you understand the naming conventions.

If you are interested in writing MIPS programs, but don't have a MIPS machine, there are several simulators that you can use. The most popular is SPIM, a self-contained simulator for running MIPS R2000/R3000 assembly language programs. SPIM provides a simple debugger and implements almost the entire set of MIPS assembly instructions. The SPIM package includes source code and a full set of documentation. It is available for many flavors of Unix (including Linux), Windows (PC), and Windows (DOS), as well as Macintosh. For further information, see the references at the end of this chapter.

If you examine Examples 4.1, 4.4, and 4.5, you will see that the instructions are quite similar. Registers are referenced in different ways and have different names, but the underlying operations are basically the same. Some assembly languages have larger instructions sets, allowing the programmer more choices for coding various algorithms. But, as we have seen with MARIE, a large instruction set is not absolutely necessary to get the job done.


Team LiB
Previous Section Next Section
Linking to Www Google.Com. Host by Book Fire