Computer System Architecture (PART 2)


Combinational Logic Applications

Adders
Most mathematical operations can be handled with addition.
For example,
subtraction can be performed by taking the
two's complement of a binary value, and then adding it to the
binary value from which it was to be subtracted. Two numbers
can be multiplied using multiple additions. Counting either up
or down (incrementing or decrementing) can be performed with
additions of 1 or -1.

Chapter 2 showed that binary addition is performed just like
decimal addition, the only difference being that decimal has 10
numerals while binary has 2. When adding two digits in binary,
a result greater than one cannot be represented in a single bit
position; instead, a carry is generated into the next position. For
example, adding 1 and 1 produces a sum of 0 with a carry of 1 to the
next position.

img

A well-defined process such as this is easily realized with digital
logic. Image below shows the block diagram of a system that takes
two binary inputs, A and B, and adds them together producing a bit
for the sum and a bit indicating whether or not a carry occurred.
This well-known circuit is commonly referred to as a half-adder.

img

With two inputs, there are four possible patterns of ones and zeros.

img

A truth table can be derived from the image above from which the
Boolean expressions can be developed to realize this system.

img

The simplicity of a two-input truth table makes the use of a
Karnaugh map unnecessary. Examining the Sum column shows
that we should have an output of one when A=0 and B=1 and
when A=1 and B=0.

This gives us the following SOP expression:

img

Note that the output Sum is also equivalent to the 2-input XOR
gate. For Carryout, the output equals 1 only when both A and B
are equal to one. This matches the operation of the AND gate.

img
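
To make the half-adder behavior concrete, here is a minimal C sketch; the function name half_adder and the test loop are illustrative, not from the text.

#include <stdio.h>

/* Half adder: the Sum output is the XOR of the inputs and the Carry
   output is the AND of the inputs. */
void half_adder(int a, int b, int *sum, int *carry)
{
    *sum   = a ^ b;    /* Sum = A XOR B */
    *carry = a & b;    /* Carry = A AND B */
}

int main(void)
{
    /* Print all four input patterns, reproducing the truth table. */
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++) {
            int s, c;
            half_adder(a, b, &s, &c);
            printf("A=%d B=%d -> Sum=%d Carry=%d\n", a, b, s, c);
        }
    return 0;
}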

Image below presents the logic circuit for the half-adder.

img

The half-adder works fine if we're trying to add two bits together,
a situation that typically occurs only in the rightmost column of a
multibit addition. The remaining columns have the potential of
adding a third bit, the carry from a previous column.

img

For example, assume we want to add two four-bit numbers,
A = 0110₂ and B = 1011₂. The addition would go something like
that shown in the image above.

Adding the least significant bits of a multi-bit value uses the
half-adder described above. Each input to the half-adder takes one
of the least significant bits from each number. The outputs are the
least significant digit of the sum and a possible carry to the next
column.

What is needed for the remaining columns is an adder similar to
the Half-adder that can add two bits along with a carry from the
previous column to produce a Sum and the Carryout to the next
column.

Image below represents this operation where Aₙ is the bit in the
nth position of A, Bₙ is the bit in the nth position of B, and Sₙ is the
bit in the nth position in the resulting sum, S.

Notice that a Carryout from the addition of a pair of bits goes into
the carry input of the adder for the next bit. We will call the input
Carryin. This implies that we need to create a circuit that can add
three bits, Aₙ, Bₙ, and Carryin from the n-1 position. This adder has
two outputs, the sum and the Carryout to the n+1 position. The
resulting circuit is called a full adder. A block diagram of the full
adder is shown in the image below.

img

img

With three inputs there are 2³ = 8 possible patterns of ones and
zeros that could be input to our full adder. The table below lists these
combinations along with the results of their addition, which range
from 0 to 3₁₀.

img

The two-digit binary result in the last column of this table can
be broken into its components, the sum and a carry to the next
bit position. This gives us two truth tables, one for the Sum and one
for the Carryout.

Sum and Carryout Truth Tables for a Full Adder

img

With three inputs, a Karnaugh map can be used to create the
logic expressions. One Karnaugh map will be needed for each
output of the circuit. Image below presents the Karnaugh maps for
the Sum and the Carryout outputs of our full adder where Cin
represents the Carryin input.

img

The Carryout Karnaugh map has three rectangles, each containing
two cells and all three overlapping on the cell defined by A=1, B=1,
and Cin=1. The Karnaugh map for the Sum is less promising. In
fact, there is no way to make a more complex 3-input Karnaugh
map than the one that exists for the Sum output of the full adder.

The addition or removal of a '1' in any cell of the map will result
in a simpler expression. The four single-cell rectangles result in
the four products of the SOP expression for the Sum output
shown following the Carryout expression.

img

img

Image below presents the circuit for the full adder:

img

Now we have the building blocks to create an adder of any size.
For example,
a 16-bit adder is made by using a half adder for the
least significant bit followed by fifteen full adders daisy-chained
through their carries for the remaining fifteen bits.
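
As a rough illustration of this chaining, the C sketch below models one full-adder stage and daisy-chains sixteen of them; the names full_adder and ripple_add16 are illustrative, and bit 0 is simply given a carry-in of 0, which makes the first stage behave like the half adder described above.

#include <stdio.h>
#include <stdint.h>

/* One full-adder stage: Sum = A XOR B XOR Cin,
   Cout = AB + ACin + BCin. */
static void full_adder(int a, int b, int cin, int *sum, int *cout)
{
    *sum  = a ^ b ^ cin;
    *cout = (a & b) | (a & cin) | (b & cin);
}

/* 16-bit ripple-carry adder: each stage consumes the carry produced
   by the stage below it. */
uint16_t ripple_add16(uint16_t a, uint16_t b, int *carry_out)
{
    uint16_t result = 0;
    int carry = 0;                       /* carry-in of 0 for bit 0 */
    for (int n = 0; n < 16; n++) {
        int s;
        full_adder((a >> n) & 1, (b >> n) & 1, carry, &s, &carry);
        result |= (uint16_t)s << n;
    }
    *carry_out = carry;
    return result;
}

int main(void)
{
    int c;
    uint16_t s = ripple_add16(0x0006, 0x000B, &c);   /* 0110 + 1011 */
    printf("sum = 0x%04X, carry out = %d\n", s, c);  /* prints 0x0011, 0 */
    return 0;
}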

This method of creating adders has a slight drawback, however.
Just as with the addition of binary numbers on paper, the sum of
the higher-order bits cannot be determined until the carry from
the lower-order bits has been calculated and propagated through
the higher stages.

Modern adders use additional logic to predict whether the
higher-order bits should expect a carry or not well before the sum
of the lower-order bits is calculated. These adders are called carry
look-ahead adders.


Multiplexers
A multiplexer, sometimes referred to as a MUX, is a device that uses a set of control inputs to select which of several data inputs is to be connected to a single data output. With n binary "select lines," one of 2ⁿ data inputs can be connected to the output.

Image below presents a block diagram of a multiplexer with three select lines, S2, S1, and S0, and eight data lines, D0 through D7.

A multiplexer can also be used to divide one high-speed communication circuit into several lower-speed circuits (primarily to save on communication line costs), allowing many devices to share the circuit simultaneously.

img

A multiplexer acts like a television channel selector. All of the stations are broadcast constantly to the television's input, but only the channel that has been selected is displayed. As for the eight-channel multiplexer in image above, its operation can be described with the truth table shown below,

img

For example, if the selector inputs are set to S2 = 0, S1 = 1, and S0 = 1, then the data present at D3 will be output to Y. If D3 = 0, then Y will output a 0. The number of data inputs depends on the number of selector inputs.

For example, if there is only one selector line, S0, then there can only be two data inputs D0 and D1. When S0 equals zero, D0 is routed to the output. When S0 equals one, D1 is routed to the output. Two selector lines, S1 and S0, allow for four data inputs, D0, D1, D2, and D3.
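
The selection behavior can be sketched in C as follows; mux8 is an illustrative name, and the three select bits simply form the index of the chosen data input.

/* 8-to-1 multiplexer: S2, S1, and S0 form a binary number from 0 to 7
   that picks which of the eight data inputs drives the output Y. */
int mux8(const int d[8], int s2, int s1, int s0)
{
    int select = (s2 << 2) | (s1 << 1) | s0;
    return d[select];
}

With the array d holding the eight data inputs, mux8(d, 0, 1, 1) returns d[3], matching the truth-table row described above.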


De-Multiplexers
The previous section described how multiplexers select one channel from a group of input channels to be sent to a single output. Demultiplexers take a single input and select one channel out of a group of output channels to which it will route the input. It's like having multiple printers connected to a computer.

A document can only be printed to one of the printers, so the computer selects one out of the group of printers to which it will send its output. The design of a demultiplexer is much like the design of a decoder.

The decoder selected one of many outputs to which it would send a zero. The difference is that the demultiplexer sends data to that output rather than a zero. The circuit of a demultiplexer is based on the non-active-low (active-high) decoder, where each output is connected to an AND gate.

An input is added to each of the AND gates that will contain the demultiplexer's data input. If the data input equals one, then the output of the AND gate that is selected by the selector inputs will be a one.

If the data input equals zero, then the output of the selected AND gate will be zero. Meanwhile, all of the other AND gates output a zero, i.e., no data is passed to them. Image below presents a demultiplexer circuit with two selector inputs

img

In effect, the select lines, S0, S1, … Sn, "turn on" a specific AND gate that passes the data through to the selected output. In image above, if S1=0 and S0=1, then the D1 output will match the input from the Data line and outputs D0, D2, and D3 will be forced to have an output of zero.

If S1=0, S0=1, and Data=0, then D1=0. If S1=0, S0=1, and Data=1, then D1=1. Image below presents the truth table for the 1-line-to-4-line demultiplexer shown in image above

img
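
A behavioral C sketch of the 1-line-to-4-line demultiplexer might look like this; demux4 is an illustrative name.

/* 1-to-4 demultiplexer: the output selected by S1 and S0 follows the
   Data input while the other three outputs are forced to zero. */
void demux4(int data, int s1, int s0, int d[4])
{
    int select = (s1 << 1) | s0;
    for (int i = 0; i < 4; i++)
        d[i] = (i == select) ? data : 0;
}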


Decoders & Encoders
Encoders

An encoder is a device used to change a signal (such as a bitstream) or data into a code. The code may serve any of a number of purposes such as compressing information for transmission or storage, encrypting or adding redundancies to the input code, or translating from one code to another. This is usually done by means of a programmed algorithm, especially if any part is digital, while most analog encoding is done with analog circuitry.

Decoders

A decoder is a device which does the reverse of an encoder, undoing the encoding so that the original information can be retrieved. The same method used to encode is usually just reversed in order to decode.

In digital electronics this would mean that a decoder is a multiple-input, multiple-output logic circuit that converts coded inputs into coded outputs, where the input and output codes are different, e.g., n-to-2ⁿ decoders and BCD decoders.

Enable inputs must be on for the decoder to function; otherwise its outputs assume a single "disabled" output code word. Decoding is necessary in applications such as data multiplexing, 7-segment displays, and memory address decoding.

The simplest decoder circuit would be an AND gate because the output of an AND gate is "High" (1) only when all its inputs are "High". A slightly more complex decoder would be the n-to-2ⁿ type binary decoders. These types of decoders are combinational circuits that convert binary information from 'n' coded inputs to a maximum of 2ⁿ unique outputs.

We say a maximum of 2ⁿ outputs because in case the 'n' bit coded information has unused bit combinations, the decoder may have fewer than 2ⁿ outputs. We can have 2-to-4 decoders, 3-to-8 decoders, or 4-to-16 decoders. We can form a 3-to-8 decoder from two 2-to-4 decoders (with enable signals).

One application where digital signals are used to enable a device is to identify the unique conditions to enable an operation. For example, the magnetron in a microwave is enabled only when the timer is running and the start button is pushed and the oven door is closed.

This method of enabling a device based on the condition of a number of inputs is common in digital circuits. One common application is in the processor’s interface to memory. It is used to determine which memory device will contain a piece of data.

In the microwave example, the sentence used to describe the enabling of the magnetron joined each of the inputs with the word "and". Therefore, the enabling circuit for the magnetron should be realized with an AND gate as shown in image below

img

Decoder circuits are a group of enable circuits that have an individual output that satisfies each row of the truth table. In other words, a decoder has a unique output for each combination of ones and zeros possible at its inputs.

For example, a 2-input decoder circuit with inputs A and B can have an output that is 1 only when A=0 and B=0, an output that is 1 only when A=0 and B=1, an output that is 1 only when A=1 and B=0, and an output that is 1 only when A=1 and B=1.

The Boolean expressions that satisfy this decoder circuit are:

img

This two-input circuit is called a 1-of-4 decoder due to the fact that exactly one of its four outputs will be enabled at any one time. A change at any of the inputs will change which output is enabled, but never change the fact that only one is enabled. As for the logic circuit, it has four AND gates, one satisfying each of the above Boolean expressions.

Image below presents this digital circuit.

img
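
A minimal C sketch of the 1-of-4 decoder's behavior, one output per Boolean expression above, assuming the inputs are 0 or 1; decode2to4 is an illustrative name.

/* 1-of-4 decoder: exactly one output is 1 for each input pattern. */
void decode2to4(int a, int b, int d[4])
{
    d[0] = (!a) & (!b);   /* A=0, B=0 */
    d[1] = (!a) &   b;    /* A=0, B=1 */
    d[2] =   a  & (!b);   /* A=1, B=0 */
    d[3] =   a  &   b;    /* A=1, B=1 */
}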


IC (Integrated Circuit)

It may appear that much of our discussion up to this point
has been theoretical, but in reality, each of the circuits we've
presented can easily be implemented given the right tools.

Prototypes used to test or verify circuit designs can be made
by wiring together small plastic chips that offer access to the
internal components through thin metal pins. These chips,
called integrated circuits (ICs), come in a wide variety of shapes,
sizes, and pin configurations. Image below presents a sample
of some ICs.

img

Connecting the metal pins of these chips with other metal pins
from the same chip or additional chips is what allows us to
create digital circuits. As for what we are connecting to them,
the metal pins of the ICs allow us access to the internal circuitry
such as the inputs and outputs of logic gates. Detailed
information is available for all ICs from the manufacturer
allowing designers to understand the internal circuitry.

The documentation defining the purpose of each pin of the IC
is usually referred to as the IC's "pin-out description." It provides
information not only on the digital circuitry, but also any power
requirements needed to operate the IC.

Image below presents an example of the pin-out of a quad
dual-input NAND gate chip, commonly referred to as a 7400.


img

Note that the pins are numbered. In order to properly use one
of these ICs, you must be able to identify the pin numbers. To
help you do this, the manufacturers identify the first pin,
referred to as "pin 1", on every IC. The image below presents
some of the ways this pin is identified.

img

The pins are then numbered counter-clockwise around the
chip. You can see this in the numbering of the pins in image
5.5.3. Many circuits are then built and tested using prototype
boards or protoboards. A protoboard is a long, thin plastic
board with small holes in it that allow ICs and short wire leads
to be plugged in. A generic protoboard is shown in image 5.5.4.

img

The protoboard allows the user to insert an IC so that it
straddles the gap running along the center of the board. Wires
can then be used to connect the pins to other sockets on the
protoboard. The rows on the top and bottom edges of the board
in Image 5.5.5 are used to connect power (Vcc) and ground (GND)
to the IC. The image below shows a sample circuit with two chips
wired together.

img

The next step is to add input and output that will allow us to
communicate with our circuit. The simplest output from a
digital circuit is an LED. Image 5.5.6 presents the schematic
symbol of an LED.

img

An LED will turn on only when a small current passes through
it from node A to node B. No light will appear if there is no
current or if the current tries to flow in the opposite direction.
By the way, if your LED doesn't work like you think it should, try
to turn it around.

There are two things to note here. First, the current must be
very small. In order to keep the current small enough to protect
the LED, we need an electronic device called a resistor. This
resistor is placed in series with the LED to limit the current. If
you forget the resistor, you will hear a small pop and smell an
awful burning odor when you power up your circuit. Image
5.5.7 shows a typical LED circuit.

Second, it is important to note that the LED will turn on only when the
output from the IC equals zero. This is the best way to drive an
LED. It keeps the ICs from having to supply too much current.

The simplest input to a digital circuit is a switch. It seems that
the logical way to connect a switch to a digital circuit would be
to connect it so that it toggles between a direct connection to a
logic 1 and a direct connection to a logic 0.

Switching back and forth between these connections should
produce binary 1's and 0's, right? Due to the electronics behind
IC inputs, this is not the case. Instead, connections to positive
voltages are made through resistors called pullup resistors.

This protects the IC by limiting the current flowing into it
while still providing a positive voltage that can be read as a
logic one. Image 5.5.8 presents a generic switch design for a
single input to a digital circuit. It uses a pull-up resistor
connected to 5 volts which represents the circuit's power
source.

img

img



Binary Operation Applications

Bitwise Operations
Most software performs data manipulation using mathematical operations such as multiplication or addition. Some applications, however, may require the examination or manipulation of data at the bit level. For example, what might be the fastest way to determine whether an integer is odd or even?

The method most of us are usually taught to distinguish odd and even values is to divide the integer by two discarding any remainder then multiply the result by two and compare it with the original value.

If the two values are equal, the original value was even because a division by two would not have created a remainder. Inequality, however, would indicate that the original value was odd. Below is an if-statement in the programming language C that would have performed this check.

if (((iVal / 2) * 2) == iVal)
{
    // This code is executed for even values
}
else
{
    // This code is executed for odd values
}

Clearing/Masking Bits

Clearing individual bits, also known as bit masking, uses the bitwise AND to clear specific bits while leaving the other bits untouched. The mask that is used will have ones in the bit positions that are to be left alone while zeros are in the bit positions that need to be cleared.

This operation is most commonly used when we want to isolate a bit or a group of bits. It is the perfect operation for distinguishing odd and even numbers, where we want to see how the LSB is set and ignore the remaining bits. The bitwise AND can be used to clear all of the bits except the LSB. The mask we want to use will have a one in the LSB and zeros in all of the other positions.

In image 6.1.1, the results of three bitwise ANDs are given, two for odd numbers and one for an even number. By ANDing a binary mask of 00000001₂, the odd numbers have a non-zero result while the even number has a zero result. This shows that by using a bitwise AND with a mask of 00000001₂, we can distinguish an odd integer from an even integer.

Since bitwise operations are one of the fastest operations that can be performed on a processor, it is the preferred method. In fact, if we use this bitwise AND to distinguish odd and even numbers on a typical processor, it can be twice as fast as doing the same process with a right shift followed by a left shift and over ten times faster than using a divide followed by a multiply.

img

Below is an if-statement in the programming language C that uses a bitwise AND to distinguish odd and even numbers.

if (!(iVal & 0x01))
{
    // This code is executed for even values
}
else
{
    // This code is executed for odd values
}

The bitwise AND can also be used to clear specific bits. For example, assume we want to separate the nibbles of a byte into two different variables. The following process can be used to do this:

• Copy the original value to the variable meant to store the lower nibble, then clear all but the lower four bits

• Copy the original value to the variable meant to store the upper nibble, then shift the value four bits to the right and clear all but the lower four bits.

This process is demonstrated below using the byte 01101101₂.

img

The following C code will perform these operations.

lower_nibble = iVal & 0x0f;
upper_nibble = (iVal>>4) & 0x0f;


Parity
One of the most primitive forms of error detection is to add a single bit called a parity bit to each piece of data to indicate whether the data has an odd or even number of ones. It is considered a poor method of error detection as it sometimes doesn't detect multiple errors. When combined with other methods of error detection, however, it can improve their overall performance.

There are two primary types of parity: odd and even. Even parity means that the sum of the ones in the data element and the parity bit is an even number. With odd parity, the sum of ones in the data element and the parity bit is an odd number. When designing a digital system that uses parity, the designers decide in advance which type of parity they will be using.

Assume that a system uses even parity. If an error has occurred and one of the bits in either the data element or the parity bit has been inverted, then counting the number of ones results in an odd number.

From the information available, the digital system cannot determine which bit was inverted or even if only one bit was inverted. It can only tell that an error has occurred.
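
Assuming even parity and 8-bit data, the C sketch below shows how the parity bit could be generated and how a receiver could flag an error; the function names are illustrative.

/* Even parity: the parity bit is chosen so that the total number of
   ones in the data plus the parity bit is even. */
int even_parity_bit(unsigned char data)
{
    int ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (data >> i) & 1;
    return ones & 1;              /* 1 when the data has an odd count */
}

/* Receiver side: a nonzero return means an odd number of bits was
   inverted somewhere in the data or the parity bit. */
int parity_error(unsigned char data, int parity_bit)
{
    return even_parity_bit(data) != parity_bit;
}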

One of the primary problems with parity is that if two bits are inverted, the parity bit appears to be correct, i.e., it indicates that the data is error free. Parity can only detect an odd number of bit errors.

Some systems use a parity bit with each piece of data in memory. If a parity error occurs, the computer will generate a non-maskable interrupt, a condition where the operating system immediately discontinues the execution of the questionable application.


Checksum
For digital systems that store or transfer multiple pieces of data in blocks, an additional data element is typically added to each block to provide error detection for the block.

This method of error detection is common, especially for the transmission of data across networks. One of the simplest implementations of this error detection scheme is the checksum.

As a device transmits data, it takes the sum of all of the data elements it is transmitting to create an aggregate sum. This sum is called the datasum. The overflow carries generated by the additions are either discarded or added back into the datasum.

The transmitting device then sends a form of this datasum appended to the end of the block. This new form of the datasum is called the checksum.

As the data elements are received, they are added a second time in order to recreate the datasum. Once all of the data elements have been received, the receiving device compares its calculated datasum with the checksum sent by the transmitting device.

The data is considered error free if the receiving device's datasum compares favorably with the transmitted checksum. Image below presents a sample data block and the datasums generated both by discarding the two carries and by adding the carries to the datasum.

img
Image 1

Upon receiving this transmission, the datasum for this data block must be calculated. Begin by taking the sum of all the data elements.
img
Image 2

The final datasum is calculated by discarding any carries that went beyond the byte width defined by the data block (59₁₆) or by adding the carries to the final sum (59₁₆ + 2 = 5B₁₆).

This keeps the datasum the same width as the data. The method of calculating the datasum where the carries are added to the sum is called the one's complement sum.
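
A brief C sketch of both datasum calculations for a block of bytes, assuming an 8-bit data width; the function names are illustrative.

/* Datasum with the overflow carries discarded. */
unsigned char datasum_discard(const unsigned char *block, int n)
{
    unsigned char sum = 0;
    for (int i = 0; i < n; i++)
        sum += block[i];          /* anything past 8 bits is lost */
    return sum;
}

/* One's complement sum: the carries that overflow the byte are added
   back into the sum, as in 59 hex + 2 = 5B hex above. */
unsigned char datasum_ones_complement(const unsigned char *block, int n)
{
    unsigned int sum = 0;
    for (int i = 0; i < n; i++)
        sum += block[i];
    while (sum > 0xFF)
        sum = (sum & 0xFF) + (sum >> 8);   /* fold the carries back in */
    return (unsigned char)sum;
}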

The checksum shown for the data block in image 1 is only one of a number of different possible checksums for this data. In this case, the checksum was set equal to the expected datasum.

If any of the data elements or the checksum was in error, the datasum would not equal the checksum. If this happens, the digital system would know that an error had occurred. In the case of a network data transmission, it would request the data to be resent.

The only difference between different implementations of the checksum method is how the datasum and checksum are compared in order to detect an error.

As with parity, it is the decision of the designer as to which method is used. The type of checksum used must be agreed upon by both the transmitting and receiving devices ahead of time.

The following is a short list of some of the different types of checksum implementations; a brief C sketch of how each might be generated and checked follows the list:

• A block of data is considered error free if the datasum is equal to the checksum. In this case, the checksum element is calculated by taking the sum of all of the data elements and discarding any carries, i.e., setting the checksum equal to the datasum.

• A block of data is considered error free if the sum of the datasum and checksum results in a binary value with all ones. In this case, the checksum element is calculated by taking the 1's complement of the datasum. This method is called a 1's complement checksum.

• A block of data is considered error free if the sum of the datasum and checksum results in a binary value with all zeros. In this case, the checksum element is calculated by taking the 2's complement of the datasum. This method is called a 2's complement checksum.
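
Under the same 8-bit assumption, the three variants might be generated and verified as sketched below; these helper names are illustrative and not from any particular standard.

/* Checksum generation from the datasum. */
unsigned char checksum_plain(unsigned char datasum) { return datasum; }
unsigned char checksum_ones(unsigned char datasum)  { return (unsigned char)~datasum; }  /* 1's complement */
unsigned char checksum_twos(unsigned char datasum)  { return (unsigned char)-datasum; }  /* 2's complement */

/* Receiver-side tests: the block is considered error free when the
   corresponding test returns 1. */
int ok_plain(unsigned char datasum, unsigned char cs) { return datasum == cs; }
int ok_ones(unsigned char datasum, unsigned char cs)  { return (unsigned char)(datasum + cs) == 0xFF; }
int ok_twos(unsigned char datasum, unsigned char cs)  { return (unsigned char)(datasum + cs) == 0x00; }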


CRC (Cyclic Redundancy Check)
A cyclic redundancy check (CRC) is a type of function that takes as input a data stream of any length and produces as output a value of a certain fixed size. The term CRC is often used to denote either the function or the function's output.

A CRC can be used as a checksum to detect accidental alteration of data during transmission or storage. CRCs are popular because they are simple to implement in binary hardware, are easy to analyze mathematically, and are particularly good at detecting common errors caused by noise in transmission channels. The CRC was invented by W. Wesley Peterson, and published in his 1961 paper.

The problem with using a checksum for error correction lies in its simplicity. If multiple errors occur in a data stream, it is possible that they may cancel each other out, e.g., a single bit error may subtract 4 from the checksum while a second error adds 4.

If the width of the checksum character is 8 bits, then there are 2⁸ = 256 possible checksums for a data stream. This means that there is a 1 in 256 chance that multiple errors may not be detected.

These odds could be reduced by increasing the size of the checksum to 16 or 32 bits, thereby increasing the number of possible checksums to 2¹⁶ = 65,536 or 2³² = 4,294,967,296 respectively.

Assume image below represents a segment of an integer number line where the result of the checksum is identified. A minor error in one of the values may result in a small change in the checksum value.

Since the erroneous checksum is not that far from the correct checksum, it is easy for a second error to put the erroneous checksum back to the correct value indicating that there hasn't been an error when there actually has been one.

img

What we need is an error detection method that generates vastly different values for small errors in the data. The checksum algorithm doesn't do this which makes it possible for two bit changes to cancel each other in the sum.

A cyclic redundancy check (CRC) uses a basic binary algorithm where each bit of a data element modifies the checksum across its entire length regardless of the number of bits in the checksum. This means that an error at the bit level modifies the checksum so significantly that an equal and opposite bit change in another data element cannot cancel the effect of the first.

The calculation of the CRC checksum is based on the remainder resulting from a division rather than the result of an addition. For example, the two numbers below vary only by one bit.

0111 1010 1101 1100₂ = 31,452₁₀
0111 1011 1101 1100₂ = 31,708₁₀

The checksums at the nibble level, discarding the carries, are:

0111 + 1010 + 1101 + 1100 = 1010₂ = 10₁₀
0111 + 1011 + 1101 + 1100 = 1011₂ = 11₁₀

These two values are very similar, and a bit change from another nibble could easily cancel it out. If, on the other hand, we use the remainder from a division for our checksum, we get a wildly different result for the two values. For the sake of an example, let's divide both values by 9₁₀.

31,452 ÷ 9 = 3,494 with a remainder of 6 = 0110₂
31,708 ÷ 9 = 3,523 with a remainder of 1 = 0001₂

This is not a robust example due to the fact that 4 bits only have 16 possible bit patterns, but the result is clear. A single bit change in one of the data elements resulted in a single bit change in the addition result. The same change, however, resulted in three bits changing in the division remainder.

The problem is that division in binary is not a quick operation. For example, the image below shows the long division in binary of 31,452₁₀ = 0111101011011100₂ by 9₁₀ = 1001₂.

The result is a quotient of 110110100110₂ = 3,494₁₀ with a remainder of 110₂ = 6₁₀. Remember that the goal is to create a checksum that can be used to check for errors, not to come up with a mathematically correct result.

Keeping this in mind, the time it takes to perform a long division can be reduced by removing the need for "borrows". This would be the same as doing an addition while ignoring the carries.

img

img

The A + B and A – B columns of the truth table in Table above should look familiar; they are equivalent to the XOR operation. This means that a borrow-less subtraction is nothing more than a bitwise XOR.

Below is an example of an addition and a subtraction where there is no borrowing. Note that an addition without carries produces the identical result as a subtraction without borrows.

img

There is a problem when trying to apply this form of subtraction to long division: an XOR subtraction doesn't care whether one number is larger than another. For example, 1111₂ could be subtracted from 0000₂ with no ill effect. In long division, you need to know how many digits to pull down from the dividend before subtracting the divisor.

To solve this, the assumption is made that one value can be considered "larger" than another if the bit position of its highest logic 1 is the same or greater than the bit position of the highest logic 1 in the second number.

For example, the subtractions 10110 – 10011 and 0111 – 0011 are valid while 0110 – 1001 and 01011 – 10000 are not. The figure below repeats the long division using borrow-less subtractions. It is a coincidence that the resulting remainder is the same as the remainder from the earlier long division; this is not usually true.

img
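
The borrow-less division can be sketched in C as shown below, assuming the operands fit in an unsigned int; the function names are illustrative. For the values above, xor_mod(0x7ADC, 0x9), i.e., 0111101011011100₂ divided by 1001₂, leaves a remainder of 110₂ = 6, matching the figure.

/* Position of the highest 1 bit, or -1 if the value is zero. */
static int highest_bit(unsigned int v)
{
    int pos = -1;
    while (v) { pos++; v >>= 1; }
    return pos;
}

/* Borrow-less "modulo": XOR the divisor into the dividend, aligned with
   the dividend's highest remaining 1, until what is left is "smaller"
   than the divisor in the sense described above. */
unsigned int xor_mod(unsigned int dividend, unsigned int divisor)
{
    int div_hi = highest_bit(divisor);
    while (highest_bit(dividend) >= div_hi) {
        int shift = highest_bit(dividend) - div_hi;
        dividend ^= divisor << shift;     /* subtraction without borrows */
    }
    return dividend;
}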

Since addition and subtraction without carries or borrows are equivalent to a bitwise XOR, we should be able to reconstruct the original value from the quotient and the remainder using nothing but XORs. Table below shows the step-by-step process of this reconstruction.

The leftmost column of the table is the bit-by-bit values of the binary quotient of the division in the figure above. Starting with a value of zero, 1001₂ is XORed with the result in the second column when the current bit of the quotient is a 1.

The result is XORed with 0000₂ if the current bit of the quotient is a 0. The rightmost column is the result of this XOR. Before going to the next bit of the quotient, the result is shifted left one bit position. Once the end of the quotient is reached, the remainder is added. This process brings back the dividend, in effect multiplying the quotient by the divisor and adding the remainder.

img



Memory Organization & Hierarchy

Organization of Memory Device
Modern memory has the same basic configuration as magnetic core memory although the rings have been replaced with electronic memory cells such as the D-Latch.

The cells are arranged so that each row represents a memory location where a binary value would be stored and the columns represent different bits of those memory locations. This is where the terminology "1K x 8" used in Section 12.1 comes from.

Memory is like a matrix where the number of rows identifies the number of memory locations in the memory and the number of columns identifies the number of bits in each memory location.

To store to or retrieve data from a memory device, the processor must place a binary number called an address on special inputs to the memory device. This address identifies which row of the memory matrix or array the processor is interested in communicating with, and enables it.

Once a valid address is placed on the address lines, the memory cells from that row are connected to bi-directional connections on the memory device that allow data either to be stored to or read from the latches. These connections are called the data lines. Three additional lines, chip select, read enable, and write enable, are used to control the transaction.

img

A decoder with n inputs has 2ⁿ outputs, exactly one of which will be active for each unique pattern of ones and zeros at its input. For example, an active-low 2-input decoder will have four outputs. A different output will equal zero for each unique input pattern while all of the other outputs will be ones.

An address decoder selects exactly one row of the memory array to be active leaving the others inactive. When the microprocessor places a binary number onto the address lines, the address decoder selects a single row in the memory array to be written to or read from.

For example, if 011₂ = 3₁₀ is placed on the address lines, the fourth row of the memory will be connected to the data lines. The first row is row 0. The processor uses the inputs read enable and write enable to specify whether it is reading data from or writing data to the selected row of the memory array.

These signals are active low. When read enable is zero, we are reading data from memory, and when write enable is zero, we are writing data to memory. These two signals should never be zero at the same time.

Sometimes, the read enable and write enable signals are combined into a single line called R/W̄ (pronounced "read write-bar"). In this case, a one on R/W̄ initiates a data read while a zero initiates a write.

If latches are used for the memory cells, then the data lines are connected to the D inputs of the memory location latches when data is written, and they are connected to the Q outputs when data is read. The last input to the memory device shown in the image above is the chip select.

The chip select is an active low signal that enables and disables the memory device. If the chip select equals zero, the memory activates all of its input and output lines and uses them to transfer data.

If the chip select equals one, the memory becomes idle, effectively disconnecting itself from all of its input and output lines. The reason for this is that the typical memory device shares the address and data lines of a processor with other devices.

Rarely does a processor communicate with only one memory device on its data lines. Problems occur when more than one device tries to communicate with the processor over shared lines at the same time. It would be like ten people in a room trying to talk at once; no one would be able to understand what was being said.

The processor uses digital logic to control these devices so that only one is talking or listening at a time. Through individual control of each of the chip select lines to the memory devices, the processor can enable only the memory device it wishes to communicate with.

The processor places a zero on the chip select of the memory device it wants to communicate with and places ones on all of the other chip select inputs.


Memory-Processor Interfacing
The previous topic presented the input and output lines for a memory device. These lines are shared across all of the devices that communicate with the processor. If you look at the electrical traces across the surface of a motherboard, you should see collections of traces running together in parallel from the processor to one memory device and then on from one memory device to the next.

These groups of wires are referred to as the bus, which is an extension of the internal structure of the processor. This section discusses how the memory devices share the bus.

Buses

In order to communicate with memory, a processor needs three types of connections: data, address, and control. The data lines are the electrical connections used to send data to or receive data from memory. There is an individual connection or wire for each bit of data.

For example, if the memory of a particular system has 8 latches per memory location, i.e., 8 columns in the memory array, then it can store 8-bit data and has 8 individual wires with which to transfer data.

The address lines are controlled by the processor and are used to specify which memory location the processor wishes to communicate with. The address is an unsigned binary integer that identifies a unique location where data elements are to be stored or retrieved. Since this unique location could be in any one of the memory devices, the address lines are also used to specify which memory device is enabled.

The control lines consist of the signals that manage the transfer of data. At a minimum, they specify the timing and direction of the data transfer. The processor also controls this group of lines. Image below presents the simplest connection of a single memory device to a processor with n data lines and m address lines.

Unfortunately, the configuration of image below only works with systems that have a single memory device. This is not very common. For example, a processor may interface with a BIOS stored in a nonvolatile memory while its programs and data are stored in the volatile memory of a RAM stick.

In addition, it may use the bus to communicate with devices such as the hard drive or video card. All of these devices share the data, address, and control lines of the bus. (BIOS stands for Basic Input/Output System; it is the low-level code used to start the processor when it is first powered up.)

img

A method had to be developed to allow a processor to communicate to multiple memory devices across the same set of wires. If this wasn't done, the processor would need a separate set of data, address, and control lines for each device placing an enormous burden on circuit board designers for routing wires.

By using a bus, the processor can communicate with exactly one device at a time even though it is physically connected to many devices. If only one device on the bus is enabled at a time, the processor can perform a successful data transfer. If two devices tried to drive the data lines simultaneously, the result would be lost data in a condition called bus contention.

Image below presents a situation where data is being read from memory device 1 while memory device 2 remains "disconnected" from the bus. Disconnected is in quotes because the physical connection is still present; it just doesn't have an electrical connection across which data can pass.

Notice that image below shows that the only lines disconnected from the bus are the data lines. This is because bus contention only occurs when multiple devices are trying to output to the same lines at the same time. Since only the microprocessor outputs to the address and control lines, they can remain connected.

In order for this scheme to work, an additional control signal must be sent to each of the memory devices telling them when to be connected to the bus and when to be disconnected. This control signal is called a chip select.

img

A chip select is an active low signal connected to the enable input of the memory device. If the chip select is high, the memory device remains idle and its data lines are disconnected from the bus. When the processor wants to communicate with the memory device, it pulls that device's chip select low thereby enabling it and connecting it to the bus.

Each memory device has its own chip select, and at no time do two chip selects go low at the same time. For example, table below shows the only possible values of the chip selects for a system with four memory devices.

img

The disconnection of the data lines is performed using tristate outputs for the data lines of the memory chips. A tristate output is digital output with a third state added to it. This output can be a logic 1, a logic 0, or a third state that acts as a high impedance or open circuit. It is like someone opened a switch and nothing is connected.

This third state is controlled by the chip select. When the active low chip select equals 1, data lines are set to high impedance, sometimes called the Z state. A chip select equal to 0 causes the data lines to be active and allow input or output.

In Figure 1, three different outputs are trying to drive the same wire. This results in bus contention, and the resulting data is unreadable. Figure 2 shows two of the outputs breaking their connection with the wire allowing the first output to have control of the line. This is the goal when multiple devices are driving a single line.

Figure 3 is the same as 2 except that the switches have been replaced with tristate outputs. With all but one of the outputs in a Z state, the top gate is free to drive the output without bus contention. The following sections describe how memory systems are designed using chip selects to take advantage of tristate outputs.

img

Memory Maps

Think of memory as several filing cabinets where each folder can contain a single piece of data. The size of the stored data, i.e., the number of bits that can be stored in a single memory location, is fixed and is equal to the number of columns in the memory array.

Each piece of data can be either code (part of a program) or data (variables or constants used in the program). Code and data are typically stored in the same memory, each piece of which is stored in a unique address or row of memory.

Some sections of memory are assigned to a predefined purpose which may place constraints on how they are arranged. For example, the BIOS from which the computer performs its initial startup sequence is located at a specific address range in non-volatile memory. Video memory may also be located at a specific address range.

System designers must have a method to describe the arrangement of memory in a system. Since multiple memory devices and different types of memory may be present in a single system, hardware designers need to be able to show what addresses correspond to which memory devices.

Software designers also need to have a way to show how the memory is being used. For example, which parts of memory will be used for the operating system, which parts will be used to store a program, or which parts will be used to store the data for a program.

System designers describe the use of memory with a memory map. A memory map represents a system's memory with a long, vertical column. It is meant to model the memory array where the rows correspond to the memory locations. Within the full range of addresses are smaller partitions where the individual resources are present. Figure below presents two examples of memory maps.

img

The numbers along the left side of the memory map
represent the addresses corresponding to each memory
resource. The memory map should represent the full
address range of the processor.

This full address range is referred to as the processor's
memory space, and its size is represented by the number
of memory locations in the full range, i.e., 2ᵐ where m
equals the number of address lines coming out of the
processor. It is up to the designer whether the addresses
go in ascending or descending order on the memory map.

As an example, let's calculate the memory space of the
processor represented by the memory map in Figure b
above. The top address for this memory map is
FFFFF₁₆ = 1111 1111 1111 1111 1111₂. Since the processor
accesses its highest address by setting all of its address
lines to 1, we know that this particular processor has 20
address lines.

Therefore, its memory space is 2²⁰ = 1,048,576₁₀ =
1 Meg. This means that all of the memory resources for
this processor must be able to fit into 1 Meg without
overlapping. In the next section, we will see how to
compute the size of each partition of memory using
the address lines.

For now, however, we can determine the size of a partition
in memory by subtracting the low address from the high
address, then adding one to account for the fact that the
low address itself is a memory location. For example, the
range of the BIOS in Figure a above starts at FF00₁₆ =
65,280₁₀ and goes up to FFFF₁₆ = 65,535₁₀. This means
that the BIOS fits into 65,535 – 65,280 + 1 = 256 memory
locations.

It is vital to note that there is an exact method to selecting
the upper and lower addresses for each of the ranges in the
memory map. Take for example the memory range for
Program A in Figure b above. The lower address is
20000₁₆ while the upper address is 27FFF₁₆. If we
convert these addresses to binary, we should see a
relationship.

20000₁₆ = 0010 0000 0000 0000 0000₂
27FFF₁₆ = 0010 0111 1111 1111 1111₂

It is not a coincidence that the upper five bits of these
two addresses are identical while the remaining bits go
from all zeros in the low address to all ones in the high
address.

Converting the high and the low address of any one of
the address ranges in Figure above should reveal the same
characteristic. The next section shows how these most
significant address bits are used to define which memory
device is being selected.

Address Decoding

Address decoding is a method for using an address to
enable a unique memory device while leaving all other
devices idle. The method described here works for
many more applications than memory though.

It is the same method that is used to identify which subnet
a host computer is connected to based on its IP address.
All address decoding schemes have one thing in common:
the bits of the full address are divided into two groups,
one group that is used to identify the memory device and
one group that identifies the memory location within the
selected memory device.

In order to determine how to divide the full address
into these two groups of bits, we need to know how large
the memory device is and how large the memory space is.
Once we know the size of the memory device, then we
know the number of bits that will be required from the
full address to point to a memory location within the
memory device.

Just as we calculated the size of the memory space of
a processor, the size of the memory space of a device is
calculated by raising 2 to a power equal to the number
of address lines going to that device. For example, a
memory device with 28 address lines going into it has 2²⁸ =
256 Meg locations. This means that 28 address bits from
the full address must be used to identify a memory location
within that device.

All of the remaining bits of the full address will be used
to enable or disable the device. It is through these
remaining address bits that we determine where the
memory will be located within the memory map.

Table below presents a short list of memory sizes and
the number of address lines required to access all of the
locations within them. Remember that the memory size
is simply equal to 2ᵐ where m is the number of address
lines going into the device.

img

The division of the full address into two groups is done by dividing the full address into a group of most significant bits and least significant bits. The block diagram of an m-bit full address in the figure below shows how this is done. Each bit of the full address is represented with aₙ, where n is the bit position.

img

The bits used to enable the memory device are always the most significant bits while the bits used to access a memory location within the device are always the least significant bits.

Chip Select Hardware

What we need is a circuit that will enable a memory device whenever the full address is within the address range of the device and disable the memory device when the full address falls outside the address range of the device. This is where those most significant bits of the full address come into play.
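
As a C sketch of that behavior, assume the 1 Meg memory space and the Program A range (20000₁₆ through 27FFF₁₆) from the memory map above: a 32 K device covering that range uses 15 address lines of its own, so the upper five bits of the full address, 00100₂, form the prefix that enables it. The names below and the active-high return value (a plain 1 for "enabled" rather than the active-low electrical level) are illustrative.

#define DEVICE_ADDR_BITS 15        /* address lines going into the device */
#define DEVICE_PREFIX    0x04      /* upper five bits 00100 select it */

/* Returns 1 when the full address falls inside this device's range. */
int chip_enabled(unsigned int full_addr)
{
    return (full_addr >> DEVICE_ADDR_BITS) == DEVICE_PREFIX;
}

/* The lower 15 bits go to the device's own address inputs. */
unsigned int on_chip_address(unsigned int full_addr)
{
    return full_addr & ((1u << DEVICE_ADDR_BITS) - 1);
}

For example, chip_enabled(0x20000) and chip_enabled(0x27FFF) both return 1, while chip_enabled(0x28000) returns 0.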


Memory Terminology
There are many different purposes for memory in the operation of a computer. Some memory is meant to store data and programs only while the computer is turned on while other memory is meant to be permanent.

Some memory contains application code while other memory is meant to store the low-level driver code to control devices such as an IDE interface or a video card. Some memory may have a larger capacity while other memory may be faster.

In order to understand what memory technologies to apply to which processor operation, we need to understand a little bit more about the technologies themselves. This section discusses some of the terminology used to describe memory.

Random Access Memory

The term Random Access Memory (RAM) is typically applied to memory that is easily read from and written to by the microprocessor. In actuality, this is a misuse of this term. For a memory to be random access means that any address can be accessed at any time. This is to differentiate it from storage devices such as tapes or hard drives where the data is accessed sequentially.

In general, RAM is the main memory of a computer. Its purpose is to store data and applications that are currently in use. The operating system controls the use of this memory dictating when items are to be loaded into RAM, where they are to be located in RAM, and when they need to be removed from RAM. RAM is meant to be very fast both for reading and writing data. RAM also tends to be volatile in that as soon as power is removed, all of the data is lost.

Read Only Memory

In every computer system, there must be a portion of memory that is stable and impervious to power loss. This kind of memory is called Read Only Memory or ROM. Once again, this term is a misnomer. If it was not possible to write to this type of memory, we could not store the code or data that is to be contained in it.

It simply means that without special mechanisms in place, a processor cannot write to this type of memory. If through an error of some sort, the processor tries to write to this memory, an error will be generated.

The most common application of ROM is to store the computer's BIOS. Since the BIOS is the code that tells the processor how to access its resources upon powering up, it must be present even when the computer is powered down. Another application is the code for embedded systems.

For example, it is important for the code in your car's computer to remain even if the battery is disconnected. There are some types of ROM that the microprocessor can write to, but usually the time needed to write to them or the programming requirements needed to do so make it unwise to write to them regularly.

Therefore, these memories are still considered read only. In some cases, the processor cannot write to a ROM under any circumstances. For example, the code in your car's computer should never need to be modified. This ROM is programmed before it is installed. To put a new program in the car's computer, the old ROM is removed and discarded and a new ROM is installed in its place.

Static RAM versus Dynamic RAM

For as long as memory has existed, scientists and engineers have experimented with new technologies to make RAM faster and to cram more of it into a smaller space, two goals that are typically at odds.

Nowhere is this more obvious than in the two main classifications of RAM: Static RAM (SRAM) and Dynamic RAM (DRAM). SRAM is made from an array of latches such as the D-latch we studied in Chapter 10. Each latch can maintain a single bit of data within a single memory address or location.

For example, if a memory stores eight bits per memory address, then there are eight latches for a single address. If this same memory has an address space of 256 K, then there are 2¹⁸ · 8 = 2²¹ = 2,097,152 latches in the device. Latches are not small devices as logic circuits go, but they are very fast. Therefore, in the pursuit of the performance goals of speed and size, SRAMs are better adapted to speed.

In general, SRAMs:

• store data in transistor circuits similar to D-latches;

• are used for very fast applications such as RAM caches;

• tend to be used in smaller memories, which allows for very fast access due to the simpler decoding logic; and

• are volatile, meaning that the data remains stored only as long as power is available.

There are circuits that connect SRAMs to a backup battery that allows the data to remain stable even with a loss of power. These batteries, about the size of a watch battery, can maintain the data for long periods of time, much as a battery in a watch can run for years. On the negative side, the extra battery and circuitry add to the overall system cost and take up physical space on the motherboard.

A bit is stored in a DRAM using a device called a capacitor.

A capacitor is made from a pair of conductive plates that are held parallel to each other and very close together, but not touching. If an electron is placed on one of the plates, its negative charge will force an electron on the other plate to leave. This works much like the north pole of a magnet pushing away the north pole of a second magnet.


Characteristic of Hard Drive
At the bottom of the hierarchy is long-term, high-capacity storage. This type of storage is slow, making it a poor choice for the processor to use for the execution of programs and data access. It is, however, necessary to provide computer systems with high-capacity, non-volatile storage.

Hard drives are the most cost-effective method of storing data. In the mid-1980's, a 30 Megabyte hard drive could be purchased for around Rs.12000 or about Rs. 400 per MB. In 2007, retailers advertised a 320 Gigabyte SATA Hard drive for around Rs.3200 or about Rs.0.01 per MB.

In other words, the cost to store a byte of data today is roughly 1/40,000th of what it was a little over two decades ago. Hard drives store data in well-organized patterns of ones and zeros across a thin sheet of magnetic material.

This magnetic material is spread either on one or both sides of a lightweight, rigid disk called a substrate. The substrate needs to be lightweight because it is meant to spin at very high speeds. The combination of magnetic material and substrate is called a platter.

The more rigid the substrate is, the better the reliability of the disk. This was especially true when the mechanisms that were used to read and write data from and to the disks were fixed making them prone to scraping across the substrate's surface if the substrate was not perfectly flat. The condition where the read-write mechanism comes in contact with the disk is called a "crash" which results in magnetic material being scraped away from the disk.

Substrates used to be made from aluminum. Unfortunately, extreme heat sometimes warped the aluminum disk. Now glass is used as a substrate.

It improves on aluminum by adding:

• Better surface uniformity which increases reliability;

• Fewer surface defects which reduces read/write errors;

• Better resistance to warping;

• Better resistance to shock; and

• The ability to have the read/write mechanism ride closer to the surface allowing for better data density.

Hard Drive Read/Write Head

Data is recorded to the platter using a conductive coil called a head. Older drives and floppy drives use the same head for reading the data too. The head is shaped like a "C" with the gap between the ends positioned to face the magnetic material.

A coil of wire is wrapped around the portion of the head that is furthest from the magnetic material. Figure below shows the configuration of this type of head. In order to write data, an electrical current is passed through the wire creating a magnetic field within the gap of the head close to the disk.

This field magnetizes the material on the platter in a specific direction. Reversing the current would polarize the magnetic material in the opposite direction.

By spinning the platter under the head, patterns of magnetic polarization can be stored in circular paths on the disk. By moving the head along the radius, nested circular paths can be created. The magnetized patterns on the platter represent the data.

img

It is possible to use the same head to read data back from the disk. If a magnetized material moves past a coil of wire, it produces a small current. This is the same principle that allows the alternator in your car to produce electricity.

The direction of the current generated by the disk's motion changes if the direction of the magnetization changes. In this way, the same coil that is used to write the data can be used to read it. Just like the alternator in your car though, if the disk is not spinning, no current is generated that can be used to read the data.

Newer hard drives use two heads, one for reading and one for writing. The newer read heads are made of a material that changes its resistance depending on the magnetic field that is passing under it.

These changes in resistance affect a current that the hard drive controller is passing through the read head during the read operation. In this way, the hard drive controller can detect changes in the magnetic polarization of the material directly under the read head.

There is another characteristic of the read/write head that is important to the physical operation of the hard drive. As was stated earlier, the area that is polarized by the head is equal to the size of the gap in the write head.

To polarize a smaller area, thereby increasing the data density, the gap must be made smaller. To do this, the distance between the head and the platter must be reduced. Current technology allows heads to "fly" at less than three microinches above the platter surface.

When the magnetic material is deposited on a flexible substrate such as a floppy diskette or a cassette tape, the flex in the material makes it possible for the head to come in contact with the substrate without experiencing reliability problems.

This is not true for hard disks. Since the platters are rigid and because the platters spin at thousands of rotations per minute, any contact that the head makes with the platter will result in magnetic material being scraped off. In addition, the heat from the friction will eventually cause the head to fail.

These two issues indicate that the read/write head should come as close to the platters as possible without touching. Originally, this was done by making the platter as flat as possible while mounting the head to a rigid arm. The gap would hopefully stay constant. Any defects or warpage in the platter, however, would cause the head to crash onto the platter resulting in damaged data.

A third type of head, the Winchester head or "flying head", is designed to float on a cushion of air that keeps it a fixed distance from the spinning platter.

This is done by shaping the head into an airfoil that takes advantage of the air current generated by the spinning platter. This means that the head can operate much closer to the surface of the platter and avoid crashing even if there are imperfections.

Data Encoding

It might seem natural to use the two directions of magnetic polarization to represent ones and zeros. This is not the case, however. One reason for this is that the controllers detect the changes in magnetic direction, not the direction of the field itself.

Second, large blocks of data that are all ones or all zeros would be difficult to read because the controller could eventually lose track of where one bit ends and the next begins.

The typical method for storing data to a platter involves setting up a clock to define the bit positions, and watching how the magnetic field changes with respect to that clock. Each period of the clock defines a single bit time, e.g., if a single bit takes 10 nanoseconds to pass under the read-write head when the platter is spinning, then a clock with a period of 10 nanoseconds, i.e., a frequency of (10 × 10⁻⁹ s)⁻¹ = 100 MHz, is used to tell the controller when the next bit position is coming.

Modified Frequency Modulation (MFM) encodes data relative to this clock by changing the way in which the magnetic polarization represents a one or a zero. MFM defines a change in polarization in the middle of a bit time as a one and no change in the middle as a zero.

If two or more zeros are placed next to each other, a change in polarization is made between each of the bit times. This is done to prevent a stream of zeros from creating a long block of unidirectional polarization.

For MFM encoding, the longest period between polarity changes occurs for the bit sequence 1-0-1. In this case, the polarity changes are separated by two bit periods. The shortest period between polarity changes occurs when a one follows a one or a zero follows a zero.

In these cases, the polarity changes are separated by a single bit period. This allows us to double the data density over FM encoding using the same magnetic surface and head configuration. The hard drive controller, however, must be able to handle the increased data rate.
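To make these rules concrete, the following C sketch marks where the flux transitions would fall for a hypothetical bit stream. It is purely illustrative; a real drive controller implements the encoding in hardware, and the bit pattern shown is an assumed example.

#include <stdio.h>

/* Minimal sketch of the MFM encoding rules described above.
   For each bit cell we print two symbols:
     clock position (start of cell):  'T' = flux transition, '.' = none
     data position (middle of cell):  'T' = flux transition, '.' = none
   A one produces a transition in the middle of its cell; a zero has no
   middle transition, but gets a clock transition at the start of its
   cell when the previous bit was also a zero. */
void mfm_encode(const char *bits)
{
    char prev = '1';                     /* assume the previous bit was a one */
    for (const char *p = bits; *p; p++) {
        char clock = (prev == '0' && *p == '0') ? 'T' : '.';
        char data  = (*p == '1') ? 'T' : '.';
        printf("bit %c -> clock %c, data %c\n", *p, clock, data);
        prev = *p;
    }
}

int main(void)
{
    mfm_encode("10110001");              /* hypothetical bit stream */
    return 0;
}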

Hard Drive Access Time

There are a number of issues affecting the latency between a device requesting data and the hard drive responding with the data. Some of these issues depend on the current state of the system while others depend on the physical design of the drive and the amount of data being requested.

There are four basic aspects to hard drive access time:

1. Queuing time,
2. Seek time,
3. Rotational latency, and
4. Transfer time.

After an initial request is made to a hard drive, the system must wait for the hard drive to become available. This is called queuing time.

The hard drive may be busy serving another request or the bus or I/O channel that the hard drive uses may be busy serving another device that shares the link. In addition, the system's energy saving features may have powered down the drive meaning that an additional delay is incurred waiting for the drive to spin up.

The second aspect, seek time, is the amount of time it takes to get the read/write head from its current track to the desired track. Seek time is dependent on many things. First, it depends on the distance between the current and desired tracks.

In addition, mechanical movement of any sort requires a ramping up before attaining maximum speed and a ramping down to avoid overshooting the desired target position. It is for these reasons that manufacturers publish a typical seek time.

Seek times have improved through the use of lighter components and better head positioning so that shorter seek distances are needed. As of this writing, the typical seek time for a hard drive is around 8 ms while higher performance drives might be as low as 4 ms.

The heads used in CDROMs are heavier, and therefore, the seek time of a CDROM is longer than that of a hard drive. Older fixed head designs used multiple heads (one per track), each of which was stationary over its assigned track. In this case, the seek time was minimal, limited to the amount of time it took to electrically switch to the desired head.

Once the head has been positioned over the desired track, the drive must wait for the platters to rotate to the sector containing the requested data. This is called rotational latency. The worst case occurs when the start of the desired sector has just passed under the head when the drive begins looking for the data.

This requires almost a full rotation of the platters before the drive can begin transferring the data. We can use the following calculation to determine the time required for a platter in a 7200 RPM drive to make a full rotation.

img

If we make the assumption that on average the desired sector will be one half of a rotation away from the current position, then the average rotational latency should be half the time it takes for a full rotation. This means that for a 7200 RPM drive, the estimated rotational latency should be about 4.2 milliseconds.

Queuing time, seek time, and rotational latency are somewhat random in nature. Transfer time, however, is more predictable. Transfer time is the time it takes to send the requested data from the hard drive to the requesting device. Theoretically, the best-case transfer time equals the amount of time it takes for the data to pass beneath the head.

If there are N sectors per track, then the amount of time it takes to retrieve a single sector can be calculated as shown below.

img
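As a rough worked example, the following C sketch carries out these calculations for an assumed 7200 RPM drive with an assumed 500 sectors per track; the numbers are illustrative, not the specifications of any particular drive.

#include <stdio.h>

int main(void)
{
    /* Assumed values, for illustration only. */
    double rpm = 7200.0;
    double sectors_per_track = 500.0;

    double rotation_ms    = 60.0 / rpm * 1000.0;          /* one full rotation      */
    double avg_latency_ms = rotation_ms / 2.0;            /* average rotational wait */
    double sector_ms      = rotation_ms / sectors_per_track; /* one sector under the head */

    printf("Full rotation:          %.2f ms\n", rotation_ms);    /* about 8.33 ms  */
    printf("Avg rotational latency: %.2f ms\n", avg_latency_ms); /* about 4.17 ms  */
    printf("One sector transfer:    %.4f ms\n", sector_ms);      /* about 0.0167 ms */
    return 0;
}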

Self-Monitoring, Analysis & Reporting Technology System

A hard drive crash rarely comes without a warning. The user may be unaware of any changes in their hard drive's operation preceding a mechanical failure, but there are changes. For example, if a hard drive's platters are taking longer to get up to full speed, it may be that the bearings are going bad. A hard drive that has been experiencing higher than normal operating temperatures may also be about to fail.

Newer drives now support a feature referred to as Self-Monitoring Analysis and Reporting Technology (SMART). SMART enabled drives can provide an alert to the computer's BIOS warning of a parameter that is functioning outside of its normal range. This usually results in a message to the user to replace the drive before it fails.

SMART attribute values are stored in the hard drive as integers in the range from 1 to 253. Lower values indicate worse conditions. Depending on the parameter and the manufacturer, different failure thresholds are set for each of the parameters. The parameters measured vary from drive to drive with each drive typically monitoring about twenty.

The following is a sample of some types of measurements:

Power On Hours: This indicates the age of the drive.

Power Cycle Count: This also might be an indication of age.

Spin Up Time: A longer spin up time may indicate a problem with the assembly that spins the platters.

Temperature: Higher temperatures also might indicate a problem with the assembly that spins the platters.

Head Flying Height: A reduction in the flying height of a Winchester head may indicate it is about to crash into the platters.


Data Organization on Hard Drive
The width of a hard drive's read/write head is much smaller than that of the platter. This means that there are a number of non-overlapping positions for the read/write head along the platter's radius. By allowing the movable read/write head to be positioned at intervals along the radius of the disk, information can be recorded to any of a number of concentric circles on the magnetic material.

Each one of these circles is called a track. A typical hard drive disk contains thousands of tracks per inch (TPI) on a single side of a platter, each track being the width of the read/write head. Image below shows how these tracks correspond to the movement and size of the read/write head.

img

A small gap called an intertrack gap is placed between the tracks to avoid interference from neighboring data. Reducing this gap allows for more data to be stored on a disk, but it also increases the risk of having data corrupted when data from an adjacent track bleeds over.

Each track is divided into sections of around 512 bytes apiece. These sections are called sectors. A platter may have sectors that are fixed in size for the whole platter or they may have variable amounts of data depending on their location on the platter relative to the center of rotation. There are typically hundreds of sectors per track.

In addition to the gaps left between the tracks, gaps are also left between the sectors. These gaps allow for a physical separation between the blocks of data and are typically used to help the hard drive controller when reading from or writing to the disk. These gaps are called intersector gaps. Image below shows the relationship of these gaps to the tracks and sectors.

img

One way to increase the capacity of a hard drive is to increase the number of surfaces upon which the magnetic material is placed. The first way to do this is to place magnetic material on both sides of the platter.

When this is done, a second read-write head must be placed on the opposite side of the platter to read the second magnetic surface. By using the same organization of sectors and tracks, this doubles the capacity of the hard drive.

A second method for increasing capacity is to mount multiple platters on a single spindle, the axis around which all of the platters rotate. Each additional magnetic surface adds to the capacity of the drive, and as with putting magnetic material on both sides of a single platter, all magnetic surfaces have the same organization of sectors and tracks, each sector lining up with the ones above it and below it. Every additional magnetic surface requires an additional read-write head.

All of the heads of a hard drive are locked together so that they are reading from the exact same location on each of their respective surfaces. Therefore, each track on each surface that is the same distance from the spindle can be treated as a unit because the hard drive controller is accessing them simultaneously.

The set of all tracks, one from each surface, that are equidistant from the spindle are referred to as a cylinder. This virtual entity is depicted in image below

img

Using this information, we can develop a method for calculating the capacity of a hard drive. In general, the capacity of a hard drive equals the number of bytes per sector, multiplied by the number of sectors per track, multiplied by the number of cylinders, multiplied by 2 if the platters have magnetic material on both sides, and finally multiplied by the number of platters.
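The following C sketch applies this calculation to a purely hypothetical drive geometry; the sector size, sectors per track, cylinder count, and platter count are assumed values chosen only for illustration.

#include <stdio.h>

int main(void)
{
    /* Hypothetical geometry, for illustration only. */
    long long bytes_per_sector  = 512;
    long long sectors_per_track = 500;
    long long cylinders         = 16383;
    long long sides_per_platter = 2;     /* magnetic material on both sides */
    long long platters          = 4;

    long long capacity = bytes_per_sector * sectors_per_track *
                         cylinders * sides_per_platter * platters;

    printf("Capacity: %lld bytes (about %.1f GB)\n",
           capacity, capacity / 1e9);    /* about 33.6 GB for these values */
    return 0;
}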

Because the smallest allowable size for a bit is dictated by the size of the read-write head, the number of bits per track is limited by the number of bits that can fit on the smallest track, the one closest to the spindle.

Because of this limitation, the outside tracks waste space when the bits become wider than is required by the head. Regardless of where the head is positioned, bits will pass under the head at a constant rate. This arrangement is called constant angular velocity (CAV).


Cache RAM
Even with increases in hard drive performance, it will never be practical to execute programs or access data directly from these mechanical devices. They are far too slow.

Therefore, when the processor needs to access information, it is first loaded from the hard drive into main memory where the higher performance RAM allows fast access to the data. When the processor is finished with the data, the information can either be discarded or used to update the hard drive.

Because of its expense, the capacity of a computer's main memory falls short of that of its hard drive. This should not matter though. Not all of the data on a hard drive needs to be accessed all of the time by the processor.

Only the currently active data or applications need to be in RAM. Additional performance improvements can be realized by taking this concept to another level.

Remember from our discussion in Chapter 7.3 that there are two main classifications of RAM: static RAM (SRAM) and dynamic RAM (DRAM). SRAM is faster, but that speed comes at a price: it has a lower density and it is more expensive.

Since main memory needs to be quite large and inexpensive, it is implemented with DRAM. Could, however, the same relation that exists between main memory and a hard drive be realized between a small block of SRAM and a large main memory implemented in DRAM?

Main memory improves the performance of the system by loading only the information that is currently in use from the hard drive. If a method could be developed where the code that is in immediate use could be stored in a small, fast SRAM while code that is not quite as active is left in the main memory, the system's performance could be improved again.

Due to the nature of programming, instructions that are executed within a short period of time tend to be clustered together. This is due primarily to the basic constructs of programming such as loops and subroutines that make it so that when one instruction is executed, the chances of it or its surrounding instructions being executed again in the near future are very good.

Over a short period of time, a cluster of instructions may execute over and over again. This is referred to as the principle of locality. Data also behaves according to this principle due to the fact that related data is often defined in consecutive locations.

To take advantage of this principle, a small, fast SRAM is placed between the processor and main memory to hold the most recently used code and data under the assumption that they will most likely be used again soon. This small, fast SRAM is called a RAM cache.

img

The reason the SRAM of the cache needs to be small is that larger address decoder circuits are slower than small address decoder circuits.

The larger the memory is, the more complex the address decoder circuit. The more complex the address decoder circuit is, the longer it takes to select a memory location based on the address it received. Therefore, making a memory smaller makes it faster.

It is possible to take this concept a step further by placing an even smaller SRAM between the cache and the processor thereby creating two levels of cache. This new cache is typically contained inside of the processor.

By placing the new cache inside the processor, the wires that connect the two become very short, and the interface circuitry becomes more closely integrated with that of the processor.

Both of these conditions along with the smaller decoder circuit result in even faster data access. When two caches are present, the one inside the processor is referred to as a level 1 or L1 cache while the one between the L1 cache and memory is referred to as a level 2 or L2 cache.

img

The split cache is another cache system that requires two caches. In this case, a processor will use one cache to store code and a second cache to store data. Typically, this is to support an advanced type of processor architecture such as pipelining where the mechanisms that the processor uses to handle code are so distinct from those used for data that it does not make sense to put both types of information into the same cache.

img


Registers
At the top of the memory hierarchy is a set of memory cells called registers. A register is a group of latches that have been combined in order to perform a special purpose.

This group of latches may be used to store an integer, store an address pointing to memory, configure an I/O device, or indicate the status of a process. Whatever the purpose of the register is, all of the bits are treated as a unit.

Registers are contained inside the processor and are integrated with the circuitry used to perform the processor's internal operations. This integration places registers within millionths of a meter of the action resulting in very quick access times.

In addition, the typical processor contains fewer than a hundred registers making decoding very simple and very fast. These two features combine to make registers by far the fastest memory unit in the memory hierarchy.



Introduction to Processor Architecture

BUS, Registers and Flags
Bus

A bus is a bundle of wires grouped together to serve a single purpose. The main application of the bus is to transfer data from one device to another. The processor's interface to the bus includes connections used to pass data, connections to represent the address in which the processor is interested, and control lines to manage and synchronize the transaction. These lines are "daisy-chained" from one device to the next.

The concept of a bus is repeated here because the memory bus is not the only bus used by the processor. There are internal buses that the processor uses to move data, instructions, configuration, and status between its subsystems.

They typically use the same number of data lines found in the memory bus, but the addressing is usually simpler. This is because there are only a handful of devices between which the data is passed.

Registers

As stated when they were introduced in last chapter, a register stores a binary value using a group of latches. For example, if the processor wishes to add two integers, it may place one of the integers in a register labeled A and the second in a register labeled B.

The contents of the latches can then be added by connecting their Q outputs to the addition circuitry described in Chapter 8. The output of the addition circuitry is then directed to another register in order to store the result. Typically, this third register is one of the original two registers, e.g., A = A + B.

Although variables and pointers used in a program are all stored in memory, they are moved to registers during periods in which they are the focus of operation. This is so that they can be manipulated quickly. Once the processor shifts its focus, it stores the values it doesn't need any longer back in memory.

The individual bit positions of the register are identified by the power of two that the position represents as an integer. In other words, the least significant bit is bit 0, the next position to the left is bit 1, the next is bit 2, and so on.

For the purpose of our discussion, registers may be used for one of four types of operations.

Data registers – These registers hold the values on which to perform arithmetic or logical functions.

Address registers – Sometimes, the processor may need to store an address rather than a value. A common use of an address register is to hold a pointer to an array or string. Another application is to hold the address of the next instruction to execute.

Instruction registers – Remember that instructions are actually numeric values stored in memory. Each number represents a different command to be executed by the processor. Some registers are meant specifically to hold instructions so that they can be interpreted to see what operation is to be performed.

Flag registers – The processor can also use individual bits grouped together to represent the status of an operation or of the processor itself. The next section describes the use of flags in greater detail.

Flags

Picture the instrumentation on the dashboard of a car. Beside the speedometer, tachometer, fuel gauge, and such are a number of lights unofficially referred to as "idiot lights". Each of these lights has a unique purpose. One comes on when the fuel is low; another indicates when the high beams are on; a third warns the driver of low coolant.

There are many more lights, and depending on the type of car you drive, some lights may even replace a gauge such as oil pressure. How is this analogous to the processor's operation? There are a number of indicators that reveal the processor's status much like the car's idiot lights.

Most of these indicators represent the results of the last operation. For example, the addition of two numbers might produce a negative sign, an erroneous overflow, a carry, or a value of zero.

Well, that would be four idiot lights: sign, overflow, carry, and zero. These indicators, otherwise known as flags, are each represented with a single bit. Going back to our example, if the result of an addition is negative, the sign flag would equal 1.

If the result was not a negative number (zero or greater than zero), the sign flag would equal 0. For the sake of organization, these flags are grouped together into a single register called the flags register or the processor status register.

Since the values contained in its bits are typically based on the outcome of an arithmetic or logical operation, the flags register is connected to the mathematical unit of the processor. One of the primary uses of the flags is to remember the results of the previous operation. It is the processor's short term memory.

This function is necessary for conditional branching, a function that allows the processor to decide whether or not to execute a section of code based on the results of a condition statement such as "if".

The piece of code shown below calls different functions based on the relative values of var1 and var2, i.e., the flow of the program changes depending on whether var1 equals var2, var1 is greater than var2, or var1 is less than var2. So how does the processor determine whether one variable is less than or greater than another?

if (var1 == var2)
    equalFunction();
else if (var1 > var2)
    greaterThanFunction();
else
    lessThanFunction();

Sample Code Using Conditional Statements

The processor does this using a "virtual subtract." This is a subtraction that occurs in the mathematical unit of the processor where it affects the flags, but the result is discarded. Referring back to our example, the result of subtracting var2 from var1 is used to select one of three paths through the code.


Buffers, Stacks and I/O Ports
Buffers

Rarely does a processor operate in isolation. Typically there are multiple processors supporting the operation of the main processor. These include video processors, the keyboard and mouse interface processor, and the processors providing data from hard drives and CD-ROMs.

There are also processors to control communication interfaces such as USB, Firewire, and Ethernet networks. These processors all operate independently, and therefore one may finish an operation before a second processor is ready to receive the results.

If one processor is faster than another, or if one processor is tied up with a process prohibiting it from receiving data from a second process, then there needs to be a mechanism in place so that data is not lost.

This mechanism takes the form of a block of memory that can hold data until it is ready to be picked up. This block of memory is called a buffer. Figure below presents the basic block diagram of a system that incorporates a buffer.

img

The concept of buffers is presented here because the internal structure of a processor often relies on buffers to store data while waiting for an external device to become available.

The Stack

During the course of normal operation, there will be a number of times when the processor needs to use a temporary memory, a place where it can store a number for a while until it is ready to use it again.

For example, every processor has a finite number of registers. If an application needs more registers than are available, the register values that are not needed immediately can be stored in this temporary memory.

When a processor needs to jump to a subroutine or function, it needs to remember the instruction it jumped from so that it can pick back up where it left off when the subroutine is completed. The return address is stored in this temporary memory.

The stack is a block of memory locations reserved to function as temporary memory. It operates much like the stack of plates at the start of a restaurant buffet line. When a plate is put on top of an existing stack of plates, the plate that was on top is now hidden, one position lower in the stack.

It is not accessible until the top plate is removed. The processor's stack works in the same way. When a processor puts a piece of data, a plate, on the top of the stack, the data below it is hidden and cannot be removed until the data above it is removed. This type of buffer is referred to as a "last-in-first-out" or LIFO buffer.

There are two main operations that the processor can perform on the stack: it can either store the value of a register to the top of the stack or remove the top piece of data from the stack and place it in a register.

Storing data to the stack is referred to as "pushing" while removing the top piece of data is called "pulling" or "popping". The LIFO nature of the stack makes it so that applications must remove data items in the opposite order from which they were placed on the stack.

For example, assume that a processor needs to store values from registers A, B, and C onto the stack. If it pushes register A first, B second, and C last, then to restore the registers it must pull in order C, then B, then A.
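In 80x86 terms, this might look like the following sketch, using AX, BX, and CX to stand in for registers A, B, and C:

         PUSH    AX             ; save A (pushed first)
         PUSH    BX             ; save B
         PUSH    CX             ; save C (pushed last, now on top of the stack)
         ; (code that may overwrite AX, BX, and CX goes here)
         POP     CX             ; restore C (last in, first out)
         POP     BX             ; restore B
         POP     AX             ; restore A (pushed first, restored last)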

I/O Ports

Input/output ports or I/O ports refer to any connections that exist between the processor and its external devices. A USB printer or scanner, for example, is connected to the computer system through an I/O port. The computer can issue commands and send data to be printed through this port or receive the device's status or scanned images.

As described in the earlier section, some I/O devices are connected directly to the memory bus and act just like memory devices. Sending data to the port is done by storing data to a memory address and retrieving data from the port is done by reading from a memory address.

In some cases, however, the processor has special hardware just for I/O ports. This is done in one of two ways: either the device interface hardware is built into the processor or the processor has a second bus designed to communicate with the I/O devices.


Processor Level and CPU Level
Processor Level

Image below presents the generic block diagram of a processor system. It represents the interface between the processor, memory, and I/O devices through the bus that we discussed in the section on memory interfacing.

img

The internals of a processor are a microcosm of the processor system shown in Figure above. Figure below shows a central processing unit (CPU) acting as the brains of the processor connected to memory and I/O devices through an internal bus within a single chip. The internal bus is much simpler than the bus the processor uses to connect its external devices. There are a number of reasons for this.

First, there are fewer devices to interface with, so the addressing scheme does not need to be that complex. Second, the external bus needs to be able to adapt to many different configurations using components from many different manufacturers. The internal bus will never change for that particular model of processor. Third, the CPU accesses the internal components in a well-defined, synchronized manner allowing for more precise timing logic.

The following is a description of the components of the processor shown in Figure below.

Central processing unit (CPU) – This is the brain of the processor. The execution of all instructions occurs inside the CPU along with the computation required to determine addressing.

Internal memory – A small, but extremely quick memory. It is used for any internal computations that need to be done fast without the added overhead of writing to external memory. It is also used for storage by processes that are transparent to the applications, but necessary for the operation of the processor.

img

Data buffer – This buffer is a bidirectional device that holds outgoing data until the memory bus is ready for it or incoming data until the CPU is ready for it. This circuitry also provides signal conditioning ensuring the output signals are strong enough and the fragile internal components of the CPU are protected.

Address latch – This group of latches maintains the address with which the processor wishes to exchange data on the memory bus. It also provides signal conditioning and circuit protection for the CPU.

I/O ports – These ports represent the device interfaces that have been incorporated into the processor's hardware.

Configuration registers – A number of features of the processor are configurable. These registers contain the flags that represent the current configuration of the processor. These registers might also contain addressing information such as which portions of memory are protected and which are not.

CPU Level

If we look at the organization inside the CPU, we see that it in turn is a microcosm of the processor block diagram of image shown above. Figure below presents the organization inside a typical CPU.

img

Control unit – Ask anyone who has worked in a large business what middle management does and they might say something like, "Not a darn thing." Ask them what expertise middle management has and you are likely to get a similar answer.

This of course is not true. Middle management has a very important task: they know what needs to be done, who best can do it, and when it needs to be done. This is the purpose of the control unit. It knows the big picture of what needs to be done, it knows which of the CPU's components can do it, and it controls the timing to do it.

Arithmetic logic unit (ALU) – The ALU is a collection of logic circuits designed to perform arithmetic (addition, subtraction, multiplication, and division) and logical operations (not, and, or, and exclusive-or). It's basically the calculator of the CPU. When an arithmetic or logical operation is required, the values and command are sent to the ALU for processing.

Instruction decoder – All instructions are stored as binary values. The instruction decoder receives the instruction from memory, interprets the value to see what instruction is to be performed, and tells the ALU and the registers which circuits to energize in order to perform the function.

Registers – The registers are used to store the data, addresses, and flags that are in use by the CPU.


80x86 Execution Unit
The 80x86 processor is divided into two main components: the execution unit (EU) and the bus interface unit (BIU). The EU is controlled by the EU control system, which serves a dual purpose: it acts as the control unit and also as a portion of the instruction decoder. The EU also contains the ALU, the processor flags, and the general purpose and address registers. Figure below presents a block diagram of the EU.

img


80x86 Bus Interface
The bus interface unit (BIU) controls the transfer of information between the processor and the external devices such as memory, I/O ports, and storage devices. Basically, it acts as the bridge between the EU and the external bus. A portion of the instruction decoder is located in the BIU.

The instruction queue acts as a buffer allowing instructions to be queued up as they wait for their turn in the EU. Figure below presents the block diagram of the BIU.

img

The main purpose of the BIU is to take the 16-bit pointers of the EU and modify them so that they can point to data in the 20-bit address space. This is done using the four registers CS, DS, SS, and ES. These are the segment registers.

Segment Addressing

In the center of the BIU block diagram is a set of segment registers labeled CS, DS, SS, and ES. These four 16-bit registers are used in conjunction with the pointer and index registers to store and retrieve items from the memory space.

So how does the processor combine a 16-bit address register with a 16-bit segment register to create a 20-bit address? Well, it is all done in the address summing block located directly above the segment registers in the block diagram of the BIU in Figure above.

Every time the processor goes out to its memory space to read or write data, this 20-bit address must be calculated based on different combinations of address and segment registers.

Next time your Intel-based operating system throws up an execution error, look to see if it gives you the address where the error occurred. If it does, you should see some hexadecimal numbers in a format similar to the one shown below:

img

This number is a special representation of the segment register (the number to the left of the colon) and the pointer or index register (the number to the right of the colon). Remember that a 4-digit hexadecimal number represents a 16-bit binary number. It is the combination of these two 16-bit registers that creates the 20-bit address.

The process works like this. First, take the value in the segment register and shift it left four places. This has the effect of adding a zero to the right side of the hexadecimal number, or four zeros to the right side of the binary number. In our example above, the segment is 3241₁₆ = 0011 0010 0100 0001₂. Adding a zero nibble to the right side of the segment gives us 32410₁₆ = 0011 0010 0100 0001 0000₂.

The pointer or index register is then added to this 20-bit segment address.

Continuing our example gives us:

img
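In C, the same calculation can be sketched as shifting the segment left by one nibble (four bits) and adding the offset. The 3241H segment comes from the example above, while the offset value here is assumed purely for illustration.

#include <stdio.h>

int main(void)
{
    unsigned int segment = 0x3241;    /* segment value from the example above      */
    unsigned int offset  = 0x1234;    /* assumed offset, for illustration only     */

    /* Shift the segment left four bits (one hex digit), then add the offset. */
    unsigned int physical = (segment << 4) + offset;

    printf("Physical address: %05X\n", physical);   /* prints 33644 */
    return 0;
}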

For the rest of this book, we will use the following terminology to represent these three values.

• The 20-bit value created by shifting the value in a segment register four places to the left will be referred to as the segment address. It points to the lowest address to which a segment:pointer combination can point. This address may also be referred to as the base address of the segment.

• The 16-bit value stored in a pointer or index register will be referred to as the offset address. It represents an offset from the segment address to the address in memory that the processor needs to communicate with.

• The resulting 20-bit value that comes out of the address summing block points to a specific address in the processor's memory space.

This address will be referred to as the physical address, and it is the address that is placed on the address lines of the memory bus. If we look at the function of the segment and pointer registers from the perspective of the memory space, the segment register adjusted with four binary zeros filled in from the right points to an address somewhere in the full memory space.

Because the least significant four bits are always zero, this value can only point to memory in 16-byte increments. The 16-bit offset address from the pointer register is then added to the segment address, pointing to an address within the 2¹⁶ = 65,536 (64K) locations above where the segment register is pointing.

This is the physical address. Figure below shows how the segment and pointer addresses relate to each other when pointing to a specific address within the memory space.

img

There is a second purpose for this segment:pointer addressing method beyond allowing the 80x86 processor to control 20 address lines using 16-bit registers. This second reason is actually of greater importance as it allows for greater functionality of the operating system.

By assigning the responsibility of maintaining the segment registers to the operating system while allowing the application to control the address and pointer registers, applications can be placed anywhere in memory without affecting their operation.

When the operating system loads an application to be executed, it selects a 64 K block of memory called a segment and uses the lowest address of that block as the base address for that particular application. During execution, the application modifies only the pointer registers keeping its scope within the 64K block of its segment.

As long as the application modifies only the address registers, the program remains in the 64K segment it was assigned. By using this process, the operating system is free to place an application wherever it wants to in memory. It also allows the operating system to maintain several concurrent applications in memory by keeping track of which application is assigned to which segment.

Although the programmer may force a segment register to be used for a different purpose, each segment register has an assigned purpose. The following describes the uses of the four segment registers, CS, DS, SS, and ES.

Code Segment (CS) – This register contains the base address of the segment assigned to contain the code of an application. It is paired with the Instruction Pointer (IP) to point to the next instruction to load into the instruction decoder for execution.

Data Segment (DS) – This register contains the base address of the segment assigned to contain the data used by an application. It is typically associated with the SI register.

Stack Segment (SS) – This register contains the base address of the stack segment. Remember that there are two pointer registers that use the stack. The first is the stack pointer, and the combination of SS and SP points to the last value stored in this temporary memory. The other register is the base pointer which is used to point to the block of data elements passed to a function.

Extra Segment (ES) – Like DS, this register points to the data segment assigned to an application. Where DS is associated with the SI register, ES is associated with the DI register.


80x86 Assembly Language
Assembly language was presented earlier to create a few simple programs and to show how the CPU executes code. In this chapter, the assembly language of the Intel 80x86 processor family is introduced along with the typical syntax for writing 80x86 assembly language programs. This information is then used to write a sample program for the 80x86 processor.

Assemblers versus Compilers

For a high-level programming language such as C, there is a two step process to produce an application from source code. To begin with, a program called a compiler takes the source code and converts it into machine language instructions. This is a complex task that requires a detailed understanding of the architecture of the processor.

The compiler outputs the resulting sequence of machine code instructions to a file called an object file. The second step takes one or more object files and combines them by merging addressing information and generating necessary support code to make the final unit operate as an application. The program that does this is called a linker.

In order for the linker to operate properly, the object files must follow certain rules for format and addressing to clearly show how one object file interrelates with the others.

A similar two-step process is used to convert assembly language source code into an application. It begins with a program called an assembler. The assembler takes an assembly language program, and using a one-to-one conversion process, converts each line of assembly language to a single machine code instruction.

Because of this one-to-one relation between assembly language instructions and machine code instructions, the assembly language programmer must have a clear understanding of how the processor will execute the machine code.

In other words, the programmer must take the place of the compiler by converting abstract processes to the step-by-step processor instructions.

As with the compiler, the output of the assembler is an object file. The format and addressing information of the assembler's object file should mimic that of the compiler making it possible for the same linker to be used to generate the final application.

This means that as long as the assembly language programmer follows certain rules when identifying shared addressing, the object file from an assembler should be capable of being linked to the object files of a high-level language compiler.

The format of an assembly language program depends on the assembler being used. There are, however, some general formatting patterns that are typically followed. This section presents some of those standards.

Components of a Line of Assembly Language

As shown in Figure below, a line of assembly language code has four fields: a label, an opcode, a set of operands, and comments. Each of these fields must be separated by horizontal white space, i.e., spaces or tabs. No carriage returns are allowed as they identify the beginning of a new line of code. Depending on the function of a particular line, one or more of the fields may be omitted.

The first field of a line is an optional label field. A label is used to identify a specific line of code or the memory location of a piece of data so that it may be referenced by other lines of assembly language. The assembler will translate the label into an address for use in the object file.

As far as the programmer is concerned, however, the label may be used any time an address reference is needed to that particular line. It is not necessary to label all lines of assembly language code, only the ones that are referred to by other lines of code.

img

A label is a text string much like a variable name in a high-level language. There are some rules to be obeyed when defining a label.

• Labels must begin in the first column with an alphabetic character. Subsequent characters may be alphabetic or numeric.

• It must not be a reserved string, i.e., it cannot be an assembly language instruction nor can it be a command to the assembler.

• Although a label may be referenced by other lines of assembly language, it cannot be reused to identify a second line of code within the same file.

• In some cases, a special format for a label may be required if the label's function goes beyond identification of a line within a file. A special format may be needed, for example, if a high-level programming language will be referencing one of the assembly language program's functions.

The next field is the instruction or opcode field. The instruction field contains the assembly language command that the processor is supposed to execute for this line of code. An instruction must be either an assembly language instruction (an opcode) or an instruction to the assembler (an assembler directive).

The third field is the operand field. The operand field contains the data or operands that the assembly language instruction needs for its execution. This includes items such as memory addresses, constants, or register names. Depending on the instruction, there may be zero, one, two, or three operands, the syntax and organization of which also depends on the instruction.

The last field in a line of assembly language is the comment field. As was mentioned earlier, assembly language has no structure in the syntax to represent blocks of code. Although the specific operation of a line of assembly language should be clear to a programmer, its purpose within the program usually is not. It is therefore imperative to comment assembly language programs.

In addition to the standard use of comments, comments in assembly language can be used to:

• show where functions or blocks of code begin and end;

• explain the order or selection of commands (e.g., where a shift left has replaced a multiplication by a power of two); or

• identify obscure values (e.g., that address 0378₁₆ represents the data registers of the parallel port).

A comment is identified with a preceding semicolon, ';'. All text from the semicolon to the end of the line is ignored. This is much like the double-slash, "//", used in C++ or the single quote used in Visual Basic to comment out the remaining text of a line. A comment may appear alone on a line or it may follow the last necessary field of a line of code.
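As a hypothetical example that puts all four fields together (the label, operands, and constant below are invented for illustration):

START:   MOV     AX, 0378H      ; label, opcode, two operands, and a comment
         ; a comment may also occupy a line by itself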

Assembly Language Directives

There are exceptions in an assembly language program to the opcode/operand lines described in the previous section. One of the primary exceptions is the assembler directive. Assembler directives are instructions to the assembler or the linker indicating how the program should be created.

SEGMENT Directive

One of the most important directives with respect to the final addressing and organization of the application is SEGMENT. This directive is used to define the characteristics and/or contents of a segment.

There are three main segments:

1. The code segment,

2. The data segment,

3. The stack segment.

To define these segments, the assembly language file is divided into areas using the SEGMENT directive. The beginning of the segment is defined with the keyword SEGMENT while its end is defined using the keyword ENDS. Figure below presents the format and parameters used to define a segment.

img

The label uniquely identifies the segment. The SEGMENT directive label must match the corresponding ENDS directive label. The alignment attribute indicates the "multiple" of the starting address for the segment.

For a number of reasons, either the processor or the operating system may require that a segment begin on an address that is divisible by a certain power of two. The align attribute is used to tell the assembler what multiple of a power of two is required. The following is a list of the available settings for alignment.

BYTE – There is no restriction on the starting address.

WORD – The starting address must be even, i.e., the binary address must end in a zero.

DWORD – The starting address must be divisible by four, i.e., the binary address must end in two zeros.

PARA – The starting address must be divisible by 16, i.e., the binary address must end in four zeros.

PAGE – The starting address must be divisible by 256, i.e., the binary address must end in eight zeros.

The combine attribute is used to tell the linker whether segments can be combined with other segments.
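As a hypothetical example, a data segment aligned on a paragraph (16-byte) boundary might be defined as follows; the segment name is invented for illustration.

MY_DATA  SEGMENT PARA
         ; data definitions for the application would go here
MY_DATA  ENDS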

.MODEL, .STACK, .DATA, and .CODE Directives

Instead of going to the trouble of defining the segments with the SEGMENT directive, a programmer may select a memory model. By defining the memory model for the program, a basic set of segment definitions is assumed. The directive .MODEL can do this. Figure below presents the format of the .MODEL directive.

img

Table below presents the different types of memory models that can be used with the directive. The memory models LARGE and HUGE are the same except that HUGE may contain single variables that use more than 64K of memory.

There are three more directives that can be used to simplify the definition of the segments. They are .STACK, .DATA, and .CODE. When the assembler encounters one of these directives, it assumes that it is the beginning of a new segment, the type being defined by the specific directive used (stack, data, or code). It includes everything that follows the directive in the same segment until a different segment directive is encountered.

The .STACK directive takes an integer as its operand, allowing the programmer to define the size of the segment reserved for the stack. The .CODE directive takes a label as its operand indicating the segment's name.

img
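A minimal program skeleton using these directives might look like the following sketch; the memory model, stack size, and code segment name are assumed for illustration.

         .MODEL  SMALL          ; assumed memory model
         .STACK  100H           ; reserve 256 bytes for the stack
         .DATA
         ; data definitions would go here
         .CODE   MAIN           ; code segment named MAIN
         ; executable instructions would go here
         END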

PROC Directive

The next directive, PROC, is used to define the beginning of a block of code within a code segment. It is paired with the directive ENDP which defines the end of the block. The code defined between PROC and ENDP should be treated like a procedure or a function of a high-level language. This means that jumping from one block of code to another is done by calling it like a procedure.

img
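A hypothetical procedure defined with this pair of directives might look like the following; the procedure name is invented for illustration.

DELAY    PROC
         ; the body of the procedure goes here
         RET                    ; return to the caller
DELAY    ENDP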

END Directive

Another directive, END, is used to tell the assembler when it has reached the end of all of the code. Unlike the directive pairs SEGMENT and ENDS and PROC and ENDP, there is no corresponding directive to indicate the beginning of the code.

Data Definition Directives

The previous directives are used to tell the assembler how to organize the code and data. The next class of directives is used to define entities that the assembler will convert directly to components to be used by the code.

They do not represent code; rather they are used to define data or constants on which the application will operate. Many of these directives use integers as their operands. As an aid to programmers, the assembler allows these integers to be defined in binary, decimal, or hexadecimal.

Without some indication as to their base, however, some values could be interpreted as hex, decimal, or binary (e.g., 100). Hexadecimal values have an 'H' appended to the end of the number, binary values have a 'B' appended to the end, and decimal values are left without any suffix.

Note also that the first digit of any number must be a numeric digit. Any value beginning with a letter will be interpreted by the assembler as a label instead of a number. This means that when using hexadecimal values, a leading zero must be placed in front of any number that begins with A, B, C, D, E, or F.

The first of the defining directives is actually a set of directives used for reserving and initializing memory. These directives are used to reserve memory space to hold elements of data that will be used by the application. These memory spaces may either be initialized or left undefined, but their size will always be specified.
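In typical 80x86 assemblers this set includes directives such as DB (define byte) and DW (define word). The following sketch, with invented labels and values, also illustrates the base suffixes and the leading-zero rule described above.

VALUE1   DB      100            ; one byte initialized to decimal 100
VALUE2   DB      0A3H           ; hexadecimal; the leading zero keeps A3H from being read as a label
MASK1    DB      00101101B      ; one byte defined in binary
BUFFER   DW      ?              ; one word reserved but left uninitialized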

EQU Directive

The next directive, EQU, is in the same class as the define directives. It is like the #define directive used in C, and like #define, it is used to define strings or constants to be used during assembly. The format of the EQU directive is shown in Figure below

img

img

Both the label and the expression are required fields with the EQU directive. The label, which also is to follow the formatting guidelines of the label field, is made equivalent to the expression. This means that whenever the assembler comes across the label later in the file, the expression is substituted for it.

img
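For instance, assembler constants for a pair of ASCII control characters might be defined as follows (illustrative only):

CR       EQU     0DH            ; carriage return
LF       EQU     0AH            ; line feed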

80x86 Opcodes

Assembly language instructions can be categorized into four groups: data transfer, data manipulation, program control, and special operations. The next four sections introduce some of the Intel 80x86 instructions by describing their function.

Data Transfer

There is one Intel 80x86 opcode that is used to move data: MOV. As shown in Figure below, the MOV opcode takes two operands, dest and src. MOV copies the value specified by the src operand to the memory or register specified by dest.

img

Both dest and src may refer to registers or memory locations. The operand src may also specify a constant. These operands may be of either byte or word length, but regardless of what they are specifying, the sizes of src and dest must match for a single MOV opcode. The assembler will generate an error if they do not.
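A few hypothetical examples of the MOV instruction follow; the register choices, the constant, and the label COUNT are assumed for illustration.

         MOV     AX, BX         ; copy the 16-bit contents of BX into AX
         MOV     CL, 5          ; load the constant 5 into the 8-bit CL register
         MOV     DX, COUNT      ; copy the word stored at the label COUNT into DX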

Data Manipulation

Intel designed the 80x86 family of processors with plenty of instructions to manipulate data. Most of these instructions have two operands, dest and src, and just like the MOV instruction, they read from src and store in dest. The difference is that the src and dest values are combined somehow before being stored in dest. Another difference is that the data manipulation opcodes typically affect the flags.

Take for example the ADD opcode shown in Figure below. It reads the data identified by src, adds it to the data identified by dest, then replaces the original contents of dest with the result.

img

The ADD opcode modifies the processor's flags including the carry flag (CF), the overflow flag (OF), the sign flag (SF), and the zero flag (ZF). This means that any of the Intel 80x86 conditional jumps can be used after an ADD opcode for program flow control.

Many of the other data manipulation opcodes operate the same way. These include logic operations such as AND, OR, and XOR and mathematical operations such as SUB (subtraction) and ADC (add with carry). MUL (multiplication) and DIV (division) are different in that they each use a single operand, but since two pieces of data are needed to perform these operations, the AX or AL registers are implied.
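As a hypothetical example of how these instructions combine src and dest and affect the flags (the values are invented for illustration):

         MOV     AX, 1234H      ; load a starting value into AX
         ADD     AX, 0F000H     ; AX becomes 0234H and the carry flag is set
         AND     AX, 00FFH      ; AX becomes 0034H; only the low byte is kept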

Program Control

As with the generic processor described in Chapter 15, the 80x86 uses both unconditional and conditional jumps to alter the sequence of instruction execution. When the processor encounters an unconditional jump or "jump always" instruction (JMP), it loads the instruction pointer with the address that serves as the JMP's operand. This makes it so that the next instruction to be executed is at the newly loaded address. Figure below presents an example of the JMP instruction.

img

The 80x86 has a full set of conditional jumps to provide program control based on the results of execution. Each conditional jump examines the flags before determining whether to load the jump opcode's operand into the instruction pointer or simply move to the next sequential instruction. Table below presents a summary of most of the 80x86 conditional jumps along with the flag settings that force a jump. (Note that "!=" means "is not equal to")

img

Typically, these conditional jumps come immediately after a compare. In the Intel 80x86 instruction set, the compare function is CMP. It uses two operands, setting the flags by subtracting the second operand from the first. Note that the result is not stored.
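As a sketch of how this works, the C fragment from the flags discussion earlier might be translated roughly as follows. It assumes var1 and var2 are 16-bit variables and, for simplicity, jumps to labels rather than calling the three functions; the labels are invented for illustration.

         MOV     AX, VAR1       ; load var1 into AX
         CMP     AX, VAR2       ; virtual subtract: flags are set, result is discarded
         JE      EQUAL_CASE     ; jump if var1 equals var2 (zero flag set)
         JG      GREATER_CASE   ; jump if var1 is greater than var2 (signed compare)
         JMP     LESS_CASE      ; otherwise var1 is less than var2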

Special Operations

The special operations category is for opcodes that do not fit into any of the first three categories, but are necessary to fully utilize the processor's resources. They provide functionality ranging from controlling the processor flags to supporting the 80x86 interrupt system.

To begin with, there are seven instructions that allow the user to manually alter the flags. These are presented in Table below.

img

The next two special instructions are PUSH and POP. The Intel 80x86 processor's stack is referred to as a post-increment/pre-decrement stack. This means that the address in the stack pointer is decremented before data is stored to the stack and incremented after data is retrieved from the stack.

===============> COURSES COMPLETE <===============
