Home
Book Fire Online by http://bookfire.net Prev Page Prev Page
Table of Contents
Back Cover
The Essentials of Computer Organization and Architecture
Preface
To the Instructor
Chapter 1: Introduction
1.2 The Main Components of a Computer
1.3 An Example System - Wading through the Jargon
1.4 Standards Organizations
1.5 Historical Development
1.6 The Computer Level Hierarchy
1.7 The Von Neumann Model
1.8 Non-Von Neumann Models
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Chapter 2: Data Representation in Computer Systems
2.2 Positional Numbering Systems
2.3 Decimal To Binary Conversions
2.4 Signed Integer Representation
2.5 Floating-Point Representation
2.6 Character Codes
2.7 Codes For Data Recording And Transmission
2.8 Error Detection And Correction
Chapter Summary
Further Reading
References
Review Of Essential Terms And Concepts
Exercises
Chapter 3: Boolean Algebra and Digital Logic
3.2 Boolean Algebra
3.3 Logic Gates
3.4 Digital Components
3.5 Combinational Circuits
3.6 Sequential Circuits
3.7 Designing Circuits
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Focus on Karnaugh Maps
Chapter 4: MARIE : An Introduction to a Simple Computer
4.2 Marie
4.3 Instruction Processing
4.4 A Simple Program
4.5 A Discussion on Assemblers
4.6 Extending Our Instruction Set
4.7 A Discussion on Decoding — Hardwired vs. Microprogrammed Control
4.8 Real World Examples of Computer Architectures
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Chapter 5: A Closer Look at Instruction Set Architectures
5.2 Instruction Formats
5.3 Instruction Types
5.4 Addressing
5.5 Instruction-Level Pipelining
5.6 Real-World Examples of ISAs
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Chapter 6: Memory
6.2 Types of Memory
6.3 The Memory Hierarchy
6.4 Cache Memory
6.5 Virtual Memory
6.6 A Real-World Example of Memory Management
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Chapter 7: Input/Output and Storage Systems
7.2 Amdahl's Law
7.3 I/O Architectures
7.4 Magnetic Disk Technology
7.5 Optical Disks
7.6 Magnetic Tape
7.7 RAID
7.8 Data Compression
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Focus on Selected Disk Storage Implementations
Chapter 8: System Software
8.2 Operating Systems
8.3 Protected Environments
8.4 Programming Tools
8.5 Java — All of the Above
8.6 Database Software
8.7 Transaction Managers
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Chapter 9: Alternative Architectures
9.2 RISC Machines
9.3 Flynn's Taxonomy
9.4 Parallel and Multiprocessor Architectures
9.5 Alternative Parallel Processing Approaches
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Chapter 10: Performance Measurement and Analysis
10.2 The Basic Computer Performance Equation
10.3 Mathematical Preliminaries
10.4 Benchmarking
10.6 Disk Performance
Chapter Summary
Further Reading
References
Review Of Essential Terms And Concepts
Exercises
Chapter 11: Network Organization and Architecture
11.2 Early Business Computer Networks
11.3 Early Academic and Scientific Networks — The Roots and Architecture of the Internet
11.5 Network Protocols II — TCP/IP Network Architecture
11.6 Network Organization
11.7 High-Capacity Digital Links
11.8 A Look at the Internet
Chapter Summary
Further Reading
References
Review of Essential Terms and Concepts
Exercises
Appendix A: Data Structures and the Computer
A.2 Fundamental Structures
A.3 Trees
A.4 Network Graphs
Summary
Further Reading
References
Exercises
Glossary
Glossary Numbers
Glossary A
Glossary B
Glossary C
Glossary D
Glossary E
Glossary F
Glossary G
Glossary H
Glossary I
Glossary J
Glossary K
Glossary L
Glossary M
Glossary N
Glossary O
Glossary P
Glossary Q
Glossary R
Glossary S
Glossary T
Glossary U
Glossary V
Glossary W
Glossary Z
Answers and Hints for Selected Exercises
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Appendix A
Index
Index A
Index B
Index C
Index D
Index E
Index F
Index G
Index H
Index I
Index J
Index K
Index L
Index M
Index N
Index O
Index P
Index Q
Index R
Index S
Index T
Index U
Index V
Index W
Index X
Index Z
List of Figures
List of Tables
List of Code Examples
List of Sidebars
Team LiB
Previous Section Next Section

7.8 Data Compression

No matter how cheap storage gets, no matter how much of it we buy, we never seem to be able to get enough of it. New huge disks fill rapidly with all the things we wish we could have put on the old disks. Before long, we are in the market for another set of new disks. Few people or corporations have access to unlimited resources, so we must make optimal use of what we have. One way to do this is to make our data more compact, compressing it before writing it to disk. (In fact, we could even use some kind of compression to make room for a parity or mirror set, adding RAID to our system for "free"!)

Data compression can do more than just save space. It can also save time and help to optimize resources. For example, if compression and decompression are done in the I/O processor, less time is required to move the data to and from the storage subsystem, freeing the I/O bus for other work.

The advantages offered by data compression in sending information over communication lines are obvious: less time to transmit and less storage on the host. Although a detailed study is beyond the scope of this book (see the references section for some resources), you should understand a few basic data compression concepts to complete your understanding of I/O and data storage.

When we evaluate various compression algorithms and compression hardware, we are often most concerned with how fast a compression algorithm executes and how much smaller a file becomes after the compression algorithm is applied. The compression factor (sometimes called compression ratio) is a statistic that can be calculated quickly and is understandable by virtually anyone who would care. There are a number of different methods used to compute a compression factor. We will use the following:

For example, suppose we start with a 100KB file and apply some kind of compression to it. After the algorithm terminates, the file is 40KB in size. We can say that the algorithm achieved a compression factor of: for this particular file. An exhaustive statistical study should be undertaken before inferring that the algorithm would always produce 60% file compression. We can, however, determine an expected compression ratio for particular messages or message types once we have a little theoretical background.

The study of data compression techniques is a branch of a larger field of study called information theory. Information theory concerns itself with the way in which information is stored and coded. It was born in the late 1940s through the work of Claude Shannon, a scientist at Bell Laboratories. Shannon established a number of information metrics, the most fundamental of which is entropy. Entropy is the measure of information content in a message. Messages with higher entropy carry more information than messages with lower entropy. This definition implies that a message with lower information content would compress to a smaller size than a message with a higher information content.

Determining the entropy of a message requires that we first determine the frequency of each symbol within the message. It is easiest to think of the symbol frequencies in terms of probability. For example, in the famous program output statement:

HELLO WORLD!

the probability of the letter L appearing is or . In symbols, we have P(L) = 0.25. To map this probability to bits in a code word, we use the base-2 log of this probability. Specifically, the minimum number of bits required to encode the letter L is: log2 P(L) or 2. The entropy of the message is the weighted average of the number of bits required to encode each of the symbols in the message. If the probability of a symbol x appearing in a message is P(x), then the entropy, H, of the symbol x is:

H = P(x) log2 P(x)

The average entropy over the entire message is the sum of the weighted probabilities of all n symbols of the message:

Entropy establishes a lower bound on the number of bits required to encode a message. Specifically, if we multiply the number of characters in the entire message by the weighted entropy, we get the theoretical minimum of the number of bits that are required to encode the message without loss of information. Bits in addition to this lower bound do not add information. They are therefore redundant. The objective of data compression is to remove redundancy while preserving information content. We can quantify the average redundancy for each character contained in a coded message of length n containing code words of length l by the formula:

This formula is most useful when we are comparing the effectiveness of one coding scheme over another for a given message. The code producing the message with the least amount of redundancy is the better code in terms of data compression. (Of course, we must also consider such things as speed and computational complexity as well as the specifics of the application before we can say that one method is better than another.)

Finding the entropy and redundancy for a text message is a straightforward application of the formula above. With a fixed-length code, such as ASCII or EBCDIC, the left-hand sum above is exactly the length of the code, usually 8 bits. In our HELLO WORLD! example (using the right-hand sum) we find the average symbol entropy is about 3.022. This means that if we reached the theoretical maximum entropy, we would need only 3.022 bits per character x 12 characters = 36.26 or 37 bits to encode the entire message. The 8-bit ASCII version of the message, therefore, carries 96 – 37 or 59 redundant bits.

7.8.1 Statistical Coding

The entropy metric just described can be used as a basis for devising codes that minimize redundancy in the compressed message. Generally, any application of statistical compression is a relatively slow and I/O-intensive process, requiring two passes to read the file before it is compressed and written.

Two passes over the file are needed because the first pass is used for tallying the number of occurrences of each symbol. These tallies are used to calculate probabilities for each different symbol in the source message. Values are assigned to each symbol in the source message according to the calculated probabilities. The newly assigned values are subsequently written to a file along with the information required to decode the file. If the encoded file—along with the table of values needed to decode the file—are smaller than the original file, we say that data compression has occurred.

Huffman and arithmetic coding are two fundamental statistical data compression methods. Variants of these methods can be found in a large number of popular data compression programs. We examine each of these in the next sections, starting with Huffman coding.

Huffman Coding

Suppose that after we determine probabilities for each of the symbols in the source message, we create a variable-length code that assigns the most frequently used symbols to the shortest code words. If the code words are shorter than the information words, it stands to reason that the resulting compressed message will be shorter as well. David A. Huffman formalized this idea in a paper published in 1952. Interestingly, one form of Huffman coding, Morse code, has been around since the early 1800s.

Morse code was designed using typical letter frequencies found in English writing. As you can see in Figure 7.27, the shorter codes represent the more frequently used letters in the English language. These frequencies clearly cannot apply to every single message. A notable exception would be a telegram from Uncle Zachary vacationing in Zanzibar, requesting a few quid so he can quaff a quart of quinine! Thus, the most accurate statistical model would be individualized for each message. To accurately assign code words, the Huffman algorithm builds a binary tree using symbol probabilities found in the source message. A traversal of the tree gives the bit pattern assignments for each symbol in the message. We illustrate this process using a simple nursery rhyme. For clarity, we render the rhyme in all uppercase with no punctuation as follows:


Figure 7.27: The International Morse Code

HIGGLETY PIGGLETY POP

THE DOG HAS EATEN THE MOP

THE PIGS IN A HURRY THE CATS IN A FLURRY

HIGGLETY PIGGLETY POP

We start by tabulating all occurrences of each letter in the rhyme. We will use the abbreviation <ws> (white space) for the space characters between each word as well as the newline characters (see Table 7.1).

Table 7.1: Letter Frequencies

Letter

Count

Letter

Count

A

5

N

3

C

1

O

4

D

1

P

8

E

10

R

4

F

1

S

3

G

10

T

10

H

8

U

2

I

7

Y

6

L

5

<ws>

21

M

1

  

These letter frequencies are associated with each letter using two nodes of a tree. The collection of these trees (a forest) is placed in a line ordered by the letter frequencies like this:

We begin building the binary tree by joining the nodes with the two smallest frequencies. Because we have a four-way tie for the smallest, we arbitrarily select the leftmost two nodes. The sum of the combined frequencies of these two nodes is two. We create a parent node labeled with this sum and place it back into the forest in the location determined by the label on the parent node, as shown:

We repeat the process for the nodes now having the lowest frequencies:

The two smallest nodes are the parents of F, M, C, and D. Taken together, they sum to a frequency of 4, which belongs in the fourth position from the left:

The leftmost nodes add up to 5. They are moved to their new position in the tree as shown:

The leftmost pair combine to create a parent node with a frequency of 8. It is placed back in the forest as shown:

After several more iterations, the completed tree looks like this:

This tree establishes the framework for assigning a Huffman value to each symbol in the message. We start by labeling every right branch with a binary 1, then each left branch with a binary 0. The result of this step is shown below. (The frequency nodes have been removed for clarity.)

All that we need to do now is traverse the tree from its root to each leaf node, keeping track of the binary digits encountered along the way. The completed coding scheme is shown in Table 7.2.

Table 7.2: The Coding Scheme Letter

Letter

Code

Letter

Code

<ws>

01

O

10100

T

000

R

10101

L

0010

A

11011

Y

0011

U

110100

I

1001

N

110101

H

1011

F

1000100

P

1100

M

1000101

E

1110

C

1000110

G

1111

D

1000111

S

10000

  

As you can see, the symbols with the highest frequencies end up having the fewest bits in their code. The entropy of this message is approximately 3.82 bits per symbol. The theoretical lower bound compression for this message is therefore 110 symbols x 3.82 bits = 421 bits. This Huffman code renders the message in 426 bits, or about 1% more than is theoretically necessary.

Arithmetic Coding

Huffman coding cannot always achieve theoretically optimal compression because it is restricted to using an integral number of bits in the resulting code. In the nursery rhyme in the previous section, the entropy of the symbol S is approximately 1.58. An optimal code would use 1.58 bits to encode each occurrence of S. With Huffman coding, we are restricted to using at least 2 bits for this purpose. This lack of precision cost us a total of 5 redundant bits in the end. Not too bad, but it seems we could do better.

Huffman coding falls short of optimality because it is trying to map probabilities—which are elements of the set of real numbers—to elements of a small subset of the set of integers. We are bound to have problems! So why not devise some sort of real-to-real mapping to achieve data compression? In 1963, Norman Abramson conceived of such a mapping, which was subsequently published by Peter Elias. This real-to-real data compression method is called arithmetic coding.

Conceptually, arithmetic coding partitions the real number line in the interval between 0 and 1 using the probabilities in the symbol set of the message. More frequently used symbols get a larger chunk of the interval.

Returning to our favorite program output, HELLO WORLD!, we see that there are 12 characters in this imperative statement. The lowest probability among the symbols is . All other probabilities are a multiple of . Thus, we divide the 0 – 1 interval into 12 parts. Each of the symbols except L and O are assigned of the interval. L and O get and , respectively. Our probability-to-interval mapping is shown in Table 7.3.

Table 7.3: Probability Interval Mapping for HELLO WORLD!

We encode a message by successively dividing a range of values (starting with 0.0 through 1.0) proportionate to the interval assigned to the symbol. For example, if the "current interval" is and the letter L gets of the current interval, as shown above, then to encode the L, we multiply by , giving as the new current interval. If the next character is another L, is multiplied by yielding for the current interval. We proceed in this vein until the entire message is encoded. This process becomes clear after studying the pseudocode below. A trace of the pseudocode for HELLO WORLD! is given in Figure 7.28.

Click To expand
Figure 7.28: Encoding HELLO WORLD! with Arithmetic Coding
ALGORITHM Arith_Code (Message)
  HiVal ¬ 1.0         /* Upper limit of interval. */
  LoVal ¬ 0.0        /* Lower limit of interval. */
  WHILE (more characters to process)
       Char ¬ Next message character
       Interval ¬ HiVal – LoVal
       CharHiVal ¬ Upper interval limit for Char
       CharLoVal ¬ Lower interval limit for Char
       HiVal ¬ LoVal +  Interval * CharHiVal
       LoVal ¬ LoVal +  Interval * CharLoVal
  ENDWHILE
  OUTPUT (LoVal)
END Arith_Code

The message is decoded using the same process in reverse, as shown by the pseudocode that follows. A trace of the pseudocode is given in Figure 7.29.

Click To expand
Figure 7.29: A Trace of Decoding HELLO WORLD!
ALGORITHM Arith_Decode (CodedMsg)
    Finished ¬ FALSE
    WHILE NOT Finished
       FoundChar ¬ FALSE    /*  We could do this search much more    */
       WHILE NOT FoundChar      /* efficiently in a real implementation. */
          PossibleChar ¬ next symbol from the code table
          CharHiVal ¬ Upper interval limit for PossibleChar
          CharLoVal ¬ Lower interval limit for PossibleChar
          IF CodedMsg < CharHiVal AND CodedMsg > CharLoVal THEN
             FoundChar ¬ TRUE
          ENDIF    / * We now have a character whose interval  */
       ENDWHILE    /* surrounds the current message value.     */
       OUTPUT(Matching Char)
       Interval ¬ CharHiVal – CharLoVal
    CodedMsgInterval ¬ CodedMsg - CharLoVal
    CodedMsg ¬ CodedMsgInterval / Interval
    IF CodedMsg = 0.0 THEN
       Finished ¬ TRUE
    ENDIF
END WHILE
END Arith_Decode

You may have noticed that neither of the arithmetic coding/decoding algorithms contains any error checking. We have done this for the sake of clarity. Real implementations must guard against floating point underflow in addition to making sure that the number of bits in the result are sufficient for the entropy of the information.

Differences in floating point representations can also cause the algorithm to miss the zero condition when the message is being decoded. In fact, an end-of-message character is usually inserted at the end of the message during the coding process to prevent such problems during decoding.

7.8.2 Ziv-Lempel (LZ) Dictionary Systems

Although arithmetic coding can produce nearly optimal compression, it is even slower than Huffman coding because of the floating-point operations that must be performed during both the encoding and decoding processes. If speed is our first concern, we might wish to consider other compression methods, even if it means that we can't achieve a perfect code. Surely, we would gain considerable speed if we could avoid scanning the input message twice. This is what dictionary methods are all about.

Jacob Ziv and Abraham Lempel pioneered the idea of building a dictionary during the process of reading information and writing encoded bytes. The output of dictionary-based algorithms contains either literals or pointers to information that has previously been placed in the dictionary. Where there is substantial "local" redundancy in the data, such as long strings of spaces or zeros, dictionary-based techniques work exceptionally well. Although referred to as LZ dictionary systems, the name "Ziv-Lempel" is preferred to "Lempel-Ziv" when using the authors' full names.

Ziv and Lempel published their first algorithm in 1977. This algorithm, known as the LZ77 compression algorithm, uses a text window in conjunction with a look-ahead buffer. The look ahead buffer contains the information to be encoded. The text window serves as the dictionary. If any characters inside the look-ahead buffer can be found in the dictionary, the location and length of the text in the window is written to the output. If the text cannot be found, the unencoded symbol is written with a flag indicating that the symbol should be used as a literal.

There are many variants of LZ77, all of which build on one basic idea. We will explain this basic version through an example, using another nursery rhyme. We have replaced all spaces by underscores for clarity:

STAR_LIGHT_STAR_BRIGHT_

FIRST_STAR_I_SEE_TONIGHT_

I_WISH_I_MAY_I_WISH_I_MIGHT_

GET_THE_WISH_I_WISH_TONIGHT

For illustrative purposes, we will use a 32-byte text window and a 16-byte look-ahead buffer. (In practice, these two areas usually span several kilobytes.) The text is first read into the look-ahead buffer. Having nothing in the text window yet, the S is placed in the text window and a triplet composed of:

  1. The offset to the text in the text window

  2. The length of the string that has been matched

  3. The first symbol in the look-ahead buffer that follows the phrase

In the example above, there is no match in the text, so the offset and string length are both zeros. The next character in the look-ahead buffer also has no match, so it is also written as a literal with index and length both zero.

We continue writing literals until a T appears as the first character of the lookahead buffer. This matches the T that is in position 1 of the text window. The character following the T in the look-ahead buffer is an underscore, which is the third item in the triplet that is written to the output.

The look-ahead buffer now shifts by two characters. STAR_ is now at the beginning of the look-ahead buffer. It has a match at the first character position (position 0) of the text window. We write 0, 5, B because B is the character following STAR_ in the buffer.

We shift the look-ahead buffer by six characters, and look for a match on the R. We find one at position 3 of the text, writing 3, 1, I.

GHT is now at the beginning of the buffer. It matches four characters in the text starting at position 7. We write, 7, 4, F.

After a few more iterations, the text window is nearly full:

After we match STAR_ with the characters at position 0 of the text, the six characters, STAR_I, shift out of the buffer and into the text window. In order to accommodate all six characters, the text window must shift to the right by three characters after we process STAR_.

After writing the code for STAR_I and shifting the windows, _S is at the beginning of the buffer. These characters match with the text at position 7.

Continuing in this manner, we ultimately come to the end of the text. The last characters to be processed are IGHT. These match the text at position 4. Because there are no characters after IGHT in the buffer, the last triple written is tagged with an end of file character, <EOF>.

A total of 36 triples are written to the output in this example. Using a text window of 32 bytes, the index needs only 5 bits to point to any text character. Because the look-ahead buffer is 16 bytes wide, the longest string that we can match is 16 bytes, so we require a maximum of 4 bits to store the length. Using 5 bits for the index, 4 bits for the string length, and 7 bits for each ASCII character, each triple requires 16 bits, or 2 bytes. The rhyme contains 103 characters, which would have been stored in 103 uncompressed bytes on disk. The compressed message requires only 72 bytes, giving us a compression factor of .

It stands to reason that if we make the text window larger, we increase the likelihood of finding matches with the characters in the look-ahead buffer. For example, the string _TONIGHT occurs at the forty-first position of the rhyme and again at position 96. Because there are 48 characters between the two occurrences of _TONIGHT, the first occurrence cannot possibly be used as a dictionary entry for the second occurrence if we use a 32-character text window. Enlarging the text window to 64 bytes allows the first _TONIGHT to be used to encode the second one, and it would add only one bit to each coded triple. In this example, however, an expanded 64-byte text window decreases the output by only two triples: from 36 to 34. Because the text window requires 7 bits for the index, each triple would consist of 17 bits. The compressed message then occupies a total of 17 x 34 = 578 bits or about 73 bytes. Therefore, the larger text window actually costs us a few bits in this example.

A degenerate condition occurs when there are no matches whatsoever between the text and the buffer during the compression process. For instance, if we would have used a 36-character string consisting of all the letters of the alphabet and the digits 0 through 9, ABC . . . XYZ012 . . . 9, we would have had no matches in our example. The output of the algorithm would have been 36 triples of the form, 0,0,?. We would have ended up with an output triple the size of the original string, or an expansion of 200%.

Fortunately, exceptional cases like the one just cited happen rarely in practice. Variants of LZ77 can be found in a number of popular compression utilities, including the ubiquitous PKZIP. IBM's RAMAC RVA 2 Turbo disk array implements LZ77 directly in the disk control circuitry. This compression takes place at hardware speeds, making it completely transparent to users.

Dictionary-based compression has been an active area of research since Ziv and Lempel published their algorithm in 1977. One year later, Ziv and Lempel improved on their own work when they published their second dictionary-based algorithm, now known as LZ78. LZ78 differs from LZ77 in that it removes the limitation of the fixed-size text window. Instead, it creates a special tree data structure called a trie, which is populated by tokens as they are read from the input. (Each interior node of the tree can have as many children as it needs.) Instead of writing characters to the disk as in LZ77, LZ78 writes pointers to the tokens in the trie. The entire trie is written to disk following the encoded message, and read first before decoding the message. (See Appendix A for more information regarding tries.)

7.8.3 GIF Compression

Efficient management of the trie of tokens is the greatest challenge for LZ78 implementations. If the dictionary gets too large, the pointers can become larger than the original data. A number of solutions to this problem have been found, one of which has been the source of acrimonious debate and legal action.

In 1984, Terry Welsh, an employee of the Sperry Computer Corporation (now Unisys), published a paper describing an effective algorithm for managing an LZ78-style dictionary. His solution, which involves controlling the sizes of the tokens used in the trie, is called LZW data compression, for Lempel-Ziv-Welsh. LZW compression is the fundamental algorithm behind the graphics interchange format, GIF (pronounced "jiff"), developed by CompuServe engineers and popularized by the World Wide Web. Because Welsh devised his algorithm as part of his official duties at Sperry, Unisys exercised its right to place a patent on it. It has subsequently requested small royalties each time a GIF is used by service providers or high-volume users. LZW is not specific to GIF. It is also used in the TIFF image format, other compression programs (including Unix Compress), various software applications (such as Postscript and PDF), and hardware devices (most notably modems). Not surprisingly, the royalty request of Unisys has not been well received within the Web community, some sites blazing with vows to boycott GIFs in perpetuity. Cooler heads have simply circumvented the issue by producing better (or at least different) algorithms, one of which is PNG, Portable Network Graphics.

Royalty disputes alone did not "cause" PNG (pronounced "ping") to come into being, but they undoubtedly hastened its development. In a matter of months in 1995, PNG went from a draft to an accepted international standard. Amazingly, the PNG specification has had only two minor revisions as of 2002.

PNG offers many improvements over GIF including:

  • User-selectable compression modes: "Faster" or "better" on a scale of 0 to 3, respectively

  • Improved compression ratios over GIF, typically 5% to 25% better

  • Error detection provided by a 32-bit CRC (ISO 3309/ITU-142)

  • Faster initial presentation in progressive display mode

  • An open international standard, freely available and sanctioned by the World Wide Web Consortium (W3C) as well as many other organizations and businesses

PNG uses two levels of compression: First, information is reduced using Huffman coding. The Huffman code is then followed by LZ77 compression using a 32KB text window.

GIF can do one thing that PNG cannot: Support multiple images in the same file, giving the illusion of animation (albeit stiffly). To correct this limitation, the Internet community produced the Multiple-image Network Graphics algorithm (or MNG, pronounced "ming"). MNG is an extension of PNG that allows multiple images to be compressed into one file. These files can be of any type, such as gray scale, true color, or even JPEGs (see the next section). Version 1.0 of MNG was released in January 2001, with refinements and enhancements sure to follow. PNG and MNG both being freely available (with source code!) over the Internet, one is inclined to think it is only a matter of time before the GIF issue becomes passé.

7.8.4 JPEG Compression

When we see a graphic image such as a photograph on a printed page or computer screen, what we are really looking at is a collection of small dots called pixels or picture elements. Pixels are particularly noticeable in low-image-quality media like newspapers and comic books. When pixels are small and packed closely together, our eyes perceive a "good quality" image. "Good quality," being a subjective measure, starts at about 300 pixels per inch (120 pixels/cm). On the high end, most people would agree that an image of 1600 pixels per inch (640 pixels/cm) is "good," if not excellent.

Pixels contain the binary coding for the image in a form that can be interpreted by display and printer hardware. Pixels can be coded using any number of bits. If, for example, we are producing a black and white line drawing, we can do so using one bit per pixel. The bit is either black (pixel = 0) or white (pixel = 1). If we decide that we'd rather have a grayscale image, we need to think about how many shades of gray will suffice. If we want to have eight shades of gray, we need three bits per pixel. Black would be 000, white 111. Anything in between would be some shade of gray.

Color pixels are produced using a combination of red, green, and blue components. If we want to render an image using eight different shades each of red, green, and blue, we must use three bits for each color component. Hence, we need nine bits per pixel, giving 29 – 1 different colors. Black would still be all zeros: R = 000, G = 000, B = 000; white would still be all ones: R = 111, G = 111, B = 111. "Pure" green would be R = 000, G = 111, B = 000. R = 011, G = 000, B = 101 would give us some shade of purple. Yellow would be produced by R = 111, G = 111, B = 000. The more bits that we use to represent each color, the closer we get to the "true color" that we see around us. Many computer systems approximate true color using eight bits per color—rendering 256 different shades of each. These 24-bit pixels can display about 16 million different colors.

Let's say that we wish to store a 4 inch x 6 inch (10cm x 15cm) photographic image in such a way as to give us "reasonably" good quality when it is viewed or printed. Using 24 bits (3 bytes) per pixel at 300 pixels per inch, we need 300 x 300 x 6 x 4 x 3 = 6.48MB to store the image. If this 4 inch x 6 inch photo is part of a sales brochure posted on the Web, we would risk losing customers with dial-up modems once they realize that 20 minutes have passed and they still have not completed downloading the brochure. At 1600 pixels per inch, storage balloons to just under 1.5GB, which is practically impossible to download and store.

JPEG is a compression algorithm designed specifically to address this problem. Fortunately, photographic images contain a considerable amount of redundant information. Furthermore, some of the information having high theoretical entropy is often of no consequence to the integrity of the image. With these ideas in mind, the ISO and ITU together commissioned a group to formulate an international image compression standard. This group is called the Joint Photographic Experts Group, or JPEG, pronounced "jay-peg." The first JPEG standard, 10928-1, was finalized in 1992. Major revisions and enhancements to this standard were begun in 1997. The new standard is called JPEG2000 and was finalized in December 2000.

JPEG is a collection of algorithms that provides excellent compression at the expense of some image information loss. Up to this point, we have been describing lossless data compression: The data restored from the compressed sequence is precisely the same as it was before compression, barring any computational or media errors. Sometimes we can achieve much better compression if a little information loss can be tolerated. Photographic images lend themselves particularly well to lossy data compression because of the human eye's ability to compensate for minor imperfections in graphical images. Of course, some images carry real information, and should be subjected to lossy compression only after "quality" has been carefully defined. Medical diagnostic images such as x-rays and electrocardiograms fall into this class. Family album and sales brochure photographs, however, are the kinds of images that can lose considerable "information," while retaining their illusion of visual "quality."

One of the salient features of JPEG is that the user can control the amount of information loss by supplying parameters prior to compressing the image. Even at 100% fidelity, JPEG produces remarkable compression. At 75% the "lost" information is barely noticeable and the image file is a small fraction of its original size. Figure 7.30 shows a gray-scale image compressed using different quality parameters. (The original 7.14KB bitmap was used as input with the stated quality parameters.)

Click To expand
Figure 7.30: JPEG Compression Using Different Quantizations on a 7.14 KB Bitmap File

As you can see, the lossiness of JPEG becomes problematic only when using the lowest quality factors. You'll also notice how the image takes on the appearance of a crossword puzzle when it is at its highest compression. The reason for this becomes clear once you understand how JPEG works.

When compressing color images, the first thing that JPEG does is to convert the RGB components to the domain of luminance and chrominance, where luminance is the brightness of the color and chrominance is the color itself. The human eye is less sensitive to chrominance than luminance, so the resulting code is constructed so that the luminance component is least likely to be lost during the subsequent compression steps. Grayscale images do not require this step.

Next, the image is divided into square blocks of eight pixels on each side. These 64-pixel blocks are converted from the spatial domain (x, y) to a frequency domain (i, j) using a discrete cosine transform (DCT) as follows:

where

The output of this transform is an 8x8 matrix of integers ranging from –1024 to 1023. The pixel at i = 0, j = 0 is called the DC coefficient and it contains a weighted average of the values of the 64 pixels in the original block. The other 63 values are called AC coefficients. Owing to the behavior of the cosine function (cos 0 = 1), the resulting frequency matrix, the (i, j) matrix, has a concentration of low numbers and zeros in the lower right-hand corner. Larger numbers collect toward the upper left-hand corner of the matrix. This pattern lends itself well to many different compression methods, but we're not quite ready for that step.

Before the frequency matrix is compressed, each value in the matrix is divided by its corresponding element in a quantization matrix. The purpose of the quantization step is to reduce the 11-bit output of the DCT to an 8-bit value. This is the lossy step in JPEG, the degree of which is selectable by the user. The JPEG specification gives several quantization matrices, any of which may be used at the discretion of the implementer. All of these standard matrices assure that the frequency matrix elements containing the most information (those toward the upper left-hand corner) lose the least amount of their information during the quantization step.

Following the quantization step, the frequency matrix is sparse—containing more zero than nonzero entries—in the lower right-hand corner. Large blocks of identical values can be compressed easily using run-length coding.

Run-length coding is a simple compression method where instead of coding XXXXX, we code 5,X to indicate a run of five Xs. When we store 5,X instead of XXXXX, we save three bytes, not including any delimiters that the method might require. Clearly, the most effective way of doing this is to try to align everything so that we get as many of the zero values adjacent to each other as we can. JPEG achieves this by doing a zig-zag scan of the frequency matrix. The result of this step is a one-dimensional matrix (a vector) that usually contains a long run of zeros. Figure 7.31 illustrates how the zig-zag scan works.

Click To expand
Figure 7.31: A Zig-Zag Scan of a JPEG Frequency Matrix

Each of the AC coefficients in the vector are compressed using run-length coding. The DC coefficient is coded as the arithmetic difference between its original value and the DC coefficient of the previous block, if there is one.

The resulting run-length encoded vector is further compressed using either Huffman or arithmetic coding. Huffman coding is the preferred method owing to a number of patents on the arithmetic algorithms.

Figure 7.32 summarizes the steps that we have just described for the JPEG algorithm. Decompression is achieved by reversing this process.

Click To expand
Figure 7.32: The JPEG Compression Algorithm

JPEG2000 offers a number of improvements over the JPEG standard of 1997. The underlying mathematics are more sophisticated, which permits greater flexibility with regard to quantization parameters and the incorporation of multiple images into one JPEG file. One of the most striking features of JPEG2000 is its ability to allow user definition of regions of interest. A region of interest is an area within an image that serves as a focal point, and would not be subjected to the same degree of lossy compression as the rest of the image. If, for example, you had a photograph of a friend standing on the shore of a lake, you would tell JPEG2000 that your friend is a region of interest. The background of the lake and the trees could be compressed to a great degree before they would lose definition. The image of your friend, however, would remain clear—if not enhanced—by a lower-quality background.

JPEG2000 replaces the discrete cosine transform of JPEG with a wavelet transform. (Wavelets are a different way of sampling and encoding an image or any set of signals.) Quantization in JPEG2000 uses a sine function rather than the simple division of the earlier version. These more sophisticated mathematical manipulations require substantially more processor resources than JPEG, causing noticeable performance degradation. Until the performance issues are resolved, JPEG2000 will be used only where the value of its unique features offset the cost of increased computational power. (Equivalently, JPEG2000 will be used when the decreased cost of computational power makes its sluggish performance irrelevant.)


Team LiB
Previous Section Next Section
Linking to Www Google.Com. Host by Book Fire