Lecture 5: Information Representation; Storage Devices

File sizes

Bit: (Binary Digit):

The smallest unit of memory. A bit can take on one of two values (0 or 1). “Wires” are either on or off, corresponding to 1 or 0. All data in a computer is represented by patterns of bits.

Byte:

A group of 8 bits. (Memory is measured by the number of bytes it contains.)

Since each bit can be either 0 or 1, there are 256 different bit patterns that can be represented with 8 bits.
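As a quick check of that count, here is a minimal sketch in Python (used only for illustration; the function name count_patterns is hypothetical and not from the notes). It enumerates every pattern of n bits and counts them.

```python
from itertools import product

def count_patterns(n_bits):
    """Count the distinct patterns of 0s and 1s that fit in n_bits bits."""
    return sum(1 for _ in product("01", repeat=n_bits))

print(count_patterns(8))   # 256, the same as 2 ** 8
print(count_patterns(7))   # 128, the same as 2 ** 7
```

Each extra bit doubles the number of patterns, which is why the count is a power of 2.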

ASCII Code (American Standard Code for Information Interchange):

ASCII is a standardized scheme for representing characters in patterns of 7 bits. There are 2^7 = 128 ASCII patterns, more than enough for uppercase and lowercase letters, digits, and punctuation.

(Since we use bits in groups of 8, the extra bit can be used for error checking.)

0000000-1111111 (0 – 127 = 128 characters)

1+2+4+8+16+32+64=127

As computing becomes more and more international, ASCII is being replaced by Unicode. Although ASCII is sufficient for representing English text, it is too small for other languages. Ex.: The Chinese language has thousands of characters, so we need more than one byte per character. Unicode was originally designed as a 16-bit code capable of representing 65,536 different characters.

Ex: a=97

A=65

(page 637 C book) (page 221 Reed book)

Ex.: Characters are compared by their codes, so “a” < “b” is true, and “A” < “a” is true too (65 < 97).
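The codes and comparisons above can be checked directly. This is a small sketch using Python's built-in ord and chr functions (Python is an illustration choice here, not the language of the course books).

```python
# ord gives the code of a character; chr goes the other way.
print(ord("a"))    # 97
print(ord("A"))    # 65
print(chr(65))     # A

# Strings are compared by character codes, so these match the notes.
print("a" < "b")   # True
print("A" < "a")   # True, because 65 < 97
```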

ASCII File (or text file):

A document that contains plain text only (e.g., a Notepad file). Each character of text is stored as a single byte using the ASCII code. So a file containing 20 lines of text, with 100 characters per line, would be stored in 2000 bytes.
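The 2000-byte estimate can be reproduced with a short sketch. The helper name ascii_size_in_bytes is hypothetical. Note that the estimate ignores the line-ending characters a real file would contain, so the second figure below is slightly larger.

```python
def ascii_size_in_bytes(text):
    """Return how many bytes the text occupies when stored as ASCII."""
    return len(text.encode("ascii"))

lines = ["x" * 100 for _ in range(20)]   # 20 lines of 100 characters

print(ascii_size_in_bytes("".join(lines)))     # 2000
print(ascii_size_in_bytes("\n".join(lines)))   # 2019 (19 newline characters added)
```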

Binary File:

Files that contain data that is not plain text (e.g., word processing files such as Word documents, which contain formatting information; executable files; graphics files) are not stored as plain ASCII files. The information is still stored in some type of binary format, so they are called binary files. (What happens when you try to open a Word document in Notepad? Sometimes you see garbage characters on the screen, because those bytes don't correspond to ASCII codes.)

Text files use ASCII and can be read by text editors and word processors such as Notepad and Word. Pictures use pixels (bitmap) and can be read by a graphics program.

Unicode is an extension of ASCII that uses 2 bytes per character instead of one. This allows for many more characters, so it can represent languages that use other character sets, such as Russian, Japanese, Hebrew, and Arabic. Files that are stored using Unicode will require more memory.
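To make the memory cost concrete, the sketch below compares the same text stored as ASCII and as two-byte Unicode. It assumes the UTF-16 encoding (Python's "utf-16-le" codec) as the two-byte form; the notes do not name a specific encoding, so that choice is an assumption.

```python
english = "hello"
russian = "привет"

print(len(english.encode("ascii")))       # 5 bytes, one per character
print(len(english.encode("utf-16-le")))   # 10 bytes, two per character
print(len(russian.encode("utf-16-le")))   # 12 bytes for 6 characters

# Russian letters have no ASCII code, so encoding them as ASCII fails.
try:
    russian.encode("ascii")
except UnicodeEncodeError:
    print("not representable in ASCII")
```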

File Size:

The size of a file = number of bytes in the file.

For plain ASCII text files, the size of the file = number of characters.
Word processing documents are larger because of the extra formatting information that is part of the file.

1KB (kilobyte) = 2^10 bytes = 1024 bytes (the example above, a file of 20 lines of text at about 100 chars per line, would be about 2KB)

1MB (megabyte) = 2^20 bytes = 1024KB (about 1,000 pages of text, each page 20 lines of 100 chars, would be about 2MB). RAM is usually measured in MB.

A floppy disk can store 1.44MB, which is usually enough for several short text files.

"A picture is worth 1,000 words" - Actually, computer scientists would say that it is worth more! - 1,000 words, at an average of 5 chars per word = 5,000 chars = about 5KB. That's only enough for a very, very tiny picture. Most graphics on the web are over 30KB!

1GB (gigabyte) = 2^30 bytes = 1024MB

Other storage devices that can store larger quantities of data: Zip disks, Jaz drives, CDs, DVDs.

1TB (terabyte) = 2^40 bytes = 1024GB
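The unit definitions above translate directly into a small conversion routine. This is only a sketch; the function name describe_size and the two-decimal output format are illustrative assumptions.

```python
KB = 2 ** 10   # 1,024 bytes
MB = 2 ** 20   # 1,024 KB
GB = 2 ** 30   # 1,024 MB
TB = 2 ** 40   # 1,024 GB

def describe_size(n_bytes):
    """Express a byte count in the largest unit that does not exceed it."""
    for name, unit in (("TB", TB), ("GB", GB), ("MB", MB), ("KB", KB)):
        if n_bytes >= unit:
            return f"{n_bytes / unit:.2f} {name}"
    return f"{n_bytes} bytes"

print(describe_size(2000))        # 1.95 KB  (the 20-line text file)
print(describe_size(2_000_000))   # 1.91 MB  (about 1,000 such pages)
```

Because the units are powers of 2 rather than powers of 10, 2000 bytes comes out slightly under 2 KB.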


File Compression:

Used for large files. Graphics (picture), music, and video files are very large.

Sample techniques: jpeg, mp3, mpeg, LZW, MH, ...

Speed of Data Transmission:

Data are transmitted at speeds measured in bps (bits per second).

The time it takes to download a file depends on the size of the file and the speed of the transmission.

When you connect to the Internet, the slowest point is usually the connection from home.

Typical Speeds:

Modem - 56Kbps

ISDN - 64 Kbps, or 2 x 64 Kbps (less common)

DSL - 1.5 Mbps (private communication channel)

Cable Modem - 1.5 Mbps (shared)
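Download time follows from file size and transmission speed. Since sizes are in bytes and speeds are in bits per second, the size must first be multiplied by 8. The sketch below assumes 1 Kbps = 1,000 bits per second and ignores protocol overhead and line noise; the function name download_seconds is hypothetical.

```python
def download_seconds(size_bytes, speed_bps):
    """Ideal transfer time: convert bytes to bits, then divide by bits per second."""
    return size_bytes * 8 / speed_bps

one_mb = 2 ** 20   # a 1MB file

print(round(download_seconds(one_mb, 56_000), 1))      # 149.8 seconds on a 56 Kbps modem
print(round(download_seconds(one_mb, 1_500_000), 1))   # 5.6 seconds at 1.5 Mbps
```

Real downloads are slower than these ideal figures, especially on a shared connection.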

Within programs, data can be stored in different ways. Character values are stored as ASCII (or Unicode). In order to allow arithmetic operations, numbers cannot be stored as ASCII.

Numbers are stored using binary. Explain how binary works. (Students should be able to convert a small number from binary to decimal, but would not be expected to be able to convert from decimal to binary.)

Binary Numbers

·  1-bit numbers:

Only two different numbers can be represented:

0

1

·  2-bit numbers:

Four different numbers can be represented:

00 - 0

01 - 1

10 - 2

11 - 3

·  3-bit numbers:

Eight different numbers can be represented:

000 - 0

001 - 1

010 - 2

011 - 3

100 - 4

101 - 5

110 - 6

111 - 7
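The tables above follow one rule: each bit position is worth twice the position to its right. A minimal sketch of binary-to-decimal conversion based on that rule (the function name binary_to_decimal is hypothetical):

```python
def binary_to_decimal(bits):
    """Convert a string of 0s and 1s to its decimal value, left to right."""
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)   # shift what we have, then add the new bit
    return value

print(binary_to_decimal("101"))       # 5
print(binary_to_decimal("111"))       # 7
print(binary_to_decimal("1111111"))   # 127, i.e. 1+2+4+8+16+32+64
```

Python's built-in int("101", 2) does the same conversion.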

Show an example that highlights the difference between interpreting the binary value 01000001 as ASCII for A and as binary for the number 65.
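One possible example, sketched in Python: the same eight bits give either the number 65 or the letter A depending on how the program chooses to interpret them, while the text "65" is a different pair of bytes altogether.

```python
pattern = "01000001"

as_number = int(pattern, 2)      # interpret the bits as a binary number
as_character = chr(as_number)    # interpret the same bits as an ASCII code

print(as_number)      # 65
print(as_character)   # A

# The two-character text "65" is stored as the ASCII codes for '6' and '5'.
text_bits = [format(byte, "08b") for byte in "65".encode("ascii")]
print(text_bits)      # ['00110110', '00110101']
```

Arithmetic on the pattern 01000001 works directly; arithmetic on the two bytes for "65" would not give a meaningful result.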

In order to allow arithmetic operations, numbers cannot be stored as ASCII. Integers are stored using excess notation or two’s complement, and numbers with a fractional part are stored using floating-point notation.

Optional

Signed Numbers

The leftmost bit is used to indicate the sign of the number:

0 - positive

1 - negative

Signed integer: the leftmost bit (the sign bit) is 0 if the number is positive or zero, and 1 if it is negative.

(The largest 16-bit integer is 32,767 (2^15 - 1). The largest 32-bit integer is 2,147,483,647 (2^31 - 1).)

Negative numbers are represented in two’s complement form.

Two’s complement notation: The most popular system for representing integers within today’s computers is two’s complement notation. Each value is represented by a pattern of 32 bits. Such a large system allows a wide range of numbers to be represented:

-2^31 = -2,147,483,648

2^31 - 1 = 2,147,483,647

Example of smaller systems:

Using patterns of length three:

(Start with a string of 0s and count up in binary until a 0 followed by all 1s is reached. For the negative values, start with a string of 1s and count backward in binary until a 1 followed by all 0s is reached.)

The leftmost bit is the sign bit.

Bit pattern Value represented

011 3

010 2

001 1

000 0

111 -1

110 -2

101 -3

100 -4

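(The patterns complement each other: changing all 0s to 1s and all 1s to 0s turns the pattern for a value n into the pattern for -(n + 1), e.g., 011 (3) becomes 100 (-4). To negate a value, complement its pattern and then add 1.)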

Using patterns of length four:

Bit pattern Value Represented

0111 7

0110 6

0101 5

0100 4

0011 3

0010 2

0001 1

0000 0

1111 -1

1110 -2

1101 -3

1100 -4

1011 -5

1010 -6

1001 -7

1000 -8
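Both tables can be generated by one rule: read the pattern as ordinary binary, and if the sign bit is 1, subtract 2^n, where n is the pattern length. The sketch below applies that rule; the function names are hypothetical.

```python
def from_twos_complement(bits):
    """Decode a two's complement bit pattern into a signed integer."""
    value = int(bits, 2)
    if bits[0] == "1":             # sign bit set: the value is negative
        value -= 2 ** len(bits)
    return value

def to_twos_complement(value, n_bits):
    """Encode a signed integer as an n_bits-long two's complement pattern."""
    return format(value % (2 ** n_bits), f"0{n_bits}b")

print(from_twos_complement("101"))    # -3
print(from_twos_complement("1000"))   # -8
print(from_twos_complement("0111"))   # 7
print(to_twos_complement(-3, 4))      # 1101
```

The encoder assumes the value fits in the given number of bits (for 4 bits, -8 through 7).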

Excess notation: Also for integers:

Bit Pattern Value Represented

111 3

110 2

101 1

100 0

011 -1

010 -2

001 -3

000 -4

This is excess four notation: the ordinary binary interpretation of each pattern exceeds the value it represents by 4.

An excess eight conversion table

Bit Pattern Value Represented

1111 7

1110 6

1101 5

1100 4

1011 3

1010 2

1001 1

1000 0

0111 -1

0110 -2

0101 -3

0100 -4

0011 -5

0010 -6

0001 -7

0000 -8

Ex: 1100 normally represents 12, but in our system it represents 4. 0000 normally represents 0, in our system it represents -8.
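The excess tables follow from a single subtraction. In the sketch below the excess is taken to be 2^(n-1) for patterns of length n, which matches the excess four (3-bit) and excess eight (4-bit) tables above; the function names are hypothetical.

```python
def from_excess(bits):
    """Decode an excess-notation pattern: binary value minus the excess."""
    excess = 2 ** (len(bits) - 1)   # 4 for 3-bit patterns, 8 for 4-bit patterns
    return int(bits, 2) - excess

def to_excess(value, n_bits):
    """Encode a signed integer: add the excess, then write it in binary."""
    excess = 2 ** (n_bits - 1)
    return format(value + excess, f"0{n_bits}b")

print(from_excess("1100"))   # 4   (12 - 8)
print(from_excess("0000"))   # -8  (0 - 8)
print(from_excess("110"))    # 2   (6 - 4)
print(to_excess(-1, 3))      # 011
```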

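Real numbers (e.g., 256.78) use floating-point notation to represent the mantissa and the exponent of the number.

(256.78 = 2.5678 x 10^2)

In the one-byte examples that follow, the leftmost bit is the sign bit (0 - nonnegative, 1 - negative), the next three bits are the exponent in excess four notation, and the last four bits are the mantissa.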

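Example: 01101011

0 = sign bit (nonnegative)

110 = exponent

1011 = mantissa

Write the mantissa with a radix point on its left: .1011

The exponent 110 = 2 (in three-bit excess four notation), so move the radix point 2 places to the right (an exponent of -2 would move it 2 places to the left):

10.11

The sign bit says nonnegative, so the value is 2 ¾.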

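Another example: 10111100

1 = sign bit (negative)

011 = exponent = -1 (excess four notation)

1100 = mantissa, written as .1100

Move the radix point 1 place to the left:

.01100

.011 = 3/8 and the sign bit says negative, so the value is -3/8.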
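The decoding steps used in these examples can be sketched as follows. The layout (1 sign bit, 3 exponent bits in excess four, 4 mantissa bits) is the one-byte teaching format used in these notes, not a format real hardware uses; the function name decode_byte_float is hypothetical. Fractions are used so the results stay exact.

```python
from fractions import Fraction

def decode_byte_float(bits):
    """Decode sign(1) | exponent(3, excess four) | mantissa(4) into a Fraction."""
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:4], 2) - 4           # excess four
    mantissa = Fraction(int(bits[4:], 2), 16)  # .xxxx means xxxx / 16
    return sign * mantissa * Fraction(2) ** exponent

print(decode_byte_float("01101011"))   # 11/4, i.e. 2 3/4
print(decode_byte_float("10111100"))   # -3/8
```

Multiplying by 2 raised to the exponent is the same as moving the radix point right (positive exponent) or left (negative exponent).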

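Reverse: encode 1 1/8

1 1/8 in binary is 1.001

Copy the bits into the mantissa field, starting from the leftmost 1:

_ _ _ _ 1 0 0 1

The mantissa is read as .1001, so to get 1.001 we need to move the radix point one bit to the right.

The exponent is therefore 1, which is 101 in excess four notation.

The sign bit is 0 because the value is nonnegative.

Result: 01011001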

Round-off Error/Truncation Error

2 5/8 is 10.101 in binary. But that is too big for the four-bit mantissa field in the one-byte floating-point system, so the rightmost 1 (which represents the last 1/8) is lost. Instead of 10101, the mantissa field stores 1010, giving 01101010, which represents 2 ½ instead of 2 5/8.
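The truncation can be reproduced with a small encoder for the same one-byte teaching format (1 sign bit, 3 exponent bits in excess four, 4 mantissa bits). This is a sketch under stated assumptions: the value is nonzero, its exponent fits in the range -4 to 3, and extra mantissa bits are simply dropped. The function name encode_byte_float is hypothetical.

```python
from fractions import Fraction

def encode_byte_float(value):
    """Encode a nonzero value as sign | exponent (excess four) | 4-bit mantissa."""
    value = Fraction(value)
    if value == 0:
        raise ValueError("this sketch handles nonzero values only")
    sign = "1" if value < 0 else "0"
    magnitude = abs(value)
    exponent = 0
    while magnitude >= 1:              # move the radix point left
        magnitude /= 2
        exponent += 1
    while magnitude < Fraction(1, 2):  # move the radix point right
        magnitude *= 2
        exponent -= 1
    if not -4 <= exponent <= 3:
        raise ValueError("exponent does not fit in three bits")
    mantissa = int(magnitude * 16)     # keep 4 bits; anything beyond is lost
    return sign + format(exponent + 4, "03b") + format(mantissa, "04b")

print(encode_byte_float(Fraction(9, 8)))    # 01011001  (1 1/8, stored exactly)
print(encode_byte_float(Fraction(21, 8)))   # 01101010  (2 5/8, last bit lost)
print(encode_byte_float(Fraction(5, 2)))    # 01101010  (2 1/2, same pattern)
```

Since 2 5/8 and 2 ½ produce the same pattern, the stored byte can no longer tell them apart.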

You can reduce the problem by using a longer mantissa field. Most computers today will use at least 32 bits for storing values in floating-point notation instead of the 8 bits here. This also allows for a longer exponent field.
