CIT-255 ADVANCED COMPUTER FORENSICS

TECHNICAL BACKGROUNDERS

• Data Carving

• FAT Details

• File Signatures

• Finding Subdirectories

• Known-Plaintext Attack

• Recovery of Data from Slack and Unallocated Space

• Recovery of Deleted Files

• The Use of Automated Tools in Computer Forensics Examinations

Data Carving

When a file is "deleted" from a disk, the first character of the entry in the directory is overwritten

and the FAT entries associated with that file are reset to zero; the remainder of the directory

entry and the actual contents of the file remain intact, however. For that reason, recovering the

first cluster of data is easy since the pointer to that cluster can be found in the directory entry.

Examination of the file signature in the first cluster can generally identify the file type.

If the file is larger than a single cluster, data carving techniques are used to attempt to recover the

rest of the cluster chain comprising the original file. If the remaining clusters are stored in

sequence, contiguously to the first one, then recovery is almost trivial. If the clusters are not

contiguous, then the examiner needs to reassemble the clusters that appear likely to fit the data

pattern of the sought after file. Text files are relatively easy because the examiner can merely

read the blocks of text and reassemble the original; graphics, compressed, and application files

are harder to reassemble because their binary content cannot simply be read and pieced together by eye.
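
The contiguous case described above can be sketched in a few lines of Python. This is a minimal illustration, not a forensic tool: the function name and parameters are hypothetical, the image is assumed to be held in memory as bytes, and data_offset is assumed to be the byte offset of the data area, whose first cluster is numbered 2 in FAT file systems.

```python
import math


def carve_contiguous(image: bytes, start_cluster: int, file_size: int,
                     cluster_size: int, data_offset: int = 0,
                     first_cluster: int = 2) -> bytes:
    """Recover a file on the assumption that its clusters are contiguous.

    start_cluster and file_size come from the (deleted) directory entry.
    first_cluster=2 reflects FAT numbering of the data area (assumption
    stated in the lead-in).
    """
    clusters_needed = math.ceil(file_size / cluster_size)
    start = data_offset + (start_cluster - first_cluster) * cluster_size
    raw = image[start:start + clusters_needed * cluster_size]
    # Trim the slack at the end of the last cluster.
    return raw[:file_size]
```

If the recovered bytes do not make sense past the first cluster, the file was probably fragmented and the remaining clusters have to be located by inspection.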

FAT Details

In a file system that employs a File Allocation Table (FAT), the FAT itself defines the status of

every data cluster on the medium. The FAT also defines the linked list of clusters that comprise

individual files.

The FAT has a single entry for each cluster on the medium; the entry is 12, 16, or 32 bits in

length in the FAT12, FAT16, and FAT32 file systems, respectively (FAT32 uses only the low-order

28 bits of each entry).
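
Because FAT12 entries are 12 bits wide, two entries share three bytes, which makes them awkward to read in a hex editor. The following sketch shows one way to extract a single entry from a raw copy of the FAT for each variant. The function names are hypothetical, and the code assumes the standard little-endian on-disk layout.

```python
def fat12_entry(fat: bytes, cluster: int) -> int:
    """Return the 12-bit FAT12 entry for the given cluster number."""
    offset = cluster + cluster // 2          # 1.5 bytes per entry
    pair = fat[offset] | (fat[offset + 1] << 8)
    # Odd-numbered entries use the high 12 bits, even ones the low 12 bits.
    return pair >> 4 if cluster & 1 else pair & 0x0FFF


def fat16_entry(fat: bytes, cluster: int) -> int:
    """Return the 16-bit FAT16 entry for the given cluster number."""
    return int.from_bytes(fat[2 * cluster:2 * cluster + 2], "little")


def fat32_entry(fat: bytes, cluster: int) -> int:
    """Return the FAT32 entry, keeping only the low-order 28 bits."""
    raw = int.from_bytes(fat[4 * cluster:4 * cluster + 4], "little")
    return raw & 0x0FFFFFFF
```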

If a cluster is unallocated, the associated FAT entry will contain an all-zeroes value (0x000,

0x0000, or 0x00000000 for FAT12, FAT16, or FAT32, respectively).

If the cluster is allocated to a file, the cluster will either be at the beginning or middle of a chain

of clusters comprising the file or be the sole or final cluster assigned to the file. In the former

case, the FAT value will be a pointer to the next cluster in the chain and in the latter case the

FAT value will indicate the end-of-cluster-chain (EOC) with an all-ones value (i.e., 0xFFF,

0xFFFF, or 0x0FFFFFFF (see note 1) in FAT12, FAT16, or FAT32, respectively). The EOC value is also

sometimes referred to as the end-of-file (<EOF>) marker.

For example purposes, consider the following files and the associated FAT table entries:

• File A starts in cluster 105; the cluster chain is 105, 112, and 113.

• File B starts in cluster 102; the cluster chain is 102, 103, and 104.

• File C occupies only cluster 108.

• The remaining clusters (100, 101, 106, 107, 109-111, and 114) are unallocated.

The portion of the FAT that would contain these entries might look like the following (assume

FAT12 in this example):

100: 0x000    105: 112      110: 0x000
101: 0x000    106: 0x000    111: 0x000
102: 103      107: 0x000    112: 113
103: 104      108: 0xFFF    113: 0xFFF
104: 0xFFF    109: 0x000    114: 0x000

Finally, it is important to remember that the FAT entries for a file are reset to zero when the file

is deleted.
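
The example FAT above can be modeled as a simple lookup table to show how a cluster chain is followed and what deletion does to it. This is only a sketch: a Python dictionary stands in for the on-disk FAT, the function names are hypothetical, and FAT12 values are assumed as in the example.

```python
EOC = 0xFFF    # end-of-cluster-chain marker (FAT12)
FREE = 0x000   # unallocated cluster

# The FAT entries for clusters 100-114 from the example.
fat = {cluster: FREE for cluster in range(100, 115)}
fat.update({105: 112, 112: 113, 113: EOC,    # File A
            102: 103, 103: 104, 104: EOC,    # File B
            108: EOC})                       # File C


def follow_chain(table: dict, start: int) -> list:
    """Return the list of clusters in the chain that begins at start."""
    chain = []
    cluster = start
    while True:
        entry = table[cluster]
        if entry == FREE:
            raise ValueError(f"cluster {cluster} is unallocated")
        if cluster in chain:
            raise ValueError(f"loop detected at cluster {cluster}")
        chain.append(cluster)
        if entry == EOC:
            return chain
        cluster = entry


def delete_chain(table: dict, start: int) -> None:
    """Reset every FAT entry in the chain to zero, as a file deletion does."""
    for cluster in follow_chain(table, start):
        table[cluster] = FREE


print(follow_chain(fat, 105))   # [105, 112, 113]
```

After delete_chain() is applied to File A, the entries for clusters 105, 112, and 113 are all zero, so the FAT alone no longer reveals that cluster 112 followed cluster 105. That lost linkage is what the examiner has to reconstruct.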

File Signatures

Most users of DOS and Windows computers are familiar with file extensions such as .EXE and

.GIF; these file extensions refer to executable and GIF image files, respectively.

But do they?

A .GIF file extension is not the definitive indicator that this file has GIF content; it merely

associates the file with a graphics viewer application so that when the user double-clicks on the

file, it will be displayed properly.

In fact, most files have a file signature, which is some set of bytes at the beginning of the file that

indicates the file's content. As an example, GIF files start with the characters GIF87a or

GIF89a in the first six bytes of the file.

When recovering data from digital media, particularly when in unallocated space, the file

signature is a significant aid in helping the examiner find files of a particular type and view files

properly.
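
A signature check amounts to comparing the first bytes of a block of data against a table of known values. The sketch below illustrates the idea. The GIF signatures are the ones given above; the other entries are commonly documented signatures added here only for illustration, and the table and function names are hypothetical.

```python
# (signature bytes, description). Only the GIF entries come from the text
# above; the rest are well-known signatures included as examples.
SIGNATURES = [
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xFF\xD8\xFF", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
]


def identify(data: bytes):
    """Return a description of the file type, or None if no signature matches."""
    for magic, description in SIGNATURES:
        if data.startswith(magic):
            return description
    return None
```

A mismatch between the result of identify() and the file's extension is worth noting, since it may indicate that a file was renamed to disguise its content.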

______

1 The FAT32 EOC value is all-ones in the lower 28 bits; the high-order four bits are reserved and are normally zero.

Finding Subdirectories

In the FAT file system, a subdirectory appears as a file entry just like any other file, with two

notable exceptions; namely, the DIRECTORY attribute bit is set and the file size is 0. The

subdirectory entry, however, does include a pointer to the subdirectory's starting cluster and the

FAT table is linked appropriately to the cluster chain comprising the subdirectory itself.

Subdirectories, then, are also deleted from a directory just like any other file. In particular, the

only changes when a subdirectory is deleted are that the first character is overwritten with a 0xE5

and the FAT chain is reset to 0.

A directory listing, as displayed by the DIR command, has the following generic format:

 Volume in drive C has no label.
 Volume Serial Number is F82E-F5C0

 Directory of C:\perl

06/14/2006  20:50    <DIR>          .
06/14/2006  20:50    <DIR>          ..
06/14/2006  23:46    <DIR>          bin
06/14/2006  20:50    <DIR>          eg
06/14/2006  20:50    <DIR>          html
06/14/2006  20:50    <DIR>          lib
06/14/2006  20:50            37,792 pod2htmd.tmp
06/14/2006  20:50            17,147 pod2htmi.tmp
06/14/2006  20:49    <DIR>          site
               2 File(s)         54,939 bytes
               7 Dir(s)  43,824,070,656 bytes free

There are two things of particular relevance in this listing. First, note the first two entries,

namely . (dot) and .. (double-dot). Both of these are directory entries that refer to this directory and

the parent directory, respectively. Both dot and double-dot are created when the directory is

created and are always the first two entries in the directory.

The second thing to notice here is that this listing shows seven subdirectories; dot, double-dot,

and five more. They are denoted here, as expected, with the <DIR> tag because the

DIRECTORY attribute bit is set.

Finding a deleted subdirectory is relatively straightforward. The first two entries will be the dot

and double-dot entries and they will appear at the beginning of a cluster; thus, any cluster that

contains 0x2E (ASCII .) at offset 0x00 (0) and 0x2E-2E (ASCII ..) at offsets 0x20-21 (32-33)

might be a subdirectory. Once the subdirectory is recovered, it will point to any files

stored in that directory complete with file names and pointers.
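
The dot and double-dot test lends itself to a simple scan. The sketch below walks a raw block of data one cluster at a time and reports every cluster that passes the test. The function names are hypothetical, and the returned numbers are cluster indexes relative to the start of the buffer (beginning at 0), not FAT cluster numbers.

```python
def looks_like_subdirectory(cluster: bytes) -> bool:
    """Apply the dot / double-dot test to the start of one cluster."""
    return (len(cluster) >= 0x22
            and cluster[0x00] == 0x2E               # "." at offset 0
            and cluster[0x20:0x22] == b"\x2E\x2E")  # ".." at offsets 32-33


def find_subdirectory_clusters(data: bytes, cluster_size: int) -> list:
    """Return buffer-relative indexes of clusters that may hold a subdirectory."""
    hits = []
    for offset in range(0, len(data) - cluster_size + 1, cluster_size):
        if looks_like_subdirectory(data[offset:offset + cluster_size]):
            hits.append(offset // cluster_size)
    return hits
```

A hit is only a candidate. Checking that the DIRECTORY attribute bit is set in both entries would reduce false positives.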

Data recovery from a drive does not actually require recovery of the subdirectories because the

data is still present on the drive. Recovering the subdirectories is helpful, however, because it

aids in determining the organization of the data on the disk and adds to the pattern of

activities associated with the investigation.

As a final note, the format of a subdirectory entry is identical to that of an entry in the Root

Directory.

Known-Plaintext Attack

When data is encrypted, there are several ways in which an analyst can recover the key with

which to decrypt the data. The simplest approach is to ask the person who encrypted the data to

tell you the key.

In the absence of being told the key, the examiner needs to use some form of automated attack

on the encrypted file. Since different applications use different forms of encryption -- and

because the type of encryption used with any given application is relatively well-known -- some

attack methods look for weaknesses in the encryption algorithm. Another approach is to try to

guess the key, either based on words in the dictionary (called a dictionary attack) or by using

every possible character combination (called a brute-force attack). Password-guessing can take a

long time, particularly if the encryption key is long.

A simpler, more elegant attack is called a known-plaintext attack. This attack method can work if

the examiner can find a plaintext file that has the exact same content as an encrypted file.

Assuming that the crypto attack software can determine the encryption algorithm, the software

can examine the plaintext and encrypted files, and determine the key.
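
The principle can be shown with a deliberately weak cipher. The sketch below uses a toy repeating-key XOR cipher, chosen only because it is short enough to read; it is not the algorithm used by any product discussed here, and real known-plaintext attacks on actual ciphers are considerably more involved. All function names are hypothetical.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR each byte with a repeating key (encrypts and decrypts)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def recover_keystream(plaintext: bytes, ciphertext: bytes) -> bytes:
    """XOR known plaintext with its ciphertext to expose the key stream."""
    return bytes(p ^ c for p, c in zip(plaintext, ciphertext))


def shortest_period(stream: bytes) -> bytes:
    """Return the shortest prefix that, when repeated, reproduces the stream."""
    for n in range(1, len(stream) + 1):
        if all(stream[i] == stream[i % n] for i in range(len(stream))):
            return stream[:n]
    return stream


# The examiner holds one file in both plaintext and encrypted form.
secret_key = b"s3cret"
known_plain = b"The quick brown fox jumps over the lazy dog"
known_cipher = xor_cipher(known_plain, secret_key)

recovered_key = shortest_period(recover_keystream(known_plain, known_cipher))

# The recovered key then decrypts other data protected with the same key.
other_cipher = xor_cipher(b"Meet at the usual place at noon", secret_key)
print(xor_cipher(other_cipher, recovered_key))
```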

This type of attack can be particularly successful on password-protected ZIP files; if a plaintext

version of a single file from the protected ZIP archive can be found, programs such as

AccessData's Password Recovery Toolkit or Passware can find the key for all of the files. The

method is relatively straightforward -- the examiner finds one plaintext version of a file in the

ZIP archive and then compresses that file using a compatible version of ZIP. The software will

then compare the plaintext archive against the password-protected archive to mathematically

derive the protection key. Once the key is found, all of the files in the archive can be recovered.

Recovery of Data from Slack and Unallocated Space

Data is written to digital media in a block of bytes known as a cluster. Clusters are fixed in size

on a particular disk or other medium, and all files occupy an integer number of clusters.

It is unlikely for a file to end exactly on a cluster boundary; indeed, most files have empty space

in the last cluster of the cluster chain. This "empty" space is known as the slack space or file

slack.

As an example, suppose a disk has a cluster size of 2,048 B. Suppose further that a file to be

written to the disk has a size of 4,052 B. This file will be stored in two clusters on the disk; the

two clusters (4,096 B) will contain 4,052 B of data and 44 B of slack space.
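
The arithmetic in this example generalizes to any file and cluster size. A minimal sketch, using a hypothetical function name:

```python
def slack_space(file_size: int, cluster_size: int):
    """Return (clusters used, bytes allocated, bytes of slack) for a file."""
    clusters = -(-file_size // cluster_size)   # ceiling division
    allocated = clusters * cluster_size
    return clusters, allocated, allocated - file_size


print(slack_space(4052, 2048))   # (2, 4096, 44)
```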

Unallocated space is the collection of clusters that are not currently assigned to any file.

While the operating system and most applications cannot write directly to slack or unallocated

space, hex editors such as WinHex and DISKEDIT can read and write down to this level.

Searching slack and unallocated space is exactly that -- searching. Information can, in fact, pop

out at the astute examiner because the data pattern will be noticeable, plaintext will be visible, or

known file signatures will appear. Once an area of interest is found, the examiner can select a

block of data and copy it into a new file for further examination and analysis.
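
Part of this search can be automated. One simple approach is to pull out every run of printable ASCII characters from a block of slack or unallocated space, together with its offset, so that the examiner can jump straight to the visible plaintext. The function name and the minimum run length of six characters are arbitrary choices made for this sketch.

```python
import re


def printable_runs(data: bytes, min_len: int = 6) -> list:
    """Return (offset, text) pairs for runs of printable ASCII in the data."""
    # Bytes 0x20 through 0x7E are the printable ASCII characters.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii"))
            for m in pattern.finditer(data)]
```

The offsets returned tell the examiner where to start copying in the hex editor.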

Recovery of Deleted Files

To understand how deleted files can be recovered, it is necessary to understand how files are

written to a disk in the first place.

In the File Allocation Table (FAT) file system, the following occurs when the operating system

creates a file:

1. An empty entry is found in the appropriate directory, and the file's name, attributes,

date(s), etc. are entered into the directory.

2. An empty cluster is found on the medium to receive the file's data. This

starting cluster address is placed into the directory entry.

3. The cluster's entry in the FAT is marked as in-use, and data is written to the cluster.

4. If more than one cluster is needed, the first cluster's FAT entry points to the second

cluster, etc., forming a cluster chain, as data is written to the additional clusters.

Deleting a file undoes only some of the steps above:

1. The first character of the file name is changed to 0xE5, an indicator that this directory

entry can be reused.

2. The directory entry is consulted to find the starting cluster of the file.

3. The FAT table's entries corresponding to the file's cluster chain are marked empty.

Note, in particular, that the data is never removed from the medium; the clusters are marked as

empty but never actually emptied.

One analogy to this process might be the following magic data book. In the data book, consider a

chapter to be the equivalent of a data file. The rule is -- to access a chapter, the reader needs to

turn to the table of contents and then can go directly to the first page of any chapter. Each page

of the chapter points to the next page until the end of the chapter. When a chapter is to be

"erased," the author merely erases the entry in the table of contents; the words are still written on

the pages, however, until those pages get reallocated to another chapter.

To find deleted files on a disk, then, the examiner only needs to find directory entries with a

0xE5 in the first byte, indicating those that have been marked as deleted. The pointer to the first

cluster is still valid and, with luck, all of the clusters have been allocated in sequence. Once the

examiner finds the beginning and end of the data, it can be copied into a new file and viewed.
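
The search for deleted entries can be sketched as a scan over the 32-byte entries of a directory. The code below assumes the standard short (8.3) directory entry layout, with the attribute byte at offset 11, the starting cluster at offsets 26-27, and the file size at offsets 28-31, and it assumes FAT12 or FAT16, where the starting cluster fits in those two bytes. The function name is hypothetical.

```python
import struct

DELETED = 0xE5          # first byte of a deleted entry
LONG_NAME_ATTR = 0x0F   # attribute value that marks a long-file-name entry


def deleted_entries(directory: bytes) -> list:
    """Return (name, starting cluster, size) for each deleted 8.3 entry."""
    found = []
    for offset in range(0, len(directory) - 31, 32):
        entry = directory[offset:offset + 32]
        if entry[0] != DELETED or entry[11] == LONG_NAME_ATTR:
            continue
        # The first character was overwritten, so show it as "?".
        name = "?" + entry[1:8].decode("ascii", "replace").rstrip()
        ext = entry[8:11].decode("ascii", "replace").rstrip()
        if ext:
            name += "." + ext
        start_cluster, size = struct.unpack_from("<HI", entry, 26)
        found.append((name, start_cluster, size))
    return found
```

Each result gives the examiner the starting cluster and the original file size, which is enough to attempt recovery when the clusters were allocated in sequence.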

The Use of Automated Tools in Computer Forensics Examinations

To ensure reliable discovery and reporting, automated computer forensics tools may be employed to

augment the examiner. A discussion about the use of automated tools in computer forensics

exams can be found in the article 'The "Tools Proven in Court" Question' by Steve Hailey,

available from . The tool, the particular exam,

and the examiner should be able to pass all of the tests brought up in this article:

1. Was the evidence gathered and verified in a sound manner?

2. Was a chain of custody maintained?

3. Is the ownership and licensing appropriate for the tools used?

4. Was the proper examination environment being maintained?

5. Can the results of the technical analysis be duplicated using other tools?

6. Does the Analyst understand what the tools they use are actually doing, or are they

merely taking for granted what an automated process is reporting?

7. Do other professionals use the same techniques and methodology?

8. Is the Analyst technically capable of defending/supporting their interpretation of the

evidence?

The Tests:

• Has the expert's technique or theory been tested, or can it be tested?

• Has the technique or theory been subject to peer review and publication?

• Is the potential rate of error of the technique or theory known, and accepted?

• Are standards controlling the technique's operation in existence and are they maintained?

• Has the theory or method been generally accepted by the scientific community?