CIT-255 ADVANCED COMPUTER FORENSICS
TECHNICAL BACKGROUNDERS
• Data Carving
• FAT Details
• File Signatures
• Finding Subdirectories
• Known-Plaintext Attack
• Recovery of Data from Slack and Unallocated Space
• Recovery of Deleted Files
• The Use of Automated Tools in Computer Forensics Examinations
Data Carving
When a file is "deleted" from a disk, the first character of the file name in its directory entry
is overwritten with 0xE5 and the FAT entries associated with that file are reset to zero; the
remainder of the directory entry and the actual contents of the file, however, remain intact. For
that reason, recovering the first cluster of data is easy, since the pointer to that cluster can
still be found in the directory entry. Examining the file signature in the first cluster can
generally identify the file type.
If the file is larger than a single cluster, data carving techniques are used to attempt to recover
the rest of the cluster chain that made up the original file. If the remaining clusters were stored
in sequence, contiguous with the first one, then recovery is almost trivial. If the clusters are not
contiguous, the examiner needs to reassemble the clusters that appear likely to fit the data
pattern of the sought-after file. Text files are relatively easy because the examiner can simply
read the blocks of text and put them back in order; graphics, compressed, and application files
are harder to reassemble because their content is binary and offers no human-readable clues
about which cluster follows which.
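The text-versus-binary distinction above can be turned into a rough triage heuristic when sorting through candidate clusters. The Python sketch below labels each cluster-sized block of a raw buffer by the fraction of printable ASCII bytes it contains. The threshold `TEXT_THRESHOLD` and the function names are hypothetical choices for illustration; real carving tools use far richer checks.

```python
import string

# Hypothetical cut-off: blocks at least this printable are flagged as "text".
TEXT_THRESHOLD = 0.95

# Byte values of printable ASCII (letters, digits, punctuation, whitespace).
PRINTABLE = set(string.printable.encode("ascii"))


def printable_ratio(block: bytes) -> float:
    """Return the fraction of bytes in block that are printable ASCII."""
    if not block:
        return 0.0
    return sum(1 for b in block if b in PRINTABLE) / len(block)


def classify_clusters(raw: bytes, cluster_size: int) -> list:
    """Label each cluster-sized block of raw as 'text' or 'binary'.

    Returns (block_index, label) pairs; block_index counts from 0 at the
    start of raw and is not a FAT cluster number.
    """
    labels = []
    for offset in range(0, len(raw), cluster_size):
        block = raw[offset:offset + cluster_size]
        label = "text" if printable_ratio(block) >= TEXT_THRESHOLD else "binary"
        labels.append((offset // cluster_size, label))
    return labels
```

Clusters labeled "text" are candidates for the read-and-reorder approach; the rest need signature- or structure-aware reassembly. Note that the last cluster of a text file may score lower because of zero-filled slack.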
FAT Details
In a file system that employs a File Allocation Table (FAT), the FAT itself defines the status of
every data cluster on the medium. The FAT also defines the linked list of clusters that comprise
individual files.
The FAT has a single entry for each cluster on the medium; the entry is 12, 16, or 32 bits in
length in the FAT12, FAT16, and FAT32 file systems, respectively, although only the low-order
28 bits of a FAT32 entry are used.
If a cluster is unallocated, the associated FAT entry will contain an all-zeroes value (0x000,
0x0000, or 0x00000000 for FAT12, FAT16, or FAT32, respectively).
If the cluster is allocated to a file, the cluster will be either at the beginning or middle of a chain
of clusters comprising the file, or the sole or final cluster assigned to the file. In the former
case, the FAT value will be a pointer to the next cluster in the chain; in the latter case, the
FAT value will indicate the end-of-cluster-chain (EOC) with an all-ones value: 0xFFF, 0xFFFF,
or 0x0FFFFFFF in FAT12, FAT16, or FAT32, respectively (see footnote 1 regarding FAT32). The
EOC value is also sometimes referred to as the end-of-file (<EOF>) marker.
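Because FAT12 entries are 12 bits wide, two entries are packed into every three bytes, which makes reading them by hand error-prone. The Python sketch below decodes entry n from a raw copy of the FAT for each variant and tests for end-of-chain. It assumes the caller has already extracted the FAT bytes from the disk image; `fat_entry` and `is_eoc` are hypothetical helper names. The thresholds reflect that any value from 0xFF8 (FAT12), 0xFFF8 (FAT16), or 0x0FFFFFF8 (FAT32) upward is treated as EOC, with the all-ones value being the one normally written.

```python
import struct

# Lowest value treated as end-of-chain for each FAT variant.
EOC_MIN = {12: 0xFF8, 16: 0xFFF8, 32: 0x0FFFFFF8}


def fat_entry(fat: bytes, n: int, fat_type: int) -> int:
    """Return the value of FAT entry n from raw FAT bytes."""
    if fat_type == 12:
        # Two 12-bit entries share three bytes: entry n starts at byte n * 1.5.
        offset = n + n // 2
        word = fat[offset] | (fat[offset + 1] << 8)
        # Even entries use the low 12 bits; odd entries use the high 12 bits.
        return word >> 4 if n % 2 else word & 0x0FFF
    if fat_type == 16:
        return struct.unpack_from("<H", fat, 2 * n)[0]
    if fat_type == 32:
        # Only the low 28 bits are meaningful; mask off the high nibble.
        return struct.unpack_from("<I", fat, 4 * n)[0] & 0x0FFFFFFF
    raise ValueError("fat_type must be 12, 16, or 32")


def is_eoc(value: int, fat_type: int) -> bool:
    """True if value marks the end of a cluster chain."""
    return value >= EOC_MIN[fat_type]
```

For example, the three bytes 0x23 0x61 0x45 hold the FAT12 entries 0x123 (entry 0) and 0x456 (entry 1).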
As an example, consider the following files and their associated FAT entries:
• File A starts in cluster 105; the cluster chain is 105, 112, and 113.
• File B starts in cluster 102; the cluster chain is 102, 103, and 104.
• File C occupies only cluster 108.
• The remaining clusters (100, 101, 106, 107, 109-111, and 114) are unallocated.
The portion of the FAT that would contain these entries might look like the following (assume
FAT12 in this example):
Cluster  Entry    Cluster  Entry    Cluster  Entry
100      0x000    105      112      110      0x000
101      0x000    106      0x000    111      0x000
102      103      107      0x000    112      113
103      104      108      0xFFF    113      0xFFF
104      0xFFF    109      0x000    114      0x000
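Following a cluster chain is a simple linked-list walk. The Python sketch below loads the example FAT above into a dictionary (pointers in decimal, exactly as in the table) and reconstructs each file's chain from its starting cluster; `cluster_chain` and `free_clusters` are hypothetical names used for illustration.

```python
FAT12_EOC_MIN = 0xFF8  # FAT12 treats 0xFF8-0xFFF as end-of-chain
FREE = 0x000

# The example FAT12 entries from the text, keyed by cluster number.
fat = {
    100: FREE, 101: FREE, 102: 103, 103: 104, 104: 0xFFF,
    105: 112, 106: FREE, 107: FREE, 108: 0xFFF, 109: FREE,
    110: FREE, 111: FREE, 112: 113, 113: 0xFFF, 114: FREE,
}


def cluster_chain(fat: dict, start: int) -> list:
    """Follow FAT pointers from start until an end-of-chain value."""
    chain = []
    cluster = start
    while True:
        if cluster in chain:
            raise ValueError(f"loop detected at cluster {cluster}")
        value = fat[cluster]
        if value == FREE:
            # A zero entry means the chain is gone, e.g. the file was deleted.
            raise ValueError(f"cluster {cluster} is unallocated")
        chain.append(cluster)
        if value >= FAT12_EOC_MIN:
            return chain
        cluster = value


def free_clusters(fat: dict) -> list:
    """List clusters whose FAT entry marks them as unallocated."""
    return sorted(c for c, v in fat.items() if v == FREE)


print(cluster_chain(fat, 105))  # File A: [105, 112, 113]
print(cluster_chain(fat, 102))  # File B: [102, 103, 104]
print(cluster_chain(fat, 108))  # File C: [108]
print(free_clusters(fat))       # [100, 101, 106, 107, 109, 110, 111, 114]
```

If File A were deleted, entries 105, 112, and 113 would all read 0x000, so `cluster_chain(fat, 105)` would raise an error even though the data is still on the disk; this is why carving is needed beyond the first cluster.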
Finally, it is important to remember that the FAT entries for a file are reset to zero when the file
is deleted.
File Signatures
Most users of DOS and Windows computers are familiar with file extensions such as .EXE and
.GIF; these file extensions refer to executable and GIF image files, respectively.
But do they?
A .GIF file extension is not a definitive indicator that a file has GIF content; it merely
associates the file with a graphics viewer application so that when the user double-clicks on the
file, it will be displayed properly.
In fact, most file types have a file signature: a characteristic set of bytes at the beginning of the
file that indicates the file's content. As an example, GIF files start with the characters GIF87a or
GIF89a in the first six bytes of the file.
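Checking a header against known signatures is easy to automate. The Python sketch below compares the first bytes of a file with a small table of well-known signatures and flags files whose extension disagrees with their content. The table is a deliberately short, illustrative subset (real tools ship databases of hundreds of signatures), and `identify` and `extension_matches` are hypothetical names.

```python
import os
from typing import Optional

# Illustrative subset of well-known file signatures ("magic numbers").
SIGNATURES = [
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"MZ", "DOS/Windows executable"),
]

# What each extension claims the content to be.
EXTENSION_CLAIMS = {
    ".gif": "GIF image", ".png": "PNG image", ".jpg": "JPEG image",
    ".jpeg": "JPEG image", ".pdf": "PDF document", ".zip": "ZIP archive",
    ".exe": "DOS/Windows executable",
}


def identify(header: bytes) -> Optional[str]:
    """Return the content type implied by the leading bytes, if recognized."""
    for magic, kind in SIGNATURES:
        if header.startswith(magic):
            return kind
    return None


def extension_matches(filename: str, header: bytes) -> Optional[bool]:
    """Compare a file's extension with its signature.

    Returns None when either the extension or the signature is unknown.
    """
    claimed = EXTENSION_CLAIMS.get(os.path.splitext(filename)[1].lower())
    actual = identify(header)
    if claimed is None or actual is None:
        return None
    return claimed == actual
```

For example, `extension_matches("HOLIDAY.GIF", b"MZ\x90\x00")` returns False: the file claims to be an image but begins like an executable.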
When recovering data from digital media, particularly when in unallocated space, the file
signature is a significant aid in helping the examiner find files of a particular type and view files
properly.
______
1 The FAT32 EOC value is all-ones in the lower 28 bits; the high-order four bits of a FAT32 entry are reserved and are masked off (ignored) when the entry is interpreted.
Finding Subdirectories
In the FAT file system, a subdirectory appears as a file entry just like any other file, with two
notable exceptions: the DIRECTORY attribute bit is set, and the file size is 0. The subdirectory
entry does, however, include a pointer to the subdirectory's starting cluster, and the FAT is
linked appropriately to the cluster chain comprising the subdirectory itself.
Subdirectories, then, are deleted from a directory just like any other file. In particular, the
only changes when a subdirectory is deleted are that the first character of its name is overwritten
with 0xE5 and its FAT chain is reset to 0.
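Each FAT directory entry is 32 bytes long, so the attribute bit, deletion marker, starting cluster, and size described above can be read directly from raw bytes. The Python sketch below assumes the standard 8.3 short-name layout (name at offset 0, extension at 8, attribute byte at 11, starting cluster split between a high word at offset 20 that only FAT32 uses and a low word at 26, size at 28, all little-endian). `parse_dir_entry` is a hypothetical helper, and long-file-name entries are not handled.

```python
import struct

ATTR_DIRECTORY = 0x10   # DIRECTORY attribute bit
DELETED_MARKER = 0xE5   # first name byte of a deleted entry


def parse_dir_entry(entry: bytes) -> dict:
    """Decode the fields of one 32-byte FAT short-name directory entry."""
    if len(entry) != 32:
        raise ValueError("FAT directory entries are 32 bytes long")
    deleted = entry[0] == DELETED_MARKER
    # The first character of a deleted name is lost; show it as '?'.
    raw_name = entry[1:8] if deleted else entry[0:8]
    name = ("?" if deleted else "") + raw_name.decode("ascii", "replace").rstrip()
    ext = entry[8:11].decode("ascii", "replace").rstrip()
    attr = entry[11]
    high, = struct.unpack_from("<H", entry, 20)   # zero on FAT12/FAT16
    low, = struct.unpack_from("<H", entry, 26)
    size, = struct.unpack_from("<I", entry, 28)
    return {
        "name": name + ("." + ext if ext else ""),
        "deleted": deleted,
        "is_directory": bool(attr & ATTR_DIRECTORY),
        "first_cluster": (high << 16) | low,
        "size": size,
    }
```

A deleted subdirectory shows up as an entry with `deleted` and `is_directory` both true, a size of 0, and a starting cluster that still points at its dot and double-dot entries.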
A directory listing (here, the output of the DIR command) has the following generic format:
Volume in drive C has no label.
Volume Serial Number is F82E-F5C0
Directory of C:\perl
06/14/2006 20:50 <DIR> .
06/14/2006 20:50 <DIR> ..
06/14/2006 23:46 <DIR> bin
06/14/2006 20:50 <DIR> eg
06/14/2006 20:50 <DIR> html
06/14/2006 20:50 <DIR> lib
06/14/2006 20:50 37,792 pod2htmd.tmp
06/14/2006 20:50 17,147 pod2htmi.tmp
06/14/2006 20:49 <DIR> site
2 File(s) 54,939 bytes
7 Dir(s) 43,824,070,656 bytes free
There are two things of particular relevance in this listing. First, note the first two entries,
namely . (dot) and .. (double-dot). These are directory entries that refer to this directory and
to its parent directory, respectively. Both dot and double-dot are created when the directory is
created and are always the first two entries in the directory.
The second thing to notice is that this listing shows seven subdirectories: dot, double-dot,
and five more. They are denoted here, as expected, with the <DIR> tag because the
DIRECTORY attribute bit is set.
Finding a deleted subdirectory is relatively straightforward. The first two entries will be the dot
and double-dot entries, and they will appear at the beginning of a cluster; thus, any cluster with
0x2E (ASCII .) at offset 0x00 (0) and 0x2E 0x2E (ASCII ..) at offsets 0x20-0x21 (32-33) might be
the start of a subdirectory. Once the subdirectory is recovered, its entries point to any files
stored in that directory, complete with file names and starting-cluster pointers.
Data recovery from a drive does not actually require recovery of the subdirectories, because the
data is still present on the drive. Recovering the subdirectories is helpful, however, because it
aids in determining how the data on the disk was organized and adds to the picture of activity
associated with the investigation.
As a final note, the format of a subdirectory entry is identical to that of an entry in the Root
Directory.
Known-Plaintext Attack
When data is encrypted, there are several ways in which an analyst can recover the key with
which to decrypt the data. The simplest approach is to ask the person who encrypted the data to
tell you the key.
If the key is not disclosed, the examiner needs to use some form of automated attack on the
encrypted file. Since different applications use different forms of encryption, and because the
type of encryption used by any given application is generally well known, some attack methods
look for weaknesses in the encryption algorithm. Another approach is to guess the password or
key, either by trying words from a dictionary (a dictionary attack) or by trying every possible
character combination (a brute-force attack). Password guessing can take a long time, and the
time grows rapidly with the length of the password or key and the size of the character set.
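The difference between a dictionary attack and a brute-force attack, and why length matters, can be illustrated with a toy example. In the Python sketch below, `check` is a hypothetical stand-in for "does this candidate decrypt the file?"; here it simply compares a SHA-256 hash. Many modern applications instead derive keys with deliberately slow functions, which makes each real guess far more expensive than in this sketch.

```python
import hashlib
import itertools
from typing import Callable, Iterable, Optional

# Toy target: the hash of an unknown password ("cab" in this example).
TARGET = hashlib.sha256(b"cab").hexdigest()


def check(candidate: str) -> bool:
    """Hypothetical stand-in for 'does this password open the file?'."""
    return hashlib.sha256(candidate.encode()).hexdigest() == TARGET


def dictionary_attack(words: Iterable[str], test: Callable[[str], bool]) -> Optional[str]:
    """Try each word from a word list in turn."""
    for word in words:
        if test(word):
            return word
    return None


def brute_force(charset: str, max_len: int, test: Callable[[str], bool]) -> Optional[str]:
    """Try every combination of charset, shortest candidates first."""
    for length in range(1, max_len + 1):
        for combo in itertools.product(charset, repeat=length):
            candidate = "".join(combo)
            if test(candidate):
                return candidate
    return None


def keyspace(charset_size: int, length: int) -> int:
    """Number of candidates of exactly this length."""
    return charset_size ** length
```

Here `dictionary_attack(["apple", "banana"], check)` returns None, while `brute_force("abc", 3, check)` finds "cab". The keyspace shows why length and character set matter: 8 lowercase letters give 26^8 = 208,827,064,576 candidates, while 8 characters drawn from the 95 printable ASCII characters give 95^8 = 6,634,204,312,890,625.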
A simpler, more elegant attack is the known-plaintext attack. This method can work if the
examiner can find an unencrypted (plaintext) copy of a file that is also present in encrypted
form. Assuming that the crypto attack software can determine the encryption algorithm, the
software can compare the plaintext and encrypted versions of the file and derive the key.
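The core idea of a known-plaintext attack can be seen with a deliberately weak toy cipher. The Python sketch below uses a repeating-key XOR cipher, which is neither the ZIP cipher nor what commercial tools attack: because ciphertext = plaintext XOR key, XORing a known plaintext with its ciphertext exposes the key directly, and that key then decrypts any other file encrypted with it. Attacks on real ciphers require considerably more cryptanalysis; the function names here are hypothetical.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy repeating-key XOR; applying it twice with the same key decrypts."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


def recover_keystream(plaintext: bytes, ciphertext: bytes) -> bytes:
    """XOR known plaintext with its ciphertext to expose the keystream."""
    return bytes(p ^ c for p, c in zip(plaintext, ciphertext))


def shortest_repeating_key(stream: bytes) -> bytes:
    """Find the shortest prefix whose repetition reproduces the keystream."""
    for n in range(1, len(stream) + 1):
        if all(stream[i] == stream[i % n] for i in range(len(stream))):
            return stream[:n]
    return stream
```

Given a known file and its encrypted copy, `shortest_repeating_key(recover_keystream(known, known_ciphertext))` yields the key, and `xor_cipher(other_ciphertext, key)` then decrypts every other file protected the same way. If the known plaintext is shorter than the key, only part of the key is recovered.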
This type of attack can be particularly successful on password-protected ZIP files that use the
legacy PKZIP encryption (often called ZipCrypto); if a plaintext version of a single file from the
protected ZIP archive can be found, programs such as AccessData's Password Recovery Toolkit or
Passware can find the key for all of the files. The method is relatively straightforward: the
examiner finds one plaintext version of a file in the ZIP archive and then compresses that file
using a compatible version of ZIP. The software then compares the plaintext archive against the
password-protected archive to mathematically derive the protection key. Once the key is found,
all of the files in the archive can be recovered. Archives protected with AES encryption are not
vulnerable to this attack.
Recovery of Data from Slack and Unallocated Space
Data is written to digital media in blocks of bytes known as clusters. Clusters are fixed in size
on a particular disk or other medium, and every file occupies a whole number of clusters.
It is unlikely for a file to end exactly on a cluster boundary; indeed, most files have unused space
in the last cluster of the cluster chain. This "empty" space is known as slack space or file
slack, and it may still contain remnants of data previously stored in that cluster.
As an example, suppose a disk has a cluster size of 2,048 B. Suppose further that a file to be
written to the disk has a size of 4,052 B. This file will be stored in two clusters on the disk; the
two clusters (4,096 B) will contain 4,052 B of data and 44 B of slack space.
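The slack calculation above generalizes directly. The Python sketch below (hypothetical helper names) computes how many clusters a file occupies, how many slack bytes remain, and where in the last cluster the slack begins, which is where an examiner would look for leftover data.

```python
def clusters_needed(file_size: int, cluster_size: int) -> int:
    """Whole clusters required to hold file_size bytes (ceiling division)."""
    return -(-file_size // cluster_size)


def slack_bytes(file_size: int, cluster_size: int) -> int:
    """Unused bytes at the end of the file's last cluster."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size


def slack_range(file_size: int, cluster_size: int) -> tuple:
    """(start, end) offsets of the slack within the last cluster, end exclusive."""
    used = file_size % cluster_size
    if used == 0:
        return (cluster_size, cluster_size)  # file ends on a boundary: no slack
    return (used, cluster_size)


print(clusters_needed(4052, 2048))  # 2
print(slack_bytes(4052, 2048))      # 44
print(slack_range(4052, 2048))      # (2004, 2048)
```

For the example above, the 44 slack bytes occupy offsets 2,004 through 2,047 of the second cluster.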
Unallocated space is the collection of clusters that are not currently assigned to any file.
While the operating system and most applications cannot write directly to slack or unallocated
space, hex editors such as WinHex and DISKEDIT can read and write down to this level.
Searching slack and unallocated space is exactly that -- searching. Information can, in fact, pop
out at the astute examiner because a data pattern is noticeable, plaintext is visible, or known
file signatures appear. Once an area of interest is found, the examiner can copy that block of
data into a new file for further examination and analysis.
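Much of this searching can be scripted. The Python sketch below finds every occurrence of a byte pattern (such as a file signature) and every run of printable ASCII text in a raw dump of slack or unallocated space, then copies a chosen block to a new file. The default minimum string length of 6 and the helper names are arbitrary, hypothetical choices; the text search is roughly what the Unix `strings` utility does.

```python
import re


def find_all(data: bytes, pattern: bytes) -> list:
    """Offsets of every occurrence of pattern, including overlapping ones."""
    hits, start = [], 0
    while True:
        index = data.find(pattern, start)
        if index == -1:
            return hits
        hits.append(index)
        start = index + 1


def find_strings(data: bytes, min_len: int = 6) -> list:
    """(offset, text) for each run of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(data)]


def carve_block(data: bytes, offset: int, length: int, out_path: str) -> int:
    """Copy length bytes starting at offset into a new file; return bytes written."""
    block = data[offset:offset + length]
    with open(out_path, "wb") as out:
        out.write(block)
    return len(block)
```

A typical workflow is to run `find_all` with a signature such as b"GIF89a", inspect the surrounding bytes, and then call `carve_block` on the region of interest.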
Recovery of Deleted Files
To understand how deleted files can be recovered, it is necessary to understand how files are
written to a disk in the first place.
In the File Allocation Table (FAT) file system, the following occurs when the operating system
creates a file:
1. An empty entry is found in the appropriate directory, and the file's name, attributes,
date(s), etc. are entered into the directory.
2. An empty cluster is found on the medium with which to receive the file's data. This
starting cluster address is placed into the directory entry.
3. The cluster's entry in the FAT is marked as in-use, and data is written to the cluster.
4. If more than one cluster is needed, the first cluster's FAT entry points to the second
cluster, etc., forming a cluster chain, as data is written to the additional clusters.
Deleting a file undoes only some of the steps above:
1. The first character of the file name is changed to 0xE5, an indicator that this directory
entry can be reused.
2. The directory entry is consulted to find the starting cluster of the file.
3. The FAT table's entries corresponding to the file's cluster chain are marked empty.
Note, in particular, that the data is never removed from the medium; the clusters are marked as
empty but never actually emptied.
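The create and delete steps above can be modeled with a toy in-memory volume. The Python sketch below is a simplification (clusters are numbered from 0, the directory is a Python list, and the 4-byte cluster size is unrealistically small), but it shows the key point: deletion changes the first byte of the name and zeroes the FAT chain while leaving the data bytes untouched. The `ToyFat` class is hypothetical.

```python
FREE, EOC, DELETED = 0x000, 0xFFF, 0xE5


class ToyFat:
    """Hypothetical, simplified FAT volume with clusters numbered from 0."""

    def __init__(self, n_clusters: int = 8, cluster_size: int = 4):
        self.cluster_size = cluster_size
        self.fat = [FREE] * n_clusters
        self.data = bytearray(n_clusters * cluster_size)
        self.directory = []

    def create(self, name: str, content: bytes) -> None:
        cs = self.cluster_size
        needed = max(1, -(-len(content) // cs))
        free = [i for i, v in enumerate(self.fat) if v == FREE][:needed]
        if len(free) < needed:
            raise OSError("volume full")
        # Steps 1-2: directory entry with name, size, and starting cluster.
        self.directory.append(
            {"name": bytearray(name.encode("ascii")), "start": free[0], "size": len(content)}
        )
        # Steps 3-4: write each cluster and link it to the next one in the FAT.
        for i, cluster in enumerate(free):
            chunk = content[i * cs:(i + 1) * cs]
            self.data[cluster * cs:cluster * cs + len(chunk)] = chunk
            self.fat[cluster] = free[i + 1] if i + 1 < needed else EOC

    def delete(self, name: str) -> None:
        target = name.encode("ascii")
        entry = next((e for e in self.directory if e["name"] == target), None)
        if entry is None:
            raise FileNotFoundError(name)
        entry["name"][0] = DELETED   # step 1: mark the entry as reusable
        cluster = entry["start"]     # step 2: find the starting cluster
        while True:                  # step 3: mark the chain empty
            nxt = self.fat[cluster]
            self.fat[cluster] = FREE
            if nxt == EOC:
                break
            cluster = nxt
        # The bytes in self.data are never touched.
```

After `create("NOTE.TXT", b"hello world!")` and then `delete("NOTE.TXT")`, the FAT entries for clusters 0 through 2 are all zero, yet `bytes(volume.data[:12])` still returns b"hello world!".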
A useful analogy for this process is a "magic" data book. In the data book, consider a
chapter to be the equivalent of a data file. The rule is -- to access a chapter, the reader needs to
turn to the table of contents and then can go directly to the first page of any chapter. Each page
of the chapter points to the next page until the end of the chapter. When a chapter is to be
"erased," the author merely erases the entry in the table of contents; the words are still written on
the pages, however, until those pages get reallocated to another chapter.
To find deleted files on a disk, then, the examiner only needs to find directory entries with
0xE5 in the first byte, indicating those that have been marked as deleted. The pointer to the
first cluster is still valid and, with luck, all of the clusters were allocated in sequence. Because
the directory entry also retains the file size, the examiner knows where the data starts and how
long it is; once the beginning and end of the data are found, the data can be copied into a new
file and viewed.
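The undelete procedure just described can be sketched in a few lines of Python. The sketch assumes a FAT12/FAT16 volume (so only the low 16 bits of the starting cluster, at offset 26, are read); that the caller supplies the raw directory cluster, the byte offset where the data area begins, and the cluster size; and, most importantly, that the deleted file's clusters were contiguous. Entries with the long-file-name attribute value 0x0F are skipped, and a first byte of 0x00 is treated as the end of the directory. The helper names are hypothetical.

```python
import struct

DELETED = 0xE5
ATTR_LONG_NAME = 0x0F
ENTRY_SIZE = 32


def deleted_entries(dir_cluster: bytes):
    """Yield (name, first_cluster, size) for each deleted short-name entry."""
    for off in range(0, len(dir_cluster) - ENTRY_SIZE + 1, ENTRY_SIZE):
        entry = dir_cluster[off:off + ENTRY_SIZE]
        if entry[0] == 0x00:
            break  # 0x00: this and all later entries have never been used
        if entry[0] != DELETED or entry[11] == ATTR_LONG_NAME:
            continue
        base = entry[1:8].decode("ascii", "replace").rstrip()
        ext = entry[8:11].decode("ascii", "replace").rstrip()
        name = "?" + base + ("." + ext if ext else "")  # first character is lost
        first_cluster, = struct.unpack_from("<H", entry, 26)
        size, = struct.unpack_from("<I", entry, 28)
        yield name, first_cluster, size


def recover_contiguous(image: bytes, data_start: int, cluster_size: int,
                       first_cluster: int, size: int) -> bytes:
    """Copy size bytes starting at first_cluster, assuming no fragmentation."""
    # Data-area clusters are numbered from 2.
    offset = data_start + (first_cluster - 2) * cluster_size
    return image[offset:offset + size]
```

If the recovered bytes do not begin with the expected file signature, or drift into unrelated content, the file was probably fragmented and the examiner falls back to the data carving techniques described earlier.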
The Use of Automated Tools in Computer Forensics Examinations
To ensure reliable discovery and reporting, automated computer forensics tools may be employed
to augment the examiner's work. A discussion about the use of automated tools in computer
forensics exams can be found in the article 'The "Tools Proven in Court" Question' by Steve
Hailey, available from . The tool, the particular exam, and the examiner should all be able to
pass the tests raised in this article:
1. Was the evidence gathered and verified in a sound manner?
2. Was a chain of custody maintained?
3. Is the ownership and licensing appropriate for the tools used?
4. Was a proper examination environment maintained?
5. Can the results of the technical analysis be duplicated using other tools?
6. Does the Analyst understand what the tools they use are actually doing, or are they
merely taking for granted what an automated process is reporting?
7. Do other professionals use the same techniques and methodology?
8. Is the Analyst technically capable of defending/supporting their interpretation of the
evidence?
The Tests (these correspond to the Daubert criteria for the admissibility of expert testimony):
• Can the expert's technique or theory be tested, and has it been?
• Has the technique or theory been subjected to peer review and publication?
• Is the potential rate of error of the technique or theory known and accepted?
• Do standards controlling the technique's operation exist, and are they maintained?
• Has the theory or method been generally accepted by the scientific community?