NTFS file system data recovery program design.txt

System and data recovery technology courses have begun to be offered in the information security majors of some colleges and universities. I believe this will make more students stop fearing data recovery, because every time the author talks with classmates about the design and implementation of data recovery programs, most of them consider it very deep. In fact, this is mostly because they do not really understand the internal structure of the file system; they have only learned a little about the organization and functions of the file system in an operating systems course, so there is no way for them to discuss how a data recovery program is designed.
In fact, in the first issue of 2009 the author already analyzed data recovery technology based on NTFS in depth, but that earlier article assumed that readers already had a certain amount of data recovery knowledge, so it did not start from the internal structure of the file system; instead it went straight to the key techniques the author used when implementing the data recovery program, and many readers may not have understood it clearly. This article therefore starts from an analysis of the internal structure of NTFS and then gradually transitions to the implementation of a data recovery program. I hope that this article, together with the 2009 article 'Independent development of professional data recovery software', will allow readers to truly understand data recovery technology under NTFS and to develop their own professional recovery software.
This article is organized in three parts. First, it analyzes the internal structure of the NTFS file system in depth, including the master file table (MFT), each key attribute under NTFS, index records and the directory structure. Second, it explains why files can be recovered after deletion, by comparing the key MFT attributes of an example file before and after deletion. Finally, it analyzes the design of a data recovery program for deleted files under NTFS, proposes an algorithm for rebuilding the directory tree of deleted files and a deleted-file recovery algorithm, and clarifies the storage principle of the data runs in the 80 attribute of the master file table (MFT). (Note: the key code for data recovery was given in the first issue of 2009. The purpose of this article is to elaborate on the core principles of the recovery technique; this is necessary, otherwise beginners cannot understand the code in issue 01. Readers who wish to understand the core recovery technique in depth should read carefully.)
Internal structure of the NTFS file system
A key feature of the NTFS file system is that all data, including system data such as the boot program and the bitmap that records the allocation status of the whole volume, exists in the form of files. The MFT is the core of the NTFS volume structure; the system uses the MFT to determine a file's location on disk and all of its attributes. The MFT is a per-file attribute database that records all attribute information of the file apart from the file data itself (such as the file's MAC times, the file name and the MFT reference number of its parent directory), and when the file content is very short, the content is even stored directly in the MFT data attribute and occupies no additional cluster space. This is different from the FAT system: in the FAT structure, even a tiny file must occupy a whole cluster, which wastes disk space. Readers can verify this with the WinHex software (WinHex is an indispensable tool for designing and developing data recovery programs).
1. MFT structure analysis
Every file has a one-to-one correspondence with an MFT entry, and file deletion, modification and other operations are all reflected in the MFT entry, so we must first analyze the MFT structure. Again, WinHex is used for the analysis. An MFT entry consists of two parts, the MFT header (also called the file record header) and the attribute list. The meaning of the length and offset fields in the MFT header is fixed, while the attribute list is variable: different attribute data has different meanings, which will be analyzed later. The following first analyzes the main offset positions in the MFT header, that is, the offsets that the author believes must be considered when designing a data recovery program.
(1) The value of the first four bytes of the file record header is always 0x454C4946 (the ASCII string 'FILE'), identifying it as an MFT record entry.
(2) At offset 14H, the total length of the MFT header is recorded, which is also the starting offset of the first attribute in the MFT entry. Therefore, in the program design you can use a conditional statement such as if (strncmp(MFTFlag, "FILE", 4) != 0 || *(LPWORD)(lpBuffer + 0x14) == 0) to determine whether the record currently read is a valid MFT entry; if it is not, skip it directly (where MFTFlag represents the first four bytes of the MFT header and lpBuffer points to the start of the MFT entry).
(3) The byte at offset 16H is the flag byte. Its meaning is: 00H indicates a deleted file, 01H a normal file, 02H a deleted directory and 03H a normal directory, so the value at offset 16H can be used to determine whether an MFT entry belongs to a non-directory file and whether it has been deleted.
(4) The four bytes at offset 18H record the total length of the MFT entry (that is, the record header plus its attributes). With this value, the end offset of the MFT entry can be determined; the end of the attribute list is also marked by FFFFFFFFH, but some MFT entries in some NTFS volumes contain more than one FFFFFFFFH, so the length field is the more reliable way to find the end. A minimal sketch of these header fields follows.
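The following is a minimal C++ sketch of the header fields just discussed, assuming a 1 KB MFT record has already been read into a little-endian buffer; the structure and field names are illustrative, not official NTFS names.

#include <cstdint>
#include <cstring>

// Offsets within an MFT record header, as described in points (1)-(4) above.
// These names are illustrative, not official NTFS structure names.
constexpr size_t kOffSignature = 0x00; // "FILE"
constexpr size_t kOffFirstAttr = 0x14; // offset of the first attribute
constexpr size_t kOffFlags     = 0x16; // 00 deleted file, 01 file, 02 deleted dir, 03 dir
constexpr size_t kOffUsedSize  = 0x18; // total length of header plus attributes

struct MftRecordInfo {
    bool     valid;      // signature is "FILE" and the first-attribute offset is non-zero
    uint16_t firstAttr;  // where the attribute list begins
    uint16_t flags;      // deletion / directory flags
    uint32_t usedSize;   // end offset of the entry
};

// lpBuffer points to one 1 KB MFT record read from the MFT area.
MftRecordInfo ParseMftHeader(const uint8_t* lpBuffer)
{
    MftRecordInfo info{};
    info.valid = (std::memcmp(lpBuffer + kOffSignature, "FILE", 4) == 0);
    std::memcpy(&info.firstAttr, lpBuffer + kOffFirstAttr, sizeof(info.firstAttr));
    std::memcpy(&info.flags,     lpBuffer + kOffFlags,     sizeof(info.flags));
    std::memcpy(&info.usedSize,  lpBuffer + kOffUsedSize,  sizeof(info.usedSize));
    if (info.firstAttr == 0)     // mirrors the IF check quoted in point (2)
        info.valid = false;
    return info;
}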
The attribute list is further subdivided into two logical parts, the attribute header and the attribute data, where the attribute header identifies the attribute type, the relative offset of the attribute data and its length. Because an MFT entry is only 1 KB, some attribute values cannot be stored completely inside a single entry. NTFS uses a cluster run list structure (runlist): attribute values that cannot be stored in the entry are stored separately in a number of clusters. These clusters can be physically discontinuous, so NTFS introduces the LCN and VCN to locate cluster numbers. Such an attribute is called a non-resident attribute in the NTFS volume; otherwise it is a resident attribute. The 10, 30 and 90 attributes are usually resident, while the 80 attribute, which describes where the file data is stored, is often non-resident. The following is an in-depth analysis of the attribute structures related to data recovery.
2. Key attribute structures under NTFS
1) 10 attribute analysis
The type 10 attribute, namely the $STANDARD_INFORMATION attribute, holds the standard information of the file, including basic file attributes such as read-only, system, archive and hidden, as well as the MAC times. The two bytes at offset 14H-15H relative to the attribute header give the starting offset of the 10 attribute value, and the four bytes at offset 04H~07H give the total length of the 10 attribute.
Obtaining the MAC times of the file: the MAC date-times are located at offsets 00H, 08H and 18H relative to the attribute value, each occupying 8 bytes. A MAC time in an MFT entry is expressed in 64 bits (8 bytes). The storage format is Coordinated Universal Time (UTC) with a precision of 100 ns, counted from 00:00:00 on January 1, 1601. In the program design, the 64-bit time can be stored in a FILETIME structure, and then the Windows API function FileTimeToSystemTime (combined with FileTimeToLocalFileTime if local time is wanted) can be called to reproduce the file's MAC dates and times; a sketch of this conversion follows.
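As a rough illustration, the sketch below converts one 8-byte timestamp taken from the 10 attribute value into a printable local date and time using the Win32 FILETIME APIs mentioned above; the function name and the direct memcpy into FILETIME are assumptions of this sketch.

#include <windows.h>
#include <cstdio>
#include <cstring>

// Convert one 8-byte NTFS timestamp (100 ns ticks since 1601-01-01, UTC)
// taken from the 10 attribute value into a printable local date and time.
// attrValue points to the start of the $STANDARD_INFORMATION attribute value;
// offset is 0x00, 0x08 or 0x18 as described above.
void PrintMacTime(const unsigned char* attrValue, size_t offset)
{
    FILETIME utc = {}, local = {};
    SYSTEMTIME st = {};
    std::memcpy(&utc, attrValue + offset, sizeof(utc)); // raw 64-bit timestamp
    FileTimeToLocalFileTime(&utc, &local);              // UTC -> local time zone
    FileTimeToSystemTime(&local, &st);                  // -> year/month/day fields
    std::printf("%04u-%02u-%02u %02u:%02u:%02u\n",
                st.wYear, st.wMonth, st.wDay, st.wHour, st.wMinute, st.wSecond);
}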
2) 30 attribute analysis
The type 30 attribute is the $FILE_NAME attribute, which is used to store the file name; it is always a resident attribute. The offsets in its attribute header have the same meaning as for the 10 attribute and are not repeated here. The attribute value contains the MFT reference number of the parent directory and the file name. The MFT reference number is the sequence number of the entry relative to the $MFT table; that is, given an MFT reference number N, the physical position of the file's MFT entry in the MFT area can be computed as: physical sector number of the MFT entry = N*2 + BPB_MFTStartClus*8. Here BPB_MFTStartClus is the starting cluster of the MFT in the NTFS partition, which can be obtained from the BPB structure. The formula shows that the MFT reference number of the parent directory can be mapped to the location of the parent directory's MFT entry, so that the relevant attribute information of the parent entry can be read; in the program design, the directory tree can therefore be searched upward level by level.
The 30 attribute value starts at offset 00H with 8 bytes, which are the MFT reference number of the parent directory. In the program design, in fact only the first 4 bytes need to be read, because the maximum 4-byte value is FFFFFFFFH, which after conversion already corresponds to a logical partition of about 4 TB, larger than existed in practice at the time of writing, so the last 4 bytes need not be considered. A small sketch of the mapping from reference number to physical sector follows.
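A minimal sketch of the mapping given by the formula above; the constants assume 512-byte sectors, 1 KB MFT records and 8 sectors per cluster, as the formula implies, and in a real program they should be taken from the BPB.

#include <cstdint>

// Map an MFT reference number to the first physical sector of its MFT record,
// following the formula above: sector = N * 2 + BPB_MFTStartClus * 8.
uint64_t MftRefToSector(uint64_t mftReference, uint64_t mftStartCluster /* BPB_MFTStartClus */)
{
    // As noted above, the low 4 bytes of the 8-byte reference are enough in practice.
    uint64_t recordIndex = mftReference & 0xFFFFFFFFULL;
    const uint64_t kSectorsPerRecord  = 2;  // 1024-byte record / 512-byte sector
    const uint64_t kSectorsPerCluster = 8;  // 4096-byte cluster / 512-byte sector
    return recordIndex * kSectorsPerRecord + mftStartCluster * kSectorsPerCluster;
}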
Obtaining the file or directory name: file names in NTFS volumes are encoded in Unicode, with each character occupying two bytes. In the 30 attribute, the first character of the file name starts at offset 42H of the attribute value, and the total number of characters is recorded at offset 40H, so with these two offsets the file or directory name can be read accurately. However, some MFT entries contain two 30 attributes. The first is used for compatibility with the 8.3 file name format: assuming the file name is recovery.txt, the first 30 attribute records the name RECOVE~1.TXT, 12 characters; the second records the long file name of the file or directory, as shown in Figure 1.
Figure 1 The 30 attribute structure of the MFT entry of the file Encase6_en.ppt
It can be seen from the figure that, when coding, it is necessary to determine whether there are two 30 attributes in the MFT entry. If so, skip the first 30 attribute directly and parse the following one, so as to obtain the long file name of the file or directory. A sketch of reading the name from a 30 attribute is given below.
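The sketch below reads the name from a 30 attribute value using the two offsets just mentioned; the check on the byte at 0x41 (commonly the name-space field, where a value of 2 marks the DOS 8.3 short name) is an assumption added here as one possible way to skip the short-name attribute.

#include <cstdint>
#include <cstring>
#include <string>

// Extract the file name from the *value* of a 30 ($FILE_NAME) attribute.
// Per the text: character count at offset 0x40, UTF-16 characters from 0x42.
std::u16string ReadFileName30(const uint8_t* attrValue)
{
    uint8_t nameLength = attrValue[0x40];        // number of UTF-16 characters
    std::u16string name;
    name.reserve(nameLength);
    for (uint8_t i = 0; i < nameLength; ++i) {
        char16_t ch;
        std::memcpy(&ch, attrValue + 0x42 + 2 * i, sizeof(ch)); // little-endian UTF-16
        name.push_back(ch);
    }
    return name;
}

// Possible filter for the double-30-attribute case: the byte at 0x41 is commonly
// the name-space field, and a value of 2 marks the DOS 8.3 short name.
bool IsDosShortName(const uint8_t* attrValue)
{
    return attrValue[0x41] == 2;
}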
3) 80 attribute analysis
The 80 attribute is the $DATA attribute, which stores either the real data of the file or the physical location of the file data. The 80 attribute can be a resident or a non-resident attribute, depending on the size of the file data: if the data is small enough to be stored directly in the MFT entry, it is resident; otherwise it is non-resident. For a resident 80 attribute, the value at attribute-header offsets 10H~13H is the number of bytes occupied by the file data and the value at offsets 14H~17H is the starting position of the file data, so reading these two offsets directly yields the file content. It is important to note that, in extensive tests of the data recovery software he developed, the author found a special case that must be handled in the coding: when the value at a certain position x of the file data is 0x00, the two values at offsets 32H and 33H of the MFT header must be used to fill positions x-1 and x respectively (this appears to correspond to the NTFS update-sequence fix-up mechanism). A sketch of reading a resident 80 attribute follows.
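A minimal sketch of reading a resident 80 attribute using the two header offsets just described; attr is assumed to point to the start of the attribute, and the special 0x00 fix-up case mentioned above is deliberately left out.

#include <cstdint>
#include <cstring>
#include <vector>

// Copy the content of a *resident* 80 ($DATA) attribute.
// attr points to the start of the attribute (its header). Per the text,
// the content length is at header offset 0x10 and the content offset at 0x14.
std::vector<uint8_t> ReadResidentData(const uint8_t* attr)
{
    uint32_t contentLength = 0;
    uint16_t contentOffset = 0;                // two bytes are enough for the offset
    std::memcpy(&contentLength, attr + 0x10, sizeof(contentLength));
    std::memcpy(&contentOffset, attr + 0x14, sizeof(contentOffset));
    // The 0x00 fix-up case described above (patching from MFT header offsets
    // 0x32/0x33) is intentionally omitted from this sketch.
    return std::vector<uint8_t>(attr + contentOffset,
                                attr + contentOffset + contentLength);
}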
Non-resident 80 attribute: when the file is large, the cluster run list (data run list) of the file data is recorded at the attribute value position. This list directly reflects whether the file data is stored discretely, the number of clusters occupied and the physical offsets of those clusters. In the non-resident attribute header, the value at offset 20H records the offset of the data runs and offset 04H records the total length of the attribute; therefore, subtracting the value at offset 20H from the value at offset 04H gives the number of bytes occupied by the data run list. The general structure of the data run list is shown in Figure 2.
Figure 2 The basic structure of the data run list
Take a data run whose first byte is, for example, 31H: the lower four bits give the number of bytes used to record the run length (the number of clusters occupied by the run), and the higher four bits give the number of bytes used to record the starting cluster number of the run. If there are several data runs, the starting cluster number of each subsequent run is recorded relative to the starting cluster number of the previous run. What must be clarified, however, is exactly how each run's starting cluster relates to the previous run's starting cluster number: this is not a simple relative offset. In fact, the core storage principle of the data runs has to be considered separately for several cases:
Normal, Fragmented (normal fragmented file)
Normal, Scrambled (normal scrambled file)
Sparse, Unfragmented (sparse unfragmented file)
Compressed, Unfragmented (compressed unfragmented file)
The author here analyzes only one special case. The data runs below belong to a normal fragmented file; the analysis of each run in the data run list is shown in Table 1.
Data runs: 11 30 60 21 10 00 01 11 20 E0 00
Regrouped: 11 30 60 - 21 10 00 01 - 11 20 E0 - 00

No. | Run bytes   | Header | Length field | Offset field | Run length | Starting cluster
1   | 11 30 60    | 0x11   | 1 byte       | 1 byte       | 0x30       | 0x60
2   | 21 10 00 01 | 0x21   | 1 byte       | 2 bytes      | 0x10       | 0x160 (0x100 relative to 0x60)
3   | 11 20 E0    | 0x11   | 1 byte       | 1 byte       | 0x20       | 0x140 (-0x20 relative to 0x160)
4   | 00          | end of run list

Table 1 Analysis of the data run list above
Why is the starting cluster offset of the data in run 3 equal to 0x140? If the offset byte 0xE0 were treated as a plain unsigned value relative to the previous run's starting cluster 0x160, the result would be 0x240 rather than 0x140. In fact, the relative offset is a signed value, parsed as follows: let N be the raw starting-cluster field of a data run. If N occupies 1 byte and N > 0x80, a negative value is taken: N = (N mod 0x80) - 0x80; if N occupies 2 bytes and N > 0x8000, then N = (N mod 0x8000) - 0x8000; if N occupies 3 bytes and N > 0x800000, then N = (N mod 0x800000) - 0x800000; and so on. In this way the starting cluster offset of every run can be parsed accurately. A sketch of a run-list parser implementing this rule follows.
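The following sketch parses a data run list according to the rule just described (size nibbles in the header byte, signed relative starting-cluster fields); applied to the list in Table 1 it yields the runs (0x30 clusters at LCN 0x60), (0x10 at 0x160) and (0x20 at 0x140), matching the table.

#include <cstdint>
#include <vector>

struct DataRun {
    uint64_t lengthClusters;  // number of clusters in this run
    int64_t  startLcn;        // absolute starting logical cluster number
};

// Parse a data run list such as 11 30 60 | 21 10 00 01 | 11 20 E0 | 00.
// Low nibble of the header byte = size in bytes of the run-length field,
// high nibble = size in bytes of the (signed, relative) starting-cluster field.
std::vector<DataRun> ParseDataRuns(const uint8_t* runs, size_t runsLength)
{
    std::vector<DataRun> result;
    size_t  pos = 0;
    int64_t currentLcn = 0;                   // offsets are relative to the previous run
    while (pos < runsLength && runs[pos] != 0x00) {
        uint8_t header     = runs[pos++];
        int     lengthSize = header & 0x0F;
        int     offsetSize = (header >> 4) & 0x0F;

        uint64_t length = 0;
        for (int i = 0; i < lengthSize; ++i)  // little-endian length field
            length |= static_cast<uint64_t>(runs[pos + i]) << (8 * i);
        pos += lengthSize;

        int64_t offset = 0;
        for (int i = 0; i < offsetSize; ++i)  // little-endian offset field
            offset |= static_cast<int64_t>(runs[pos + i]) << (8 * i);
        // Sign-extend when the top bit of the field is set; this is the
        // (N mod 0x80...) - 0x80... rule described above.
        if (offsetSize > 0 && (runs[pos + offsetSize - 1] & 0x80))
            for (int i = offsetSize; i < 8; ++i)
                offset |= static_cast<int64_t>(0xFF) << (8 * i);
        pos += offsetSize;

        currentLcn += offset;
        result.push_back({length, currentLcn});
    }
    return result;
}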
4) 90 attribute analysis
The type 90 attribute, namely $INDEX_ROOT, is the index root attribute. This attribute is the root node of the B+ tree index that NTFS uses, and it contains the standard attribute header, the index root and a number of index entries. Obviously, only directory files have 90 attributes. Immediately after the index root is the index node header: its offset 00H holds the offset of the first index entry (note: this offset is relative to this position). In the program design, after obtaining the offset of the first index entry, you can jump directly to the index entries. In each index entry, the MFT reference number of the file and the MFT reference number of its parent directory are recorded at offsets 00H and 10H, and the size of each index entry is recorded at offset 08H, so the entry size can be used to step to the next index entry; combined with the total length of the 90 attribute recorded at offset 04H of the attribute header, the MFT reference numbers of all files and subdirectories in the directory can be obtained.
However, when the directory contains so many files and subdirectories that their index entries cannot all be stored in the MFT entry, two additional attributes appear, namely the index allocation attribute and the index bitmap attribute. The index allocation attribute describes the child nodes of the B+ tree directory; the index bitmap attribute describes the virtual cluster numbers used by the index blocks of the index allocation attribute. Here we only introduce the index allocation attribute, that is, the $INDEX_ALLOCATION attribute, also called the A0 attribute; its structure is exactly the same as the non-resident structure of the 80 attribute. A simple root directory MFT entry structure is given below, as shown in Figure 3.
Figure 3 The structure of the root directory MFT entry under an NTFS partition
3. Directory tree structure
When creating a directory file, NTFS must index the file names in the directory. The MFT entry of the directory sorts the file names and subdirectory names in the directory and saves them in the 90 attribute; for large directories, the A0 attribute is introduced. The A0 attribute actually stores a cluster run list (runlist) whose structure is exactly the same as the data runs in the 80 attribute; it describes the starting offsets and the number of clusters of the runs. These two values can be used to locate the physical location where the files and directories of the large directory are stored, namely the index buffers (INDX structures).
The size of an index buffer is fixed at 4 KB, and the B+ tree structure is used to greatly reduce the number of disk accesses needed to find a file or directory entry. The B+ tree index consists of two parts: the upper part is the index and the lower part is the sequence set. The actual records are all in the leaf nodes, and the index only serves as a signpost. The key offsets in the INDX structure are analyzed below.
Each index record in the index structure is composed of a standard index header and a group of blocks containing index keys and index data. The size of an index record is usually 4 KB; it is defined by the BPB structure member BPB_ClusPerIndexBloc. The first four bytes of the standard index header are always 'INDX'. Offsets 18H to 1BH record the offset of the first index entry in the INDX structure, and offsets 1CH to 1FH record the total size of the index entries. Therefore, in the programming you can jump directly to the index entries and obtain all index sub-entries in the INDX structure.
In each index sub-entry, the MFT reference number of the file, the MFT reference number of its parent, the MAC dates and times of the file, the file name and other related information are recorded. The specific offsets of the first three items within the index entry are 00H~07H, 10H~17H and 18H~30H respectively. Through the file's MFT reference number you can locate its MFT entry in the MFT area and read the important information of the file from the 10, 30 and 80 attributes; if it is a directory file, you must read the 90 attribute, and possibly the A0 attribute, to find the physical location in the partition where the data of its files and subdirectories is stored. A sketch of walking the entries of an INDX record follows.
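The sketch below walks the index entries of one INDX record using the offsets just listed. Two assumptions are made here: the first-entry offset at 0x18 is taken as relative to the node header at 0x18, and update-sequence fix-ups as well as the terminating entry are not handled.

#include <cstdint>
#include <cstring>
#include <vector>

struct IndexEntryInfo {
    uint64_t fileMftRef;    // MFT reference of the indexed file (entry offset 0x00)
    uint64_t parentMftRef;  // MFT reference of its parent directory (entry offset 0x10)
};

// Walk the index entries of one 4 KB INDX record.
// Per the text, the offset of the first entry is at record offset 0x18 and the
// total size of the entries at 0x1C; each entry stores its own size at offset 0x08.
std::vector<IndexEntryInfo> WalkIndxRecord(const uint8_t* indx)
{
    std::vector<IndexEntryInfo> entries;
    if (std::memcmp(indx, "INDX", 4) != 0)
        return entries;

    uint32_t firstEntryOffset = 0, totalSize = 0;
    std::memcpy(&firstEntryOffset, indx + 0x18, sizeof(firstEntryOffset));
    std::memcpy(&totalSize,        indx + 0x1C, sizeof(totalSize));

    size_t pos = 0x18 + firstEntryOffset;     // assumed relative to the node header
    size_t end = 0x18 + totalSize;

    while (pos + 0x18 <= end) {
        IndexEntryInfo e{};
        uint16_t entrySize = 0;
        std::memcpy(&e.fileMftRef,   indx + pos + 0x00, sizeof(e.fileMftRef));
        std::memcpy(&entrySize,      indx + pos + 0x08, sizeof(entrySize));
        std::memcpy(&e.parentMftRef, indx + pos + 0x10, sizeof(e.parentMftRef));
        if (entrySize == 0)
            break;                            // malformed record, stop
        entries.push_back(e);                 // caller may filter the terminator entry
        pos += entrySize;
    }
    return entries;
}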
In fact, the directory tree structure of NTFS is also an inverted tree. The storage location of the directory area is still obtained through the BPB structure, from which the files and subdirectories of a directory are read; then, with the help of their respective MFT reference numbers, the corresponding MFT entries are consulted to obtain the file data or the physical storage locations of the files and subdirectories in the directory. Compared with the FAT file system: FAT maps the first cluster number recorded in a directory entry into the FAT table and obtains the cluster chain of the file data, or of the files and subdirectories in a directory, through the FAT table, while NTFS uses the MFT to locate the physical storage location of a file or directory. The difference is that FAT table entries must be mapped several times, while the MFT can complete the whole mapping at once. Their directory tree construction is compared in Figures 4 and 5.
Figure 4 Basic algorithm for building the directory tree in the FAT structure
Figure 5 Basic algorithm for building the directory tree in an NTFS volume
Comparative analysis of the MFT entry before and after file deletion under NTFS
When a file is deleted in an NTFS volume, the system changes at least three places. First, the flag byte at offset 16H of the file's MFT header; second, the 90 attribute (or A0 attribute) value in the MFT entry of the parent directory; third, the bits corresponding to the clusters occupied by the deleted file are cleared in the bitmap metadata file $Bitmap, so that when new files no longer have enough free space, the disk space occupied by the deleted file can be overwritten directly.
The author conducts an experiment in a freshly formatted NTFS partition: before and after a file is deleted, the relevant attribute values of its MFT entry and the changes in the related attribute values of the parent directory's MFT entry are compared and analyzed. To understand the MFT changes after deletion more clearly, two cases are considered: one is deleting the directory together with the files and subdirectories under it; the other is deleting only a file under the directory.
Deleting the directory together with all files and subdirectories under it: a directory named Experiment is stored in the NTFS partition, and five files are stored in the Experiment directory, named abc.txt, bde.pdf, fgh.doc, klm.ppt and pku.jpg. The corresponding MFT entry before and after the Experiment directory is deleted is shown in Figure 6 and Figure 7.
Figure 6 The 90 attribute of the corresponding MFT entry before the Experiment directory is deleted
Because of the size of the picture, only the contents of the 90 attribute are shown here. To understand the changes in the complete MFT entry, you can use WinHex to run your own experiments and compare the complete MFT entry before and after deletion. As can be seen from the figure above, the index entries of all sub-files of the Experiment directory are stored in the 90 attribute, so ordinary directory-tree construction can start directly from the root directory and keep descending to find all files at deeper levels, finally building an inverted directory tree. The specific procedure is shown in Figure 5.
Figure 7 The 90 attribute of the corresponding MFT entry after the Experiment directory is deleted
Observe the 90 attribute in Figure 7. The value at physical location 0x0C00079C0 is the offset of the first index sub-entry; the value is 0x30, and 0x0C00079C0+0x30=0x0C00079F0. Checking the value at 0x0C00079F0 shows that it is 0xFFFFFFFF, indicating that the 90 attribute has ended, so all index sub-entries under the 90 attribute have been deleted. Because there are only five files in the Experiment directory, the index allocation attribute, namely the A0 attribute, is not involved in its MFT entry. So when there are a large number of files and subdirectories in the Experiment directory, will the cluster runs of the A0 attribute be cleared along with the deletion of the directory? The author found through a large number of experiments that under normal circumstances they are not cleared, unless the MFT entry itself has been cleared.
For the case of deleting only a file in the directory, the content of the index sub-entry describing that file under the 90 attribute is also cleared; no further example is given here.
In short, in an NTFS volume, even though a file is deleted, its MFT entry is usually not emptied. However, the possibility of the MFT entry being emptied also exists: in extreme cases, for example when the space allocated to the MFT by the system is close to being used up, the system may clear the MFT entry when deleting the file, or directly overwrite it with the MFT entry of another file. In general, as long as the MFT entry is not cleared, the 80 attribute value in the file's MFT entry is not cleared either; that is to say, the file data still exists on the disk, so the file data can be recovered.
Analysis of key technologies in the data recovery program design
Every directory and file has at least one MFT entry, so as long as all MFT entries are traversed and the important attributes contained in them are read, it can be judged whether a file has been deleted, and its file name, data cluster runs and so on can be obtained. So how should the MFT entries be traversed? Should we start from the first non-metafile and traverse the entries in sequence until the record read no longer starts with the 'FILE' signature? In fact, the author found through experiments that NTFS volumes that have not been defragmented for a long time usually contain MFT fragmentation, that is, there are multiple MFT sub-regions that are physically discontinuous, so a simple sequential traversal of MFT entries is not feasible. Some people have proposed locating the discontinuous MFT regions by scanning the whole partition, but this method requires a lot of scanning time; when the partition is large, the scan time is almost unacceptable to the user.
After an in-depth analysis of the first few metafiles in the MFT area, it is not hard to find that the starting sector and cluster numbers of the MFT regions holding all file entries are actually stored in the $MFT entry itself, using the same data run structure as the 80 attribute. Whether the MFT is fragmented can therefore be judged from its data run list, as shown in Figure 8.
Figure 8 The 30 attribute and 80 attribute of the $MFT entry in a partition of an NTFS volume
As can be seen from Figure 8, there is MFT fragmentation in this partition, because there are two data runs under the 80 attribute, namely 32 DC07 00000C and 32 3001 AE5B0E. From the first run we can see that the starting cluster number of the first MFT region is 0C0000H = 786432, which is exactly the first cluster number on the far left of Figure 8. So you only need to read the 80 attribute in the $MFT metafile entry and parse the starting cluster number and cluster count of every run in its data run list to traverse all MFT entries under the NTFS volume quickly; a sketch of this traversal follows.
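A sketch of the traversal described above: it takes the (starting LCN, cluster count) pairs obtained from the run list of the $MFT entry's own 80 attribute (for example with the run-list parser sketched earlier) and hands every 1 KB record to a callback. The readClusters routine is a hypothetical stand-in for whatever raw-disk read function the recovery program already has.

#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Visit every 1 KB MFT record across all MFT fragments.
void ForEachMftRecord(
    const std::vector<std::pair<int64_t, uint64_t>>& mftRuns,   // (start LCN, cluster count)
    const std::function<std::vector<uint8_t>(int64_t lcn, uint64_t clusters)>& readClusters,
    const std::function<void(const uint8_t* record)>& visit)
{
    const size_t kRecordSize = 1024;                 // one MFT entry
    for (const auto& run : mftRuns) {
        std::vector<uint8_t> buf = readClusters(run.first, run.second);
        for (size_t off = 0; off + kRecordSize <= buf.size(); off += kRecordSize)
            visit(buf.data() + off);                 // hand one record to the caller
    }
}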
Building the directory tree of deleted files: by traversing all MFT entries under the NTFS volume, you can clearly determine whether a file has been deleted and obtain its file name, the starting cluster number and cluster count of its data, and so on. So how are the deleted files and directories combined into a tree? The author builds the complete directory tree step by step from the bottom of the tree upward. The basic flow of the algorithm is as follows (a code sketch follows the list):
(1) Read the cluster run list under the 80 attribute of the $MFT entry, obtain the starting cluster number and cluster count of each run, and then, according to these values, loop through all MFT entries in every run under the NTFS volume.
(2) Read the MFT header to determine whether the file or directory has been deleted. If it has not been deleted, ignore the MFT entry and go to the next one; otherwise, allocate a new node structure and store in it the file name or directory name from the 30 attribute and the MFT reference number of the parent directory.
(3) If the MFT reference number of the parent directory is not 0x05 (that is, the entry is not a file or subdirectory directly under the root directory), go to step (4). Otherwise, determine whether this MFT entry corresponds to a file or a directory: if it is a non-directory file, read the 80 attribute value, obtain the starting cluster number and cluster count of the file data (or the resident data itself), store them in the new node structure and insert the node into the directory tree; if it is a directory file, insert the node into the directory tree directly.
(4) According to the MFT reference number of the parent directory, read the parent directory's MFT entry and obtain the directory name, then call a search function to check whether this directory already exists in the partially built directory tree. If it is not found, allocate a new node, save the directory name, its MFT reference number and the MFT reference number of its own parent directory in the node, and then determine whether that parent is the root directory: if not, save this node into the Vector array of the directory tree structure and call this step recursively, returning to step (4) to continue reading the MFT reference number of its parent directory; if it is the root directory, insert this node under the appropriate node of the directory tree. If the directory is found in the tree, return directly.
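The following is a minimal bottom-up sketch of steps (1)-(4), keyed by MFT reference number; all type, member and function names here are illustrative, and lookupName stands for a hypothetical helper that reads a parent directory's 30 attribute.

#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Minimal bottom-up directory-tree reconstruction following steps (1)-(4).
struct TreeNode {
    std::u16string         name;              // from the 30 attribute
    uint64_t               mftRef       = 0;  // this entry's MFT reference number
    uint64_t               parentRef    = 0;  // parent directory's MFT reference number
    bool                   isDirectory  = false;
    uint64_t               firstCluster = 0;  // from the 80 attribute (files only)
    uint64_t               clusterCount = 0;
    std::vector<TreeNode*> children;
};

class DeletedTree {
public:
    static constexpr uint64_t kRootRef = 5;   // MFT reference of the root directory

    // Steps (2)/(3): called once for every deleted file or directory found
    // while traversing the MFT records.
    TreeNode* AddDeletedEntry(uint64_t mftRef, uint64_t parentRef,
                              std::u16string name, bool isDirectory)
    {
        TreeNode& n   = nodes_[mftRef];
        n.mftRef      = mftRef;
        n.parentRef   = parentRef;
        n.name        = std::move(name);
        n.isDirectory = isDirectory;
        return &n;
    }

    // Step (4): make sure every ancestor directory exists as a node, then hook
    // each node under its immediate parent. lookupName is a hypothetical helper
    // returning a directory's name and its own parent reference from its MFT entry.
    void Link(const std::function<bool(uint64_t ref, std::u16string* name,
                                       uint64_t* parentRef)>& lookupName)
    {
        std::vector<uint64_t> pending;
        for (const auto& kv : nodes_)
            pending.push_back(kv.second.parentRef);
        while (!pending.empty()) {            // create missing ancestor directories
            uint64_t ref = pending.back();
            pending.pop_back();
            if (ref == kRootRef || nodes_.count(ref) != 0)
                continue;
            std::u16string name;
            uint64_t grand = kRootRef;
            if (!lookupName(ref, &name, &grand))
                continue;                     // unreadable parent: attach child to root later
            AddDeletedEntry(ref, grand, std::move(name), true);
            pending.push_back(grand);
        }
        for (auto& kv : nodes_) {             // attach every node to its parent exactly once
            TreeNode& node = kv.second;
            auto it = nodes_.find(node.parentRef);
            if (node.parentRef == kRootRef || it == nodes_.end())
                root_.children.push_back(&node);
            else
                it->second.children.push_back(&node);
        }
    }

    TreeNode& Root() { return root_; }

private:
    std::map<uint64_t, TreeNode> nodes_;      // keyed by MFT reference number
    TreeNode root_;                           // synthetic node for the root directory
};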
In this way, all deleted files and directories in the NTFS partition can be obtained and the complete directory tree structure can be built. The directory tree construction algorithm is shown in Figure 9.
Figure 9 Directory tree reconstruction algorithm for deleted files under NTFS
Having written this much, the author would have liked to keep it brief, but the difficulty of data recovery lies precisely in understanding these core principles, which cannot be simplified away. After reading this far, the reader may be wondering: how are the scanned files inserted, and what kind of tree structure is used? These questions were analyzed in the first issue. There are still many core techniques; readers with new ideas are welcome to contact me.
Summary
Implementing data recovery technology is a difficult point, but as long as you understand the internal structure principles and have some C++ programming experience, I believe you can write your own data recovery software.