Welcome to the Corner of Excellence: June 2008

Saturday, June 21, 2008

How to Suppress Transaction Log Checking

In MS SQL Server, Transaction Log refers to the mechanism of undo and redo the data in the database. Sometime, in case of database crash, we just one to be able to view the data as it is so the recovery step can be carefully orchestrated. In this case, we have to find some ways to suppress undo/redo mechanism, if this became issue that crop up in data recovery session.

In the event of database crash, MS SQL server will first marked the database as suspect. If we then detach the suspect database, and if crashes affect the important pages such as PFS page, then it is impossible to re-attach the database again, unless some supression of checking mechanism is to be done on that database.

Recall that in my previous post, it is possible to bypass the checking mechanism by performing 'page transplantation' from backup database. But we are in big trouble if there are no backup available, i.e. we just have the crashed database and can not do anything about it because it can not be attached to be examined carefully (i.e. by DBCC PAGE command).

The illustrated case below will explain why this kind of suppresion is important.

I was faced with crashed database with no backup available. As usual, the database is marked as suspect (greyed icon), and after some futile attempt to do the recovery in order that the database is up again, I detach the database, upon trying to attach the database again I came upon this message :

Recall that in my previous post, I already describe the way to 'fix' the torn page by examining and then synchronizing the sectors in that page. After examining each sector, it revealed that the m_tornbits (red box) value is not in synch with the rest of sectors' torn bits.

One of the torn bits of sample sector is shown here (green box) :

All of the torn bits in the end of each sectors evaluated to 0x02 where as the m_tornbits value is 0x01 which is why MS SQL server give the above error.

To get rid of this nasty torn page error that preventing me to attach the database, I change the m_tornbits value from 0x01 to 0x02, and re-attach the database again using SQL Enterprise Manager. Then, I arrived with this message :

It seems that I already bypass the torn page checking, but tripped with another checks that preventing successful attach process. Upon examining the error log :

2008-06-21 15:10:35.43 spid51 Error while redoing logged operation in database 'MBS'. Error at log record ID (2387:408:4)

So, I am faced with transaction log issue. It seems that MS SQL server is trying to redo the transaction on this page, that is already corrupted, which is fortunately, unsuccesful, because even if it is successful, it will make things more complicated. It is because we only want to check the database as it is, seconds before disaster.

In order to bypass this mechanism, now it is time to consult the 'Oracle' Kalen Delaney (Inside Microsoft SQL Server 2000) again about transaction log :

If the LSN in the page is less than that of log LSN, then the transaction should be redone. Otherwise, if LSN in the page is greater, transaction should not be redone.

This 'law' implied that it is possible to bypass the redo mechanism, because this is the exact thing I expected. By somehow change the LSN in that page to be of higher value, then by theory, it will be able to bypass the redo mechanism.

If you see the page header in the above picture, the yellow box denotes the LSN of that defected page. Hence, by changing the value of 2387 = 0x00000953 to for example 2388 = 0x00000954 (to 0x54090000 in big endian in physical file), it is possible to bypass the transaction redone mechanism. I change the LSN value, firing up the attach database dialogue box, and bingo ! The database can be attached !

Now, with database successfully attached, I can examine more closely which data is corrupted.

Happy transaction log-ing :)

Thursday, June 19, 2008

Which Sector is "torn" in Torn Page Error ?

Recently I'm doing the analysis of the "torn page" thing again. If the SQL server throws the torn page error, it doesn't show which sector the page is torn. Let's review again the definition of "torn page" from Kalen Delaney's "Inside Microsoft SQL Server 2000" :

..it causes a bit to be flipped for each 512 byte sector in a database page (8 KB) whenever the page is written to disk..

In practice, there are two kinds of torn page error events. By this event, I am referring to the behaviour of MS SQL 2000.

First, torn page error during the database attachment process. Usually, this is caused by some corrupted data that occur in MS SQL server's important pages, such as PFS pages.

Second, torn page during accessing records, i.e. ours important pages :)

Since the data is of utmost important, the precise determination of which sector of size 512 bytes that gets torn apart is required.

To perform this kind of analysis, I am creating the testing database (named Test), create table and input some record using SQL Enterprise Manager. Next, I am determining the first page that contain my data :

use test
dbcc checkalloc

***************************************************************
Table TestTbl Object ID 357576312.
Index ID 0. FirstIAM (1:91). Root (1:90). Dpages 1.
Index ID 0. 2 pages used in 0 dedicated extents.
Total number of extents is 0.
***************************************************************

So, the first page of my data is located at page 90 which is at offset 90 * 8192 = 737280 = 0xB4000 in physical file.

Now, lets see the physical page created by MS SQL Server using hex editor :

Here, the offset 5 (marked with green box) in the page header is somehow related to the torn page checking. Turning the first nibble to zero (i.e. from 0x81 to 0x80) should bypass the torn page checking. But it is not, see the paragraph below for further information.

OK, let's go back to the original question. To determine which sector is get torn, we have to examine the calculation of m_tornbits flag in relation to the last offset at each 512 sector. Here is how it is done :

First byte of m_tornbits is masked with 0x03, so any byte will end up with either the value 0x00, 0x01, 0x02 or 0x03. For the above example, first byte is 0x01, with this operation, it will give 0x01. This value is used to compare with the last byte offset of second sector onward (or sector 1, if I count first sector as zero) (starting from offset 0x03FF onward).

The last byte of each sector is performed using the same operation as above and compared with the value from masked first byte of m_tornbits. If there are any differences, the torn page error will be thrown.

So, using the above operation, it is possible to determine which sector is not in synch with the rest of the sector or in other word, which sector is a torn page. But, it is inevitable with the assumption that, the first sector which contains the page header, hence the m_tornbits value, is a valid one.

However, it is not possible to temporarily turn off torn page checking by resetting the byte in offset 5 at page header. SQL server seems perform the consistency checking :

So, you should change the last offset of defected sector one by one in order to be accessed by MS SQL Server.

Happy torn page-ing :)

Friday, June 6, 2008

Reverse Engineering Lotus Notes

The above title will provoke some hard criticisms, or even some warning letters sent to me to delete, or maybe otherwise, instruct the company that host my web site be blocked.

Some will opinioned that I has perform something forbidden by law. But as I already written on previous posts in this blog, actually all human activities for the namesake to increase the prosperity and better living can be categorized as many forms of reverse engineering activities.

Ok, enough of the preaching. Let's back to the above topic. Now, the question is, is it possible ? The very first step in reverse engineering is to identify the function name. This is important because the name will convey some internal working intended by the programmer or developer of those functions.

Is it possible to identify the function name in the myriads of hex codes of calls in assemblies of Lotus Notes applications ? I would like to say, it is possible.

First of all, I will define the scope of my discussion to Lotus Notes that is running using Windows OS, in other word, windows version of Lotus Notes. And I am not discussing the server site runtime routine, but the client side.

So, let's start with the beginning when the Lotus Notes program is invoked via windows icon. In this discussion, I am using Lotus Notes version 7. Executable file that spawned the Lotus Notes Client is NLNOTES.EXE.

Using the WinDBG debugger, we can determine the entry point, i.e. where all of the Lotus Notes routine resides, so we don't get lost into identifying the windows runtime instead of Lotus Notes runtime.

Executable search path is:
ModLoad: 00400000 0050f000 nlnotes.exe
ModLoad: 7c900000 7c9b0000 ntdll.dll

Here, at my current computer memory configuration, the NLNOTES.EXE file is loaded at hex address 400000. We can determine the address of entry point using the !dh macro in WinDBG :

0:000> !dh 00400000
File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
14C machine (i386)
4 number of sections
4304A8DE time date stamp Thu Aug 18 22:27:26 2005
0 file pointer to symbol table
0 number of symbols
E0 size of optional header
10F characteristics
Relocations stripped
Executable
Line numbers stripped
Symbols stripped
32 bit word machine
OPTIONAL HEADER VALUES
10B magic #
7.10 linker version
1000 size of code
10D000 size of initialized data
0 size of uninitialized data
1C94 address of entry point
1000 base of code
----- new -----
00400000 image base
1000 section alignment
1000 file alignment
2 subsystem (Windows GUI)
4.00 operating system version
0.00 image version
4.00 subsystem version
10F000 size of image
1000 size of headers
...

Searching the "address of entry point" string in the result gives hex 1C94. So, we can calculate the actual address in the memory where the entry point reside :

400000 + 1C94 = 401C94

Now, we can perform break point and goes directly to the entry point of Lotus Notes Application :

0:000> bp 00401c94
0:000> g
ModLoad: 5cb70000 5cb96000 C:\WINDOWS\system32\ShimEng.dll
ModLoad: 76390000 763ad000 C:\WINDOWS\system32\IMM32.DLL
ModLoad: 629c0000 629c9000 C:\WINDOWS\system32\LPK.DLL
ModLoad: 74d90000 74dfb000 C:\WINDOWS\system32\USP10.dll
ModLoad: 773d0000 774d3000 C:\WINDOWS\WinSxS\x86_Microsoft.Windows.Common-Controls_6595b64144ccf1df_6.0.2600.2982_x-ww_ac3f9c03\comctl32.dll
Breakpoint 0 hit
eax=00000000 ebx=7ffdd000 ecx=0013ffb0 edx=7c90eb94 esi=7c9118f1 edi=00011970
eip=00401c94 esp=0013ffc4 ebp=0013fff0 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
nlnotes+0x1c94:
00401c94 6a74 push 74h
0:000> u 401c94
*** WARNING: Unable to verify checksum for nlnotes.exe
*** ERROR: Module load completed but symbols could not be loaded for nlnotes.exe
nlnotes+0x1c94:
00401c94 6a74 push 74h
00401c96 68c0214000 push offset nlnotes+0x21c0 (004021c0)
00401c9b e8e8020000 call nlnotes+0x1f88 (00401f88)
00401ca0 33db xor ebx,ebx
00401ca2 895de0 mov dword ptr [ebp-20h],ebx
00401ca5 53 push ebx
00401ca6 8b3d24204000 mov edi,dword ptr [nlnotes+0x2024 (00402024)]
00401cac ffd7 call edi

Here is our sample Lotus Notes function to be identified :

00401c9b e8e8020000 call nlnotes+0x1f88 (00401f88)

In Lotus Notes 7 client application, usually resides in C:\Lotus\Notes directory, there is the file called LotusNotes.sym file in compressed format. The first thing to do is to de-compressed this file using the MAP2ISYM.EXE :

For the one who is curious, MAP2ISYM.EXE can be obtained from Lotus Notes official sites.

Next, we have to determine the start and end offset of NLNOTES.EXE module, so we can avoid wrong identification of function names.

This is the portion of LotusNotes.sym raw data shown using hex file that determine the start offset and end offset of NLNOTES.EXE module :

Offset 0 1 2 3 4 5 6 7 8 9 A B C D E F
00000650 04 43 00 00 40 00 37 A8 B7 00 07 00 00 00 4E 4C C @ 7¨· NL
00000660 4E 4F 54 45 53 7F 5D 00 43 00 00 C5 62 F4 B8 B7 NOTES] C Åbô¸·
00000670 00 08 00 00 00 4E 4C 53 43 43 53 54 52 59 61 00 NLSCCSTRYa
00000680 43 C

The hex marked with red and blue one denotes the start and end offset of NLNOTES.EXE module which is B7A837 and B7B8F4 respectively. So, the relative address for NLNOTES.EXE should resides in this range.

For the above call, the relative address is 1F88, because base address is already determined, which is 400000.

This relative address should be transformed to 00001F88 and transposed become big endian format 88 1F 00 00. This value could be used to search using hex editor :

Offset 0 1 2 3 4 5 6 7 8 9 A B C D E F
00B7AB40 84 1F 00 00 0A 00 5F 5F 73 65 74 61 72 67 76 00 „ __setargv
00B7AB50 88 0F 00 00 88 1F 00 00 0D 00 5F 5F 53 45 48 5F ˆ ˆ __SEH_
00B7AB60 70 72 6F 6C 6F 67 00 C3 0F 00 00 C3 1F 00 00 0D prolog Ã Ã

The red one denotes the translation between the relative to the function name, and the offset at location in the vicinity of 00B7AB50 is still well within the defined boundary of start and end offset of NLNOTES.EXE module. So, rest assured, we can deduct that the function name should be _SEH_prolog :

00401c94 6a74 push 74h
00401c96 68c0214000 push offset nlnotes+0x21c0 (004021c0)
00401c9b e8e8020000 call nlnotes+0x1f88 (00401f88) ;;_SEH_prolog

Using this manual method, we can theoritically determine any function name of interest. And this conclude the answer of the possibility of reverse engineering Lotus Notes application, particularly determining the function name.

As for the side comment, the identified function in this sample case (_SEH_prolog), although already reside in NLNOTES.EXE module, is still part of windows OS runtime. Maybe this executable is generated using Microsoft's Visual C++.

Happy identifying :)

Thursday, June 5, 2008

Unraveling MS SQL 2000 Database Format (Part 1)

This time, I would like to discuss about the physical format of Microsoft's SQL 2000 Database. As you probably already know, each MS SQL 2000 database corresponds to one or more file. The file has the default extention of .MDF.

The raw data of MDF file is logically divided by blocks of size 8192 bytes and it is called page. In each block, the first 96 bytes is the header, and the rest of it, is the data, depends of the page type that is defined in the page header.

Now, let's see the first block, which corresponds to the page 0 (zero) in some sample physical SQL 2000 database, using hex editor program :

Offset 0 1 2 3 4 5 6 7 8 9 A B C D E F
00000000 01 0F 00 00 08 01 00 00 00 00 00 00 00 00 00 00
00000010 00 00 00 00 00 00 01 00 63 00 00 00 01 1F DE 18 c Þ
00000020 00 00 00 00 01 00 00 00 67 00 00 00 78 01 00 00 g x
00000030 13 00 00 00 00 00 00 00 00 00 00 00 42 D2 00 00 BÒ
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Please by reminded that I will not explain further about each variable in each bytes (for instance, I will not explain what is m_type, etc.). You can consult other documentation regarding these property.

Offset 0 which is Hex 01 is headerVersion
Offset 1 which is Hex 0F is Page Type (m_type)
Offset 2 which is Hex 00 is m_typeFlagBits
Offset 3 which is Hex 00 is m_level
Offset 4 which is Hex 08 is m_flagBits
Offset 5 which is Hex 01 is at the time of writing is still not yet identified
Offset 6-7 which is Hex 00 00 is m_indexId
Offset 18-1B which is Hex 63 00 00 00 which is 00000063 Hex which is 99 Dec is m_objId
Offset 20-23 which is Hex 00 00 00 00 is second part of m_PageId
Offset 24-25 which is Hex 01 00 is first part of m_PageId or m_FileId

If time permitted, I will continue with next deciphering task. That's all folks !

Wednesday, June 4, 2008

Reverse Engineering the Adobe Acrobat Applications

By limiting the scope to the computer technology, especially to programming languages, I personally considered Reverse Engineering as a means to convey more meaning to the otherwise lifeless assembly codes that is consists of numbers in hexadecimal format.

Putting into more broader scope, I believe that every human searches for the meaning of life, every human pursues in each aspect of sciences is also a reverse engineering activities, also tries to convey meaning of the mechanism or inner workings of universe.

So, based on the above title, I try to convey some meaning of the existing assembly codes of existing Adobe plug-ins. The most basic aspect is to find or identify functions that is used by any Acrobat Plug-ins.

Based on the documentations given by Adobe, the concept of function calling in the Acrobat framework is using HFTs (Host Function Tables). If I created some plug-in, I can also, exports the HFTs, so when it is well documented, other plug-ins can use the functions in my plug-ins to perform some intended operations.

To prove the Acrobat concept of the HFT mechanism, let us examined the smallest prototype plug-in supplied by Adobe in the SDK, named starter.api. After compiling it, then disasembled it, I am using DISASM.EXE provided by SangCho. On the original source, the starter.api is calling the most used Acrobat function called ASAtomFromString. In the compiled form, it is taking the form of :

:1001266A 68E4410210 push 100241E4
(StringData)"ADBE:Starter"
:1001266F 8B1520850210 mov edx, dword[10028520 {gCoreHFT} ]
:10012675 FF5214 call dword[edx+14]

Surely, the edx register contains the index to the HFT, in this case is gCoreHFT. Now the hexadecimal value 14, or 20 in decimal should corresponds the enumerated index.

The calling methods is using indexes of the enumerated types of function name plus the SEL suffix. For example, the enumerated name of ASAtomFromString will be ASAtomFromStringSEL. This enumerated names is supposed to be existent in the header files supplied by Adobe in the SDK.

But if you try to search the string "ASAtomFromStringSEL" into any of the header files, you will find nothing. Why ? Because it is sophistically constructed using definition macro. So, there is no way you can find the value of enumerated name based on existing header files.

Some compilers have the facillity to generate the pre-compiled header, i.e. file with .PCH extension. And if you want to find, you should consult this file, but, again, unfortunately, it is in the form of machine language that human can not understand. I don't know whether currently there a some program to decompile the .PCH file down to readable definitions out there.

So, the most easiest method is assigned the ASAtomFromStringSEL to some value in the plug-in we have, in this case is starter.api, and have it inspected during debugging session, or have it printed or written to some temporary file. Using this method, we can find that ASAtomFromStringSEL corresponds to value 5. And the relation of 5 with 20 is 20/4 or divided by 4. Why 4 ? Because each index is a multiply of words.

So, provided we have all the enumerated values in each of HFT index category, such as Core, AcroSupport, etc, theoritically we can identify any HFTs in the plug-ins, hence, serve the purpose to convey some meaning to the lifeless assembly codes in the Adobe plug-ins, the Reverse Engineering.

Welcome to the Corner of Excellence