Showing posts with label WinDbg. Show all posts

Solving .net source code lines problem in windbg with SOSEX

 
Introduction

The SOSEX debugger extension has all kinds of commands that work with source code lines, for example !mk displays a stack trace with both managed and unmanged frames, potentially mapping managed frames to their source code locations.

Like native source code lines support, it requires the symbols file (PDB) to be available to the debugger. Unfortunately, .NET doesn’t call LoadLibrary on its managed non mixed-mode assemblies. Not calling LoadLibrary causes the debugger to not see the module (assembly, DLL, I have used those terms interchangeably in this post), thus obviously unable to load its symbols. That happens because the debugger gets the list of modules from the ntdll loader’s list, which is populated by LoadLibrary calls.

CoreCLR does call LoadLibrary. I’m guessing that not calling is a compatibility issue rather than security issue, as CoreCLR calls LoadLibrary with the DONT_RESOLVE_DLL_REFERENCES flag, which prevents native code from running (DLLMain isn’t called) in an untrusted assembly.

If the assembly is loaded after we have attached, the debugger actually sees it, because after attaching it receives lower level notifications of module loading, at the file mapping level (Managed DLLs are still mapped to the process). You can list the loaded modules with the lm command.

So this problem is only relevant to assemblies that were loaded before attaching the debugger.

Solution

Despite its name, Process Explorer’s DLLs pane isn’t limited to DLLs loaded by LoadLibrary, but it actually displays all memory mapped files, including all managed DLLs.

For example, some managed application:

image_thumb1

Luckily, there’s a way of “hinting” the debugger where a DLL is located in the target’s memory space with the .reload command, for our case we could run:

.reload /f ClassLibrary1.dll=0x2700000

Notice that I took the base address from the above Process Explorer’s screenshot. /f simply forces immediate symbols load. After that, all the commands that support sources (like !mk) would work on code from that DLL with source lines.

The option of specifying base address is usually used for kernel debugging, in cases where some of the memory that contains the loaded DLLs is paged out, so the debugger can’t figure out what is loaded by itself.

Posted in: , , by . No comments

The Cause of Spurious Mouse Move Messages/Events when task manager is open

Recently while programming in JavaScript I noticed that my mousemove event keeps firing without my mouse moving, as long as my mouse is over the relevant object. It seemed strange so I searched for this behavior online and apparently quite a few others encountered it and thought it was strange: [1] http://stackoverflow.com/questions/17818493/mousemove-event-repeating-every-second, [2], [3], [4].

The specification says “A user agent must dispatch this event when a pointing device is moved while it is over an element (emphasis is mine), confirming that this behavior isn’t originated in the standard. Luckily for me, the first link from above helped me find the source of this behavior: the Task Manager was open in the background.

At the OS level

First I wanted to check what’s happening at the OS level, Applications mostly handle input by listening to Windows Messages and a tool for observing them is Spy++. Because my version of Chrome is 32-bit, I opened the 32-bit version of Spy++ (spyxx.exe rather than spyxx_amd64.exe). After locating Chrome’s window using the crosshair from the Search –> Find Window dialog we get information about the window:
image 
Clicking Synchronize will highlight the window on the main windows tree. By right clicking the window entry in the tree and clicking Messages we can capture windows messages targeted at this window:
image 
When holding the mouse over the browser window without moving it the following messages keep repeating:
 image
One of the repeating messages is WM_MOUSEMOVE which not surprisingly represents mouse movement, and apparently Chrome doesn’t check whether the mouse has actually moved and passes the event to the web application. Actually Firefox and IE11 behave differently in this case (the mousemove event doesn’t keep firing), probably because they do check whether the mouse has moved each time.

That showed that the underlying issue is the continuing arrival of WM_MOUSEMOVE messages, and not a web specific issue.

Why does it happen?

I wondered why would a background application (Task Manager) affect in such a way other applications (Although this shouldn’t cause bugs because mouse move handling is usually idempotent), so I wanted to debug this issue.

The assumption is that there’s a piece of code in Task Manager that causes it and the goal is finding it. Since Task Manager is a Win32 GUI application much of its code is powered by Windows Messages, and our code of interest is likely run in response to a Windows Message.

Since the WM_MOUSEMOVE messages seem to be arriving approximately every 1 second, we can look for similar periodical windows messages with Spy++ (This time with the 64-bit version), After Similarly locating the Task Manager main window we can capture messages targeted at this window:image

To our expectation, there’s a user defined Windows Message arriving every 1 second, so we can debug its handling code and likely find the root cause. After attaching WinDBG we can switch context to the message pumping thread (with ~0s) and look at the call stack:

0:000> k
Child-SP RetAddr Call Site
000000eb`dad2f618 00007ffc`2b4013ed ntdll!NtWaitForMultipleObjects+0xa
000000eb`dad2f620 00007ffc`2ded3100 KERNELBASE!WaitForMultipleObjectsEx+0xe1
000000eb`dad2f900 00007ffc`265311ca USER32!RealMsgWaitForMultipleObjectsEx+0x100
000000eb`dad2f9b0 00007ffc`2653134b DUser!CoreSC::xwProcessNL+0x2d6
000000eb`dad2fa40 00007ffc`2ded2822 DUser!MphProcessMessage+0xb3
000000eb`dad2faa0 00007ffc`2e213034 USER32!_ClientGetMessageMPH+0x52
000000eb`dad2fb30 00007ffc`2ded26ca ntdll!KiUserCallbackDispatcherContinue
000000eb`dad2fba8 00007ffc`2ded2695 USER32!NtUserGetMessage+0xa
000000eb`dad2fbb0 00007ff6`ba4d6d6a USER32!GetMessageW+0x25
000000eb`dad2fbe0 00007ff6`ba4e2144 Taskmgr!wWinMain+0x2aa
000000eb`dad2fcc0 00007ffc`2b8113d2 Taskmgr!__wmainCRTStartup+0x1d4
000000eb`dad2fd90 00007ffc`2e195444 KERNEL32!BaseThreadInitThunk+0x22
000000eb`dad2fdc0 00000000`00000000 ntdll!RtlUserThreadStart+0x34

The interesting frame is highlighted above, that’s the message loop frame and there we can start debugging the message handling code. First we can disassemble around it to see where is the current message stored:


0:000> u Taskmgr!wWinMain+0x270 l30
Taskmgr!wWinMain+0x270:
00007ff6`ba4d6d30 488b0dc1d60700 mov rcx,qword ptr [Taskmgr!g_MainWindow (00007ff6`ba5543f8)]
00007ff6`ba4d6d37 4c8d4510 lea r8,[rbp+10h]
00007ff6`ba4d6d3b 488bd3 mov rdx,rbx
00007ff6`ba4d6d3e ff1544c30800 call qword ptr [Taskmgr!_imp_TranslateAcceleratorW (00007ff6`ba563088)]
00007ff6`ba4d6d44 488d4d10 lea rcx,[rbp+10h]
00007ff6`ba4d6d48 ff15eac20800 call qword ptr [Taskmgr!_imp_TranslateMessage (00007ff6`ba563038)]
00007ff6`ba4d6d4e 488d4d10 lea rcx,[rbp+10h]
00007ff6`ba4d6d52 ff15e8c20800 call qword ptr [Taskmgr!_imp_DispatchMessageW (00007ff6`ba563040)]
00007ff6`ba4d6d58 488d4d10 lea rcx,[rbp+10h]
00007ff6`ba4d6d5c 4533c9 xor r9d,r9d
00007ff6`ba4d6d5f 4533c0 xor r8d,r8d
00007ff6`ba4d6d62 33d2 xor edx,edx
00007ff6`ba4d6d64 ff1516c30800 call qword ptr [Taskmgr!_imp_GetMessageW (00007ff6`ba563080)]
00007ff6`ba4d6d6a 85c0 test eax,eax

The first parameter to GetMessage is the message (passed in the rcx register), so It’s stored at [rbp+10]. Now we can add a conditional breakpoint for the user defined message with bp Taskmgr!wWinMain+0x2aa "j poi(@rbp+18)=4df '';'gc'"  (rbp+18 is used because the message number is the second parameter of the MSG structure, 4df is the user defined message code from the above Spy++ output).


After we run the app and the breakpoint fires it’s just a matter of tracing until finding the code of interest. A good method for that kind of search is jumping over function calls and seeing how the application reacts, until we jump over the function that causes the suspicious behavior to disappear (sort of binary search). To save time, I’ll skip the search process and show that function, it’s a call to SendMessage at Taskmgr!WdcDataMonitor::ListUpdate+0x45f:

0:000>  u Taskmgr!WdcDataMonitor::ListUpdate+0x45f
Taskmgr!WdcDataMonitor::ListUpdate+0x45f:
00007ff6`ba505d0f ff1583d10500 call qword ptr [Taskmgr!_imp_SendMessageW (00007ff6`ba562e98)]
00007ff6`ba505d15 418bc4 mov eax,r12d
00007ff6`ba505d18 488b8d90010000 mov rcx,qword ptr [rbp+190h]
00007ff6`ba505d1f 4833cc xor rcx,rsp
00007ff6`ba505d22 e8f9c4fdff call Taskmgr!_security_check_cookie (00007ff6`ba4e2220)
00007ff6`ba505d27 488b9c24f0020000 mov rbx,qword ptr [rsp+2F0h]
00007ff6`ba505d2f 4881c4a0020000 add rsp,2A0h
00007ff6`ba505d36 415f pop r15

The trick for constantly jumping over a function call (like a patch) is a breakpoint that changes the value of rip (the instruction pointer in x64) when it fires. In our case, because the code doesn’t check the return value we can advance it to the next instruction with that command: bp Taskmgr!WdcDataMonitor::ListUpdate+0x45f "r @rip=00007ff6`ba505d15;gc". Now after the application is run we can see that the spurious WM_MOUSEMOVE messages stop arriving.



The Windows message that is being sent – WM_SETREDRAW


The call we patched is a SendMessage call that sends a WM_SETREDRAW with the false parameter. That prevents a window from redrawing until WM_SETREDRAW is sent with the true parameter, it is targeted at the processes list view control in the Details tab and as evident by the stack trace it is sent while it’s being sorted:

0:000> k
Child-SP RetAddr Call Site
000000eb`dad2f520 00007ff6`ba508488 Taskmgr!WdcDataMonitor::ListUpdate+0x45f
000000eb`dad2f800 00007ff6`ba50643a Taskmgr!WdcProcessMonitor::ListUpdate+0x38
000000eb`dad2f840 00007ff6`ba503dba Taskmgr!WdcDataMonitor::SetSortDirty+0x5e
000000eb`dad2f880 00007ff6`ba502dc5 Taskmgr!WdcListView::Sort+0xaa
000000eb`dad2f8c0 00007ff6`ba4bbe84 Taskmgr!WdcListView::Render+0x115
000000eb`dad2f940 00007ff6`ba4f0c5c Taskmgr!WdcDataPortal::Render+0xe8
000000eb`dad2f970 00007ff6`ba4c6431 Taskmgr!WdcMonitor::Render+0x2a71c
000000eb`dad2f9c0 00007ff6`ba4c6016 Taskmgr!TmTraceControl::ForceRenderUI+0x2d
000000eb`dad2fa00 00007ffc`2ded250d Taskmgr!TmWindowProc+0xf6
000000eb`dad2fa90 00007ffc`2ded2367 USER32!UserCallWinProcCheckWow+0x149
000000eb`dad2fb60 00007ff6`ba4d6d58 USER32!DispatchMessageWorker+0x1a7
000000eb`dad2fbe0 00007ff6`ba4e2144 Taskmgr!wWinMain+0x298

That message is used to prevent numerous window redraws while many updates are being made to it (sorting could surely cause many updates). The issue is how it is implemented by default, and that’s by making the window invisible, which along with the WM_SETREDRAW with the true parameter that is being sent later, causes Windows to send WM_MOUSEMOVE messages, and the puzzle is complete.


In conclusion, as I wrote above, I don’t think this behavior is very problematic, as the extra messages shouldn’t usually change application’s behavior, and if they do, the application should be fixed anyway because Windows doesn’t guarantee sending them on actual mouse movement (the above linked post explains why), and even the web standard is somewhat ambiguous about the mousemove event.

Posted in: , by . 1 Comment

Services Console Snap-in Timeout Error message Causes

If you ever encountered a hanging service you probably saw the error: "Error 1053: The service did not respond to the start or control request in a timely fashion" when starting or stopping it from the Services console snap-in (can be quickly displayed with by running services.msc).

image

You may encounter this error when debugging service startup code and notice that after that error message your service is being terminated and the debug session is ruined. To solve this you can set the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\ServicesPipeTimeout registry key. Having this value set you can still notice that this message appears but instead of the default 30 seconds (unlike the 2 minutes default stated in the article, at least since XP) the error message appears after 2 minutes. You may think that windows ignores values higher than 2 minutes but actually the service process isn’t terminated and this error message box appears simply because the progress dialog (pictured below) has reached its end. (Tested on Windows 8.1 and 10)

image

The fact that the displayed error code is the same in both cases, and that it corresponds to the System event log entry that happens when termination occurs (no event log entry is created when the registry key is set to a high value) were confusing to me and I wanted to verify that it actually is a UI only error and the service is not affected.

Debugging

After attaching to mmc.exe I first guessed the relevant DLL by setting a breakpoint (bp) on the user32!MessageBoxW and waited for the message box to appear, the breakpoint hit and that was the top of the stack:

0:000> k
Child-SP RetAddr Call Site
00000000`00bad578 00007ffc`7c054f59 USER32!MessageBoxW
00000000`00bad580 00007ffc`7c055039 filemgmt!DoErrMsgBoxHelper+0xa5
00000000`00bad5d0 00007ffc`7c05b86c filemgmt!DoServicesErrMsgBox+0x25
00000000`00bad620 00007ffc`7c05b1cf filemgmt!CServiceControlProgress::M_ProcessErrorCode+0x150
00000000`00badaa0 00007ffc`7c05bce7 filemgmt!CServiceControlProgress::M_EDoExecuteServiceThread+0x6f
00000000`00badae0 00007ffc`7c042282 filemgmt!CServiceControlProgress::S_EStartService+0xbb
00000000`00badb30 00007ffc`7c03a744 filemgmt!CFileMgmtComponentData::Service_FDispatchMenuCommand+0x10e
00000000`00badbb0 00007ffc`7c03e2f0 filemgmt!CFileMgmtComponent::Command+0xa4
00000000`00badc30 00007ffc`7c03df04 filemgmt!CFileMgmtComponent::OnToolbarButton+0x78
00000000`00badc60 00007ffc`7c24aacc filemgmt!CFileMgmtComponent::ControlbarNotify+0x94

filemgmt and particularly CServiceControlProgress seem relevant. Now we can guess that because the progress bar (pictured above) is most likely powered by the SetTimer API and WM_TIMER messages there is a symbol containing “Timer” in this dll:

0:000> x filemgmt!*timer*
00007ffc`7c05b5b4 filemgmt!CServiceControlProgress::M_OnTimer (<no parameter info>)
00007ffc`7c082aa8 filemgmt!_imp_KillTimer = <no type information>
00007ffc`7c0389cc filemgmt!MakeElapsedTimeResult (<no parameter info>)
00007ffc`7c082aa0 filemgmt!_imp_SetTimer = <no type information>

filemgmt!CServiceControlProgress::M_OnTimer seems easily the best canditate.



If this function didn’t contain the word “timer” and you couldn’t guess it otherwise by dumping the members of CServiceControlProgress (x filemgmt!CServiceControlProgress::*) you could still deductively find it by setting a breakpoint at the default DialogProc (user32!DefDlgProcW) then tracing to find the DialogProc of the progress bar dialog (filemgmt!CServiceControlProgress::S_DlgProcControlService), then setting a conditional breakpoint waiting for WM_TIMER messages (0x113) with bp filemgmt!CServiceControlProgress::S_DlgProcControlService ".if @rdx=113 {} .else {gc}" and then quickly tracing to the timer handler.


Fortunately, the relevant control flow is right at the start of the filemgmt!CServiceControlProgress::M_OnTimer  function (I removed some error checking code):

filemgmt!CServiceControlProgress::M_OnTimer:
00007ffc`7c05b5b4 48895c2418 mov qword ptr [rsp+18h],rbx
00007ffc`7c05b5b9 57 push rdi
00007ffc`7c05b5ba 4883ec50 sub rsp,50h
00007ffc`7c05b5cd 81413890010000 add     dword ptr [rcx+38h],190h
00007ffc`7c05b5db 8b4138 mov eax,dword ptr [rcx+38h]
00007ffc`7c05b5de 488bfa mov rdi,rdx
00007ffc`7c05b5e1 488bd9 mov rbx,rcx
00007ffc`7c05b600 3b01 cmp eax,dword ptr [rcx]
00007ffc`7c05b602 761e jbe filemgmt!CServiceControlProgress::M_OnTimer+0x6e (00007ffc`7c05b622)
00007ffc`7c05b604 488b5130 mov rdx,qword ptr [rcx+30h]
00007ffc`7c05b608 488bcf mov rcx,rdi
00007ffc`7c05b60b ff1597740200 call qword ptr [filemgmt!_imp_KillTimer (00007ffc`7c082aa8)]
00007ffc`7c05b611 4883633000 and qword ptr [rbx+30h],0
00007ffc`7c05b616 c783a00600001d040000 mov dword ptr [rbx+6A0h],41Dh

The first parameter of the function, the this pointer (of CServiceControlProgress) that is passed at the rcx register (x64 calling convention) has some interesting fields:

0:000> dps rcx
00000000`03a17b80 00000005`0001e848
00000000`03a17b88 00000000`002b063a
00000000`03a17b90 00000000`00500e0e
00000000`03a17b98 00000000`003c0e96
00000000`03a17ba0 00000000`02440d9e
00000000`03a17ba8 00000000`00000000
00000000`03a17bb0 00000000`00000001
00000000`03a17bb8 00000000`00000960

We can see in the code that the [rcx+38] offset is incremented by 0x190 and then compared to [rcx] (both values are highlighted in the memory dump above). When [rcx+38] is bigger than [rcx], KillTimer is called and [rbx+0x6A0], which is equal to the [rcx+0x6A0] of the start of the function (reminder: rcx was the this pointer), gets assigned the value of decimal 1053 (0x41D), which is our displayed error code, and if we disassemble CServiceControlProgress::M_ProcessErrorCode we would see that it reads the same pointer and passes the value to the message box with the description of this error code, which pretty much confirms that perpetrator is the progress bar dialog and that we shouldn’t worry about the error message box, as the actual interaction with the service happens on another thread, and is not affected by this, below was the call stack during all of this:

0:000> ~9k
Child-SP RetAddr Call Site
00000000`04cef0d8 00007ffc`9b5e1199 ntdll!NtAlpcSendWaitReceivePort+0xa
00000000`04cef0e0 00007ffc`9b5d3f2d RPCRT4!LRPC_CCALL::SendReceive+0x169
00000000`04cef1a0 00007ffc`9b5dea0f RPCRT4!I_RpcSendReceive+0x59
00000000`04cef1d0 00007ffc`9b6df750 RPCRT4!NdrSendReceive+0x2b
00000000`04cef200 00007ffc`9b6dfc1c RPCRT4!NdrpClientCall2+0x363
00000000`04cef820 00007ffc`9aeb62a3 RPCRT4!NdrClientCall2+0x1c
00000000`04cef850 00007ffc`7c05be40 sechost!StartServiceW+0x23
00000000`04cef8a0 00007ffc`9b2713d2 filemgmt!CServiceControlProgress::S_ThreadProcStartService+0x80

00000000`04cef8d0 00007ffc`9d525444 KERNEL32!BaseThreadInitThunk+0x22
00000000`04cef900 00000000`00000000 ntdll!RtlUserThreadStart+0x34


Trivia and a sanity check: if we set a breakpoint at the user32!SetTimer and get the interval parameter that was passed by our dialog, we would see that it’s 0x190, which equals to the increment at the timer handler function we saw, meaning that the values we saw are in time units and that the stop condition which is 0x1e848, or 125000 in decimal, is 125 seconds, pretty close to the 2 minutes that we observed.

Posted in: , by . No comments