Services Console Snap-in Timeout Error message Causes

Posted in: , by . No comments

If you ever encountered a hanging service you probably saw the error: "Error 1053: The service did not respond to the start or control request in a timely fashion" when starting or stopping it from the Services console snap-in (can be quickly displayed with by running services.msc).

image

You may encounter this error when debugging service startup code and notice that after that error message your service is being terminated and the debug session is ruined. To solve this you can set the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\ServicesPipeTimeout registry key. Having this value set you can still notice that this message appears but instead of the default 30 seconds (unlike the 2 minutes default stated in the article, at least since XP) the error message appears after 2 minutes. You may think that windows ignores values higher than 2 minutes but actually the service process isn’t terminated and this error message box appears simply because the progress dialog (pictured below) has reached its end. (Tested on Windows 8.1 and 10)

image

The fact that the displayed error code is the same in both cases, and that it corresponds to the System event log entry that happens when termination occurs (no event log entry is created when the registry key is set to a high value) were confusing to me and I wanted to verify that it actually is a UI only error and the service is not affected.

Debugging

After attaching to mmc.exe I first guessed the relevant DLL by setting a breakpoint (bp) on the user32!MessageBoxW and waited for the message box to appear, the breakpoint hit and that was the top of the stack:

0:000> k
Child-SP RetAddr Call Site
00000000`00bad578 00007ffc`7c054f59 USER32!MessageBoxW
00000000`00bad580 00007ffc`7c055039 filemgmt!DoErrMsgBoxHelper+0xa5
00000000`00bad5d0 00007ffc`7c05b86c filemgmt!DoServicesErrMsgBox+0x25
00000000`00bad620 00007ffc`7c05b1cf filemgmt!CServiceControlProgress::M_ProcessErrorCode+0x150
00000000`00badaa0 00007ffc`7c05bce7 filemgmt!CServiceControlProgress::M_EDoExecuteServiceThread+0x6f
00000000`00badae0 00007ffc`7c042282 filemgmt!CServiceControlProgress::S_EStartService+0xbb
00000000`00badb30 00007ffc`7c03a744 filemgmt!CFileMgmtComponentData::Service_FDispatchMenuCommand+0x10e
00000000`00badbb0 00007ffc`7c03e2f0 filemgmt!CFileMgmtComponent::Command+0xa4
00000000`00badc30 00007ffc`7c03df04 filemgmt!CFileMgmtComponent::OnToolbarButton+0x78
00000000`00badc60 00007ffc`7c24aacc filemgmt!CFileMgmtComponent::ControlbarNotify+0x94

filemgmt and particularly CServiceControlProgress seem relevant. Now we can guess that because the progress bar (pictured above) is most likely powered by the SetTimer API and WM_TIMER messages there is a symbol containing “Timer” in this dll:

0:000> x filemgmt!*timer*
00007ffc`7c05b5b4 filemgmt!CServiceControlProgress::M_OnTimer (<no parameter info>)
00007ffc`7c082aa8 filemgmt!_imp_KillTimer = <no type information>
00007ffc`7c0389cc filemgmt!MakeElapsedTimeResult (<no parameter info>)
00007ffc`7c082aa0 filemgmt!_imp_SetTimer = <no type information>

filemgmt!CServiceControlProgress::M_OnTimer seems easily the best canditate.



If this function didn’t contain the word “timer” and you couldn’t guess it otherwise by dumping the members of CServiceControlProgress (x filemgmt!CServiceControlProgress::*) you could still deductively find it by setting a breakpoint at the default DialogProc (user32!DefDlgProcW) then tracing to find the DialogProc of the progress bar dialog (filemgmt!CServiceControlProgress::S_DlgProcControlService), then setting a conditional breakpoint waiting for WM_TIMER messages (0x113) with bp filemgmt!CServiceControlProgress::S_DlgProcControlService ".if @rdx=113 {} .else {gc}" and then quickly tracing to the timer handler.


Fortunately, the relevant control flow is right at the start of the filemgmt!CServiceControlProgress::M_OnTimer  function (I removed some error checking code):

filemgmt!CServiceControlProgress::M_OnTimer:
00007ffc`7c05b5b4 48895c2418 mov qword ptr [rsp+18h],rbx
00007ffc`7c05b5b9 57 push rdi
00007ffc`7c05b5ba 4883ec50 sub rsp,50h
00007ffc`7c05b5cd 81413890010000 add     dword ptr [rcx+38h],190h
00007ffc`7c05b5db 8b4138 mov eax,dword ptr [rcx+38h]
00007ffc`7c05b5de 488bfa mov rdi,rdx
00007ffc`7c05b5e1 488bd9 mov rbx,rcx
00007ffc`7c05b600 3b01 cmp eax,dword ptr [rcx]
00007ffc`7c05b602 761e jbe filemgmt!CServiceControlProgress::M_OnTimer+0x6e (00007ffc`7c05b622)
00007ffc`7c05b604 488b5130 mov rdx,qword ptr [rcx+30h]
00007ffc`7c05b608 488bcf mov rcx,rdi
00007ffc`7c05b60b ff1597740200 call qword ptr [filemgmt!_imp_KillTimer (00007ffc`7c082aa8)]
00007ffc`7c05b611 4883633000 and qword ptr [rbx+30h],0
00007ffc`7c05b616 c783a00600001d040000 mov dword ptr [rbx+6A0h],41Dh

The first parameter of the function, the this pointer (of CServiceControlProgress) that is passed at the rcx register (x64 calling convention) has some interesting fields:

0:000> dps rcx
00000000`03a17b80 00000005`0001e848
00000000`03a17b88 00000000`002b063a
00000000`03a17b90 00000000`00500e0e
00000000`03a17b98 00000000`003c0e96
00000000`03a17ba0 00000000`02440d9e
00000000`03a17ba8 00000000`00000000
00000000`03a17bb0 00000000`00000001
00000000`03a17bb8 00000000`00000960

We can see in the code that the [rcx+38] offset is incremented by 0x190 and then compared to [rcx] (both values are highlighted in the memory dump above). When [rcx+38] is bigger than [rcx], KillTimer is called and [rbx+0x6A0], which is equal to the [rcx+0x6A0] of the start of the function (reminder: rcx was the this pointer), gets assigned the value of decimal 1053 (0x41D), which is our displayed error code, and if we disassemble CServiceControlProgress::M_ProcessErrorCode we would see that it reads the same pointer and passes the value to the message box with the description of this error code, which pretty much confirms that perpetrator is the progress bar dialog and that we shouldn’t worry about the error message box, as the actual interaction with the service happens on another thread, and is not affected by this, below was the call stack during all of this:

0:000> ~9k
Child-SP RetAddr Call Site
00000000`04cef0d8 00007ffc`9b5e1199 ntdll!NtAlpcSendWaitReceivePort+0xa
00000000`04cef0e0 00007ffc`9b5d3f2d RPCRT4!LRPC_CCALL::SendReceive+0x169
00000000`04cef1a0 00007ffc`9b5dea0f RPCRT4!I_RpcSendReceive+0x59
00000000`04cef1d0 00007ffc`9b6df750 RPCRT4!NdrSendReceive+0x2b
00000000`04cef200 00007ffc`9b6dfc1c RPCRT4!NdrpClientCall2+0x363
00000000`04cef820 00007ffc`9aeb62a3 RPCRT4!NdrClientCall2+0x1c
00000000`04cef850 00007ffc`7c05be40 sechost!StartServiceW+0x23
00000000`04cef8a0 00007ffc`9b2713d2 filemgmt!CServiceControlProgress::S_ThreadProcStartService+0x80

00000000`04cef8d0 00007ffc`9d525444 KERNEL32!BaseThreadInitThunk+0x22
00000000`04cef900 00000000`00000000 ntdll!RtlUserThreadStart+0x34


Trivia and a sanity check: if we set a breakpoint at the user32!SetTimer and get the interval parameter that was passed by our dialog, we would see that it’s 0x190, which equals to the increment at the timer handler function we saw, meaning that the values we saw are in time units and that the stop condition which is 0x1e848, or 125000 in decimal, is 125 seconds, pretty close to the 2 minutes that we observed.

Posted in: , by . No comments

The permalink

Leave a Reply