Hanging situation

Question: We had hanging situation in this week a few times in our productive system


The dpmon overview was like this:

Workprocess Table (long)
========================

No Ty. Pid Status Cause Start Err Sem CPU Time Program Cl User Action Table
-----------------------------------------------------------------------------------------------------------------------
0 DIA 2008 Ended no 1 0 1109SAPMF05L 200 LISSENSL Direct Read RFBLG
1 DIA 1360 Ended no 1 0 809SAPDBMMI 200 RUBENSE Sequential Read RKPF
2 DIA 1780 Ended no 1 0 1094SAPLSUPR 200 CSIKOSC Direct Read TFDIR
3 DIA 1584 Ended no 1 0 809 200 RUBENSE Sequential Read ESDUS
4 DIA 1664 Ended no 1 0 1062SAPMSSY2 000 SAPSYS Direct Read BTCCTL
5 UPD 1656 Wait yes 0 0 0
6 ENQ 1680 Wait yes 0 0 0
7 BTC 1700 Wait yes 0 0 0
8 BTC 1716 Wait yes 0 0 0
9 SPO 1724 Run yes 0 0 822 000 SAPSYS Direct Read TSPOPTIONS
10 UP2 1732 Wait yes 0 0 0

I think the problem, that the SPO request made a dead-lock, and blocked all the DIA WP.
I checked this table TSPOPTIONS it has only 7 entries....buffering also switched off....

I was unable to connect to the DB (using Query Analyzer).

There was no entry in the SQL log, not too much info in the dev traces, except the DIA WPs restarted several times.

Anybody had similar problem with TSPOPTIONS?
What is the meaninng of the entries?

Thanks,


Ps: Of course we opened an OSS message, but we got the template answer: apply the latest kernel/ SQL SP, GUI Patch

Answer:

You only have 5 Dialog WP and they were all being used. Because of that, everyone else needing a Dialog WP needs to wait until one gets freed.

Many solutions are available:

1. Ask your users to run reports as background
2. Add some Dialog WP. 5 is not enough.
3. Add an application server
4. if you have enough RAM, add an additional instance on your box. but it is preferable to go for solution #2.

As for the Spool process being the culprit, this is possible but am not sure. Maybe killing this process (with restart) might have helped.

Good luck.



Answer:
Hi

thanks for your answer!!!

The problem is not the number of the DIA WP. Normally in the process overview only a few DIA is used in the same time...CPU time is not too high for the WPs.
I've already increased it from 5 to 8.

In 40B (now we're in 46C) once we had a deadlock problem with the spool request, but solved with an OSS note.
Now both case the SPO DIA was reading the mentioned table...so I'm almost sure, that the problem is with the SPO DIA or the table.



Answer:
HI

Can you post a part of the dev_w0, dev_w9 and dev_disp for those times?

Thanks,

As for the SPO being the problem, those log files will verify if it is or not.

Answer:
dev_disp:
Wed Feb 12 11:14:08 2003
*** ERROR => W0 (pid 2008) died [dpxxdisp.c 9991]

Wed Feb 12 11:14:48 2003
*** ERROR => W1 (pid 1360) died [dpxxdisp.c 9991]
*** ERROR => W2 (pid 1780) died [dpxxdisp.c 9991]

Wed Feb 12 11:15:28 2003
*** ERROR => W4 (pid 1664) died [dpxxdisp.c 9991]

Wed Feb 12 11:15:48 2003
*** ERROR => W3 (pid 1584) died [dpxxdisp.c 9991]
my types changed after wp death/restart 0x3f --> 0x3e

Wed Feb 12 11:21:25 2003
caught signal 2

Wed Feb 12 11:22:26 2003
***LOG Q0I=> NiPWrite: WSASend (10054: WSAECONNRESET: Connection reset by peer) [ninti.c 1699]
*** ERROR => DpTmSend: NiBufSend failed(rc=-6)->disconnect tm: 32 [dpxxdisp.c 8413]
RM-T32, U341, 200 CSIKOSC, csikosc, 11:02:17, M0, W2, PFCG, 2/1
*** ERROR => DpTmTrc: DpTmInfo (rc=-2) [dpxxtool.c 1877]
????
*** ERROR => DpTmDisconnect: T32 disconnected already [dpxxdisp.c 4827]

Wed Feb 12 11:23:25 2003
*** ERROR => DpHalt: W7 (pid 1700) still alive ... [dpxxdisp.c 6529]

Wed Feb 12 11:23:26 2003
***LOG Q0M=> DpMsDetach, ms_detach () [dpxxdisp.c 7964]
MBUF state OFF
MBUF component DOWN
***LOG Q05=> DpHalt, DPStop ( 1280) [dpxxdisp.c 6651]


dev_w0:
M
M Wed Feb 12 10:52:27 2003
M ***LOG R0K=> ThDelForMode, ExitMode (001VERMAELM ) [thxxmode.c 3599]
M
M Wed Feb 12 11:13:05 2003
M 2. ALARM: terminate process (pid=720)
M previous sql break failed, terminate without db cleanup and hooks
M ThAlarmHandler: C-Stack during alarm handler
M ------------------ C-STACK ----------------------
SAP (R) - R/3(TM) Callstack, Version 1.0
Copyright (C) SAP AG. All rights reserved.
------------------------------------------------------------
Callstack without Exception:
App : disp+work.EXE (pid=720)
When : 2/12/2003 11:13:7.791
Threads : 2
------------------------------------------------------------
Comuter Name : SRVSAP01
User Name : SAPServiceD01
------------------------------------------------------------
Number of Processors: 1
Processor Type: x86 Family 6 Model 7 Stepping 3
Windows Version: 5.0 Current Build: 2195
------------------------------------------------------------
State Dump for Thread Id 1b8

eax=4914d080 ebx=02386750 ecx=49091ea8 edx=00000000 esi=0000008c edi=07213918
eip=77f8224d esp=02386638 ebp=02386684 iopl=0 nv up ei pl nz ac po nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000216


function : NtRequestWaitReplyPort
77f82242 b8b0000000 mov eax,0xb0
77f82247 8d542404 lea edx,[esp+0x4] ss:1c225637=????????
77f8224b cd2e int 2e
77f8224d c20c00 ret 0xc

*----> Stack Back Trace <----*
FramePtr ReturnAd Param#1 Param#2 Param#3 Param#4 Function Name
02386684 74cb102b 035bdd58 0721f390 0000008c 02386750 ntdll!NtRequestWaitReplyPort
0238669c 74cb1113 0721ccd8 0721f390 0721008c 02386750 DBNETLIB!<nosymbols>
023866e0 7537465a 0721ccd8 0721f390 0000008c 02386750 DBNETLIB!ConnectionWrite
0721ccd8 00000000 00000000 00000000 00000000 00000000 sqloledb!<nosymbols>

------------------------------------------------------------
State Dump for Thread Id 710

eax=045bff18 ebx=00000103 ecx=045bfc90 edx=00000000 esi=00000000 edi=00000000
eip=77f82efd esp=045bfc74 ebp=045bfcb4 iopl=0 nv up ei pl zr na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000246


function : ZwFsControlFile
77f82ef2 b848000000 mov eax,0x48
77f82ef7 8d542404 lea edx,[esp+0x4] ss:1e45ec73=????????
77f82efb cd2e int 2e
77f82efd c22800 ret 0x28

*----> Stack Back Trace <----*
FramePtr ReturnAd Param#1 Param#2 Param#3 Param#4 Function Name
045bfcb4 00bda2d2 000001f8 00000000 780035b0 0255fd68 ntdll!ZwFsControlFile
045bff80 7800a3c0 00000000 780035b0 0000000a 0255fd68 disp+work!SigILastSig
045bffb4 77e8758a 0255fd68 780035b0 0000000a 0255fd68 MSVCRT!beginthreadex
045bffec 00000000 00000000 00000000 00000000 00000000 KERNEL32!SetFilePointer

------------------------------------------------------------
M -------------------------------------------------

---------------------------------------------------
trc file: "dev_w0", trc level: 1, release: "46D"
---------------------------------------------------
*
* ACTIVE TRACE LEVEL 1
* ACTIVE TRACE COMPONENTS all, M
*
B
B Wed Feb 12 11:13:10 2003
B create_con (con_name=R/3)
B Loading DB library 'E:\usr\sap\D01\SYS\exe\run\dbmssslib.dll' ...
B Library 'E:\usr\sap\D01\SYS\exe\run\dbmssslib.dll' loaded
B Version of library 'E:\usr\sap\D01\SYS\exe\run\dbmssslib.dll' is "46D.00", patchlevel (0.1269)
B New connection 0 created
M relno 4640
M patchlevel 0
M patchno 1289
M intno 0
M pid 2008
M
M ***LOG Q0Q=> tskh_init, WPStart (Workproc 0 2008) [dpxxdisp.c 929]
I MtxInit: -2 0 0
X EmInit: MmSetImplementation( 2 ).
X <ES> client 0 initializing ....
X Using implementation std
M <ES> Memory Reset enabled
X ES initialized.
M calling db_connect ...
C
C Wed Feb 12 11:13:31 2003
C MapHr:8000FFFF E_UNEXPECTED
C HandleOledbError [OpenOledbConnection,line 33020]: Error/Message: (err 17, sev 0), [DBNETLIB][ConnectionOpen (Connect()).]SQL Server does not exist or access denied.
C HandleOledbError [OpenOledbConnection,line 33020]: Error/Message: (err 0, sev 0), Invalid connection string attribute
C
C Wed Feb 12 11:13:48 2003
C MapHr:8000FFFF E_UNEXPECTED
C HandleOledbError [OpenOledbConnection,line 33020]: Error/Message: (err 17, sev 0), [DBNETLIB][ConnectionOpen (Connect()).]SQL Server does not exist or access denied.
C HandleOledbError [OpenOledbConnection,line 33020]: Error/Message: (err 0, sev 0), Invalid connection string attribute
C
C Wed Feb 12 11:14:04 2003
C MapHr:8000FFFF E_UNEXPECTED
C HandleOledbError [OpenOledbConnection,line 33020]: Error/Message: (err 17, sev 0), [DBNETLIB][ConnectionOpen (Connect()).]SQL Server does not exist or access denied.
C HandleOledbError [OpenOledbConnection,line 33020]: Error/Message: (err 0, sev 0), Invalid connection string attribute
C failed to establish conn. 0
M ***LOG R19=> tskh_init, db_connect ( DB-Connect 000256) [thxxhead.c 1085]
M in_ThErrHandle: 1
M *** ERROR => tskh_init: db_connect (step 1, th_errno 13, action 3) [thxxhead.c 7919]
M
M Info for wp 0
M
M stat = 4
M reqtype = 1
M act_reqtype = 1
M tid = 12
M mode = 0
M len = 196
M rq_id = 6116
M rq_source = 1
M last_tid = 12
M last_mode = 0
M rfc_req = 0
M report = >SAPMF05L <
M action = 10
M tab_name = >RFBLG  <
M
M *****************************************************************************
M *
M * LOCATION SAP-Server srvsap01_D01_00 on host srvsap01 (wp 0)
M * ERROR tskh_init: db_connect
M *
M * TIME Wed Feb 12 11:14:04 2003
M * RELEASE 46D
M * COMPONENT Taskhandler
M * VERSION 1
M * RC 13
M * MODULE thxxhead.c
M * LINE 8012
M * COUNTER 1
M *
M *****************************************************************************
M
M Entering ThSetStatError
M call ThrShutDown (1) ...
M ***LOG Q02=> wp_halt, WPStop (Workproc 0 2008) [dpnttool.c 301]


dev_w9:
M
M Wed Feb 12 11:22:26 2003
M caught signal 2
M in_ThErrHandle: 1
M ThSigHandler: signal (step 4, th_errno 11, action 2)
M th_during_shutdown == TRUE
M Entering TH_CALL_HOOKS
M ThCallHooks: call hook >ThrSaveSPAFields< for event BEFORE_DUMP
M Entering ThSetStatError
M *** WARNING => sql break failed (-1)
M ThErrHandle: don't try rollback again
M ThErrHandle: normal shutdown
M *** ERROR => ThErrHandle: bad value for th_act_em_hdl (29), detach T-1/M255 [thxxhead.c 8207]
M call ThrShutDown (2)...
B Disconnecting from ALL connections:
B Wp Hdl ConName ConCnt ConState TX PRM RCT Date Time
B 009 000 R/3 000000 ACTIVE NO YES NO 20030211 104326
C cancelled processing (hr 0)
B Disconnected from connection 0
B statistics db_con_commit (com_total=3102, com_forced=1312, com_tx=1312)
B statistics db_con_rollback (roll_total=0, roll_forced=0, roll_tx=0)
M ***LOG Q02=> wp_halt, WPStop (Workproc 9 1724) [dpnttool.c 301]


Thanks in advance,
Incho

Answer:
Hi ,

I was hoping that the developer traces would help but it didn't (forgot that NT developer traces also gives out stack info which you can't use unless you've got a hex edior or a core dump analyzer ... UNIX gives more info instead of the stack info in NT). Anyway, I'll just based my observations on the dpmon printout you posted earlier.

It seems that there's two probable causes. First is the SQL DB itself (e.g. tablespaces, structure, etc). But since you said that there's nothing wrong with the DB (and you'ved looked into the SQL log), we'll go to the second probable cause. I'd say you're right that the problem is with the table TSPOPTIONS (either due to buffering options or enqueue locks) since the SPO work process seems hang on reading this table (currently been reading it for the past 822 seconds). The fact that DIA work processes 0 to 4 ended when trying to access/use memory key segment 1 (Application server administration) also points to the access of the table TSPOPTIONS as the culprit.

Can you verify in SM13/12 (after the system has been started) if there are any TSPOPTIONS or if there are any errors in the updates?

Thanks

Answer:
Can you verify in SM13/12 (after the system has been started) if there are any TSPOPTIONS or if there are any errors in the updates?

Hi ,

thanks a lot for your help!!!!

There is no error is sm12/sm13

I
Copyright ?2007 - 2008 www.jt77.com