Saturday, July 26, 2014

Troubleshooting SNMP Probe Load

Technote (FAQ)


Question

MTTrapd probe: How is trap delivery confirmed?

Cause

SNMP trap overload on probe port

Answer

IMPORTANT:
It is recommended that a firewall be installed on the SNMP (MTTrapd) probe server whenever event delivery is mission critical, as this allows timely diagnosis of any event delivery issues.

The latest version of the probe logs the following message when traps are being deliberately dropped by the SNMP (MTTrapd) probe because the trap queue has become full:

Warning: Dropping Trap!

During normal processing the following messages are also available, depending on the configured message level:

Information: Number of items in the trap queue is 0
Debug: 1 trap in queue

Earlier versions of the probe may not have these messages.

To diagnose SNMP delivery, use an IP packet analysis tool appropriate to the probe's platform, e.g. Ethereal/Wireshark.
Capture a fixed number of traps (e.g. the TrapQueue size of 20,000), then analyse the data using the tool's analysis functions; analysis by IP address is usually the most useful. The analysis summary should indicate how much data is being sent to the probe's port.
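
For example, on a Linux or UNIX probe server a capture can also be taken from the command line with tcpdump and opened later in Wireshark. This is only a sketch: it assumes the default trap port of UDP 162 and an interface named eth0, so adjust both to match your probe's configuration.

# Capture 20,000 packets arriving on the trap port and save them for later analysis.
tcpdump -i eth0 -c 20000 -w trap_capture.pcap udp port 162

# Quick per-sender breakdown of the capture: packet counts by source IP address.
tcpdump -nn -r trap_capture.pcap | awk '{print $3}' | cut -d. -f1-4 | sort | uniq -c | sort -rn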

To examine the load the probe is actually processing, add the following to the probe's rules file:

# The % variables persist between events, so the counter can accumulate across traps.
if ( match(%counter, "") )
{
    # First event since the counter was (re)set: start a new measurement window.
    %counter = 1
    %start_time = getdate
}
else
{
    %end_time = getdate
    $time_elapsed = real(int(%end_time) - int(%start_time))

    # Once at least 60 seconds have elapsed, log the average rate and reset the window.
    if ( int($time_elapsed) > 59 )
    {
        $current_load = real(%counter) / real($time_elapsed)
        log(info, "Events per second = " + $current_load + " " + (%counter) + " [ " + $time_elapsed + " ]")
        %counter = 1
        %start_time = getdate
    }
    else
    {
        %counter = int(%counter) + 1
    }
}
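
Once the probe has re-read the rules file, the computed rate can be checked in the probe's log. As a sketch, assuming a default UNIX installation where the probe logs to $OMNIHOME/log/mttrapd.log and the probe's MessageLevel property is set to info or lower:

grep "Events per second" $OMNIHOME/log/mttrapd.log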


The difference between the two loads, the IP traffic arriving on the SNMP (MTTrapd) probe's port and the events that the probe is actually parsing, should help diagnose issues.
It may be necessary to reduce the overall load on the probe by using more SNMP (MTTrapd) probes or by reducing the number of events being sent.

The use of a firewall on the probe server allows timely diagnosis of issues as well as control of access to the SNMP (MTTrapd) probe's port.
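
As a minimal illustration on Linux, assuming iptables is in use and the probe listens on the default trap port of UDP 162, access could be restricted to a known management subnet (10.0.0.0/24 here is only a placeholder):

# Accept traps from the management subnet only; drop everything else sent to the trap port.
iptables -A INPUT -p udp --dport 162 -s 10.0.0.0/24 -j ACCEPT
iptables -A INPUT -p udp --dport 162 -j DROP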


http://www-01.ibm.com/support/docview.wss?uid=swg21327391


Friday, July 25, 2014

WebGUI OutOfMemory Exceptions

https://www.ibm.com/developerworks/community/blogs/cdd16df5-7bb8-4ef1-bcb9-cefb1dd40581/entry/webgui_outofmemoryexceptions_don_t_panic_and_here_s_what_to_do9?lang=en

WebGUI is a fairly complex enterprise Java application. Because of this, it is sensitive to incorrect sizing and to the hardware capabilities of the server it is running on. This blog post will address what you can do when WebGUI runs out of memory. A sure tell-tale sign of this is the dreaded OutOfMemoryExceptions (OOMs) in various log files. OOMs are never a good thing to see in your log files, as they will often cause unpredictable failures.
 

1. Increase WebGUI's (Java) Available Memory

 
First you'll need to decide if this is a simple sizing issue or something else altogether. The easiest way to do this is to simply bump up the amount of Java memory available to WebGUI (without exceeding your server's physical memory). You can do this by using the wsadmin command or by manually editing a server.xml file. I'll outline how to do both.

1.1 wsadmin


bash-2.05$ cd <tip_home>/bin
bash-2.05$ ./wsadmin.sh -lang jython -username <admin_username> -password <password>
wsadmin>AdminTask.setJVMInitialHeapSize('[-serverName server1 -nodeName TIPNode -initialHeapSize 1024]')
wsadmin>AdminConfig.save()
wsadmin>AdminTask.setJVMMaxHeapSize('[-serverName server1 -nodeName TIPNode -maximumHeapSize 2048]')
wsadmin>AdminConfig.save()
wsadmin>AdminTask.showJVMProperties('[-serverName server1 -nodeName TIPNode]')
'[ [internalClassAccessMode ALLOW] [debugArgs -Djava.compiler=NONE -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=35050] [classpath ] [initialHeapSize 768] [runHProf false] [genericJvmArguments ] [hprofArguments ] [osName ] [bootClasspath ] [verboseModeJNI false] [maximumHeapSize 1152] [disableJIT false] [executableJarFileName ] [verboseModeGarbageCollection false] [debugMode false] [verboseModeClass false] ]'

1.2 server.xml


Memory settings (and other JVM parameters) are stored in <install_dir>/profiles/TIPProfile/config/cells/TIPCell/nodes/TIPNode/servers/server1/server.xml; edit the initialHeapSize and maximumHeapSize attributes (values in MB). E.g.

<jvmEntries xmi:id="JavaVirtualMachine_1270953424359" verboseModeClass="false" verboseModeGarbageCollection="false" verboseModeJNI="false" initialHeapSize="768" maximumHeapSize="1152" runHProf="false" hprofArguments="" debugMode="false" debugArgs="-Djava.compiler=NONE -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=35050" genericJvmArguments="">
<systemProperties xmi:id="Property_1270954192861" name="com.ibm.tivoli.reporting.installdir" value="/home/jeffri/build/22/products/tcr" description="Tivoli Common Reporting Home" required="false"/>
</jvmEntries>

Doing either of these will require a restart of WebGUI. Once the new memory settings have taken effect, monitor WebGUI to see if the issues persist. If they do, check whether OOMs appear in the log files (ncw.* or webtop.log, depending on which version of WebGUI you are running). Let's assume the issues persist; what next?
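
For reference, on a default UNIX installation the restart is typically done with the profile's stop and start scripts; paths, server name and credentials below are placeholders and may differ in your environment:

bash-2.05$ cd <tip_home>/profiles/TIPProfile/bin
bash-2.05$ ./stopServer.sh server1 -username <admin_username> -password <password>
bash-2.05$ ./startServer.sh server1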

At this point, you can try to increase memory further to see if the issue still occurs. But keep in mind that if you are on a 32-bit operating system, there is a hard limit of 4GB of address space for each process, including the JVM, so check your OS documentation for the recommended Java -Xmx setting to use (typically it's around 3GB for a 32-bit OS). If you're on a 64-bit OS and have set it to a relatively high number (e.g. >4GB, assuming your server has more than 4GB of memory available) but are still seeing OOMs, it's time to gather a bit of information before deciding your next move.
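
If you're not sure whether the JVM running WebGUI is a 32-bit or 64-bit build, the version banner of the bundled java binary will normally tell you; the path below assumes TIP's embedded JVM in its default location:

bash-2.05$ <tip_home>/java/bin/java -version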
 

2. Determine if you've run out of native memory

 
I won't delve too deep into the various types of memory one should be concerned about when running a Java application, as doing that would risk turning this blog post into a small book; I'll direct you instead to an article on developerWorks. Suffice it to say that should you run out of native memory, allocating more memory to the Java heap will only make matters worse. There are a few tell-tale signs that native memory has been exhausted:

- 1TISIGINFO in Java core
- heap dumps
- Java stack trace
 

2.1 Java core

 
A Java core is a formatted and pre-analyzed text file created by the JVM during an event (in our case, the OOM) or via manual intervention. It contains a lot of information about the runtime condition of the JVM at a snapshot in time, but you should only be interested in the first few lines. Use your favorite file search tool to look for files named javacore.* and you should find a number of text files. Open them in a text editor and you should see something like the following; pay attention to the line which contains '1TISIGINFO', as it gives the dump event reason.

NULL           ------------------------------------------------------------------------
0SECTION       TITLE subcomponent dump routine
NULL           ===============================
1TISIGINFO     Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Failed to create a thread: retVal -1073741830, errno 11" received
1TIDATETIME    Date:                 2011/09/22 at 16:03:57
1TIFILENAME    Javacore filename:    /usr/tivoli/tip/profiles/TIPProfile/javacore.20110922.160347.21627046.0005.txt
NULL           ------------------------------------------------------------------------

You'll notice that the reason for the java/lang/OutOfMemoryError is "Failed to create a thread: ...". If you see this error message, it is likely that you've run out of native memory.
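
To quickly check every Java core on the system for its dump reason, a simple search can be used; this sketch assumes the default installation path shown in the example above:

# List each javacore file together with its dump event reason.
find /usr/tivoli/tip/profiles/TIPProfile -name "javacore.*.txt" -exec grep -H 1TISIGINFO {} \;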
 

2.2 Heap Dump


Another indication comes from examining a Java heap dump generated during the OOM. A heap dump is a binary file containing a dump of all reachable objects in memory at a certain point in time. It's typically used to examine which objects are occupying memory, which is handy if you've got an OOM. To examine a heap dump, I recommend a tool called Eclipse Memory Analyzer (MAT). If you need to analyze an IBM heap dump, you'll need an additional IBM plugin for MAT. Open up the heap dump using the tool and check the size of the heap. If it is significantly smaller than the amount of memory allocated to Java (via the -Xmx option or wsadmin), then you have probably run out of native memory, e.g. if you've allocated 2GB but get an OOM and the dump shows only 1GB of heap successfully allocated.
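
IBM heap dumps are typically written to the profile directory alongside the Java cores and are usually named heapdump.*.phd; assuming the same default installation path as above, they can be located with:

find /usr/tivoli/tip/profiles/TIPProfile -name "heapdump.*.phd"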

If no heap dumps are being generated, you'll need to set the following environment variables (refer to your OS documentation on how to set them):

IBM_HEAP_DUMP=true
IBM_HEAPDUMP=true
IBM_HEAPDUMP_OUTOFMEMORY=true
IBM_JAVACORE_OUTOFMEMORY=true
IBM_HEAPDUMPDIR=<directory>

The next time an OOM occurs, heap dumps will be generated in the directory you specified. You do not need to restart TIP for this to take effect.
 

2.3 Java stack trace

 
Yet another handy way to tell whether the culprit is a lack of native memory is to look at WebGUI's error logs (either ncw.*.trace for WebGUI 7.3.1 and above, or webtop.log for WebGUI 7.3.0 and below). Look for the line that contains the stack trace of the OutOfMemoryException. Here are two examples:

Allocated 1953546760 bytes of native memory before running out
Exception in thread "main" java.lang.OutOfMemoryError
   at sun.misc.Unsafe.allocateMemory(Native Method)
   at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:99)
   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
   at com.ibm.jtc.demos.DirectByteBufferUnderNativeStarvation.main(
DirectByteBufferUnderNativeStarvation.java:29)

Allocated 1953546736 bytes of native memory before running out
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
   at java.lang.Thread.start0(Native Method)
   at java.lang.Thread.start(Thread.java:574)
   at com.ibm.jtc.demos.StartingAThreadUnderNativeStarvation.main(
StartingAThreadUnderNativeStarvation.java:22)

Exceptions occurring during allocation of memory in a DirectByteBuffer and failures to create a new native thread (similar to the error in the Java core file above) are typical examples of running out of native memory.
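
If you're not sure which log file contains the exception, a recursive search of the log directory will find it; this assumes a UNIX installation with logs under $TIPHOME/profiles/TIPProfile/logs:

grep -r "java.lang.OutOfMemoryError" $TIPHOME/profiles/TIPProfile/logs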

If you have determined that you need more native memory, you should add more physical memory and/or reduce the number of applications running concurrently on the server. Hint: it's generally a bad idea to run WebGUI and the ObjectServer on the same physical machine. Keep in mind that if you plan to use more than 4GB of memory, you'll need a 64-bit OS. Once you've added more memory and increased the Java heap available to WebGUI, continue monitoring to see if the issue still occurs.
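
To check how much physical memory the server has and how much of it is in use, Linux provides the free command (other platforms have equivalent tools):

# Show total, used and free physical memory and swap, in megabytes.
free -m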
 

3. Help! There's plenty of native memory, I've increased Java heap, but I still get OOMs

 
At this point, I advise upgrading your version of WebGUI or Webtop and/or installing the latest fix packs; recent releases contain critical fixes for bugs related to memory consumption. If you still have issues after upgrading, then contact IBM support with the following information:
  • WebGUI configuration information:
    • ncwDataSourceDefinitions.xml
    • wimconfig.xml
    • Output from 'Troubleshooting and Support > System Information for Tivoli Netcool/OMNIbus Web GUI'
  • WebGUI server logs in $TIPHOME/profiles/TIPProfile/logs (a packaging example follows this list)
  • Java heap dumps (or Eclipse Memory Analyzer reports if you've previously analyzed a heap dump)
  • Java core files
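
To package the server logs for support, something like the following can be used on UNIX, assuming the logs directory listed above:

tar czf webgui_logs.tar.gz $TIPHOME/profiles/TIPProfile/logs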