Saturday, July 26, 2014

Troubleshooting SNMP Probe Load

Technote (FAQ)


Question

MTTrapd probe: How is trap delivery confirmed?

Cause

SNMP trap overload on probe port

Answer

IMPORTANT:
It is recommended that a firewall be installed on the SNMP (MTTrapd) probe server whenever event delivery is mission critical, as this allows timely diagnosis of any event delivery issues.

The latest version of the probe logs the following message when traps are being deliberately dropped by the SNMP (MTTrapd) probe because the trap queue has become full:

Warning: Dropping Trap!

During normal processing the following messages are also available, depending on the configured message level:

Information: Number of items in the trap queue is 0
Debug: 1 trap in queue

Earlier versions of the probe may not have these messages.

To diagnose SNMP delivery, use an IP packet analysis tool appropriate to the probe's platform, e.g. Ethereal/Wireshark.
Capture a fixed number of traps (e.g. the TrapQueue size of 20,000), then analyse the data using the tool's analysis functions; analysis by IP address is usually the most useful. The analysis summary should indicate how much data is being sent to the probe's port.
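
For example, on a Linux or UNIX probe server a capture can also be taken from the command line with tcpdump and opened later in Wireshark. This is only a sketch: it assumes the default trap port of UDP 162 and an interface named eth0, so adjust both to match your probe's configuration.

# Capture 20,000 packets arriving on the trap port and save them for later analysis.
tcpdump -i eth0 -c 20000 -w trap_capture.pcap udp port 162

# Quick per-sender breakdown of the capture: packet counts by source IP address.
tcpdump -nn -r trap_capture.pcap | awk '{print $3}' | cut -d. -f1-4 | sort | uniq -c | sort -rn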

To examine the load the probe is actually processing, add the following to the probe's rules file:

# The % variables persist between events, so the counter can accumulate across traps.
if ( match(%counter, "") )
{
    # First event since the counter was (re)set: start a new measurement window.
    %counter = 1
    %start_time = getdate
}
else
{
    %end_time = getdate
    $time_elapsed = real(int(%end_time) - int(%start_time))

    # Once at least 60 seconds have elapsed, log the average rate and reset the window.
    if ( int($time_elapsed) > 59 )
    {
        $current_load = real(%counter) / real($time_elapsed)
        log(info, "Events per second = " + $current_load + " " + (%counter) + " [ " + $time_elapsed + " ]")
        %counter = 1
        %start_time = getdate
    }
    else
    {
        %counter = int(%counter) + 1
    }
}
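
Once the probe has re-read the rules file, the computed rate can be checked in the probe's log. As a sketch, assuming a default UNIX installation where the probe logs to $OMNIHOME/log/mttrapd.log and the probe's MessageLevel property is set to info or lower:

grep "Events per second" $OMNIHOME/log/mttrapd.log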


The difference between the two loads, the IP traffic arriving on the SNMP (MTTrapd) probe's port and the events that the probe is actually parsing, should help diagnose issues.
It may be necessary to reduce the overall load on the probe by using more SNMP (MTTrapd) probes or by reducing the number of events being sent.

The use of a firewall on the probe server allows timely diagnosis of issues as well as control of access to the SNMP (MTTrapd) probe's port.
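
As a minimal illustration on Linux, assuming iptables is in use and the probe listens on the default trap port of UDP 162, access could be restricted to a known management subnet (10.0.0.0/24 here is only a placeholder):

# Accept traps from the management subnet only; drop everything else sent to the trap port.
iptables -A INPUT -p udp --dport 162 -s 10.0.0.0/24 -j ACCEPT
iptables -A INPUT -p udp --dport 162 -j DROP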


http://www-01.ibm.com/support/docview.wss?uid=swg21327391


Friday, July 25, 2014

WebGUI OutOfMemory Exceptions

https://www.ibm.com/developerworks/community/blogs/cdd16df5-7bb8-4ef1-bcb9-cefb1dd40581/entry/webgui_outofmemoryexceptions_don_t_panic_and_here_s_what_to_do9?lang=en

WebGUI is a fairly complex enterprise Java application. Because of this, it is sensitive to incorrect sizing and to the hardware capabilities of the server it is running on. This blog post will address what you can do when WebGUI runs out of memory. A sure tell-tale sign of this is the dreaded OutOfMemoryExceptions (OOMs) in various log files. OOMs are never a good thing to see in your log files, as they will often cause unpredictable failures.
 

1. Increase WebGUI's (Java) Available Memory

 
First you'll need to decide if this is a simple sizing issue or something else altogether. The easiest way to do this is to simply bump up the amount of Java memory available to WebGUI (without exceeding your server's physical memory). You can do this by using the wsadmin command or by manually editing a server.xml file. I'll outline how to do both.

1.1 wsadmin


bash-2.05$ cd <tip_home>/bin
bash-2.05$ ./wsadmin.sh -lang jython -username <admin_username> -password <password>
wsadmin>AdminTask.setJVMInitialHeapSize('[-serverName server1 -nodeName TIPNode -initialHeapSize 1024]')
wsadmin>AdminConfig.save()
wsadmin>AdminTask.setJVMMaxHeapSize('[-serverName server1 -nodeName TIPNode -maximumHeapSize 2048]')
wsadmin>AdminConfig.save()
wsadmin>AdminTask.showJVMProperties('[-serverName server1 -nodeName TIPNode]')
'[ [internalClassAccessMode ALLOW] [debugArgs -Djava.compiler=NONE -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=35050] [classpath ] [initialHeapSize 768] [runHProf false] [genericJvmArguments ] [hprofArguments ] [osName ] [bootClasspath ] [verboseModeJNI false] [maximumHeapSize 1152] [disableJIT false] [executableJarFileName ] [verboseModeGarbageCollection false] [debugMode false] [verboseModeClass false] ]'

1.2 server.xml


Memory settings (and other JVM parameters) are stored in <install_dir>/profiles/TIPProfile/config/cells/TIPCell/nodes/TIPNode/servers/server1/server.xml; edit the initialHeapSize and maximumHeapSize attributes (values in MB). E.g.

<jvmEntries xmi:id="JavaVirtualMachine_1270953424359" verboseModeClass="false" verboseModeGarbageCollection="false" verboseModeJNI="false" initialHeapSize="768" maximumHeapSize="1152" runHProf="false" hprofArguments="" debugMode="false" debugArgs="-Djava.compiler=NONE -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=35050" genericJvmArguments="">
<systemProperties xmi:id="Property_1270954192861" name="com.ibm.tivoli.reporting.installdir" value="/home/jeffri/build/22/products/tcr" description="Tivoli Common Reporting Home" required="false"/>
</jvmEntries>

Doing either of these will require a restart of WebGUI. Once the new memory settings have taken effect, monitor WebGUI to see if the issues persist. If they do, check whether OOMs appear in the log files (ncw.* or webtop.log, depending on which version of WebGUI you are running). Let's assume the issues persist; what next?
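
For reference, on a default UNIX installation the restart is typically done with the profile's stop and start scripts; paths, server name and credentials below are placeholders and may differ in your environment:

bash-2.05$ cd <tip_home>/profiles/TIPProfile/bin
bash-2.05$ ./stopServer.sh server1 -username <admin_username> -password <password>
bash-2.05$ ./startServer.sh server1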

At this point, you can try to increase memory further to see if the issue still occurs. But keep in mind that if you are on a 32-bit operating system, there is a hard limit of 4GB of address space for each process, including the JVM, so check your OS documentation for the recommended Java -Xmx setting to use (typically it's around 3GB for a 32-bit OS). If you're on a 64-bit OS and have set it to a relatively high number (e.g. >4GB, assuming your server has more than 4GB of memory available) but are still seeing OOMs, it's time to gather a bit of information before deciding your next move.
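
If you're not sure whether the JVM running WebGUI is a 32-bit or 64-bit build, the version banner of the bundled java binary will normally tell you; the path below assumes TIP's embedded JVM in its default location:

bash-2.05$ <tip_home>/java/bin/java -version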
 

2. Determine if you've run out of native memory

 
I won't delve too deep into the various types of memory one should be concerned about when running a Java application, as doing that would risk turning this blog post into a small book; I'll direct you instead to an article on developerWorks. Suffice it to say that should you run out of native memory, allocating more memory to the Java heap will only make matters worse. There are a few tell-tale signs that native memory has been exhausted:

- 1TISIGINFO in Java core
- heap dumps
- Java stack trace
 

2.1 Java core

 
A Java core is a formatted and pre-analyzed text file created by the JVM during an event (in our case, the OOM) or via manual intervention. It contains a lot of information about the runtime condition of the JVM at a snapshot in time, but you should only be interested in the first few lines. Use your favorite file search tool to look for files named javacore.* and you should find a number of text files. Open them in a text editor and you should see something like the following; pay attention to the line which contains '1TISIGINFO', as it gives the dump event reason.

NULL           ------------------------------------------------------------------------
0SECTION       TITLE subcomponent dump routine
NULL           ===============================
1TISIGINFO     Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Failed to create a thread: retVal -1073741830, errno 11" received
1TIDATETIME    Date:                 2011/09/22 at 16:03:57
1TIFILENAME    Javacore filename:    /usr/tivoli/tip/profiles/TIPProfile/javacore.20110922.160347.21627046.0005.txt
NULL           ------------------------------------------------------------------------

You'll notice that the reason for the java/lang/OutOfMemoryError is "Failed to create a thread: ...". If you see this error message, it is likely that you've run out of native memory.
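
To quickly check every Java core on the system for its dump reason, a simple search can be used; this sketch assumes the default installation path shown in the example above:

# List each javacore file together with its dump event reason.
find /usr/tivoli/tip/profiles/TIPProfile -name "javacore.*.txt" -exec grep -H 1TISIGINFO {} \;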
 

2.2 Heap Dump


Another indication comes from examining a Java heap dump generated during the OOM. A heap dump is a binary file containing a dump of all reachable objects in memory at a certain point in time. It's typically used to examine which objects are occupying memory, which is handy if you've got an OOM. To examine a heap dump, I recommend a tool called Eclipse Memory Analyzer (MAT). If you need to analyze an IBM heap dump, you'll need an additional IBM plugin for MAT. Open up the heap dump using the tool and check the size of the heap. If it is significantly smaller than the amount of memory allocated to Java (via the -Xmx option or wsadmin), then you have probably run out of native memory, e.g. if you've allocated 2GB but get an OOM and the dump shows only 1GB of heap successfully allocated.
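
IBM heap dumps are typically written to the profile directory alongside the Java cores and are usually named heapdump.*.phd; assuming the same default installation path as above, they can be located with:

find /usr/tivoli/tip/profiles/TIPProfile -name "heapdump.*.phd"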

If no heap dumps are being generated, you'll need to set the following environment variables (refer to your OS documentation on how to set them):

IBM_HEAP_DUMP=true
IBM_HEAPDUMP=true
IBM_HEAPDUMP_OUTOFMEMORY=true
IBM_JAVACORE_OUTOFMEMORY=true
IBM_HEAPDUMPDIR=<directory>

The next time an OOM occurs, heap dumps will be generated in the directory you specified. You do not need to restart TIP for this to take effect.
 

2.3 Java stack trace

 
Yet another handy way to tell whether the culprit is a lack of native memory is to look at WebGUI's error logs (either ncw.*.trace for WebGUI 7.3.1 and above, or webtop.log for WebGUI 7.3.0 and below). Look for the line that contains the stack trace of the OutOfMemoryException. Here are two examples:

Allocated 1953546760 bytes of native memory before running out
Exception in thread "main" java.lang.OutOfMemoryError
   at sun.misc.Unsafe.allocateMemory(Native Method)
   at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:99)
   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
   at com.ibm.jtc.demos.DirectByteBufferUnderNativeStarvation.main(
DirectByteBufferUnderNativeStarvation.java:29)

Allocated 1953546736 bytes of native memory before running out
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
   at java.lang.Thread.start0(Native Method)
   at java.lang.Thread.start(Thread.java:574)
   at com.ibm.jtc.demos.StartingAThreadUnderNativeStarvation.main(
StartingAThreadUnderNativeStarvation.java:22)

Exceptions occurring during allocation of memory in a DirectByteBuffer and failures to create a new native thread (similar to the error in the Java core file above) are typical examples of running out of native memory.
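
If you're not sure which log file contains the exception, a recursive search of the log directory will find it; this assumes a UNIX installation with logs under $TIPHOME/profiles/TIPProfile/logs:

grep -r "java.lang.OutOfMemoryError" $TIPHOME/profiles/TIPProfile/logs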

If you have determined that you need more native memory, you should add more physical memory and/or reduce the number of applications running concurrently on the server. Hint: it's generally a bad idea to run WebGUI and the ObjectServer on the same physical machine. Keep in mind that if you plan to use more than 4GB of memory, you'll need a 64-bit OS. Once you've added more memory and increased the Java heap available to WebGUI, continue monitoring to see if the issue still occurs.
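
To check how much physical memory the server has and how much of it is in use, Linux provides the free command (other platforms have equivalent tools):

# Show total, used and free physical memory and swap, in megabytes.
free -m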
 

3. Help! There's plenty of native memory, I've increased Java heap, but I still get OOMs

 
At this point, I advise upgrading your version of WebGUI or Webtop and/or installing the latest fix packs; recent releases contain critical fixes for bugs related to memory consumption. If you still have issues after upgrading, then contact IBM support with the following information:
  • WebGUI configuration information:
    • ncwDataSourceDefinitions.xml
    • wimconfig.xml
    • Output from 'Troubleshooting and Support > System Information for Tivoli Netcool/OMNIbus Web GUI'
  • WebGUI server logs in $TIPHOME/profiles/TIPProfile/logs (a packaging example follows this list)
  • Java heap dumps (or Eclipse Memory Analyzer reports if you've previously analyzed a heap dump)
  • Java core files
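
To package the server logs for support, something like the following can be used on UNIX, assuming the logs directory listed above:

tar czf webgui_logs.tar.gz $TIPHOME/profiles/TIPProfile/logs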