
5.2.2 server takes over all CPU - becomes unresponsive

Posted: Wed Mar 30, 2016 11:33 pm
by dsudar
Hi,
I upgraded my Ubuntu 14.04 machine from 5.1.4 to 5.2.2 on Monday, and since then large imports of ScanR plates (384 wells, 4 channels, 4 fields) cause the OMERO server to take over all the CPU on the machine, which makes the machine unresponsive for a long time. In fact, my OMERO.web session crashed because of it (see QA case 17124). This appears to happen immediately after the image import has finished; it's probably creating thumbnails or something like that.

Here's a screen shot of htop while this is happening:
[Attachment: htop_OMERO.GIF]


In master.err, I get a number of exceptions reporting "GC overhead limit exceeded":
Exception in thread "bitronix-task-scheduler" Exception in thread "OMERO.scheduler_QuartzSchedulerThread" java.lang.OutOfMemoryError: GC overhead limit exceeded
at bitronix.tm.timer.TaskScheduler.executeElapsedTasks(TaskScheduler.java:258)
at bitronix.tm.timer.TaskScheduler.run(TaskScheduler.java:244)
Exception in thread "Blitz-0-Ice.ThreadPool.Server-8" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "Blitz-0-Ice.ThreadPool.Client-6" Exception in thread "Timer-0" java.lang.OutOfMemoryError: GC overhead limit exceeded

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "OMERO.scheduler_QuartzSchedulerThread"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "pool-5-thread-7"
Exception in thread "pool-5-thread-7" java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "Blitz-0-Ice.ThreadPool.Server-16"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Blitz-0-Ice.ThreadPool.Server-16"
Exception in thread "Blitz-0-Ice.ThreadPool.Server-19" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "Ice.Timer" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "perf4j-async-stats-appender-sink-CoalescingStatistics" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "Blitz-0-Ice.ThreadPool.Client-8" java.lang.OutOfMemoryError: GC overhead limit exceeded


None of this happened under 5.1.4, and apart from the 5.2.2 upgrade nothing else changed on the machine.

Thanks,
- Damir

Re: 5.2.2 server takes over all CPU - becomes unresponsive

Posted: Thu Mar 31, 2016 4:43 am
by dsudar
Quick follow-up: I tried importing the exact same data set into another OMERO 5.2.2 instance on a CentOS 6.7 system (similar in power to the Ubuntu machine). It imported okay, but I did notice that immediately after the command-line import finished, CPU usage spiked extremely high several times before coming back down to normal low levels within 1 minute.
I have that data set tarred and bzip2-compressed (5.5 GB) and ready for your testing, in case that helps.
Thanks,
- Damir

Re: 5.2.2 server takes over all CPU - becomes unresponsive

Posted: Thu Mar 31, 2016 6:18 am
by atarkowska
Hi Damir,

Could you try increasing the heap size? For more details, please see https://www.openmicroscopy.org/site/sup ... mance.html

Ola

Re: 5.2.2 server takes over all CPU - becomes unresponsive

Posted: Thu Mar 31, 2016 6:50 am
by dsudar
Hi Ola,
Thanks for the quick response.

I haven't tried a larger heap size yet. I had been using the percent method, which results in:
omero_user@omero:~$ omero admin jvmcfg
JVM Settings:
============
blitz=-Xmx5053m -XX:MaxPermSize=1g -XX:+IgnoreUnrecognizedVMOptions
indexer=-Xmx3368m -XX:MaxPermSize=1g -XX:+IgnoreUnrecognizedVMOptions
pixeldata=-Xmx5053m -XX:MaxPermSize=1g -XX:+IgnoreUnrecognizedVMOptions
repository=-Xmx3368m -XX:MaxPermSize=1g -XX:+IgnoreUnrecognizedVMOptions


The server has 32 GB of physical memory (more memory is on back-order). What settings would you suggest I try?

And do you think that allocating more memory will mitigate the extreme CPU usage issue? I had the same settings with 5.1.4 and this issue never occurred.

Thanks,
- Damir

Re: 5.2.2 server takes over all CPU - becomes unresponsive

Posted: Thu Mar 31, 2016 7:12 pm
by dsudar
Hi Ola,

On my Ubuntu 14.04 server I increased the allocations for all 4 components and re-ran the import.
The output of omero admin jvmcfg is now:
omero_user@omero:~$ omero admin jvmcfg
JVM Settings:
============
blitz=-Xmx10106m -XX:MaxPermSize=1g -XX:+IgnoreUnrecognizedVMOptions # Settings({'percent': '30'})
indexer=-Xmx5053m -XX:MaxPermSize=1g -XX:+IgnoreUnrecognizedVMOptions # Settings({'percent': '15'})
pixeldata=-Xmx10106m -XX:MaxPermSize=1g -XX:+IgnoreUnrecognizedVMOptions # Settings({'percent': '30'})
repository=-Xmx5053m -XX:MaxPermSize=1g -XX:+IgnoreUnrecognizedVMOptions # Settings({'percent': '15'})
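The numbers in these two outputs are consistent with a simple percent-of-RAM rule, -Xmx = total memory * percent / 100. The Python sketch below reproduces them as a back-of-the-envelope check; it is not OMERO's own code. Assumptions: the total of 33687 MB is hypothetical, back-calculated from blitz=-Xmx5053m at 15%; the earlier 15/10/15/10 percentages are inferred because they reproduce the first output; and flooring to whole MB is a guess at the rounding.

```python
# Back-of-the-envelope check of the "percent" heap strategy shown by
# `omero admin jvmcfg`: -Xmx = total memory * percent / 100.
# ASSUMPTIONS (not OMERO's code): TOTAL_MB is back-calculated from the
# reported values, and flooring to whole MB is a guess at the rounding.

TOTAL_MB = 33687  # hypothetical: inferred from blitz=-Xmx5053m at 15%

# "After" percentages are printed in the second jvmcfg output; the "before"
# ones are inferred because they reproduce the first output.
BEFORE = {"blitz": 15, "indexer": 10, "pixeldata": 15, "repository": 10}
AFTER = {"blitz": 30, "indexer": 15, "pixeldata": 30, "repository": 15}


def heap_mb(total_mb: int, percent: int) -> int:
    """Maximum heap in MB for one service under a percent-of-RAM rule."""
    return total_mb * percent // 100


def jvm_settings(total_mb: int, percents: dict) -> dict:
    """Map each service name to an -Xmx flag such as '-Xmx5053m'."""
    return {name: f"-Xmx{heap_mb(total_mb, p)}m" for name, p in percents.items()}


def combined_mb(total_mb: int, percents: dict) -> int:
    """Sum of the maximum heaps of all services, in MB."""
    return sum(heap_mb(total_mb, p) for p in percents.values())


if __name__ == "__main__":
    for label, percents in (("before", BEFORE), ("after", AFTER)):
        print(label, jvm_settings(TOTAL_MB, percents),
              f"combined max heap: {combined_mb(TOTAL_MB, percents)} MB")
```

Doubling a service's percentage doubles its -Xmx, which matches the two outputs. Under these assumptions the combined maximum heap rises from about 16,800 MB to about 30,300 MB out of roughly 33,700 MB; -Xmx is a ceiling each heap may grow to, not memory taken up front.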


As before, the import itself completes, and the extreme CPU usage spikes last for about 2 minutes before quieting down to normal levels. So it does appear that allocating more memory helps. But I'm still wondering: what is causing those extreme CPU usage spikes immediately after the import completes? Is it thumbnail creation or something similar?

Thanks,
- Damir

Re: 5.2.2 server takes over all CPU - becomes unresponsive

Posted: Fri Apr 01, 2016 5:38 am
by atarkowska
Hi Damir,

Could you confirm whether you tried exactly the same import on 5.1?

Could you try
bin/omero import ... --debug
(see https://www.openmicroscopy.org/site/sup ... ort--debug) or
$ importer-cli ... --debug ALL
and pipe the output to a file? Please also send us the server logs so we can tell what is happening.

Did you try turning off thumbnail generation on import with
--skip thumbnails
(see https://www.openmicroscopy.org/site/sup ... port--skip)? It would be useful to see whether it makes any difference to the CPU usage.
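To compare runs with and without thumbnail generation, each import can be scripted to write its own debug log. This is a minimal Python sketch, assuming the flags exactly as quoted in this thread and that it is run from the OMERO.server directory; build_import_cmd and run_logged are hypothetical helper names. The script only wraps the CLI and does not change what the import does.

```python
import subprocess
from pathlib import Path

# Hypothetical helpers (names are illustrative, not part of OMERO): run one
# CLI import with debug logging and capture all output in a log file.
# The flags (--debug, --skip thumbnails) are the ones quoted in this thread;
# "bin/omero" assumes you are in the OMERO.server directory.


def build_import_cmd(target, debug=True, skip_thumbnails=False, omero="bin/omero"):
    """Return the argv list for one `omero import` run.

    The target goes first and the flags after it, mirroring the order used
    in the thread ("bin/omero import ... --debug").
    """
    cmd = [omero, "import", str(target)]
    if skip_thumbnails:
        cmd += ["--skip", "thumbnails"]
    if debug:
        cmd.append("--debug")
    return cmd


def run_logged(cmd, log_path):
    """Run cmd with stdout and stderr both written to log_path.

    Returns the process exit code. The file is overwritten on each run.
    """
    with open(Path(log_path), "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

For example, run_logged(build_import_cmd("/path/to/plate"), "import_full.log") followed by run_logged(build_import_cmd("/path/to/plate", skip_thumbnails=True), "import_skip.log") gives two logs to compare while watching htop; the plate path and log names are placeholders.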

Ola