Page 1 of 2

Lucene Merge Thread - Java memory error

PostPosted: Tue May 06, 2014 10:00 pm
by dsudar
Hi,

Since upgrading to 5.0.0 (and now also after upgrading to 5.0.1) on Ubuntu 12.04 with Ice 3.4, I get lots of errors in master.err about Lucene running out of memory such as:

----
omero@quid:~/OMERO.server/var/log$ more master.err
Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy
$MergeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException
(ConcurrentMergeScheduler.java:309)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(Conc
urrentMergeScheduler.java:286)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.
java:154)
------

and many of these:
--------
Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy
$MergeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException
(ConcurrentMergeScheduler.java:309)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(Conc
urrentMergeScheduler.java:286)
Caused by: java.lang.OutOfMemoryError: Java heap space
Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy
$MergeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException
(ConcurrentMergeScheduler.java:309)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(Conc
urrentMergeScheduler.java:286)
Caused by: java.lang.OutOfMemoryError: Java heap space
--------------

Is there indeed a place somewhere in Templates.xml where I should increase the Java memory? I already increased the Java memory allocations for BlitzTemplate and PixelDataTemplate per the manual. Should I increase the one for IndexerTemplate as well?

Thanks,
- Damir

Re: Lucene Merge Thread - Java memory error

PostPosted: Wed May 07, 2014 10:14 am
by jmoore
dsudar wrote:Since upgrading to 5.0.0 (and now also after upgrading to 5.0.1) on Ubuntu 12.04 with Ice 3.4, I get lots of errors in master.err about Lucene running out of memory such as:


...snip...

Odd. We had hoped we had made things better in 5.0. Had you increased the memory for the indexer in 4.4? How large is your /OMERO/FullText directory?

Is there indeed a place somewhere in Templates.xml where I should increase the Java memory? I already increased the Java memory allocations for BlitzTemplate and PixelDataTemplate per the manual. Should I increase the one for IndexerTemplate as well?


With the exception above, there's likely nothing to be done other than to also bump for IndexerTemplate as you did for Blitz and PixelData.

Regards,
~Josh.

Re: Lucene Merge Thread - Java memory error

PostPosted: Thu May 08, 2014 11:43 pm
by dsudar
Hi Josh,

Before upgrading to 5.0.0 I was running 4.4.10 and versions before that and did not increase the IndexerTemplate memory and did not get any such errors in the logs. The master.err was clear in 4.4.10 and the Indexer-0.log only has lots of "INFO" messages.

The /data/OMERO/FullText directory says:
omero@quid:/data/OMERO$ du -sk FullText/
141560 FullText/

Now with 5.0.1, the Indexer-0.log has lots of additional "WARN" messages about indexing 50 objects taking over 5000 ms:
2014-05-06 13:48:58,307 WARN [ ome.services.util.ServiceHandler] (2-thread-2) Method interface ome.services.util.Executor$Work.doWork invocation took 5979
2014-05-06 13:49:04,175 INFO [ ome.services.fulltext.FullTextIndexer] (2-thread-3) INDEXED 50 objects in 1 batch(es) [5844 ms.]
2014-05-06 13:49:04,177 WARN [ ome.services.util.ServiceHandler] (2-thread-3) Method interface ome.services.util.Executor$Work.doWork invocation took 5849


and then after the time taken increases and increases to over 60000 ms, and then a lot of the ERROR messages start similar to what is in master.err:

2014-05-06 14:42:08,101 INFO [ ome.services.fulltext.FullTextIndexer] (2-thread-2) INDEXED 50 objects in 1 batch(es) [60404 ms.]
2014-05-06 14:42:08,104 ERROR [ ome.services.util.ServiceHandler] (2-thread-2) Method interface ome.services.util.Executor$Work.doWork invocation took 60411
2014-05-06 14:43:15,266 INFO [ ome.services.fulltext.FullTextIndexer] (2-thread-5) INDEXED 50 objects in 1 batch(es) [66992 ms.]
2014-05-06 14:43:15,268 ERROR [ ome.services.util.ServiceHandler] (2-thread-5) Method interface ome.services.util.Executor$Work.doWork invocation took 67145
2014-05-06 14:44:05,553 ERROR [ ome.services.util.ServiceHandler] (2-thread-1) java.lang.Error: Wrapped Exception: (java.lang.OutOfMemoryError):
GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
2014-05-06 14:44:05,553 WARN [ ome.services.util.ServiceHandler] (2-thread-1) Unknown exception thrown.


I can upload the Index-0.log if that is useful.

I have now upped the IndexerTemplate memory to 1024 (from the 256 default) and after only a few minutes I again get the same ERROR messages in Indexer-0.log about the indexing of 50 objects taking very long. No out of memory exceptions yet but that will probably happen soon.

Thanks for any pointers, thoughts, or better: solutions ....
- Damir

Re: Lucene Merge Thread - Java memory error

PostPosted: Fri May 09, 2014 2:26 pm
by jmoore
Hi Damir,

this is certainly unfortunate, since we were expecting the 5.0 performance to be better. Do you have a sizable number of imports since upgrading to 5.0?

Perhaps do upload or send the Indexer-0.log and we can see if there's anything peculiar. If the log has rolled over, it would be good to have output from before and after bumping the heap size. Has the restart helped?

Cheers,
~Josh.

Re: Lucene Merge Thread - Java memory error

PostPosted: Fri May 09, 2014 5:25 pm
by dsudar
Hi Josh,

No, we had pretty light import usage since the upgrade to 5.0. One of the big projects that is using OMERO is current re-grouping to assess how to proceed.

I'll attach a Indexer-0.log. It starts with the first start of my new 5.0.1 install, then shows all the errors I reported before, then shows where I increased the heap memory around 16:30 yesterday and did "omero admin restart" and then shows no more actual out-of-memory crashes but there are lots of warnings, errors, etc.
Due to the large size of the Log, I snipped out a large chunk of repeated errors between 2:30 and 6:00 this morning.

Cheers,
- Damir

Re: Lucene Merge Thread - Java memory error

PostPosted: Mon May 12, 2014 7:04 pm
by jmoore
Hi Damir,

could you do me a favor and as an admin run:
Code: Select all
bin/omero hql --admin "select o from OriginalFile o where o.id = 52283"


Other than the elapsed time warnings, the most worrisome warnings in the log since you've increased the memory seem to be:
Code: Select all
2014-05-09 02:07:53,059 INFO  [        ome.services.fulltext.FileParser] (2-thread-1) File too large for indexing. Skipping: /data/OMERO/Files/Dir-052/52283
2014-05-09 02:07:53,059 INFO  [        ome.services.fulltext.FileParser] (2-thread-1) File too large for indexing. Skipping: /data/OMERO/Files/Dir-052/52283
2014-05-09 02:07:54,084 INFO  [        ome.services.fulltext.FileParser] (2-thread-1) File too large for indexing. Skipping: /data/OMERO/Files/Dir-052/52286
2014-05-09 02:07:54,085 INFO  [        ome.services.fulltext.FileParser] (2-thread-1) File too large for indexing. Skipping: /data/OMERO/Files/Dir-052/52286


So it would be good to know what those are. I assume otherwise the Indexer is still making progress?

Cheers,
~Josh.

Re: Lucene Merge Thread - Java memory error

PostPosted: Tue May 13, 2014 4:12 pm
by dsudar
Hi Josh,
Output from the hql call gives:
omero@quid:~/OMERO.server$ bin/omero hql --admin "select o from OriginalFile o where o.id = 52283"
Previously logged in to quid:4064 as joaquincorrea
Server: [quid]
Username: [joaquincorrea]dsudar
Password:
Created session 7c6189a2-5259-49a7-8af4-6851f463909f (dsudar@quid:4064). Idle timeout: 10.0 min. Current group: Microbes
# | Class | Id | mimetype | hash | name | details | mtime | path | hasher | size
---+---------------+-------+-----------------+------------------------------------------+--------------+-----------------+--------------------------+--------------+----------------------+-----------
0 | OriginalFileI | 52283 | application/pdf | 805cbb275fbe971887a1a5f80bde0dbe09845736 | cells502.pdf | owner=3;group=3 | Wed Apr 2 11:27:10 2014 | cells502.pdf | ChecksumAlgorithmI:6 | 426246394
(1 row)

To see details for object, enter line number.
To move ahead one page, enter 'p'
To re-display list, enter 'r'.
To quit, enter 'q' or just enter.


and when asking for details on object 0 I get:

id = 52283
annotationLinksCountPerOwner = {}
annotationLinksLoaded = False
annotationLinksSeq = []
atime = None
ctime = None
hash = '805cbb275fbe971887a1a5f80bde0dbe09845736'
hasher = 'ChecksumAlgorithmI:6'
loaded = True
mimetype = 'application/pdf'
mtime = 'Wed Apr 2 11:27:10 2014'
name = 'cells502.pdf'
path = 'cells502.pdf'
pixelsFileMapsCountPerOwner = {}
pixelsFileMapsLoaded = False
pixelsFileMapsSeq = []
size = 426246394
version = None



This file belongs to a colleague and I can ask him what that file might be.

According to the Indexer-0.log the indexer ran until May 10 at night and then conked out with the following final gasps:
2014-05-10 21:45:36,579 INFO [ ome.services.fulltext.FullTextIndexer] (2-thread-1) INDEXED 50 objects in 1 batch(es) [272453 ms.]
2014-05-10 21:45:36,581 ERROR [ ome.services.util.ServiceHandler] (2-thread-1) Method interface ome.services.util.Executor$Work.doWork invocation took 272459
2014-05-10 21:52:29,974 ERROR [ ome.services.util.ServiceHandler] (2-thread-1) Method interface ome.services.util.Executor$Work.doWork invocation took 40927
2014-05-10 21:53:46,259 ERROR [ ome.services.util.ServiceHandler] (2-thread-1) Method interface ome.services.util.Executor$Work.doWork invocation took 76094
2014-05-10 21:56:02,421 ERROR [ ome.services.util.ServiceHandler] (2-thread-2) Method interface ome.services.util.Executor$Work.doWork invocation took 131966
2014-05-10 21:56:39,373 ERROR [ ome.services.util.ServiceHandler] (2-thread-2) Method interface ome.services.util.Executor$Work.doWork invocation took 20091
2014-05-10 21:59:34,238 ERROR [ ome.services.util.ServiceHandler] (2-thread-4) java.lang.Error: Wrapped Exception: (java.lang.OutOfMemoryError):
Java heap space
java.lang.OutOfMemoryError: Java heap space
2014-05-10 21:59:34,238 WARN [ ome.services.util.ServiceHandler] (2-thread-4) Unknown exception thrown.

java.lang.OutOfMemoryError: Java heap space
2014-05-10 21:59:34,413 ERROR [ ome.services.util.ServiceHandler] (2-thread-4) Method interface ome.services.util.Executor$Work.doWork invocation took 836441
2014-05-10 21:59:34,414 ERROR [ ome.services.util.ServiceHandler] (2-thread-3) Method interface ome.services.util.Executor$Work.doWork invocation took 62136
2014-05-10 21:59:34,416 ERROR [ org.quartz.core.JobRunShell] (2-thread-4) Job DEFAULT.fullTextThreadRun threw an unhandled Exception:
org.springframework.scheduling.quartz.JobMethodInvocationFailedException: Invocation of method 'run' on target class [class ome.services.fulltext.FullTextThread] failed; nested exception is ome.conditions.InternalException: Wrapped Exception: (java.lang.OutOfMemoryError):
Java heap space
at org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(MethodInvokingJobDetailFactoryBean.java:273) ~[org.springframework.context.support.jar:3.0.1.RELEASE-A]
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86) ~[org.springframework.context.support.jar:3.0.1.RELEASE-A]
at org.quartz.core.JobRunShell.run(JobRunShell.java:203) ~[quartz.jar:1.5.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [na:1.7.0_11]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [na:1.7.0_11]
at java.lang.Thread.run(Thread.java:722) [na:1.7.0_11]
Caused by: ome.conditions.InternalException: Wrapped Exception: (java.lang.OutOfMemoryError):
Java heap space
2014-05-10 21:59:34,417 ERROR [ org.quartz.core.ErrorLogger] (2-thread-4) Job (DEFAULT.fullTextThreadRun threw an exception.
org.quartz.SchedulerException: Job threw an unhandled exception.
at org.quartz.core.JobRunShell.run(JobRunShell.java:214) ~[quartz.jar:1.5.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [na:1.7.0_11]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [na:1.7.0_11]
at java.lang.Thread.run(Thread.java:722) [na:1.7.0_11]
2014-05-10 21:59:36,038 INFO [ ome.security.basic.CurrentDetails] (2-thread-1) Adding log:INSERT,class ome.model.meta.Session,8963
2014-05-10 21:59:47,386 INFO [ ome.services.fulltext.FullTextIndexer] (2-thread-1) INDEXED 5 objects in 1 batch(es) [10650 ms.]
2014-05-10 21:59:47,388 WARN [ ome.services.util.ServiceHandler] (2-thread-1) Method interface ome.services.util.Executor$Work.doWork invocation took 10657
2014-05-10 22:04:10,299 ERROR [ ome.services.util.ServiceHandler] (2-thread-4) Method interface ome.services.util.Executor$Work.doWork invocation took 20625
2014-05-10 22:06:17,455 ERROR [ ome.services.util.ServiceHandler] (2-thread-1) Method interface ome.services.util.Executor$Work.doWork invocation took 26130
2014-05-10 22:09:47,261 ERROR [ ome.services.util.ServiceHandler] (2-thread-6) Method interface ome.services.util.Executor$Work.doWork invocation took 117315
2014-05-10 22:10:29,396 ERROR [ ome.services.util.ServiceHandler] (2-thread-6) Method interface ome.services.util.Executor$Work.doWork invocation took 42134
2014-05-10 22:11:01,325 INFO [ ome.services.fulltext.FullTextIndexer] (2-thread-5) INDEXED 50 objects in 1 batch(es) [673725 ms.]
2014-05-10 22:11:01,507 ERROR [ ome.services.util.ServiceHandler] (2-thread-5) Method interface ome.services.util.Executor$Work.doWork invocation took 673911


One thing I noticed is that I'm still running a fairly old Java (version 1.7.0_11). Any concerns with that?

Thanks,
- Damir

Re: Lucene Merge Thread - Java memory error

PostPosted: Tue May 13, 2014 4:28 pm
by dsudar
Hi Josh,

Forgot to mention but I'm running another instance of OMERO on a server at OHSU (Portland, OR) and that instance shows almost the identical behavior with regards to the Lucene indexer. That system is still at 5.0.0.
But that has a completely different set of users with a very different set of image data. The indexer on that system also seems to have decided to conk out around April 8.

Thanks,
- Damir

Re: Lucene Merge Thread - Java memory error

PostPosted: Wed May 14, 2014 6:44 pm
by jmoore
Hi Damir,

thanks for the info. 1.7.0_11 is the Java version you were using with OMERO4? Regarding the second system: what are the memory settings and which Java version is in use?

Cheers,
~Josh

Re: Lucene Merge Thread - Java memory error

PostPosted: Wed May 14, 2014 9:43 pm
by dsudar
Yes, on my server at LBNL I used to have OMERO4 and Java 1.7.0_11 and upgraded to OMERO5 in February and then to 5.0.1 last week.

On the OHSU server (currently at 5.0.0): Java 1.7.0_17 and still running with the default 256M assigned to the Indexer.

Thanks,
- Damir