Page 1 of 1

CLI Ice.ConnectionLostException error

PostPosted: Thu Aug 16, 2012 9:00 am
by ingvar
Hello,

When doing command line import I sometimes get an Ice.ConnectionLostException. It looks like this exception is thrown whenever the negotiation with Ice on how to communicate in OMEROMetadataStoreClient takes more than 6 seconds.

A little bit about the setup. Virtual machines (VMware running RHEL6.2), not much local disk, so using NFS. I am using fastcgi-tcp. I have one test server and two production servers. The code is modified OMERO 4.3.3

On my test server the negotiation step takes 0.5-2 s, and never encountered the problem there. On the production servers, the negotiation step takes from 4 s upwards, sometimes as much as 45 s. As far as I can tell the ConnectionLostException is thrown whenever this step takes more than 6 s. I am not aware of any differences between the test server and the production servers apart from different levels of load. Our systems group access an image every 3 seconds on the production servers, which gives .

Looking at the Blitz-0.log I do not see anything useful.

A couple of questions:
Anyone seen these problem before?
Can I just increase a time limit somewhere?
Is this NFS related?

Example output first from a failed and then succesful case.

[pdb_em@ves-ebi-81 scripts]$ ./importToOmeroProd.bash
/nfs/web-hx/pdbe/em/data/incoming /nfs/web-hx/pdbe/em/scripts
emd_1051.map.gz emd_5448.map.gz
got a map: /nfs/web-hx/pdbe/em/data/incoming/emd_1051.map.gz
Previous session expired for pdb_em on ves-ebi-81.ebi.ac.uk:4064
Created session 3e5be84b-6370-4750-bf6b-c9f06f7feb2f (pdb_em@ves-ebi-81.ebi.ac.uk:4064). Idle timeout: 10.0 min. Current group: public
WARNING:omero.client:Ignoring error in client.__del__:<class 'Glacier2.SessionNotExistException'>
2012-08-15 17:36:06,413 0 [ main] INFO ome.formats.importer.ImportConfig - OMERO Version: 4.3.3-DEV-ice34
2012-08-15 17:36:06,454 41 [ main] INFO ome.formats.importer.ImportConfig - Bioformats version: 4.4-DEV revision: f91d951 date: 12 January 2012
2012-08-15 17:36:06,595 182 [ main] INFO formats.importer.cli.CommandLineImporter - Log levels -- Bio-Formats: ERROR OMERO.importer: INFO
2012-08-15 17:36:07,811 1398 [ main] INFO ome.formats.importer.ImportCandidates - Depth: 4 Metadata Level: MINIMUM
2012-08-15 17:36:09,452 3039 [ main] INFO ome.formats.importer.ImportCandidates - 1 file(s) parsed into 1 group(s) with 1 call(s) to setId in 1635ms. (1640ms total) [0 unknowns]
2012-08-15 17:36:19,579 13166 [ main] ERROR ome.system.UpgradeCheck - Error reading from url: http://upgrade.openmicroscopy.org.uk?ve ... ystems+Inc. "connect timed out"
2012-08-15 17:36:19,962 13549 [ main] INFO ome.formats.OMEROMetadataStoreClient - Attempting initial SSL connection to ves-ebi-81.ebi.ac.uk:4064
2012-08-15 17:36:35,685 29272 <--- 16 s [ main] ERROR formats.importer.cli.CommandLineImporter - Error during import process.
Ice.ConnectionLostException
error = 0
at IceInternal.ConnectRequestHandler.getConnection(ConnectRequestHandler.java:240)
at IceInternal.ConnectRequestHandler.sendRequest(ConnectRequestHandler.java:138)
at IceInternal.Outgoing.invoke(Outgoing.java:66)
at Ice._ObjectDelM.ice_isA(_ObjectDelM.java:30)
at Ice.ObjectPrxHelperBase.ice_isA(ObjectPrxHelperBase.java:111)
at Ice.ObjectPrxHelperBase.ice_isA(ObjectPrxHelperBase.java:77)
at Glacier2.RouterPrxHelper.checkedCast(RouterPrxHelper.java:2268)
at omero.client.getRouter(client.java:695)
at omero.client.createSession(client.java:629)
at omero.client.joinSession(client.java:559)
at ome.formats.OMEROMetadataStoreClient.initialize(OMEROMetadataStoreClient.java:614)
at ome.formats.importer.ImportConfig.createStore(ImportConfig.java:287)
at ome.formats.importer.cli.CommandLineImporter.<init>(CommandLineImporter.java:96)
at ome.formats.importer.cli.CommandLineImporter.main(CommandLineImporter.java:394)


Successful case:
/nfs/web-hx/pdbe/em/data/incoming ~
emd_2153.map.gz emd_2154.map.gz emd_2161.map.gz emd_5448.map.gz
got a map: /nfs/web-hx/pdbe/em/data/incoming/emd_2153.map.gz
WARNING:omero.client:Ignoring error in client.__del__:<class
'Glacier2.SessionNotExistException'>
WARNING:omero.client:Ignoring error in client.__del__:<class
'Glacier2.SessionNotExistException'>
Created session d2748c47-79e7-41f0-86b6-4d2e38f37106
(pdb_em@ves-ebi-80.ebi.ac.uk:4064). Idle timeout: 10.0 min. Current group: public
2012-08-13 12:00:35,846 0 [ main] INFO
ome.formats.importer.ImportConfig - OMERO Version: 4.3.3-DEV-ice34
2012-08-13 12:00:35,871 25 [ main] INFO
ome.formats.importer.ImportConfig - Bioformats version: 4.4-DEV revision: f91d951
date: 12 January 2012
2012-08-13 12:00:35,985 139 [ main] INFO
formats.importer.cli.CommandLineImporter - Log levels -- Bio-Formats: ERROR
OMERO.importer: INFO
2012-08-13 12:00:36,872 1026 [ main] INFO
ome.formats.importer.ImportCandidates - Depth: 4 Metadata Level: MINIMUM
2012-08-13 12:00:38,502 2656 [ main] INFO
ome.formats.importer.ImportCandidates - 1 file(s) parsed into 1 group(s) with 1
call(s) to setId in 1623ms. (1629ms total) [0 unknowns]
2012-08-13 12:00:48,719 12873 [ main] ERROR
ome.system.UpgradeCheck - Error reading from url:
http://upgrade.openmicroscopy.org.uk?ve ... ystems+Inc.
"connect timed out"
2012-08-13 12:00:49,139 13293 [ main] INFO
ome.formats.OMEROMetadataStoreClient - Attempting initial SSL connection to
ves-ebi-80.ebi.ac.uk:4064
2012-08-13 12:00:50,776 14930 <---- 1.7 seconds later so OK [ main] INFO
ome.formats.OMEROMetadataStoreClient - Insecure connection requested, falling back
2012-08-13 12:00:51,913 16067 [ main] INFO
ome.formats.OMEROMetadataStoreClient - Server: 4.3.3
2012-08-13 12:00:51,913 16067 [ main] INFO
ome.formats.OMEROMetadataStoreClient - Client: Beta4.3.3-DEV-ice34
2012-08-13 12:00:51,913 16067 [ main] INFO
ome.formats.OMEROMetadataStoreClient - Java Version: 1.6.0_29
2012-08-13 12:00:51,913 16067 [ main] INFO
ome.formats.OMEROMetadataStoreClient - OS Name: Linux
2012-08-13 12:00:51,913 16067 [ main] INFO
ome.formats.OMEROMetadataStoreClient - OS Arch: amd64
2012-08-13 12:00:51,913 16067 [ main] INFO
ome.formats.OMEROMetadataStoreClient - OS Version: 2.6.32-220.13.1.el6.x86_64
2012-08-13 12:00:52,346 16500 [ main] INFO
ome.formats.importer.ImportLibrary - File format: Medical Research Council
2012-08-13 12:00:52,346 16500 [ main] INFO
ome.formats.importer.ImportLibrary - Base reader: loci.formats.in.MRCReader
2012-08-13 12:00:52,346 16500 [ main] INFO
ome.formats.importer.ImportLibrary - Metadata only import? false
2012-08-13 12:00:52,346 16500 [ main] INFO
ome.formats.importer.ImportLibrary - Archiving enabled? false
2012-08-13 12:00:52,346 16500 [ main] INFO
ome.formats.importer.ImportLibrary - Container metadata only import? false
2012-08-13 12:00:52,346 16500 [ main] INFO
ome.formats.importer.ImportLibrary - Reader is not of HCS domain, use metafile:
true
2012-08-13 12:00:52,488 16642 [ main] WARN
ome.formats.model.ReferenceProcessor - Unknown target class for LSID: Image:0
2012-08-13 12:01:03,653 27807 [ main] INFO
ormats.importer.cli.LoggingImportMonitor - IMPORT_DONE
Imported pixels:
251
2012-08-13 12:01:03,654 27808 [ main] INFO
ome.formats.importer.cli.ErrorHandler - Number of errors: 0

Thankful for for any pointers,
Ingvar

Re: CLI Ice.ConnectionLostException error

PostPosted: Thu Aug 16, 2012 9:25 am
by wmoore
Hi Ingvar,

Could it be omero.sessions.timeout (see http://www.openmicroscopy.org/site/supp ... figuration)?

Although the default is 10 minutes, so I'm not sure that session timeout is your issue (unless you've got this set too short). NB: it's in millisecs (not secs)!

It's certainly the first thing to check anyway.

Code: Select all
$ bin/omero config get           # will show all settings


Let us know if that helps,

Will.

Re: CLI Ice.ConnectionLostException error

PostPosted: Thu Aug 16, 2012 10:06 am
by ingvar
Hi Will,

The session timeout is 10 min (600000 ms). The exception appears to be related to some other timeout.
Config settings below:

[pdb_em@ves-ebi-81 dist]$ bin/omero config get
omero.data.dir=/nfs/web-hx/pdbe/em/repository/ves-ebi-81/OMERO
omero.db.name=EMDBslice
omero.db.pass=xxxxxxxx
omero.db.user=omero
omero.sessions.timeout=600000
omero.web.admins=["Ingvar","ingvar@ebi.ac.uk"]
omero.web.application_host=http://ves-ebi-81.ebi.ac.uk:80/
omero.web.application_server=fastcgi-tcp
omero.web.debug=False
omero.web.email_subject_prefix=OMERO Slice Viewer
omero.web.emdb_public_password=xxxxxx
omero.web.emdb_public_username=pdb_em
omero.web.static_url=/emdb/omero/static/

/Ingvar

Re: CLI Ice.ConnectionLostException error

PostPosted: Fri Aug 17, 2012 8:36 am
by cxallan
That sounds really evil Ingvar. I can think of a few things that might be the problem but we'll probably need all your logs to really have a proper look. Can you ZIP them all up and put them somewhere for us to grab them?

It's totally likely that you're having excessive locking issues on your OMERO binary repository or temporary directories which might be easily solved by moving the FullText and .omero directories off of NFS and on to local storage. They're trivial in size compared to your actual image data I'm sure.

Re: CLI Ice.ConnectionLostException error

PostPosted: Fri Aug 17, 2012 9:48 am
by ingvar
Hi Chris,

I found one thing that looks suspicious. /tmp on my production servers are full of old sessions, about 500000, so that doing an 'ls' was quite slow. The test server on the only hand only had three sessions.
Is there a way to control the lifespan/number of sessions?

With the .omero directories, do you mean the ~omero/tmp/omero_$user/$pid hierarchy. I have moved them to local disk, but I could try to move the FullText too.

I will restart one of the servers, try to reproduce the problem, and zip up the logs from the restart, in order to keep the logs manageable in size.

Cheers,
Ingvar

Re: CLI Ice.ConnectionLostException error

PostPosted: Fri Aug 17, 2012 1:41 pm
by ingvar
Hi again Chris,

I emailed a zipped up log to you.

Just checking, there is no config option for moving the repository sub directories. It is all done by soft links?

Clearing out the /tmp directory had no noticable effect.

http://trac.openmicroscopy.org.uk/ome/wiki/ScalingOmero gave me this piece of advice about ConnectionLost exceptions:

In OmeroBlitz, too many (500+ on the default configuration) simultaneous invocations will result in ConnectionLost exceptions. We are currently working on ways to extend the number of single invocations on one server, but a simpler solution is to start another OmeroBlitz server.

I have a minimum of 20 requests/min, but I would not expect them to persist long enough to reach 500.

There are four time limits set to 5000 msec in etc/omero.properties, any chance that one of them are reached. I am not suggesting that increasing the limit is the right thing to do, but it may be a way to debug what is going on.
omero.threads.idle_timeout=5000
omero.threads.cancel_timeout=5000
omero.throttling.method_time.warn=5000
omero.grid.registry_timeout=5000

Cheers,
Ingvar

Re: CLI Ice.ConnectionLostException error

PostPosted: Mon Aug 20, 2012 8:06 am
by cxallan
Hi Ingvar,

Based on your logs I'd start with doing the following:

  1. Moving your "dist" directory off of NFS
  2. Moving your omero.data.dir/.omero directory off of NFS

ingvar wrote:Just checking, there is no config option for moving the repository sub directories. It is all done by soft links?


Symlink them off, yes.

ingvar wrote:Clearing out the /tmp directory had no noticable effect.


It would be good to have a quick listing of what's there just to see what might be going on.

Just lastly could you push the branch that you're working from somewhere so that I can see what you're working from? We've been improving many things with respect to public access over the past weeks, including various bug fixes that went in to 4.4.0 and 4.4.1.

Re: CLI Ice.ConnectionLostException error

PostPosted: Mon Aug 20, 2012 10:37 am
by ingvar
Hi Chris,

I start to move stuff off NFS as soon as our systems guys give me write access to something more sensible than /tmp on the virtual servers.
This to include:
omero/dist
omero.data.dir/.omero
omero.data.dir/FullText

I will make another try to move forward to 4.4.0/4.4.1, I messed up badly by not fully following Will's instructions last time. Meanwhile my stuff is on github at:

git@github.com:ingvarlagerstedt/openmicroscopy.git, branch sliceviewer_tmp
this based on Will's feature/6736_emdb
Let me know if I need to do anything for you to have access to my stuff.

/tmp mainly contains files with names such as:
sessionid3d79e9b4fa021ddd3af3cd9dba9f836e that each has a 284 byte long string
The files were up to 10 days old.

Thanks,
Ingvar

Re: CLI Ice.ConnectionLostException error

PostPosted: Tue Aug 21, 2012 4:29 pm
by ingvar
Hi Chris,

I have now moved the distribution, and the .omero and FullText directories to local disk. The CLI works a lot better now, but I still see the occasional ConnectionLostException, say in 2-3 % of the imports, rather than the 60-70 % before. The critical step tends to take 2-10 s now, rather than 4-50 s. So, significant improvement, but still not perfect.
I notice that the flooding of /tmp with sessionid files was addressed in http://trac.openmicroscopy.org.uk/ome/ticket/7272, which must have happened after I branched off. I guess the next step is to try to work myself up to version 4.4.1.

Many Thanks,
Ingvar