
Heavy "omero export" load crashes OMERO

Posted: Sat Jan 26, 2019 12:34 am
by dsudar
Hi team,

I have a user who wants to retrieve (in OME-TIFF format) thousands of images (stored inside nd2-format files) from my OMERO 5.4.8 server using the CLI (omero export) for downstream analysis. These are 1600 x 1600, 16-bit, 3- or 4-channel images. To speed up the process he issues many (up to 64) parallel requests, and that crashes the server (or makes it unresponsive). I have uploaded a Blitz log to the QA site (#27268). The highly parallel queries started around 16:30 on 23 January and you'll see many errors after that time, and again in the early morning of 24 January.

Besides telling the user to limit the number of parallel queries, does anything point to a configuration change I can make, or to an issue that can or needs to be resolved? The machine running the server has plenty of memory and doesn't show particularly high load in (h)top while these queries are running.
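For reference, the kind of thing he is running looks roughly like this (a simplified sketch only; the actual script, ID list and output naming differ):

Code:
# run up to 64 exports in parallel from a list of OMERO image IDs
cat image_ids.txt | xargs -P 64 -I{} omero export --file {}.ome.tiff Image:{}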

Thanks,
- Damir

Re: Heavy "omero export" load crashes OMERO

Posted: Mon Jan 28, 2019 11:00 am
by mtbc
Dear Damir,

My guess is that the export threads are running at USER priority and keeping Blitz's entire thread pool busy: the server becomes unresponsive because new requests are blocked waiting for a thread from the pool to become available.

The server has configuration settings omero.threads.max_threads and omero.threads.background_threads. It would be great if your user could limit parallel requests to comfortably below the level of max_threads.
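You can check what these are currently set to on the server, for example:

Code:
omero config get omero.threads.max_threads
omero config get omero.threads.background_threads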

Sometimes omero.db.poolsize is also a concern: you may want to set that generously if you start to see database connection issues.

An alternative could be simply to provide the user with a copy of the original nd2 files from the managed repository and have them run bfconvert locally with whatever parallelism they like.
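For example (just a sketch; file names and the degree of parallelism are up to them):

Code:
# convert one file locally with Bio-Formats
bfconvert input.nd2 output.ome.tiff
# or with whatever parallelism their workstation can handle
ls *.nd2 | xargs -P 8 -I{} bfconvert {} {}.ome.tiff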

Cheers,
Mark

Re: Heavy "omero export" load crashes OMERO

Posted: Mon Jan 28, 2019 5:50 pm
by dsudar
Hi Mark,

Thanks for the quick response. Indeed, max_threads was at the default of 50 while the user was trying to run 64 requests at the same time, so I'll at least double max_threads (to 100) and background_threads (to 20).

I'm a bit confused about db.poolsize: the note on https://docs.openmicroscopy.org/omero/5 ... onfig.html says the default is 10 (which I hadn't changed) but that it should be larger than max_threads? I'll increase that value to 50 and also raise the postgresql.conf max_connections value from the original 100 to 200. Are there good rules of thumb for what these values should be?
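In shell terms, roughly this (values as per my plan above):

Code:
omero config set omero.threads.max_threads 100
omero config set omero.threads.background_threads 20
omero config set omero.db.poolsize 50
omero admin restart
# and in postgresql.conf (then restart PostgreSQL):
# max_connections = 200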

And yes, I've asked the user to limit his parallel requests.

Your suggestion to work directly with the nd2's makes sense except we have our entire workflow built around managing the images by their OMERO image_id and all metadata and analysis results are keyed on that. So that would be a huge amount of work to redo.

Cheers,
- Damir

Re: Heavy "omero export" load crashes OMERO

Posted: Tue Jan 29, 2019 10:00 am
by jmoore
Hi Damir,

For clarity, the exception:

Code:
2019-01-24 11:03:08,689 INFO  [        ome.services.util.ServiceHandler] (.Server-16)  Excp:   ome.conditions.DatabaseBusyException: Cannot acquire connection


in your log points to the DB pool as the first bottleneck. What's happening is that long-running transactions have blocked all database access, at which point OMERO can't do any work. I'd typically expect OMERO to eventually recover from the DoS, but it can take some time.

Increasing poolsize (see below) is likely to help in this particular case, since the user is trying 64 exports, but higher request counts will show a similar issue. On the development side, exports need to be moved to a background thread that can be more tightly controlled (a limit of 5 or 10, say). One choice we've made ourselves for the IDR, on the sysadmin side, is to limit the absolute time of transactions to prevent DoS'ing; you can see this configuration in idr/deployment. Do note, however, that this could impact other users' calls as well.
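Purely as an illustration of the idea (the actual idr/deployment settings may differ, and the values here are arbitrary), PostgreSQL itself can enforce such limits:

Code:
# postgresql.conf -- illustrative values only
statement_timeout = '10min'                       # abort any statement running longer than this
idle_in_transaction_session_timeout = '10min'     # close sessions left idle inside a transaction (PostgreSQL 9.6+)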

dsudar wrote:Thanks for the quick response. Indeed the max_threads was at the default 50 while the user was trying to run 64 requests at the same time. So I'll at least double max_threads (to 100) and background_threads (to 20).


With the addition of more backgrounded activities (including export), we'll provide more advice on these settings. In general, I wouldn't expect them to make much difference in this situation, not least because of the additional Ice settings:

Code:
    <properties id="MultiThreaded">
      <property name="Ice.ThreadPool.Client.Size" value="2"/>
      <property name="Ice.ThreadPool.Client.SizeMax" value="50"/>
      <property name="Ice.ThreadPool.Server.Size" value="10"/>
      <property name="Ice.ThreadPool.Server.SizeMax" value="100"/>
    </properties>


dsudar wrote:I'm a bit confused about db.poolsize: the note on https://docs.openmicroscopy.org/omero/5 ... onfig.html says the default is 10 (which I hadn't changed) but that it should be larger than max_threads? I'll increase that value to 50 and also raise the postgresql.conf max_connections value from the original 100 to 200. Are there good rules of thumb for what these values should be?


Really, the higher the better. Each PostgreSQL connection requires resources, which is why these defaults are as low as they are. If your PostgreSQL server is happy running with 200 then that seems like a good value, and I'd imagine you can give 80%+ of those connections to OMERO via omero.db.poolsize as long as PostgreSQL is not serving any other production workloads.
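Purely as an illustration with the numbers you mention, and assuming nothing else significant is using that PostgreSQL instance:

Code:
# postgresql.conf: max_connections = 200
# give roughly 80% of those connections to OMERO:
omero config set omero.db.poolsize 160
omero admin restart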

Cheers,
~Josh

Re: Heavy "omero export" load crashes OMERO

Posted: Wed Jan 30, 2019 11:22 pm
by dsudar
Thanks Josh,

I upped some of those config values as you suggested, and we'll see how that works for us.

At this time I don't want to impose a transaction time limit, since we do run a fair number of queries and scripts that legitimately take a long time.

Moving exports to a background thread sounds like good practice for most general-purpose servers. This particular server is dedicated to our LINCS project, so I'm less concerned about a heavy export task tying up significant resources.

Thanks,
- Damir