large ROI import - Indexer very busy for days afterwards

Postby dsudar » Thu Oct 19, 2017 6:50 pm

Hi team,

Using the ROI import tool from Glencoe (part of the OMERO-CP Connector package), we routinely import ROIs from our analysis runs into OMERO. I use the feature that stores polygon ROIs created from the masks for both nuclei and cells. This works great and is extremely helpful. However, I noticed that after importing a batch of ROI masks (on the order of 5-10 million masks, from 8 plates with 5600 images each and 100-200 nuclei/cells per image), the Indexer process is extremely busy for several days afterwards, frequently fully loading the system:
[Attached screenshot: htop.png]


These heavy loads are brief (3-4 seconds) but happen every 10-20 seconds or so. I have already given the Indexer a lot of memory (-Xmx7200m), so I am unsure whether there's anything else I can do. I do not see anything concerning in the Indexer-0.log, just the hourly "heartbeat". While everything keeps working fine, I'm a bit worried that in production mode (where we will have 2-3 of those batches every week) the whole thing may come to a grinding halt.

Thanks for any insights or pointers,
- Damir
dsudar
 
Posts: 235
Joined: Mon May 14, 2012 8:43 pm
Location: Berkeley, CA, USA

Re: large ROI import - Indexer very busy for days afterwards

Postby cxallan » Mon Oct 23, 2017 12:51 pm

With that many objects being inserted in a short time frame I don't know that there is much you can do other than make a trade off by excluding certain objects from indexing. ROIs are already excluded from indexing by default.

Are you seeing particular objects getting indexed after one of these operations or perhaps a smaller batch?
cxallan
Site Admin
 
Posts: 509
Joined: Fri May 01, 2009 8:07 am

Re: large ROI import - Indexer very busy for days afterwards

Postby dsudar » Wed Nov 01, 2017 10:23 pm

Hi Chris,

I agree that it's a lot of objects being inserted but was hoping that this still falls within the design parameters. I just ran another smaller batch of 5600 3-channel 1600 by 1600 images in 8 wells (700 per well) which had somewhere between 200 and 300 cells per image and both the nuclear and cytoplasm masks were imported as a polygon ROI. The import itself took about 30 minutes so that was pretty fast. Then it was busy with indexing for 2.5-3 hours afterwards. How can I check what exactly (any of the ROI objects?) is actually being indexed and how do I make sure that those ROIs are not indexed?

The indexing issue is only one of the potential issues I noticed with the large numbers of ROIs generated in a workflow like mine. The other issue is a very substantial increase in the size of the postgres database: in the example I described, the postgres db's directory increased in size from 104GB to 108GB. Before I started importing this type of segmentation mask a few weeks ago, the typical dump of my db was around 1.5GB; now, after a few batches of ROI imports, the dump is over 10GB. Am I looking at a scaling problem here?

Cheers,
- Damir

Re: large ROI import - Indexer very busy for days afterwards

Postby mtbc » Fri Nov 03, 2017 9:30 am

Damir,

I'm not yet sure if this matters but, just in case, which version of OMERO is this on?

Cheers,
Mark
mtbc
Team Member
 
Posts: 282
Joined: Tue Oct 23, 2012 10:59 am
Location: Dundee, Scotland

Re: large ROI import - Indexer very busy for days afterwards

Postby cxallan » Fri Nov 03, 2017 10:37 am

It definitely falls within the design parameters. You would be able to find out more about what is indexed, and skipped, by adjusting the debug levels in the appropriate logback.xml file.

When you're inserting that number of rows you are definitely going to see the PostgreSQL data directory increase in size. Your dumps are also going to increase in size. Your database restore times will also grow.

With the structure of the shape and roi tables you will be storing a minimum of somewhere between 244 and 276 bytes per roi-shape combination, and more for the polygons. So before overhead, PostgreSQL page alignment padding, indexes, actual polygon data and several other factors, you would be storing at least ~521 MiB (about 547 MB) of data.

This assumes:

  • a conservative number of 2,240,000 ROIs; 5600 images * 200 cells * 2 (nuclear and cytoplasm)
  • 244 bytes per roi-shape combination

I would argue this is not a scaling problem but rather the realities of scaling in the first place. You will have to make a decision as to the utility of the mask decomposition into polygons against storage and computational investment for your use case.
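
The lower bound above can be reproduced with a few lines of arithmetic. This is only a sketch using the figures quoted in this post (5600 images, 200 cells, 2 shapes per cell, 244 bytes per roi-shape combination); the function and constant names are illustrative and not part of OMERO.

```python
# Back-of-the-envelope lower bound for ROI storage, using the figures from the post.
IMAGES = 5600
CELLS_PER_IMAGE = 200        # conservative end of the reported 200-300 range
SHAPES_PER_CELL = 2          # nuclear and cytoplasm
BYTES_PER_ROI_SHAPE = 244    # lower end of the 244-276 byte range


def min_storage_bytes(images, cells_per_image, shapes_per_cell, bytes_per_pair):
    """Return (number of ROIs, minimum bytes) before overhead or polygon data."""
    rois = images * cells_per_image * shapes_per_cell
    return rois, rois * bytes_per_pair


rois, total = min_storage_bytes(
    IMAGES, CELLS_PER_IMAGE, SHAPES_PER_CELL, BYTES_PER_ROI_SHAPE
)
print(f"{rois:,} ROIs")
print(f"{total / 2**20:.0f} MiB ({total / 10**6:.0f} MB)")
```

This gives 2,240,000 ROIs and roughly 521 MiB (547 MB), which is a floor only: polygon point data, indexes and page padding all come on top.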

At Glencoe we are actively working on additional functionality for masks with OMERO and hope to have a proposal for the community in the coming weeks that touches on:

  • storing compressed masks in OMERO
  • n-bit masks
  • visualization features and efficiency

Re: large ROI import - Indexer very busy for days afterwards

Postby mtbc » Fri Nov 03, 2017 11:31 am

Dear Damir,

I am afraid I am not familiar with Glencoe's products, but do you know what these imports look like to OMERO? My impression is that you are saying that bulk saves of ROIs with Polygon shapes onto existing images work okay, but that masks cause the indexer much trouble? These would be masks with a null pixels property, instead carrying their information via bytes? (The shape table in the database has corresponding columns for pixels and bytes.) I am wondering why the indexer becomes so exercised because I am failing to reproduce that locally for either polygons or masks. Are any of your server's omero.search.* configuration options changed from their defaults?

In amplifying Chris' point, be warned that the polygons are stored in a textual format in the shape table's points column in the database. This could certainly become large but would not ordinarily trouble the indexer as far as I can tell. I would be tempted to suggest that while the indexer is busy you try briefly including in your etc/logback-indexing.xml the line,
Code:
<logger name="ome.services.fulltext" level="DEBUG"/>

to see what it is indexing but you may end up with a rather large log.

Cheers,
Mark

Re: large ROI import - Indexer very busy for days afterwards

Postby dsudar » Fri Nov 03, 2017 8:00 pm

Hi Mark and Chris,

Answering Mark's earlier question: this behavior happened in 5.3.4 and now that I upgraded to 5.4.0, it's still there.

Thanks Chris for the explanation. Indeed, I was wondering how much space each polygon ROI occupies and things do add up as you laid out. The main reason I use polygons rather than masks is for the visualization: empty polygons do not obscure the underlying pixels while masks do. I'll do a test with masks instead to see the difference in database space.

I agree that what I'm doing is using the system at scale and noticing that scaling up uses more resources. I was mostly wondering whether I'm heading for problems with my approach.

Very interesting and happy to hear that you are working on new developments in this space. All 3 areas you mention would be relevant to this. If indeed storing masks rather than polygons can be much more efficient in space use especially per your points 1 and 2, then having a visualization that shows the masks as contours would be perfect.

To Mark's questions: Chris and Emil know best how the Glencoe Import_ROIs.py (and "backend" code) functionality exactly works but I believe it's simply using the Python API to create ROI objects and "attaching" those to images already in OMERO. And it does this very many times for very many ROIs. The heavy Indexer load happens both when these ROIs are masks or polygons. No real difference between those 2 options but I haven't yet carefully looked at a difference in database growth between those options. I'll do so today.

I'll also turn on DEBUG logging for the Indexer to see what it's doing.

Thanks,
- Damir

Re: large ROI import - Indexer very busy for days afterwards

Postby dsudar » Fri Nov 03, 2017 8:08 pm

Oh, and no, I did not change the omero.search.* config options.
- Damir

Re: large ROI import - Indexer very busy for days afterwards

Postby dsudar » Sun Nov 05, 2017 8:57 pm

Another update: as both of you suggested, storing the ROIs as masks rather than as polygons DOES significantly reduce the footprint, both in the size of the DB and in the load/time of the Indexer. For my typically sized datasets (described above), the growth in the DB is about 850MB using masks vs. 4GB for polygons. And contrary to what I indicated before, the Indexer is busy for a much shorter time after importing masks vs. polygons.
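
For a rough feel of why masks can be smaller than polygons, here is a small sketch comparing the size of one outline stored as text with the same region stored as a bit-packed mask. It rests on assumptions not confirmed in this thread: that polygon vertices are serialized as "x,y x,y ..." text, that mask bytes hold 1 bit per pixel over the bounding box, and that a cell is roughly a 15-pixel-radius blob traced with 90 vertices. All names are hypothetical, and this is not how the Glencoe tool or OMERO computes anything.

```python
import math


def circle_outline(cx, cy, radius, n_points):
    """Hypothetical stand-in for a segmented cell outline: n_points on a circle."""
    return [
        (cx + radius * math.cos(2 * math.pi * i / n_points),
         cy + radius * math.sin(2 * math.pi * i / n_points))
        for i in range(n_points)
    ]


def polygon_points_text(points):
    """Serialize vertices as 'x,y x,y ...' text (assumed points-column format)."""
    return " ".join(f"{x:.1f},{y:.1f}" for x, y in points)


def packed_mask_size(width, height):
    """Bytes needed for a width x height mask at 1 bit per pixel (assumed packing)."""
    return math.ceil(width * height / 8)


outline = circle_outline(cx=800, cy=800, radius=15, n_points=90)
text_bytes = len(polygon_points_text(outline).encode("utf-8"))
mask_bytes = packed_mask_size(31, 31)  # bounding box of a radius-15 circle
print(f"polygon text: {text_bytes} bytes, packed mask: {mask_bytes} bytes")
```

Under these assumptions the text form is several times larger than the packed mask. The real ratio depends on how many vertices the contour tracing emits and on cell size, so this only illustrates the direction of the difference reported above, not its magnitude.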

I did use the DEBUG directive to see what the indexer is doing when it's so busy and it appears to just be indexing the images that just had ROIs attached per:
Code:
2017-11-04 11:01:56,328 INFO  [   ome.services.fulltext.FullTextIndexer] (2-thread-1) INDEXED   14 objects in batch#23597  [   2300 ms.]  ~100% done (133246035 of 133246035)
2017-11-04 11:01:57,674 INFO  [   ome.services.fulltext.FullTextIndexer] (2-thread-3) INDEXED   14 objects in batch#23598  [   1327 ms.]  ~100% done (133247726 of 133247726)
2017-11-04 11:02:00,217 INFO  [   ome.services.fulltext.FullTextIndexer] (2-thread-2) INDEXED   15 objects in batch#23599  [   2195 ms.]  ~100% done (133248438 of 133248438)
2017-11-04 11:03:08,485 INFO  [   ome.services.fulltext.FullTextIndexer] (2-thread-4) INDEXED   14 objects in batch#23600  [   2266 ms.]  100% done (133248865 of 133248865)
2017-11-04 11:03:08,487 ERROR [        ome.services.util.ServiceHandler] (2-thread-4) Method interface ome.services.util.Executor$Work.doWork invocation took 68257
2017-11-04 11:05:17,032 INFO  [   ome.services.fulltext.FullTextIndexer] (2-thread-5) INDEXED  387 objects in batch#23601  [ 128390 ms.]  ~100% done (133280998 of 133280998)
2017-11-04 11:05:17,046 ERROR [        ome.services.util.ServiceHandler] (2-thread-5) Method interface ome.services.util.Executor$Work.doWork invocation took 128456
2017-11-04 11:08:30,161 INFO  [   ome.services.fulltext.FullTextIndexer] (2-thread-2) INDEXED  474 objects in batch#23602  [ 192909 ms.]  ~100% done (133355473 of 133355473)
2017-11-04 11:08:30,175 ERROR [        ome.services.util.ServiceHandler] (2-thread-2) Method interface ome.services.util.Executor$Work.doWork invocation took 192953
2017-11-04 11:16:04,773 INFO  [   ome.services.fulltext.FullTextIndexer] (2-thread-1) INDEXED  604 objects in batch#23603  [ 454264 ms.]  ~100% done (133474558 of 133474558)
2017-11-04 11:16:04,819 ERROR [        ome.services.util.ServiceHandler] (2-thread-1) Method interface ome.services.util.Executor$Work.doWork invocation took 454403
2017-11-04 11:31:20,664 DEBUG [   ome.services.fulltext.FullTextIndexer] (2-thread-3) Indexed: ome.model.core.Image:Id_2825860
2017-11-04 11:31:22,259 DEBUG [   ome.services.fulltext.FullTextIndexer] (2-thread-3) Indexed: ome.model.core.Image:Id_2825726
2017-11-04 11:31:24,014 DEBUG [   ome.services.fulltext.FullTextIndexer] (2-thread-3) Indexed: ome.model.core.Image:Id_2825301
2017-11-04 11:31:25,592 DEBUG [   ome.services.fulltext.FullTextIndexer] (2-thread-3) Indexed: ome.model.core.Image:Id_2825497
2017-11-04 11:31:27,490 DEBUG [   ome.services.fulltext.FullTextIndexer] (2-thread-3) Indexed: ome.model.core.Image:Id_2825865
2017-11-04 11:31:29,186 DEBUG [   ome.services.fulltext.FullTextIndexer] (2-thread-3) Indexed: ome.model.core.Image:Id_2825328

This log excerpt shows the transition from just before to just after turning on the DEBUG logging. Judging by the ERRORs, it is clearly having a hard time with the indexing, but I guess it isn't doing anything unexpected.
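
To see how the per-object indexing cost changes across batches, the INFO summary lines can be parsed with a short script. This is a sketch written against the line format shown in the excerpt above; the regular expression and function name are my own and would need adjusting if the log format differs.

```python
import re

# Matches the "INDEXED n objects in batch#m [ t ms.]" summary lines shown above.
BATCH_RE = re.compile(
    r"INDEXED\s+(?P<count>\d+) objects in batch#(?P<batch>\d+)"
    r"\s+\[\s*(?P<ms>\d+) ms\.\]"
)


def batch_rates(log_lines):
    """Yield (batch number, object count, ms per object) for each summary line."""
    for line in log_lines:
        match = BATCH_RE.search(line)
        if not match:
            continue  # skip DEBUG/ERROR lines and anything else
        count = int(match["count"])
        if count == 0:
            continue  # avoid dividing by zero on empty batches
        yield int(match["batch"]), count, int(match["ms"]) / count


sample = [
    "2017-11-04 11:01:56,328 INFO  [   ome.services.fulltext.FullTextIndexer] "
    "(2-thread-1) INDEXED   14 objects in batch#23597  [   2300 ms.]  "
    "~100% done (133246035 of 133246035)",
    "2017-11-04 11:16:04,773 INFO  [   ome.services.fulltext.FullTextIndexer] "
    "(2-thread-1) INDEXED  604 objects in batch#23603  [ 454264 ms.]  "
    "~100% done (133474558 of 133474558)",
]
for batch, count, ms_per_object in batch_rates(sample):
    print(f"batch {batch}: {count} objects, {ms_per_object:.0f} ms/object")
```

On the two sample lines the cost goes from about 164 ms per object to about 752 ms per object, so the larger batches are slower per object as well as in total.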

So as Chris noted, I guess everything that happens is nominal and just an issue of the scale at which I use it. Since the ROI masks are significantly less burdensome on the system, I'll probably switch to that and hope that a future upgrade will have a display option for mask ROIs that only shows contours. Maybe to be considered for iviewer?

Thanks,
- Damir

Re: large ROI import - Indexer very busy for days afterwards

Postby mtbc » Mon Nov 06, 2017 1:29 pm

Dear Damir,

Thank you very much for the information and suggestions. Do let us know if you find that the busy indexer does indeed harm the interactive user experience on the server. Separately, we will of course keep an eye out on our end in such circumstances. To capture a couple of the other issues, I have created Trello cards.


Cheers,
Mark
