Moving large project to another group

Posted: Tue Jan 28, 2014 3:20 pm
by Sethur
Hi,

recently, we deployed OMERO 5.0.0-rc1 in our CFU on an Ubuntu 13.10 machine. The server and OMERO.web deployment have worked fine so far, but when we tried to move a relatively large project (approx. 20 datasets containing about 1000 Leica LIF files, ~14 GiB in total), we ran into problems.

We used both OMERO.insight and OMERO.web to issue the move command, but we had to cancel both operations because the process took extremely long and gives the user no feedback on the remaining time.

I have now restarted the move (via OMERO.web). Watching the server logs, I see Blitz-0.log growing continuously with lots of chgrp lines. It's already at 30 MB, so I cannot attach the whole file (I did a tail -n 10000 instead).

Our storage is mounted on a fast RAID6 machine with twelve 15,000 rpm HDDs, so disk speed should not be an issue here. Also, I would be surprised if any files had to be physically copied in such a "move between groups" process.

Is this behavior normal? Any suggestions would be most welcome! Log files are attached.

================================================================================
OMERO Diagnostics 5.0.0-rc1-ice34-b10
================================================================================

Commands:   java -version                  1.7.0     (/usr/bin/java)
Commands:   python -V                      2.7.5     (/usr/bin/python)
Commands:   icegridnode --version          3.4.2     (/usr/bin/icegridnode)
Commands:   icegridadmin --version         3.4.2     (/usr/bin/icegridadmin)
Commands:   psql --version                 9.1.11    (/usr/bin/psql -- 2 others)

Server:     icegridnode                    running
Server:     Blitz-0                        active (pid = 110753, enabled)
Server:     DropBox                        active (pid = 110767, enabled)
Server:     FileServer                     active (pid = 110768, enabled)
Server:     Indexer-0                      active (pid = 110790, enabled)
Server:     MonitorServer                  active (pid = 110769, enabled)
Server:     OMERO.Glacier2                 active (pid = 110770, enabled)
Server:     OMERO.IceStorm                 active (pid = 110772, enabled)
Server:     PixelData-0                    active (pid = 110794, enabled)
Server:     Processor-0                    active (pid = 110778, enabled)
Server:     Tables-0                       active (pid = 110784, enabled)
Server:     TestDropBox                    inactive (enabled)

OMERO:      SSL port                       4064
OMERO:      TCP port                       4063

Log dir:    /usr/local/share/omero/OMERO.server/var/log exists

Log files:  Blitz-0.log                    23.0 MB       errors=0    warnings=142
Log files:  DropBox.log                    1.0 KB
Log files:  FileServer.log                 0.0 KB
Log files:  Indexer-0.log                  80.0 KB
Log files:  MonitorServer.log              1.0 KB
Log files:  OMEROweb.log                   223.0 KB      errors=0    warnings=5
Log files:  OMEROweb_request.log           0.0 KB
Log files:  PixelData-0.log                2.0 KB
Log files:  Processor-0.log                1.0 KB
Log files:  Tables-0.log                   0.0 KB
Log files:  TestDropBox.log                n/a
Log files:  master.err                     0.0 KB
Log files:  master.out                     0.0 KB
Log files:  Total size                     23.43 MB


Environment:OMERO_HOME=(unset)
Environment:OMERO_NODE=(unset)
Environment:OMERO_MASTER=(unset)
Environment:OMERO_TEMPDIR=/tmp
Environment:PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/usr/lib/jvm/default-java/bin:/usr/share/Ice-3.4.2:/usr/lib/postgresql/9.1/bin:/usr/local/share/omero/OMERO.server/bin:/usr/lib/jvm/default-java/bin:/usr/share/Ice-3.4.2:/usr/lib/postgresql/9.1/bin:/usr/local/share/omero/OMERO.server/bin:/usr/lib/jvm/default-java/bin:/usr/share/Ice-3.4.2:/usr/lib/postgresql/9.1/bin:/usr/local/share/omero/OMERO.server/bin
Environment:ICE_HOME=/usr/share/Ice-3.4.2
Environment:LD_LIBRARY_PATH=/usr/share/java:/usr/lib:/usr/share/java:/usr/lib:/usr/share/java:/usr/lib:
Environment:DYLD_LIBRARY_PATH=(unset)

OMERO data dir: '/srv/omero_data'       Exists? True    Is writable? True
OMERO temp dir: '/tmp/omero/tmp'        Exists? True    Is writable? True   (Size: 0)
OMERO.web status... [RUNNING] (PID 111626)
omero@romulus:/usr/local/share/omero/OMERO.server/var/log$

Re: Moving large project to another group

Posted: Tue Jan 28, 2014 3:53 pm
by jmoore
Hi Tristan,

At the top of your Blitz-0.log, there should be information on the memory settings:

$ grep -i mem var/log/Blitz-0.log
2014-01-28 15:42:37,813 INFO  [      ome.services.util.JvmSettingsCheck] (      main) Max Memory (MB):   =    455
2014-01-28 15:42:37,814 INFO  [      ome.services.util.JvmSettingsCheck] (      main) OS Memory (MB):    =  24065


How much RAM did you give to the various services? See https://www.openmicroscopy.org/site/sup ... y-settings for the relevant section of the docs.
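If grep is not at hand, the same check can be sketched in Python. This is only a minimal sketch: the regular expression assumes the JvmSettingsCheck line format shown in the excerpt above, and the function name `jvm_memory_settings` is made up for illustration.

```python
import re

# Assumes the JvmSettingsCheck line format from the log excerpt above.
MEM_LINE = re.compile(
    r"JvmSettingsCheck\].*?(Max|OS) Memory \(MB\):\s*=\s*(\d+)"
)


def jvm_memory_settings(lines):
    """Return the reported memory values in MB, keyed by 'Max' and 'OS'.

    If the log contains several server starts, the last values win.
    """
    found = {}
    for line in lines:
        match = MEM_LINE.search(line)
        if match:
            found[match.group(1)] = int(match.group(2))
    return found


# The two lines from the grep output above, used as sample input.
sample = [
    "2014-01-28 15:42:37,813 INFO  [      ome.services.util.JvmSettingsCheck] (      main) Max Memory (MB):   =    455",
    "2014-01-28 15:42:37,814 INFO  [      ome.services.util.JvmSettingsCheck] (      main) OS Memory (MB):    =  24065",
]
settings = jvm_memory_settings(sample)
print(settings)
```

To run it against a real log, pass an open file instead of the sample list, e.g. `jvm_memory_settings(open("var/log/Blitz-0.log"))`.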

Cheers,
~Josh

Re: Moving large project to another group

Posted: Wed Jan 29, 2014 11:27 am
by Sethur
Hi Josh,

thanks for your reply. You are right, I overlooked the section regarding memory settings in the installation manual. Apparently, 455 MB is not enough for this operation. I will try again with a higher setting.

In that regard, I was also curious what happens when you cancel a "move between groups" operation before it is finished. Can this lead to a corrupted database?

Cheers,

Tristan

Here is the relevant part of the Blitz-0.log:
2014-01-28 15:21:13,941 INFO  [      ome.services.util.JvmSettingsCheck] (      main) Max Memory (MB):   =    455
2014-01-28 15:21:13,944 INFO  [      ome.services.util.JvmSettingsCheck] (      main) OS Memory (MB):    =  15944

Re: Moving large project to another group

Posted: Wed Jan 29, 2014 11:30 am
by jmoore
The entire move operation takes place in a single transaction, which is the cause of the high memory usage. Corruption shouldn't be possible, but objects that you are looking at in the UI may disappear. A refresh will usually clear up the confusion.

Cheers,
~Josh

Re: Moving large project to another group

Posted: Wed Jan 29, 2014 5:08 pm
by Sethur
Hi Josh,

I have now increased the maximum heap memory to 8 GB and the maximum persistent memory to 512 MB. We tried the move operation again, and it occupies 25% of our total CPU power and around 4 GB of memory. While the process is running, I am not able to log in as another user, i.e. the server becomes unresponsive.

As I am writing this, the operation is still ongoing (it has been running for about 15 minutes now). Why does it take so long to change some group links in a database? Where does all the CPU time go? Does the system recreate every thumbnail or re-analyse image metadata?

Cheers,

Tristan

Re: Moving large project to another group

Posted: Thu Jan 30, 2014 10:37 am
by Sethur
PS: I tried another move today with a smaller project of around 2.5 GB, consisting of 100 large TIFF files. This took ~10 minutes and the server did not become unresponsive. After the move, I deleted the project, which took around 1 minute and led to a very high CPU load on the server (2 cores completely occupied). I understand that people rarely delete their projects, so this is probably not a problem on a production system, but I still do not quite understand where this high demand for CPU power comes from. Multiple users might delete projects at the same time; right now, this would probably again make the server unresponsive for all other users.

Re: Moving large project to another group

Posted: Thu Jan 30, 2014 11:18 am
by jmoore
Hi Tristan,

there's really not much more to say than that this is something we're working on. This again falls under https://trac.openmicroscopy.org.uk/ome/ticket/11779. These graph deletes/moves were implemented using PostgreSQL savepoints. With very large graphs, unfortunately, there's a huge overhead that we'd all like to see fixed. A fix is currently targeted for the 5.1.0 release later this year.
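For readers unfamiliar with savepoints, here is a generic illustration of the pattern. It is not OMERO's actual code: it uses SQLite from the Python standard library rather than PostgreSQL, a made-up `image` table, and it assumes one savepoint per row, which may not match how the server groups its steps. It only shows how savepoints let one step be undone without aborting the surrounding transaction, at the price of extra bookkeeping for every step.

```python
import sqlite3

# isolation_level=None: the sqlite3 module issues no implicit BEGIN,
# so all transaction control below is explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE image (id INTEGER PRIMARY KEY, group_id INTEGER)")
conn.executemany(
    "INSERT INTO image VALUES (?, ?)", [(i, 1) for i in range(1, 6)]
)


def move_images(conn, image_ids, new_group, fail_on=None):
    """Move rows to new_group in ONE transaction, one savepoint per row.

    fail_on is a hypothetical hook that simulates a failing step.
    """
    moved = []
    conn.execute("BEGIN")
    for image_id in image_ids:
        conn.execute("SAVEPOINT step")
        try:
            if image_id == fail_on:
                raise ValueError("simulated failure")
            conn.execute(
                "UPDATE image SET group_id = ? WHERE id = ?",
                (new_group, image_id),
            )
            conn.execute("RELEASE SAVEPOINT step")
            moved.append(image_id)
        except ValueError:
            # Undo only this step; earlier steps in the transaction survive.
            conn.execute("ROLLBACK TO SAVEPOINT step")
            conn.execute("RELEASE SAVEPOINT step")
    conn.execute("COMMIT")
    return moved


moved = move_images(conn, [1, 2, 3, 4, 5], new_group=2, fail_on=3)
print(moved)
```

With thousands of objects in a graph, the number of savepoint round trips grows with the number of steps, which gives a rough intuition for where per-step overhead could come from.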

It would be good to know if the two moves you describe (and the delete) were all successful from your point of view. Looking at the Blitz-0.log* server logs (from var/log/) might also be useful if you'd be willing to upload them.

Cheers,
~Josh

Re: Moving large project to another group

Posted: Sat Feb 01, 2014 1:12 pm
by Sethur
Hi Josh,

the very large move was partly successful. Apparently, 3 datasets have not been moved, but the rest worked after some hours. The somewhat smaller move with 2.5 GB of TIFFs was also successful; I think it took around 20 minutes.

I uploaded our current Blitz logs (171 MB, 7 MB compressed) to my Dropbox:
https://dl.dropboxusercontent.com/u/30928669/Blitz-0.log.bz2

Cheers,

Tristan

Re: Moving large project to another group

Posted: Mon Feb 03, 2014 1:15 pm
by jmoore
I haven't found anything in particular in your log file yet. Regarding the 100 TIFFs that you tried to move: are they all similar in terms of the amount of metadata? If so, could you upload one (to Dropbox, zipped and attached here, or to http://qa.openmicroscopy.org.uk/qa/upload/) so that I can try to reproduce the timings?

Cheers,
~Josh.

Re: Moving large project to another group

Posted: Wed Feb 05, 2014 3:16 pm
by Sethur
Hi Josh,

for the TIFF files, I used identical copies of a NASA Hubble image, similar or identical to this one: http://imgsrc.hubblesite.org/hu/db/images/hs-2004-32-d-full_tif.tif. I can look up exactly which one I used if that is important. I just batch copied the image 100 times and uploaded those copies to OMERO, then I issued the move and delete project commands.
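In case anyone wants to build a similar test set, the batch copy could look roughly like the sketch below. The helper name `make_copies` and the numbered file naming are just assumptions for illustration, and the demo uses a tiny placeholder file instead of a real TIFF.

```python
import shutil
import tempfile
from pathlib import Path


def make_copies(source, dest_dir, count):
    """Write `count` numbered copies of `source` into `dest_dir`."""
    source = Path(source)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copies = []
    for i in range(1, count + 1):
        # e.g. sample.tif -> sample_001.tif, sample_002.tif, ...
        target = dest_dir / f"{source.stem}_{i:03d}{source.suffix}"
        shutil.copyfile(source, target)
        copies.append(target)
    return copies


# Demo with a small placeholder file (not a real TIFF).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sample.tif"
    src.write_bytes(b"placeholder bytes, not a real TIFF")
    copies = make_copies(src, Path(tmp) / "copies", 100)
    copy_count = len(copies)
    first_name = copies[0].name

print(copy_count, first_name)
```

The resulting directory can then be imported into OMERO as usual.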

Cheers,

Tristan