import speed
Posted: Wed Feb 02, 2011 5:42 pm
Hi!
We are still trying to import a large number of files and realized that importing about 100,000 files would take almost two days. To speed things up, we changed our strategy: we implemented a job queuing system that starts a specified number of parallel import threads. I calculated the number of threads from CPU and memory usage, but the results were not what I expected:
1. CPU usage of the importing client is fairly low.
2. Memory usage of the importing client is rather low as well.
3. CPU usage on the OMERO server skyrocketed as more imports ran in parallel.
The high CPU usage on the OMERO server was due to several PostgreSQL processes, each using about 20 to 30% of the server's CPU power, which soon became the limiting factor. I was able to get a bit more speed by moving the database to its own machine, separate from the OMERO server. But even then, CPU usage was close to 100% with 4 imports running in parallel.
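For reference, the queuing setup described above works roughly like the sketch below. This is a minimal illustration, not our actual code: `import_file` and `run_imports` are hypothetical names, and the placeholder body stands in for whatever really launches one OMERO import. The only point is that `max_parallel` caps how many imports run at once.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def import_file(path):
    """Hypothetical placeholder for a single import; replace with the real importer call."""
    return f"imported {path}"


def run_imports(paths, max_parallel=4, importer=import_file):
    """Run `importer` over all paths with at most `max_parallel` running at once.

    Returns a dict mapping each path to its result, or to the exception it raised,
    so one failed import does not stop the rest of the queue.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # Submit everything up front; the pool itself limits the concurrency.
        futures = {pool.submit(importer, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as exc:
                results[path] = exc
    return results


if __name__ == "__main__":
    print(run_imports(["a.tif", "b.tif", "c.tif"], max_parallel=2))
```

Since the client is mostly idle, raising `max_parallel` only shifts more load onto the database, which matches what we see.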
Why is the PostgreSQL CPU usage so high? I am only importing 1400 files in this particular session. High CPU usage in PostgreSQL usually points to missing indices, which lead to frequent and time-consuming queries. With all the relations between users, projects, datasets, images, annotations, experiments, tags, and so on, are all the indices in the right place? Is there any other reason why the CPU usage of PostgreSQL is skyrocketing?
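One way the missing-index suspicion could be checked is by comparing sequential scans with index scans per table. The sketch below assumes the rows come from PostgreSQL's `pg_stat_user_tables` view (fetched with psql or any client); the helper name `suspicious_tables`, the threshold, and the table names in the example are made up for illustration and are not taken from the OMERO schema.

```python
# pg_stat_user_tables is a standard PostgreSQL statistics view.
# Run this query against the database and feed the rows to suspicious_tables().
SCAN_STATS_QUERY = """
SELECT relname, seq_scan, seq_tup_read, idx_scan
FROM pg_stat_user_tables
ORDER BY seq_tup_read DESC;
"""


def suspicious_tables(rows, min_seq_tup_read=1_000_000):
    """Return names of tables that are read mostly by sequential scans.

    `rows` are (relname, seq_scan, seq_tup_read, idx_scan) tuples.
    idx_scan is NULL (None) for tables without any index, so treat it as 0.
    The threshold is an arbitrary assumption; tune it to the database size.
    """
    flagged = []
    for relname, seq_scan, seq_tup_read, idx_scan in rows:
        idx_scan = idx_scan or 0
        if seq_tup_read >= min_seq_tup_read and seq_scan > idx_scan:
            flagged.append(relname)
    return flagged


if __name__ == "__main__":
    # Made-up sample rows, only to show the shape of the input.
    sample = [
        ("table_a", 5000, 70_000_000, 10),
        ("table_b", 3, 200, 9000),
        ("table_c", 100, 2_000_000, None),
    ]
    print(suspicious_tables(sample))
```

A table that shows up here is only a candidate: a heavily scanned table may still be fine if it is small, so the slow queries themselves would need a look as well.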
Any advice on how to improve PostgreSQL's behavior would be greatly appreciated, as it would considerably speed up importing images into OMERO.
Cheers Juergen