Batch download of 'public' data & scripts.
Posted:
Wed Mar 19, 2014 12:37 pm
by i.munro
Dear all,
We are hoping to use the web client & a public group to allow us to share data.
We have just realised that the only way to do a batch download is via a script.
Currently our configuration blocks the public user from running scripts, and our sysadmin has expressed the following concern: "... exposing that script might mean that anyone could trivially perform an effective denial-of-service attack on the server by launching lots of batch exports".
Does anyone have any suggestions?
Ian
Re: Batch download of 'public' data & scripts.
Posted:
Wed Mar 19, 2014 1:03 pm
by wmoore
If you know ahead of time what you are going to allow 'public' users to download, this could be prepared in advance using a script: e.g. prepare a zip of all images in a Dataset and attach it to the Dataset (as Batch_Image_Export does). Then this could be downloaded by public users.
I guess you'd want some way to run this script on any new data once it is ready to go public.
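For what it's worth, a rough sketch of that idea with the Python BlitzGateway might look like the following (host, credentials and the dataset id are placeholders, error handling is omitted, and very large images may be better served by the official Batch_Image_Export script):
Code:
# Sketch: export each image in a Dataset as OME-TIFF, zip the exports,
# and attach the zip to the Dataset so it can be downloaded later.
import zipfile
from omero.gateway import BlitzGateway

HOST, PORT = "omero.example.org", 4064    # placeholders
USERNAME, PASSWORD = "user", "password"   # placeholders
DATASET_ID = 123                          # placeholder

conn = BlitzGateway(USERNAME, PASSWORD, host=HOST, port=PORT)
conn.connect()

dataset = conn.getObject("Dataset", DATASET_ID)
zip_path = "%s_export.zip" % dataset.getName()
with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
    for image in dataset.listChildren():
        # exportOmeTiff() returns the whole OME-TIFF as a byte string
        zf.writestr("%s_%d.ome.tiff" % (image.getName(), image.getId()),
                    image.exportOmeTiff())

# Attach the zip to the Dataset as a file annotation
file_ann = conn.createFileAnnfromLocalFile(zip_path,
                                           mimetype="application/zip")
dataset.linkAnnotation(file_ann)

conn.close()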
Re: Batch download of 'public' data & scripts.
Posted:
Wed Mar 19, 2014 9:30 pm
by i.munro
Thanks Will. There may be concerns about storage space though. We're now looking at three copies of the data on the server: the original, a copy in the public group and a zipped copy.
Do you think it might be possible to add an anti-robot check to the batch download script?
Ian
Re: Batch download of 'public' data & scripts.
Posted:
Thu Mar 20, 2014 9:21 am
by manics
At present there's no throttling on the Processor service. If you're feeling adventurous you could try out a multi-node configuration so the Processor is on a different host, and give us any feedback, since this is a new addition to the docs:
https://www.openmicroscopy.org/site/sup ... iple-hosts
Simon
Re: Batch download of 'public' data & scripts.
Posted:
Thu Mar 20, 2014 1:22 pm
by i.munro
Thanks Simon.
I'll pass that along to our sysadmin in the hope that, unlike me, he understands it.
Ian
Re: Batch download of 'public' data & scripts.
Posted:
Thu Mar 20, 2014 1:34 pm
by mwoodbri
Hi Simon,
Would it be feasible to run a second processor, on the same box, that handles requests from the "public" user and is throttled to run at most one job at a time?
Mark.
Re: Batch download of 'public' data & scripts.
Posted:
Thu Mar 20, 2014 3:11 pm
by manics
Hi Mark
Unfortunately we don't have a way of throttling the number of concurrent script jobs for a single Processor, nor is it possible to restrict jobs by OMERO user. However, if it's just a case of preventing the Processor from slowing down the server, then in principle you could run the Processor service under a different OS user and limit its resources using ulimit, nice or some other functionality provided by the OS, though this isn't something we've tried.
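As a very rough sketch of that idea (the OS user, memory limit and the command that starts the processor are placeholders, and this isn't something we've tested):
Code:
#!/bin/sh
# Run a script processor as a dedicated, low-priority OS user with a
# virtual-memory cap (ulimit -v is in KB, so 2097152 = ~2 GB).
PROCESSOR_CMD="/opt/omero/bin/omero script serve"   # placeholder command
sudo -u omero-processor sh -c "ulimit -v 2097152 && exec nice -n 19 $PROCESSOR_CMD"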
We've been thinking about how OMERO Processor could be improved, so it's useful to hear how you'd like to use it.
Thanks
Simon
Re: Batch download of 'public' data & scripts.
Posted:
Fri Mar 21, 2014 2:31 pm
by mwoodbri
Thanks - that's a good idea. But I think we would at least need to be able to target jobs by user - so that we could run jobs from unauthenticated or external users under tighter resource constraints.
Re: Batch download of 'public' data & scripts.
Posted:
Fri Mar 21, 2014 2:36 pm
by jmoore
The Processor does take some options as to who it will serve. See:
Code:
bin/omero script serve -h
usage: dist/bin/omero script serve [-h] [--verbose] [-b] [-t TIMEOUT] [-C]
[-s SERVER] [-p PORT] [-g GROUP] [-u USER]
[-w PASSWORD] [-k KEY]
[who [who ...]]
Start a usermode processor for scripts
Positional Arguments:
who Who to execute for: user, group, user=1, group=5 (default=official)
...
which you can run locally as an unauthenticated user to have your scripts run. The same could be launched in the backend. (That being said, yes, all of this definitely could use more extensive features!)
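For example, something along these lines (the user id, server address and the credentials the processor connects with are placeholders):
Code:
# Serve script jobs only for OMERO user id 123 (e.g. a dedicated "public" user)
bin/omero script serve user=123 -s omero.example.org -p 4064 -u processor_account -w secret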
All the best,
~Josh.
Re: Batch download of 'public' data & scripts.
Posted:
Wed Mar 26, 2014 6:38 pm
by mwoodbri
Thanks guys. That sounds promising. So perhaps we could:
* Prevent the OMERO.web public user from seeing/running 'global' scripts
* Create equivalent user scripts for the features that we wish to provide to the public user
* Configure the server to execute jobs for this user on a separate script processor that is resource-constrained (e.g. using ulimit/cgroups/kvm), roughly as sketched below.
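For the last step, something roughly like this might do (a sketch assuming the cgroup v1 tools from libcgroup; the group name, limits and user id are made up):
Code:
# Create a control group with CPU and memory limits, then start the
# public-facing script processor inside it.
sudo cgcreate -g cpu,memory:omero-public
sudo cgset -r cpu.shares=128 omero-public
sudo cgset -r memory.limit_in_bytes=2G omero-public
sudo cgexec -g cpu,memory:omero-public \
    /opt/omero/bin/omero script serve user=123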
Is this approach possible at the moment?