
trigger my own python app whenever a dataset is updated

Posted: Thu Apr 07, 2011 3:38 pm
by bhcho
Hi all,

I'm looking for a way to trigger my own Python function (which lives in lib/python) whenever a dataset is updated. The function will go over the newly inserted (or deleted) images and do some calculation on each one, so it should take the dataset ID and, if possible, the IDs of the updated images as input arguments.

Does anyone have any idea on how to do this?

I also want to know how to trigger my function on a regular schedule, such as every day or every week, if possible. In that case, it would be better to run the calculation only on the datasets that were updated, rather than on every dataset in the OMERO DB.


Best,
BK

Re: trigger my own python app whenever a dataset is updated

Posted: Thu Apr 07, 2011 8:15 pm
by jmoore
Hi BK,

It's interesting that you bring this up. I just merged a branch into develop (http://git.openmicroscopy.org/?p=ome.git;a=commit;h=ed2d780be96fa2b15a3e46f3d7bca500a47de3cf) which contains a separate Java process, similar to the Indexer, which listens for a particular EventLog ("PIXELDATA") in order to add background processing. After 4.3, we'll most likely be generalizing this facility to handle cases exactly like yours. In the meantime, there's not a very simple way to do what you want.

Within OMERO, the simplest route is most likely to add a hook to your bridge which kicks off the Python processes. Rather than listening for "Dataset" EventLogs, you might want to handle "DatasetImageLink" EventLogs which are rarely updated, but frequently inserted and deleted. One difficulty would be grouping all of the DatasetImageLink changes, since you don't want to process the dataset multiple times.
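
As a rough illustration of the grouping problem, here is a minimal Python sketch that collapses a batch of link changes into one entry per dataset. The function name group_by_dataset and the (dataset_id, image_id) tuple layout are assumptions made for this example only; they are not part of OMERO.

```python
from collections import defaultdict


def group_by_dataset(link_events):
    """Collapse (dataset_id, image_id) pairs into {dataset_id: set(image_ids)}.

    Each dataset appears once, no matter how many of its links changed,
    so the expensive per-dataset work only has to run a single time.
    """
    grouped = defaultdict(set)
    for dataset_id, image_id in link_events:
        grouped[dataset_id].add(image_id)
    return dict(grouped)


# Hypothetical batch of link changes: three for dataset 1, one for dataset 2.
changes = [(1, 10), (1, 11), (2, 20), (1, 10)]
per_dataset = group_by_dataset(changes)
```

Using a set per dataset also drops repeated notifications for the same image, which matters if a link is reported more than once.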

Doing this external to OMERO would be as simple as storing the last runtime in a file, and then processing all EventLogs via a cron job. In pseudo-code:

Code: Select all
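import omero
from omero.rtypes import rtime
from omero.sys import ParametersI

last_runtime = ... # time of the last run, in millis since the epoch
query_service = ...

# Find every DatasetImageLink that was touched since the last run
query = ("select el.entityId from EventLog el "
         "where el.entityType = 'ome.model.containers.DatasetImageLink' "
         "and el.event.time > :last")
parameters = ParametersI()
parameters.add("last", rtime(last_runtime))  # binds the named parameter :last
dataset_ids = set()  # a set, so that each dataset is handled only once
for row in query_service.projection(query, parameters):
    link_id = row[0].val  # each row is a list of wrapped values
    dataset_ids.add(lookup_dataset(link_id))
handle_datasets(dataset_ids)
store_new_runtime()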
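
The "store the last runtime in a file" part could look roughly like the sketch below. It is only an illustration: the helper names load_last_runtime and store_new_runtime (here taking a path argument) and the file location are assumptions, and nothing in it is OMERO-specific.

```python
import os
import time


def load_last_runtime(path):
    """Return the stored time in millis since the epoch, or 0 if there is none yet."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (IOError, ValueError):
        # Missing or unreadable file: treat it as "never run before"
        return 0


def store_new_runtime(path, now_ms=None):
    """Write the given time (default: now) to the file and return it."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Write to a temporary file first, then rename it into place, so that
    # an interrupted run does not leave a half-written timestamp behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(now_ms))
    os.replace(tmp_path, path)
    return now_ms
```

One design choice worth considering: take the new timestamp before running the query and store it only after the datasets were handled successfully, so that changes arriving during a run are picked up by the next one.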


Cheers,
~Josh

Re: trigger my own python app whenever a dataset is updated

Posted: Fri Apr 15, 2011 2:00 pm
by bhcho
Hi Josh,

then in my bridge, can I do something like

Code: Select all
public void set(final String name, final Object value,
                     final Document document, final LuceneOptions _opts) {

          if (value instanceof DatasetImageLink) {
                  // 1. extract child image ID
                  // 2. extract parent dataset ID
                  // 3. run my own python code, passing those IDs as input arguments
          }
}


And could you tell me how to get the image ID and dataset ID in the code above?
In addition, I guess I should pass the login account information to my Python code so that it can create another session with that account. Can you tell me how to get the current account's ID and password?
(Or do you have any idea how to pass the current session information to my Python code?)

BK

Re: trigger my own python app whenever a dataset is updated

Posted: Sat Apr 16, 2011 4:55 pm
by jmoore
bhcho wrote:Hi Josh,

then in my bridge, can I do something like

Code: Select all
public void set(final String name, final Object value,
                     final Document document, final LuceneOptions _opts) {

          if (value instanceof DatasetImageLink) {
                  // 1. extract child image ID
                  // 2. extract parent dataset ID
                  // 3. run my own python code, passing those IDs as input arguments
          }
}


and could you tell me how to get the image ID and dataset ID from the code above?


This looks roughly ok. To get the ids, use:
Code: Select all
DatasetImageLink link = (DatasetImageLink) value;
Long datasetID = link.getParent().getId();
Long imageID = link.getChild().getId();


In addition, I guess I should pass the login account information to my Python code so that it can create another session with that account. Can you tell me how to get the current account's ID and password?
(Or do you have any idea how to pass the current session information to my Python code?)

BK


There is no user or session here, so it will not be possible to pass them to your code. Further, it is never possible to retrieve the password for a user. If the Python code cannot run as an administrator, you will need to create a new session, perhaps for the owner (link.getDetails().getOwner().getId()) of the object.

Cheers,
~Josh

Re: trigger my own python app whenever a dataset is updated

Posted: Mon Apr 18, 2011 4:48 pm
by bhcho
Hi Josh,

my bridge code looks like

Code: Select all
   public void set(final String name, final Object value,
                     final Document document, final LuceneOptions _opts) {
         
          if (value instanceof Image) {
                   logger().info("Scheduling all Experiment Files of " + value + " for re-indexing");
                   // do some other thing
          } else if (value instanceof DatasetImageLink) {
                   // run my python code
          }
   }


But every time I import images through the importer, it does not seem to go into the DatasetImageLink block (instead, it goes into the Image block).

In this case, could you tell me how to obtain the DatasetImageLink object from the Image (value) object?
Can I just cast it? Like
Code: Select all
   public void set(final String name, final Object value,
                     final Document document, final LuceneOptions _opts) {
         
          if (value instanceof Image) {
                  DatasetImageLink link = (DatasetImageLink) value;
          }
   }


(Update: I think I figured out how to get the dataset ID from the Image instance. I could get it via unmodifiedDataset or something like that, which I don't remember now.)


In addition, when I looked into the Indexer log, it seems the bridge was called twice when I imported ONE image. Is this common?
(I noticed that it was called twice because the INFO message "Scheduling all Experiment Files ...", which is written by my code, was logged twice at almost the same time.)

2011-04-18 12:29:32,355 INFO [.cmu.search.bridges.ExperimentFileBridge] (3-thread-2) Scheduling all Experiment Files of ome.model.core.Image:Id_8251 for re-indexing
2011-04-18 12:29:32,658 INFO [ ome.services.fulltext.EventBacklog] (3-thread-2) Added to backlog:ome.model.core.Image:Id_8251
2011-04-18 12:29:33,108 INFO [ ome.services.fulltext.FullTextIndexer] (3-thread-2) INDEXED 9 objects in 1 batch(es) [1085 ms.]
2011-04-18 12:29:36,033 INFO [.cmu.search.bridges.ExperimentFileBridge] (3-thread-5) Scheduling all Experiment Files of ome.model.core.Image:Id_8251 for re-indexing
2011-04-18 12:29:36,104 INFO [ ome.services.fulltext.FullTextIndexer] (3-thread-5) INDEXED 1 objects in 1 batch(es) [88 ms.]
2011-04-18 12:29:40,187 INFO [ ome.services.fulltext.FullTextIndexer] (3-thread-3) INDEXED 2 objects in 1 batch(es) [167 ms.]


Another question (I'm really sorry that I have tons of stupid questions):
If I delete an image from the OMERO DB, do I get the same event log (i.e. will it go into the "value instanceof Image" block)?
If so, how can I distinguish between import and delete?

BK

Re: trigger my own python app whenever a dataset is updated

Posted: Fri Apr 22, 2011 6:55 pm
by jennBakal
jmoore wrote:There is no user or session here, so it will not be possible to pass them to your code. Further, it is never possible to retrieve the password for a user. If the Python code cannot run as an administrator, you will need to create a new session, perhaps for the owner (link.getDetails().getOwner().getId()) of the object.


Josh, can you please clarify some of these statements. specifically:

1. what do you mean by "run as an administrator"? does that mean specifying an administrator's user/password in the python code &/or in the bridge code?
2. if a user/session is specified by the bridge code is it then possible to pass it to the python code? or is it not possible to specify that in the bridge code?
3. in your last line, you suggest creating a new session for the owner, but as far as i know, it's not possible to do that without the password. is there a way to trigger a request for the password to be input?

thanks,
jenn

Re: trigger my own python app whenever a dataset is updated

Posted: Tue Apr 26, 2011 9:59 am
by cxallan
bhcho wrote:Hi Josh,

my bridge code looks like

Code: Select all
   public void set(final String name, final Object value,
                     final Document document, final LuceneOptions _opts) {
         
          if (value instanceof Image) {
                   logger().info("Scheduling all Experiment Files of " + value + " for re-indexing");
                   // do some other thing
          } else if (value instanceof DatasetImageLink) {
                   // run my python code
          }
   }


But every time I import images through the importer, it does not seem to go into the DatasetImageLink block (instead, it goes into the Image block).


While Josh's example is valid, the problem you're having here comes down to the indexer excludes (omero.search.excludes), which are defined in the etc/omero.properties file. Several object types are not full-text indexed because it simply makes no sense to index them; almost all link objects, such as DatasetImageLink, fall into this category.

bhcho wrote:In this case, could you tell me how to obtain the DatasetImageLink object from the Image (value) object?
Can I just cast it? Like
Code: Select all
   public void set(final String name, final Object value,
                     final Document document, final LuceneOptions _opts) {
         
          if (value instanceof Image) {
                  DatasetImageLink link = (DatasetImageLink) value;
          }
   }


(Update: I think I figured out how to get the dataset ID from the Image instance. I could get it via unmodifiedDataset or something like that, which I don't remember now.)


If you're willing to put up with the overhead, the first thing you could do is to remove DatasetImageLink from the excludes. Otherwise, you can access the links via:

Code: Select all
...
          if (value instanceof Image) {
                  Image image = (Image) value;
                  for (final DatasetImageLink link : image.unmodifiableDatasetLinks()) {
                      ...
                  }
          }
...


bhcho wrote:In addition, when I looked into the Indexer log, it seems the bridge was called twice when I imported ONE image. Is this common?


Yes, it is. One of the actions is an INSERT and the other is an UPDATE. It's highly dependent on the types of objects being linked at import time. You can however rely on these actions happening in serial.

bhcho wrote:(I noticed that it was called twice because the INFO message "Scheduling all Experiment Files ...", which is written by my code, was logged twice at almost the same time.)

2011-04-18 12:29:32,355 INFO [.cmu.search.bridges.ExperimentFileBridge] (3-thread-2) Scheduling all Experiment Files of ome.model.core.Image:Id_8251 for re-indexing
2011-04-18 12:29:32,658 INFO [ ome.services.fulltext.EventBacklog] (3-thread-2) Added to backlog:ome.model.core.Image:Id_8251
2011-04-18 12:29:33,108 INFO [ ome.services.fulltext.FullTextIndexer] (3-thread-2) INDEXED 9 objects in 1 batch(es) [1085 ms.]
2011-04-18 12:29:36,033 INFO [.cmu.search.bridges.ExperimentFileBridge] (3-thread-5) Scheduling all Experiment Files of ome.model.core.Image:Id_8251 for re-indexing
2011-04-18 12:29:36,104 INFO [ ome.services.fulltext.FullTextIndexer] (3-thread-5) INDEXED 1 objects in 1 batch(es) [88 ms.]
2011-04-18 12:29:40,187 INFO [ ome.services.fulltext.FullTextIndexer] (3-thread-3) INDEXED 2 objects in 1 batch(es) [167 ms.]


Another question (I'm really sorry that I have tons of stupid questions):
If I delete an image from the OMERO DB, do I get the same event log (i.e. will it go into the "value instanceof Image" block)?
If so, how can I distinguish between import and delete?

BK


In order to distinguish between inserts, updates and deletes, you will have to examine the Events associated with the object at the time of indexing. These include the update event (image.getDetails().getUpdateEvent()) and the creation event (image.getDetails().getCreationEvent()).
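
One possible way to use those two events, sketched in Python on plain event IDs: this assumes that a freshly inserted object carries the same event as both its creation and its update event, which should be verified against your server before relying on it. The function name classify_change is made up for this example, and deletes are not covered by the sketch.

```python
def classify_change(creation_event_id, update_event_id):
    """Guess whether an object was just inserted or was updated later.

    Assumption (not confirmed here): on insert, the creation event and
    the update event of an object are one and the same event.
    """
    if creation_event_id is None or update_event_id is None:
        return "unknown"
    if creation_event_id == update_event_id:
        return "insert"
    return "update"


# Hypothetical event IDs, for illustration only
first_seen = classify_change(501, 501)
seen_again = classify_change(501, 507)
```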

Re: trigger my own python app whenever a dataset is updated

Posted: Tue Apr 26, 2011 10:42 am
by cxallan
jennBakal wrote:
jmoore wrote:There is no user or session here, so it will not be possible to pass them to your code. Further, it is never possible to retrieve the password for a user. If the Python code cannot run as an administrator, you will need to create a new session, perhaps for the owner (link.getDetails().getOwner().getId()) of the object.


Josh, can you please clarify some of these statements. specifically:

1. what do you mean by "run as an administrator"? does that mean specifying an administrator's user/password in the python code &/or in the bridge code?


The bridge code is running in a context where every object is accessible. There are no permissions boundaries for that code and there is no access to the session and thus no access to session services.

jennBakal wrote:2. if a user/session is specified by the bridge code is it then possible to pass it to the python code? or is it not possible to specify that in the bridge code?


A user and session are not specified by the bridge code. You'd have to create that session in the Python code, where the Python code itself has access to a session. You could do this by passing the owner's username and group to your Python code and then performing something like:

Code: Select all
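...
    // In the search bridge (Java)
    String user = image.getDetails().getOwner().getOmeName();
    String group = image.getDetails().getGroup().getName();
    Runtime rt = Runtime.getRuntime();
    // Hand the owner's username and group name to the script as arguments
    rt.exec(new String[] { "path/to/python/script.py", user, group});

...
    # In the Python script.py
    import sys
    import omero

    user, group = sys.argv[1], sys.argv[2]  # passed in by the search bridge
    client = omero.client('localhost')
    # Log in with an account that is in the system group
    session = client.createSession('an_administrator', 'password')
    principal = omero.sys.Principal()
    principal.name = user
    principal.group = group
    principal.eventType = "User"
    timeout = 60000  # 60 seconds (milliseconds)
    # An administrator may create a session on behalf of another user
    user_session = session.getSessionService().createSessionWithTimeout(principal, timeout)
    client.closeSession()
    # Join the new session; from here on everything runs as 'user'
    session = client.joinSession(user_session.uuid.val)
    # Do work as 'user'
...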


This relies on you creating an account in the system group for use by your Python script that will be called by the search bridge. As that account has administrative privileges (is in the system group) it is able to create a session for another user and group and then subsequently join it so that you can perform whatever logic you wish within the correct context.

jennBakal wrote:3. in your last line, you suggest creating a new session for the owner, but as far as i know, it's not possible to do that without the password. is there a way to trigger a request for the password to be input?


There's no requirement for you to know the password. When a user has administrative privileges (is in the system group), they can create a session for any user, as above.

Re: trigger my own python app whenever a dataset is updated

Posted: Tue Nov 29, 2011 10:21 pm
by bhcho
Hi,

I'm migrating to 4.3.x.
I used the same code above to execute my python app from bridge.
Code: Select all
    ...
        // In the search bridge
        String user = image.getDetails().getOwner().getOmeName();
        String group = image.getDetails().getGroup().getName();
        Runtime rt = Runtime.getRuntime();
        rt.exec(new String[] { "path/to/python/script.py", user, group});

    ...
        # In the Python script.py
        client = omero.client('localhost')
        session = client.createSession('an_administrator', 'password')
        principal = omero.sys.Principal()
        principal.name = user
        principal.group = group
        principal.eventType = "User"
        timeout = 60000  # 60 seconds (milliseconds)
        user_session = session.getSessionService().createSessionWithTimeout(principal, timeout)
        client.closeSession()
        session = client.joinSession(user_session.uuid.val)
        # Do work as 'user'
    ...


After creating a session (using an admin account and a regular account, both in the same "Researchers" group),
I tried to get an Image with
Code: Select all
gateway = session.createGateway()
image = gateway.getImage(long(iid))

But the 'image' object was empty.
Does anyone have any idea why?
FYI, this code worked in 4.2.x.

BK

Re: trigger my own python app whenever a dataset is updated

Posted: Wed Nov 30, 2011 8:44 am
by jmoore
Can you print out the current login information, session.getAdminService().getEventContext(), along with the same information for the image (getDetails().getGroup().getName(), etc.)?
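
A small helper for putting the two pieces of information side by side might look like the sketch below. It only reads the userName and groupName fields of the event context; the function name describe_login and the stand-in objects are assumptions for illustration, and the image owner and group names are expected to be passed in as plain strings that were already unwrapped.

```python
from types import SimpleNamespace


def describe_login(ctx, image_owner, image_group):
    """Return printable lines comparing the session with the image's details."""
    lines = [
        "session: user=%s group=%s" % (ctx.userName, ctx.groupName),
        "image:   owner=%s group=%s" % (image_owner, image_group),
    ]
    if ctx.groupName != image_group:
        lines.append("note: the session group differs from the image group")
    return lines


# Stand-in for the object returned by getEventContext(); values are made up
fake_ctx = SimpleNamespace(userName="bhcho", groupName="user")
report = describe_login(fake_ctx, "bhcho", "Researchers")
```

In a real script, ctx would be the result of session.getAdminService().getEventContext(), and each line of the report could simply be printed or logged.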

Cheers,
~J.