
Regular expressions in the standard "Search:"?

Posted: Wed Oct 14, 2015 8:11 pm
by dsudar
Hi all,

Before getting to work on using tags or keyword/values, we were looking at how much we can do with the standard Lucene-indexed data. It appears that most of the terms we would like to search on are actually in the Lucene index, so the question is mostly how to make use of that. The current "Search:" box in Web and Insight appears to support only a very modest subset of regular expressions and thus doesn't seem to allow searches such as:
(PlateA OR PlateB) AND Well2 AND "series 052"


With the naming scheme we have, such a query should give us exactly 2 images of the equivalent condition across the 2 plates A and B. Is there a way to do more complex searches such as the above in the Lucene index?
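For reference, here is a minimal sketch of what the grouped query means when it is evaluated as set operations over an inverted index (term to set of image IDs): OR is a union, AND is an intersection. The image names, IDs and the naive phrase matching are hypothetical, made up for illustration; this is not how OMERO or Lucene is implemented internally.

```python
# Hypothetical image names following the naming scheme described above.
IMAGES = {
    1: "PlateA Well1 series 051",
    2: "PlateA Well2 series 052",
    3: "PlateB Well2 series 052",
    4: "PlateB Well2 series 053",
    5: "PlateC Well2 series 052",
}


def build_index(images):
    """Inverted index: each whitespace-separated term -> set of image IDs containing it."""
    index = {}
    for image_id, name in images.items():
        for term in name.split():
            index.setdefault(term, set()).add(image_id)
    return index


INDEX = build_index(IMAGES)


def hits(term):
    """Set of image IDs whose name contains the given term."""
    return INDEX.get(term, set())


def phrase_hits(text):
    """Set of image IDs whose name contains the exact phrase (naive substring check)."""
    return {i for i, name in IMAGES.items() if text in name}


# (PlateA OR PlateB) AND Well2 AND "series 052"
result = (hits("PlateA") | hits("PlateB")) & hits("Well2") & phrase_hits("series 052")
print(sorted(result))  # [2, 3]
```

Only the Well2 / series 052 image from each of the two plates survives all three conditions, which is the "exactly 2 images" described above.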

We are of course looking at using tags, but all our data in this project is in the Screen/Plate/Well format, so autotag and tagsearch do not currently work on it.

Thanks,
- Damir

Re: Regular expressions in the standard "Search:"?

Posted: Thu Oct 15, 2015 8:24 am
by jmoore
dsudar wrote:
Hi all,

Hi Damir,

dsudar wrote:
With the naming scheme we have, such a query should give us exactly 2 images of the equivalent condition across the 2 plates A and B. Is there a way to do more complex searches such as the above in the Lucene index?


If I understand correctly, what you're running into is that the terms you are searching for, which you can each find individually, are not all present on the 2 images you'd like to find. To see what I mean, you can turn off, say, Plates in the left-hand panel of the search page and see whether "PlateA" returns anything. I'm assuming not.

The clients don't perform any set logic on the returned values; they just loop over the different types (Plates, Screens, Images, ...) and show all the results. To do what you're asking, all the metadata fields from the Plate etc. would also need to be indexed with the Image. We've been hesitant to do this since it quickly leads to every search finding everything. We could, however, make it optional.

Until then, you could add your own search bridge, which would index things precisely as you expect them. See the Bridges documentation for a general overview or take a look at the ProjectWithImageNameBridge.
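To illustrate the idea only (OMERO's real bridges are Java classes described in the Bridges documentation, and nothing below is their API), here is a minimal sketch assuming a hypothetical in-memory model in which each image knows its parent plate. The bridge-like step copies the plate's name into the image's indexed text, so a single per-image document can match terms from both levels.

```python
from dataclasses import dataclass


@dataclass
class Plate:
    name: str


@dataclass
class Image:
    name: str
    plate: Plate  # hypothetical link from an image to its parent plate


def indexed_terms(image, include_parent=True):
    """Terms a hypothetical per-image index document would contain."""
    text = image.name
    if include_parent:
        # Bridge-like step: also index the parent plate's name with the image.
        text += " " + image.plate.name
    return set(text.split())


plate = Plate("PlateA")
img = Image("Well2 series 052", plate)
print("PlateA" in indexed_terms(img, include_parent=False))  # False
print("PlateA" in indexed_terms(img))  # True
```

Keeping `include_parent` as a switch mirrors the trade-off mentioned above: copying parent metadata into each image makes cross-level queries possible, but a broad plate-level term then matches every image on that plate.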

Cheers,
~Josh.

Re: Regular expressions in the standard "Search:"?

Posted: Fri Oct 16, 2015 6:55 am
by dsudar
Hi Josh,

Thanks for the quick response.

I didn't really explain myself very well. Sorry about that.

Our image names are fairly complete and have the PlateXXXX ID and Well info etc. in them. They look like this:

"PlateLI8X00446_Well2_Seq0001.nd2 [PlateLI8X00446_Well2_Seq0001.nd2 (series 052)]"

So we do indeed use "Search for:" with only the "Images" checkbox checked.
If I search for "PlateLI8X00446 AND Well2 AND 052", I get exactly that one image as a result, as expected.
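The AND query succeeds because the image name, once split into terms, contains all three tokens. A minimal sketch, assuming a tokenizer that splits on any run of non-alphanumeric characters and keeps case (OMERO's actual analyzer may split or normalize differently):

```python
import re

NAME = ("PlateLI8X00446_Well2_Seq0001.nd2 "
        "[PlateLI8X00446_Well2_Seq0001.nd2 (series 052)]")


def tokenize(text):
    """Split on runs of non-alphanumeric characters (assumed analyzer behaviour)."""
    return {tok for tok in re.split(r"[^A-Za-z0-9]+", text) if tok}


tokens = tokenize(NAME)
print(sorted(tokens))
# ['052', 'PlateLI8X00446', 'Seq0001', 'Well2', 'nd2', 'series']

# "PlateLI8X00446 AND Well2 AND 052": every term must be present.
print(all(t in tokens for t in ["PlateLI8X00446", "Well2", "052"]))  # True
```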

However, if I want to find both this one and the equivalent image in "PlateLI8X00445" I tried to search on:
"(PlateLI8X00445 OR PlateLI8X00446) AND Well2 AND 052"

Instead of getting only the 2 images with Well2 and 052 in them, I get all 1400 images in both PlateLI8X00445 and PlateLI8X00446.

Looking at the Lucene documentation at https://lucene.apache.org/core/3_5_0/qu ... yntax.html
(which I presume is what the search queries use), the example under Grouping appears to say that the above should work, right?

I did find a work-around, so maybe I'm just not understanding how it's supposed to work.
The work-around is to search for:
"(PlateLI8X00446 AND Well2 AND 052) OR (PlateLI8X00445 AND Well2 AND 052)"

That does give exactly the 2 correct images.
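The work-around is the grouped query with AND distributed over OR: (A OR B) AND W AND S is logically equivalent to (A AND W AND S) OR (B AND W AND S). A quick brute-force check over all 16 truth assignments of the four terms:

```python
from itertools import product


def grouped(a, b, w, s):
    # (PlateLI8X00445 OR PlateLI8X00446) AND Well2 AND 052
    return (a or b) and w and s


def expanded(a, b, w, s):
    # (PlateLI8X00446 AND Well2 AND 052) OR (PlateLI8X00445 AND Well2 AND 052)
    return (b and w and s) or (a and w and s)


equivalent = all(
    grouped(*values) == expanded(*values)
    for values in product([False, True], repeat=4)
)
print(equivalent)  # True
```

Under standard Boolean semantics both forms therefore select the same images.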

I'll also read up on the search bridge docs.

Cheers,
- Damir

Re: Regular expressions in the standard "Search:"?

Posted: Fri Oct 16, 2015 10:00 am
by Dominik
Hi Damir,

The Lucene query "(PlateLI8X00445 OR PlateLI8X00446) AND Well2 AND 052" should in fact return the results you expect. However, we do quite a bit of pre-processing of the query you enter in the search field before passing it on to the Lucene engine, in order to make the search easier to use. At the moment you can only use a limited syntax: e.g. you can use the "AND" and "OR" keywords, the "*" wildcard and quotes, but parentheses aren't recognized, which is why your query leads to unexpected results.
I'll investigate how we could provide more advanced search functionality without losing its simplicity.

Regards,
Dominik

Re: Regular expressions in the standard "Search:"?

Posted: Fri Oct 16, 2015 11:01 pm
by dsudar
Hi Dominik,

Thanks for following up. That does explain why it works the way it does. But my work-around also depends on the parentheses, and it does work correctly.

Anyway, it would be nice to have access to the full Lucene query functionality somehow, or possibly even full regexps.
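For comparison, this is the kind of selection a full regular expression could express in a single pattern. A minimal sketch using Python's `re` module on a hypothetical list of names (the PlateLI8X00445 name and the non-matching entries are constructed by analogy with the naming scheme above); the OMERO search box does not accept this syntax.

```python
import re

names = [
    "PlateLI8X00445_Well2_Seq0001.nd2 [PlateLI8X00445_Well2_Seq0001.nd2 (series 052)]",
    "PlateLI8X00446_Well2_Seq0001.nd2 [PlateLI8X00446_Well2_Seq0001.nd2 (series 052)]",
    "PlateLI8X00446_Well2_Seq0001.nd2 [PlateLI8X00446_Well2_Seq0001.nd2 (series 053)]",
    "PlateLI8X00446_Well3_Seq0001.nd2 [PlateLI8X00446_Well3_Seq0001.nd2 (series 052)]",
]

# Either plate, Well2 (the trailing "_" excludes e.g. Well20), and series 052.
pattern = re.compile(r"^PlateLI8X0044[56]_Well2_.*\(series 052\)")
matches = [n for n in names if pattern.search(n)]
print(len(matches))  # 2
```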

Cheers,
Damir

Re: Regular expressions in the standard "Search:"?

Posted: Mon Oct 19, 2015 8:54 am
by Dominik
Hi Damir,

Your work-around works because AND clauses take priority over OR clauses; the parentheses actually aren't necessary. An initial discussion indicated that we might add an "advanced" option in 5.2.1 that lets the user enter a Lucene query directly.
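The precedence argument can be illustrated with a tiny recursive-descent parser in which AND binds tighter than OR. This is a simplified sketch of standard Boolean precedence as described in the post, not OMERO's pre-processing or Lucene's query parser, and it does not attempt to reproduce the results seen above. A and B stand for the two plate IDs, W for Well2 and S for 052 (hypothetical short aliases); malformed input is not handled.

```python
def tokenize(query):
    """Split a query into words and parentheses."""
    return query.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse tokens into a nested tuple tree; AND binds tighter than OR."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def or_expr():
        node = and_expr()
        while peek() == "OR":
            take()
            node = ("OR", node, and_expr())
        return node

    def and_expr():
        node = atom()
        while peek() == "AND":
            take()
            node = ("AND", node, atom())
        return node

    def atom():
        tok = take()
        if tok == "(":
            node = or_expr()
            take()  # consume the matching ")"
            return node
        return tok

    return or_expr()


def evaluate(node, terms):
    """True if an image with the given set of terms satisfies the parsed query."""
    if isinstance(node, str):
        return node in terms
    op, left, right = node
    if op == "AND":
        return evaluate(left, terms) and evaluate(right, terms)
    return evaluate(left, terms) or evaluate(right, terms)


# The work-around parses the same with or without parentheses.
workaround = "(A AND W AND S) OR (B AND W AND S)"
no_parens = "A AND W AND S OR B AND W AND S"
print(parse(tokenize(workaround)) == parse(tokenize(no_parens)))  # True

# The grouped query does depend on its parentheses.
grouped = "(A OR B) AND W AND S"
stripped = grouped.replace("(", "").replace(")", "")  # "A OR B AND W AND S"
image = {"A", "Well3", "S"}  # on plate A, but not in Well2
print(evaluate(parse(tokenize(grouped)), image))   # False
print(evaluate(parse(tokenize(stripped)), image))  # True
```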

Regards,
Dominik