
Dataset linked to FileAnnotation/OriginalFile

Posted: Wed Mar 28, 2012 7:33 pm
by chriswood
Hi,

I have XML and text files attached to datasets, and I want to use the search service to search the text in these files and return the dataset (or dataset id).

The search is working, as far as finding text in the file:
Code:
search = blitzcon.createSearchService()
search.onlyTypes(("FileAnnotation",))
search.setAllowLeadingWildcard(0)
search.setCaseSentivice(0)
search.byFullText("file.contents:this <OR> that")


This returns a list that usually matches what I searched for.

To find the link between the dataset and the file, I am using this query:

Code:
id = i2.id.val   # id of the file annotation
id2 = i2.getFile().getId().getValue()  # id of the OriginalFile
q = blitzcon.getQueryService()
s = '''select ds from DatasetAnnotationLink as ds
            join fetch ds.child as f
            join fetch ds.parent as p
            where f.id=%i or f.id=%i
       ''' % (id, id2)
res = q.findAllByQuery(s, None)


I am using both the id of the FileAnnotation and the id of the OriginalFile because DatasetAnnotationLink.child is not consistently linked to one or the other: sometimes it points to the FileAnnotation, sometimes to the OriginalFile.

So I have 2 questions:
1) Am I going about this correctly, or does another method exist to find the OriginalFile-Dataset link?
2) Why is the DatasetAnnotationLink child link not consistent between OriginalFile and FileAnnotation?

Thanks,
Chris

Re: Dataset linked to FileAnnotation/OriginalFile

Posted: Wed Mar 28, 2012 8:06 pm
by jmoore
Hi Chris,
chriswood wrote: To find the link between the dataset and the file, I am using this query

Code:
id = i2.id.val   # id of the file annotation
id2 = i2.getFile().getId().getValue()  # id of the OriginalFile
q = blitzcon.getQueryService()
s = '''select ds from DatasetAnnotationLink as ds
            join fetch ds.child as f
            join fetch ds.parent as p
            where f.id=%i or f.id=%i
       ''' % (id, id2)
res = q.findAllByQuery(s, None)


I am using both the id of the FileAnnotation and the id of the OriginalFile because DatasetAnnotationLink.child is not consistently linked to one or the other: sometimes it points to the FileAnnotation, sometimes to the OriginalFile.



I would guess that if this works, it's a coincidence. DatasetAnnotationLink.child can only ever point at an Annotation (including FileAnnotations), never at an OriginalFile. Can you debug/print out the ids of the FileAnnotations and the OriginalFiles that you are getting back? Other than the two "f.id=%i" conditions, your query looks fine.
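
As a rough illustration of that point, here is a minimal sketch of the same lookup restricted to the FileAnnotation id alone. The helper names build_link_query and dataset_ids_for_annotation are hypothetical, the query service is assumed to be the object returned by blitzcon.getQueryService(), and the link objects are assumed to expose getParent() with the same getId().getValue() accessors used in the original snippet. It has not been run against a server.

```python
def build_link_query(annotation_id):
    """Build HQL that matches links whose child is the given annotation."""
    ann_id = int(annotation_id)  # coerce to int so only a number is interpolated
    return (
        "select lnk from DatasetAnnotationLink as lnk "
        "join fetch lnk.child as ann "
        "join fetch lnk.parent as ds "
        "where ann.id = %d" % ann_id
    )


def dataset_ids_for_annotation(query_service, annotation_id):
    """Return the ids of all datasets linked to one annotation.

    query_service is assumed to provide findAllByQuery(hql, params),
    as in the snippet quoted above.
    """
    links = query_service.findAllByQuery(build_link_query(annotation_id), None)
    # Assumption: each link exposes getParent(), which returns the Dataset.
    return [lnk.getParent().getId().getValue() for lnk in links]
```

With the objects from the quoted code, the call would look like dataset_ids_for_annotation(blitzcon.getQueryService(), i2.id.val). Only the annotation id is used; the OriginalFile id plays no part in the link.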

Cheers,
~Josh

Re: Dataset linked to FileAnnotation/OriginalFile

Posted: Thu Mar 29, 2012 1:51 pm
by chriswood
Hi Josh,

Yes, by coincidence, the two files are attached to the same dataset, which is why the results seemed odd.

My search was not returning what I expected. After digging a little deeper, I found that some of my XML and other text files have the MIME type application/octet-stream in the originalfile table of the database.
I previously wrote a script for attaching files, but didn't bother setting the mime-type properly.
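
For anyone hitting the same problem, one possible way for an attachment script to pick a MIME type is to guess it from the file name with Python's standard mimetypes module. This is only a sketch: guess_mimetype is a hypothetical helper name, and storing the result in the OriginalFile's mimetype field is left to the attaching script.

```python
import mimetypes


def guess_mimetype(filename, default="application/octet-stream"):
    """Guess a MIME type from the file extension.

    Falls back to a generic binary type when the extension is unknown.
    """
    mimetype, _encoding = mimetypes.guess_type(filename)
    return mimetype or default
```

A file such as notes.txt would get a text type this way, while a file with an unrecognised extension would still end up as application/octet-stream and would need to be set by hand.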

Thanks,
Chris