This is the mail archive of the xsl-list@mulberrytech.com mailing list .



Re: combine xml files


Hi, Tom,

You are really good! All your assumptions are correct.

One small point: each search record IS an xml file. We are actually
searching over xml records (indexed, of course), and each search result is an
xml record. That's why, when you check two search results on the search result
screen, the file C9A1876A75333C9.tomcat1 has two entries, each pointing to the
full path of an xml file, as described in my previous email.

Thanks a lot for your help.

Ming

"Thomas B. Passin" wrote:

> [Ming]
> >
> > I think I can make it more clear with an example:
> >
>
> Good.  Let me summarize what I think I understand:
>
> 1) Each search record is saved in a single xml file.
>
> 2) All contents of any one of these xml files pertain to a single work.
>
> 3) A single xml file may contain data obtained from several sources (the
> "db" values).
>
> 4) All information relevant to a particular search result is contained in a
> single xml file.
>
> 5) For formatting, reliability, or other reasons, information from a
> particular source may be preferred over that from another (the db preference
> order).  The preferred source may be different for titles than for authors.
>
> 6) The data from the most preferred source available is the data to be
> displayed.
>
> Now to check out a few things I am assuming:
>
> a) The db preferences will be the same for all xml files.
>
> b) The db preferences will either not change over searches, or only change
> infrequently.
>
> c) The number of different dbs is small and will always be known before a
> search is processed (in case we want to hard-code them).
>
> If all these things are correct, it should be fairly easy, modulo the time
> needed to process 1000 files.
>
> Let us know if these things are correct.
>
> Tom P
>
> > My saved search file is named C9A1876A75333C9.tomcat1 (the session id). Each
> > entry is saved to this file after a user clicks the check box in front of
> > each search result.
> >
> > In this file, the entries are like these:
> > /records/sci01/1082-6068/30/1/69_DOU-PSOCFPGCWPIIC
> > /records/sci02/0254-3052/24/10/892_BAI-SDJPLGGVRP
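As a minimal sketch of the first step, reading the marked entries back out of the session file, the Python below assumes (hypothetically) that the file holds one record path per line, as in the two entries shown. The names `parse_marked_entries` and `read_marked_entries` are illustrative only, not part of Ming's servlet.

```python
from pathlib import Path


def parse_marked_entries(text):
    """Split session-file text into record paths.

    Assumption: one path per line; blank lines and surrounding
    whitespace are ignored.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_marked_entries(session_file):
    """Read a session file such as C9A1876A75333C9.tomcat1 (hypothetical helper)."""
    return parse_marked_entries(Path(session_file).read_text(encoding="utf-8"))


sample = """/records/sci01/1082-6068/30/1/69_DOU-PSOCFPGCWPIIC
/records/sci02/0254-3052/24/10/892_BAI-SDJPLGGVRP
"""
print(parse_marked_entries(sample))
```

Each returned path would then be opened and parsed as one xml record.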
> >
> > Each entry is an xml file, and each xml file has the following format:
> >
> > <xml>
> >   <db1>
> >     <jauthor>
> >       <author db="db1">Smith, J</author>
> >       <author db="db1">Mou, S</author>
> >     </jauthor>
> >     <jtitle>
> >       <title db="db1">Preliminary study on network (II)</title>
> >     </jtitle>
> >   </db1>
> >
> >   <db2>
> >     <jauthor>
> >       <author db="db2">Smith, JR</author>
> >       <!-- note: since it's the same article, the author is the same,
> >            but it is displayed differently in each database -->
> >       <author db="db2">Mou, ST</author>
> >     </jauthor>
> >     <jtitle>
> >       <title db="db2">Preliminary Study on Network (II)</title>
> >       <!-- as with the authors: same article, but the display title
> >            is slightly different -->
> >     </jtitle>
> >   </db2>
> > </xml>
> >
> > Here is my preference file (it can be in any format; here I just put it in
> > a space-delimited text file):
> > filename: DbPref.txt
> > content:
> > title: db2 db1 db3
> > author: db1 db3 db2
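A sketch of loading this preference file into a lookup table, assuming exactly the `field: db db db` layout of DbPref.txt shown above, with the most preferred db first. The function name is hypothetical; as Ming says, any other format would serve equally well.

```python
def parse_preferences(text):
    """Parse 'field: dbA dbB ...' lines into {field: [dbA, dbB, ...]}.

    List order is preference order, most preferred first.
    Lines without a colon (including blank lines) are skipped.
    """
    prefs = {}
    for line in text.splitlines():
        field, sep, dbs = line.partition(":")
        if not sep:
            continue
        prefs[field.strip()] = dbs.split()
    return prefs


DBPREF_TXT = """title: db2 db1 db3
author: db1 db3 db2
"""
print(parse_preferences(DBPREF_TXT))
```

Since Tom's assumptions (a) and (b) hold, this table can be loaded once and reused for every record rather than re-read per request.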
> >
> > In fact, there are about 6 dbs (db1 to db6), and each xml file (i.e. each
> > record) can appear in one or more of them.
> >
> > So, my job is to display something like this on the website:
> > Title: Preliminary Study on Network (II)  <!-- note: this title comes from
> > db2, since db2 is the preferred database for displaying titles -->
> > Author: Smith, J; Mou, S  <!-- note: the authors come from db1, since db1 is
> > the preferred database for displaying authors -->
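The selection rule itself (take each field from the first db in the preference list that actually has that field in the record) can be sketched with Python's standard `xml.etree.ElementTree`. This assumes attribute values are quoted (`db="db1"`): an unquoted value such as `db=db1` is not well-formed XML and a conforming parser will reject it. `pick_preferred` is a hypothetical name, and the preference table is hard-coded here for brevity.

```python
import xml.etree.ElementTree as ET


def pick_preferred(root, path, order):
    """Return (db, texts) for the first db in `order` with elements at `path`.

    `path` is relative to a db section, e.g. "jtitle/title".
    Returns (None, []) if none of the listed dbs has any.
    """
    for db in order:
        texts = [el.text.strip() for el in root.findall(f"{db}/{path}")
                 if el.text and el.text.strip()]
        if texts:
            return db, texts
    return None, []


RECORD = """<xml>
  <db1>
    <jauthor>
      <author db="db1"> Smith, J</author>
      <author db="db1"> Mou, S </author>
    </jauthor>
    <jtitle><title db="db1"> Preliminary study on network (II) </title></jtitle>
  </db1>
  <db2>
    <jauthor>
      <author db="db2"> Smith, JR </author>
      <author db="db2"> Mou, ST </author>
    </jauthor>
    <jtitle><title db="db2"> Preliminary Study on Network (II) </title></jtitle>
  </db2>
</xml>"""

PREFS = {"title": ["db2", "db1", "db3"], "author": ["db1", "db3", "db2"]}

root = ET.fromstring(RECORD)
title_db, titles = pick_preferred(root, "jtitle/title", PREFS["title"])
author_db, authors = pick_preferred(root, "jauthor/author", PREFS["author"])
print("Title: " + titles[0])             # Title: Preliminary Study on Network (II)
print("Author: " + "; ".join(authors))   # Author: Smith, J; Mou, S
```

Because the loop stops at the first db that has data, a record that lacks the top-ranked db simply falls through to the next one.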
> >
> > I've thought about this over and over again, and I think the approach you
> > mentioned may be a good idea. What I still need to do is add the preference
> > information to the xml (to do that, I may need to process each xml file in
> > my java servlet and find the preferred db). Something like:
> > <files>
> > <file title="db2" author="db1"> xml file 1 </file>
> > <file title="db2" author="db3"> xml file 2 </file>   <!-- note: the record
> > in xml file 2 is in db2 and db3 -->
> > </files>
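To build a `<files>` wrapper like this, the servlet only needs to know, for each record, which db sections actually contain a title or an author. The sketch below does that step in Python rather than Java, purely for illustration; the input shape (a list of path and xml-text pairs), the `FIELDS` mapping, and all function names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a preference-file field to its path inside a db section.
FIELDS = {"title": "jtitle/title", "author": "jauthor/author"}


def preferred_db(root, field, order):
    """First db in `order` whose section in `root` contains `field`, else None."""
    for db in order:
        if root.find(f"{db}/{FIELDS[field]}") is not None:
            return db
    return None


def build_files_doc(records, prefs):
    """Return a <files> element with one <file> child per (path, xml_text) pair.

    Each <file> gets title="..." and author="..." naming the preferred db
    (omitted if no listed db has that field); its text is the record path.
    """
    files = ET.Element("files")
    for path, xml_text in records:
        root = ET.fromstring(xml_text)
        attrs = {}
        for field in FIELDS:
            db = preferred_db(root, field, prefs.get(field, []))
            if db is not None:
                attrs[field] = db
        ET.SubElement(files, "file", attrs).text = path
    return files


def sample_record(*dbs):
    """Build a tiny test record with one title and one author in each given db."""
    body = "".join(
        f'<{db}><jauthor><author db="{db}">A</author></jauthor>'
        f'<jtitle><title db="{db}">T</title></jtitle></{db}>'
        for db in dbs
    )
    return f"<xml>{body}</xml>"


PREFS = {"title": ["db2", "db1", "db3"], "author": ["db1", "db3", "db2"]}
doc = build_files_doc(
    [("xml file 1", sample_record("db1", "db2")),
     ("xml file 2", sample_record("db2", "db3"))],
    PREFS,
)
print(ET.tostring(doc, encoding="unicode"))
```

One possible use of the result: an XSLT stylesheet could load each record with the `document()` function and use the `title` and `author` attributes to decide which db's elements to output.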
> >
> > I don't think I answered your question properly, but I really didn't know
> > how to give a proper answer, so I gave you this complete scenario. I hope it
> > helps to clarify the problem.
> >
> > Thanks a lot.
> >
> > Ming
> >
> > "Thomas B. Passin" wrote:
> >
> > > We're getting closer, I think.  If a work can appear, with a different
> > > format, in more than one xml file, then how can you tell when an entry in
> > > one file is for the same work as an entry in another file?  You need to be
> > > able to do that, it would seem, or you won't be able to match up entries.
> > >
> > > What data is contained in any one xml file?  Is it data on one single work
> > > from one single database?  Is it many works, but all from one database?  Is
> > > it one single work, but possibly from many databases?
> > >
> > > Are you expecting to get a fast response when looking through 1000 files
> > > for each query?  How fast?  Or can it be a batch process?  Even doing a
> > > directory listing of 1000 files can take some time, depending on your
> > > system, and that's not doing any processing on the files.
> > >
> > > Cheers,
> > >
> > > Tom P
> > >
> > > [Ming]
> > >
> > > >
> > > > To make my explanation easier to understand (sorry for being misleading),
> > > > I'm going to describe my task.
> > > >
> > > > Actually, I'm implementing the "View Marked" function after a search. The
> > > > marked search results are saved in a temporary file with the session id as
> > > > the file name, and each entry in the file is a complete path to an xml
> > > > file. So the number of xml files listed in the temporary file can vary
> > > > from 1 to 1000. After the user clicks the View Marked button, I need to
> > > > display the title and author information for each xml file to the user.
> > > > So it's a dynamic process.
> > > >
> > > > The title format in each xml file is slightly different for each
> > > > database, and so are other fields such as author. That's why we have a
> > > > preference list for titles, authors, etc.: different groups of people
> > > > prefer different display formats for titles, authors, etc.
> > > >
> > > > Yes, I need to look through each xml record, since some titles appear
> > > > in only one database and some appear in more than one, so the <db*> tags
> > > > differ from record to record. And I need to find the most preferred one
> > > > to display from my preference list.
> > > >
> > > ...
> > >
> > >  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
> >
> >
> >
>



