Patent Informatics Group's Weblog

August 1, 2008

4. Semantic solutions

Filed under: Uncategorized — Tags: — intellisemantic @ 1:10 pm

Semantic solutions include knowledge bases in their architecture.

These knowledge bases connect the form of words with their meanings, and relate different words and different meanings through different kinds of relationships, for example subset and superset.

Such knowledge bases are generally called ontologies; thesauri and taxonomies are specific kinds of ontology. They usually encode domain knowledge; for patents they can also encode the document structure, since patents follow a specific one.

Using the meanings of words in addition to their forms makes it possible to deal with synonyms (i.e. different words with similar meanings), homographs (i.e. words having the same form but different meanings) and, more generally, with multilinguality.
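As a minimal sketch of this idea, the toy knowledge base below links word forms to meanings, and meanings to broader meanings (the subset/superset relation). All words, concept names and helper functions are hypothetical and chosen only for illustration; a real ontology would be far larger and richer.

```python
# Toy knowledge base: every name below is illustrative, not taken from a real ontology.

# Word form -> set of concept identifiers (meanings).
WORD_TO_CONCEPTS = {
    "car": {"motor_vehicle"},
    "automobile": {"motor_vehicle"},
    "voiture": {"motor_vehicle"},          # French word form, same meaning
    "crane": {"lifting_machine", "bird"},  # one form, two meanings
}

# Concept -> broader concept (subset -> superset).
BROADER = {
    "motor_vehicle": "vehicle",
    "vehicle": "machine",
    "lifting_machine": "machine",
    "bird": "animal",
}


def meanings(word):
    """Return the set of meanings recorded for a word form."""
    return WORD_TO_CONCEPTS.get(word.lower(), set())


def synonyms(word):
    """Return the other word forms that share at least one meaning with `word`."""
    target = meanings(word)
    return {w for w, concepts in WORD_TO_CONCEPTS.items()
            if w != word.lower() and concepts & target}


def is_homograph(word):
    """A word form with more than one recorded meaning is a homograph."""
    return len(meanings(word)) > 1


def is_subset_of(concept, ancestor):
    """Follow the broader-concept links to test the subset/superset relation."""
    while concept in BROADER:
        concept = BROADER[concept]
        if concept == ancestor:
            return True
    return False
```

In this sketch `synonyms("car")` finds "automobile" and "voiture" because they point to the same meaning, which also shows how a shared concept layer can bridge languages, while `is_homograph("crane")` flags a form that needs disambiguation.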

Given the wide diversity of semantic solutions, it is appropriate to differentiate them further by the technology used and by the specific function.

Examples of patent information functions which can benefit from semantics are:

a) Semantic search improvement. Using meanings increases Recall, since other words having the same meaning can be captured, and increases Precision as well, since identical words with different meanings can be disambiguated.

b) Semantic faceted refinement of a long list of collected documents. In this sequential approach a high-recall query is applied first, which may also return irrelevant documents; then, in order to achieve precision as well, the downloaded documents are refined locally by semantic facets.

c) Semantic support for analysis. In this case semantics supports the user in analysing a specific patent from a specific point of view, either IP related or technical.

Of course, different kinds of functions require different kinds of technologies: for example, semantic search improvement benefits from “shallow” semantics, whilst semantic support for analysis can also include “deep” semantics, i.e. some form of reasoning.
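Functions a) and b) can be sketched with a few lines of "shallow" semantics. The example below is only an illustration under stated assumptions: the four documents, the synonym table and the facet labels are hypothetical, and the facet of each document is assumed to have been assigned beforehand by some semantic annotation step.

```python
# Hypothetical mini-corpus and synonym table, for illustration only.
DOCUMENTS = {
    "D1": {"text": "a car with an electric motor", "facet": "vehicle"},
    "D2": {"text": "an automobile braking system", "facet": "vehicle"},
    "D3": {"text": "a crane for lifting containers", "facet": "machine"},
    "D4": {"text": "nesting habits of the crane", "facet": "animal"},
}

SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}


def keyword_search(term):
    """Plain search: match the word form only."""
    return {doc_id for doc_id, doc in DOCUMENTS.items()
            if term in doc["text"].split()}


def semantic_search(term):
    """Function a): expand the query with synonyms to raise Recall."""
    terms = {term} | SYNONYMS.get(term, set())
    hits = set()
    for t in terms:
        hits |= keyword_search(t)
    return hits


def refine_by_facet(doc_ids, facet):
    """Function b): keep only the documents tagged with the chosen meaning."""
    return {d for d in doc_ids if DOCUMENTS[d]["facet"] == facet}
```

Here `keyword_search("car")` finds only D1, while `semantic_search("car")` also finds D2. A search for "crane" returns both the machine and the bird, and `refine_by_facet(..., "machine")` then keeps D3 only, which is the sequential high-recall-then-refine pattern described in b).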

More details and some specific adoption examples are given in the full presentation, which can be accessed from http://www.intellipatent.eu/Documents/PatentInformatics2.pdf

5. Other suggested topics

Filed under: Uncategorized — intellisemantic @ 1:08 pm

This section is simply a root allowing participants to suggest related topics not yet presented in other sections.

3. Patent search benchmarking

Filed under: Uncategorized — Tags: — intellisemantic @ 1:04 pm

An opportunity for patent informatics today is to benefit from the research of the Information Retrieval (I/R) community in order to give more substance to claims of new technical advances.

Information Retrieval uses Precision and Recall in order to assess different approaches; these have to be measured against specifically agreed benchmarks, which is a typical requirement for every serious activity.

Precision and Recall typically exhibit opposite trends: as the number of retrieved documents grows, the number of relevant documents retrieved increases (higher Recall), but the percentage of retrieved documents that are relevant decreases (lower Precision).
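The opposite trends can be seen in a small worked example. The sketch below uses the standard definitions (Precision is the share of retrieved documents that are relevant, Recall is the share of relevant documents that are retrieved); the ranked list and the relevance judgements are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranked result list and relevance judgements.
ranking = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]
relevant = {"P1", "P2", "P4", "P7"}

for k in (2, 4, 8):
    p, r = precision_recall(ranking[:k], relevant)
    print(f"top {k}: precision={p:.2f} recall={r:.2f}")
# top 2: precision=1.00 recall=0.50
# top 4: precision=0.75 recall=0.75
# top 8: precision=0.50 recall=1.00
```

Retrieving more of the ranked list raises Recall from 0.50 to 1.00 while Precision falls from 1.00 to 0.50.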

The objective of new algorithms is to provide better performance in both Precision and Recall; this can be accomplished, for example, by using sequential methods.

In summary, patent information retrieval is a specific and well-characterized area of document information retrieval, owing to the predefined structure of patent documents, to the specific language used and to the different subcases of patent searching, which are characterized by different costs of missing important information and hence by a different balance between Precision and Recall.
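One common way to express such a balance in a single number is the weighted F-measure used in Information Retrieval. It is not named in the presentation and is shown here only as an assumption about how the balance could be quantified; the two example systems and the choice of beta are hypothetical.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta > 1 gives more weight to Recall (missing a document is costly),
    beta < 1 gives more weight to Precision.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Two hypothetical systems with mirrored strengths.
high_recall = (0.5, 1.0)     # (precision, recall)
high_precision = (1.0, 0.5)

# With beta = 2 the high-recall system scores better,
# with beta = 0.5 the high-precision system does.
print(f_beta(*high_recall, beta=2) > f_beta(*high_precision, beta=2))
print(f_beta(*high_recall, beta=0.5) < f_beta(*high_precision, beta=0.5))
```

A search subcase where a missed document is very costly would be scored with a larger beta, while a quick overview search could use a smaller one.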

The most significant and specific activities known today for identifying patent I/R benchmarks are carried out by NTCIR in Japan, which regularly collects, updates and experiments with Information Retrieval test cases for different tasks. Some of these are patent specific, though focused on Far Eastern languages (Japanese, Chinese, Korean) besides English.

In addition, NTCIR collects benchmarks in patent classification (or mapping), i.e. in the refinement of a list of retrieved patents classified by problem solved and approach used, which can also be considered a closely related topic.

Other activities, or at least plans, can also be found on the web, and should of course be encouraged.

Some more information can be found in the full presentation, which can be accessed from http://www.intellipatent.eu/Documents/PatentInformatics2.pdf

2. Patent information challenges and opportunities

Filed under: Uncategorized — Tags: , , , — intellisemantic @ 1:02 pm

The most significant patent information challenges today come from the user side, since:

· the increasing diversity of users, for example in R&D and company management, requires the integration of additional information besides patent documents

· the focus on new value-added tasks besides patent searching, such as patent analysis and patent monitoring, requires new functions too

· the increasing number of people accessing patent information requires more intuitive user interfaces

· today's pressure for more efficient information and knowledge management suggests at least partially automating some low-level functions

From the architectural point of view, these requirements suggest a clearer separation between the databases and the application, which is facilitated by the adoption of web services and mash-up architectures, and a wider adoption of semantic solutions in different process stages, from the search stage to the patent list refinement stage to the patent analysis stage.

The full contribution provides a more detailed list of challenges, including the increasing multilinguality of patent information, especially in Asian languages, and the increasing need for patent-specific search benchmarks, in order to assess different solutions fairly.

It can be accessed from http://www.intellipatent.eu/Documents/PatentInformatics2.pdf and also includes a list of current research trends.

Web information sites suggested as most relevant are:

http://www.slis.tsukuba.ac.jp/~fujii/pat_proc_pub.html a directory of events and papers about patent informatics, including references to the NTCIR workshops (http://research.nii.ac.jp/ntci), to the ACL workshop on Patent Processing (http://www.slis.tsukuba.ac.jp/~fujii/acl2003ws.html) and to the ACM SIGIR Workshop on patent retrieval (http://research.nii.ac.jp/ntcir/sigir2000ws)

http://www.sciencedirect.com/science?_ob=PublicationURL&_tockey=%23TOC%235948%232007%23999569994%23650727%23FLA%23&_cdi=5948&_pubType=J&view=c&_auth=y&_acct=C000010078&_version=1&_urlVersion=0&_userid=128923&md5=1ec4fecb31ad11f5d7b94994f9e3bb12 a special issue on patent processing

http://www.ir-facility.org hosting an annual symposium for patent retrieval; presentations

http://www.patexpert.org the Patexpert EU funded project site

1. This blog: objective and reasons

Filed under: Uncategorized — Tags: , — intellisemantic @ 12:58 pm

The objective of this blog is to disseminate and collect information on new and advanced issues about patent information tools.

This area is far from being consolidated, as research efforts testify. To start the discussion, the following contributions summarize some background information and key issues. A more detailed presentation of the background and key issues can be downloaded from http://www.intellipatent.eu/Documents/PatentInformatics.pdf

Of course this is only an enabler for the kick-off: all active users are asked to contribute and also to suggest other related topics.
