Difference between revisions of "QAWiki:Guide/Mentions"

From QAWiki
(restructuring the document to make it faster to review on skim)
Line 1: Line 1:
== Importance of mentions ==
== Importance of mentions ==


What is a mention? Take the question "[[Item:Q19|''What is the capital of Ireland?'']]". We can identify two types of mentions in this case: (1) the phrase ''Ireland'' is an '''entity mention''' that refers to the Wikidata entity (aka item) [https://www.wikidata.org/wiki/Q27 Q27]; (2) the phrase ''capital'' is a '''property mention''' that refers to the Wikidata property [https://www.wikidata.org/wiki/Property:P36 P36]. Adding mentions relates (sub)phrases of questions and question alises to elements of knowledge-bases such as Wikidata.
What is a "mention"? Take the question "[[Item:Q19|''What is the capital of Ireland?'']]". We can identify two types of mentions in this case: (1) the phrase ''Ireland'' is an '''entity mention''' that refers to the Wikidata entity (aka item) [https://www.wikidata.org/wiki/Q27 Q27]; (2) the phrase ''capital'' is a '''property mention''' that refers to the Wikidata property [https://www.wikidata.org/wiki/Property:P36 P36]. Adding mentions relates (sub)phrases of questions and question alises to elements of knowledge-bases such as Wikidata.


If you can add just a question, or a question and a query, then that's very welcome. However, adding mentions can be very useful for Question Answering systems in order to know which parts of the question address which entity or property on knowledge-bases such as Wikidata. This is a non-trivial task to do automatically, and in a highly precise and complete way. For example, it may not be immediately obvious if ''Ireland'' refers to [https://www.wikidata.org/wiki/Q27 Q27], [https://www.wikidata.org/wiki/Q22890 Q22890] or [https://www.wikidata.org/wiki/Q1140152 Q1140152]. Knowing how entities or properties are mentioned enable Question Answering systems to better generalise to answering similar questions that involve, for example, a different entity and property. Here we will provide guidelines on how to add mentions linking phrases of questions to specific entities and properties on Wikidata.
If you can add just a question, or a question and a query, then that's very welcome. However, adding mentions can be very useful for Question Answering systems in order to know which parts of the question address which entity or property on knowledge-bases such as Wikidata. This is a non-trivial task to do automatically, and in a highly precise and complete way. For example, it may not be immediately obvious if ''Ireland'' refers to [https://www.wikidata.org/wiki/Q27 Q27], [https://www.wikidata.org/wiki/Q22890 Q22890] or [https://www.wikidata.org/wiki/Q1140152 Q1140152]. Knowing how entities or properties are mentioned enable Question Answering systems to better generalise to answering similar questions that involve, for example, a different entity and property. Here we will provide guidelines on how to add mentions linking phrases of questions to specific entities and properties on Wikidata.
Line 11: Line 11:
== General guidelines for mentions ==
== General guidelines for mentions ==


We start with some general guidelines for mentions:
We start with some general guidelines for mentions applicable to both entity and property mentions.
 
=== Phrases are substrings of questions or their aliases ===
 
Select phrases that are substrings of questions or question aliases and that end at a word boundary (e.g., if a phrase is plural, keep it plural; do not split words). Be sure to select the correct language (we recommend preferring higher-level language codes like <code>fr</code> unless otherwise justified).
 
=== Be liberal when adding mentions ===
 
We recommend to be quite liberal when adding mentions. There are in some cases many ways to write the same query that may use the same combinations of entities and mentions. For example, for the question "[[Item:Q172|''Which U.S. president had the most spouses?'']]", we can offer entity mentions for ''U.S.'' ([https://www.wikidata.org/wiki/Q30 Q30]) and ''president'' ([https://www.wikidata.org/wiki/Q30461 Q30461]) even if there is a specific entity for ''U.S. president'' ([https://www.wikidata.org/wiki/Q11696 Q11696]). Likewise, for a mention like ''wives'', you can add the property [https://www.wikidata.org/wiki/Property:P26 P26 (spouse)], which though not expressing exactly the same property, is likely to be used in such a query given the lack of a specific ''wife'' property in Wikidata.
 
=== A mention can link to many entities and/or properties ===


* Select phrases that are substrings of questions or question aliases and that end at a word boundary (e.g., if a phrase is plural, keep it plural). Be sure to select the correct language.
* We recommend to be quite liberal when adding mentions. There are in some cases many ways to write the same query that may use the same combinations of entities and mentions. For example, for the question "[[Item:Q172|''Which U.S. president had the most spouses?'']]", we can offer entity mentions for ''U.S.'' ([https://www.wikidata.org/wiki/Q30 Q30]) and ''president'' ([https://www.wikidata.org/wiki/Q30461 Q30461]) even if there is a specific entity for ''U.S. president'' ([https://www.wikidata.org/wiki/Q11696 Q11696]). Likewise, for a mention like ''wives'', you can add the property [https://www.wikidata.org/wiki/Property:P26 P26 (spouse)], which though not expressing exactly the same property, is likely to be used in such a query given the lack of a specific ''wife'' property in Wikidata.
* The same phrase can have multiple relevant mentions, and can mix entity and property mentions. For example, in the question "[[Item:Q19|''What is the capital of Ireland?'']]", the phrase ''capital'' corresponds not only to the property [https://www.wikidata.org/wiki/Property:P36 P36], but also the entity [https://www.wikidata.org/wiki/Q5119 Q5119 (capital city)]. We recommend, whenever possible, to add any mentions or links that might be useful.
* The same phrase can have multiple relevant mentions, and can mix entity and property mentions. For example, in the question "[[Item:Q19|''What is the capital of Ireland?'']]", the phrase ''capital'' corresponds not only to the property [https://www.wikidata.org/wiki/Property:P36 P36], but also the entity [https://www.wikidata.org/wiki/Q5119 Q5119 (capital city)]. We recommend, whenever possible, to add any mentions or links that might be useful.
* In the case of questions in sublanguage variants, we recommend only adding the mentions that change from the general language. For example, in [[Item:Q245|''What are the colors of the French flag?'' (<code>en</code>)]] and [[Item:Q245|''What are the colours of the French flag?'' (<code>en-UK</code>)]], it is sufficient to add <code>en-UK</code> mentions for ''colours'' and not repeat mentions for ''French'', ''flag'', etc., if defined already for <code>en</code>.


=== Guidelines for entities ===
=== Avoid repeating mentions across language dialects ===


* Entities do not need to be ''named entities'', i.e., entities named by a proper noun (as often capitalised in many languages). We recommend also linking phrases like ''president'' ([https://www.wikidata.org/wiki/Q30461 Q30461]).
In the case of questions in different dialects of the same language, we recommend only adding the mentions that change from the general language. For example, in [[Item:Q245|''What are the colors of the French flag?'' (<code>en</code>)]] and [[Item:Q245|''What are the colours of the French flag?'' (<code>en-UK</code>)]], it is sufficient to add <code>en-UK</code> mentions for ''colours'' and not repeat mentions for ''French'', ''flag'', etc., if defined already for <code>en</code>.
 
=== Use discontinuous mention phrases ===
 
A key mention might not be continuous in a particular query. For example, in the question "[[Item:Q296|''What volcanos in Iceland are active?''], an important mention is for the Wikidata entity [https://www.wikidata.org/wiki/Q1330974 Q1330974 (''active volcano'')]. In this case, we recommend using the symbol <code>*</code> to represent a wildcard of one or more characters in the phrase of the mention, such that the phrase for the mention becomes "''volcanos * active''", keeping the order of words in the phrase. Similar cases can apply for property mentions. For example, in the question alias "[[Item:Q112|''On what date was Hey Jude published?'']]", we can link the discontinuous mention "''date * published''" to the Wikidata property [https://www.wikidata.org/wiki/Property:P577 P577 (''publication date'')]. In all such cases, <code>*</code> replaces one or more words in the question.
 
== Guidelines for entity mentions ==
 
* Entities do not need to be ''named entities'', i.e., entities named with a proper noun like ''Gabriel Boric'' ([https://www.wikidata.org/wiki/Q16297876 Q16297876]) (as often capitalised in many languages). We recommend also linking phrases like ''president'' ([https://www.wikidata.org/wiki/Q30461 Q30461]).
* Entity phrases do not need to be nouns in base form. For example, in the case of [[Item:Q245|''What are the colors of the French flag?''], we recommend linking the adjectival phrase ''French'' to [https://www.wikidata.org/wiki/Q142 Q142 (France)]; we recommend using the phrase as it appears in the question, e.g., using ''French'' rather than ''France'', keeping nouns phrases plural if they appear so in the question, etc.
* Entity phrases do not need to be nouns in base form. For example, in the case of [[Item:Q245|''What are the colors of the French flag?''], we recommend linking the adjectival phrase ''French'' to [https://www.wikidata.org/wiki/Q142 Q142 (France)]; we recommend using the phrase as it appears in the question, e.g., using ''French'' rather than ''France'', keeping nouns phrases plural if they appear so in the question, etc.
* We recommend adding ''overlapping mentions'', i.e., phrases referring to entities that overlap in the question (often one will be contained in a larger phrase). For example, for the question "[[Item:Q172|''Which U.S. president had the most spouses?'']]", we recommend adding entity mentions for ''U.S.'' ([https://www.wikidata.org/wiki/Q30 Q30]), ''president'' ([https://www.wikidata.org/wiki/Q30461 Q30461]) and ''U.S. president'' ([https://www.wikidata.org/wiki/Q11696 Q11696]).
* We recommend adding ''overlapping mentions'', i.e., phrases referring to entities that overlap in the question (often one will be contained in a larger phrase). For example, for the question "[[Item:Q172|''Which U.S. president had the most spouses?'']]", we recommend adding entity mentions for ''U.S.'' ([https://www.wikidata.org/wiki/Q30 Q30]), ''president'' ([https://www.wikidata.org/wiki/Q30461 Q30461]) and ''U.S. president'' ([https://www.wikidata.org/wiki/Q11696 Q11696]).
== Guidelines for property mentions ==
* The mentions for properties can oftentimes be a bit more relaxed or indirect than in the case of entities. For example, a mention like ''wife'' linked to [https://www.wikidata.org/wiki/Property:P26 P26 (spouse)], which though not expressing exactly the same property, is likely to be used in such a query given the lack of a specific ''wife'' property in Wikidata.
* Related to the previous point,

Revision as of 19:37, 5 December 2022

Importance of mentions

What is a "mention"? Take the question "What is the capital of Ireland?". We can identify two types of mentions in this case: (1) the phrase Ireland is an entity mention that refers to the Wikidata entity (aka item) Q27; (2) the phrase capital is a property mention that refers to the Wikidata property P36. Adding mentions relates (sub)phrases of questions and question aliases to elements of knowledge-bases such as Wikidata.

If you can add just a question, or a question and a query, then that's very welcome. However, adding mentions can be very useful for Question Answering systems, as mentions tell them which parts of the question address which entity or property on knowledge-bases such as Wikidata. Doing this automatically, in a highly precise and complete way, is a non-trivial task. For example, it may not be immediately obvious if Ireland refers to Q27, Q22890 or Q1140152. Knowing how entities or properties are mentioned enables Question Answering systems to better generalise to answering similar questions that involve, for example, a different entity and property. Here we will provide guidelines on how to add mentions linking phrases of questions to specific entities and properties on Wikidata.

What about other knowledge-bases?

QAWiki is open to collecting questions, mentions and queries to enable question answering over other open knowledge bases. However, please do keep in mind that Wikidata offers links to a wide range of knowledge bases, including DBpedia, Wikipedia, YAGO, and many, many, more besides. Hence mentions, specifically, can be easily translated automatically via these links to these other knowledge bases. Aside from this, Wikidata offers a wider selection of entities and (curated) properties when compared to these other sources. For this reason, we currently recommend focusing on adding mentions for Wikidata. In future, we may look at ways in which mentions for other knowledge bases can be added automatically. If there are other open knowledge bases without links from Wikidata, we would rather favour adding the links to Wikidata, as they will be much more reusable (not just for QAWiki).

General guidelines for mentions

We start with some general guidelines for mentions applicable to both entity and property mentions.

Phrases are substrings of questions or their aliases

Select phrases that are substrings of questions or question aliases and that end at a word boundary (e.g., if a phrase is plural, keep it plural; do not split words). Be sure to select the correct language (we recommend preferring higher-level language codes like fr unless otherwise justified).
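
As a rough illustration of the substring and word-boundary rule, the following Python sketch checks whether a candidate phrase occurs in a question without splitting a word. The function name is_valid_phrase is hypothetical and is not part of QAWiki; the check is only an approximation of the guideline, assuming that "word boundary" means the phrase is not directly preceded or followed by a letter, digit or underscore.

```python
import re


def is_valid_phrase(phrase: str, question: str) -> bool:
    """Return True if `phrase` occurs in `question` without splitting a word.

    Hypothetical helper for illustration only; QAWiki itself may apply
    different or additional checks.
    """
    if not phrase.strip():
        return False
    # The phrase must not be directly preceded or followed by a word character.
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, question) is not None


question = "Which U.S. president had the most spouses?"
print(is_valid_phrase("spouses", question))  # plural kept as it appears
print(is_valid_phrase("spouse", question))   # cuts the word "spouses"
```

Under these assumptions, the plural form spouses is accepted, while spouse is rejected because it would split the word as written in the question.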

Be liberal when adding mentions

We recommend being quite liberal when adding mentions. In some cases there are many ways to write the same query, which may use the same combinations of entities and mentions. For example, for the question "Which U.S. president had the most spouses?", we can offer entity mentions for U.S. (Q30) and president (Q30461) even if there is a specific entity for U.S. president (Q11696). Likewise, for a mention like wives, you can add the property P26 (spouse), which, though not expressing exactly the same property, is likely to be used in such a query given the lack of a specific wife property in Wikidata.

A mention can link to many entities and/or properties

The same phrase can have multiple relevant mentions, and can mix entity and property mentions. For example, in the question "What is the capital of Ireland?", the phrase capital corresponds not only to the property P36, but also to the entity Q5119 (capital city). We recommend, whenever possible, adding any mentions or links that might be useful.

Avoid repeating mentions across language dialects

In the case of questions in different dialects of the same language, we recommend only adding the mentions that change from the general language. For example, in What are the colors of the French flag? (en) and What are the colours of the French flag? (en-UK), it is sufficient to add en-UK mentions for colours and not repeat mentions for French, flag, etc., if defined already for en.
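
One way a consumer of the data might combine general-language and dialect mentions is sketched below in Python. This is only an assumption about how such data could be used, not a description of QAWiki's implementation: the function effective_mentions and the dictionary layout are hypothetical, and the property ID P462 (color) is used for illustration and should be checked against Wikidata.

```python
def effective_mentions(question, lang, mentions_by_lang):
    """Merge base-language mentions with dialect-specific ones.

    Hypothetical helper: mentions for the base code (e.g. "en") are taken
    first, dialect mentions (e.g. "en-UK") are added on top, and only the
    phrases that actually occur in the question are kept.
    """
    base = lang.split("-")[0]
    merged = {}
    for code in (base, lang):
        merged.update(mentions_by_lang.get(code, {}))
    return {phrase: ids for phrase, ids in merged.items() if phrase in question}


# Illustrative data only; the IDs are assumptions to be verified on Wikidata.
mentions_by_lang = {
    "en": {"colors": ["P462"], "French": ["Q142"]},
    "en-UK": {"colours": ["P462"]},  # only the phrase that differs from "en"
}

print(effective_mentions(
    "What are the colours of the French flag?", "en-UK", mentions_by_lang
))
```

With this layout, the en-UK question picks up French from the en mentions and colours from the en-UK mentions, so nothing needs to be repeated across dialects.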

Use discontinuous mention phrases

A key mention might not be continuous in a particular question. For example, in the question "What volcanos in Iceland are active?", an important mention is for the Wikidata entity Q1330974 (active volcano). In this case, we recommend using the symbol * as a wildcard in the phrase of the mention, such that the phrase for the mention becomes "volcanos * active", keeping the order of words in the phrase. Similar cases can apply for property mentions. For example, in the question alias "On what date was Hey Jude published?", we can link the discontinuous mention "date * published" to the Wikidata property P577 (publication date). In all such cases, * replaces one or more words in the question.
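
To make the wildcard convention concrete, here is a minimal Python sketch that tests whether a mention phrase containing * matches a question, treating * as one or more whole words. The function matches_mention is hypothetical and only illustrates the convention described above; it assumes that words are separated by whitespace and that * always sits between two parts of the phrase.

```python
import re


def matches_mention(phrase: str, question: str) -> bool:
    """Return True if `phrase` (with optional `*` wildcards) matches `question`.

    Hypothetical helper: each `*` stands for one or more whole words, and the
    parts of the phrase must appear in the same order as in the question.
    """
    parts = [part.strip() for part in phrase.split("*")]
    if any(not part for part in parts):
        return False  # a wildcard must sit between two non-empty parts
    # Whitespace, then one or more whitespace-terminated words.
    gap = r"\s+(?:\S+\s+)+"
    pattern = r"(?<!\w)" + gap.join(re.escape(p) for p in parts) + r"(?!\w)"
    return re.search(pattern, question) is not None


print(matches_mention("volcanos * active", "What volcanos in Iceland are active?"))
print(matches_mention("date * published", "On what date was Hey Jude published?"))
print(matches_mention("active * volcanos", "What volcanos in Iceland are active?"))
```

The first two calls correspond to the examples in the text; the third reverses the word order and so does not match under this sketch.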

Guidelines for entity mentions

  • Entities do not need to be named entities, i.e., entities named with a proper noun like Gabriel Boric (Q16297876) (as often capitalised in many languages). We recommend also linking phrases like president (Q30461).
  • Entity phrases do not need to be nouns in base form. For example, in the case of "What are the colors of the French flag?", we recommend linking the adjectival phrase French to Q142 (France); we recommend using the phrase as it appears in the question, e.g., using French rather than France, keeping noun phrases plural if they appear so in the question, etc.
  • We recommend adding overlapping mentions, i.e., phrases referring to entities that overlap in the question (often one will be contained in a larger phrase). For example, for the question "Which U.S. president had the most spouses?", we recommend adding entity mentions for U.S. (Q30), president (Q30461) and U.S. president (Q11696).

Guidelines for property mentions

  • The mentions for properties can oftentimes be a bit more relaxed or indirect than in the case of entities. For example, a mention like wife can be linked to P26 (spouse), which, though not expressing exactly the same property, is likely to be used in such a query given the lack of a specific wife property in Wikidata.
  • Related to the previous point,