Difference between revisions of "QAWiki:Guide/Mentions"
m (Aidan moved page QAWiki:Model/Mentions to QAWiki:Guide/Mentions: better categorisation) |
Latest revision as of 04:22, 6 December 2022
Importance of mentions
What is a "mention"? Take the question "What is the capital of Ireland?". We can identify two types of mentions in this case: (1) the phrase "Ireland" is an entity mention that refers to the Wikidata entity (aka item) Q27; (2) the phrase "capital" is a property mention that refers to the Wikidata property P36. Adding mentions relates (sub)phrases of questions and question aliases to elements of knowledge bases such as Wikidata.
If you can add just a question, or a question and a query, then that's very welcome. However, adding mentions can be very useful for Question Answering systems in order to know which parts of the question address which entity or property on knowledge bases such as Wikidata. This is a non-trivial task to do automatically in a highly precise and complete way. For example, it may not be immediately obvious if the phrase "Ireland" in a query refers to Q27, Q22890 or Q1140152 on Wikidata. Knowing how entities or properties are mentioned enables Question Answering systems to better generalise to answering similar questions that involve, for example, a different entity and property. Here we will provide guidelines on how to add mentions linking phrases of questions to specific entities and properties on Wikidata.
What about other knowledge bases?
QAWiki is open to collecting questions, mentions and queries to enable question answering over other open knowledge bases. However, please do keep in mind that Wikidata offers links to a wide range of knowledge bases, including DBpedia, Wikipedia, YAGO, and many more besides. Hence mentions, specifically, can be easily translated automatically via these links to these other knowledge bases, particularly in the case of entities. Aside from this, Wikidata offers a wider selection of entities and (curated) properties when compared to these other sources. For this reason, we currently recommend focusing on adding mentions for Wikidata. In future, we may look at ways in which mentions for other knowledge bases can be added automatically. If there are other open knowledge bases without links from Wikidata, we would rather favour adding the links to Wikidata as such links will be much more reusable (not just for QAWiki).
General guidelines for mentions
We start with some general guidelines for mentions applicable to both entity and property mentions.
Phrases are substrings of questions or their aliases
Select phrases that are substrings of questions or question aliases and that end at a word boundary (e.g., if a phrase is plural, keep it plural; do not split words). Be sure to select the correct language (we recommend preferring higher-level language codes like fr unless otherwise justified).
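The substring rule above can be checked mechanically. The following is a minimal sketch, not part of QAWiki itself: the function name is hypothetical, and it assumes that a valid phrase should both start and end at a word boundary.

```python
import re

def is_valid_phrase(phrase: str, question: str) -> bool:
    """Return True if `phrase` occurs in `question` without splitting a word.

    Hypothetical helper: assumes phrases must start and end at word boundaries.
    """
    if not phrase.strip():
        return False
    # (?<!\w) and (?!\w) reject matches that begin or end inside a word.
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, question) is not None

question = "What volcanos in Iceland are active?"
print(is_valid_phrase("volcanos", question))  # plural kept as in the question
print(is_valid_phrase("volcano", question))   # would split the word "volcanos"
```

Under this check, "volcanos" is accepted while the singular "volcano" is rejected, since it would end in the middle of a word.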
Be liberal when adding mentions
We recommend being quite liberal when adding mentions. There are in some cases many ways to write the same query that may use the same combinations of entities and mentions. For example, for the question "Which U.S. president had the most spouses?", we can offer entity mentions for "U.S." (Q30) and "president" (Q30461) even if there is a specific entity for "U.S. president" (Q11696). Likewise, for a mention like "wives", you can add the property P26 (spouse), which, though not expressing exactly the same property, is likely to be used in such a query given the lack of a specific wife property in Wikidata. (See Super property below.)
A mention can link to many entities and/or properties
The same phrase can have multiple relevant mentions, and can mix entity and property mentions. For example, in the question "What is the capital of Ireland?", the phrase "capital" corresponds not only to the property P36 (capital), but also to the entity Q5119 (capital city). We recommend, whenever possible, adding any mentions or links that might be useful in order to interpret the question, translate it to a query, etc.
A question can import mentions from other questions
In some cases we may want to add variants of a particular question, or otherwise related questions, that share a lot of the same mentions. In this case, we can import the mentions from another question using the QAWiki property P47 (imports mention from). For example, variants of the base (ambiguous) question "What is the largest moon of the Solar System?" that specify the measure, such as "What is the largest moon of the Solar System by mass?", import mentions from that base question. Additional mentions can be added to the importing question. Imported mentions that do not appear as a phrase anywhere in a question or question alias of the indicated language can safely be ignored. Cycles of P47 (imports mention from) should be avoided.
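One way to check that a set of P47 (imports mention from) links contains no cycle is a depth-first search over the import graph. The sketch below is illustrative only: the dictionary representation and the question identifiers ("base", "by_mass") are hypothetical and are not QAWiki item IDs.

```python
from typing import Optional

def find_import_cycle(imports: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one cycle of import links as a list of IDs, or None if acyclic.

    `imports` maps a question ID to the IDs it imports mentions from
    (a hypothetical in-memory view of P47 statements).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state: dict[str, int] = {}
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        state[node] = GREY
        path.append(node)
        for target in imports.get(node, []):
            if state.get(target, WHITE) == GREY:
                # Back edge: the cycle runs from `target` to the current node.
                return path[path.index(target):] + [target]
            if state.get(target, WHITE) == WHITE:
                found = visit(target)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for node in list(imports):
        if state.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None

print(find_import_cycle({"by_mass": ["base"], "base": []}))           # no cycle
print(find_import_cycle({"by_mass": ["base"], "base": ["by_mass"]}))  # a cycle
```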
Avoid duplicating mentions across language dialects
In the case of questions in different dialects of the same language, we recommend only adding the mentions that change from the general language. For example, in "What are the colors of the French flag?" (en) and "What are the colours of the French flag?" (en-UK), it is sufficient to add en-UK mentions for "colours" and not repeat mentions for "French", "flag", etc., if defined already for en.
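A consumer of these annotations could recover the full set of mentions for a dialect by falling back to the general language code. The following sketch assumes such a fallback; the function name, the nested-dictionary layout, and the placeholder target "COLOR_ITEM" are hypothetical (only Q142 for "French" comes from this guide), and the plain substring test is a simplification.

```python
def mentions_for(lang: str, question: str,
                 mentions: dict[str, dict[str, list[str]]]) -> dict[str, list[str]]:
    """Collect mentions for `lang`, falling back to the general language code."""
    base = lang.split("-")[0]  # e.g. "en-UK" -> "en"
    codes = (base, lang) if base != lang else (lang,)
    merged: dict[str, list[str]] = {}
    for code in codes:
        for phrase, targets in mentions.get(code, {}).items():
            merged[phrase] = targets  # dialect entries override general ones
    # Keep only phrases that actually occur in this question text.
    return {p: t for p, t in merged.items() if p in question}

mentions = {
    "en": {"colors": ["COLOR_ITEM"], "French": ["Q142"]},
    "en-UK": {"colours": ["COLOR_ITEM"]},
}
print(mentions_for("en-UK", "What are the colours of the French flag?", mentions))
```

Here the en mention for "colors" is dropped for the en-UK question because that phrase does not occur in it, while the en mention for "French" is reused without being repeated under en-UK.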
Use discontinuous mention phrases where useful
A key mention might not be continuous in a particular query. For example, in the question "What volcanos in Iceland are active?", an important mention is for the Wikidata entity Q1330974 (active volcano). In this case, we recommend using the symbol * to represent a wildcard that can be used to omit one or more words from the question when specifying the phrase of the mention, such that the phrase for the aforementioned mention becomes "volcanos * active", keeping the order of words in the question. Similar cases can apply for property mentions. For example, in the question alias "On what date was Hey Jude published?", we can link the discontinuous mention "date * published" to the Wikidata property P577 (publication date). In all such cases, * replaces one or more words in the question.
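The wildcard convention can be interpreted as a regular expression in which each * stands for one or more whole words. The sketch below is one possible reading, assuming words are separated by whitespace; the function names are hypothetical.

```python
import re

def wildcard_phrase_to_regex(phrase: str) -> re.Pattern:
    """Compile a mention phrase, where `*` omits one or more words."""
    parts = [re.escape(p.strip()) for p in phrase.split("*")]
    # Gap: whitespace, then one or more (word + whitespace) groups, lazily.
    gap = r"\s+(?:\S+\s+)+?"
    return re.compile(r"(?<!\w)" + gap.join(parts) + r"(?!\w)")

def phrase_matches(phrase: str, question: str) -> bool:
    """Return True if the (possibly discontinuous) phrase occurs in order."""
    return wildcard_phrase_to_regex(phrase).search(question) is not None

print(phrase_matches("volcanos * active", "What volcanos in Iceland are active?"))
print(phrase_matches("date * published", "On what date was Hey Jude published?"))
print(phrase_matches("active * volcanos", "What volcanos in Iceland are active?"))
```

The first two calls match; the third does not, because the phrase must keep the order of words in the question.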
Phrases of different mentions can overlap
We recommend adding overlapping mentions, i.e., phrases referring to entities that overlap in the question (often one will be contained in a larger phrase). For example, for the question "Which U.S. president had the most spouses?", we recommend adding mentions for "U.S." (Q30), "president" (Q30461 / P35) and "U.S. president" (Q11696), even though they overlap.
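Tools that process mentions should therefore expect overlapping character spans rather than reject them. As a rough illustration (the helper names are hypothetical, and only the first occurrence of each phrase is considered), overlapping pairs can be listed as follows:

```python
from itertools import combinations
from typing import Optional

def span(phrase: str, question: str) -> Optional[tuple[int, int]]:
    """Return the (start, end) character span of the first occurrence, if any."""
    start = question.find(phrase)
    return None if start < 0 else (start, start + len(phrase))

def overlapping_pairs(phrases: list[str], question: str) -> list[tuple[str, str]]:
    """List pairs of phrases whose spans in the question overlap."""
    spans = {p: span(p, question) for p in phrases}
    pairs = []
    for a, b in combinations(phrases, 2):
        sa, sb = spans[a], spans[b]
        # Two half-open intervals overlap if each starts before the other ends.
        if sa and sb and sa[0] < sb[1] and sb[0] < sa[1]:
            pairs.append((a, b))
    return pairs

question = "Which U.S. president had the most spouses?"
print(overlapping_pairs(["U.S.", "president", "U.S. president"], question))
```

For this question, "U.S." and "president" each overlap with "U.S. president" but not with each other, and all three mentions are worth keeping.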
Guidelines for entity mentions
Entities do not need to be named entities
Entities do not need to be named entities, i.e., entities named with a proper noun like "Gabriel Boric" (Q16297876) (often capitalised in many languages). We also recommend linking phrases like "president" (Q30461), "volcanos" (Q8072), etc.
Phrases do not need to be nouns
Entity phrases do not need to be nouns or noun phrases. For example, in the case of "What are the colors of the French flag?", we recommend linking the adjective "French" to Q142 (France); we further recommend using the phrase as it appears in the question, e.g., using "French" rather than "France", keeping noun phrases plural if they appear so in the question, etc.
Guidelines for property mentions
Indicate relevant properties, even if not an exact match
Wikidata tries to keep the set of properties used fairly concise, trying to avoid a proliferation of specific properties. For this reason, property mentions may link to Wikidata properties that are not an exact match, but are undoubtedly relevant for a given phrase. For example, given a question starting with "How many wives ...", there is no property for "wife" on Wikidata, but there is a property for "spouse" (P26). We recommend in such a case to link to P26 as the property is clearly relevant (i.e., will likely be used in a corresponding query), even if a bit more general than the intended mention. As another example, for a question starting "Who is the president ...", there is no property for "president", so a link can rather be given to P35 (head of state), even if more general. Below you will find specific ways to link properties that are not an exact match.
Choose the appropriate qualifier for linking properties to mentions
Relating to the previous point, QAWiki provides a number of different ways to link to a property depending on its relation to the mention. These include:
- Direct property (P18 (Wikidata property ID)): Use this when the property is a direct match for the phrase of the mention; for example, in the question "What is the population of Chile?", the phrase "population" can be directly linked to the Wikidata property P1082 (population).
- Inverse property (P45 (Wikidata inverse property ID)): Use this when the property is an inverse match for the phrase of the mention; for example, in the question "Who discovered Pluto?", the phrase "discovered" can be inversely linked to the Wikidata property P61 (discoverer or inventor), noting that subject and object are flipped (Clyde Tombaugh discovered Pluto, vs. Pluto has discoverer Tombaugh).
- Super property (P58 (Wikidata super property ID)): Use this when the property is a more general match for the phrase of the mention (and there is no better match); for example, in the question "Who is the prime minister of South Korea?", we can use this qualifier to link the phrase "prime minister" with the more general Wikidata property P6 (head of government) since there is no specific property for "prime minister". (There also exists a qualifier for the sub-property variant: P59 (Wikidata sub property ID).)
- Implicit property (P50 (value for Wikidata property ID)): Use this to annotate an entity mention whose linking property is not explicitly mentioned in the question text; for example, in the question "Which mathematician has the most followers on Twitter?", the entity mention for "mathematician" can be annotated using this qualifier with the Wikidata property P106 (occupation) (since the question does not mention "occupation" anywhere, and thus an explicit property mention is not applicable). As another example, for the question "Which Chilean companies have U.S. parent companies?", the mention "Chilean" can be annotated with this qualifier to link it to the Wikidata property P17 (country). Another common usage is for gender, where in the question "Who is the richest woman in Germany?", the phrase "woman" has value Q6581072 (female) for Wikidata property P21 (sex or gender).
- Superlatives of properties (P48 (maximum value of Wikidata property ID)/P49 (minimum value of Wikidata property ID)): Use this to indicate that a phrase refers to a superlative of a property. For example, in the question "What is the heaviest Pokémon?", the phrase "heaviest" can be linked to the Wikidata property P2067 (mass) using the former (maximum value) qualifier. On the other hand, in the question "Is Michael Jackson the oldest child?", the phrase "oldest" can be linked to the Wikidata property P569 (date of birth) using the latter (minimum value) qualifier, since the oldest child has the earliest date of birth.
- Existence/non-existence of values for properties (P51 (exists value for Wikidata property ID)/P53 (not exists value for Wikidata property ID)): Use this to indicate that a phrase refers to the existence, or lack thereof respectively, of a value for a given Wikidata property. For example, in the question "Is Justin Bieber dead?", we can use the former qualifier to indicate that "dead" refers to the existence of a value for P570 (date of death) or P20 (place of death) (given the incompleteness of Wikidata, if multiple such properties are indicated, the existence of a value for any such property suffices). On the other hand, in the question "Is Justin Bieber alive?", we can use the latter qualifier to indicate that "alive" refers to the non-existence of a value for P570 (date of death) and P20 (place of death) (if multiple such properties are indicated, values must be absent for all of them). There are also inverse versions of these qualifiers: P55 (exists value for inverse Wikidata property ID)/P56 (not exists value for inverse Wikidata property ID).
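The "any" versus "all" semantics of the existence and non-existence qualifiers can be made concrete with a small sketch. It assumes a hypothetical in-memory view of an item's statements as a dictionary from property ID to a list of values; the function names and the placeholder values are illustrative only and do not describe any real item.

```python
def exists_value(claims: dict[str, list[str]], properties: list[str]) -> bool:
    """P51-style check: a value for ANY of the listed properties suffices."""
    return any(claims.get(p) for p in properties)

def not_exists_value(claims: dict[str, list[str]], properties: list[str]) -> bool:
    """P53-style check: values must be absent for ALL listed properties."""
    return all(not claims.get(p) for p in properties)

death_properties = ["P570", "P20"]  # date of death, place of death

# Hypothetical items with placeholder values.
item_without_death_data = {"P569": ["<date of birth>"]}
item_with_place_of_death = {"P569": ["<date of birth>"], "P20": ["<place>"]}

print(exists_value(item_without_death_data, death_properties))       # "dead"?
print(not_exists_value(item_without_death_data, death_properties))   # "alive"?
print(exists_value(item_with_place_of_death, death_properties))      # "dead"?
```

Note that the two checks are exact complements of each other: an item counts as "alive" under this reading precisely when it does not count as "dead".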