Wikidata introduction and query

2025-03-01 17:47:45

Wikidata is a large, structured, open knowledge graph that supports projects such as Wikipedia. It can be queried with SPARQL (see the official Wikidata tutorial). SPARQL is a query language designed for the RDF (Resource Description Framework) data model, which organizes data as triples (subject, predicate, object). SPARQL queries can be run against Wikidata online through WDQS (Wikidata Query Service).

Introduction to Wikidata

Entities and properties

The most basic concepts in Wikidata are the entity (item) and the property. Because natural-language names are ambiguous, each entity and property has a unique alphanumeric identifier: entities are prefixed with Q and properties with P, such as Q148 (China) and P31 (instance of). Entities and properties form the nodes and edges of the Wikidata knowledge graph, respectively. Note that a property is not necessarily an edge connecting two entities, and this is very common. For example, P31 (instance of) indicates the class an entity belongs to and points to the entity that represents that class, whereas P1082 (population) gives the population of an entity (such as a country) and maps directly to a number. In addition, a property does not necessarily have only one value: because population changes over time, P1082 (population) usually has multiple values, each corresponding to a different point in time.

Data structure

Because online queries are limited by network speed, you can instead download a backup of Wikidata from Wikidata Dumps. The .bz2 archive under entities/ contains the entire Wikidata knowledge graph as a single JSON file (about 100 GB compressed, more than 1 TB decompressed). The JSON file contains a list; each element of the list is a dictionary with the following fields:

  • type: The entity type, either item or property. Most entries are items; a small number are properties.
  • id: Unique identifier.
  • labels: Multilingual labels, i.e. names.
  • descriptions: Multilingual descriptions.
  • aliases: Multilingual aliases.
  • claims: Statements, containing properties and their corresponding values. This is the most important field for the knowledge graph.
  • sitelinks: Links to the corresponding pages in other wiki projects, such as Chinese Wikipedia, French Wikipedia, etc.
  • pageid: The page ID of the entity's page on the wiki.
  • ns: The namespace the page is in. Items are usually in namespace 0, while user pages, for example, are in namespace 2.
  • title: The page title.
  • lastrevid: The ID of the latest revision.
  • modified: Last modified time.
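
A file of that size is better streamed than loaded in one go. The following is a minimal sketch, assuming the dump is laid out as one JSON array with one entity per line (each line ending in a comma). The helper names iter_entities and english_label are hypothetical, and the sample lines are hand-written stand-ins for a real dump.

```python
import json


def iter_entities(lines):
    """Yield one entity dict per line of a Wikidata-style JSON dump.

    Assumes one entity per line, wrapped in a single JSON array,
    so the lines "[" and "]" and the trailing commas are skipped.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def english_label(entity):
    """Return the English label of an entity, or None if it has none."""
    return entity.get("labels", {}).get("en", {}).get("value")


# Hand-written stand-in for the first lines of a dump file.
sample_lines = [
    "[",
    '{"type": "item", "id": "Q148", "labels": {"en": {"language": "en", "value": "China"}}},',
    '{"type": "property", "id": "P31", "labels": {"en": {"language": "en", "value": "instance of"}}}',
    "]",
]

for entity in iter_entities(sample_lines):
    print(entity["id"], entity["type"], english_label(entity))
```

With a real archive, the lines could come from bz2.open(path, "rt", encoding="utf-8") in the standard library, which decompresses on the fly so the 1 TB file never has to exist on disk.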

Among these, claims is a dictionary keyed by property identifier (P[...]). Each key maps to a list, because a property can have several statements. Each statement is a dictionary with the following structure:

  • mainsnak: The main part of the statement, containing the property and its value.
    • snaktype: The type of the snak. Common values: value means a concrete value is present; novalue means the property has no value; somevalue means a value exists but is unknown or uncertain.
    • property: The ID of the property (for example, P31).
    • datavalue: The value of the property, which may be of different data types, such as an entity, a time, or a quantity.
      • value: The value itself (for example, an entity ID such as Q5, meaning human).
      • type: The type of the value. Common types:
        • wikibase-entityid: An entity (item or property).
        • time: A point in time.
        • quantity: A numeric quantity.
        • string: A string (for example, a piece of text).
    • datatype: The data type of the property, usually wikibase-item (pointing to another entity) or quantity (a numeric amount).
  • type: The type of the claim. The usual value is statement, indicating an ordinary statement.
  • qualifiers: Qualifiers, which give additional information about the property value. They are stored as a dictionary keyed by property ID, with a list of snaks per key. For example, a value may be qualified by a point in time or a place.
    • property: The property ID of the qualifier.
    • datavalue: The value of the qualifier, structured like the datavalue of mainsnak.
  • qualifiers-order: The order in which the qualifier properties are listed.
  • rank: The rank of the statement. The ranks are:
    • normal: The default rank.
    • preferred: The preferred rank (marks the best value when several statements exist).
    • deprecated: A deprecated statement.
  • references: The sources cited for this statement, usually literature or other sources.
    • snaks: The content of the reference, structured like the statement's mainsnak, with property IDs and their values.
    • snaks-order: The order of the properties in the reference.
  • id: The unique ID of the statement (used to tell statements apart).
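
To see how these fields fit together, here is a small sketch that pulls the "best" values of one property out of an entity dictionary. It assumes the commonly described rule that preferred statements win over normal ones and deprecated ones are ignored. The function name best_values is hypothetical, and the sample entity is hand-made with placeholder amounts, not real population figures.

```python
def best_values(entity, pid):
    """Return datavalue.value for the best-ranked statements of a property.

    Preferred-rank statements are used if any exist, otherwise normal-rank
    ones; deprecated statements are ignored. Snaks whose snaktype is
    novalue or somevalue carry no datavalue and are skipped.
    """
    statements = entity.get("claims", {}).get(pid, [])
    for rank in ("preferred", "normal"):
        ranked = [s for s in statements if s.get("rank") == rank]
        if ranked:
            return [
                s["mainsnak"]["datavalue"]["value"]
                for s in ranked
                if s["mainsnak"]["snaktype"] == "value"
            ]
    return []


# Hand-made sample; the amounts are placeholders, not real data.
sample = {
    "id": "Q148",
    "claims": {
        "P1082": [
            {"mainsnak": {"snaktype": "value", "property": "P1082",
                          "datavalue": {"value": {"amount": "+1000", "unit": "1"},
                                        "type": "quantity"}},
             "type": "statement", "rank": "normal"},
            {"mainsnak": {"snaktype": "value", "property": "P1082",
                          "datavalue": {"value": {"amount": "+2000", "unit": "1"},
                                        "type": "quantity"}},
             "type": "statement", "rank": "preferred"},
        ],
        "P31": [
            {"mainsnak": {"snaktype": "value", "property": "P31",
                          "datavalue": {"value": {"entity-type": "item", "id": "Q3624078"},
                                        "type": "wikibase-entityid"}},
             "type": "statement", "rank": "normal"},
        ],
    },
}

print(best_values(sample, "P1082"))  # only the preferred statement is kept
print(best_values(sample, "P31"))
```

Picking by rank first and filtering by snaktype second means a preferred "unknown value" statement still hides the normal-rank ones, which seems closer to how ranks are meant to be read.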

Querying Wikidata with SPARQL

The following simple examples show how to query the Wikidata knowledge graph with SPARQL. The syntax resembles SQL, using SELECT and WHERE; the key is to express the query as subject-predicate-object triples.

Basic usage: subject-predicate-object triples

Query all of Bach's children:

SELECT ?child
WHERE
{
# ?child  father   Bach
  ?child wdt:P22 wd:Q1339.
}

Here, ?child is the variable being queried and can be named freely; wd is short for Wikidata and is the prefix used to refer to entities; wdt stands for "Wikidata truthy" and refers to the value of an entity's property. Read together, the statement says: ?child is an entity whose father (P22) property has the entity Bach (Q1339) as its value. The query above returns a list of entity IDs. To add a label column to the list, add a "magic" statement provided by Wikidata:

SELECT ?child ?childLabel
WHERE
{
# ?child  father   Bach
  ?child wdt:P22 wd:Q1339.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Here, wikibase:label is the fixed name of the service that fetches entity labels. The name of the label variable ?childLabel follows a fixed pattern: it must start with the name of the entity variable being queried, i.e. ?child, followed by the capitalized name of the field to fetch, i.e. Label.
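
The same query can also be sent from a script instead of the web page. The following is a rough sketch using only the Python standard library. It assumes the public endpoint https://query.wikidata.org/sparql and the standard SPARQL JSON results layout; the helper names, the User-Agent string, and the sample response (with a placeholder entity ID and label) are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_url(query):
    """Build a GET URL that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return ENDPOINT + "?" + params


def rows_from_results(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(query, user_agent="example-script/0.1 (hypothetical)"):
    """Send the query over HTTP and return flattened rows (needs network)."""
    request = urllib.request.Request(
        build_url(query),
        headers={"User-Agent": user_agent,
                 "Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return rows_from_results(json.load(response))


# Offline demonstration with a hand-written response in the same layout.
sample_response = {
    "head": {"vars": ["child", "childLabel"]},
    "results": {"bindings": [
        {"child": {"type": "uri", "value": "http://www.wikidata.org/entity/Q0"},
         "childLabel": {"type": "literal", "value": "placeholder label"}},
    ]},
}
for row in rows_from_results(sample_response):
    print(row["child"], row["childLabel"])
```

Calling run_query(QUERY) performs the real request. It is kept out of the demonstration so that the sketch runs without network access.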

Children of Bach and Maria Barbara Bach who are both composers and pianists:

SELECT ?child
WHERE
{
  ?child wdt:P22 wd:Q1339; 
         wdt:P25 wd:Q57487; # P25: mother  
         wdt:P106 wd:Q36834, wd:Q486748. # P106: occupation  
}

This shows the usage of, and the differences between, the semicolon (;), the comma (,), and the period (.). A semicolon lets you omit the repeated subject ?child, a comma lets you also omit the repeated predicate wdt:P106, and a period ends the statement.

Bach's grandchildren (through either sons or daughters):

SELECT ?grandChild
WHERE
{
  wd:Q1339 wdt:P40 ?child. # P40: child  
  ?child wdt:P40 ?grandChild.
}

This is a multi-hop query: ?child serves as an intermediate stepping stone that constrains the relationship between ?grandChild and Bach (Q1339). It can be simplified further to:

SELECT ?grandChild
WHERE
{
  wd:Q1339 wdt:P40 [ wdt:P40 ?grandChild ].
}

The square brackets [] above express "an entity that has ?grandChild as a child", which removes the need for the intermediate variable ?child. It can be read as a sentence with a relative clause: Bach has a child who has a child ?grandChild.

Additional symbols: / * + |

The symbol / denotes a property path and chains several properties into a multi-hop query. The symbols * and + work as in regular expressions: * matches zero or more repetitions of a property, and + matches one or more. The symbol | means "or". The examples below illustrate them.

  All works of art:

SELECT ?work ?workLabel
WHERE
{
  ?work wdt:P31/wdt:P279* wd:Q838948. # instance of any subclass of work of art
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}

All descendants of Bach:

SELECT ?descendant ?descendantLabel
WHERE
{
  wd:Q1339 wdt:P40+ ?descendant.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}

All descendants of Bach, this time following father or mother links upward from each descendant:

SELECT ?descendant ?descendantLabel
WHERE
{
  ?descendant (wdt:P22|wdt:P25)+ wd:Q1339.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}

In addition, the question mark ? also has a meaning in property paths: it matches zero or one occurrence of the path element.
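
To make the difference between +, * and ? concrete, here is a toy model of what these path operators mean. It is plain Python over a tiny made-up graph, not a description of how the query service evaluates paths. The node names A to D and the function names are placeholders.

```python
def follow(graph, start, min_hops):
    """Return the nodes reachable from start along the edges in graph.

    min_hops=1 imitates `+` (one or more hops);
    min_hops=0 imitates `*` (zero or more hops, so start itself counts).
    """
    reached = {start} if min_hops == 0 else set()
    seen = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for nxt in graph.get(node, ()):
            reached.add(nxt)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return reached


def zero_or_one(graph, start):
    """Imitate `?`: the start node itself plus everything one hop away."""
    return {start} | set(graph.get(start, ()))


# Made-up "child" edges: A has children B and C, and B has child D.
children = {"A": ["B", "C"], "B": ["D"]}

print(sorted(follow(children, "A", 1)))     # like P40+  → ['B', 'C', 'D']
print(sorted(follow(children, "A", 0)))     # like P40*  → ['A', 'B', 'C', 'D']
print(sorted(zero_or_one(children, "A")))   # like P40?  → ['A', 'B', 'C']
```

In this model the only difference between * and + is whether the starting node itself counts as a match, which is why wdt:P31/wdt:P279* also finds direct instances of a class.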

Qualifiers

Use qualifiers to place fine-grained constraints on the entities being queried.

Sorting and limiting results

Use ORDER BY to sort the query results and LIMIT to cap the number of results returned.

Return the ten most populous sovereign states, in descending order of population:

SELECT ?country ?countryLabel ?population
WHERE
{
  ?country wdt:P31/wdt:P279* wd:Q3624078; # P31: instance of; P279: subclass of; Q3624078: sovereign state
           wdt:P1082 ?population. # P1082: population
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)
LIMIT 10

DESC means descending order and ASC means ascending order; ascending is the default.

Optional criteria

If you want to return certain fields without making them a condition of the match, use OPTIONAL.

All books by Arthur Conan Doyle, where the title and the other requested fields are optional:

SELECT ?book ?title ?illustratorLabel ?publisherLabel ?published
WHERE
{
  ?book wdt:P50 wd:Q35610. # P50: author; Q35610: Arthur Conan Doyle
  OPTIONAL { ?book wdt:P1476 ?title. }
  OPTIONAL { ?book wdt:P110  ?illustrator. }
  OPTIONAL { ?book wdt:P123  ?publisher. }
  OPTIONAL { ?book wdt:P577  ?published. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}

Whether the title, illustrator, and so on can be matched does not affect the results of ?book wdt:P50 wd:Q35610. Each value is returned if it can be matched; otherwise it is left blank. Note the difference from the following query:

SELECT ?book ?title ?illustratorLabel ?publisherLabel ?published
WHERE
{
  ?book wdt:P50 wd:Q35610.
  OPTIONAL {
    ?book wdt:P1476 ?title;
          wdt:P110 ?illustrator;
          wdt:P123 ?publisher;
          wdt:P577 ?published.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Here, if any one of the fields inside the OPTIONAL block fails to match, all of them are left blank.

Expressions: FILTER and BIND

Expressions are used for tests and other operations. Details will be added when I next use them.

Grouping (GROUP BY)

Groups the results by a given field. Details will be added when I next use it.