OpenCitations Meta RDF dataset of all bibliographic metadata and its provenance information


Bibliographic Details
Main Author: OpenCitations (3068259) (author)
Published: 2025
Description
Summary:<p dir="ltr">Compared to the previous version, this release includes metadata on citing and cited bibliographic resources added in the <a href="https://api.crossref.org/snapshots/monthly/2024/11" rel="noreferrer" target="_blank">November 2024 version of Crossref</a> and in the November 2024 dump of <a href="https://japanlinkcenter.org" rel="noreferrer" target="_blank">JaLC</a> (Japan Link Center).</p><p dir="ltr">In this version, we focused on correcting one specific type of error: the erroneous duplication of resources sharing the same identifier. We merged:</p><ul><li>100% of duplicated identifiers (datacite:Identifier)</li><li>100% of duplicated responsible agents (foaf:Agent)</li><li>70% of duplicated bibliographic resources (fabio:Expression)</li></ul><p dir="ltr">This dataset contains all the bibliographic metadata included in OpenCitations Meta, together with its provenance information, in JSON-LD format. The data and provenance are organized in a hierarchy of folders and subfolders that lets you quickly locate any entity from its URI. 
The first level consists of the following folders, provided compressed and separately:</p><ul><li><b>[folder "ar"]</b>: contains the data and provenance of the agent role entities (<a href="http://purl.org/spar/pro/RoleInTime" target="_blank">http://purl.org/spar/pro/RoleInTime</a>);</li><li><b>[folder "br"]</b>: contains the data and provenance of the bibliographic resource entities (<a href="http://purl.org/spar/fabio/Expression" target="_blank">http://purl.org/spar/fabio/Expression</a>);</li><li><b>[folder "id"]</b>: contains the data and provenance of the identifier entities (<a href="http://purl.org/spar/datacite/Identifier" target="_blank">http://purl.org/spar/datacite/Identifier</a>);</li><li><b>[folder "ra"]</b>: contains the data and provenance of the responsible agent entities (<a href="http://xmlns.com/foaf/0.1/Agent" target="_blank">http://xmlns.com/foaf/0.1/Agent</a>);</li><li><b>[folder "re"]</b>: contains the data and provenance of the resource embodiment entities (<a href="http://purl.org/spar/fabio/Manifestation" target="_blank">http://purl.org/spar/fabio/Manifestation</a>).</li></ul><p dir="ltr">The inner folders are named after the <b>supplier prefix</b> of the contained entities, a prefix that identifies the dataset an entity belongs to (e.g., OpenCitations Meta corresponds to <b>06*0</b>).</p><p dir="ltr">Below that, the folders have <b>numeric names</b> that refer to the range of entities they contain; for example, the 10000 folder contains entities 1 to 10000. Inside, you can find the <b>zipped</b> RDF data.</p><p dir="ltr">At the same level, additional folders containing the <b>provenance</b> are named according to the same criteria; for example, the 1000 folder includes the provenance of entities 1 to 1000. 
The provenance is stored inside a folder called <b>prov</b>, also as zipped JSON-LD.</p><p dir="ltr">For example, the data of a bibliographic resource with supplier prefix 06250 and a number between 1 and 1000 is located in /br/06250/10000/1000/1000.zip, while its provenance is in /br/06250/10000/1000/prov/1000.zip.</p><p dir="ltr">This version of the dataset contains:</p><ul><li>121,302,680 bibliographic entities;</li><li>368,061,399 authors, 2,718,222 editors, and 101,612,475 publishers (counted by their roles, without disambiguating individuals);</li><li>698,995 publication venues.</li></ul><p dir="ltr">The compressed archives total 47 GB in tar.gz format and expand to 145 GB when decompressed. The JSON-LD files inside the archives are further compressed in ZIP format. We recommend processing these inner files directly in compressed form, without extracting them, to manage the data more efficiently.</p><p dir="ltr">Additional information about OpenCitations Meta is available on the <a href="https://opencitations.net/meta" target="_blank">official webpage</a>.</p>
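The folder layout described above can be sketched as a small path-resolution function. This is a minimal, hypothetical helper (`entity_paths` and `upper_bound` are not part of any OpenCitations tool). It assumes that entity URIs end in `<folder>/<supplier prefix><number>` (the URI used below, `https://w3id.org/oc/meta/br/06250123`, is a made-up example), that supplier prefixes match the pattern `0[1-9]+0` (our reading of `06*0`), and that folder and file bounds are multiples of 10000 and 1000, mirroring the example paths.

```python
import re

# Assumption: supplier prefixes match 0[1-9]+0 (e.g. 06250), our reading of the
# "06*0" pattern; the entity number follows the prefix directly.
ENTITY_RE = re.compile(r"(?P<folder>ar|br|id|ra|re)/(?P<prefix>0[1-9]+0)(?P<num>[1-9][0-9]*)$")


def upper_bound(num: int, size: int) -> int:
    """Smallest multiple of `size` that is >= num (e.g. 123 -> 1000 for size 1000)."""
    return ((num - 1) // size + 1) * size


def entity_paths(uri: str, dir_size: int = 10000, file_size: int = 1000) -> tuple[str, str]:
    """Return (data_path, provenance_path) for an entity URI (hypothetical helper).

    Assumption: the layout is <folder>/<prefix>/<dir bound>/<file bound>/<file bound>.zip,
    with provenance in a prov/ subfolder, mirroring the example paths.
    """
    match = ENTITY_RE.search(uri)
    if match is None:
        raise ValueError(f"not a recognised OpenCitations Meta entity URI: {uri}")
    num = int(match["num"])
    dir_bound = upper_bound(num, dir_size)
    file_bound = upper_bound(num, file_size)
    base = f"{match['folder']}/{match['prefix']}/{dir_bound}/{file_bound}"
    return f"{base}/{file_bound}.zip", f"{base}/prov/{file_bound}.zip"


print(entity_paths("https://w3id.org/oc/meta/br/06250123"))
# ('br/06250/10000/1000/1000.zip', 'br/06250/10000/1000/prov/1000.zip')
```

Integer arithmetic is used for the bounds so that large entity numbers are not affected by floating-point rounding; if the real layout differs, only `dir_size`, `file_size`, and the path template need adjusting.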
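Following the recommendation to process the inner ZIP files without extracting them, a minimal sketch using only Python's standard `zipfile` and `json` modules could look like this. The function names are hypothetical, and the sketch assumes that every file inside a ZIP archive is a single JSON-LD document whose entities are listed under `@graph` and identified by `@id`; check this against the actual files before relying on it.

```python
import json
import zipfile
from typing import IO, Any, Iterator, Union


def iter_jsonld(source: Union[str, IO[bytes]]) -> Iterator[Any]:
    """Yield each JSON-LD document stored in a ZIP archive, decompressing in memory.

    Assumption: every non-directory member of the archive is a JSON-LD document.
    """
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            with archive.open(info) as member:
                # json.load reads bytes here, which json.loads accepts (UTF-8 detected)
                yield json.load(member)


def iter_entity_ids(doc: Any) -> Iterator[str]:
    """Yield the @id of every entity in a JSON-LD document (assumed structure)."""
    graphs = doc if isinstance(doc, list) else [doc]
    for graph in graphs:
        # Named graphs keep their entities under "@graph"; otherwise the object is the entity.
        for entity in graph.get("@graph", [graph]):
            if "@id" in entity:
                yield entity["@id"]
```

For instance, `for doc in iter_jsonld("br/06250/10000/1000/1000.zip")` followed by `iter_entity_ids(doc)` would list the entity URIs in one data file without writing anything to disk.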