User:MargaretRDonald/sandbox/Using queries + OpenRefine to improve biota Wikidata


Enhancing Australian biodiversity data using OpenRefine


Abstract/description

  1. The online database IRMNG is used to download a partial list of an author's taxon names. The author whose taxa we are looking for is William F. Humphreys.
  2. The resulting CSV file is uploaded to OpenRefine, where we learn how to
    1. facet
    2. split columns to give author names
    3. reconcile columns
    4. create a schema
    5. upload some properties to Wikidata
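
The column-splitting step can be previewed outside OpenRefine. The following is a minimal Python sketch, not the GREL used in the session: it assumes the authority column holds strings of the form "Surname & Surname, year", optionally wrapped in parentheses. The function name split_authority and the sample strings are illustrative only.

```python
import re
from typing import Optional


def split_authority(authority: str) -> tuple[list[str], Optional[int]]:
    """Split an authority string into author surnames and a year.

    Assumed input shape (illustrative): "Watts & Humphreys, 2003".
    """
    # Parentheses around an authority are dropped before parsing.
    text = authority.strip().strip("()").strip()
    # A trailing four-digit number, optionally preceded by a comma, is the year.
    match = re.search(r",?\s*(\d{4})\s*$", text)
    year = int(match.group(1)) if match else None
    names_part = text[: match.start()] if match else text
    # Authors are separated by "&", "," or the word "and".
    pieces = re.split(r"\s*(?:&|,|\band\b)\s*", names_part)
    authors = [p.strip() for p in pieces if p.strip()]
    return authors, year


print(split_authority("Watts & Humphreys, 2003"))
```

Each surname in the returned list corresponds to one of the split columns that would then be reconciled separately.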

The examples to be used are:

  1. the OpenRefine spreadsheet IRMNG taxlist 20220410WattsV2 csv for the species of Chris H.S. Watts,
  2. the start of a new project for his colleague and collaborator William F. Humphreys, based on a query we will form from IRMNG and upload to OpenRefine.

An alternative approach


Using the following queries for APNI and AFD taxa:

  1. For genera with APNI ids (and no authority) plus taxon author citation
  2. For species with APNI ids (and no authority)
  3. For genera with AFD ids (and no authority) plus taxon author citation
    1. for AFD arachnid genera (limiting a query)
  4. For species with AFD ids (and no authority)
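
The linked queries themselves are not reproduced here. As a rough sketch of what such a query might look like, the Python helper below builds a SPARQL string for the Wikidata Query Service. Everything in it is an assumption to check against Wikidata before use: that "no authority" means the taxon name statement (P225) has no taxon author qualifier (P405), that taxon author citation is P6507, that taxon rank is P105, and that the identifier property and rank item are passed in by the caller (for example P5984 for APNI ID, P6039 for AFD ID, Q34740 for genus, Q7432 for species). The function name build_missing_author_query is hypothetical.

```python
def build_missing_author_query(id_property: str, rank_qid: str, limit: int = 500) -> str:
    """Build SPARQL for taxa of one rank that have an external id but no taxon author.

    All property and item ids are assumptions; verify them on Wikidata first.
    """
    return f"""
SELECT ?item ?taxonName ?externalId ?authorCitation WHERE {{
  ?item wdt:{id_property} ?externalId ;
        wdt:P105 wd:{rank_qid} ;
        p:P225 ?nameStatement .
  ?nameStatement ps:P225 ?taxonName .
  # Keep only taxon names with no taxon author qualifier (assumed P405).
  FILTER NOT EXISTS {{ ?nameStatement pq:P405 ?author . }}
  # Taxon author citation (assumed P6507) is returned when present.
  OPTIONAL {{ ?item wdt:P6507 ?authorCitation . }}
}}
LIMIT {limit}
""".strip()


# Genera with an APNI id (ids assumed as described above).
print(build_missing_author_query("P5984", "Q34740", limit=100))
```

The printed text can be pasted into the query service and its result downloaded as a CSV file. Limiting a query to one family, genus or order would mean adding a further triple pattern, which this sketch leaves out.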

Modify these queries

  1. to pick a family, genus or order

and download the query result as a CSV file

The tasks thereafter closely match those discussed above and include

  1. forming links to the APNI and AFD pages for the taxon
  2. grabbing the authority and the publication from these links

The aim is to create lists of authors, years of taxon publication, and publication names and pages, and then, as before, to create a schema for uploading the reconciled authors and publications to Wikidata.
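
Forming the links from the identifiers in the downloaded CSV can be sketched in Python as follows. The two URL templates are assumptions based on the usual public address patterns for APNI and AFD and should be checked against a known taxon page before use. The names URL_TEMPLATES and taxon_page_url, and the identifier "12345", are hypothetical.

```python
from urllib.parse import quote

# Assumed URL patterns; confirm against a real APNI or AFD page first.
URL_TEMPLATES = {
    "APNI": "https://id.biodiversity.org.au/name/apni/{id}",
    "AFD": "https://biodiversity.org.au/afd/taxa/{id}",
}


def taxon_page_url(source: str, identifier: str) -> str:
    """Return the page URL for an identifier from the given source ("APNI" or "AFD")."""
    template = URL_TEMPLATES[source.upper()]
    # Strip stray whitespace from the CSV cell and percent-encode unsafe characters.
    return template.format(id=quote(identifier.strip(), safe=""))


print(taxon_page_url("apni", "12345"))
```

In OpenRefine the same result would come from a new column built on the identifier column. The Python version is only a way to check the link pattern on a few rows first.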

What I am hoping to achieve


At the end of the session, participants will have learned

  1. how to create a project in OpenRefine
  2. why and how to facet
  3. how to split a column (and how to undo an action)
  4. how to reconcile a column against Wikidata
  5. some useful GREL functions
  6. how to create a schema for uploading data to Wikidata

to ultimately create Wikidata entries like that for Illawarra wisharti.

Relationship to Wiki skills or to the theme


This is a useful way to upload bulk data to Wikidata, and should enhance participants' Wikidata knowledge and skills.

Username/s

  • MargaretRDonald (talk) 21:02, 2 August 2024 (UTC)

Session type


Depending on the participants, this would be a short series of one-hour online Zoom sessions, with interaction between participants and presenters.