User:Monkbot/task 19: cite iucn update
Task 19 was originally conceived to update, from the IUCN Red List API, the 13,000 or so articles that use {{cite IUCN}} where |url= holds an old-form IUCN url. These articles are listed in Category:cite IUCN maint (1,161).
There are several old-form urls (not all of these work):
- http://www.iucnredlist.org/details/22718564/all
- http://www.iucnredlist.org/details/22718564/full
- http://www.iucnredlist.org/details/full/22718564/0
- http://www.iucnredlist.org/details/22718564/0
- http://www.iucnredlist.org/details/22718564/
- http://www.iucnredlist.org/details/22718564
- http://www.iucnredlist.org/details/summary/22718564
- http://www.iucnredlist.org/search/details.php/22718564/all
- http://www.iucnredlist.org/search/details.php/22718564/summ
- http://oldredlist.iucnredlist.org/details/22718564/0
Old-form urls are considered 'old-form' because (when they work) they always point to the current assessment.
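As a rough illustration, the taxon id can be pulled out of every old-form url shape listed above with a single regular expression. This is a minimal Python sketch rather than the task's own code; the function name `taxon_id_from_old_form_url` and the pattern are assumptions derived only from the urls listed here.

```python
import re

# Assumed pattern: covers only the old-form url shapes listed above.
OLD_FORM_URL = re.compile(
    r"iucnredlist\.org/(?:search/)?details(?:\.php)?/(?:full/|summary/)?(\d+)"
)

def taxon_id_from_old_form_url(url):
    """Return the taxon id as a string, or None when url is not old-form."""
    match = OLD_FORM_URL.search(url)
    return match.group(1) if match else None
```

A url that does not contain one of the `details` shapes yields `None`, so the caller can leave that citation alone.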
Most of these old-form urls are used in {{cite IUCN}} templates that are found in the |status_ref= parameter of {{speciesbox}} and {{taxobox}} templates (collectively hereafter 'taxobox') to support the values in the taxobox |status= and |status_system= parameters. Because values for |status= (IUCN uses the term 'category') and for |status_system= can be extracted or derived from the results of an additional IUCN API call, task 19 was expanded to support updating these taxobox parameters.
IUCN API
This task is generally slow. IUCN do not want anyone or anything hammering away at their API as fast as possible so task 19's calls to the IUCN API are spaced about 3 seconds apart. To accomplish this, the AWB Bots→Auto save→Delay setting is 3 seconds. This prevents task 19 from making edits that require only a single IUCN API call too quickly. For edits that require multiple IUCN API calls, task 19 imposes a 3-second pause before executing each IUCN API call after the first one.
IUCN API calls require a token. While the code for this task is published, the task's token is not. Anyone considering reuse of this code must obtain their own token; do not use the publicly available demo token.
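The spacing described above can be sketched as a small helper that is called before every API request. This is a hedged Python illustration, not how task 19 does it (the task relies on the AWB delay setting plus an explicit pause in its C# code); the class name `SpacedCaller` and its injectable `clock` and `sleep` parameters are assumptions made so the behavior can be checked without really waiting.

```python
import time

class SpacedCaller:
    """Enforce a minimum interval between successive API calls (sketch)."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable for testing (assumption)
        self._sleep = sleep      # injectable for testing (assumption)
        self._last = None        # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` immediately before each request means the first call goes out at once and every later call is held back only as long as needed.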
Task 19 fetches data from the IUCN API in four forms: two of species data and two of species citations. The examples here are for Anthus roseatus (the name) and 22718564 (the taxon id). The species data returns are:
- name:
{"name":"Anthus roseatus","result":[{"taxonid":22718564,"scientific_name":"Anthus roseatus","kingdom":"ANIMALIA","phylum":"CHORDATA","class":"AVES","order":"PASSERIFORMES","family":"MOTACILLIDAE","genus":"Anthus","main_common_name":"Rosy Pipit","authority":"Blyth, 1847","published_year":2019,"assessment_date":"2019-06-13","category":"LC","criteria":null,"population_trend":"Stable","marine_system":false,"freshwater_system":true,"terrestrial_system":true,"assessor":"BirdLife International","reviewer":"Smith, D.","aoo_km2":null,"eoo_km2":"3530000","elevation_upper":5000,"elevation_lower":2700,"depth_upper":null,"depth_lower":null,"errata_flag":null,"errata_reason":null,"amended_flag":null,"amended_reason":null}]}
- taxon id:
{"name":"22718564","result":[{"taxonid":22718564,"scientific_name":"Anthus roseatus","kingdom":"ANIMALIA","phylum":"CHORDATA","class":"AVES","order":"PASSERIFORMES","family":"MOTACILLIDAE","genus":"Anthus","main_common_name":"Rosy Pipit","authority":"Blyth, 1847","published_year":2019,"assessment_date":"2019-06-13","category":"LC","criteria":null,"population_trend":"Stable","marine_system":false,"freshwater_system":true,"terrestrial_system":true,"assessor":"BirdLife International","reviewer":"Smith, D.","aoo_km2":null,"eoo_km2":"3530000","elevation_upper":5000,"elevation_lower":2700,"depth_upper":null,"depth_lower":null,"errata_flag":null,"errata_reason":null,"amended_flag":null,"amended_reason":null}]}
The citation data returns are:
- name:
{"name":"Anthus roseatus","result":[{"citation":"BirdLife International 2019. Anthus roseatus. The IUCN Red List of Threatened Species 2019: e.T22718564A152671411. https://dx.doi.org/10.2305/IUCN.UK.2019-3.RLTS.T22718564A152671411.en .Downloaded on 21 September 2021"}]}
- taxon id:
{"name":"22718564","result":[{"citation":"BirdLife International 2019. Anthus roseatus. The IUCN Red List of Threatened Species 2019: e.T22718564A152671411. https://dx.doi.org/10.2305/IUCN.UK.2019-3.RLTS.T22718564A152671411.en .Downloaded on 21 September 2021"}]}
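To show how the fields that task 19 needs sit inside these returns, here is a minimal Python sketch that reads them with a JSON parser. The task's own script extracts values with regular expressions instead; the function names, the abbreviated sample strings, and the treatment of an empty `result` list as a nil return are all assumptions for illustration.

```python
import json

# Abbreviated copies of the returns shown above (most keys omitted).
SPECIES_RETURN = (
    '{"name":"Anthus roseatus","result":[{"taxonid":22718564,'
    '"category":"LC","assessment_date":"2019-06-13"}]}'
)
CITATION_RETURN = (
    '{"name":"22718564","result":[{"citation":"BirdLife International 2019. '
    'Anthus roseatus. The IUCN Red List of Threatened Species 2019: '
    'e.T22718564A152671411."}]}'
)

def first_result(raw):
    """Return the first entry of the "result" list, or None when it is empty."""
    results = json.loads(raw).get("result") or []
    return results[0] if results else None

def species_fields(raw):
    """Return (category, assessment_date), or None for a nil return."""
    entry = first_result(raw)
    if entry is None:
        return None
    return entry.get("category"), entry.get("assessment_date")

def citation_text(raw):
    """Return the citation string, or None for a nil return."""
    entry = first_result(raw)
    return None if entry is None else entry.get("citation")
```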
taxobox updates
Task 19 confirms, updates, or adds taxobox parameters |status=, |status_system=, and |status_ref= using data extracted from the IUCN API. The IUCN API data are fetched using a binomial species name; task 19 does not attempt to fetch IUCN API data using the taxon id found in any existing IUCN references in the taxobox. For taxobox updates, task 19 attempts to get the binomial from various taxobox parameters:
{{speciesbox}} parameters:
- |taxon=
- |genus= + |species=
- |name=
{{taxobox}} parameters:
- |binomial=
- |name=
When the taxobox has none of the above parameters, task 19 uses the article title in the IUCN API call.
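The lookup above can be sketched as follows. This Python example is only an illustration under stated assumptions: that the parameters are tried in the order listed, that stripping apostrophe markup is enough cleanup, and that a 'binomial' is simply a two-word name. The names `pick_binomial` and `is_binomial` are hypothetical, and the task's real cleanup does more (extinction markers, disambiguation, and so on).

```python
def pick_binomial(template, params, article_title):
    """Choose the name to send to the IUCN API (assumed priority order)."""
    def value(key):
        # Drop wiki italic/bold apostrophes and surrounding whitespace.
        return (params.get(key) or "").replace("'", "").strip()

    if template.lower() == "speciesbox":
        genus, species = value("genus"), value("species")
        combined = f"{genus} {species}" if genus and species else ""
        candidates = [value("taxon"), combined, value("name")]
    else:  # {{taxobox}}
        candidates = [value("binomial"), value("name")]

    for candidate in candidates:
        if candidate:
            return candidate
    return article_title  # fall back to the article title

def is_binomial(name):
    """Crude check: exactly two whitespace-separated words."""
    return len(name.split()) == 2
```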
Task 19 does not confirm, update, or add |status=, |status_system=, and |status_ref= when:
- the name is not a binomial; usually because the taxobox or article title uses only the genus portion of the binomial
- the IUCN API does not recognize the binomial as a valid name. When this happens task 19 adds Category:Taxobox binomials not recognized by IUCN and a hidden comment with the unrecognized binomial. Reasons that the IUCN API might not recognize the binomial are:
  - misspellings
  - typos
  - extraneous text
  - the species name might not be 'globally assessed' but instead be 'regionally assessed' – the taxobox does not specify the region of an assessment so task 19 cannot use the regional form of the citation API call
  - the IUCN API does not support the redirect-like behavior for binomials that the search box at https://www.iucnredlist.org/ does
{{speciesbox}} parameters |status2=, |status2_system=, and |status2_ref= are not handled in the same way as their non-enumerated counterparts. This is because there are relatively few instances of the enumerated forms (~25 according to this search 2021-09-20). |status2_ref= may be updated by subsequent task 19 processes but |status2= and |status2_system= will not be.
{{automatic taxobox}} and {{subspeciesbox}} support |status=, |status_system=, and |status_ref= but task 19 does not attempt to update these parameters as a group because the use of these parameters in those templates is comparatively rare and because the species names upon which task 19 depends are inconsistent in comparison to {{speciesbox}} and {{taxobox}}. Task 19 may choose to update the content of |status_ref= in these templates if the parameter uses an old-form url or is a plain-text citation but will not attempt to update |status= and |status_system= nor will it remove duplicate |status_ref= references.
IUCN status
From the IUCN API call for species data using the binomial, task 19 extracts the category value and the assessment_date value. The species IUCN status is confirmed when |status= has the same value as the category returned from the IUCN API. When they are different, task 19 updates |status= to the value from the IUCN API. When |status= is missing (because it was never there or because an empty parameter was deleted) task 19 updates |status= or adds a new |status= at the end of the taxobox. Updates, confirmations, and additions are noted in the edit summary.
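The confirm / update / add decision described above reduces to a three-way comparison. The following Python sketch is an assumption-laden illustration: the function name `status_action` is hypothetical and the comparison is taken to be an exact, case-sensitive string match, which the text does not spell out.

```python
def status_action(current_status, api_category):
    """Return (action, value) for the taxobox |status= parameter."""
    if not current_status or not current_status.strip():
        return "added", api_category          # parameter missing or empty
    if current_status.strip() == api_category:
        return "confirmed", api_category      # taxobox already agrees
    return "updated", api_category            # replace with the API value
```

For the Zenia insignis case, a taxobox holding `NT` would be updated to `LR/nt` because the API's category is the value that is used.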
IUCN status displayed on an IUCNredlist web page may be different from the category returned from the IUCN API – task 19 uses the IUCN API's category; cf. (as of 2021-09-22):
- NT (from the Zenia insignis web page)
- LR/nt (from the IUCN API):
{"name":"32462","result":[{"taxonid":32462,"scientific_name":"Zenia insignis","kingdom":"PLANTAE","phylum":"TRACHEOPHYTA","class":"MAGNOLIOPSIDA","order":"FABALES","family":"FABACEAE","genus":"Zenia","main_common_name":null,"authority":"Chun","published_year":1998,"assessment_date":"1998-01-01","category":"LR/nt","criteria":null,"population_trend":null,"marine_system":false,"freshwater_system":false,"terrestrial_system":true,"assessor":"World Conservation Monitoring Centre","reviewer":"","aoo_km2":null,"eoo_km2":null,"elevation_upper":null,"elevation_lower":null,"depth_upper":null,"depth_lower":null,"errata_flag":null,"errata_reason":null,"amended_flag":null,"amended_reason":null}]}
IUCN status system
To update or add a taxobox |status_system= parameter, task 19 extracts the year portion from the IUCN API's assessment_date value. If the assessment year is 2000 or earlier, task 19 sets |status_system=IUCN2.3; otherwise it sets |status_system=IUCN3.1. The threshold date is taken from Wikipedia:Conservation status. When |status_system= is missing, task 19 adds a new parameter at the end of the taxobox. Updates and additions are noted in the edit summary; confirmations are not.
IUCN status reference
To update or add |status_ref=, task 19 inspects the parameter value for a date that task 19 would have written (<ref name="iucn status date">...</ref>) or the existing citation's |access-date= (in that order). When a date can be extracted from one of these, it is compared to the current date. Task 19 will attempt to update |status_ref= only when the difference between the current date and the reference date is greater than six months or when no date can be extracted. This six-month limit was arbitrarily chosen on the presumption that IUCN updates their database twice a year.
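The age test above can be sketched as follows. The text does not say how 'six months' is measured, so this Python example assumes calendar months (with the day clamped to the end of shorter months); the function names are hypothetical and the dates are assumed to be already parsed.

```python
import calendar
from datetime import date

def add_months(start, months):
    """Return start shifted by a number of calendar months (day clamped)."""
    index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(start.day, last_day))

def status_ref_needs_update(reference_date, today):
    """True when no date was found or the reference is older than six months."""
    if reference_date is None:
        return True
    return today > add_months(reference_date, 6)
```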
Task 19 will not update templated citations in |status_ref= if the citation has one of:
- |amends=<year>
- |errata=<year>
Similarly, task 19 will not update plain-text citations in |status_ref= if the citation has one of:
- (amended version of <year> assessment)
- (errata version published in <year>)
This is because the IUCN API does not provide the <year> of amendment or errata.
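A check for these markers might look like the Python sketch below. The templated pattern is modeled on the one in the task's script (which matches against a hidden-pipe placeholder rather than a literal `|`); the plain-text pattern and the function name `is_amended_or_errata` are assumptions.

```python
import re

# |amends=<year> or |errata=<year> in a templated citation
TEMPLATE_FLAG = re.compile(r"\|\s*(?:amends|errata)\s*=\s*\d{4}")
# the equivalent parenthetical strings in a plain-text citation (assumed wording)
PLAIN_TEXT_FLAG = re.compile(
    r"\((?:amended version of \d{4} assessment|errata version published in \d{4})\)"
)

def is_amended_or_errata(citation):
    """True when the citation carries an amended or errata marker."""
    return bool(TEMPLATE_FLAG.search(citation) or PLAIN_TEXT_FLAG.search(citation))
```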
When the six-month limit is met, and when the citation in |status_ref= does not hold the amended or errata parameters or strings, task 19 then inspects the associated reference tag:
1. <ref> – unnamed reference:
   - replaces the value assigned to |status_ref= with <ref name="iucn status date"><new {{cite IUCN}} from IUCN API></ref>
     - where date in name="iucn status date" is a copy of the value assigned to the new {{cite IUCN}} template's |access-date= parameter
2. <ref name=name> – named reference:
   - replaces that reference with <ref name="iucn status date"><new {{cite IUCN}} from IUCN API></ref>
   - replaces all instances of <ref name=name /> with <ref name="iucn status date" />
     - where date in name="iucn status date" is a copy of the value assigned to the new {{cite IUCN}} template's |access-date= parameter
3. <ref name=name /> – named self-closed reference:
   - swaps the self-closed reference tag with the reference definition
   - replaces the citation as described in 2
   - if the definition was (and now the self-closed ref tag is) inside {{reflist|refs=}} then the self-closed ref tag is deleted
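The renaming of self-closed tags in the named-reference case can be illustrated with a short Python sketch. It is not the task's code; the function name `rename_self_closed_refs` is hypothetical, and the sketch assumes the name may appear unquoted or in single or double quotes.

```python
import re

def rename_self_closed_refs(text, old_name, new_name):
    """Rewrite every <ref name=old_name /> as <ref name="new_name" />."""
    pattern = re.compile(
        r"<ref\s+name\s*=\s*([\"']?)" + re.escape(old_name) + r"\1\s*/>",
        re.IGNORECASE,
    )
    # A function replacement avoids backslash handling in new_name.
    return pattern.sub(lambda _m: f'<ref name="{new_name}" />', text)
```

For example, with an access date of 21 September 2021 the new name would be `iucn status 21 September 2021`, and a tag such as `<ref name="iucn2" />` is left alone because its name only starts with the old one.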
{{cite IUCN}} template updates
For {{cite IUCN}} templates that have old-form urls, task 19 extracts the taxon id from the url and attempts to fetch citation data from the IUCN API using the taxon id. If the IUCN API does not recognize the taxon id, task 19 will attempt to get a citation from the API by using the value assigned to |title= in the {{cite IUCN}} template. When successful, task 19 replaces the old {{cite IUCN}} template with a new {{cite IUCN}} template that has parameter values from the IUCN API citation.
When the taxon/assessment ids in a new {{cite IUCN}} template's |page= and |doi= parameters are not the same, the citation is not updated because {{cite IUCN}} will emit a |doi= / |url= mismatch error message. The mismatch is usually an indication that the assessment has errata. The citation rendered on an IUCN species web page indicates the errata year but, at the time of this writing, that value is not available in the citation returned from the IUCN API. IUCN have been notified of this discrepancy.
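The id comparison described above can be sketched in Python as shown here. The function name `page_and_doi_agree` is hypothetical, and the sketch assumes both values carry the ids in the `T<taxon id>A<assessment id>` form seen in the citation examples on this page.

```python
import re

# T<taxon id>A<assessment id>, as in e.T22718564A152671411
IDS = re.compile(r"T(\d+)A(\d+)")

def page_and_doi_agree(page, doi):
    """True when |page= and |doi= hold the same taxon and assessment ids."""
    page_ids = IDS.search(page)
    doi_ids = IDS.search(doi)
    if not page_ids or not doi_ids:
        return False                      # cannot compare, treat as mismatch
    return page_ids.groups() == doi_ids.groups()
```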
plain-text citation updates
For the purposes of this task, plain-text references are untemplated IUCN references inside named or unnamed <ref>...</ref> tags or IUCN references as a line item in an unordered list (* markup). Task 19 will update plain-text references when it can extract a taxon id from an IUCN page identifier (e.T###A###), from an IUCN doi (as a doi inside {{doi}} or as a url), or from an IUCN url.
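Extraction from the first two of these sources might be sketched as below, trying the page identifier before the doi. This Python example is illustrative only: the function name `plain_text_taxon_id` and both patterns are assumptions, and extraction from a bare IUCN url is left out of the sketch.

```python
import re

PAGE_ID = re.compile(r"e\.T(\d+)A\d+")                      # e.T###A###
DOI = re.compile(r"10\.2305/IUCN\.UK\.\S*?RLTS\.T(\d+)A\d+", re.IGNORECASE)

def plain_text_taxon_id(text):
    """Return the taxon id found in a plain-text citation, or None."""
    for pattern in (PAGE_ID, DOI):        # page identifier first, then doi
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None
```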
duplicate citations
Task 19 will replace named and unnamed references that hold {{cite IUCN}} templates that match {{cite IUCN}} in |status_ref= with <ref name="iucn status date" /> tags. <ref name=name /> tags associated with named references that hold {{cite IUCN}} templates that match {{cite IUCN}} in |status_ref= are replaced with <ref name="iucn status date" /> tags.
Duplicate references that wholly make up an entry in an unordered list are deleted as redundant.
Task 19 does not remove any other references.
ancillary tasks
Task 19 may update a {{IUCN status}} template's status value in its first positional parameter ({{{1|}}}) from the IUCN API when {{IUCN status}} has a valid taxon id as its second positional parameter ({{{2|}}}).
As with all other monkbot tasks, task 19 does not run with AWB general fixes turned on.
abandoned edits
Task 19 will abandon edits when:
- the article uses {{r}}
- the article uses {{#tag:ref}} parser functions
- the number of {{cite IUCN}} templates evaluated is equal to the number of IUCN API calls that returned nil values
- the article contains {{bots|deny=monkbot/task 19}}
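The text-based checks in this list can be sketched as a single pre-flight function. In this Python illustration the first two patterns follow the ones used in the task's script; the `{{bots}}` pattern and the function name `abandon_reason` are assumptions (the script excerpt does not show how the deny template is detected), and the nil-return condition is omitted because it depends on API results.

```python
import re

TAG_REF = re.compile(r"\{\{\s*#tag:ref")
R_TEMPLATE = re.compile(r"\{\{\s*[Rr]\s*\|")
# Assumed pattern for {{bots|deny=monkbot/task 19}}
BOTS_DENY = re.compile(r"\{\{\s*bots\s*\|\s*deny\s*=[^}]*monkbot/task 19", re.IGNORECASE)

def abandon_reason(article_text):
    """Return a short reason to skip the article, or None to proceed."""
    if TAG_REF.search(article_text):
        return "article uses {{#tag:ref}} parser function(s)"
    if R_TEMPLATE.search(article_text):
        return "article has {{r}} template(s)"
    if BOTS_DENY.search(article_text):
        return "article denies monkbot/task 19"
    return None
```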
edit summaries
Task 19 emits terse edit summaries. An edit summary is a concatenation of one or more of these message fragments:
- IUCN status confirmed (n×) – number of taxobox |status= and {{IUCN status}} values that were confirmed to match the IUCN API returned value; when there is only one confirmation (the most common case), the parenthetical count is omitted
- IUCN status updated (n×) – number of taxobox |status= and {{IUCN status}} values that were updated to match the IUCN API returned value; when there is only one update, the parenthetical count is omitted
- IUCN status added – a taxobox |status= parameter was added using the IUCN API returned value
- IUCN status system updated – a taxobox |status_system= parameter was updated to match the IUCN API returned value
- IUCN status system added – a taxobox |status_system= parameter was added using the IUCN API returned value
- IUCN status ref updated – a taxobox |status_ref= parameter was updated to match the IUCN API returned value
- IUCN status ref added – a taxobox |status_ref= parameter was added using the IUCN API returned value
  - [duplicate removed] or [duplicates removed (n×)] – suffix added to 'IUCN status ref updated' or 'IUCN status ref added' messages when duplicate reference(s) have been removed
- IUCN status ref current – the citation in |status_ref= is not older than six months
- evaluated n template(s) – the number of {{cite IUCN}} templates that task 19 inspected for use of old-form urls
- n template(s) modified – the number of {{cite IUCN}} templates with old-form urls that task 19 updated
- evaluated n reference(s) – the number of plain-text references that task 19 inspected
- n reference(s) modified – the number of plain-text references that task 19 updated
- API species nil return (id) (n×) – emitted when IUCN API did not return species data for a given taxon id
- API species nil return (name) (n×) – emitted when IUCN API did not return species data for a given species name
- API cite nil return (n×) – emitted when IUCN API did not return citation data (species name or taxon id)
- unrecognized binomial: binomial – the binomial that task 19 used to fetch data from the IUCN API for the taxobox parameter
- (n/mm:ss.ms) – n is the number of IUCN API calls; mm:ss.ms – minutes, seconds and milliseconds required to process the article
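Two of the fragment shapes above can be sketched in Python as follows. Both helpers are hypothetical: `count_fragment` shows the rule that the parenthetical count is omitted when there is only one event, and `timing_fragment` assumes zero-padded minutes and seconds with three-digit milliseconds, which the description does not specify.

```python
def count_fragment(message, count):
    """Build e.g. 'IUCN status confirmed' or 'IUCN status confirmed (3×)'."""
    if count <= 0:
        return ""                       # nothing happened, emit no fragment
    if count == 1:
        return message                  # count omitted for a single event
    return f"{message} ({count}×)"

def timing_fragment(api_calls, elapsed_seconds):
    """Build the '(n/mm:ss.ms)' fragment (assumed padding)."""
    minutes, seconds = divmod(elapsed_seconds, 60)
    return f"({api_calls}/{int(minutes):02d}:{seconds:06.3f})"
```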
script
/*
use the iucn api to fetch IUCN categories to update {{taxobox}} and {{speciesbox}} |status= and |status_system=
parameters
use the iucn api to fetch assessment citations to update {{taxobox}} and {{speciesbox}} |status_ref= parameters
with current {{cite IUCN}} templates
use the iucn api to fetch assessment citations to update {{cite IUCN}} templates with old-form urls
use the iucn api to fetch IUCN categories to update first positional parameter (the status) in {{IUCN status}} templates
source categories:
Category:Taxonomy articles created by Polbot
Category:cite IUCN maint
source searches:
insource:/Downloaded on [0-3][0-9] +[JFMASOND][a-z]+ +[0-9]{4}/
hastemplate:"cite IUCN" -incategory:"Taxobox binomials not recognized by IUCN" -insource:/iucn status [0-9]+[^0-9]+2021/
*/
//---------------------------< P R O C E S S A R T I C L E >--------------------------------------------------
//
//
//
List<string> error_log_list = new List<string>();
public string ProcessArticle(string ArticleText, string ArticleTitle, int wikiNamespace, out string Summary, out bool Skip)
{
Skip = false; // assume that something will be changed
// these use redirect to User:Monkbot/task 19: cite IUCN update
// Summary = "[[User:Monkbot/task 19|Task 19]] (manual dev test): convert/update IUCN references to {{[[Template:cite IUCN|cite IUCN]]}} using data from [[IUCN Red List]] [[API]];";
// Summary = "[[User:Monkbot/task 19|Task 19]] (BRFA trial): convert/update IUCN references to {{[[Template:cite IUCN|cite IUCN]]}} using data from [[IUCN Red List]] [[API]];";
Summary = "[[User:Monkbot/task 19|Task 19]]: convert/update IUCN references to {{[[Template:cite IUCN|cite IUCN]]}} using data from [[IUCN Red List]] [[API]];";
int template_modified_count = 0; // number of cite IUCN templates that were modified from the iucn api
int other_template_modified_count = 0; // number of cite journal/web templates that were converted to {{cite IUCN}}
// reset these static counters
plain_text_modified_count = 0; // number of plain-text citations that were modified from the iucn api
plain_text_count = 0; // total number of plain-text iucn references
api_call_count = 0; // number of api calls made
api_fetch_fail_count = 0; // number of api fetches that failed
api_no_cite_return_count = 0; // number of times that the api returned a non-citation value
api_no_species_return_name_count = 0; // number of times that the api returned a non-species value (species binomial)
api_no_species_return_id_count = 0; // number of times that the api returned a non-species value (species id for {{IUCN status}})
iucn_status_confirmed_count = 0; // number of times that we confirmed the iucn status in taxobox-like templates
iucn_status_updated_count = 0; // number of times that we updated the iucn status in taxobox-like templates
iucn_status_system_updated_count = 0; // number of times that we updated the iucn status system in taxobox-like templates
iucn_template_count = 0; // total number of cite IUCN templates
other_template_count = 0; // total number of cite journal/web templates
parse_fail_count = 0; // number of times that we couldn't parse the api return
page_doi_skip_count = 0; // number of templates or plain-text references skipped because page and doi assessment ID mismatch
status_added = false; // set to true when |status= created
status_system_added = false; // set to true when |status_system created
status_ref_added = false; // set to true when |status_ref= created
status_ref_updated = false; // set to true when |status_ref= updated
status_ref_current = false; // set to true when |status_ref= less than 6 months old
duplicates_removed_count = 0; // number of duplicate status references removed
taxobox_blank = null; // gets blank taxobox as flag
unrecognized_species_name = null; // gets taxobox species name that IUCN doesn't recognize
System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch(); // set up a stopwatch
stopwatch.Start(); // and start it
if (Regex.Match (ArticleText, @"\{\{\s*#tag:ref").Success)
{
Summary = "Article uses {{#tag:ref}} parser function(s)";
error_log_add ("Article uses " + code_nowiki("{{#tag:ref}}") + " parser function(s)"); // add error message to list
log_errors (ArticleTitle, error_log_list); // dump list to the log file
Skip = true;
return ArticleText;
}
if (Regex.Match (ArticleText, @"\{\{\s*[Rr]\s*\|").Success)
{
Summary = "Article has {{r}} template(s)";
error_log_add ("Article has " + code_nowiki("{{r}}") + " template(s)"); // add error message to list
log_errors (ArticleTitle, error_log_list); // dump list to the log file
Skip = true;
return ArticleText;
}
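if (null == api_token)
{
System.IO.StreamReader sr = new System.IO.StreamReader (iucn_api_token_file); // open the api token file for reading
string token = sr.ReadLine(); // read the token (must be the only thing in the file)
sr.Close(); // close and done
if ((null != token) && (0 != token.Trim().Length)) // an empty file must not yield a bare "?token="
api_token = "?token=" + token.Trim(); // so that the 'just in case' test below can detect the failure
}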
if (null == api_token) // but just in case
{
Summary = "Failed to read: " + iucn_api_token_file; // announce failure
error_log_add ("Failed to read: " + iucn_api_token_file); // add error message to list
log_errors (ArticleTitle, error_log_list); // dump list to the log file
Skip = true;
return ArticleText;
}
ArticleText = Regex.Replace (ArticleText, @"[\r\n]+\[\[Category:Taxobox binomials not recognized by IUCN\]\][^\r\n]*", ""); // remove if present; will be restored if necessary
//---------------------------< T A X O B O X >----------------------------------------------------------------
//
// <taxobox> holds the content of {{taxobox}} or {{Speciesbox}} and then is modified by taxobox_update(). The
// source template in <ArticleText> is replaced with an empty skeleton ('{{taxobox}}' or '{{Speciesbox}}') but
// without contents. At the end, this skeleton is replaced with the modified taxobox held in <taxobox>.
//
// The reason for this round-about is to prevent other portions of this script from evaluating and tallying
// the reference in |status_ref=. Also permits easy replacement of references that duplicate the reference in
// |status_ref=.
//
ArticleText = Regex.Replace (ArticleText, hide_non_ref_tag_pattern, hide_non_ref_replace_val);
ArticleText = hide (ArticleText, HIDE_ALL_BUT_TAXOBOX); // hide all templates except taxobox-like templates
ArticleText = hide (ArticleText, HIDE_ALL_BUT_TAXOBOX); // hide all templates except taxobox-like templates
//if (1 == 1) return ArticleText;
string taxobox = taxobox_get (ArticleText);
taxobox_status_ref = null; // reset the 'new' value for |status_ref; used at the end to remove duplicates
taxobox_status_ref_open_tag = null; // its matching ref open tag
taxobox_status_ref_sc_tag = null; // and its matching self-closed tag
taxobox_update (ref taxobox, ref ArticleText, ArticleTitle); // update the taxobox |status=, |status_system=, and |status_ref=
ArticleText = unhide (ArticleText);
//---------------------------< C I T E I U C N U P D A T E S >--------------------------------------------
//
// this segment updates {{cite IUCN}} templates that have old-form urls. There are a variety of old-form urls
// but the most common indicator is the taxon id followed by a zero (0) for the assessment id. This section
// fetches the current citation from the IUCN API using the taxon id (preferred) or using the 'name' in |title=.
// The 'name' in |title= is presumed to be an italicized binomial
//
// {{cite IUCN}} templates with |ref= holding any value retain the parameter so that {{sfn}} or {{harv}} links
// aren't broken. Any replacement citation that does not use |ref= may have a different author list from the
// 'original' so, when the underlying {{cite journal}} creates a CITEREF id for the new name list, the {{sfn}}
// or {{harv}} links will be broken ...
//
// does not update references in the taxobox (|status_ref= handled above); example: [[Picea abies]]
//
ArticleText = hide (ArticleText, IS_CITE_IUCN); // hide all templates except cite IUCN templates
if (Regex.Match (ArticleText, iucn_template_pattern).Success)
ArticleText = Regex.Replace (ArticleText, iucn_template_pattern,
delegate(Match match)
{
string template = match.Groups[0].Value; // this will be returned if no changes
string ref_param = null;
iucn_template_count++; // bump total number of cite IUCN templates tally
string id = taxon_id_from_old_form_url_get (template);
if (null == id) // not an old-form-url template so ignore it
return template;
if (Regex.Match (template, @"__P1P3__\s*(?:errata|amends)\s*=\s*\d{4}").Success)
{
error_log_add ("[cite IUCN update]: template has |errata= or |amends= parameter (id: " + id + ")");
return template;
}
string name = null;
if (Regex.Match (template, iucn_title).Success)
{
name = Regex.Match (template, iucn_title).Groups[1].Value.Trim();
name = species_name_cleanup (name); // remove markup, extinction markers, disambiguation, etc
}
string api_url_id = api_id_url + id + api_token; // build the url from its various parts
string api_url_name = api_name_url + name + api_token; // build the url from its various parts
string cite_iucn = cite_iucn_get (api_url_id, api_url_name, ArticleTitle, id, name);
if (null == cite_iucn)
return template;
template = Regex.Replace (template, ref_param_empty, "$1"); // remove empty |ref= parameters from template
if (Regex.Match (template, ref_param_not_empty).Success) // if this template has |ref=<something>
ref_param = Regex.Match (template, ref_param_not_empty).Groups[1].Value.Trim(); // get its assigned value
if (null != ref_param)
cite_iucn = Regex.Replace (cite_iucn, @"(\}\})", " |ref=" + ref_param + "$1"); // add the preexisting |ref= param
template_modified_count++;
return cite_iucn;
});
ArticleText = unhide (ArticleText); // unhide all that is hidden
//---------------------------< C I T E J O U R N A L / W E B U P D A T E S >------------------------------
//
// this segment updates {{cite journal}} and {{cite web}} templates that have iucn urls, or pages or dois. This
// section fetches the current citation from the IUCN API using the taxon id (preferred) or using the 'name'
// in |title=. The 'name' in |title= is presumed to be an italicized binomial
//
// {{cite journal}} and {{cite web}} templates with |ref= holding any value retain the parameter so that {{sfn}}
// or {{harv}} links aren't broken. Any replacement {{cite IUCN}} that does not use |ref= may have a different
// author list from the 'original' so, when the underlying {{cite journal}} creates a CITEREF id for the new name
// list, the {{sfn}} or {{harv}} links will be broken ...
//
// does not update references in the taxobox (|status_ref= handled above)
//
ArticleText = hide (ArticleText, IS_CITE_OTHER); // hide all templates except cite journal and cite web templates
if (Regex.Match (ArticleText, other_template_pattern).Success)
ArticleText = Regex.Replace (ArticleText, other_template_pattern,
delegate(Match match)
{
string template = match.Groups[0].Value; // this will be returned if no changes
string ref_param = null;
other_template_count++; // bump total number of cite journal/web templates tally
string id = plain_text_taxon_id_get (template); // attempt to get taxon id from page -> doi -> url
if (null == id) // not an 'iucn' template so ignore it
return template;
// cite journal and cite web don't support |errata= or |amends=
// if (Regex.Match (template, @"__P1P3__\s*(?:errata|amends)\s*=\s*\d{4}").Success)
// {
// error_log_add ("[cite IUCN update]: template has |errata= or |amends= parameter (id: " + id + ")");
// return template;
// }
string name = null;
if (Regex.Match (template, iucn_title).Success) // get value assigned to |title=
{
name = Regex.Match (template, iucn_title).Groups[1].Value.Trim();
name = species_name_cleanup (name); // remove markup, extinction markers, disambiguation, etc
}
string api_url_id = api_id_url + id + api_token; // build the api url from its various parts
string api_url_name = api_name_url + name + api_token; // build the api url from its various parts
string cite_iucn = cite_iucn_get (api_url_id, api_url_name, ArticleTitle, id, name);
if (null == cite_iucn)
return template;
template = Regex.Replace (template, ref_param_empty, "$1"); // remove empty |ref= parameters from template
if (Regex.Match (template, ref_param_not_empty).Success) // if this template has |ref=<something>
ref_param = Regex.Match (template, ref_param_not_empty).Groups[1].Value.Trim(); // get its assigned value
if (null != ref_param)
cite_iucn = Regex.Replace (cite_iucn, @"(\}\})", " |ref=" + ref_param + "$1"); // add the preexisting |ref= param
other_template_modified_count++;
return cite_iucn;
});
ArticleText = unhide (ArticleText); // unhide all that is hidden
//---------------------------< P L A I N _ T E X T _ R E F _ U P D A T E >------------------------------------
//
// update plain-text references first in ArticleText and then in the taxobox
//
ArticleText = plain_text_ref_update (ArticleText, ArticleTitle);
// all of these create or rely on <ref iucn status <'date'>>{{cite IUCN}}
if ((status_added || (0 != iucn_status_confirmed_count) || (0 != iucn_status_updated_count)) && (status_ref_added || status_ref_updated || status_ref_current))
taxobox = plain_text_ref_update (taxobox, ArticleTitle); // do not update plain-text references in taxobox because |status_ref= might be plain text
//---------------------------< I U C N P L A I N - T E X T B I B L I O G R A P H Y U P D A T E >--------
//
// this is the plain-text form API id only. Plain-text references in bibliographies must be in unordered list
// markup \n*...\n
//
// known issues:
// because this attempts to locate 'correct' plain-text citations and because any non-template and non-
// wikilink text is plain text, plain text that is part of the unordered list item that is not part of the
// actual IUCN citation will be treated as part of the citation and will be replaced with the {{cite IUCN}}
// template if the API returns a citation for the taxon id.
//
if (Regex.Match (ArticleText, plain_text_bib_pattern).Success) // must have the form \n*plain text\n must be constrained because article is plain text
ArticleText = Regex.Replace (ArticleText, plain_text_bib_pattern,
delegate(Match match)
{
string plain_text = match.Groups[0].Value; // this will be returned if no changes
string taxon_id = plain_text_taxon_id_get (plain_text); // attempt to get taxon id
if (null == taxon_id)
return plain_text; // no taxon id so abandon
if (is_plain_text_rejected (plain_text)) // returns true when plain_text is rejected
return plain_text;
string ref_open = match.Groups[1].Value; // the opening \n*
string ref_close = match.Groups[3].Value; // the closing \n tag
plain_text_count++; // bump total number of plain-text references found
string api_url = api_id_url + taxon_id + api_token; // build the url from its various parts
string cite_iucn = cite_iucn_get (api_url, null, ArticleTitle, taxon_id, null); // go build a {{cite IUCN}} template from the api
if (null == cite_iucn)
return plain_text; // template build failed
plain_text_modified_count++;
return ref_open + cite_iucn + ref_close;
});
//---------------------------< I U C N S T A T U S T E M P L A T E >--------------------------------------
//
// Update status in {{IUCN status|<status>|<taxon id>|<options>}}
//
if (Regex.Match (ArticleText, iucn_status_template_pattern).Success)
ArticleText = Regex.Replace (ArticleText, iucn_status_template_pattern,
delegate(Match match)
{
string template = match.Groups[0].Value; // if no change, return this
string status = null;
string id = null;
if (Regex.Match (template, iucn_status_status).Success)
status = Regex.Match (template, iucn_status_status).Groups[2].Value;
else
return template;
if (Regex.Match (template, iucn_status_id).Success)
id = Regex.Match (template, iucn_status_id).Groups[2].Value;
else
return template;
string species_from_api; // species data from the API will go here
string api_url = api_species_id_url + id + api_token; // build the url from its various parts
species_from_api = api_fetch (api_url, ArticleTitle); // fetch species data from the IUCN API
if (null == species_from_api) // if api_fetch() failed
return template;
string status_from_api = null;
if (Regex.Match (species_from_api, status_from_api_pattern).Success)
status_from_api = Regex.Match (species_from_api, status_from_api_pattern).Groups[1].Value;
else
{
error_log_add ("[iucn status template]: API did not return species data: " + code_nowiki (species_from_api));
api_no_species_return_id_count++;
return template;
}
if (status == status_from_api) // if status same as api status
iucn_status_confirmed_count++; // bump the confirmed count and done
else
{
template = Regex.Replace (template, iucn_status_lead + status, "$1" + status_from_api); // update
iucn_status_updated_count++; // bump the updated count
}
return template;
});
//---------------------------< R E M O V E D U P L I C A T E S T A T U S R E F >-------------------------
//
// convert |status_ref= {{cite IUCN}} template into a regex to find duplicates of itself in ArticleText and
// then replace any duplicates with the self-closed <ref> tag made for |status_ref=
//
// replaces duplicates in taxobox only after hiding the |status_ref= definition so that we don't lose the definition
//
// problem: if the duplicate is named and is the definition for other self-closed ref tags, all of those tags
// need to be renamed ... argh example: [[Bellamya trochlearis]], [[Catarina pupfish]]
//
if ((null != taxobox_status_ref) && (null != taxobox_status_ref_sc_tag))
{
string taxobox_status_ref_pattern = taxobox_status_ref;
foreach (string symbol in symbols)
taxobox_status_ref_pattern = Regex.Replace (taxobox_status_ref_pattern, symbol, symbol); // convert taxobox_status_ref to a regex pattern
// references in unordered lists always ok to replace
ArticleText = counted_replace (ArticleText, bib_open_ul + taxobox_status_ref_pattern + bib_close_ul, "$1", ref duplicates_removed_count);
// references with unnamed <ref> tags always ok to replace
ArticleText = counted_replace (ArticleText, ref_open_tag_unnamed + @"\s*" + taxobox_status_ref_pattern + @"\s*" + ref_close_tag, taxobox_status_ref_sc_tag, ref duplicates_removed_count);
taxobox = counted_replace (taxobox, ref_open_tag_unnamed + @"\s*" + taxobox_status_ref_pattern + @"\s*" + ref_close_tag, taxobox_status_ref_sc_tag, ref duplicates_removed_count);
taxobox = hide_taxobox_status_ref (taxobox, taxobox_status_ref_open_tag, taxobox_status_ref_pattern); // hide |status_ref= {{cite IUCN}} template so we don't replace it with sc tag
named_status_ref_dup_remove (ref ArticleText, ref taxobox, taxobox_status_ref_pattern, taxobox_status_ref_sc_tag); // remove duplicates
// remove sequential instances of taxobox_status_ref_open_tag_sc TODO: this could be improved
string taxobox_status_ref_open_tag_sc = Regex.Replace (taxobox_status_ref_open_tag, @"([^\>]+)\>", "$1 />");
taxobox = Regex.Replace (taxobox, taxobox_status_ref_open_tag_sc + @"\s*" + taxobox_status_ref_open_tag_sc, taxobox_status_ref_sc_tag);
ArticleText = Regex.Replace (ArticleText, taxobox_status_ref_open_tag_sc + @"\s*" + taxobox_status_ref_open_tag_sc, taxobox_status_ref_sc_tag);
}
//---------------------------< C L E A N U P >----------------------------------------------------------------
if (null != taxobox)
taxobox = unhide (taxobox);
ArticleText = hide (ArticleText, "[Rr]eflist");
while (Regex.Match (ArticleText, reflist_cleanup).Success) // remove self-closed ref tags from {{reflist}} (European fire-bellied toad)
{
ArticleText = Regex.Replace (ArticleText, reflist_cleanup, "$1");
ArticleText = Regex.Replace (ArticleText, @"(\{\{)\s*([Rr]eflist[^\|]*)\s*\|\s*refs\s*=\s*(\}\})", "$1$2$3");
}
ArticleText = unhide (ArticleText);
if (null != taxobox)
ArticleText = Regex.Replace (ArticleText, taxobox_blank_pattern, taxobox);
ArticleText = Regex.Replace (ArticleText, angle_open, "<");
ArticleText = Regex.Replace (ArticleText, angle_close, ">");
//---------------------------< F I N I S H >------------------------------------------------------------------
if (status_added) // build our edit summary
Summary = summary_concat (Summary, " IUCN status added;");
if (0 != iucn_status_confirmed_count) // build our edit summary
Summary = summary_concat (Summary, " IUCN status confirmed" + ((1 < iucn_status_confirmed_count) ? " (" + iucn_status_confirmed_count + "×);" : ";"));
if (0 != iucn_status_updated_count)
Summary = summary_concat (Summary, " IUCN status updated" + ((1 < iucn_status_updated_count) ? " (" + iucn_status_updated_count + "×);" : ";"));
if ((0 != iucn_status_confirmed_count) || (0 != iucn_status_updated_count) || status_added)
{
if (0 != iucn_status_system_updated_count)
Summary = summary_concat (Summary, " IUCN status system updated;");
else if (status_system_added)
Summary = summary_concat (Summary, " IUCN status system added;");
}
string dup_text = "";
switch (duplicates_removed_count)
{
case 0:
dup_text = ";";
break;
case 1:
dup_text = " [duplicate removed];";
break;
default:
dup_text = " [duplicates removed (" + duplicates_removed_count + "×)];";
break;
}
if (status_ref_added)
Summary = summary_concat (Summary, " IUCN status ref added" + dup_text);
if (status_ref_updated)
Summary = summary_concat (Summary, " IUCN status ref updated" + dup_text);
if (status_ref_current)
Summary = summary_concat (Summary, " IUCN status ref current;");
if (0 != plain_text_count) // build our edit summary
{
Summary = summary_concat (Summary, " evaluated " + plain_text_count + " reference" + (1 == plain_text_count ? ";" : "s;"));
if (0 != plain_text_modified_count)
Summary = summary_concat (Summary, " " + plain_text_modified_count + " reference" + (1 == plain_text_modified_count ? " " : "s ") + "modified;");
}
if (0 != iucn_template_count)
{
Summary = summary_concat (Summary, " evaluated " + iucn_template_count + " {{cite IUCN}}" + (1 == iucn_template_count ? ";" : "s;"));
if (0 != template_modified_count)
Summary = summary_concat (Summary, " " + template_modified_count + " template" + (1 == template_modified_count ? " " : "s ") + "modified;");
}
if ((0 != other_template_count) && (0 != other_template_modified_count)) // only report 'other templates' when we modify
{
Summary = summary_concat (Summary, " evaluated " + other_template_count + " other template" + (1 == other_template_count ? ";" : "s;"));
if (0 != other_template_modified_count)
Summary = summary_concat (Summary, " " + other_template_modified_count + " template" + (1 == other_template_modified_count ? " " : "s ") + "modified;");
}
if (0 != page_doi_skip_count)
Summary = summary_concat (Summary, " skipped doi/page mismatch (" + page_doi_skip_count + "×);");
if (0 != api_no_cite_return_count)
Summary = summary_concat (Summary, " API cite nil return (" + api_no_cite_return_count + "×);");
if (0 != api_no_species_return_id_count) // for {{IUCN status}}
Summary = summary_concat (Summary, " API species nil return (id) (" + api_no_species_return_id_count + "×);");
if (0 != api_no_species_return_name_count)
Summary = summary_concat (Summary, " API species nil return (name) (" + api_no_species_return_name_count + "×);");
if (null != unrecognized_species_name)
Summary = summary_concat (Summary, " unrecognized binomial: " + unrecognized_species_name + ";");
stopwatch.Stop(); // stop the stopwatch
TimeSpan ts = stopwatch.Elapsed; // get the elapsed time and tack it onto the edit summary
Summary = Summary + " (" + api_call_count + "/" + String.Format("{0:00}:{1:00}.{2:00}", ts.Minutes, ts.Seconds, ts.Milliseconds / 10) + ");";
if (!status_ref_added && !status_ref_updated && (0 == iucn_status_updated_count)) // iucn_status_updated_count for {{IUCN status}} updates (List of reptiles of North America)
{
if (0 == iucn_template_count)
{
if ((0 != plain_text_count) && (plain_text_count == page_doi_skip_count))
{
error_log_add ("auto-skipped: doi/page mismatch");
Skip = true;
}
if ((0 != plain_text_count) && (plain_text_count == api_no_cite_return_count))
{
error_log_add ("auto-skipped: number of plain-text citations is same as number of API citation nil returns");
Skip = true;
}
}
if (0 == plain_text_count)
{
if ((0 != iucn_template_count) && (iucn_template_count == page_doi_skip_count))
{
error_log_add ("auto-skipped: doi/page mismatch");
Skip = true;
}
if ((0 != iucn_template_count) && (iucn_template_count == api_no_cite_return_count))
{
error_log_add ("auto-skipped: number of cite IUCN templates is same as number of API citation nil returns");
Skip = true;
}
}
}
if ("" == ArticleText) // trap to see if the 'blanked' pages that sometimes occur are the fault of this script
{
error_log_add ("auto-skipped: ArticleText is empty string"); // error message
Skip = true; // force a skip
}
if (0 != error_log_list.Count)
log_errors (ArticleTitle, error_log_list);
return ArticleText;
}
//===========================<< S U P P O R T >>==============================================================
//---------------------------< N A M E D _ S T A T U S _ R E F _ D U P _ R E M O V E >------------------------
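//
// finds named <ref> definitions in the taxobox and in the article text that duplicate the |status_ref=
// {{cite IUCN}} template; renames their self-closed tags to <taxobox_status_ref_sc_tag> and then replaces
// the duplicate definitions themselves with that same tag
//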
//private string named_status_ref_dup_remove (ref string text, string taxobox_status_ref_pattern, string taxobox_status_ref_sc_tag)
// {
// Match dup_match = Regex.Match (text, @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?([^""\>]+)""?\>\s*" + taxobox_status_ref_pattern + @"\s*\</[Rr][Ee][Ff]\>");
// if (dup_match.Success)
// {
// string name = dup_match.Groups[1].Value; // get the reference's name from <ref name=...> tag
// string ref_tag_replace_pattern = @"\<[Rr][Ee][Ff]\s*name\s*=\s*""""?" + name + @"""""?\s*\>"; // make a <ref name=... > pattern from name
// string sc_replace_pattern = @"\<[Rr][Ee][Ff]\s*name\s*=\s*""""?" + name + @"""""?\s*/\>"; // make a self-closed <ref name=... /> pattern from name
// text = Regex.Replace (text, sc_replace_pattern, taxobox_status_ref_sc_tag); // replace any <ref name=... /> with <ref name="iucn status <date> /> sc tag
// text = counted_replace (text, ref_open_tag_named + @"\s*" + taxobox_status_ref_pattern + @"\s*" + ref_close_tag, taxobox_status_ref_sc_tag, ref duplicates_removed_count); // now remove any duplicates
// return sc_replace_pattern;
// }
// return null;
// }
private void named_status_ref_dup_remove (ref string article_text, ref string taxobox, string taxobox_status_ref_pattern, string taxobox_status_ref_sc_tag)
{
Match dup_match;
string name = null;
string ref_tag_replace_pattern = null;
string sc_replace_pattern = null;
dup_match = Regex.Match (taxobox, @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?([^""\>]+)""?\>\s*" + taxobox_status_ref_pattern + @"\s*\</[Rr][Ee][Ff]\>");
while (dup_match.Success)
{
name = dup_match.Groups[1].Value; // get the reference's name from <ref name=...> tag
ref_tag_replace_pattern = @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?" + name + @"""?\s*\>"; // make a <ref name=... > pattern from name; quotes optional to agree with the dup_match pattern
sc_replace_pattern = @"\<[Rr][Ee][Ff]\s*name\s*=\s*""""?" + name + @"""""?\s*/\>"; // make a self-closed <ref name=... /> pattern from name
taxobox = Regex.Replace (taxobox, sc_replace_pattern, taxobox_status_ref_sc_tag); // replace any <ref name=... /> with <ref name="iucn status <date>" /> sc tag
article_text = Regex.Replace (article_text, sc_replace_pattern, taxobox_status_ref_sc_tag); // replace any <ref name=... /> with <ref name="iucn status <date>" /> sc tag
taxobox = counted_replace (taxobox, ref_tag_replace_pattern + @"\s*" + taxobox_status_ref_pattern + @"\s*" + ref_close_tag, taxobox_status_ref_sc_tag, ref duplicates_removed_count); // now remove any duplicates
dup_match = Regex.Match (taxobox, @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?([^""\>]+)""?\>\s*" + taxobox_status_ref_pattern + @"\s*\</[Rr][Ee][Ff]\>");
}
dup_match = Regex.Match (article_text, @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?([^""\>]+)""?\>\s*" + taxobox_status_ref_pattern + @"\s*\</[Rr][Ee][Ff]\>");
while (dup_match.Success)
{
name = dup_match.Groups[1].Value; // get the reference's name from <ref name=...> tag
ref_tag_replace_pattern = @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?" + name + @"""?\s*\>"; // make a <ref name=... > pattern from name
sc_replace_pattern = @"\<[Rr][Ee][Ff]\s*name\s*=\s*""""?" + name + @"""""?\s*/\>"; // make a self-closed <ref name=... /> pattern from name
article_text = Regex.Replace (article_text, sc_replace_pattern, taxobox_status_ref_sc_tag); // replace any <ref name=... /> with <ref name="iucn status <date>" /> sc tag
taxobox = Regex.Replace (taxobox, sc_replace_pattern, taxobox_status_ref_sc_tag); // replace any <ref name=... /> with <ref name="iucn status <date>" /> sc tag
article_text = counted_replace (article_text, ref_tag_replace_pattern + @"\s*" + taxobox_status_ref_pattern + @"\s*" + ref_close_tag, taxobox_status_ref_sc_tag, ref duplicates_removed_count); // now remove any duplicates
dup_match = Regex.Match (article_text, @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?([^""\>]+)""?\>\s*" + taxobox_status_ref_pattern + @"\s*\</[Rr][Ee][Ff]\>");
}
}
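// Illustrative sketch only, not part of the task code: counted_replace() is called above but its definition
// is not shown in this part of the listing. Assuming that it replaces every match of <pattern> in <text> and
// adds the number of replacements to <count>, a minimal stand-in could look like this. The name
// counted_replace_sketch is hypothetical and nothing in the task calls it; it relies on the same
// System.Text.RegularExpressions import that the rest of this listing uses.
private static string counted_replace_sketch (string text, string pattern, string replacement, ref int count)
{
    count += Regex.Matches (text, pattern).Count; // tally the matches before they are replaced
    return Regex.Replace (text, pattern, replacement); // replace every match
}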
//---------------------------< H I D E _ T A X O B O X _ S T A T U S _ R E F >--------------------------------
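//
// hides the {{cite IUCN}} template in the |status_ref= definition so that duplicate removal does not replace
// the definition with a self-closed tag; returns <taxobox> unchanged when the definition is not found
//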
private string hide_taxobox_status_ref (string taxobox, string taxobox_status_ref_open_tag, string taxobox_status_ref_pattern)
{
Match dup_match = Regex.Match (taxobox, "(" + taxobox_status_ref_open_tag +")(" + taxobox_status_ref_pattern + ")"); // look for and capture |status_ref= definition
if (dup_match.Success)
{
string hidden_status_ref = hide (dup_match.Groups[2].Value, IS_TAXOBOX); // spoof to hide {{cite IUCN}} in |status_ref=
return Regex.Replace (taxobox, "(" + taxobox_status_ref_open_tag +")(" + taxobox_status_ref_pattern + ")", "$1" + hidden_status_ref); // replace with the hidden definition
}
return taxobox;
}
//---------------------------< I U C N P L A I N - T E X T R E F E R E N C E U P D A T E >--------------
//
// handles the plain-text form of IUCN citations; uses the API taxon-id lookup only. Plain-text citations must be wrapped with <ref ...>...</ref> tags
//
// known issues:
// because this attempts to locate 'correct' plain-text citations and because any non-template and non-
// wikilink text is plain text, plain text inside <ref ...>...</ref> that is not part of the actual IUCN
// citation will be treated as part of the citation and will be replaced with the {{cite IUCN}} template
// if the API returns a citation for the taxon id.
//
// does not update plain-text references in the taxobox (|status_ref= handled above); example: [[Picea abies]]
//
private string plain_text_ref_update (string text, string article_title)
{
if (Regex.Match (text, plain_text_ref_pattern).Success) // must have the form <ref ...>plain text</ref>; constrained because the whole article is plain text
text = Regex.Replace (text, plain_text_ref_pattern,
delegate(Match match)
{
string plain_text = match.Groups[0].Value; // this will be returned if no changes
string taxon_id = plain_text_taxon_id_get (plain_text); // attempt to get taxon id
if (null == taxon_id)
return plain_text; // no taxon id so abandon
if (is_plain_text_rejected (plain_text)) // returns true when plain_text is rejected
return plain_text;
string ref_open = match.Groups[1].Value.Trim(); // the opening <ref> tag
string ref_close = match.Groups[3].Value.Trim(); // the closing </ref> tag
plain_text_count++; // bump total number of plain-text references found
string api_url = api_id_url + taxon_id + api_token; // build the url from its various parts
string cite_iucn = cite_iucn_get (api_url, null, article_title, taxon_id, null); // go build a {{cite IUCN}} template from the api
if (null == cite_iucn)
return plain_text; // template build failed
plain_text_modified_count++;
return ref_open + cite_iucn + ref_close;
});
return text;
}
//---------------------------< T A X O B O X _ G E T >--------------------------------------------------------
//
// gets the {{taxobox}} or {{speciesbox}} template from <article_text>
//
private string taxobox_get (string article_text)
{
if (Regex.Match (article_text, taxobox_template_pattern).Success)
return Regex.Match (article_text, taxobox_template_pattern).Groups[0].Value;
return null;
}
//---------------------------< T A X O B O X _ U P D A T E >--------------------------------------------------
//
// updates |status=, |status_system=, and |status_ref= parameters; returns true when updated; false else
//
private bool taxobox_update (ref string taxobox, ref string article_text, string article_title)
{
if (null == taxobox) // if no taxobox
return false;
taxobox_blank = Regex.Replace (taxobox, taxobox_template_pattern, "$1$3");
taxobox = Regex.Replace (taxobox, stray_dot, "$1"); // delete stray . because I found one such
taxobox = Regex.Replace (taxobox, stray_splat, "$1"); // delete stray * because I found one such
taxobox = Regex.Replace (taxobox, stray_equal, "$1"); // delete stray = because I found one such
taxobox = Regex.Replace (taxobox, stray_nbsp, "$1"); // delete stray &nbsp; because I found one such
taxobox = Regex.Replace (taxobox, html_comment, "$1"); // and html comments (Euconocephalus remotus)
string taxobox_status_val = null;
string taxobox_status_system_val = null;
string taxobox_status_ref_val = null;
string taxobox_status_ref_type = null;
string taxobox_status_ref_name = null; // original name from <ref name="original name"> or <ref name="original name" />
bool taxobox_status_ref_is_empty = false;
string taxobox_status_date = null;
int taxobox_status_date_diff = 100;
string taxobox_species_name_val = null;
string api_status_val = null;
string api_status_system_val = null;
taxobox_species_name_val = taxobox_species_name_get (taxobox, article_title); // get species name from taxobox or article title
if (api_species_data_get (taxobox_species_name_val, ref api_status_val, ref api_status_system_val, article_title))
{ // when here presume that we can also get citation data from api
taxobox_status_val = taxobox_status_get (taxobox);
taxobox_status_system_val = taxobox_system_get (taxobox);
if ((((null != taxobox_status_val) && is_iucn_status (taxobox_status_val)) || // has a value that is an IUCN status or
((null != taxobox_status_system_val) && is_iucn_system (taxobox_status_system_val))) || // has a value that is an IUCN system or
((null == taxobox_status_val) && (null == taxobox_status_system_val))) // both are missing or empty
{
taxobox_status_update (ref taxobox, api_status_val, taxobox_status_val);
taxobox_system_update (ref taxobox, api_status_system_val, taxobox_status_system_val);
}
else
return false;
taxobox_status_ref_val = taxobox_status_ref_get (taxobox, ref taxobox_status_ref_is_empty);
if (null != taxobox_status_ref_val)
{
if (Regex.Match (taxobox_status_ref_val, amended_text).Success)
{
error_log_add ("taxobox_update(): plain-text |status_ref= has amended text");
return false;
}
if (Regex.Match (taxobox_status_ref_val, errata_text).Success)
{
error_log_add ("taxobox_update(): plain-text |status_ref= has errata text");
return false;
}
if (Regex.Match (taxobox_status_ref_val, @"__P1P3__\s*(?:errata|amends)\s*=\s*\d{4}").Success)
{
error_log_add ("taxobox_update(): |status_ref= citation has |errata= or |amends= parameter");
return false;
}
}
taxobox_status_ref_type = taxobox_status_ref_type_get (taxobox_status_ref_val, ref taxobox_status_ref_name);
string api_url = null;
if (("named" == taxobox_status_ref_type) || ("unnamed" == taxobox_status_ref_type) || (null == taxobox_status_ref_type))
{
if (null != taxobox_status_ref_val)
{
taxobox_status_date = taxobox_status_date_get (taxobox_status_ref_val, taxobox_status_ref_name);
taxobox_status_date_diff = taxobox_status_date_diff_get (taxobox_status_date);
}
if (6 < taxobox_status_date_diff)
{
api_url = api_name_url + taxobox_species_name_val + api_token; // build citation url from its various parts
taxobox_status_ref = cite_iucn_get (api_url, null, article_title, null, taxobox_species_name_val); // go build a {{cite IUCN}} template from the api
if (null == taxobox_status_ref)
return false; // template build failed
new_ref_tags_make (taxobox_status_ref, ref taxobox_status_ref_sc_tag, ref taxobox_status_ref_open_tag);
if (null == taxobox_status_ref_val) // if empty or missing
{
if (taxobox_status_ref_is_empty)
{
taxobox = Regex.Replace (taxobox, taxobox_status_ref_empty_pattern, "$1" + taxobox_status_ref_open_tag + taxobox_status_ref + "</ref>$2");
status_ref_added = true;
}
else // here when |status_ref= is missing
{
taxobox = Regex.Replace (taxobox, taxobox_new_stat_sys_ref_pattern, "$1$2|status_ref=" + taxobox_status_ref_open_tag + taxobox_status_ref + "</ref>$2$3");
status_ref_added = true;
}
}
else
{
taxobox = Regex.Replace (taxobox, taxobox_status_ref_pattern, "$1" + taxobox_status_ref_open_tag + taxobox_status_ref + "</ref>");
if ("named" == taxobox_status_ref_type) // go rename all of the self-closed ref tags in article text and in the taxobox
{
article_text = Regex.Replace (article_text, sc_ref_tag_begin + taxobox_status_ref_name + sc_ref_tag_end, taxobox_status_ref_sc_tag);
taxobox = Regex.Replace (taxobox, sc_ref_tag_begin + taxobox_status_ref_name + sc_ref_tag_end, taxobox_status_ref_sc_tag);
}
status_ref_updated = true;
}
}
else
status_ref_current = true;
}
else if ("named_sc" == taxobox_status_ref_type)
{
if (Regex.Match (article_text, ref_def_begin + taxobox_status_ref_name + ref_def_end).Success)
{
taxobox_status_ref_val = Regex.Match (article_text, ref_def_begin + taxobox_status_ref_name + ref_def_end).Groups[0].Value;
taxobox_status_ref_val = unhide (taxobox_status_ref_val);
taxobox_status_date = taxobox_status_date_get (taxobox_status_ref_val, taxobox_status_ref_name);
taxobox_status_date_diff = taxobox_status_date_diff_get (taxobox_status_date);
if (6 < taxobox_status_date_diff)
{
api_url = api_name_url + taxobox_species_name_val + api_token; // build citation url from its various parts
taxobox_status_ref = cite_iucn_get (api_url, null, article_title, null, taxobox_species_name_val); // go build a {{cite IUCN}} template from the api
if (null == taxobox_status_ref)
return false; // template build failed
new_ref_tags_make (taxobox_status_ref, ref taxobox_status_ref_sc_tag, ref taxobox_status_ref_open_tag);
// replace original definition with new sc ref tag
article_text = Regex.Replace (article_text, ref_def_begin + taxobox_status_ref_name + ref_def_end, taxobox_status_ref_sc_tag);
// replace original |status_ref= sc ref tag with new definition
taxobox = Regex.Replace (taxobox, taxobox_status_sc_ref_pattern, "$1" + taxobox_status_ref_open_tag + taxobox_status_ref + "</ref>");
// rename original sc ref tags
article_text = Regex.Replace (article_text, sc_ref_tag_begin + taxobox_status_ref_name + sc_ref_tag_end, taxobox_status_ref_sc_tag);
taxobox = Regex.Replace (taxobox, sc_ref_tag_begin + taxobox_status_ref_name + sc_ref_tag_end, taxobox_status_ref_sc_tag);
status_ref_updated = true;
}
}
else
error_log_add ("taxobox_update(): no definition for: " + code_nowiki (taxobox_status_ref_val));
}
else
{
error_log_add ("taxobox_update(): no " + code_nowiki ("|status_ref="));
}
}
else // here when binomial is not recognized by iucn
{
if (null != taxobox_species_name_val)
{
taxobox_status_val = taxobox_status_get (taxobox); // if either of these then add a maintenance category and ...
taxobox_status_system_val = taxobox_system_get (taxobox); // ... save unrecognized binomial for edit summary only when ...
if ((((null != taxobox_status_val) && is_iucn_status (taxobox_status_val)) || // ... |status= has a value that is an IUCN status or
((null != taxobox_status_system_val) && is_iucn_system (taxobox_status_system_val))) || // |status_system= has a value that is an IUCN system or
((null == taxobox_status_val) && (null == taxobox_status_system_val))) // both are missing or empty (example: Barlow's lark)
{
unrecognized_species_name = Uri.UnescapeDataString (taxobox_species_name_val); // remove percent encoding
string cat_plus_name = "[[Category:Taxobox binomials not recognized by IUCN]]" + " <!-- " + unrecognized_species_name + " -->";
MatchCollection matches = Regex.Matches (article_text, @"__WL1NK_O__[Cc]ategory:.+__WL1NK_C__"); // find all of the categories
if (0 != matches.Count) // non-zero when categories found
{
int index = matches.Count - 1; // make an indexer from Count and then replace last one with itself + our category
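article_text = Regex.Replace (article_text, Regex.Escape (matches[index].Value), matches[index].Value + '\n' + cat_plus_name); // escape so that category names with regex metacharacters (parentheses, for example) match literally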
}
else // here when no categories; look for stub templates
{
matches = Regex.Matches (article_text, @"__0P3N__.+\-stub__CL0S3__"); // find all of the stub templates
if (0 != matches.Count) // non-zero when stub templates found
article_text = Regex.Replace (article_text, matches[0].Value, cat_plus_name + '\x0A' + '\x0A' + matches[0].Value);
else // here when no categories and no stub templates
article_text = article_text + '\x0A' + cat_plus_name; // no cats and no stub templates, add to the end
}
// binomial may not be recognized for a global assessment but is recognized for a regional assessment;
// this script cannot know which region so cannot use the regional form of the citation API call:
// /api/v3/species/citation/:name/region/:region_identifier?token='YOUR TOKEN'
// binomial may be recognized in iucn search box (as a redirect-like name) but that is not available
// to the API (and if it were probably shouldn't be used)
}
}
}
taxobox = unhide (taxobox);
article_text = Regex.Replace (article_text, taxobox_template_pattern, taxobox_blank); // install a blank so that we don't spend time evaluating the citation in |status_ref=
return true;
}
//---------------------------< N E W _ R E F _ T A G S _ M A K E >--------------------------------------------
//
// makes self-closed and normal <ref> tags for new |status_ref= {{cite IUCN}} reference using |access-date= from
// the {{cite IUCN}} template
//
private void new_ref_tags_make (string cite_iucn, ref string new_self_closed_tag, ref string taxobox_status_ref_open_tag)
{
string date = Regex.Match (cite_iucn, access_date).Groups[1].Value.Trim(); // date from new {{cite IUCN}} |access-date=
new_self_closed_tag = @"<ref name=""iucn status " + date + @""" />"; // make a version to replace short-form ref tags that need to be renamed
taxobox_status_ref_open_tag = @"<ref name=""iucn status " + date + @""">"; // make a version for |status_ref=
}
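// Illustrative usage sketch only, not part of the task code. It assumes that the access_date pattern captures
// '29 September 2021' from the hypothetical template text below; the variable names are made up for the example.
//
//	string sc_tag = null;
//	string open_tag = null;
//	new_ref_tags_make ("{{cite IUCN |... |access-date=29 September 2021}}", ref sc_tag, ref open_tag);
//	// sc_tag is then:   <ref name="iucn status 29 September 2021" />
//	// open_tag is then: <ref name="iucn status 29 September 2021">
//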
//---------------------------< T A X O B O X _ S T A T U S _ G E T >------------------------------------------
//
// gets value assigned to {{taxobox}} or {{speciesbox}} |status= parameter; returns that value; status validation
// is done by calling function; returns null if |status= is missing or empty.
//
private string taxobox_status_get (string taxobox_template)
{
if (!Regex.Match (taxobox_template, taxobox_status_missing).Success || Regex.Match (taxobox_template, taxobox_status_empty).Success)
return null; // |status= is missing or empty
return Regex.Match (taxobox_template, taxobox_status_value).Groups[2].Value.Trim();
}
//---------------------------< I S _ I U C N _ S T A T U S >--------------------------------------------------
//
// return true if <status> is known IUCN category; false else
//
private bool is_iucn_status (string status)
{
if (null == status)
return false;
return Regex.Match (status, IS_IUCN_STATUS).Success;
}
//---------------------------< T A X O B O X _ S T A T U S _ U P D A T E >------------------------------------
//
// updates, adds, or confirms |status= in taxobox using value from iucn API
//
private void taxobox_status_update (ref string taxobox, string api_status_val, string taxobox_status_val)
{
if (null == api_status_val) // did api return species data with IUCN category?
return;
if (!Regex.Match (taxobox, taxobox_status_missing).Success) // if |status= not in taxobox
{
taxobox = Regex.Replace (taxobox, taxobox_new_stat_sys_ref_pattern, "$1$2|status=" + api_status_val + "$2$3");
status_added = true;
}
else if (api_status_val != taxobox_status_val)
{
taxobox = Regex.Replace (taxobox, taxobox_status_pattern, "$1" + api_status_val + "$2");
iucn_status_updated_count++;
}
else // here when <api_status_val> == <taxobox_status_val>
iucn_status_confirmed_count++; // bump the confirmed count and done
}
//---------------------------< T A X O B O X _ S Y S T E M _ G E T >------------------------------------------
//
// gets value assigned to {{taxobox}} or {{speciesbox}} |status_system= parameter; returns that value; status_system
// validation is done by calling function; returns null if |status_system= is missing or empty.
//
private string taxobox_system_get (string taxobox_template)
{
if (!Regex.Match (taxobox_template, taxobox_system_missing).Success || Regex.Match (taxobox_template, taxobox_system_empty).Success)
return null; // |status_system= is missing or empty
return Regex.Match (taxobox_template, taxobox_system_value).Groups[2].Value.Trim();
}
//---------------------------< I S _ I U C N _ S Y S T E M >--------------------------------------------------
//
// return true if <system> is known IUCN status system; false else
//
private bool is_iucn_system (string system)
{
if (null == system)
return false;
return Regex.Match (system, IS_IUCN_SYSTEM).Success;
}
//---------------------------< T A X O B O X _ S Y S T E M _ U P D A T E >------------------------------------
//
// updates, adds, or confirms |status_system= in taxobox using value from iucn API
//
private void taxobox_system_update (ref string taxobox, string api_status_system_val, string taxobox_status_system_val)
{
if (null == api_status_system_val) // did api return species data from which a status system could be derived?
return;
if (!Regex.Match (taxobox, taxobox_system_missing).Success) // if |status_system= not in taxobox
{
taxobox = Regex.Replace (taxobox, taxobox_new_stat_sys_ref_pattern, "$1$2|status_system=" + api_status_system_val + "$2$3");
status_system_added = true;
}
else if (api_status_system_val != taxobox_status_system_val)
{
taxobox = Regex.Replace (taxobox, taxobox_system_pattern, "$1" + api_status_system_val + "$2");
iucn_status_system_updated_count++;
}
}
//---------------------------< T A X O B O X _ S T A T U S _ R E F _ G E T >----------------------------------
//
// gets value assigned to {{taxobox}} or {{speciesbox}} |status_ref= parameter; returns that value; ref tags,
// ref name, and reference text extracted by calling function
//
private string taxobox_status_ref_get (string taxobox, ref bool taxobox_status_ref_is_empty)
{
if (!Regex.Match (taxobox, taxobox_status_ref_missing).Success)
return null; // |status_ref= is missing
if (Regex.Match (taxobox, taxobox_status_ref_empty).Success)
{
taxobox_status_ref_is_empty = true;
return null; // |status_ref= is empty
}
return Regex.Match (taxobox, taxobox_status_ref_value).Groups[2].Value.Trim();
}
//---------------------------< T A X O B O X _ S T A T U S _ R E F _ T Y P E _ G E T >------------------------
//
// look at opening <ref> tag and return its type (order of evaluation is important here):
// <ref> returns 'unnamed'
// <ref ... name = .../> returns 'named_sc'
// <ref ... name = ...> returns 'named'
// if none of these, or <taxobox_status_ref_val> is null, returns null
//
private string taxobox_status_ref_type_get (string taxobox_status_ref_val, ref string taxobox_status_ref_name)
{
if (null == taxobox_status_ref_val)
return null;
if (Regex.Match (taxobox_status_ref_val, ref_tag_unnamed_pattern).Success)
return "unnamed";
if (Regex.Match (taxobox_status_ref_val, ref_tag_named_sc_pattern).Success) // order here important; named_sc test before named test
{
taxobox_status_ref_name = Regex.Match (taxobox_status_ref_val, ref_tag_named_sc_pattern).Groups[2].Value.Trim();
return "named_sc";
}
if (Regex.Match (taxobox_status_ref_val, ref_tag_named_pattern).Success) // order here important; named test after named_sc test
{
taxobox_status_ref_name = Regex.Match (taxobox_status_ref_val, ref_tag_named_pattern).Groups[2].Value.Trim();
return "named";
}
return null; // should never get here
}
//---------------------------< T A X O B O X _ S T A T U S _ D A T E _ G E T >--------------------------------
//
// attempt to get date of last status update from ref tag (<ref name="iucn status 29 September 2021">) or from
// |access-date= value
//
private string taxobox_status_date_get (string taxobox_status_ref_val, string taxobox_status_ref_name)
{
if ((null != taxobox_status_ref_name) && Regex.Match (taxobox_status_ref_name, preferred_status_ref_tag_name).Success)
return Regex.Match (taxobox_status_ref_name, preferred_status_ref_tag_name).Groups[1].Value.Trim();
taxobox_status_ref_val = unhide (taxobox_status_ref_val);
if (Regex.Match (taxobox_status_ref_val, access_date).Success)
return Regex.Match (taxobox_status_ref_val, access_date).Groups[1].Value.Trim(); // date from |access-date=
return null;
}
//---------------------------< T A X O B O X _ S T A T U S _ D A T E _ D I F F _ G E T >----------------------
//
// return the difference in months between today's date and a date from the |status_ref= <ref> tag or from the
// |status_ref= citation's |access-date=
//
// script will not update |status_ref= if date difference is less than 7 months
//
private int taxobox_status_date_diff_get (string date)
{
if (null == date)
{
// error_log_add ("taxobox_status_date_diff_get(): nil date value; forcing update"); // not really an error
return 100; // any value greater than 6 forces citation update attempt
}
int current_month = DateTimeOffset.Now.Month;
int current_year = DateTimeOffset.Now.Year;
string month = null;
string year = null;
foreach(KeyValuePair<string, string> date_pattern in date_patterns)
{
Match match = Regex.Match (date, date_pattern.Value);
if (match.Success)
{
if ("ymd" == date_pattern.Key) // because year precedes month, Groups[1] and Groups[2] are swapped relative to dmy and mdy
{
month = match.Groups[2].Value.Trim().ToLower();
year = match.Groups[1].Value.Trim();
}
else // here when dmy or mdy
{
month = match.Groups[1].Value.Trim().ToLower();
year = match.Groups[2].Value.Trim();
}
}
}
if ((null == month) || (null == year))
{
error_log_add ("taxobox_status_date_diff_get(): month and/or year null; forcing update");
error_log_add ("year: " + year);
error_log_add ("month: " + month);
return 100; // any value greater than 6 forces citation update attempt
}
if (months.ContainsKey (month))
return ((current_year - Int32.Parse(year)) * 12) + current_month - months[month];
else
{
error_log_add ("taxobox_status_date_diff_get(): month not recognized: " + month + "; forcing update");
return 100;
}
}
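// worked example (a sketch, not taken from a real run; assumes <months> maps lower-case month names to 1..12,
// e.g. "september" -> 9, and that a dmy entry in <date_patterns> captures the month and the year):
// date "29 September 2021", script run in March 2022: ((2022 - 2021) * 12) + 3 - 9 = 6 -> not greater than 6; no update
// date "29 September 2021", script run in April 2022: ((2022 - 2021) * 12) + 4 - 9 = 7 -> greater than 6; update attempted
// the day of the month is ignored so the result counts calendar months, not elapsed days
//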
//---------------------------< T A X O B O X _ S P E C I E S _ N A M E _ G E T >------------------------------
//
// attempts to get binomial from various parameters in {{taxobox}} or {{speciesbox}} and failing that the article
// title.
//
// taxobox: |binomial= -> |name= -> article title
// speciesbox: |taxon= -> |genus= + |species= -> |name= -> article title
//
// returns null when <name> is not binomial-like (two words); example [[Africanogyrus]]
//
private string taxobox_species_name_get (string taxobox, string article_title)
{
string template_name = Regex.Match (taxobox, taxobox_template_pattern).Groups[2].Value.ToLower(); // capture is the template name (Taxobox, Speciesbox, etc)
string name = null; // name of this species from various possible parameters in the taxobox template
if ("taxobox" == template_name)
{
if (Regex.Match (taxobox, binomial_pattern).Success)
name = Regex.Match (taxobox, binomial_pattern).Groups[1].Value.Trim(); // use |binomial=
else if (Regex.Match (taxobox, name_pattern).Success)
name = Regex.Match (taxobox, name_pattern).Groups[1].Value.Trim(); // fallback to |name=
}
else if ("speciesbox" == template_name)
{
if (Regex.Match (taxobox, taxon_pattern).Success)
name = Regex.Match (taxobox, taxon_pattern).Groups[1].Value.Trim(); // use |taxon=
else if (Regex.Match (taxobox, genus_pattern).Success && Regex.Match (taxobox, species_pattern).Success)
name = Regex.Match (taxobox, genus_pattern).Groups[1].Value.Trim() + " " + Regex.Match (taxobox, species_pattern).Groups[1].Value.Trim();
else if (Regex.Match (taxobox, name_pattern).Success)
name = Regex.Match (taxobox, name_pattern).Groups[1].Value.Trim(); // fallback to |name=
}
if (null == name) // when none of the above
{
name = article_title; // TODO: don't use article title?
error_log_add ("using article title");
}
name = species_name_cleanup (name); // remove markup, extinction markers, disambiguation, etc
if (!Regex.Match (Uri.UnescapeDataString (name), @"[A-Za-z]+ [A-Za-z]+").Success) // does <name> look like a binomial?
{
error_log_add ("name not a binomial: " + name);
return null;
}
return name;
}
//---------------------------< T A X O N _ I D _ F R O M _ O L D _ F O R M _ U R L _ G E T >------------------
//
// loops through a series of old-form IUCN urls and returns the taxon id if the pattern matches; null else
//
private string taxon_id_from_old_form_url_get (string text)
{
foreach (string url_pattern in url_patterns) // loop through a series of old-form url patterns
{
Match url_match = Regex.Match (text, url_pattern);
if (url_match.Success) // if found
return url_match.Groups[1].Value.Trim(); // extract and return the taxon id
}
return null;
}
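// example trace (a sketch; the url is one of the sample old-form urls from the task description and the patterns
// are those listed in <url_patterns>):
// taxon_id_from_old_form_url_get ("|url=http://www.iucnredlist.org/details/22718564/0")
// details/(\d+)/\b(?:all|full) no match
// details/full/(\d+)/\d+ no match
// details/(\d+)/\d+ match; returns "22718564"
// patterns are tried in array order and the first one that matches supplies the taxon id
//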
//---------------------------< P L A I N _ T E X T _ T A X O N _ I D _ G E T >--------------------------------
//
// extract taxon id from IUCN page, doi, or url. For plain-text citations, accept any form of iucn url when
// attempting to get the taxon id; prefer page -> doi -> url; returns taxon id if available, null else
//
private string plain_text_taxon_id_get (string plain_text)
{
if (Regex.Match (plain_text, plain_text_page_taxon_id).Success) // get taxon id from page?
return Regex.Match (plain_text, plain_text_page_taxon_id).Groups[1].Value;
if (Regex.Match (plain_text, plain_text_doi_taxon_id).Success) // get taxon id from doi?
return Regex.Match (plain_text, plain_text_doi_taxon_id).Groups[1].Value;
if (Regex.Match (plain_text, plain_text_taxon_id_url).Success) // get taxon id from url?
return Regex.Match (plain_text, plain_text_taxon_id_url).Groups[1].Value;
return null; // couldn't find taxon id; might not be iucn reference
}
//---------------------------< I S _ P L A I N _ T E X T _ R E J E C T E D >----------------------------------
//
// evaluates <plain_text> looking for things that oughtn't to be there or that are not currently supported
// returns true when <plain_text> is rejected; false else
//
private bool is_plain_text_rejected (string plain_text)
{
if (Regex.Match (plain_text, @"\{\{\s*[Cc]it[ae]").Success) // if 'plain text' has {{cit...}} template
{
// error_log_add ("is_plain_text_rejected(): plain-text has cite template: " + plain_text); // don't do this because it alarms on valid cite IUCN templates
return true; // skip this reference
}
if (Regex.Match (plain_text, amended_text).Success)
{
error_log_add ("is_plain_text_rejected(): plain-text has amended text");
return true; // because API doesn't yet identify amended assessment year
}
if (Regex.Match (plain_text, errata_text).Success)
{
error_log_add ("is_plain_text_rejected(): plain-text has errata text");
return true; // because API doesn't yet identify errata assessment year
}
return false;
}
//---------------------------< S P E C I E S _ N A M E _ C L E A N U P >--------------------------------------
//
// removes stuff that isn't part of the binomial; returns name modified or not.
//
private string species_name_cleanup (string name)
{
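name = Regex.Replace (name, angle_open, "<"); // restore the angle brackets of hidden html comments (and other non-ref tags) that might be part of <name>
name = Regex.Replace (name, angle_close, ">"); // so that the cleanup patterns below can see them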
foreach (string [] cleanup_pattern in cleanup_patterns)
name = Regex.Replace (name, cleanup_pattern[0], cleanup_pattern[1]);
name = name.Trim(); // and remove any leading/trailing whitespace
name = Uri.EscapeDataString (name); // percent encode uri reserved characters
return name;
}
//---------------------------< C I T E _ I U C N _ G E T >----------------------------------------------------
//
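// creates {{cite IUCN}} template from api call. Tries <first_url> first and, if the API returns a citation,
// ignores <second_url>; tries <second_url> when the <first_url> return holds no citation. A failed fetch
// (api_fetch() returns null) abandons both urls. Returns null when no template can be made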
//
private string cite_iucn_get (string first_url, string second_url, string ArticleTitle, string taxon_id, string species_name)
{
string citation_from_api = null;
string raw_citation = null;
if ((null == first_url) && (null == second_url))
return null;
var urls = new List<string>();
urls.Add (first_url);
urls.Add (second_url);
foreach (string url in urls)
{
if (null != url)
{
citation_from_api = api_fetch (url, ArticleTitle); // fetch citation from the IUCN API
if (null == citation_from_api)
return null;
if (Regex.Match (citation_from_api, citation_from_api_pattern).Success)
{
raw_citation = Regex.Match (citation_from_api, citation_from_api_pattern).Groups[1].Value.Trim();
break;
}
}
}
if (null == raw_citation) // <raw_citation> must have a value
{
string text = "cite_iucn_get(): API did not return citation:";
if (null != taxon_id)
text = text + " id: " + taxon_id;
if (null != species_name)
text = text + " name: " + species_name;
text = text + " " + code_nowiki (citation_from_api);
error_log_add (text);
api_no_cite_return_count++;
return null;
}
string author_list = "";
string date = "";
string title = "";
string volume = "";
string page = "";
string page_assessment = "";
string doi = "";
string doi_assessment = "";
string access_date = "";
Match parse = Regex.Match (raw_citation, parse_pattern);
if (parse.Success)
{
author_list = author_names_get (parse.Groups[1].Value.Trim());
date = @" |date=" + parse.Groups[2].Value.Trim();
title = title_get (parse.Groups[3].Value.Trim());
volume = @" |volume=" + parse.Groups[4].Value.Trim();
page = @" |page=" + parse.Groups[5].Value.Trim();
page_assessment = parse.Groups[6].Value.Trim();
doi = @" |doi=" + parse.Groups[7].Value.Trim();
doi_assessment = parse.Groups[8].Value.Trim();
access_date = @" |access-date=" + parse.Groups[9].Value.Trim();
}
else
{
error_log_add ("cite_iucn_get(): parse failure: " + code_nowiki (citation_from_api));
parse_fail_count++;
return null;
}
if (page_assessment != doi_assessment) // until errata date information available from the API
{
error_log_add ("cite_iucn_get(): doi/page mismatch: page assessment: " + code_nowiki (parse.Groups[5].Value.Trim()));
page_doi_skip_count++; // skip template when page- and doi-assessment ids are mismatched
return null;
}
return @"{{cite IUCN" + author_list + date + title + volume + page + doi + access_date + @"}}";
}
//---------------------------< A P I _ S P E C I E S _ D A T A _ G E T >--------------------------------------
//
// using taxon name, attempt to get species data from the IUCN API.
//
private bool api_species_data_get (string taxobox_species_name_val, ref string api_status_val, ref string api_status_system_val, string article_title)
{
if (null == taxobox_species_name_val) // when taxobox_species_name_get() can't get a binomial-like name
return false;
string api_url = api_species_url + taxobox_species_name_val + api_token; // build a url from its various parts (taxon name)
string species_from_api = api_fetch (api_url, article_title); // fetch species data from the IUCN API (taxon name)
if (null == species_from_api) // if the api call failed
return false; // abandon
if (Regex.Match (species_from_api, status_from_api_pattern).Success) // update <api_status_val> from api return
api_status_val = Regex.Match (species_from_api, status_from_api_pattern).Groups[1].Value;
if (Regex.Match (species_from_api, status_system_from_api_pattern).Success) // update <api_status_system_val> from api return
{
int year = Int32.Parse (Regex.Match (species_from_api, status_system_from_api_pattern).Groups[1].Value); // convert to an integer
api_status_system_val = ((2000 < year) ? "IUCN3.1" : "IUCN2.3"); // and then convert to the appropriate status system
}
if ((null == api_status_val) || (null == api_status_system_val)) // if either of these is null, declare an error
{
error_log_add ("api_species_data_get(): API did not return species data: " + code_nowiki (species_from_api));
api_no_species_return_name_count++;
return false; // and abandon
}
return true;
}
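// example of the |status_system= derivation (a sketch; the JSON fragments are hypothetical and show only the
// field that <status_system_from_api_pattern> reads):
// "assessment_date":"1998-01-01" -> year 1998; 2000 < 1998 is false -> "IUCN2.3"
// "assessment_date":"2016-10-01" -> year 2016; 2000 < 2016 is true -> "IUCN3.1"
// only the leading run of digits (the year) is captured; the rest of the date is ignored
//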
//---------------------------< A P I _ F E T C H >------------------------------------------------------------
//
// calls the iucn api with <api_url>; returns raw data string on success; null else. Bumps the api call counter
//
//
private string api_fetch (string api_url, string ArticleTitle)
{
if (0 < api_call_count) // pause here for 3 seconds if <api_call_count> is greater than 0 (pause is skipped for the first api access)
System.Threading.Thread.Sleep (3000); // this prevents us from banging on the API too quickly
api_call_count++; // bump the call counter
string string_from_api = null;
try
{
// this WebRequest code courtesy of en.wiki editor User:DavidBrooks
System.Net.HttpWebRequest webRequest = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(api_url);
webRequest.UserAgent = "Wikipedia IUCN citation update experiment (https://en.wikipedia.org/wiki/User:Trappist_the_monk)";
using (System.Net.WebResponse response = webRequest.GetResponse()) // using blocks release the connection, stream, and reader even when a read throws
using (System.IO.Stream str = response.GetResponseStream())
using (System.IO.StreamReader reader = new System.IO.StreamReader(str))
string_from_api = reader.ReadToEnd();
}
catch (Exception e) // network failure, timeout, or http error status
{
error_log_add ("api_fetch(): Exception occurred reading: " + code_nowiki (api_url) + "; " + e.Message);
api_fetch_fail_count++;
return null;
}
return string_from_api;
}
//---------------------------< A U T H O R _ N A M E S _ G E T >----------------------------------------------
//
// attempts to extract individual author names from iucn api citation. Derived from [[Module:cite IUCN]] function
// make_cite_iucn()
//
private string author_names_get (string raw_author_list)
{
string collaboration = null;
string pattern = @"(,\s+[A-Z]),"; // for when iucn forgets to include final dot
raw_author_list = Regex.Replace (raw_author_list, pattern, "$1" + ".,");
pattern = @"(\.[A-Z]),"; // for when iucn forgets to include final dot
raw_author_list = Regex.Replace (raw_author_list, pattern, "$1" + ".,");
pattern = @"\s\(([^\)]+)\)$";
if (Regex.Match (raw_author_list, pattern).Success)
{
collaboration = Regex.Match (raw_author_list, pattern).Groups[1].Value.Trim(); // save the collaboration name
raw_author_list = Regex.Replace (raw_author_list, pattern, ""); // remove collaboration from raw_author_list
}
raw_author_list = Regex.Replace (raw_author_list, @"\.?,?\s+&\s+", ".|"); // replace <opt. dot><opt. comma><space><ampersand><space> with <dot><pipe>
raw_author_list = Regex.Replace (raw_author_list, @"\.,\s+", ".|"); // replace <dot><comma><space> with <dot><pipe>
raw_author_list = Regex.Replace (raw_author_list, @"(\.[A-Z]),\s+", "$1.|"); // special case where iucn drops the dot after an initial
string author_list = "";
string[] authors = Regex.Split (raw_author_list, @"\|"); // split the string on the <pipe>
int i = 1;
foreach (string author in authors)
{
if (1 == i)
author_list = author_list + " |author" + "=" + author; // don't enumerate first author
else
author_list = author_list + " |author" + i + "=" + author;
i++;
}
if (null != collaboration)
author_list = author_list + " |collaboration=" + collaboration;
return author_list;
}
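// example trace (a sketch; the author string is made up, not a real API return):
// author_names_get ("Smith, J., Jones, A.B. & Doe, C.")
// after the ampersand replacement: "Smith, J., Jones, A.B.|Doe, C."
// after the dot-comma replacement: "Smith, J.|Jones, A.B.|Doe, C."
// returns: " |author=Smith, J. |author2=Jones, A.B. |author3=Doe, C."
// a trailing parenthetical such as " (Hypothetical Specialist Group)" is removed before the split and returned
// as |collaboration=Hypothetical Specialist Group
//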
//---------------------------< T I T L E _ G E T >------------------------------------------------------------
//
// extracts title from iucn API citation; attempts to add markup so that it renders correctly
//
private string title_get (string raw_title)
{
string title = null; // formatted title goes here
string errata = ""; // errata year, if present, goes here; empty string for concatenation
string amends = ""; // amends year, if present, goes here; empty string for concatenation
string pattern = null;
string replace = null;
foreach (string[] search_and_replace in search_and_replaces)
{
pattern = search_and_replace[0];
replace = search_and_replace[1]; // replace includes wiki markup for title
if (Regex.Match (raw_title, pattern).Success)
{
title = Regex.Replace (raw_title, pattern, replace);
break;
}
}
if (null == title)
{
title = "''" + raw_title + "''"; // no pattern matched; apply italic markup to the whole raw_title from the API citation
// error_log_add ("title_get(): using raw title: " + raw_title); // not really an error
}
pattern = errata_text; // look for an errata string; as of 2021-10-01, errata string not available in API citation
Match match = Regex.Match (title, pattern);
if (match.Success)
errata = " |errata=" + match.Groups[1].Value.Trim();
pattern = amended_text; // look for an amended string; as of 2021-10-01, amended string not available in API citation
match = Regex.Match (title, pattern);
if (match.Success)
amends = " |amends=" + match.Groups[1].Value.Trim();
return " |title=" + title + errata + amends;
}
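// example traces (a sketch; the titles are illustrative, not real API returns):
// title_get ("Panthera leo") no entry in <search_and_replaces> matches; returns " |title=''Panthera leo''"
// title_get ("Panthera leo ssp. persica") the plain ssp. entry matches; returns " |title=''Panthera leo'' ssp. ''persica''"
// neither title carries an errata or amended parenthetical so |errata= and |amends= are not added
//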
//---------------------------< H I D E >----------------------------------------------------------------------
//
// HIDE TEMPLATES: find templates that are not <dont_hide>; replace the opening {{ with __0P3N__, the closing }}
// with __CL0S3__, and internal | (pipes) with __P1P3__
//
// single curly braces in urls and other parameter values can confuse other regex in this code so replace {
// with __0CU!21Y__ and } with __CCU!21Y__
//
private string hide (string ArticleText, string dont_hide)
{
string pattern = @"\{\{(?!\s*" + dont_hide + @")[^\{\}]*\}\}";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern,
delegate(Match match)
{
string fixed_template; // a hidden template is assembled here
string raw_template = match.Groups[0].Value; // the whole template
pattern = @"\{\{"; // hide the opening {{
fixed_template = Regex.Replace (raw_template, pattern, "__0P3N__");
pattern = @"\}\}"; // hide the closing }}
fixed_template = Regex.Replace (fixed_template, pattern, "__CL0S3__");
pattern = @"\|"; // and hide the pipes
fixed_template = Regex.Replace (fixed_template, pattern, "__P1P3__");
return fixed_template;
});
}
pattern = @"(\<!\-{2,}\s*[^\>\|\}]*)\{\{(\s*" + dont_hide + @"[^\}]*)\}\}([^\>]*\-{2,}\>)"; // <!-- {{citx...}} -->
ArticleText = Regex.Replace(ArticleText, pattern, "$1__0P3N__$2__CL0S3__$3");
pattern = @"\{\|"; // open table markup
ArticleText = Regex.Replace(ArticleText, pattern, "__0T4BL3__");
pattern = @"\|\}(?!\})"; // close table markup
ArticleText = Regex.Replace(ArticleText, pattern, "__CT4BL3__");
pattern = @"([^\{])\{([^\{])"; // single opening curly brace
ArticleText = Regex.Replace(ArticleText, pattern, "$1__0CU!21Y__$2");
pattern = @"([^\}])\}([^\}])"; // single closing curly brace
ArticleText = Regex.Replace(ArticleText, pattern, "$1__CCU!21Y__$2");
pattern = @"\[\[(?![Ff]ile|[Ii]mage)([^\|\]]+)\|([^\]]+)\]\]"; // HIDE complex wikilinks: [[article title|label]] to __WL1NK_O__article title__P1P3__label__WL1NK_C__
ArticleText = Regex.Replace(ArticleText, pattern, "__WL1NK_O__$1__P1P3__$2__WL1NK_C__"); // [[File: with wikilinks inside can be confusing
pattern = @"\[\[([^\]]+)\]\]"; // HIDE simple wikilinks: [[article title]] to __WL1NK_O__article title__WL1NK_C__
ArticleText = Regex.Replace(ArticleText, pattern, "__WL1NK_O__$1__WL1NK_C__");
return ArticleText;
}
//---------------------------< U N H I D E >------------------------------------------------------------------
//
// UNHIDE TEMPLATES: find templates and wikilinks that are hidden; replace the 'hide' keywords with the
// appropriate wiki markup
//
private string unhide (string ArticleText)
{
ArticleText = Regex.Replace(ArticleText, @"__WL1NK_O__", "[["); // UNHIDE: replace __WL1NK_O__ with [[
ArticleText = Regex.Replace(ArticleText, @"__WL1NK_C__", "]]"); // UNHIDE: replace __WL1NK_C__ with ]]
ArticleText = Regex.Replace(ArticleText, @"__P1P3__", "|"); // UNHIDE: replace __P1P3__ with |
ArticleText = Regex.Replace(ArticleText, @"__0T4BL3__", "{|"); // UNHIDE: replace __0T4BL3__ with {|
ArticleText = Regex.Replace(ArticleText, @"__CT4BL3__", "|}"); // UNHIDE: replace __CT4BL3__ with |}
ArticleText = Regex.Replace(ArticleText, @"__0CU!21Y__", "{"); // UNHIDE: replace __0CU!21Y__ with {
ArticleText = Regex.Replace(ArticleText, @"__CCU!21Y__", "}"); // UNHIDE: replace __CCU!21Y__ with }
ArticleText = Regex.Replace(ArticleText, @"__0P3N__", "{{"); // UNHIDE: replace __0P3N__ with {{
ArticleText = Regex.Replace(ArticleText, @"__CL0S3__", "}}"); // UNHIDE: replace __CL0S3__ with }}
return ArticleText;
}
//---------------------------< S U M M A R Y _ C O N C A T >--------------------------------------------------
//
// concatenates text onto an existing edit summary string, limiting the string to a length of no more than 347
// characters. When <summary> appended with <text> would be longer than the allowed 347 character limit, this
// function replaces <text> with an ellipsis. Once an ellipsis is added, no more <text> can be added to <summary>
//
private string summary_concat (string summary, string text)
{
if (0 <= summary.IndexOf ("...")) // if ellipsis already present in <summary>, abandon
return summary;
if (347 >= (summary.Length + text.Length + 3)) // if <summary> plus <text> still fits in the 347 char limit (+ 3 reserves room for a later ellipsis)
return summary + text; // append <text> to <summary> and done
return summary + "..."; // append ellipsis instead
}
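// example (a sketch; the lengths are hypothetical and <summary> is assumed not to contain "..." already):
// <summary> is 330 characters, <text> is 10: 330 + 10 + 3 = 343; 347 >= 343 -> returns summary + text (340 characters)
// <summary> is 340 characters, <text> is 10: 340 + 10 + 3 = 353; 347 >= 353 is false -> returns summary + "..." (343 characters)
// every later call finds "..." in <summary> and returns it unchanged
//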
//---------------------------< C O D E _ N O W I K I >--------------------------------------------------------
//
// wraps 'text' in <code><nowiki>text</nowiki></code> tags for error log
//
private string code_nowiki (string text)
{
return "<code><nowiki>" + text + "</nowiki></code>";
}
//---------------------------< E R R O R _ L O G _ A D D >----------------------------------------------------
//
// adds an error message to the error log list. Probably superfluous.
//
private void error_log_add (string message)
{
error_log_list.Add (message);
}
//---------------------------< L O G _ E R R O R S >----------------------------------------------------------
//
// writes the content of the error log list to the log file, prettified with wiki markup.
//
private void log_errors (string article_title, List<string> error_log_list)
{
string now = DateTimeOffset.Now.ToString("u"); // universal sortable format (UTC): yyyy-MM-dd HH:mm:ssZ
string time = now.Substring (11, 9); // HH:mm:ssZ
string date = now.Substring (0, 10); // yyyy-MM-dd
string log_file = @"Z:\Wikipedia\AWB\Monkbot_tasks\Monkbot_task_19_cite_iucn_update\logs\" + date + ".txt";
using (System.IO.StreamWriter sw = System.IO.File.AppendText (log_file)) // using closes the file even if a write throws
{
sw.WriteLine ("*[[" + article_title + "]] (" + time + "):");
foreach (string list_item in error_log_list)
sw.WriteLine ("*:" + list_item);
}
error_log_list.Clear();
}
//---------------------------< C O U N T E D _ R E P L A C E >------------------------------------------------
//
// common function to replace <pattern> with <replace> and bump <count> until no more <pattern>; <replace> must
// not itself match <pattern> or the loop never terminates
//
private string counted_replace (string template, string pattern, string replace, ref int count)
{
Regex rgx = new Regex (pattern); // make a new regex from <pattern>
while (rgx.IsMatch (template)) // look for <pattern> in <template>
{
template = rgx.Replace (template, replace, 1); // replace one copy of <pattern> with <replace>
count++; // bump the counter
}
return template;
}
//===========================<< S T A T I C   D A T A >>======================================================
static bool status_added = false; // set to true when |status= created in taxobox
static int plain_text_modified_count = 0; // number of plain-text citations that were modified from the iucn api
static int plain_text_count = 0; // total number of plain-text iucn references
static int api_call_count = 0; // number of api calls made; this value not reported in edit summary
static int api_fetch_fail_count = 0; // number of api fetches that failed
static int api_no_cite_return_count = 0; // number of times that the api returned a non-citation value like: {"value":"0","species":"202965"}
static int parse_fail_count = 0; // number of times that we couldn't parse the api return
static int page_doi_skip_count = 0; // number of templates or plain-text references skipped because page and doi assessment ID mismatch (could be errata but since no errata date ...)
static int api_no_species_return_name_count = 0; // number of times that the api returned a non-species value (species name)
static int api_no_species_return_id_count = 0; // number of times that the api returned a non-species value (species id for {{IUCN status}})
static int iucn_status_updated_count = 0; // number of times that we updated the iucn status in taxobox-like templates
static int iucn_status_confirmed_count = 0; // number of times that we confirmed the iucn status in taxobox-like templates
static int iucn_status_system_updated_count = 0; // number of times that we updated the iucn status system in taxobox-like templates
static string taxobox_blank = null; // gets blank taxobox as flag
static bool status_ref_added = false; // set to true when |status_ref= created
static bool status_system_added = false; // set to true when |status_system= created
static bool status_ref_updated = false; // set to true when |status_ref= updated
static bool status_ref_current = false; // set to true when |status_ref= less than 7 months old
static int duplicates_removed_count = 0; // number of duplicate status references removed
static string sc_ref_tag_begin = @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?"; // these for taxobox |status_ref= handling
static string sc_ref_tag_end = @"""?\s*/\>";
static string ref_def_begin = @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?"; // these for taxobox |status_ref= <ref name=... /> handling to locate the matching definition
static string ref_def_end = @"""?\s*\>[^\<]*\</[Rr][Ee][Ff]\>";
static string reflist_cleanup = @"(\{\{\s*[Rr]eflist[^\}]*\|\s*refs\s*=[^\}]*)\<\s*[Rr][Ee][Ff][^\>]*/\>";
static string hide_non_ref_tag_pattern = @"\<((?!/[Rr][Ee][Ff]|[Rr][Ee][Ff])[^\>]*)\>";
static string angle_open = "__4ng13_0__";
static string angle_close = "__4ng13_C__";
static string hide_non_ref_replace_val = angle_open + "$1" + angle_close;
static int iucn_template_count = 0; // total number of cite IUCN templates
static int other_template_count = 0; // total number of cite journal/web templates
//---------------------------< A P I >------------------------------------------------------------------------
static string api_species_url = "http://apiv3.iucnredlist.org/api/v3/species/"; // for fetching species data from the api by name
static string api_species_id_url = api_species_url + "id/"; // for fetching species data from the api by taxon id (for {{IUCN status}})
static string api_id_url = api_species_url + "citation/id/"; // for fetching citation data from the api using taxon id
static string api_name_url = api_species_url + "citation/"; // for fetching citation data from the api using binomial
static string iucn_api_token_file = @"Z:\Wikipedia\AWB\Monkbot_tasks\Monkbot_task_19_cite_iucn_update\iucn_api_token"; // token required to be private; stored locally here
static string api_token = null; // stored at iucn_api_token_file
//---------------------------< C I T E   I U C N >------------------------------------------------------------
static string IS_CITE_IUCN = @"(?:[Cc]ite iucn|[Cc]ite IUCN)";
static string iucn_template_pattern = @"\{\{\s*" + IS_CITE_IUCN + @"[^\}]+\}\}"; // basic cite IUCN template pattern
static string iucn_title = @"\|\s*title\s*=([^\|\}]*)"; // everything in cite IUCN |title= for api calls
static string[] url_patterns = new string[]
{
@"https?://www\.iucnredlist\.org/details/(\d+)/\b(?:all|full)",
@"https?://www\.iucnredlist\.org/details/full/(\d+)/\d+",
@"https?://www\.iucnredlist\.org/details/(\d+)/\d+",
@"https?://www\.iucnredlist\.org/details/(\d+)/?",
@"https?://www\.iucnredlist\.org/details/summary/(\d+)",
@"https?://www\.iucnredlist\.org/search/details\.php/(\d+)/(?:all|summ)",
@"https?://oldredlist\.iucnredlist\.org/details/(\d+)/\d+",
};
static string ref_param_empty = @"\|\s*ref\s*=\s*([\|\}])";
static string ref_param_not_empty = @"\|\s*ref\s*=\s*([^\|\}]+)";
//---------------------------< C I T E   J O U R N A L / W E B >----------------------------------------------
static string IS_CITE_OTHER = @"(?:[Cc]ite journal|[Cc]ite web)"; // TODO: expand this to include more redirects?
static string other_template_pattern = @"\{\{\s*" + IS_CITE_OTHER + @"[^\}]+\}\}"; // basic cite journal / cite web template pattern
//---------------------------< N E W   C I T E   I U C N >----------------------------------------------------
//
// parse_pattern doesn't work for citations like this (from [[Cantleya]]) because of the 'extra' year ahead of
// the binomial:
// Asian Regional Workshop (Conservation & Sustainable Management of Trees, Viet Nam, August 1996) 1998. Cantleya corniculata. The IUCN Red List of Threatened Species 1998: e.T33197A9760751. https://dx.doi.org/10.2305/IUCN.UK.1998.RLTS.T33197A9760751.en .Downloaded on 1 October 2021
//
// Haven't seen enough of these to attempt a second parse pattern
//
//static string citation_from_api_pattern = @"\[\{""citation"":""([^""]*)""\}\]";
static string citation_from_api_pattern = @"\[\{""citation"":""([^\}]*)""\}\]";
static string parse_pattern = @"(^\D+)(\d{4})\.(\D+)\. The IUCN Red List of Threatened Species (\d{4}): (e\.T\d+A(\d+))\.\D+(10\.2305\/IUCN\.UK\.[\d\-]+\.RLTS\.T\d+A(\d+)\S+)\D+(\d{1,2} [A-Za-z]+ \d{4})";
static string[][] search_and_replaces =
{
new string[] {@"(.+?)\sssp\.\s+(.+?)\s(\([^\)]+\))$", @"''$1'' ssp. ''$2'' $3"}, // binomen ssp. subspecies (zoology) with errata or amended text
new string[] {@"(.+?)\sssp\.\s+(.+)", @"''$1'' ssp. ''$2''"}, // binomen ssp. subspecies (zoology)
new string[] {@"(.+?)\ssubsp\.\s+(.+?)\s(\([^\)]+\))$", @"''$1'' subsp. ''$2'' $3"}, // binomen subsp. subspecies (botany) with errata or amended text
new string[] {@"(.+?)\ssubsp\.\s+(.+)", @"''$1'' subsp. ''$2''"}, // binomen subsp. subspecies (botany)
new string[] {@"(.+?)\svar\.\s+(.+?)\s+(\([^\)]+\))$", @"''$1'' var. ''$2'' $3"}, // binomen var. variety (botany) with errata or amended text
new string[] {@"(.+?)\svar\.\s+(.+)", @"''$1'' var. ''$2''"}, // binomen var. variety (botany)
new string[] {@"(.+?)\ssubvar\.\s+(.+?)\s(\([^\)]+\))$", @"''$1'' subvar. ''$2'' $3"}, // binomen subvar. subvariety (botany) with errata or amended text
new string[] {@"(.+?)\ssubvar\.\s+(.+)", @"''$1'' subvar. ''$2''"}, // binomen subvar. subvariety (botany)
new string[] {@"(.+?)\s*(\([^\)]+\))$", @"''$1'' $2"} // binomen with errata or amended text
};
static string errata_text = @"\(errata version published in (\d{4})\)";
static string amended_text = @"\(amended version of (\d{4}) assessment\)";
//---------------------------< T A X O B O X >----------------------------------------------------------------
static string HIDE_ALL_BUT_TAXOBOX = @"(?:[Tt]axobox\s*\||[Ss]peciesbox\s*\|)"; // this to prevent confusion with {{Taxobox authority}} when hiding
static string IS_TAXOBOX = @"(?:[Tt]axobox|[Ss]peciesbox)"; // for hiding all non-taxobox-like templates
static string taxobox_template_pattern = @"(\{\{\s*(" + IS_TAXOBOX + @"))[^\}]+(\}\})"; // basic taxobox-like template pattern; TODO: {{subspeciesbox}}?
static string taxobox_blank_pattern = @"\{\{\s*" + IS_TAXOBOX + @"\}\}";
static string taxobox_new_stat_sys_ref_pattern = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]+?)(\s*)(\}\})"; // used to create new |status=, |status_system=, and |status_ref= params in taxobox
static string taxobox_status_ref_pattern = @"(\|\s*status_ref\s*=\s*)(\<ref[^\>]*\>)[^\<]*(\</ref\>)"; // used to replace |status_ref= param in taxobox
static string taxobox_status_ref_empty_pattern = @"(\|\s*status_ref\s*=[ \t]*)([\r\n]*[\|\}])"; // used to add reference to |status_ref= param in taxobox
static string taxobox_status_sc_ref_pattern = @"(\|\s*status_ref\s*=\s*)(\<[Rr][Ee][Ff][^\>]+/\>)"; // used to replace |status_ref= param in taxobox
static string taxobox_status_ref = null; // the 'new' value for |status_ref
static string taxobox_status_ref_open_tag = null; // it matching ref open tag
static string taxobox_status_ref_sc_tag = null; // and its matching self-closed tag
static string stray_dot = @"(\|\s*status_ref\s*=\s*)\."; // delete stray dot; because I found one such (Astroblepus pholeter)
static string stray_splat = @"(\|\s*status_ref\s*=\s*)\*"; // delete stray splat; because I found one such (Gray short-tailed bat)
static string stray_equal = @"(\|\s*status_ref\s*=\s*)="; // delete stray equal; because I found one such (Cyprinus hieni)
static string stray_nbsp = @"(\|\s*status_ref\s*=\s*)&nbsp;"; // delete stray &nbsp; because I found one such (Euconocephalus remotus)
static string html_comment = @"(\|\s*status_ref\s*=[^\|\}]*)\<!\-\-[^\>]*\-\-\>"; // and html comments
static string unrecognized_species_name = null; // gets taxobox species name that IUCN doesn't recognize
//---------------------------< T A X O B O X _ S T A T U S >--------------------------------------------------
static string IS_IUCN_STATUS = @"(\b(?:LC|LR/lc|NT|LR/nt|LR/cd|VU|EN|CR|PE|PEW|EW|EX|DD|NE)\b)"; // also used with {{IUCN status}}
static string taxobox_status_missing = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status\s*=";
static string taxobox_status_empty = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status\s*=\s*([\|\}])";
static string taxobox_status_value = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status\s*=\s*([^\|\}]+)";
static string taxobox_status_pattern = @"(\|\s*status\s*=\s*)[^\|\}]*?(\s*[\|\}])";
static string status_from_api_pattern = @"""category"":""([^""]+)"""; // for |status=
//---------------------------< T A X O B O X _ S Y S T E M >--------------------------------------------------
static string IS_IUCN_SYSTEM = @"(\b(?:IUCN2\.3|IUCN3\.1)\b)"; // dots escaped so that only a literal '.' matches
static string taxobox_system_missing = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status_system\s*=";
static string taxobox_system_empty = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status_system\s*=\s*([\|\}])";
static string taxobox_system_value = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status_system\s*=\s*([^\|\}]+)";
static string taxobox_system_pattern = @"(\|\s*status_system\s*=\s*)[^\|\}]*([^\|\}])";
static string status_system_from_api_pattern = @"""assessment_date"":""(\d+)"; // for |status_system=
//---------------------------< T A X O B O X _ S T A T U S _ R E F >------------------------------------------
static string taxobox_status_ref_missing = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status_ref\s*=";
static string taxobox_status_ref_empty = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status_ref\s*=\s*([\|\}])";
static string taxobox_status_ref_value = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status_ref\s*=\s*([^\|\}]+)";
static string ref_tag_named_pattern = @"(\<[Rr][Ee][Ff][^\>]*name\s*=\s*""?([^""\>]*)""?\s*\>)";
static string ref_tag_named_sc_pattern = @"(\<[Rr][Ee][Ff][^\>]*name\s*=\s*""?([^""/]*)""?\s*/\s*\>)";
static string ref_tag_unnamed_pattern = @"(\<[Rr][Ee][Ff]\>)";
//---------------------------< T A X O B O X _ S P E C I E S _ N A M E >--------------------------------------
static string binomial_pattern = @"\|\s*binomial\s*=\s*([^\|\}]*)"; // taxobox
static string taxon_pattern = @"\|\s*taxon\s*=\s*([^\|\}]*)"; // speciesbox
static string genus_pattern = @"\|\s*genus\s*=\s*([^\|\}]*)"; // these two combined to make binomial name
static string species_pattern = @"\|\s*species\s*=\s*([^\|\}]*)";
static string name_pattern = @"\|\s*name\s*=\s*([^\|\}]*)"; // taxobox and speciesbox
//---------------------------< D A T E S >--------------------------------------------------------------------
static Dictionary<string, string> date_patterns = new Dictionary<string, string>()
{
{"dmy", @"\d{1,2}\s+([JFMASOND][a-z]+)\s+(\d{4})"}, // dmy
{"mdy", @"([JFMASOND][a-z]+)\s+\d{1,2}\s*,\s+(\d{4})"}, // mdy
{"ymd", @"(\d{4})\-(\d{2})\-\d{2}"} // ymd
};
static string preferred_status_ref_tag_name = @"iucn status (\d{1,2}\s+([JFMASOND][a-z]+)\s+(\d{4}))";
static string access_date = @"\|access\-?date=([^\|\}]+)";
static Dictionary<string, int> months = new Dictionary<string, int>()
{
{"january", 1}, // these for dmy and mdy
{"february", 2},
{"march", 3},
{"april", 4},
{"may", 5},
{"june", 6},
{"july", 7},
{"august", 8},
{"september", 9},
{"october", 10},
{"november", 11},
{"december", 12},
{"jan", 1}, // abbreviated month names; these also for dmy and mdy
{"feb", 2},
{"mar", 3},
{"apr", 4},
// {"may", 5}, // same as whole month name; can't have two with the same key
{"jun", 6},
{"jul", 7},
{"aug", 8},
{"sep", 9},
{"oct", 10},
{"nov", 11},
{"dec", 12},
{"01", 1}, // these for ymd
{"02", 2},
{"03", 3},
{"04", 4},
{"05", 5},
{"06", 6},
{"07", 7},
{"08", 8},
{"09", 9},
{"10", 10},
{"11", 11},
{"12", 12},
};
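//
// Illustrative sketch only; not part of the original task code.  Shows one way the date_patterns and months
// tables above could be combined to get a month number from an access date.  The name sketch_month_from_date
// is hypothetical.  Assumes 'using System.Text.RegularExpressions;' and 'using System.Collections.Generic;'
// at the top of the file.  Returns 0 when no pattern matches or the month name is not in the months table.
//
static int sketch_month_from_date (string date)
	{
	foreach (KeyValuePair<string, string> p in date_patterns)
		{
		Match m = Regex.Match (date, p.Value);
		if (!m.Success)
			continue; // this format did not match; try the next one

		// ymd captures the month in group 2; dmy and mdy capture the month name in group 1
		string key = ("ymd" == p.Key) ? m.Groups[2].Value : m.Groups[1].Value.ToLower();

		int month;
		if (months.TryGetValue (key, out month))
			return month;
		}
	return 0;
	}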
//---------------------------< R E M O V E _ D U P L I C A T E _ S T A T U S _ R E F >-----------------------
static string[] symbols = new string[]
{
@"\{",
@"\(",
@"\|",
@"\.",
@"\-",
@"\)",
@"\}",
};
static string ref_open_tag_unnamed = @"\<[Rr][Ee][Ff]\>";
static string ref_open_tag_named = @"\<[Rr][Ee][Ff][^\>]*\>";
static string ref_close_tag = @"\</[Rr][Ee][Ff]>";
static string bib_open_ul = @"[\r\n]+\*\s*";
static string bib_close_ul = @"([\r\n]+)";
//---------------------------< S P E C I E S _ N A M E _ C L E A N U P >--------------------------------------
//
// these things must be removed from binomial before calling the api with the binomial
//
static string[][] cleanup_patterns =
{
new string[] {ref_open_tag_named + @"[^\<]*" + ref_close_tag, ""}, // references; [[Lampadioteuthis]] caused api fetch exception
new string[] {@"\<[Rr][Ee][Ff][^\>]+/\>", ""}, // self-closed references; [[Sand cat]]
new string[] {@"\<!\-\-[^\>]*\-\-\>", ""}, // html comment
new string[] {@"[\.;:]+$", ""}, // trailing punctuation
new string[] {"'''(.+)'''", "$1"}, // bold wiki markup
new string[] {"''(.+)''$", "$1"}, // italic wiki markup
new string[] {@"""", ""}, // double quote marks
new string[] {"†", ""}, // extinction markers
new string[] {@"\[\[", ""}, // opening wikilink markup
new string[] {@"\]\]", ""}, // closing wikilink markup
new string[] {@"\s*\([^\)]+\)", ""}, // disambiguation
new string[] {@"[\.;:]+$", ""}, // trailing punctuation (again)
new string[] {@"\<nowiki/\>", ""}, // self-closed <nowiki/> tag
new string[] {@"\<nowiki\>", ""}, // opening <nowiki> tag
new string[] {@"\</nowiki\>", ""}, // closing </nowiki> tag
};
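//
// Illustrative sketch only; not the task's actual cleanup routine.  Applies each pattern/replacement pair in
// cleanup_patterns, in array order, to a raw taxobox name and trims the result.  Order matters: italic markup
// is stripped before wikilink brackets, and trailing punctuation is stripped again after disambiguation is
// removed.  The name sketch_clean_species_name is hypothetical.  Assumes 'using System.Text.RegularExpressions;'.
//
static string sketch_clean_species_name (string name)
	{
	foreach (string[] pair in cleanup_patterns)
		name = Regex.Replace (name, pair[0], pair[1]); // pair[0] is the pattern, pair[1] the replacement
	return name.Trim();
	}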
//----------------------------------------< P L A I N _ T E X T >---------------------------------------------
//
// for plaintext references wrapped in <ref>...</ref> tags or in unordered markup (bibliography); must have a
// recognizable page identifier or doi or a url from which a taxon id can be extracted
//
static string plain_text_ref_pattern = @"(\< *ref[^\>]*\>)([^\<]*)(\</ref>)"; // <ref>anything</ref> ref tags and reference are captured
static string plain_text_bib_pattern = @"([\r\n]+\*)([^\r\n]*iucnredlist\.org[^\r\n]*)([\r\n]+)"; // some sort of iucn ref in unordered list
static string plain_text_page_taxon_id = @"\be\.T(\d+)A\d+"; // get taxon id from page
static string plain_text_doi_taxon_id = @"\bRLTS\.T(\d+)A\d+"; // get taxon id from doi
static string plain_text_taxon_id_url = @"https?://(?:www|oldredlist)\.iucnredlist\.org/\S+?/(\d+)\S*"; // get taxon id from url; \S* so that a url ending in the id keeps its last digit
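//
// Illustrative sketch only; not the task's actual extraction code.  Tries the three taxon-id patterns above in
// turn (page identifier, then doi, then url) and returns the first captured id, or null when none matches.
// The name sketch_get_taxon_id and the order of the attempts are assumptions.  Assumes
// 'using System.Text.RegularExpressions;'.
//
static string sketch_get_taxon_id (string reference)
	{
	string[] patterns = new string[] {plain_text_page_taxon_id, plain_text_doi_taxon_id, plain_text_taxon_id_url};
	foreach (string pattern in patterns)
		{
		Match m = Regex.Match (reference, pattern);
		if (m.Success)
			return m.Groups[1].Value; // group 1 holds the numeric taxon id in all three patterns
		}
	return null;
	}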
//---------------------------< I U C N _ S T A T U S >------------------------------------------------------
static string iucn_status_template_pattern = @"(\{\{\s*IUCN status[^\}]+\})";
static string iucn_status_lead = @"(\{\{\s*IUCN status\s*\|\s*)";
static string iucn_status_status = iucn_status_lead + IS_IUCN_STATUS;
static string iucn_status_id = @"(\{\{\s*IUCN status\s*\|[^\|]+\|\s*)(\d+)";
// Monkbot_task_19_cite_iucn_update.cs