Template talk:Unichar

From Wikipedia, the free encyclopedia

Proposal: use Template:Char

Would it be good to place the character itself in {{char}}? jlwoodwa (talk) 06:43, 9 July 2023 (UTC)[reply]

Although generally keen on char, I'd need to be convinced in this case. Char is used to "isolate" a glyph under discussion from the associated running text. In the output of unichar, that is usually clear.
The only argument in favour that I can see is that, at present, unichar identifies the glyph by increasing its size and maybe the faint box used by char would be better? But conversely magnification makes it easier to "read".
Did you have a particular case that provoked the proposal? 𝕁𝕄𝔽 (talk) 07:51, 9 July 2023 (UTC)[reply]
It's clear to anyone who's familiar with the format, but I'm not sure it's as clear to a general reader, especially one who doesn't know what the "U+ stuff" means. I haven't noticed any specific problems that this would solve, I just think it's good to have a consistent format for "inline character literals" on Wikipedia. jlwoodwa (talk) 08:19, 9 July 2023 (UTC)[reply]
So how would we handle this example: U+20E0 ⃠ COMBINING ENCLOSING CIRCLE BACKSLASH (which is already not handled terribly well). Likewise, Asiatic scripts present issues that don't occur to those of us only familiar with alphabetic scripts. A lot of development work has gone into this template to deal with these issues, so changing it would not be trivial, given the need to verify many, many test cases and rewrite to resolve anomalies. Annoyingly, one of the recent main developers, user:DePiep, is no longer available to advise. --𝕁𝕄𝔽 (talk) 10:20, 9 July 2023 (UTC)[reply]
⃠ seems to work just fine. I understand the difficulty of modifying such a convoluted and widely-used template, though. Since it sounds like it's not obviously a bad idea, I'll try the "obvious implementation" in the sandbox, and give an update here when it's working. jlwoodwa (talk) 10:35, 9 July 2023 (UTC)[reply]
On Chrome, the symbol overruns the box (or the box underruns)... 𝕁𝕄𝔽 (talk) 13:43, 9 July 2023 (UTC)[reply]
... but then again it overruns the last digit of the codepoint right now. --𝕁𝕄𝔽 (talk) 13:45, 9 July 2023 (UTC)[reply]

Combining diacritics are displaying as tofu on Android - fault may be in cwith= handling?

I don't know if this is new? The argument cwith=&#x25CC; or cwith=◌ is used heavily to display combining diacritics. I'm editing in Android right now and the symbol displays correctly. But in articles like diacritic, it has more tofu than a Japanese restaurant. Is there a style serif somewhere that is blocking the last resort substitution? --𝕁𝕄𝔽 (talk) 13:22, 21 September 2023 (UTC)[reply]

No, it is not unique to Unichar, that just happens to be where I first saw it. Diacritic doesn't even use unichar, it just uses a dotted circle and combining diacritic directly, thus ⟨◌́⟩. As it is a general problem, I will take it to Wikipedia:Village pump (technical). --𝕁𝕄𝔽 (talk) 13:37, 21 September 2023 (UTC)[reply]
No solution suggested, it is an implementation defect in Android. So unless someone has a back-channel to Google, we just have to grin and bear it. --𝕁𝕄𝔽 (talk) 16:33, 22 September 2023 (UTC)[reply]
Further discussion has revealed that the problem is due to a deficiency in the system default sans-serif font. The workaround is to use serif and I have started to do that with success on "freestanding" cases. But {{unichar}} is heavily used so we really need a fix to it, please? --𝕁𝕄𝔽 (talk) 16:32, 23 September 2023 (UTC)[reply]

Template enhancement needed, please

Requirement: when cwith=◌ is invoked, wrap the output in <span style="font-family: serif">{{1}}</span>, where {{1}} is the sequence dotted circle + combining diacritic. Is there a doctor in the house? --𝕁𝕄𝔽 (talk) 16:32, 23 September 2023 (UTC)[reply]

Will this work? {{unichar |0301 |combining acute accent |cwith=◌|use=script|use2=serif}} → U+0301 ◌́ COMBINING ACUTE ACCENT. I just extended the template to support "serif" as a use2 param if you set use as "script". This might also work: {{unichar |0301 |combining acute accent |cwith=◌|use=script|use2=noto}} → U+0301 ◌́ COMBINING ACUTE ACCENT. Andre🚐 20:03, 23 September 2023 (UTC)[reply]
Yes, that would work. I hate to be ungrateful but to employ that solution would create a lot of work: many, many articles would have to be updated to use it – and, when Google discards Roboto as default sans font, would all have to be undone again. AFAIK, this is the only use-case for cwith=◌ so it would not have any deleterious effect elsewhere (and would be easy to back out). [BTW, we couldn't have use2=noto because it would break Bing and Safari.] --𝕁𝕄𝔽 (talk) 22:11, 23 September 2023 (UTC)[reply]
Ok, I made the change to Template:Unichar/glyph, but let me know if it doesn't look right and I'll revert it. Andre🚐 23:51, 23 September 2023 (UTC)[reply]
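
For readers following the request above, here is a minimal Python sketch of the requested behaviour: when the glyph is shown on a dotted circle, the pair is wrapped in a serif span. This is an illustration only, not the code in Template:Unichar/glyph; the function name glyph_html and its parameters are assumptions made for the example.

```python
import html

DOTTED_CIRCLE = "\u25cc"  # U+25CC DOTTED CIRCLE

def glyph_html(code_point: int, cwith: str = "") -> str:
    """Return markup for the glyph; use a serif span when it sits on a dotted circle.

    Hypothetical helper: it mirrors the request in this thread, not the
    template's real implementation.
    """
    glyph = html.escape(cwith + chr(code_point))
    if cwith == DOTTED_CIRCLE:
        # Workaround from the thread: the default sans-serif font on Android
        # rendered the dotted circle + diacritic pair as tofu.
        return f'<span style="font-family: serif">{glyph}</span>'
    return glyph
```

Calling glyph_html(0x0301, "\u25cc") yields the span around dotted circle + combining acute accent, while a call without cwith returns the bare character.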

Misaligned diacritics

Can anyone explain (better still fix) this phenomenon:

  • U+0360 ◌◌͠ COMBINING DOUBLE TILDE, a tilde diacritic that spans a pair of adjacent characters: ◌͠◌ no markup: ◌͠◌

Just using the characters directly puts the diacritic in the right place but unichar fails (placement is offset). (At least when using Chrome on Chromebook).

  • U+0301 ◌́ COMBINING ACUTE ACCENT is ok. ◌́

𝕁𝕄𝔽 (talk) 16:42, 22 September 2023 (UTC)[reply]

|cwith=◌◌ puts the dotted circles before the diacritic, but the diacritic is supposed to be between them. I don't know how it should be fixed though. Eru·tuon 19:21, 22 September 2023 (UTC)[reply]
Ah, of course. Obvious really. <blush> There are very few of these two-character diacritics so I don't really see it being worth anyone's while hacking the template to fix it. I'll just add a note to the documentation to say it doesn't work, handcrafting is required. --𝕁𝕄𝔽 (talk) 19:37, 22 September 2023 (UTC)[reply]
I have added this text. It is not quite right: the display of the U+0360 is not exactly as produced by the template, but does it matter?
** Note that cwith=◌◌ does not provide the desired result if the intention is to display a diacritic that spans two characters (such as those in the range U+035C to U+0362): the diacritic will be offset. In such cases, editors must emulate the template output by hand, because the correct HTML sequence is "first-character + combining-diacritic + second-character". Thus, for example, to show the combining double tilde U+0360, write U+0360 &#x25cc;&#x0360;&#x25cc; then (in {{small}}), COMBINING DOUBLE TILDE. This produces U+0360 ◌͠◌ COMBINING DOUBLE TILDE.
Comments (better still, direct edits to improve) welcome. --𝕁𝕄𝔽 (talk) 20:24, 22 September 2023 (UTC)[reply]
Really this needs a "print this instead" for the character. All this size/font/cwith stuff could be put into that instead of trying to fool the automatic text generator into producing the desired result. Spitzak (talk) 21:50, 23 September 2023 (UTC)[reply]
Sorry, I don't follow. Rather than spend time explaining, would you write the alternative text please? Here or in the doc. --𝕁𝕄𝔽 (talk) 22:14, 23 September 2023 (UTC)[reply]
I meant that there could be a parameter, perhaps show, so that if invoked with show=foobar then instead of showing the character it shows "foobar". This could then contain any wiki or html markup desired and any trick needed to get the character to be correctly visible. In this example it would contain the two circles and the combining diacritic. Spitzak (talk) 00:08, 28 December 2023 (UTC)[reply]
I think it does have a param that does something similar, or it did 3 months ago. Andre🚐 00:15, 28 December 2023 (UTC)[reply]
Hmm, a double parameter could be introduced to change the order of the output. Andre🚐 19:50, 24 September 2023 (UTC)[reply]
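
The ordering rule in the documentation note above ("first-character + combining-diacritic + second-character") can be sketched in a few lines of Python. This is a hedged illustration of the rule, not a proposal for the template's actual code; display_sequence is a made-up name, and the range U+035C to U+0362 is taken from the note.

```python
DOTTED_CIRCLE = "\u25cc"  # U+25CC DOTTED CIRCLE

# Double (two-character) diacritics named in the documentation note.
DOUBLE_DIACRITICS = range(0x035C, 0x0363)  # U+035C..U+0362 inclusive

def display_sequence(code_point: int) -> str:
    """Place the dotted circle(s) around a combining mark in the right order."""
    mark = chr(code_point)
    if code_point in DOUBLE_DIACRITICS:
        # The mark attaches to the character before it and spans onto the next,
        # so it must sit between the two base characters.
        return DOTTED_CIRCLE + mark + DOTTED_CIRCLE
    # An ordinary combining mark simply follows its single base character.
    return DOTTED_CIRCLE + mark
```

With this ordering, display_sequence(0x0360) gives circle + double tilde + circle, which is the hand-written sequence the note recommends.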

Question on Error on off-Wiki

I've copied all the related templates and modules to our wiki, and I've checked them a few times over, but it keeps giving me the following error:

I wrote:
└> "The character {{unichar|a9|COPYRIGHT SIGN}} is about intellectual property."
It should write:
└> "The character U+00A9 © COPYRIGHT SIGN is about intellectual property."
but gives me:
└> "The character Error using {{unichar}}: Input "a9" is not a Hexadecimal value. is about intellectual property."

I don't understand why it does this. Not sure if I should ask this here or somewhere else, but thought to try it here first. Kind regards,  Rodejong  💬 ✉️  23:15, 18 December 2023 (UTC)[reply]

That is probably a charset encoding issue, or something to do with your wiki's installation of PHP. U+00A9 © COPYRIGHT SIGN works fine here, as you can see. Andre🚐 00:16, 28 December 2023 (UTC)[reply]
Thanks for answering. I'll ask the hosting guys to look into that then. Kind regards,  Rodejong  💬 ✉️  00:53, 28 December 2023 (UTC)[reply]

Enhancement request: sanity check or lazy invocation

At Copyright sign, a vandal changed {{unichar|25|Percent sign|html=}} to {{unichar|26|Percent sign|html=}}. No error was generated, though inspection shows that the name doesn't match the new, wrong, glyph. The template really should do a sanity check that the name actually matches the code-point and display an error status if not. For familiar glyphs like % and &, it is obvious but not if it is a j

Better still, don't ask for any text, indeed ignore any provided. A simple {{unichar|25}} should fetch the official name and not expect editors to do make-work.

Is there a template doctor in the house? ๐•๐•„๐”ฝ (talk) 19:53, 2 April 2024 (UTC)[reply]

It seems this has fallen through the cracks. I'm going to see if I can wrangle a modification to this template that will simply allow one to print the canonical Unicode name for a given code point. I would prefer it being the default or only behavior, but I am curious is this would be a problem for anyone. Remsense่ฏ‰ 12:58, 5 April 2024 (UTC)[reply]
To my mind, anything but the canonical name is at best finger trouble. The family nlink= is there when the WP:common name and the canonical name don't match. As in U+005E ^ CIRCUMFLEX ACCENT ({{unichar|005E|circumflex accent|nlink=carat}} ๐•๐•„๐”ฝ (talk) 18:00, 5 April 2024 (UTC)[reply]
The issue being, it seems we need a data module of 150k entries that the module has to be searched every timeโ€”if we want to prevent vandalism, anywayโ€”and that's about three orders of magnitude more entries than I've seen a module on here work with, so I am worried by the potential server load. Remsense่ฏ‰ 18:16, 5 April 2024 (UTC)[reply]
Maybe WP:village pump/technical could advise? But it is not really a search when you already have the index and just want to fetch the record that matches that index. ๐•๐•„๐”ฝ (talk) 18:24, 5 April 2024 (UTC)[reply]
Doy, you're completely right on the latter point. Had the current flowing the wrong way in my brain there. I'll poke the pump. Remsense่ฏ‰ 18:27, 5 April 2024 (UTC)[reply]
Well, that was easy!!!!!!!!!!!!!! {{Unichar/sandbox}} seems to work perfectly well. Thank you so much @Cryptic for lending some lost, cold, and confused lexicographers a helping U+2F3F โผฟ KANGXI RADICAL HAND Remsense่ฏ‰ 21:03, 5 April 2024 (UTC)[reply]
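
To illustrate the point made above that fetching a name by code point is an indexed lookup rather than a search, here is a small Python sketch using the standard unicodedata module. It is not the sandbox implementation (which runs on-wiki); the helper names canonical_name and check_name are invented for the example.

```python
import unicodedata
from typing import Optional

def canonical_name(hex_code: str) -> str:
    """Fetch the Unicode name for a hex code point: a direct lookup by index."""
    # unicodedata.name raises ValueError for code points without a name
    # (for example, control characters).
    return unicodedata.name(chr(int(hex_code, 16)))

def check_name(hex_code: str, supplied: str) -> Optional[str]:
    """Return an error message if the supplied name does not match, else None."""
    canonical = canonical_name(hex_code)
    if supplied.strip().upper() != canonical:
        return f'Error: "{supplied}" does not match U+{int(hex_code, 16):04X} {canonical}'
    return None
```

In the vandalism case described at the top of this section, check_name("26", "Percent sign") reports a mismatch because U+0026 is AMPERSAND, whereas check_name("25", "Percent sign") passes.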

The sooner we can put this live, the better. There's a lot of it about! (Kudos to Nickps for spotting this one in such a high-profile article but such basic stuff should't depend on eagle eyes to keep clean.) --๐•๐•„๐”ฝ (talk) 10:33, 7 April 2024 (UTC)[reply]

I am not sure of a particular reason why it can't, I just didn't want to be rash about doing so. It's not like it was a particularly technical change, if you'd like to do the honors? Remsense่ฏ‰ 10:38, 7 April 2024 (UTC)[reply]
I'm happy to be the one to do it but you'll have to tell me how. ๐•๐•„๐”ฝ (talk) 12:42, 7 April 2024 (UTC)[reply]
Oh! Apologies for assuming everyone else is the one I should be asking how to do things. I've done it. Remsense่ฏ‰ 12:54, 7 April 2024 (UTC)[reply]
The template should certainly ignore the text given but maybe we should start with a green warning to say that the template has done so. One like the error message you get if you accidently type firdt=John in a CS1/2 citation. We could do it silently and let those who have been taking advantage of the failure to check come and read the (to be revised) documentation which will tell them that the free text field is no more. ๐•๐•„๐”ฝ (talk) 12:55, 7 April 2024 (UTC)[reply]
Yes I can do that also, great idea. Remsense่ฏ‰ 12:57, 7 April 2024 (UTC)[reply]
Revising the doc, I noticed that calling the template with no text generated just omitted it. I can't see why anyone would want to do that but we had best add a name=none option? ๐•๐•„๐”ฝ (talk) 13:10, 7 April 2024 (UTC)[reply]
I think it's nice to have just because I often am too lazy to tab to a template's documentation so I try all the things (=none? could it be =false? how about =no? Surely it will no longer confound me if I try =""โ€”there we go!) Remsense่ฏ‰ 13:13, 7 April 2024 (UTC)[reply]
Well we could just cheat and regard any input to name= as an instruction to omit. Who is ever going to use if to mean yes. --๐•๐•„๐”ฝ (talk) 13:38, 7 April 2024 (UTC)[reply]
This is usually the pragmatist's move with a binary parameter. I swear there's a thing that lets you check all the ways a user wants to say no or yes to something. Remsense่ฏ‰ 14:09, 7 April 2024 (UTC)[reply]
I probably don't deserve praise for that one considering I'm the one who made the mistake in the first place [1] but thanks, I guess. Nickps (talk) 11:06, 7 April 2024 (UTC)[reply]
Of course you do! It's never too late to make things right. Remsense่ฏ‰ 11:08, 7 April 2024 (UTC)[reply]

Override option needed

See

In Unicode, the majuscule ฦข is encoded in the Latin Extended-B block at U+01A2 and the minuscule ฦฃ is encoded at U+01A3.[1] The assigned names, "LATIN CAPITAL LETTER OI" and "LATIN SMALL LETTER OI" respectively, are acknowledged by the Unicode Consortium to be mistakes, as gha is unrelated to the letters O and I.[2] The Unicode Consortium therefore has provided the character name aliases "LATIN CAPITAL LETTER GHA" and "LATIN SMALL LETTER GHA".[1]

Right now, we have

  • U+01A2 ฦข LATIN CAPITAL LETTER OI

We need a alias= as in alias=LATIN CAPITAL LETTER GHA , as suggested by Chatul at the Village Pump. There are a very few such cases where an error was made in the original standard that will never be changed. --๐•๐•„๐”ฝ (talk) 13:49, 7 April 2024 (UTC)[reply]

Will start this right now alongside the other thing. Remsense่ฏ‰ 14:10, 7 April 2024 (UTC)[reply]
I think it would be ok for arg 1 to continue to work. Instead find all the invocations of this template and remove arg 1 unless it is actually necessary.Spitzak (talk) 19:07, 8 April 2024 (UTC)[reply]
In principle, you are absolutely right โ€“ but in practice that would be a huge task, wildly out of proportion to the tiny number of cases where the Unicode Consortium admits it made an error. This is the most practicable solution to this specific problem. Meanwhile, ignoring the supplied 2= in favour of the canonical text resolves immediately the rather more cases of spelling errors and vandalism. --๐•๐•„๐”ฝ (talk) 20:25, 8 April 2024 (UTC)[reply]
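
A minimal Python sketch of the override idea, for illustration: the canonical name is the default, and a small table of corrections is consulted only when the alias is asked for. The table NAME_CORRECTIONS and the function display_name are hypothetical; the two entries are the ones quoted in this section, and a real implementation would presumably draw on the full Unicode alias data.

```python
import unicodedata

# Illustrative table only: the two corrections quoted in this section.
NAME_CORRECTIONS = {
    0x01A2: "LATIN CAPITAL LETTER GHA",
    0x01A3: "LATIN SMALL LETTER GHA",
}

def display_name(code_point: int, alias: bool = False) -> str:
    """Return the canonical name, or the corrected alias when requested."""
    if alias and code_point in NAME_CORRECTIONS:
        return NAME_CORRECTIONS[code_point]
    # Default: the immutable canonical name, even where it is a known mistake.
    return unicodedata.name(chr(code_point))
```

Here display_name(0x01A2) keeps the canonical LATIN CAPITAL LETTER OI, and display_name(0x01A2, alias=True) gives LATIN CAPITAL LETTER GHA; code points without a table entry are unaffected by the flag.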

Temporary reversion needed

@Remsense: we forgot the many instances of uses like this: {{unichar|2120|Service mark|nlink=} which now fail U+2120 โ„  SERVICE MARK because there is no such article as SERVICE MARK. Do'oh! --๐•๐•„๐”ฝ (talk) 21:23, 8 April 2024 (UTC)[reply]

Revert done: I'm working on the aliases as we speak also Remsense่ฏ‰ 21:26, 8 April 2024 (UTC)[reply]
Which is now working:
{{Unichar/sandbox|1A2}} โ†’ U+01A2 ฦข LATIN CAPITAL LETTER OI
{{Unichar/sandbox|1A2|alias=yesgivemethealias}} โ†’ U+01A2 ฦข LATIN CAPITAL LETTER GHA
What should we do about this? It does say such use of |nlink= is deprecated. Should we clean it all up somehow? Remsense่ฏ‰ 21:38, 8 April 2024 (UTC)[reply]
I have seen a lot of nlink=<blank>, indeed I confess to have been a major perpetrator โ€“ "monkey see monkey do". It works (worked) and there was (is?) no error message to say No data supplied with nlink=, ignored. So we need ...
first: a list of articles that use nlink= with no data, so that someone (aka me, since I know many of them are my fault) can go round and correct them. [I believe that the template already has such an exceptions report, though whether anyone has been checking since DePiep got canned must be doubtful.) Then we can reinstate the change.
second, add some code to say (for all the optional parameters), No data supplied with <param>=, ignored
PS sorry to have dropped the bombshell and not been around until now to help with the cleanup; officially I was otherwise engaged and shouldn't have been in a position to spot the error. <blush> --๐•๐•„๐”ฝ (talk) 23:01, 8 April 2024 (UTC)[reply]
My "first" wouldn't be needed if the current interception of nlink=<blank> were changed so that it linked to the U+XXXX or the target character rather than some name? Which adds support to the question of "do we even need nlink= ?". --๐•๐•„๐”ฝ (talk) 23:58, 8 April 2024 (UTC)[reply]
Don't apologize at all! Nothing about this is particularly burdensome. I am leaning towards linking to the character itself, are there cases where this is going to break? Remsense่ฏ‰ 00:03, 9 April 2024 (UTC)[reply]
So, do you think directly linking to the character itself is the best move? That's where I am presently unless there are edge cases (e.g. I can think of high-range code points and non-printable ones, and maybe we can define those manually). Remsense่ฏ‰ 02:26, 9 April 2024 (UTC)[reply]
yes, see below. ๐•๐•„๐”ฝ (talk) 08:30, 9 April 2024 (UTC)[reply]
The |nlink= default is now also working:
{{Unichar/sandbox|1A2|alias=yes|nlink=}} โ†’ U+01A2 ฦข LATIN CAPITAL LETTER GHA Remsense่ฏ‰ 13:47, 9 April 2024 (UTC)[reply]

Do we even need nlink=

Say: we have a lot of technical redirects, why can't we just add U+XXXX as redirect format to a given page? Remsense่ฏ‰ 21:43, 8 April 2024 (UTC)[reply]
As in, U+2120 now redirects to Service mark symbol, as already did โ„ . This seems like a pre-solved problem. Remsense่ฏ‰ 21:49, 8 April 2024 (UTC)[reply]
It looks to be a neat solution. The only catch that I can see is that these U+XXXX aren't well watched and may be subject to vandalism. It is not an obvious vector for a "bad actor" so I guess it is a reasonable risk. The problem is that the attack won't be obvious and someone following a link to a Gardiner's sign list entity will have no idea how it happened. --๐•๐•„๐”ฝ (talk) 23:01, 8 April 2024 (UTC)[reply]
Are there any cases of nlink=target-name#section-name? I can't think why there would but if it is possible (as it is), someone somewhere will have done it. <sigh> --๐•๐•„๐”ฝ (talk) 23:58, 8 April 2024 (UTC)[reply]
I would say if necessary, the redirect page itself can link to a given section, if I'm understanding properly? Remsense่ฏ‰ 00:04, 9 April 2024 (UTC)[reply]
Yes, that makes sense. I can't see any other reasonable possibility. ๐•๐•„๐”ฝ (talk) 07:43, 9 April 2024 (UTC)[reply]
Though there are cases where the nlink goes to a broad concept article (such as Gardiner's sign list) when there is no specific article. So nlink=<something other than one codepoint> is certainly valid and useful.
So to solve the current problem, we just need to change the behaviour of nlink=<nothing> so that it links to the target character article rather than its Unicode name. As you proposed already, I think? But we can't dispense with nlink= completely and just link everything willy-nilly since many codepoints (e.g., Chinese characters) don't have their own articles. --๐•๐•„๐”ฝ (talk) 08:11, 9 April 2024 (UTC)[reply]


Testcases

As a template editor, I find it helpful, when people point out exceptions and cases like this, to put them in the testcases page so that future editors do not have to remember them. โ€“ Jonesey95 (talk) 21:52, 8 April 2024 (UTC)[reply]
Which testcases? I'm planning on ensuring there's an adequate library of them there once I'm done with this round of updates. Remsense่ฏ‰ 21:54, 8 April 2024 (UTC)[reply]


Per above...is there actually a purpose to being able to set a custom link rather than create easter eggs? I say we just have it link in most cases to ฦข i.e. the page for the character itself most of the time. Remsense่ฏ‰ 21:57, 8 April 2024 (UTC)[reply]

Almost there

Great to see it working again, thank you. Just one left on the to-do list, I think?

  • name=none so that {{unichar|0123|name=none}} produces just plain U+0123 ฤฃ

I need to document alias=yes: I will copy Unicode#Alias. --๐•๐•„๐”ฝ (talk) 14:48, 9 April 2024 (UTC)[reply]

And there you are: {{Unichar|1A2|alias=yes|name=none}} โ†’ U+01A2 ฦข Remsense่ฏ‰ 15:15, 9 April 2024 (UTC)[reply]
It looks a lot like the use of the alias can be automatic, by just checking the alias database and using it instead of the real one if there is an entry. Is there a reason you did not do this? Spitzak (talk) 09:44, 10 April 2024 (UTC)[reply]

Anomalies

Problems as I discover them

Knew I should've just looked at the page that definitely exists where they tell me what characters can't be used as article titles. Remsense่ฏ‰ 19:44, 9 April 2024 (UTC)[reply]
Some you win, some you lose. I just came back to say it must be something to do with that character because these work:
{{unichar|002A|Asterisk| nlink= }}, {{unichar|0023|Number sign |nlink= }} --๐•๐•„๐”ฝ (talk) 20:01, 9 April 2024 (UTC)[reply]

Refs

References

  1. ^ a b "Unicode chart" (PDF).
  2. ^ "Unicode Technical Note #27: Known Anomalies in Unicode Character Names".

Cwith= and non-latin script

The Nepalese rupee sign, रू uses the combining diacritic technique of

  • U+0930 र DEVANAGARI LETTER RA + U+0942 ू DEVANAGARI VOWEL SIGN UU.

Unfortunately, {{unichar|0930|cwith=ू}} produces

  • U+0930 ूर DEVANAGARI LETTER RA (A dog's breakfast).

Can anyone fix? 𝕁𝕄𝔽 (talk) 16:18, 21 April 2024 (UTC)[reply]

I see that it is also a problem with Latin script. In the example of "q with circumflex" below, the template fails to align the circumflex correctly over the q. --𝕁𝕄𝔽 (talk) 18:52, 21 April 2024 (UTC)[reply]
The cwith character is printed first. Also you should not try to use this to show a character that is not a single code point. Spitzak (talk) 08:03, 22 April 2024 (UTC)[reply]
Ah yes, of course. The general solution is your response to the next question. 𝕁𝕄𝔽 (talk) 08:24, 22 April 2024 (UTC)[reply]

cwith handling generally

Suppose that somewhere there exists a letter q with circumflex, q̂. Before we enhanced the template to assert the canonical name (and only the canonical name), it was possible to write {{unichar|0071|cwith=̂|Latin small letter q with circumflex}} and get U+0071 q̂ LATIN SMALL LETTER Q WITH CIRCUMFLEX. Which of course was false: U+0071 is a common or garden q. The new arrangement is questionably better, producing U+0071 ̂q LATIN SMALL LETTER Q, which is a different kind of lie: the grapheme shown is not U+0071 and it is not (just) a Latin small letter q.

So I would like to propose that, when cwith=<combining diacritic>, we expose that fact in the description.

  • Thus, for example, {{unichar|0071|cwith=̂}} should produce U+0071 q LATIN SMALL LETTER Q with U+0302 ̂ COMBINING CIRCUMFLEX ACCENT : q̂

Comments? 𝕁𝕄𝔽 (talk) 18:50, 21 April 2024 (UTC)[reply]

Cwith should be limited to only the dotted circle.
I do think there should be a simple "print this instead" argument to replace all the size, font, IMG, and cwith stuff. Spitzak (talk) 08:07, 22 April 2024 (UTC)[reply]
Yes, I agree that the dotted circle should be the only valid option. Perhaps way back in the early developments, it also supported a coloured block to show the various forms of space character? These are now hardcoded but I guess there are too many combining diacritics to do the same here too.
I will revise the documentation accordingly.
As for all the other bells and whistles, it would take a full search of existing usage to determine where and why they are used. That is not a trivial task. 𝕁𝕄𝔽 (talk) 08:34, 22 April 2024 (UTC)[reply]
I have revised the documentation to formally restrict the base character to ◌ and to deprecate any other usage. Please review.
When someone has time to revise the template, can this restriction be enforced, please? --𝕁𝕄𝔽 (talk) 10:27, 22 April 2024 (UTC)[reply]
{{unichar|0302|cwith=q}} produces U+0302 q̂ COMBINING CIRCUMFLEX ACCENT. Spitzak (talk) 10:28, 22 April 2024 (UTC)[reply]
True, but should it? As per your earlier comment (with which I agree), the template should only produce real code points. --𝕁𝕄𝔽 (talk) 16:27, 23 April 2024 (UTC)[reply]

More detailed request for development

The only legitimate character to use to display a combining diacritic is the dotted circle. So I propose that

  • cwith= is redefined to mean "circle with".
  • The preferred syntax is cwith=yes
    • cwith=◌ and cwith=&#x25CC; are accepted alternatives.
  • Any other argument is flagged as an error.

Is that reasonable? --𝕁𝕄𝔽 (talk) 16:27, 23 April 2024 (UTC)[reply]

Is it possible to determine that it is combining from the Unicode info database? If so, maybe just ignore the field entirely and use that. Spitzak (talk) 07:15, 25 April 2024 (UTC)[reply]
Do we know how/whether that would work with non-Western scripts? Interestingly (at least on ChromeOS), this Devanagari combiner comes with dotted circle out of the box: U+0942 ू DEVANAGARI VOWEL SIGN UU. I don't know how typical that is. --𝕁𝕄𝔽 (talk) 17:10, 26 April 2024 (UTC)[reply]
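
Both ideas in this section can be sketched in Python, purely as an illustration: validating cwith= against the three accepted values, and asking the Unicode character database whether a character is a combining mark. The names resolve_cwith and is_combining_mark are assumptions for the example; the check uses the general category (Mn, Mc, Me) rather than the combining class, because some marks, including U+0942, have combining class 0. It does not answer the rendering question raised above about fonts that supply their own dotted circle.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"  # U+25CC DOTTED CIRCLE

# Values accepted under the proposal above (compared case-insensitively).
ACCEPTED_CWITH = {"yes", DOTTED_CIRCLE, "&#x25cc;"}

def resolve_cwith(value: str) -> str:
    """Map a cwith= argument to the base character to show, or raise an error."""
    cleaned = value.strip().lower()
    if cleaned == "":
        return ""  # parameter not used
    if cleaned in ACCEPTED_CWITH:
        return DOTTED_CIRCLE
    raise ValueError(f"cwith={value!r} is not allowed; use cwith=yes")

def is_combining_mark(ch: str) -> bool:
    """True if the general category is a mark: Mn, Mc or Me."""
    return unicodedata.category(ch).startswith("M")
```

Under this sketch, U+0301 and U+0942 are both reported as marks and "q" is not, so the dotted circle could in principle be chosen without any cwith= argument at all.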

Fixing nlink= for WP:FORBIDDEN characters

The docs say that |nlink= with no argument is deprecated but in my opinion it is a useful feature that we should try to support. The problematic characters are easy to fix simply by linking to the names instead of the characters. I have already written how this can be done in the sandbox (the diff). The only problem with the way it's currently done is that I have to special case the underscore because low line is a disambiguation page. I don't like hardcoding things like that, but I don't think anyone plans to move underscore any time soon so it should be fine. Nickps (talk) 14:26, 14 June 2024 (UTC)[reply]

It was only deprecated because it is a bit of a bear trap. Not every Unicode canonical name has a matching article, I think? And just because an article of that name exists, does it necessarily relate to the character?
@Remsense:, can you remember what the complications were? 𝕁𝕄𝔽 (talk) 19:00, 14 June 2024 (UTC)[reply]
|nlink= does not link to the canonical name by default. It links to the character itself. See {{unichar/sandbox|32|nlink=}}->U+0032 2 DIGIT TWO for an example (digit two does not exist, 2 obviously does). My proposal is that the name should be linked if and only if the character is not allowed in a title. Nickps (talk) 20:05, 14 June 2024 (UTC)[reply]
To actually explain what my change is, if the character is any of # < > [ ] { } | : _ which are the characters not allowed in titles, then I link to the name (except low line which is disambiguated to underscore), otherwise, nothing changes. Nickps (talk) 20:29, 14 June 2024 (UTC)[reply]
Rereading the discussions about the last big change, it does seem to be the case that it was just these forbidden characters that caused the barf (specific example was full stop). Your proposed revision resolves that problem and seems lightweight enough not to cause any problems.
As this is such a high profile template, best we give it a week for any other editor to raise any red flag issues. --𝕁𝕄𝔽 (talk) 07:56, 15 June 2024 (UTC)[reply]
Ok, that makes sense, you never know how these things can break. I also need to write testcases anyway, so there's no rush to merge. Nickps (talk) 09:01, 15 June 2024 (UTC)[reply]
This makes perfect sense. I cannot figure out why a huge change to make it not use a user-defined name was somehow accompanied by a change that forced a user defined name for the link. I would implement this ASAP as somebody is busy adding text to the nlink in every instance, which is backwards. Spitzak (talk) 14:19, 15 June 2024 (UTC)[reply]
@Spitzak I'd suggest you ask them to stop their edits and comment here. I want to undeprecate the empty nlink parameter but apparently this editor disagrees and should be given a chance to explain their reasons. Nickps (talk) 14:50, 15 June 2024 (UTC)[reply]
Did you mean me? After the big change (when we discovered the anomaly that Nickps is now fixing), I certainly went round clearing nlink=nothing because of not knowing the full extent of the problem. That was a month ago. Has someone else resumed? 𝕁𝕄𝔽 (talk) 18:06, 15 June 2024 (UTC)[reply]
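
For readers who have not looked at the sandbox diff, the rule Nickps describes above can be sketched in Python as follows. This is not the sandbox code; the names nlink_target, FORBIDDEN_IN_TITLES and SPECIAL_CASES are invented for the example, and the character list is copied from the comment above rather than checked against the title restrictions page.

```python
import unicodedata

# Characters listed in this thread as not allowed in titles.
FORBIDDEN_IN_TITLES = set("#<>[]{}|:_")

# "Low line" is a disambiguation page, so the underscore is special-cased.
SPECIAL_CASES = {"_": "Underscore"}

def nlink_target(code_point: int) -> str:
    """Pick the link target for an empty nlink=: the character, or its name."""
    ch = chr(code_point)
    if ch in SPECIAL_CASES:
        return SPECIAL_CASES[ch]
    if ch in FORBIDDEN_IN_TITLES:
        # Fall back to the canonical name, e.g. "Number sign" for "#".
        return unicodedata.name(ch).capitalize()
    # Everything else keeps the existing behaviour: link to the character itself.
    return ch
```

So nlink_target(0x32) stays "2", nlink_target(0x23) becomes "Number sign", and nlink_target(0x5F) goes to "Underscore".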

Now, to be clear, that page used to be at TM:Unichar/sandbox/doc but since it was only used by {{Unichar/hexformat/sandbox}}, I moved it to its current title. Still, I can't understand the purpose of that page. To me it looks more like a bunch of notes for personal use rather than a documentation page. Does anyone have any idea what it's supposed to say or should it just go to TfD? Nickps (talk) 01:07, 25 June 2024 (UTC)[reply]

@Nickps: It looks like a bunch of test cases created by DePiep for regression testing. Since no-one has spoken up in its defence by now, off with its head. 𝕁𝕄𝔽 (talk) 16:22, 16 August 2024 (UTC)[reply]

Make |cwith=| a valid option, to save us having to dig out a dotted circle every time?

Since, as documented, the only valid parameter for cwith= is the dotted circle, can anyone see a reason to demand the parameter in the first place? Surely we can just have |cwith=| (a null parameter) as a valid option, with the dotted circle being supplied automatically. 𝕁𝕄𝔽 (talk) 16:26, 16 August 2024 (UTC)[reply]

I think it should also be possible to automatically add the dotted circle if the Unicode attributes indicate the character is combining, so no cwith is needed at all.
If it is wrong, I really recommend an attribute be added that is the "print this instead" attribute. It can contain any markup wanted, and would replace all the stuff to set the font and size and cwith, and the image option, and so on. Spitzak (talk) 17:33, 16 August 2024 (UTC)[reply]
Yes, first para makes sense, I agree.
Sorry, I don't understand your second paragraph, could you expand? 𝕁𝕄𝔽 (talk) 22:12, 16 August 2024 (UTC)[reply]
I think most of the current parameters could be replaced with a single optional parameter. If that parameter is given, its value is used to show the character. This would get rid of the need for the image and a lot of other controls for messing with the font. Popular substitutions could eventually be put in the template itself. Spitzak (talk) 03:02, 17 August 2024 (UTC)[reply]
But the only character we ever want to show is the canonical glyph and canonical name? (with the sole exception of combining diacritics which need the support of a dotted circle for clarity) [Caution: many Devanagari diacritics come with the dotted circle 'as standard'.] I'm still not following you.
Or do you mean an option to use serif rather than the default sans, since some glyphs are difficult to "read" without the hinting supplied by serif?
Or am I still missing your point? (Though if it is that there is a surfeit of bells and whistles that are never used and should go, I agree, "subject to survey".) 𝕁𝕄𝔽 (talk) 15:43, 17 August 2024 (UTC)[reply]
I want an option that if set to "BLAH" will make it print "BLAH" instead of attempting to print the character. Spitzak (talk) 17:52, 4 September 2024 (UTC)[reply]
I think you really need to give an example. I assume you don't mean anything horrible like getting U+005E ^ CIRCUMFLEX ACCENT to display U+005E ^ CARET SIGN? --𝕁𝕄𝔽 (talk) 18:32, 4 September 2024 (UTC)[reply]
Assuming the new field is called "as", I propose that {{unichar|0040|as="FooBar"}} display as U+0040 FooBar COMMERCIAL AT instead of U+0040 @ COMMERCIAL AT Spitzak (talk) 18:39, 4 September 2024 (UTC)[reply]
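
A short Python sketch of what Spitzak proposes, for illustration only: an optional override replaces the glyph while the code point and canonical name are left alone. The function unichar_text and the parameter name show are made up for the example ("as" is a reserved word in Python, so it cannot be used as a parameter name there).

```python
import unicodedata
from typing import Optional

def unichar_text(code_point: int, show: Optional[str] = None) -> str:
    """Format 'U+XXXX glyph NAME', printing `show` in place of the glyph if given."""
    glyph = show if show is not None else chr(code_point)
    # The name always comes from the character database, never from the caller.
    return f"U+{code_point:04X} {glyph} {unicodedata.name(chr(code_point))}"
```

With this shape the override can change only what is displayed for the character, so the CARET SIGN scenario raised above could not arise from it.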

Width bug

Recently this has been adding a lot of whitespace at the end of the small-caps name. This is most obvious if the link is enabled, as the underline is also extended under this whitespace. Spitzak (talk) 17:53, 4 September 2024 (UTC)[reply]

This may be Safari-only. Seems to work on Chrome on Linux Spitzak (talk) 22:57, 4 September 2024 (UTC)[reply]

More flexibility in parameter 1

Occasionally, I'd like to use the unicode character itself as the parameter. For instance, for 🎴, I'd like {{unichar|🎴}} to produce U+1F3B4 🎴 FLOWER PLAYING CARDS. This would occasionally save me a short but slightly tedious round trip looking up the character code of a character I already have but I don't have the code of, and having the computer do this mapping for me seems quite doable using software (I don't know much about Wikipedia templates, though). Single characters 0-F/f can be exempt from this, of course, if their capacity to represent single-digit hexadecimal numbers from 0 to 15 is still important (although maybe it isn't, since most people write those like 000F or 0F anyway?).

While looking into this, I was reminded that the unichar template doesn't let you add the U+ prefix to the code in parameter 1. So, for instance, U+1F3B4 is an error. Apparently this is a common error for people to make, so maybe it should be detected and the U+ prefix should simply be stripped internally? Dingolover6969 (talk) 07:02, 19 October 2024 (UTC)[reply]

I'm not sure how a reverse lookup like that could be easily accomplished in a Wikipedia template. It seems like something that ought to be possible since the computer obviously has this information, but I don't think you have access to the table that you'd need to do that. The best idea that comes to mind would be to generate a magic template list with a script or bot of some kind that hardcodes the table and then look it up from that. Andre🚐 07:08, 19 October 2024 (UTC)[reply]
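
As a point of comparison, here is how the two requests above look in a general-purpose language. This Python sketch only shows the parsing logic (strip an optional U+ prefix, accept hex digits, otherwise treat a single character as literal); whether the same is practical inside the template is exactly the open question in this thread. The name parse_param is an assumption for the example.

```python
import re

# Optional "U+" prefix followed by 1 to 6 hex digits.
HEX_PATTERN = re.compile(r"^(?:U\+)?([0-9A-F]{1,6})$", re.IGNORECASE)

def parse_param(value: str) -> int:
    """Turn parameter 1 into a code point number."""
    value = value.strip()
    match = HEX_PATTERN.match(value)
    if match:
        code_point = int(match.group(1), 16)
        if code_point > 0x10FFFF:
            raise ValueError(f"{value!r} is outside the Unicode range")
        return code_point
    if len(value) == 1:
        # A single literal character that is not a hex digit: use its code point.
        return ord(value)
    raise ValueError(f"{value!r} is neither a hex code point nor a single character")
```

Single characters 0-9 and A-F are still read as hex digits, as the request suggests, so parse_param("a") is 10 while a literal playing-card emoji maps to 0x1F3B4.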