I have recently been working on an online ticketing system. I have been using strings.TrimSpace for a while, and it works well. Then I tested it with the “empty character” from emptycharacter.com, and it failed to detect whatever whitespace characters it was using. I thought it was just strings.TrimSpace not detecting different types of Unicode’s empty characters. So I replaced it with strings.TrimFunc(s, unicode.IsSpace), and it still didn’t clear the spaces.[1]
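To reproduce the failure, here is a minimal sketch. The constant `sample` and the two helper functions are hypothetical names introduced for illustration, and the string is written with escapes (using the code points listed further down in this post) so the invisible characters stay visible in the source.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// sample is the "empty character" sequence, spelled with escapes.
// Assumption: it matches the five code points dissected in this post.
const sample = "\u200F\u200F\u200E \u200E"

// survivesTrimSpace returns how many bytes are left after strings.TrimSpace.
func survivesTrimSpace(s string) int {
	return len(strings.TrimSpace(s))
}

// survivesTrimFunc returns how many bytes are left after trimming
// with unicode.IsSpace as the predicate.
func survivesTrimFunc(s string) int {
	return len(strings.TrimFunc(s, unicode.IsSpace))
}

func main() {
	fmt.Println(len(sample))               // 13 bytes: four 3-byte marks plus one space
	fmt.Println(survivesTrimSpace(sample)) // 13, nothing was trimmed
	fmt.Println(survivesTrimFunc(sample))  // 13, nothing was trimmed
}
```

Both calls leave the string untouched because trimming stops at the first and last rune that is not whitespace, and here those are the directional marks.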
Dissecting that empty character, we find it is actually a sequence of five code points:
U+200F: Right-To-Left Mark
U+200F: Right-To-Left Mark
U+200E: Left-To-Right Mark
U+0020: Regular Space
U+200E: Left-To-Right Mark
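A dissection like the one above can be done by ranging over the string rune by rune. This is only a sketch: `sample` and `describe` are hypothetical names, and the string is assumed to be the same five-code-point sequence listed above.

```go
package main

import (
	"fmt"
	"unicode"
)

// sample is assumed to be the sequence listed above, written with escapes.
const sample = "\u200F\u200F\u200E \u200E"

// describe returns one line per rune: its code point in U+XXXX form
// and whether unicode.IsSpace accepts it.
func describe(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("%U space=%t", r, unicode.IsSpace(r)))
	}
	return out
}

func main() {
	for _, line := range describe(sample) {
		fmt.Println(line)
	}
}
```

Only the fourth rune, U+0020, reports `space=true`. The marks on either side report `space=false`, which is why trimming from the ends never reaches the space.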
We can see that it is using directional marks on both ends to shield the regular space from being trimmed. However, Go doesn’t list these characters as control characters[2] (they are format characters, category Cf, rather than Cc), so we cannot use unicode.IsControl. But they are included in the unicode.Bidi_Control subset. Here’s my first solution:
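// isImproperChar reports whether r is whitespace or a bidirectional control character.
func isImproperChar(r rune) bool {
	return unicode.IsSpace(r) || unicode.In(r, unicode.Bidi_Control)
}

// TrimFunc returns the trimmed string, so the result has to be assigned.
s = strings.TrimFunc(s, isImproperChar)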
This would strip bidirectional control characters from the value itself, which is probably a really bad idea, especially in systems supporting Arabic, Hebrew, or other right-to-left languages.
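To make that concern concrete, here is a small sketch with a hypothetical input: a Hebrew word followed by an exclamation mark and a trailing Right-To-Left Mark, which a writer might add to keep the punctuation on the intended side. The helper `stripped` is a name made up for this example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isImproperChar is the same predicate as in the first solution.
func isImproperChar(r rune) bool {
	return unicode.IsSpace(r) || unicode.In(r, unicode.Bidi_Control)
}

// stripped returns how many bytes trimming with isImproperChar removes from s.
func stripped(s string) int {
	return len(s) - len(strings.TrimFunc(s, isImproperChar))
}

func main() {
	// Hypothetical input: U+05E9 U+05DC U+05D5 U+05DD ("shalom"), then "!",
	// then U+200F (Right-To-Left Mark).
	input := "\u05E9\u05DC\u05D5\u05DD!\u200F"
	fmt.Println(stripped(input)) // 3: the trailing mark is silently dropped
}
```

The text still looks non-empty after trimming, but a character the writer put there on purpose is gone.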
So instead we can trim only to measure the length, then discard the trimmed result:
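// IsEmpty reports whether s contains nothing but whitespace and
// bidirectional control characters. The input itself is left untouched.
func IsEmpty(s string) bool {
	return len(strings.TrimFunc(s, func(r rune) bool {
		return unicode.IsSpace(r) || unicode.In(r, unicode.Bidi_Control)
	})) == 0
}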
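Here is a short usage sketch for the function above. The inputs are made-up examples rather than data from the ticketing system, and the `main` function exists only for demonstration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// IsEmpty is the function from the post, repeated so this block runs on its own.
func IsEmpty(s string) bool {
	return len(strings.TrimFunc(s, func(r rune) bool {
		return unicode.IsSpace(r) || unicode.In(r, unicode.Bidi_Control)
	})) == 0
}

func main() {
	// Illustrative inputs only.
	inputs := []string{
		"",                           // truly empty
		"   ",                        // ordinary spaces
		"\u200F\u200F\u200E \u200E", // the "empty character" sequence
		"hello",                      // regular text
		"\u200Fhello\u200F",          // text wrapped in directional marks
	}
	for _, in := range inputs {
		fmt.Printf("%q empty=%t\n", in, IsEmpty(in))
	}
}
```

The first three inputs count as empty and the last two do not. Because the trimmed result is thrown away, the last input keeps its directional marks when it is stored.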
Try it out on the Go playground!
Have a better solution? Please let me know!
This is my eighth post in the #100DaysToOffload challenge.
[1] I thought unicode.IsSpace wasn’t detecting some types of spaces. But after some testing, that doesn’t seem to be the case.
[2] Not listed in unicode/tables.go:7108 as pC (control character); rather, it’s included in the Bidi_Control subset.
Would you like to comment on this blog post? Feel free to start a discussion on my public general mailing list.