I have recently been working on an online ticketing system. I have been using
strings.TrimSpace for a while, and it has worked well. Then I tested
it with the “empty character” from
emptycharacter.com, and it failed to trim
whatever whitespace characters that site uses.
I thought it was just strings.TrimSpace not detecting different types of
Unicode’s empty characters. So I replaced it with
strings.TrimFunc(s, unicode.IsSpace), and it still didn’t clear the
spaces.[1]
Dissecting that empty character, we find it is actually made up of five different characters:
- U+200F: Right-To-Left Mark
- U+200F: Right-To-Left Mark
- U+200E: Left-To-Right Mark
- U+0020: Regular Space
- U+200E: Left-To-Right Mark
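To make the breakdown above concrete, here is a minimal sketch that rebuilds those five code points and prints each one along with what unicode.IsSpace says about it. The names sample, describe, and survivesTrim are mine and purely illustrative; the string is assembled from the list above rather than copied from the site.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// sample rebuilds the five code points listed above (an assumption:
// it is typed out by hand, not copied from emptycharacter.com).
const sample = "\u200f\u200f\u200e\u0020\u200e"

// describe returns one line per rune: its code point and whether
// unicode.IsSpace reports it as whitespace.
func describe(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("%U space=%t", r, unicode.IsSpace(r)))
	}
	return out
}

// survivesTrim reports whether anything is left after trimming
// leading and trailing whitespace.
func survivesTrim(s string) bool {
	return len(strings.TrimFunc(s, unicode.IsSpace)) > 0
}

func main() {
	for _, line := range describe(sample) {
		fmt.Println(line)
	}
	fmt.Println("survives trim:", survivesTrim(sample))
}
```

Only the regular space counts as whitespace, and it sits in the middle, so trimming stops at the first rune on either end and never reaches it.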
We can see that it uses control characters to prevent the regular space from being trimmed.
However, Go doesn’t list these characters as control characters,[2] so we
cannot use unicode.IsControl. They are, however, included in the
unicode.Bidi_Control subset. Here’s my first solution:
// isImproperChar reports whether r is whitespace or a
// bidirectional control character.
func isImproperChar(r rune) bool {
	return unicode.IsSpace(r) || unicode.In(r, unicode.Bidi_Control)
}

strings.TrimFunc(s, isImproperChar)
This would trim away bidirectional control characters, which is probably a really bad idea, especially in systems supporting Arabic, Hebrew, or other right-to-left languages.
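As a rough illustration of that risk, the sketch below runs the first solution over a made-up input: a Hebrew word followed by an exclamation mark and a trailing Right-To-Left Mark, which a user might add on purpose to keep the punctuation on the intended side. The input string and the trimImproper helper are hypothetical, chosen only to show the effect.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isImproperChar mirrors the predicate from the first solution.
func isImproperChar(r rune) bool {
	return unicode.IsSpace(r) || unicode.In(r, unicode.Bidi_Control)
}

// trimImproper applies the first solution to s (hypothetical helper).
func trimImproper(s string) string {
	return strings.TrimFunc(s, isImproperChar)
}

func main() {
	// Hypothetical user input: the Hebrew word "shalom", an exclamation
	// mark, and a trailing U+200F placed there deliberately.
	input := "\u05e9\u05dc\u05d5\u05dd!\u200f"
	out := trimImproper(input)

	// %+q escapes non-ASCII runes so the removed mark is visible.
	fmt.Printf("%+q -> %+q (changed: %t)\n", input, out, input != out)
}
```

The mark is silently dropped, so storing the trimmed value would change text the user typed intentionally.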
So we can just trim it to measure the length, then discard the trimmed result.
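// IsEmpty reports whether s contains nothing but whitespace and
// bidirectional control characters. The input itself is left untouched.
func IsEmpty(s string) bool {
	return len(strings.TrimFunc(s, func(r rune) bool {
		return unicode.IsSpace(r) || unicode.In(r, unicode.Bidi_Control)
	})) == 0
}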
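For a quick sanity check, here is a small self-contained program that runs IsEmpty over a handful of inputs. The test strings are my own picks, and the "empty character" is rebuilt from the code points listed earlier rather than copied from the site.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// IsEmpty is the function from the post: it trims only to measure
// what is left, and never returns the trimmed string.
func IsEmpty(s string) bool {
	return len(strings.TrimFunc(s, func(r rune) bool {
		return unicode.IsSpace(r) || unicode.In(r, unicode.Bidi_Control)
	})) == 0
}

func main() {
	// Illustrative inputs (assumptions, not taken from a real system).
	inputs := []string{
		"",                          // truly empty
		" \t\n",                     // ordinary whitespace
		"\u200f\u200f\u200e \u200e", // the rebuilt "empty character"
		"hello",                     // plain text
		"\u200fhello\u200e",         // text wrapped in directional marks
	}
	for _, in := range inputs {
		// %+q escapes non-ASCII runes so invisible marks show up.
		fmt.Printf("%+q empty=%t\n", in, IsEmpty(in))
	}
}
```

Because the check never hands back a modified string, real text keeps its directional marks and only the emptiness decision is affected.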
Try it out on the Go playground!
Have a better solution? Please let me know!
This is my eighth post in the #100DaysToOffload challenge.
Footnotes
1. I thought unicode.IsSpace wasn’t detecting some types of spaces. But after some testing, that doesn’t seem to be the case.
2. Not listed in unicode/tables.go:7108 as pC (control character); rather, it’s included in the Bidi_Control subset.