Detecting the Empty Character in Go

I have recently been working on an online ticketing system. I have been using strings.TrimSpace for a while, and it works well. But when I tested it with the “empty character” from emptycharacter.com, it failed to strip whatever whitespace characters that site was using.

I thought it was just strings.TrimSpace not recognizing some of Unicode’s whitespace characters. So I replaced it with strings.TrimFunc(s, unicode.IsSpace), and it still didn’t clear the spaces¹.

Dissecting that empty character, we find it is actually made up of five different characters:

U+200F: Right-To-Left Mark
U+200F: Right-To-Left Mark
U+200E: Left-To-Right Mark
U+0020: Regular Space
U+200E: Left-To-Right Mark
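If you want to reproduce the breakdown above, here is a minimal sketch. The describe helper is a name I made up for illustration, and the string is written with escapes on the assumption that it matches the five code points listed above.

```go
package main

import (
	"fmt"
	"unicode"
)

// describe (hypothetical helper) returns one line per rune in s, showing
// its code point and whether unicode.IsSpace treats it as whitespace.
func describe(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("%U space=%t", r, unicode.IsSpace(r)))
	}
	return out
}

func main() {
	// The five code points listed above, escaped so they stay visible.
	s := "\u200f\u200f\u200e \u200e"
	for _, line := range describe(s) {
		fmt.Println(line)
	}
}
```

Only the fourth rune should report space=true, which already hints at why trimming from the ends gets stuck.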

We can see that it wraps the regular space in directional marks. Those marks are not whitespace, so a trim that works inwards from both ends stops at them and never reaches the space.

However, Go doesn’t list these characters as control characters², so we cannot use unicode.IsControl. They are, however, included in the unicode.Bidi_Control range table. Here’s my first solution:

func isImproperChar(r rune) bool {
	return unicode.IsSpace(r) || unicode.In(r, unicode.Bidi_Control)
}

strings.TrimFunc(s, isImproperChar)

The catch is that isImproperChar also strips bi-directional control characters from the ends of legitimate input, which is probably a really bad idea, especially in systems supporting Arabic, Hebrew, or other right-to-left languages.

So instead we can trim the string only to measure what is left, then discard the trimmed result and keep the original input untouched.

func IsEmpty(s string) bool {
	return len(strings.TrimFunc(s, func(r rune) bool {
		return unicode.IsSpace(r) || unicode.In(r, unicode.Bidi_Control)
	})) == 0
}
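Here is a rough sketch of how IsEmpty behaves on a few inputs. The inputs are my own picks for illustration: the Hebrew word is written with escapes, and the zero-width space (U+200B) is included because, as far as I can tell, it is neither whitespace nor a bidi control, so this check does not catch it.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// IsEmpty is copied from the post: it reports whether s contains nothing
// but whitespace and bi-directional control characters at its ends.
func IsEmpty(s string) bool {
	return len(strings.TrimFunc(s, func(r rune) bool {
		return unicode.IsSpace(r) || unicode.In(r, unicode.Bidi_Control)
	})) == 0
}

func main() {
	inputs := []string{
		"",                             // truly empty
		"  \t\n",                       // ordinary whitespace
		"\u200f\u200f\u200e \u200e",    // the "empty character" sequence
		"\u200f\u05e9\u05dc\u05d5\u05dd", // right-to-left mark followed by Hebrew text
		"\u200b",                       // zero-width space, not covered by this check
	}
	for _, in := range inputs {
		// %+q keeps invisible characters readable as escapes.
		fmt.Printf("%+q -> %t\n", in, IsEmpty(in))
	}
}
```

Since IsEmpty only returns a boolean, the right-to-left input keeps its leading mark. If zero-width characters matter for your system, the predicate would need to be widened.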

Try it out on the Go playground!

Have a better solution? Please let me know!

This is my eighth post in the #100DaysToOffload challenge.

¹ I thought unicode.IsSpace wasn’t detecting some types of spaces. But after some testing, that doesn’t seem to be the case.
² They are not listed on unicode/tables.go:7108 as pC (control character); instead, they are included in the Bidi_Control subset.

Humaid Alqasimi
