How do you tag your media?

One of the big deals in Web 2.0, as everyone knows, is tagging your media. We’ve resorted to tagging because tags define the content more succinctly than a search engine can by pulling out keywords. I’ve been doing this blogging thing for a while now and I’m still not quite sure how to tag. Here are the tagging principles I use and the reasons I find them beneficial.

  • Root Words
    • Search algorithms can be more easily optimized to accept many variations of a word and then go looking for the root word in your tags and content.
    • For example, use surf instead of surfing, movie instead of movies, run instead of running, etc.
  • Basic Words
    • For the same reason that I use root words, I use the simpler word wherever there is a choice between a simple and a complex one.
    • People are also more apt to use the simpler word when searching.
    • For example, domain instead of top level domain, internet or net instead of world wide web, etc.
  • Tag Phrases as Words
    • Say I post about my favourite band, Brave Saint Saturn. I’ll tag each individual word: brave, saint, saturn
    • Why? Because computers can work out more combinations, and more quickly, than humans can
    • Say a visitor searches for international business and you have a post about IBM (which stands for International Business Machines). Some visitors would want the IBM post to come up in the search results, and it can only match if each word is its own tag.
  • Acronyms
    • Don’t be afraid of using acronyms
    • Acronyms used in everyday language have a specificity all their own. They can indicate time, location, subject, age, etc.
    • It’s also a good idea to tag each individual word in the acronym
  • Slang, Colloquialisms, Jargon, Vernacular
    • Go crazy on these too
    • These types of words also have a value unique to them, which makes them great for searching
    • Like acronyms, they carry connotations that can sometimes indicate the topic of your media better than ordinary dictionary words
  • Variations
    • Eat your heart out! If there is more than one apt word for a topic, tag them all!
    • For example, blog, post, article, essay, etc.
    • This increases the chances that a visitor choosing a word at random related to the topic will find your post
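
The root-word, phrase-splitting, and acronym principles above can be sketched in a few lines of Python. This is only a toy illustration: `naive_root` and `build_tags` are made-up names, and the suffix stripping handles just the simple -ing and -s endings from the examples above (it will get plenty of real words wrong).

```python
def naive_root(word):
    """Toy root-word reduction: only handles plain -ing and -s endings."""
    w = word.lower()
    if w.endswith("ing") and len(w) > 5:
        w = w[:-3]
        # "running" -> "runn" -> "run": collapse a doubled final consonant
        if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeiou":
            w = w[:-1]
    elif w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        w = w[:-1]
    return w


def build_tags(phrases, acronyms=None):
    """Split each phrase into single-word tags, expanding known acronyms."""
    acronyms = acronyms or {}
    tags = []
    for phrase in phrases:
        # Keep the acronym itself and also tag each word it stands for
        expanded = acronyms.get(phrase, "")
        for word in (phrase + " " + expanded).split():
            tag = naive_root(word)
            if tag not in tags:
                tags.append(tag)
    return tags


tags = build_tags(
    ["brave saint saturn", "IBM"],
    acronyms={"IBM": "International Business Machines"},
)
print(tags)
```

The acronym table is supplied by hand here; nothing in the sketch tries to guess what an acronym stands for.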

The basic idea is to make your tags as easy as possible for your visitors to search and as easy as possible for services like Google to index.

The other idea behind all of these principles is the underlying assumption that most people will find your content through a computer algorithm. Computer algorithms handle the basic cases (i.e. the simplest cases) first and then expand out into other cases that might introduce fuzziness and reduce the accuracy of finding the content the user wanted. So we try to make it as simple as possible for algorithms to find our content.

Not only that, but we assume that algorithms will also be improved. So we attempt to give algorithms basic, raw, individual pieces of data (i.e. international, business, machines instead of international business machines). By breaking it up like this, you allow future algorithms to mix and match your data more easily and so build better relations between content. This is a future-proofing mechanism.
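
Here is a rough sketch of why individual tags mix and match better than one phrase tag. It assumes the simplest possible search, one that returns a post only when every query word is among its tags; `matching_posts` and the sample posts are invented for illustration, and real search engines do far more than this.

```python
def matching_posts(query, posts):
    """Return titles of posts whose tags contain every word in the query."""
    wanted = set(query.lower().split())
    return [title for title, tags in posts.items() if wanted <= set(tags)]


# Each word tagged individually
split_tags = {
    "IBM post": ["ibm", "international", "business", "machines"],
    "Band post": ["brave", "saint", "saturn"],
}

# The whole phrase used as a single tag
phrase_tags = {
    "IBM post": ["international business machines"],
    "Band post": ["brave saint saturn"],
}

print(matching_posts("international business", split_tags))
print(matching_posts("international business", phrase_tags))
```

With split tags the IBM post is found; with the phrase tag this simple matcher finds nothing, because no single tag equals "international" or "business".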

Remember KISS: Keep It Simple, Stupid. And search engines and your visitors will love you for it.

So, what are your tips for tagging media?

Anti-Spam Done Right

Blog comment spam is a big problem. I started out with no protection on this blog. I found out rather quickly that wouldn’t do.

So I implemented a CAPTCHA which requires you to enter a random code to prevent automated comment submissions. Then I found out actual humans were submitting comments (or perhaps very smart anti-CAPTCHA programs).

My next step was disabling auto-approval unless you had one pre-approved comment. But that only resulted in me clearing out spam every day and discouraging real-time discussion because comments wouldn’t show up immediately.

After a long time, I finally found Akismet.

Akismet is anti-spam done right.

Comments get automatically submitted to a blog anti-spam service where, first, they are put through hundreds of tests to see if they are spam.

The second part is the key, though. Because Akismet is a service anyone can use, thousands if not millions of people use it for the same reason, and this is where its power lies. The second part of the anti-spam checks is to compare the comment with those received by the millions of other blogs that also use the service. More than likely somebody has already gotten your comment, or one like it, and marked it as spam. So when it gets to you, it’s already considered spam and not published. You can decide what to do with it in your admin interface.
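
To make the pooling idea concrete, here is a minimal sketch of a shared spam pool. It is not Akismet’s actual API or algorithm, which I don’t know in detail; `SharedSpamPool`, the exact-match fingerprint, and the `threshold` parameter are all assumptions made up for illustration. It only catches comments that are identical after lowercasing and whitespace cleanup, whereas the real service also catches comments that are merely similar.

```python
import hashlib


class SharedSpamPool:
    """Toy pool of spam reports shared by many blogs (hypothetical design)."""

    def __init__(self, threshold=1):
        self.reports = {}           # fingerprint -> set of reporting blog ids
        self.threshold = threshold  # how many blogs must report before we block

    @staticmethod
    def fingerprint(comment):
        # Normalize case and whitespace so trivial variations hash the same
        normalized = " ".join(comment.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def report_spam(self, blog_id, comment):
        self.reports.setdefault(self.fingerprint(comment), set()).add(blog_id)

    def is_spam(self, comment):
        reporters = self.reports.get(self.fingerprint(comment), set())
        return len(reporters) >= self.threshold


pool = SharedSpamPool()
# Somebody else's blog marks a comment as spam...
pool.report_spam("blog-a", "Buy CHEAP   pills now")
# ...so when the same comment reaches my blog, it is already known.
print(pool.is_spam("buy cheap pills now"))
print(pool.is_spam("Great post, thanks!"))
```

Raising `threshold` would require several independent blogs to agree before a comment is blocked, which trades speed for fewer false positives.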

Google Mail also does anti-spam right. It operates on the same principle as Akismet (which was probably inspired by Gmail in the first place). Basically, tests are run on the sender of your email and on the email itself, and then the email is compared with the billions of other emails that other Google Mail users receive. If it looks like spam based on any of these checks, it goes in your spam folder.

This is the beauty of distributed effort.

When so many people are pooling into a system you really can make spammers largely ineffective – to the point that it’s no longer worth it for them to spam.

What we need is a distributed system for anti-spam checking at the SMTP level for regular system admins. Imagine the entire world pooling into this system. I have yet to try the Distributed Checksum Clearinghouse. It does something like what I would like, but not quite: it’s not exactly like Gmail’s or Akismet’s mechanisms for tagging spam.

It should be clear: In a world of anti-spam done right, spam largely goes away.

Akismet. I. Said. Bring. It.

Well, I’ve been running Akismet for a few days now and so far it’s been flawless. I’ve had a few legitimate comments and a dozen spams. Akismet has caught all of the bad ones and none of the good ones. Excellent work!

I believe Akismet works like Gmail’s spam filter which, along with its own algorithms, takes the spam reports of other users and uses them as indicators of spam. It’s distributed and thus quick-acting, and far better than anything a single algorithm or single person can achieve.

If you have WordPress, you owe it to yourself to try out Akismet!

Pick Fretz’s Brain

My good friend Jamie, over at Pick Fretz’s Brain, deserves a shout out. He’s attending Emmanuel Bible College and I’ve seen a really significant and impressive change in him since he’s been there these past two years. From a small town boy with small town views, he’s really taking it all in and growing his intellect, growing in his walk with Christ (even amidst the struggles that all of us face), and growing the basis of his belief.

I had the thought of starting a blog to follow and help my Bible reading (or my lack of Bible reading) and it wasn’t until I saw Jamie’s blog that I took the jump.

So props Fretz, you are an influential man! Keep your stick on the ice!