JoelNothman.com

22 June, 2008

Wikipedia categories ≠ ontology

Filed under: Wikipedia by Joel @ 2:07 pm, 22 June 2008.

I think I’m probably stating the obvious here. If we take a single trace of an article such as Tom Cruise through the category hierarchy in Wikipedia, we find out that he is merely a theory…

Tom Cruise → 1962 births → 1960s births → 20th century births → Births by year → People → Humans → Apes → Primates → Mammals → Vertebrates → Chordates → Animals → Eukaryotes → Organisms → Life → Core issues in ethics → Ethics → Branches of philosophy → Philosophy → Belief → Spirituality → Human behaviour → Behaviour → Branches of psychology → Psychology → Interdisciplinary fields → Academic disciplines → Academia → Education → Personal development → Personal life → Self → Metaphysics → Reality → Philosophical concepts → Philosophical terminology → Terminology → Vocabulary → Language → Communication → Social psychology → Social philosophy → Philosophical movements → Movements → Ideologies → Epistemology → Philosophy of science → Analytic philosophy → 20th century philosophy → 20th century → 2nd millennium → Millennia → Years → Chronology → Measurement → Scientific observation → Data collection → Data management → Computer data → Computer storage → Computer memory → Digital media → Digital technology → Electronics → Electromagnetism → Special relativity → Relativity → Theoretical physics → Theories → …

And yes, this isn’t completely irrelevant: it relates to my honours research work. It means that the Wikipedia category hierarchy is only useful as a folksonomy, or perhaps as a taxonomy only within a few levels above each article…
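
To make the "small depth" idea concrete, here is a minimal sketch that walks up the category links from an article and stops after a fixed number of levels. The toy `PARENTS` graph only contains the first few edges of the trace above, and the function name `ancestors_within` is my own invention for illustration; the real category graph gives most nodes several parents and would be loaded from a Wikipedia dump or the API instead.

```python
from collections import deque

# Toy fragment of the category graph: child -> list of parent categories.
# These edges are the first steps of the Tom Cruise trace; everything else
# about this structure is an assumption made for illustration.
PARENTS = {
    "Tom Cruise": ["1962 births"],
    "1962 births": ["1960s births"],
    "1960s births": ["20th century births"],
    "20th century births": ["Births by year"],
    "Births by year": ["People"],
    "People": ["Humans"],
    "Humans": ["Apes"],
}


def ancestors_within(node, max_depth, parents=PARENTS):
    """Breadth-first walk up the parent links.

    Returns a dict mapping each category reached within max_depth
    steps of node to the number of steps it took to reach it.
    """
    seen = {}
    queue = deque([(node, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not climb any higher than max_depth
        for parent in parents.get(current, []):
            # Skip categories already reached, and guard against cycles
            # that lead back to the starting node.
            if parent not in seen and parent != node:
                seen[parent] = depth + 1
                queue.append((parent, depth + 1))
    return seen


if __name__ == "__main__":
    print(ancestors_within("Tom Cruise", 2))
```

With a depth limit of 2 the walk stops at the birth-year categories, which still say something sensible about the article; raise the limit and it climbs into People, Humans and Apes, and in the full graph eventually into Theories.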

4 Comments »

  1. not sure what you mean here — all this shows is that one category ultimately connects to all the others. but even if there was some rigorous taxonomy the same chain would apply since ultimately it would divide up Everything thereby creating a path between any 2 nodes?

    Comment by Michael — 22 June, 2008 @ 2:58 pm

  2. “Connection” is a broader relationship than what is depicted here. Categorisation of A in B usually means that B subsumes A; i.e. A (or all the articles contained in A) is part of B. For many Wikipedia categorisations this is the case, but not for many others. Still, we’re talking about containment or subsumption, not arbitrary paths between nodes.

    I don’t know what you mean by dividing up Everything. Wikipedia does have categories that are meant to act as the root of their article categorisation system, e.g. Main topic classifications, Fundamental, Articles and Contents. But Wikipedia’s categorisations are more often thematic than taxonomic, and thematic ties are much broader.

    Compare this with, for instance, WordNet, a carefully designed lexical-semantic ontology. For WordNet, there are very few root nodes for each part of speech, i.e. I think all nouns are rooted in “entity”, below which are a number of “unique beginners” categorised under “physical entity”, “abstraction” and “thing” (i.e. unnamed entities). Such constraints mean that the ontology is contrived in some places, but still, the point is that there is nothing clear about the semantics of Wikipedia’s hierarchy…

    And I possibly have a solution which would find an approximately taxonomic subgraph of Wikipedia’s category graph, but I’ll report on that when I have something to report on.

    Comment by Joel — 22 June, 2008 @ 4:37 pm

  3. Hmm, interesting… but from the looks of that a ‘folksonomy’ could still be kinda useful, particularly if you worked out how to prune some of the ‘bad’ links. It’s all about probability anyway - you could just degrade the value of the connection the longer the chain is, which is the nice way of doing the small depth thing.

    (I had a quicky traceback from Tom Cruise myself, and found he was a kind of ‘Underpopulated category’, which itself was a kind of ‘Very large category’ :) )

    Comment by James — 22 June, 2008 @ 8:00 pm

  4. Yup… I do have a way of pruning the bad stuff… and I might try it out in a few days.

    Comment by Joel — 22 June, 2008 @ 10:25 pm
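
As an aside on comment 3: the idea of degrading a connection's value as the chain gets longer can be sketched as a geometric decay. This is only an illustration; the function name `chain_weight` and the decay factor of 0.5 are assumptions, not anything proposed in the post or the comments.

```python
def chain_weight(chain_length, decay=0.5):
    """Weight of an article-to-category link reached via chain_length steps.

    A direct categorisation (length 1) counts for `decay`, and every
    further step up the hierarchy multiplies the weight by `decay` again.
    """
    if chain_length < 0:
        raise ValueError("chain_length must be non-negative")
    if not 0 < decay <= 1:
        raise ValueError("decay must be in the interval (0, 1]")
    return decay ** chain_length


if __name__ == "__main__":
    # Compare a nearby category with one far up the chain.
    for steps in (1, 3, 10):
        print(steps, chain_weight(steps))
```

A soft weighting like this plays the same role as a hard depth cut-off, except that distant categories fade out gradually rather than disappearing at a fixed level.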
