Number vandals at Wikipedia

Dec 12, 2009

I've been an occasional editor at Wikipedia for almost five years now. Over that time, my usual modus operandi has been to come across an article I'm interested in, make a lot of correction and clean-up edits and then add it to my watch list. My watch list has thus grown slowly but steadily over the years.

I can't confirm whether I am simply seeing it more because I am watching more articles, or whether this vandalism practice is new, but I thought I would write about it because the implications are serious. The type of vandalism I'm referring to is the deliberate introduction of incorrect information involving numbers.

Numbers on Wikipedia intimidate people. Numbers are, by definition, absolute and difficult to argue with. A common example is an unreferenced date on a page receiving moderate user traffic. A well-intentioned editor likely added the date when first creating the article because he or she has intimate knowledge of the subject and is under the impression that such knowledge is "obvious" or "common". It is this misconception which makes number vandalism so insidious.

A sly vandal arrives one day, long after the original editor has moved on, and changes that date to something five years before, or a few months after. Subsequent visitors to the page aren't alerted that the date was recently changed; they only see a date and most will accept it without question, for two reasons: a) they are there to get knowledge, rather than verify it, and b) a date is a number, impossible to question without pre-existing knowledge of the subject or the effort of real research.

Thus number vandalism tends to slip in without question more often, and to remain on Wikipedia far longer, than vandalism involving factual errors of other kinds. A recent example of this occurred on the Wikipedia page for Balvenie, one of my favourite purveyors of scotch whisky. In a prominent place halfway down the page was a list of "Vintage Casks" complete with dates, alcohol content and number of bottles produced, which was added on the 6th of September, 2009. This information was added without citation, but it appeared so authoritative, and sat on such a lightly trafficked page, that it was never challenged until I removed it on the 12th of December, 2009.

What's important to note here is that the information may very well have been accurate (although uncited) when it was first added. However, since that addition, further editors had seen fit to add, edit and change dates within this list, all without citation. Perhaps all of these edits were accurate as well, but really: who could tell?

Another example from my watchlist was this edit made on the 3rd of December, 2009. This vandal changed a date on a stub article from 1926 to 1941, and if I hadn't just started watching the article, who would have known the date was incorrect without following up on the various cited references on the Raisin Bran and U.S. Mills pages? How long would it have gone unnoticed on this relatively obscure and low-importance article?

It would be convenient to pigeonhole this as simple date vandalism, but really, the problem extends to all numbers on Wikipedia, of which dates are a significant category that features on a large number of articles. Most things had to have happened at a "time", so dates are apt to appear on most articles. But it isn't just dates that have an air of authority in the human mind; it is assuredly all numbers.

Are you a Wikipedia editor? If so, you should make it a habit to examine numbers and dates on the pages you watch and edit. Uncited changes should be put under double scrutiny, for the simple reason that the ratio of effort required to change them versus that required to verify them is more unbalanced than almost any other type of edit. Be vigilant, and question all numbers; they should be the first items on any page which require reliable and verifiable external referencing.
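
Some of this scrutiny can be roughed out in code. What follows is only a minimal sketch, assuming you already have the plain text of two revisions of a page as strings (fetching them is left out entirely). It lists every place where one run of digits was swapped for another. The function name `numeric_changes` and the sample sentence are made up for illustration; this is not an existing Wikipedia tool or API.

```python
import difflib
import re

# Split text into runs of digits, runs of letters, and single punctuation marks.
TOKEN = re.compile(r"\d+|[^\W\d]+|[^\w\s]")


def numeric_changes(old_text, new_text):
    """Return (old, new) pairs where one run of digits replaced another."""
    old_tokens = TOKEN.findall(old_text)
    new_tokens = TOKEN.findall(new_text)
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only consider one-for-one replacements; insertions and deletions
        # are a different kind of edit and are ignored in this sketch.
        if tag != "replace" or (i2 - i1) != (j2 - j1):
            continue
        for old_tok, new_tok in zip(old_tokens[i1:i2], new_tokens[j1:j2]):
            if old_tok.isdigit() and new_tok.isdigit() and old_tok != new_tok:
                changes.append((old_tok, new_tok))
    return changes


# Hypothetical revision texts, not taken from any real article.
old = "A hypothetical cereal was first sold in 1926."
new = "A hypothetical cereal was first sold in 1941."
print(numeric_changes(old, new))  # [('1926', '1941')]
```

A flagged pair is not proof of vandalism; it only tells an editor which numbers to check against the cited references.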

Comments closed

  • Dec 13, 2009 - 22:13

    # Comment by AVB

    I am the one who compiled the Balvenie Vintage Cask list and posted it on the Balvenie page. As a collector of Vintage Casks for the last decade I'm not sure what other authentication you are looking for in my listing. Since the distillery itself has been no help in determining the casks and amounts, my list is the most correct one available. I've spent many hours tracking down the data presented and it is correct to the extent I can make it. I saw your removal of my listing when I went to update an entry for the 1970 Vintage Cask. Chances are that all of the recent edits were updates, as I noticed a transposed number, found new cask info and generally cleaned up the list.

    While I can agree that without some kind of lock the numbers can be changed by anyone, it is a shame to eliminate this information entirely. If there is any way I can re-list it without you removing it, I'd appreciate you contacting me.

  • Dec 13, 2009 - 22:58

    # Comment by GreyWyvern

    Hi AVB,

    I'm a big fan of The Balvenie myself. However, Wikipedia is not for original research; you can't insert material which you've tracked down yourself, as if you were a reporter. That's not how Wikipedia works. Rather, it is based on information which can be verified by reliable third-party sources.

    If you can find a book, documentary or news report on the distillery which lists these casks through which all of the numerical data can be verified, then by all means, you should re-insert the data.

    However, I hope you can understand that if the only source we can use to verify the information is your own research, then we are essentially taking the word of one person as encyclopedic fact. I am not debating the correctness of your particular list; it may very well be accurate. The issue is that exceptions cannot be made to the OR policy, or it would set a precedent for other authors to do the same to their favourite articles as well.

    Your list is potentially useful information, but if you add it to the article, who can verify it? There is no way to prove that it hasn't been simply invented. Especially since the data is mostly numerical; someone could add false information to your list, and our only recourse as editors is to wait until the next time you log in, recognize the change as vandalism, and revert it. Because it is unsourced, editors who could do that fact-checking themselves have no clue where to begin looking, and most (if not all) wouldn't even recognize that vandalism had in fact taken place.

    This is, as I've said, the big issue with numerical data on Wikipedia. If you can cite references, that's excellent. However, if not, it shouldn't be included as it is ripe for exploitation.

  • Dec 13, 2009 - 23:41

    # Comment by AVB

    Out of all the listings I posted, I own over a third of them. Besides posting pictures of every bottle showing the cask number, amounts and proof, how would you or I verify that? Could I have not gotten those pics from a third party? What are the requirements of absolute proof? You ask if I had citations from published sources as if that is always 100% correct. This info came from many sources; the 1970 VC I was going to update, for example, came from the Bonhams catalog for the Willard S. Folsom Collection of Old and Rare Single Malt Whiskies auction to be held on December 17th. Now that is a published catalog, but does an auction house meet your standards of authenticity? If I took cask information from an eBay auction of 5 days ago, does that mean it will no longer be true once that auction is no longer accessible?

    Everything I've posted can be verified, but whoever does it will have to go through the same effort I already have, since I've been working on it for years. There needs to be some way of accepting information believed to be true while indicating that not all of it has met the standards for verification.

    I've seen many entries with a parenthetical [citation needed], perhaps that is all that is required in situations like this.

  • Dec 14, 2009 - 8:48

    # Comment by GreyWyvern

    It really doesn't matter how many bottles you own, or that you've spent years of work on this; and I'm definitely not trying to belittle your efforts by saying so. Original research is simply original research. I am not the gatekeeper here, Wikipedia policy is.

    I doubt an eBay auction can be used as a reference, but perhaps the other references you mention, which you consider may be of dubious value, may in fact be acceptable. You should read up on what Wikipedia considers to be reliable sources and see whether any of your references apply.

    AVB said:

    I've seen many entries with a parenthetical [citation needed], perhaps that is all that is required in situations like this.

    This just means the tagged content will eventually either become referenced or get removed; it is not a licence for unverified content to remain indefinitely.

    Nevertheless, there is no point in continuing to discuss this over email or at greywyvern.com. If you like, you can make a new section on the article's Talk Page, and we can discuss it further there.

  • Mar 28, 2010 - 18:11

    # Comment by AtticusX

    Your commentary on the unique problems presented by number vandalism at Wikipedia echoes my own thoughts precisely.

    I have seen a flood of this sort of abuse recently, and cleaning up the damage is time-consuming and frustrating. One particularly persistent multi-IP vandal from India has started showing up on a daily basis now to alter birthdates, death dates, and release dates. Each edit on its own looks fairly benign; it's only when you look at the history of that user's hundreds of edits that the destructive pattern becomes clear.

    The question that grips me when I come across such messes is why? Why would anyone do this? With other kinds of Wikipedia abuse I can usually imagine some kind of semi-rational motivation -- thinking it's funny (haha, I wrote a bad word on Wikipedia!), or wanting to semi-anonymously leave their mark where other people will notice it (like bathroom-stall graffiti) -- but this behavior has me stumped. What possible purpose could number vandalism serve? I'm sure it's just as boring to do as it is to revert, and when it succeeds in evading detection, what's been accomplished? Oh no, Wikipedia's accuracy has been diminished by some trivial amount! And when it does get caught, the only effect is to create unnecessary work for other editors. Editors like you and me, unpaid volunteers with no sinister agenda other than enabling the dissemination of information.

    What do you think is going on in the number vandal's mind?

  • Mar 30, 2010 - 2:40

    # Comment by somedude

    There are publishing co's in this world, most of them reside around London and New York, that have a huge stake in diminishing something that simply marched in and started taking pieces of pie. Where are kids going for information? Webster's? Oxford? Merriam? Only if they are forced to by a ruler-wielding nun.

    Every sentence that is read online via Wikipedia is one less sentence that is read from a book purchased from the big guys. How do they stop the bleeding? Compete with Wikipedia maybe? Right, that hasn't worked so far. What do they do? Time is not going to slow down for them as their sales lag. Do they beg the educational institutions to demand book-based references? They have tried and failed. They are failing more and more as days pass.

    They have one option other than accepting a loss: beat down Wikipedia's reputation. If Wikipedia can be brought to a point where most people question even its most base elements then they can at least survive. If it cannot then they will have to keep slowing down the presses, shuttering subs and eventually after buying paper/supplies in lesser quantities with each order as they pay more per unit... fail.

    They do not want to fail. They really do not want to fail.

    GM begins a marketing campaign directed at the Prius. It doesn't gain much traction. Within weeks Toyota hits all media outlets in the U.S. (for problems that had been known about for years). Imagine if your customers were not 80% from the U.S., if your sales were not just 1:5... that your sales were in every country on earth, nearly every gradeschooler, college student, home chef, the militaries... were all buying your products. Then imagine that you, the H.M.'s and M.H.'s who were just a short while ago running at close to 100%, were suddenly taken down to 50% by the internet's version of a Toyota Prius.

    The same people who have done the most creepy things in the history of this world are the same ones who are writing history as they want it known. They are in London writing the history that U.S. school children are to read and believe. They are in London writing their version of events around WWII for billions of people in Asia. Does it matter that the English, who basically owned China, were the only ones to really gain from crushing Korea and Vietnam as threats to their controlling the shipping after the fall of Japan? It does to U.S. veterans but guess what? The history they believe was written in London.

    Get real folks. Dots on a screen are only a dream. Soon we'll all be meeting the new boss... and he really is the same as the old boss.