It's useful to read this to get an idea of the magnitude of the problem WP faced even in 2006, when it already had huge traction. Wikipedia is an amazing achievement, maybe even up there in the leaderboard of western civilization, and the fact that they managed to do it despite an active decentralized motivated adversary (the forces of vanispamcruftisement) is just one aspect of that.
On WP, some whole articles are carefully crafted artworks of spam, where the vanispamcruftisement movement has learned the encyclopedia's language so well that you never notice. One day someone will write a "vanispamcruftisement manifesto" arguing that spam is a positive moment of western civilization. Yesterday's spam techniques become mainstream ad formats, with the irrelevance reduced, the design improved and the cost increased. Many social media sites are a flood of spammy posts about 'self'.
Essentially, pick a topic which isn't totally mainstream (and hence getting lots of editor attention), but covers something that people buy for themselves / their house and where they might visit "review" sites to help them purchase the item.
Check out the links in the References and (the second link) in the External Links section. Do those really add to the article or are they content-free "comparison" sites? The fact that you don't think the page looks spammy shows how subtle this game has got.
Edit#2: Since I used to work in "whitehat SEO" (thankfully no more) let me explain what's going on here:
A long-lasting link from a Wikipedia page is like gold to search engines. It's far more valuable than if you linkspam a comment on someone's blog. It indicates that this page is very relevant to the topic. So when people search for "lawn sweepers", those links appear nearer to the top. The spammers monetize this by filling these essentially content-free pages with affiliate links to Amazon and other sellers.
There's no motive here apart from money. This doesn't make Wikipedia, Google, the web or even lawn sweeper sellers any better. It just benefits the spammers.
This is particularly prevalent in geographical articles. I recently spent a few hours clearing spammy links out of articles on Greek islands (and towns, and beaches, ...).
One strategy spammers are using to try to make their links in Wikipedia "stickier" is to insert them as citations instead of External Links. Wikipedians will (rightly) think more carefully before removing references than external links, since refs are supposed to be providing justification for part of the article, not just a nice-to-have extra. But in most cases they are not really legitimate citations, in the sense of any kind of reliable source for information (either the information isn't even there, or it's copied from a better source like an official website, in which case the original source should be cited instead).
This illustrates precisely why wikipedia is actually fairly robust.
Because anyone can edit an article, generally, there's a catch-22 for things like spam entries. If nobody sees it then it can stay around for a while, but then again nobody is looking at it. But if a lot of people see it then the problems tend to be addressed either because mods take notice or because individuals edit out the spamminess.
I don't have them to hand, but I've certainly come across WP articles where it was pretty clear somebody just entered a company's sales-brochure more or less directly... (and if I was marketing director at a company, I'd definitely try to have a presence on Wikipedia of some sort).
This isn't a diss against WP though. I've found such questionable articles to be very rare, and despite Wikipedia's flaws, it's an amazing, amazing, site, one of the best things on the web, by far, and maybe even more astonishingly, one of the few major sites that isn't commercial. I find it indispensable in my everyday life (and donate money regularly)...
It's usually found and fixed - either deleted or made more neutral. Sometimes the new editors are welcomed to the project and given help with COI. They're told about the licensing problems of just copying content from a corporate website. Sometimes they're bashed by vandal-patrolling Twinkle users. (Does that still happen? I think WP made efforts to stop it from happening.)
> This is an automated message from CorenSearchBot. I have performed a web search with the contents of Professional Pensions, and it appears to include a substantial copy of http://www.incisivemedia.com/corporate/products/professional.... For legal reasons, we cannot accept copyrighted text or images borrowed from other web sites or printed material; such additions will be deleted. You may use external websites as a source of information, but not as a source of sentences.
> This message was placed automatically, and it is possible that the bot is confused and found similarity where none actually exists. If that is the case, you can remove the tag from the article and it would be appreciated if you could drop a note on the maintainer's talk page. CorenSearchBot (talk) 15:51, 4 March 2008 (UTC)
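For anyone wondering how a bot could flag "a substantial copy" at all, here is a minimal sketch of one common approach: compare overlapping word triples ("shingles") between the article and a candidate source. This is only an illustration, not a description of how CorenSearchBot actually works; the function names `shingles` and `copied_fraction` and the choice of three-word shingles are assumptions of mine.

```python
def shingles(text, n=3):
    """Return the set of overlapping n-word sequences in text (hypothetical helper)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def copied_fraction(article, source, n=3):
    """Fraction of the article's shingles that also appear in the source.

    A value near 1.0 suggests the article is largely copied; a value near
    0.0 suggests it is not. The cutoff for "substantial" is left to the caller.
    """
    article_shingles = shingles(article, n)
    if not article_shingles:
        return 0.0  # too short to compare
    return len(article_shingles & shingles(source, n)) / len(article_shingles)
```

A check like this also shows why the bot warns about false positives: boilerplate phrases shared by unrelated pages raise the score without any real copying.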
I am a supporter, fan and casual editor of Wikipedia. It's definitely one of the grand achievements of the Internet. Simple spam and scams are kept out of Wikipedia, but smart spam is created daily. I see a lot of examples. Check the articles for many semi-famous to non-famous people, companies or products. What belongs on Wikipedia and what doesn't has always been a problem. Wikipedia is much less strict than any traditional encyclopedia, and I think that's a good thing.
First: the incentives to spam Wikipedia remain enormous; Wikipedia has incredible search mojo. The kind of spam he's talking about is of a kind with Wordpress comment spam, but the kind of spam Wikipedia deals with in reality is more sophisticated, and involves entire bogus articles.
Second, flagged revisions are a tool that suppresses edit wars, and are used on (last I checked) a tiny subset of all the pages on WP. That "most important technical change" restricting "anyone from editing" WP hasn't been and never will be deployed sitewide.
Agreed. I wasn't endorsing the link, I just put it there for completeness.
FWIW I think the formula to beat wikipedia is obvious: mirror it, get rid of notability, get rid of anonymity and hire the most prolific editors. It would require a huge investment of course but I'm shocked it hasn't happened yet. I mean you could kill Facebook with the same site.
Good luck, but I don't think you'll be successful with that, because I think notability is an extremely important part of the glue that holds the Wikipedia project together. Without it you not only get a torrent of pointless, trivial content that readers have to sift through to find the real stuff, but you also lose the critical factor that makes the editorial challenge of Wikipedia tractable; without notability, you simply have too many potential articles to fact-check, and no way to fact-check them.
About 15 years ago some friends and I started a company whose technical premise boiled down to taking IRC and replacing the static tree-based "routing" system it has with a real dynamic routing protocol. No more netsplits! Arbitrary topologies! It was so simple! Why would anyone want anything other than a biconnected IRC? It turns out that the moment you lose the static tree property, the whole system goes to hell. All of a sudden, messages aren't (and can't be) reliable, because the network can change out from under them while they're being forwarded. So you figure you'll just build a simple reliability scheme for messages to ride on. Oh, wait: IRC is a group messaging system, and not only is the distributed systems GMS problem not particularly easy to solve in a "realistic" network, but providing reliability on it is essentially the multicast reliability problem, which is itself so annoying that it's part of why IP multicast failed.
My long-winded point is, sometimes something that seems like an obvious weakness of an existing system is actually fundamental to that system's viability.
"torrent of pointless, trivial content that readers have to sift through to find the real stuff"
I don't think people page through online encyclopedias that way. I'm not notable; if I had a wikipedia page, I don't think any page that exists today would link to it. How could that ever bother you? I don't think you'd come across it unless you googled my name and it took you to my wikipedia page, in which case it would seem to be doing some good.
I used to hit the [random] link and know I'd get something interesting.
Now, not so much. I'll get a tiny stub of something programmatically dragged in from some huge database - a town name with maybe some population figure and location; an obscure politician with party affiliation and birthdate.
For people who enjoy gnoming, these kinds of articles are tedious - what's the point of correcting a comma if no-one is likely to see it?
Supporting the random button isn't what I'd call anywhere near the top 10 most important functions an encyclopedia of any kind ought to provide. Serendipity is useful in a library or a bookstore, but when you open a book, the bibliomancy becomes avant garde art at best.
> FWIW I think the formula to beat wikipedia is obvious: mirror it, get rid of notability, get rid of anonymity and hire the most prolific editors.
I would mirror the 100,000 most important articles. I'd then fact check them vigorously. Emphasis would be on truth, not verifiability. I'd keep anonymity, but I'd throw out the Wiki - editing would be done by paid employees from "suggested improvements" made by visitors. I'd pay people to create excellent quality diagrams, and include plenty of them. I'm a fan of translating STEM information into Portuguese, Spanish, and French.
There's no reason the ideas aren't compatible. I'm all for having different editing standards for the top X articles (senior editors can freely edit, edits from plebes have to wait Y hours or be ok'd) and the less notable articles.
If you wanted to run with the FB idea you could even have special privileges for the "confirmed user" for certain parts of their own bio page. If I don't know you and go to your page I see a wiki bio and whatever anyone has added. If we're wiki-friends I see all that plus anything in your wikifriends area (pictures, whatever).
My theory on getting rid of notability is you'd have say Z million people come and make webpages for themselves and their friends and family. They'd learn the edit tools and process. And so you'd get an organic growth effect in both natural users and editors.
And of course the big incentive is that we're cutting good editors in on some adsense sharing, idk base it on some function of how substantive an edit is, how long it lasts and how popular the page is.
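To make that last idea concrete, here is a rough sketch of what "some function of how substantive an edit is, how long it lasts and how popular the page is" could look like. Everything in it is an assumption for illustration: the names `edit_weight` and `split_pool`, the caps of 2000 characters and 30 days, and the choice to simply multiply the three factors.

```python
from collections import defaultdict


def edit_weight(chars_changed, days_survived, page_views,
                max_chars=2000, max_days=30):
    """Score one edit; all thresholds here are made-up illustrative values."""
    substance = min(chars_changed, max_chars) / max_chars    # 0..1
    persistence = min(days_survived, max_days) / max_days    # 0..1
    return substance * persistence * page_views


def split_pool(pool, edits):
    """Divide an ad-revenue pool among editors in proportion to edit weight.

    edits is a list of (editor, chars_changed, days_survived, page_views).
    """
    weights = defaultdict(float)
    for editor, chars, days, views in edits:
        weights[editor] += edit_weight(chars, days, views)
    total = sum(weights.values())
    if total == 0:
        return {}  # nothing survived long enough to be paid
    return {editor: pool * w / total for editor, w in weights.items()}
```

Under this sketch an edit that is reverted immediately (`days_survived` of 0) earns nothing however popular the page is, which is the property you would want against spam.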
Encarta, Britannica and Citizendium have already proved many other approaches and/or management styles substandard.
Getting rid of notability: Are you talking about something like Wikia? Currently the largest network of gaming sites (30M unique views/mo.), but overall progress has been surprisingly slow since the start in 2004. Jimmy Wales as a founder and 11M raised less than a year ago.
I think you can beat wikipedia by focusing on its weaknesses, but anonymity is not one of them, per se (at best it's a weakness as well as a strength).
Notability: a dumb policy that is more harmful than good.
Mod/admin power tripping: this is where anonymous edits might be a weakness because they validate the existence of mods who think they contribute more to the site than they actually do. Maybe some sort of gamification/karma system would be workable.
Weak sourcing constraints: wikipedia's criteria for a good source are basically "some notable entity believes it". A site that invested a significant amount of resources in valuing and even hosting and validating primary sources and more rigorous sourcing of facts would be a tremendous complement to wikipedia.
Clumsy formatting: wiki markup is definitely holding back wikipedia, there are other ways to present data and there are opportunities for competitors to take advantage of better presentation systems (not just prettier or easier but more usable and practical).
Separately, Jimmy Wales reportedly said that 0.7% of Wikipedia's users have made 50% of all Wikipedia edits and 1.8% of users have written more than 72% of all articles, and he was quoted in the New York Times in June as saying “A lot of people think of Wikipedia as being 10 million people, each adding one sentence...But really the vast majority of work is done by this small core community” of about 1,000 Wikipedians.
I believe Aaron Swartz did some statistical analysis on this and found it not to be true: a small number of editors make most of the edits, but the content is written by a very large number of editors.
His analysis was better than mine in that he weighted content by persistence. Still, for what it's worth, I found that when edits were weighted by length, the 1000 most frequent editors dominated.
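The disagreement above comes down to which measure you use, so here is a small sketch of the two simplest ones: the share of edits made by the k most frequent editors, and the share of characters those same editors contributed. The function name `share_of_most_frequent` and the `(editor, chars_added)` input format are assumptions, and this does not attempt the persistence weighting attributed to Swartz.

```python
from collections import Counter


def share_of_most_frequent(edits, k):
    """Return (share of edits, share of characters) for the k most frequent editors.

    edits is a list of (editor, chars_added) pairs.
    """
    counts = Counter()
    chars = Counter()
    for editor, chars_added in edits:
        counts[editor] += 1
        chars[editor] += chars_added
    if not counts:
        return 0.0, 0.0
    top = [editor for editor, _ in counts.most_common(k)]
    by_count = sum(counts[e] for e in top) / sum(counts.values())
    total_chars = sum(chars.values()) or 1  # avoid dividing by zero
    by_length = sum(chars[e] for e in top) / total_chars
    return by_count, by_length


# Invented toy data, not real Wikipedia figures: one editor making many
# tiny fixes and two editors each adding one large block of text.
toy = [("gnome", 5)] * 8 + [("author_a", 400), ("author_b", 600)]
print(share_of_most_frequent(toy, k=1))
```

On the toy data the most frequent editor accounts for 80% of the edits but under 4% of the characters, which shows how the two measures can point in opposite directions. It says nothing about which way the real data goes.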
I know a guy who's a "Wikipedia consultant." He's a longtime editor who now sells his services to companies and people looking to polish their image. I'm glad the practice hasn't spread.
I don't know what this particular person does, but wikipedia isn't zero sum. It's entirely possible for businesses and wikipedia to have mutually beneficial relationships, just as many do with open source software.
On the other hand the Gibraltar thing was an embarrassing abuse of the system.
Like ZeroGravitas says, Wikipedia isn't zero sum. He frames the issue as being paid to make Wikipedia better. It's not clear that what he's doing is wrong, though it does leave a funny taste in my mouth. I suppose it really depends on how much the money biases him.
Wikipedia's open nature helps it. Several years ago I did a "wiki patrol" for new edits. There is an IRC channel and programs and facilities for this. Most of the cleanup I did was of well-meaning people who did not understand Wikipedia's markup format and the like. Some was vandalism. Lots of people do these "patrols". I would not underestimate Wikipedia's ability to deal with spammers. The open nature of it all helps it.
Not only wrong, but Wikipedia usage is thriving in emerging markets, thanks to Wikipedia Zero (modeled on Facebook Zero, which costs mobile users $0 despite the data it consumes). Fascinating, I say...
Goldman's use of ODP as an example is really funny; all the ODP employees left for other projects within Netscape & AOL soon after ODP was acquired. And that's why it went downhill.
Anonymous editing is a real problem for Wikipedia. There are already numerous "edit wars" by anonymous editors - mostly on ideological/political issues, but also celebrities changing the articles about themselves to remove criticism, for example. These abuses/debates will likely continue to get worse. That's why we at phdtree wiki project (http://phdtree.org/) don't allow anonymous editing.
What Wikipedia needs is a good karma system, like the one used by HN.
Had a really interesting conversation the other night about sites that offer consumers an honest ranking of things such as places to eat. When they monetise through businesses that care little for honest rankings and only want a stronger ranking for themselves, there is a conflict of interest. When power is placed completely in the hands of consumers, it corrupts them too: they demand a free meal and threaten to leave a bad comment otherwise, despite receiving good service. I have talked to hotel managers who were threatened with bad comments despite no problem with the service, and who felt they had no choice but to waive the room charge.
The big question here is how you build a site that is honest, provides a good service, can be monetised successfully and cannot be gamed by either businesses or consumers. Any good examples?