How Should We Liberate Knowledge?

Here's an interesting situation at the online academic repository JSTOR:

Last fall and winter, JSTOR experienced a significant misuse of our database. A substantial portion of our publisher partners' content was downloaded in an unauthorized fashion using the network at the Massachusetts Institute of Technology, one of our participating institutions. The content taken was systematically downloaded using an approach designed to avoid detection by our monitoring systems.

The downloaded content included over 4 million articles, book reviews, and other content from our publisher partners' academic journals and other publications; it did not include any personally identifying information about JSTOR users.

We stopped this downloading activity, and the individual responsible, Mr. Swartz, was identified. We secured from Mr. Swartz the content that was taken, and received confirmation that the content was not and would not be used, copied, transferred, or distributed.

The criminal investigation and today's indictment of Mr. Swartz has been directed by the United States Attorney's Office.

That's what JSTOR said. Here's the other side of the story:

Moments ago, Aaron Swartz, former executive director and founder of Demand Progress, was indicted by the US government. As best as we can tell, he is being charged with allegedly downloading too many scholarly journal articles from the Web. The government contends that downloading said articles is actually felony computer hacking and should be punished with time in prison.

"This makes no sense," said Demand Progress Executive Director David Segal; "it's like trying to put someone in jail for allegedly checking too many books out of the library."
"It's even more strange because the alleged victim has settled any claims against Aaron, explained they've suffered no loss or damage, and asked the government not to prosecute," Segal added.

James Jacobs, the Government Documents Librarian at Stanford University, also denounced the arrest: "Aaron's prosecution undermines academic inquiry and democratic principles," Jacobs said. "It's incredible that the government would try to lock someone up for allegedly looking up articles at a library."

So let's step back and look at the underlying issues.

Here's what JSTOR says about itself:

Our mission at JSTOR is supporting scholarly work and access to knowledge around the world. Faculty, teachers, and students at more than 7,000 institutions in 153 countries rely upon us for affordable and in some cases free access to content on JSTOR. Since our founding in 1995, we have digitized the complete back runs of nearly 1,400 academic journals from over 800 publishers. Our ultimate objective is to provide affordable access to scholarly content to anyone who needs it.

Note the emphasis on the word "affordable". Remember that this is scholarly work, not work written for commercial purposes with the intent to generate profits. Much, if not most, of this work is produced by scholars who are funded by you and me through government grants. Thus JSTOR is providing "affordable access" to stuff that we have already paid for. This, of course, is the problem that open access is trying to solve by ensuring that digital copies of publicly-funded academic work are freely available.

It's true that alongside born-digital academic papers, JSTOR also scans and digitises older articles that were previously only available as hard copy. But as other initiatives like Project Gutenberg have shown, you don't need to spend millions of dollars to do that: crowdsourcing is very effective here. Or rather it would be if people were allowed to make the locked-up knowledge available in this way, but of course they aren't.

Although JSTOR is certainly something of a dinosaur in a world moving to open access, it doesn't seem to be driving this prosecution, as its comments above indicate. Instead, the US Attorney's office appears to be behind the moves. It has issued a press release on the case [.pdf], which bears the hyperbolic headline "Alleged Hacker Charged with Stealing over Four Million Documents from MIT Network". That reference to "stealing" was no slip of the word processor, since it goes on to say:

The indictment alleges that Swartz exploited MIT's computer system to steal over four million articles from JSTOR, even though Swartz was not affiliated with MIT as a student, faculty member, or employee. In fact, during these events, Swartz was allegedly a fellow at a Boston-area university, through which he could have accessed JSTOR's services and archive for legitimate research.

United States Attorney Carmen M. Ortiz said, "Stealing is stealing whether you use a computer command or a crowbar, and whether you take documents, data or dollars. It is equally harmful to the victim whether you sell what you have stolen or give it away."

Of course, this is utter nonsense. You can't "steal" digital files (well, not unless you steal the physical medium on which they are stored). You can make unauthorised copies, but that's not theft; it's copyright infringement (indeed, it's odd that this is one thing that Swartz is not being charged with). The US Attorney then compounds this error with the particularly ridiculous comment that "stealing is stealing whether you use a computer command or a crowbar, and whether you take documents, data or dollars": if she really believes this, then she is clearly not competent to lead investigations into this kind of case.

This extraordinarily wrong-headed view, along with the fact that "if convicted on these charges, SWARTZ faces up to 35 years in prison, to be followed by three years of supervised release, restitution, forfeiture and a fine of up to $1 million," suggests to me that the law enforcement authorities are trying to make an example of Swartz, presumably "pour encourager les autres".

For not only was nothing stolen, nothing was even shared. And even if it had been shared, some of the material was in the public domain. The vast majority of the rest was paid for by the public, and therefore arguably belonged to everyone anyway. For those items that were not paid for in this way, independent research has still never shown that sharing them would actually harm the publishers: in fact, there is plenty of research to the contrary (as I've noted before).

So, all in all, this looks to be a completely disproportionate response to the alleged actions – and it has to be said that the description of those actions in the formal indictment [.pdf] makes the whole case look pretty fishy. I predict that there's far more to this story than meets the eye, and that we'll be hearing much more on the subject in due course.

Coincidentally, the issue of free access to publicly-funded research is being discussed in Europe, too:

A public consultation on access to, and preservation of, digital scientific information has been launched by the European Commission on the initiative of European Commission Vice President for the Digital Agenda Neelie Kroes and Commissioner for Research and Innovation, Máire Geoghegan-Quinn. European researchers, engineers and entrepreneurs must have easy and fast access to scientific information, to compete on an equal footing with their counterparts across the world. Modern digital infrastructures can play a key role in facilitating access. However, a number of challenges remain, such as high and rising subscription prices to scientific publications, an ever-growing volume of scientific data, and the need to select, curate and preserve research outputs. Open access, defined as free access to scholarly content over the Internet, can help address this.

Neelie Kroes, who remains the most savvy of the European Commissioners when it comes to digital technologies, is quoted as saying:

"The results of publicly funded research should be circulated as widely as possible as a matter of principle. The broad dissemination of knowledge, within the European Research Area and beyond, is a key driver of progress in research and innovation, and thus for jobs and growth in Europe. Our vision is Open Access to scientific information so that all of us benefit as much as possible from investments in science. To accelerate scientific progress, but also for education, for innovation and for other creative re-use. For the same reason we must preserve scientific records for future generations".

The key questions being posed by the consultation are as follows:

how scientific articles could become more accessible to researchers and society at large

how research data can be made widely available and how it could be re-used

how permanent access to digital content can be ensured and what barriers are preventing the preservation of scientific output

There are various ways to reply, including an online questionnaire, also available as a .pdf. The deadline is 9th September.

Follow me @glynmoody on Twitter.