Library Digitization Archives

March 27, 2007

Digital Video, and a Post-Copyright Era?

Judith Thomas does a masterful job of explaining best practices and challenges in creating a digital motion media collection in this article, Digital Video, the Final Frontier - 1/15/2004 - netConnect. I'm studying digitizing video right now at the iSchool, and this was an optional reading. I found it most inspiring because there's not one mention of copyright or rights or permissions in the entire article. It's simply not on the table. Wow. That's like a dream, isn't it?

Several years ago I was invited by the Library of Congress to participate in a 3-day discussion about digital archiving. The setting was in gorgeous Berkeley, CA at a fabulous hotel. The other invitees were a glittering array, including the Register of Copyrights, Marybeth Peters. I gladly accepted, and I felt honored to be invited. But a curious thing happened at the very beginning of the gathering. All the participants agreed that they would define the copyright issues out of the discussion. I don't know how the Register felt about this, but it made me feel a little under-utilized, to say the least (useless, to put it more bluntly). I was absolutely amazed at the quantity of ideas entertained, the quality of the solutions proffered, the creativity of the group. If copyright had been on the table, the group might just as well have sat around the pool gossiping. It would have been, in effect, a nonstarter.

Since that meeting, I've begun more and more to believe that some things libraries need to do for the future just need to be done without much concern for what the law says today. Anyone whose job it is, or I should say whose mission in life it is, to preserve for posterity simply cannot stand by and watch important pieces of the 20th century just crumble before their eyes because of fear of getting sued, more often than not by someone who couldn't care less what you're doing, or who's actually dead, or whose descendants couldn't care less, etc. etc. etc. etc. etc. etc.

I noticed earlier today a post at Michael Geist's blog about a remarkable speech by Bruce Lehman, former Commissioner of Patents and architect of the DMCA, notably including the anti-circumvention provisions, in which he suggests that we're entering a post-copyright era. He also admits that the DMCA as an approach has failed...

This idea that copyright is becoming irrelevant is actually one of the things that contributed considerably to my decision to get a degree in information studies and refocus, away from copyright. It's just terribly out of sync right now and as much as I hate to admit it, I truly feel that it just has to be ignored in some of its more egregiously out-of-sync aspects. (I'm waiting to be struck dead by a lightning bolt... waiting... waiting...) This is like the moral dilemma of our time (for those of us who think about things like this), like civil disobedience. Defiant preservation, organization, indexing, and access. Will wrong-headed and failed laws go quietly into the night at some point, or do we just turn away and embrace new paradigms created on their ruins, such as Creative Commons licenses and new business models that rely on something other than artificial scarcity to motivate creativity? And will libraries just quietly do what has to be done?

April 5, 2007

Jean-Noel Jeanneney leaves France's Bibliotheque Nationale

I read with interest today that the President of the Bibliotheque Nationale, Jean-Noel Jeanneney, has apparently been forced to resign: Jean-Noel Jeanneney quitte la presidence de la BnF - Tour de Toile du BBF. You might wonder why this seems important to me, unless you know what I'm studying at the iSchool...

But, more generally, it's of interest because Jeanneney is an impassioned critic of all things Google. In fact, in his slim volume, Google and the Myth of Universal Knowledge, he says at one point, something to the effect, "Whatever Google does, we should do the opposite."

His principal criticism was that selectivity and organization should be at the heart of the process of digitization, and of course, Google's goal is to digitize everything and let the users sort it out through search, tags, bookmarks, etc. He also criticizes our reliance on the market to do what he thinks should be done with public money in Europe. At the core of Google's undertaking, and implicitly rejected in France's efforts that so far involve only public domain works, is reliance on fair use to justify digitizing books still in copyright. Being an employee of a Google Library partner, I'm not neutral on the matter, but I must say that the book is very well written and raises good points. Nevertheless, one commenter on the blog where I saw this note about Jeanneney's departure seemed to suggest that there might be a connection between the fact that Google had so far digitized 10 million books and the Bibliotheque Nationale, 100 thousand, and Jeanneney had essentially castigated Google for performing well. While neither of the figures is likely accurate, they get the general gist of the point across.

As always, there's no doubt a lot more to the story than initial reactions suggest, but I wonder whether Jeanneney's departure signals an opening for a new attitude towards mass digitization projects in France. Not coincidentally, I am headed there in 5 weeks to interview several librarians about their views of the future of the library in France. I have both Bibliotheques Nationales on my agenda, as well as 2 university libraries and a municipal library (Lyon). It's an exciting time to be thinking about the future of libraries, and May is a fine month to visit Paris.

April 28, 2007

A Big Week for Copyright; End of First Year of Grad School for Georgia

It was a big week for copyright. Events were reported all over the blogosphere. My friend and fellow grad student, Carlos Ovalle, has a nice roundup on his blog, Copy This Blog, where he reports several unsurprising legal opinions based in the sad music industry war against college students, and in a post a few days earlier, the related decision of Ohio University to ban p2p software on its system.

MPAA's former head, Jack Valenti, died this week. Lawrence Lessig offers an interesting memoir.

And the Stanford Center for Internet and Society posted a note about a decision that determined that downloads were not performances for ASCAP/BMI/SESAC royalty purposes.

Tobe Liebert, one of my favorite law librarians (I have lots of favorite law librarians -- our UT Law School Library is one of the best!), posted a note about Siva Vaidhyanathan's explication of his position regarding the three serious dangers of the Google Book Search Project, well articulated and succinct. His argument raises important questions about a future that includes wildcard projects like Google Book Search. If you have a chance to see Siva in person, don't miss it.

On another note, I am winding down my first full year of grad school. Classes end next week. Papers are due, presentations will be made, files of printed articles will be dumped (recycled, of course). It has been a really amazing experience for me, one that still astonishes me, nearly 9 months into it. I'm registered for the summer session and for the fall. I shift focus next year to research, having been accepted into the Ph.D. program. In the meantime, I head off to France to do a little bit of research, sort of getting my feet wet in the Seine (and the Cote d'Azur). Whatever I conclude based on my little sojourn will be reported through the CIP, perhaps even here on the blog. I'm going to write a paper, though I have to admit, I call it that only reluctantly, because I am planning to put my money where my mouth is -- I'm focusing on the future of the book, and I'm going to place all my research data, analysis, results and predictions in forms that explore that future. So, it will be fun, as well as instructive for me to figure out novel ways to report research in progress. I hope you'll enjoy that exploration too. The future of libraries is affected directly by the future of books so it is in our interests to pay attention to the expansion of the expressiveness we experience in books today.

June 5, 2007

First francophone library signs with Google

I have just returned from a nice 2-week stay in France where I conducted a little research on the French attitude towards the future of libraries in a networked world. In light of the scathing book Jean-Noel Jeanneney, President of the Bibliotheque Nationale, had written earlier (Google and the Myth of Universal Knowledge) criticizing American digitization efforts spearheaded by Google, I wasn't all that sure what I would find. But, I've concluded that the bell curve obtains in France just as it does here in the USA (the long tail doesn't describe everything). French libraries are all over the map with respect to their attitudes towards and adoption of strategies to define their futures.

While I was there, Google announced its first francophone (French-speaking) library partner, the University of Lausanne (Switzerland). In commentary on the French blog, "Under the Duster," it was noted that the Swiss had not spent a cent to digitize their patrimony: Google numerise en Suisse romande - Sous la poussiere. I have no basis to judge the accuracy of such a statement, but it does not surprise me. The effort to digitize cultural history is overwhelmingly huge. It requires support from every sector. It can't be accomplished by governments alone. It can't be accomplished by libraries alone. It can't be accomplished by Google alone. It hasn't taken very long for us to figure this out. I think it's time for us to move beyond critiques of those who are making the effort and start to think about things further down the road, as Don Waters suggested in a recent essay about strategic thinking regarding our efforts to facilitate open access to scholarship. The availability of long-forgotten books online, or at a minimum, the availability of information about them and where they can be obtained, is a dramatic change and we need to start thinking about the implications of this and how to best take advantage of it.

Finally, it's important to note that the European Book Search partners have limited their participation to books in the public domain. This is not a surprise -- no other country in the world has a fair use provision like ours, or, therefore, an opportunity to argue the merits of mass digitization for books still in copyright as a fair use. We are on our own here, completely. Very American.

June 14, 2007

Losing sleep over copyright

I don't often lose sleep over copyright issues anymore. But last night I could not stop thinking about the Copyright Office's new resource for *children.* Please have a look if you haven't already: Taking the Mystery Out of Copyright. There's a text only version if you want to skip the cartoons and the music (assuming you are not 13). This bothers me on so many levels, but I'm only going to address one level here, the most obvious. My experienced, calm, collected voice is telling me to wait a few days before I write this. Ok, at least wait a few days before I publish it. Clearly, I am ignoring that voice. I should at least acknowledge that I'm probably overreacting. I no doubt will feel differently about it after I have thought about it for a while. Maybe I'll write about it again after a few days.

That said, do children really need to know about copyright? Well, I reluctantly must admit that yes, they do. Should they need to know about copyright registration, copyright history, and the role copyright plays in protecting film, music, art and literature? Well, it's not like they need to be protected from this, like it was senseless death, war violence or something cruel and ugly. So, it is commendable that the Library of Congress offers a well-done, straightforward, and fairly neutral informational piece. What would we expect the Library to talk about, other than what it does, which is, in this case, copyright registration? A narrow slice of the copyright pie, to be sure, but again, that's one of the things the Library does that no one else does.

But on the other hand, remember what it was like to be 13? Was registering your copyrights something you were all that concerned about? Should you have been? Have things changed that much with respect to how likely it is that the metaphorical box of things you created during your 13th or 14th year of life needs protection? From what? From becoming part of the stream of creativity (my metaphors are all over the place) from which you yourself borrowed to create?

If I had one opportunity to tell kids about copyright, I suppose I would mention its role in protecting the commercial interests of creators and distributors like the film, music, art and publishing industries, but in the next breath I would appeal to their own sense of how most things we all create are not meant for commercial exploitation, but instead are meant to be shared, reused, remixed and borrowed from. I'd say, "Look inside that box of things you created last year. Let's look at where all your things came from. Let's see how borrowing and modifying and adding your own ideas works in real life, and what we all need to keep that going."

The lesson I would teach is about the fact that *YOU HAVE TO DO SOMETHING* if you want your own creativity to be added to and be a part of a flowing, lively stream, rather than be caught up in a little eddy that goes nowhere. Congress (something here about infinite wisdom) has created a set of rules that, without your doing anything beyond the mere act of creating (tangible things, of course), keeps everything you create in that box, locked away, maybe forever, but at least for, let's see, you're 13? Let's say you'll live to 78, your box of stuff stays locked away for the rest of your life (65 years) plus 70 more years. Yes, in 135 years your box of stuff will possibly join the stream of creativity. If the box is still around then. And somebody finds it. And they know you and only you created it, and when you died. And they know about copyrights. If that doesn't fit your idea of what you want, then YOU HAVE TO DO SOMETHING. You have to let people know that you have something else in mind for your box of stuff. Fade to Creative Commons logo/website.
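For anyone who wants to check the arithmetic above, here is a minimal sketch in Python. It assumes the simple life-plus-70 rule used in this post for an individual author; the function name and its parameters are made up for illustration, and it ignores special cases such as works made for hire or end-of-calendar-year rounding.

```python
def years_until_public_domain(age_now, age_at_death, post_mortem_term=70):
    """Years from today until a work leaves copyright under a life-plus-N rule.

    Hypothetical helper for illustration only: assumes a single individual
    author and the life-plus-70 term described in the post.
    """
    remaining_life = age_at_death - age_now  # years the author still lives
    return remaining_life + post_mortem_term


# The 13-year-old from the example, assumed to live to 78:
print(years_until_public_domain(13, 78))  # 65 + 70 = 135
```

Changing the assumed lifespan shifts the result year for year, which is really the point: the younger the creator, the longer the box stays locked.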

The assumption that everything needs "protection" for 1 1/3 centuries is so out of step with the reality of how we all create and, most importantly, *why* we all create (overwhelmingly, not to make a living from our creations). The serious consequences of being out of step with reality make me very sad, and angry. The waste, the untapped creativity, and the criminalizing of creativity cannot be defended in my opinion. One size does not fit all. Given the enormity of the explosion of creativity enabled by the networked environment, to say nothing of creativity in the real world, the lessons we need to teach are about taking responsibility to do individually what Congress cannot seem to do for us as a nation -- create a copyright that fits our widely divergent needs, rather than one that both stifles us creatively and turns us into criminals (or potential civil litigants -- there's another interesting copyright lesson for kids) if we ignore it. We need to tag our creative works with simple statements that express how we feel about their place in the creative stream. I would recommend Creative Commons licenses for many reasons, but any statement about sharing is better than doing nothing and thereby consigning your work to copyright's centuries-long holding bin, or perhaps appropriately named, wastebasket.

July 16, 2007

Moving Images: Digitization for Access

Peter Brantley, director of the Digital Library Federation, posts at his personal blog, shimenawa, but recently has begun posting at O'Reilly Radar. Today he posted, "Moving Images: Digitization for Access," which I found quite interesting. The group he describes, Lot 49, challenges many current practices in archive and preservation culture, some very, very old, some very new. That Lot 49 could actually proactively change these practices to achieve a public good seems a long shot, but one never knows unless one tries.

The group "accept[s] as a key principle that access is key to the survival of archives, and digitization the best enabler of access." Brantley goes on to summarize seven other principles that will guide Lot 49's efforts:

1. Public access online to publicly owned resources will remain free.
2. Partnerships shall support the joint goals of increased access and enhanced preservation of archival materials.
3. Our partnerships will be non-exclusive.
4. Our partners will provide our organizations -- without charge -- a complete set of the digital copies produced by the partnership, and the metadata required to make use of them.
5. Ultimately, our organizations will hold unrestricted ownership of these digital copies and metadata.
6. Our partnerships will balance the interests of the public with the financial investment of our partners.
7. We seek to protect and enhance our organizations' interests, while respecting the interests of our users, our community, and our partners.

The post goes on to identify other priorities as getting a better handle on what stores of moving images archives and libraries possess, and taking a more aggressive position to protect the public interest in these materials in negotiations with commercial partners, which reflects very closely Brantley's and others' criticisms of the Google Library partners' (UT included) efforts in this regard.

There is a brief reference to the legal limitations on such a project:

"... it is our hope that we can find ways to maximize access to moving image collections to the greatest extent that the law and our means permit."

Clearly, these legal limits are not insignificant, especially given the overall key principle that providing access is the best way to preserve. So I wonder what the group thinks it will be able to do with the undoubtedly huge number of moving images for which permission will never be able to be obtained, either because the owner will decide that maybe there's money to be made on the movie and so will want to limit access, or because no owner can be identified (orphan works issues). Even identifying what is in the public domain will be a monumental task. I would be very encouraged to hear that among the cultural practices that the group hopes to change is the oftentimes extreme cautiousness of conservative institutions in the face of ambiguities like those presented by orphan works. I note that the orphan works legislation so optimistically heralded last session wasn't even introduced this session and with an election next year, it probably won't be introduced then either. It could be another decade before enlightened self-interest finally brings content owners around on the importance of freeing this kind of content from its near-century of forced obscurity. In the meantime, more courage on the part of archives and libraries to provide access to identified orphan works, regardless of medium, would be welcomed.

July 22, 2007

Siva Vaidhyanathan's fellowship at the Institute for the Future of the Book

One of my favorite blogs is the Institute for the Future of the Book, if:book as it's called, which I read every time it's updated. So I learned last week that Siva Vaidhyanathan would be joining the Institute as its first fellow. Siva is also moving from his current home in NYC to the University of Virginia. You can read the institute's note about this as well as Siva's notes about a keynote address he gave recently where he outlines (and, actually, people blogging his speech in real time outline for us) his evolving criticisms about the Google Book Search project. For earlier expositions of Siva's thoughts on these matters, you can review any number of web postings, among them an April post from the ACRLog site, Siva Vaidhyanathan questions Google Book Search. The comments are worth a read also.

Siva is working on a book on this subject, and therein lies an intriguing opportunity. The Institute for the Future of the Book hosts several experimental new forms of networked expression (new books). The if:book note indicates,

"we will be a launching a new website devoted to Siva's latest book project, The Googlization of Everything, an examination of Google's disruptive effects on culture, commerce and community."

Hopefully this means that Siva's ideas will be presented in a way that those of us in the community who do not fully understand his criticisms will have a chance to question and engage him more fully in a discussion of his concerns than we usually can in the hurried conversations that we may have at the close of his excellent speeches. I certainly do look forward to that possibility. I've read much that he's written about his concerns over the last 2 years, and I still am not convinced that he's entirely right about this. When his book site launches, I'll post a note here, and I would urge Collectanea readers to include the book site in your rss feeds. It ought to be a very interesting and active discussion forum.

Ironically, if:book posted just last Wednesday a sort of counterpoint to Siva's concern that Google "controls too much knowledge," noting that the Internet Archive and Open Content Alliance had launched a demo version of Open Library,

"a grand project that aims to build a universally accessible and publicly editable directory of all books: one wiki page per book, integrating publisher and library catalogs, metadata, reader reviews, links to retailers and relevant Web content, and a menu of editions in multiple formats, both digital and print."

Additionally, Mike Madison, at madisonian.net, in commenting upon Siva's concerns, says,

"One reason I have been less skeptical of Google than Siva (among others) is my confidence that Google — while hardly a savior, and deserving scrutiny — isn’t the end game."

One final quote from Ben Vershbow about the Open Library project, because this is such an exciting idea and I hope you'll go read the entire post:

"Building an open source library catalog is a mammoth undertaking and will rely on millions of hours of volunteer labor, and like Wikipedia it has its fair share of built-in contradictions. Jessamyn West of librarian.net put it succinctly:

"It’s a weird juxtaposition, the idea of authority and the idea of a collaborative project that anyone can work on and modify."

But the only realistic alternative may well be the library that Google is building, a proprietary database full of low-quality digital copies, a semi-accessible public domain prohibitively difficult to use or repurpose outside the Google reading room, a balkanized landscape of partner libraries and institutions left in its wake, each clutching their small slice of the digitized pie while the whole belongs only to Google, all of it geared ultimately not to readers, researchers and citizens but to consumers. Construed more broadly to include not just books but web pages, videos, images, maps etc., the Google library is a place built by us but not owned by us. We create and upload much of the content, we hand-make the links and run the search queries that program the Google brain. But all of this is captured and funneled into Google dollars and AdSense. If passive labor can build something so powerful, what might active, voluntary labor be able to achieve? Open Library aims to find out."

Nice gig, Siva! Congratulations!

October 26, 2007

Publishing trade association issues orphan works "rules"

A consortium of publishers announced this week that it had agreed upon a safe harbor for users of orphan works. The press release was reported widely (see, for example, the Law Librarian's Blog). Although the press release did not include a link to the actual safe harbor rules, they were easy to find on the Websites of the participating publishers. I read them and thought to myself as I did, that they were similar in some ways to the legislation that failed to pass last year here in the U.S. They were much simpler overall, leaving out many of the refinements that the bill had, such as rules for nonprofits that allowed take-down in lieu of payment of a royalty and continued use. After all, these publishers are not proposing law, so they don't have to consider the needs of all stakeholders. What is it exactly that they are proposing, or in fact, is this a proposal at all, or a done deal with users of their works?

These publishers are pursuing a strategy that is becoming more common these days. Rather than attempt to amend copyright law to address the horrible situation we have gotten ourselves into with our century-long terms, broad and deep rights, narrowly tailored (in some cases to the point of uselessness) exemptions, no easy way to opt out of it all, or to find owners, they advocate "letting the market take care of it," one publisher at a time. It might sound daunting, especially to someone who wants to use orphan works (think of all the questions you have to ask yourself about all the different publishers' different standards, and which publishers have no agreements with "the public" at all, etc.), but this is more or less the strategy Lessig pursued when he created the Creative Commons after he recognized that there would never be a legislative or court-imposed resolution to the problems created by repeated lengthening of the copyright term. Both of these actions (Lessig's and the publishers') evidence a recognition that relying on lawmakers and courts to "fix" the problems with copyright is not going to work in some cases. So we turn to contract instead. Lessig might have thought that fixing outrageously long terms and the over-protective scope of copyright, one creator at a time, would be a daunting task, but it was the only thing that showed any promise at all of ever working. And it has worked -- quite well. To be fair, I don't think he's given up entirely on law, but, then again, perhaps he has.

Anyway, for these publishers, it's a plan. Their deal goes something like this: "If you use a work that you think is an orphan, but it turns out the work belongs to one of us and we figure that out, we promise not to take your first born child; rather, we'll just charge you what we would have charged you if you'd come to us in the first place. In return for this forbearance on our parts, we expect that you'll diligently search for us, and here's what we think a diligent search includes:"

*** in virtually all cases searches and reviews must be conducted of these kinds of resources identified generically as:
• Published indexes of published material relevant for the publication type and subject matter;
• Indexes and catalogs from library holdings and collections;
• Sources that identify changes in ownership of publishing houses and publications (see below comment on imprints) including from local reprographic rights organizations;
• Biographical resources for authors;
• Searches of recent relevant literature to determine if the citation to the underlying work has been updated by other users or authors;
• Relevant business or personal directories or search engine searches of businesses or persons; and
• Sources on the history of relevant publishing houses or scientific, technical or medical disciplines.
Additionally, where the user can identify a prior publisher that appears to be out of business, the list of imprints available from this [link] should be consulted immediately prior to each use.

The [link] referenced above is not a live link, but it is reported to be "a list of journal publisher imprints that the associations have compiled."

So what are we to make of this deal we're being offered (and the strategy in general)? I must assume that the publishers know what they are talking about in their bulleted list of things we have to do, so arguably publishers will not have a difficult time figuring out what a reasonable search involves. But me? I am clueless. The only thing I recognize in the long list is the library catalog (but which catalog?). I'm sure the publishers all sat around together and agreed that they could handle this. I wonder if they had someone like me at their table? Or librarians. I asked my friend, Lexie (a librarian) if she knew what the bullets were about. She hasn't gotten back to me yet, but she will.

In the meantime, I invite you to think about what these requirements mean. Examples would be helpful. I'll try to suspend judgment until I've gotten a better idea of what's involved here, but I'm pretty sure this "reasonably diligent" search requirement is not going to light a fire under very many potential users of orphan works. Because, it's not just the "what we have to do" part of the bargain that looks like it might not be such a good deal, but the other side, the "and here's what we'll give you in return" part isn't looking so hot either.

For commercial uses, the reasonable royalty is probably fine. But for nonprofit libraries, archives, museums, etc., no. If we did our reasonable search and couldn't find you, and you surface at some point, we need to be able to oblige your desire to send your work back into the dark for you, but not to pay you.

The other problem I mentioned earlier is that we now know what a reasonable search looks like to these guys:

American Chemical Society
American Institute of Physics
BMJ Publishing Group Ltd
Börsenverein des Deutschen Buchhandels e.V.
Elsevier
Institute of Physics
John Wiley & Sons (including Blackwell)
Oxford University Press journals
Portland Press Limited
Royal Society of Chemistry
SAGE Publications
Springer Science+Business Media
Taylor & Francis

I wonder which other publishers are going to sign on; which ones will say nothing; which ones will come up with different standards. And how that landscape will or *will not* encourage the responsible use of orphan works.

Well, that's it for now. We all need to think about this. Orphan works are one of the biggest challenges we face today. These are works that are destined never to see the light of digital day unless we find a way to get them online while making reasonable efforts to protect the interests of their owners. The time when obscurity was the only option for non-economically viable works is over. We need to find ways to get on with it. Are these publishers on to something, or are they living in a dream world where all potential users have the kinds of knowledge and resources they do to dig deeply into the history of everyone who ever wrote something that's orphaned today? My really cynical side thinks that maybe that's the idea -- only other publishers will be able to take advantage of this deal, which would make it amount to no more than professional courtesy.

November 7, 2007

Lessig's, How creativity is being strangled by the law [video]

Lessig gave a talk about remix culture back in March at TED: How Creativity is Being Strangled by the Law. His talk was just posted this month. For anyone who has seen him give a talk in the last 1 - 2 years, this won't be new, but it's very streamlined and very succinct. The video only runs about 18 minutes and it's excellent -- watch this!

Lessig emphasizes the importance of competition, that "more free" can compete with "less free," that artists' choice (to distribute differently, for example, to make their own works more freely available) is the key to defeating monopoly, and that laws that criminalize our children's creativity are corrosive -- and we can do better.

I have been developing an argument that touches on these same themes at Mass digitization ~ changing copyright law and policy, and in fact I had just posted this new segment last weekend that talks about how the sheer availability of so much good free content online inevitably puts pressure on even Hollywood and the music industries to stop making it hard for people to get to their content (DRM and subscription barriers, among others). Check it out.

November 25, 2007

Mass Digitization blogging project completed

After 6 weeks of drafting, posting, tracking blog statistics, and weekly writing in a journal about the experience, I have just completed my blogging experiment at Mass digitization ~ Changing copyright law and policy, by posting the Conclusion today. Here's the first paragraph:

The story of mass digitization’s effect on copyright law and policy is the story of confronting and eventually calming fears. Sometimes the only way to calm fears is just to stand up, stride towards the light switch, and show that there’s nothing to be afraid of. Turn on the light. Look under the bed. Open the closet door. See? There’s nothing there. Didn’t Franklin Roosevelt say something about this?


Since I announced the start of the experiment here on Collectanea, I thought I would announce its conclusion as well. If you haven't visited yet, or if you visited early in the drafting process, you might like to visit again to read the entire draft (7 fairly short sections). Be sure to check out the Project Resources page. It has links to all the online materials referred to in the draft, and other materials that support or illustrate the argument.

It has been a very interesting experience to draft in blog-style. My next step will be to polish the draft and recast it in journal style. I will be able to compare the two drafts and perhaps say something useful about how the styles differ. I also have scads of data about daily page views, time on the pages, and how many pages were viewed per visit. It's amazing what Google Analytics can tell you about your blog. If it weren't for Google Analytics in fact (and other blog statistics programs), the story we would relate about our experiences blogging would be far removed from the truth because without stats, we only know readers are there if they comment. Hardly *anyone* comments though. The comment rate on Mass Digitization was roughly 0.2% -- that's point two percent, not two percent. So, for 1000 pages viewed, the blog received 2 comments. This rate is consistent with rates I've read in broad studies of blogs. Of course, there are exceptions, but most of us are not really visibly building a community of commenters.

But we are reaching people. Those 1000 pages viewed represent about 500+ people who stopped by, even if only for a few minutes. So, the blog entries did get viewed in whole or in part by many folks who might not read the article in its polished journal-style form. It is an interesting hypothesis, how blogs affect scholarship. I will be posting my paper on that subject at the Crash Course when I complete the paper in about 2 weeks. And Mass Digitization will be published on CIP's Website in the spring.

If you are one of those 500+ people, THANK YOU! It is very nice to know you are there.

December 12, 2007

free*the*books

Well, it's official: The University of Texas at Austin Libraries has launched our documentary blog for our public domain and orphan works project, free*the*books. We invite you to view and post comments! Our new blog is focused on our research about international copyright laws that control the use and distribution of digitized books online.

As a Google Library Partner, UT Libraries will digitize over a million books from its rich collections within the next six years. Digitization of 800,000 books in the Benson Latin American Collection began in June of this year, followed by this companion project to develop an authoritative process for determining the copyright status of books published in various Latin American countries and to identify foreign works in the public domain.

We have found little guidance to help us reliably identify which of our books are already in the public domain so we are piloting a project to develop new tools for ourselves and for anyone who wants to tackle these difficult public domain problems. We will document our process, our progress and our results on the blog's pages along with links to web resources we find useful.

The initial pages of the blog include online resources to determine critical author birth and death data, prototypes of legal evidence tables and draft guidelines by which books, wherever published, may be determined to be in the public domain.

We will be adding features, more pages and new posts to the blog on a regular basis and from time to time will also have guest contributors to add variety and fresh perspectives. We invite suggestions and comments from other Google Library Partners and anyone undertaking similar or related projects.

Email us at freethebooks@gmail.com or IM us at our Meebo widget in the sidebar of the blog. We are here; we are building an evidence base and we are looking for virtual partners!

February 25, 2008

See you in DC!

Last year I was not able to attend the CIP's annual conference, but I've caught quite a few of them over the years. This one is special for me, however, because as the Center's Virtual Scholar, I have had the honor of participating in the planning. Kim Bonner, the Center's Executive Director, is at the helm of the planning process and has put together a great lineup of events and speakers. At the top of the list is Jamie Boyle, Duke law professor and advocate of the public domain. I am looking forward to meeting him and hearing what he has to say.

I, too, am speaking at the conference. I plan to discuss an idea I am working on as a possible dissertation topic that fits well with this year's CIP theme: Copyright Monopoly.

The lineup is widely diverse, including speakers representing content industries (Copyright Alliance, CCC), law professors and practicing lawyers, librarians and lawyer librarians, and intermediaries like OCLC and Google, among others.

The conference also features a new format for day three -- a series of roundtable discussion groups focused on what you can take back home with you to put what you have learned into practice.

Hope to see you there!

March 30, 2008

Orphan Works legislation: Round two

Congress reportedly will try to pass orphan works legislation again this session, introducing a bill as early as this week. After its March 13 hearing, at which 6 interested parties presented testimony (including the Register of Copyrights, Marybeth Peters, and representatives of the 2006 bill's most vehement opponents, freelance photographers), the stage appears set for another try.

Molly Kleinman's take is positive; Tom Richmond's is hostile. Reportedly, the photographers have gotten concessions and are supposedly onboard, but Tom's blog post certainly doesn't suggest that it's a done deal.

I read Marybeth Peters' testimony (see link above) and she talks about some of the changes from the last bill. The one that concerns me the most is the idea that the industries will define a reasonable search. I reviewed one such proposed definition, and found it daunting. It was clearly designed with other publishers in mind: their corporate resources, and their likely intent to profit from the use of the work, make them willing to spend considerable time and money chasing down every rabbit track. This does not seem like a good idea for nonprofit entities making nonprofit uses. As I commented at the time, the proposal suggested that all the rigor of adopting real human orphans should be applied to making even nonprofit uses of abandoned copyrighted works.

Well, let's prepare ourselves. It's either going to work or it's not, but if it doesn't, the problem of orphan works is not going away.

June 11, 2008

Scattering thought across the Web

It's funny how things connect up. Since I returned home from the CIP annual Symposium on UMUC's campus, I've been reading the copyright news with little enthusiasm. I see important things going on (like the brewing ACTA storm), but I am not inspired to comment. I just seem to bounce from one discouraging topic to another. Then this morning I was clearing out some email and noticed a message with a link to an Atlantic article, The Atlantic Online | July/August 2008 | Is Google Making Us Stupid? | Nicholas Carr. The article is about the way technology can affect the actual wiring of our brains. It is fascinating reading. I really enjoyed it and I'm sure you will too. About halfway through, I came across this paragraph:

When the Net absorbs a medium, that medium is re-created in the Net's image. It injects the medium's content with hyperlinks, blinking ads, and other digital gewgaws, and it surrounds the content with the content of all the other media it has absorbed. A new e-mail message, for instance, may announce its arrival as we're glancing over the latest headlines at a newspaper's site. The result is to scatter our attention and diffuse our concentration.

Google Book Search is about absorbing the medium of books into the Internet. I was just talking with the new Deputy General Counsel at UT System Monday, Dan Sharphorn, about the future of books (one of my favorite topics) and especially, how that future will be funded if books are available for free on the Internet (that is, digital copies are free, but people pay for something else, such as a print copy, or maybe a subscription to a book service (like music subscriptions), or who knows what). That part of the discussion is very much about the subject of the talk I just gave at the CIP Symposium (Mass Digitization's Effect on Copyright Law, Policy and Practice), about the economics of copyright. But an equally interesting part of the discussion is recognizing that when you think about the future, it's not the assumptions about what new things (like new business models) will be there that are the hardest. Rather, the really hard part or tricky part is examining your assumptions about old things, specifically what old things won't be there.

If (well, when) the Web absorbs the medium of books, books are not going to stay the same. The idea of an e-book is pretty limited (and even that is overwhelming for some of our publishing friends). The idea of an e-book reader is pretty limited. If you eliminate the idea of a book as we know it from the possibilities for communicating with others, and then try to imagine how you would weave a story if all you had were the Web, just try that for a momentary thought experiment... How would you tell a story? And how would you relate the results of research if all you had were the Web and no idea about a thing called a journal article? (And don't just "invent" the journal article and the book all over again -- that's not what the experiment is about!)

I have been thinking about this in the context of expressing whatever research I do for my dissertation. I can't really think in terms of writing a formal 5 chapter paper thing that resides between two harder paper things called covers. On the other hand, I am very inspired thinking about how to make my research a part of the conversation on the Web, how to take advantage of what the tools offer, what the possibilities present. At some point, if I really want the PhD, all indications are that the profession (information studies) will require me to cull some small part of that and sandwich it between those covers. I am spending the summer thinking long and hard about that.

And what of copyright? How else might we encourage creativity if we just put aside entirely the idea that we "need" government intervention to encourage it in a world with friction-less world-wide distribution, where each of us helps to pay for the distribution system by our purchases of computers, software and Internet connectivity? James Boyle gave our opening keynote at the CIP Symposium, and enumerated and evaluated five criticisms of copyright law as it exists now, all turning on how badly it "fits" the Web 2.0 world. Keeping in mind how the Internet is changing the media it absorbs, is copyright likely to fit better in 10-20 years or much, much worse?

The Atlantic article ends on a sort of cautionary note: "as we come to rely on computers to mediate our understanding of the world, it is our own intelligence that flattens into artificial intelligence." Author Nicholas Carr is trying to see the future of our minds in a world dominated by the ideas that are shaping the Internet experience, in particular, Google's ideas of science applied to efficient information organization. A very scary undertaking, seeing into the future, but one that we've never been able to resist.

October 31, 2008

Google Book Search -- and Buy

So, at last, the cards are laid on the table and we see what everyone's holding. And guess who's got the winning hand! No surprise there. Google, by a landslide. (Whoops, my subconscious hopes for election day slipping in there...)

It is absolutely fascinating to finally get to see the musings begin, musings about what this major business deal means for the future: the future of publishing, the future of the book, the future of Google, the future of libraries, the future of education. Well, let me rephrase that: What the major business deal *could* mean for all of the above, and more. Oh, that is the fun part. Imagining the possibilities. Imagining the potential. I'm an optimist and a true believer in the triumph of a good idea, no, a great idea.

So, I want to point you to a couple of commentators that I think are especially exciting, illuminating, thoughtful. I have by no means scoured the blogosphere; rather, these are my heroes, my guideposts, the people I trust to present a point of view that adds value to the discussion:

Library Journal, quoting both blogs below plus several others; Vaidhyanathan's Googlization of Everything Blog; and Larry Lessig's Blog

And my own thoughts on and feelings about the deal are a combination of heartbreak, exhilaration, relief, pride, thankfulness, and gratitude to the libraries who worked so hard to make the deal a better one for the public interest. So it's finally out in the open and those who have been agonizing over it for up to two years can now be joined by the many, many others who are eager to begin to think through, together, what has changed, for whom, how, and what it means.

Heartbreak: It hit me really, really hard to realize that Google utilized fair use strategically to bring the publishers and authors to a deal. My heart was in strengthening fair use. It has been for a long, long time. I felt betrayed, really hurt. But damn it, Google was right. It is right. This deal is way better for everyone, more value, more possibility, more of everything. For fair use to cover digitizing for indexing would have been nice, but it would not have given us this (and there was the chance Google could have lost, though I firmly believed Google would have won). Maybe we could have had both. A S.Ct. win for Google might also have led to a deal, but at much greater expense, much later. Google clearly felt it wasn't worth it, strategically, to add that piece to the picture. What Google did, worked. I got over it.

Exhilaration: From my first reading of the deal, I saw amazing possibilities that just inspired me to no end (after the shock wore off, that is). I was in a semester in my PhD studies where I was trying to generate ideas for a dissertation topic and this deal just spun out possibilities like a tornado. But I couldn't talk about any of them with anyone. What a hellish place that was. The announcement of the settlement dragged on and on and on. The date was always a moving target. Eventually I stopped thinking about it all. I just gave up and moved on. But it is *so* gratifying to see such smart minds beginning to examine the same little gems of possibility, and now there will be lots of people to talk to about it, lots of research projects, and lots of thinking about the future of it all. Is that not absolutely exhilarating?

Relief: Thank God the NDA (nondisclosure agreement) is finished. I'll never sign one again. You get to know incredible things, be a part of incredible things, but you can't talk to anyone about it. I hate that.

Pride: I got to be a part of, a teeny, tiny, eensy, weensy part of, an unbelievably complex (way too complex for me) unfolding of a new way to share knowledge, the knowledge that is out there but that has been forgotten, or soon would be forgotten, if physical books on physical shelves were the only option we had for keeping it alive and integrated into our social and cultural lives. I got to react and say what I liked and didn't like. At least a few people listened. Maybe I made some difference. Maybe not much, maybe not any. But it was really wonderful to be there. (Cf. the paragraph above on Relief -- "cf." is legalese for "compare," here for a contrast, or contradiction -- where I say pretty much that it wasn't worth the agony of the nondisclosure agreement. I guess I'm torn about that.)

Thankfulness: I decided to move on with my studies, as I mentioned above. I am thankful that this deal is finally out on the table and it will become what it becomes (not, what it could be, but what it will be).

Gratitude: I know first-hand that it was extremely difficult for the libraries who put tremendous effort into making the deal better reflect the public interest. I was only involved for 10 months. Harvard, UC, Stanford and Michigan were involved for almost 2 years. Virginia got involved only a few months ago, but pitched right in and went to work. Others followed over the summer and early fall. It was grueling to receive those drafts, repeatedly, to pore over them, analyze them, pushing here, prodding there, gaining concessions from the publishers/authors (never easily, of course), gaining concessions from Google. Those folks worked tirelessly to imbue the deal with public benefit. In the end, not all were satisfied with the degree to which the deal does in fact benefit the public, but they had done the absolute best they possibly could. Everyone anticipates criticism of the deal in this regard, as there was before: did libraries sell themselves short? I frankly don't think it is possible to fairly critique their effort without knowing what they were up against, how tirelessly they worked, how little the publishers and authors ever appeared to appreciate how critical their collections are to the dollars the publishers and authors now expect to make.

If one takes it as a given that this is a good thing (and a realistic, as opposed to idealistic and unrealistic way to get from here to there), libraries are not sitting at the head of the bargaining table, and they are not going to be able to get everything they wanted, or perhaps even much of what they wanted. But they sure put their all into it. It's not possible to walk a mile in their shoes. The walk is over. But I do hope that those who may be unhappy about the shape of the deal for the public (outside the obvious benefit to the public of discoverability, readability and the ability to buy "lost" books) won't be too quick to assume that any library could have done better. If the criticism is that none of us should have been involved at all, well, that's simply a non-starter. Libraries are not sitting the revolution out or trying to go it alone. Partnering is simply a fact of our lives. It always has been and always will be. We don't exist in a vacuum.

I hope the deal gets approved and moves on to implementation. It's exciting. I want it to succeed. It puts lots of feet firmly on the path. Who knows where that path leads? And boy does that make me smile.

Next time: orphan works, the sequel. Oddly, at the same time the publishers and authors were negotiating this deal with Google that structures access to orphan works in a particular way, they were also dealing with the Congressional effort to structure it entirely differently. What was up with that?

November 1, 2008

Google Book Search and orphan works

In Google Book Settlement, Business Trumps Ideals, reports Juan Perez in this insightful business article in PC World. Here's the quote that sums up the deal's novel approach to orphan works:

Of the 7 million books Google has scanned, 1 million are in full preview mode as part of formal publisher agreements. Another 1 million are public domain works. Most of the other 5 million aren't in print or commercially available. Google today can only show snippets of their text. The agreement opens up those books for broader preview and potential paid access via individual purchase or institutional subscriptions.
"Together, we're igniting a new market for these books that have been held in libraries but not available commercially," Google's Smith said.

So, what's so new? Everything.

This isn't the Congressional approach to problem solving (shove the parties into a room and lock the door until they have reached an agreement -- and may the strongest interest obliterate the weaker and we'll call it a compromise in the public interest). This is the publishers' and Google's no-nonsense business approach: "Hey, let's just start selling all the books and if there's money to be made, the owners will either show up to claim it, or the money will lie there for 5 years while we give everyone time to wake up and smell the coffee. At the end of 5 years, we'll pretty much know what's orphan and what's not. What's not to like?"

At first I was appalled. Especially because the settlement terms provided that the information about who claimed what was going to be kept secret between Google and the publishers/authors (i.e., the Registry). And equally bad, if no one came forward to claim a book as copyright owner, essentially the Registry would keep the money. There are provisions for the Registry to use it for x, y and z, and *if* any is left, it goes to a reading-oriented charity or some such. But I'm not thinking there's going to be any left... What do you think?

Further, Google clearly understood and accepted that this plan was based on an idea I found repugnant: if orphan works don't have owners, by definition, then why is it that the Registry should keep the money that comes in for books that ultimately no one claims? The publishers and authors just don't see orphans as really belonging to everyone in the absence of an owner. They see them as belonging to all the other authors and publishers, but not the public. That really rubbed me the wrong way. After all, it's not the publishers and authors who have collected these books, maintained them, preserved them, and are now making it POSSIBLE for anyone to even have potential to find them and buy them by partnering with Google to make them a part of Book Search. Where do they get off claiming that they are entitled to keep unearned, undeserved revenues to the exclusion of everyone else in the world?

"Ah, Georgia, uh, this is a rather innovative and practical approach to orphan works, probably better than anyone has come up with. Come down off the ceiling and think it through," said Alex. Well not in those terms. He was just honest and straightforward (as he always is) and explained that a deal with publishers and authors that started from the premise I favored (that orphans don't belong to anyone so if they generate revenue, it should go back to those who paid when it's clear the work is orphan) was simply not possible. So Google started where the publishers were willing to start and worked for a good outcome, the practical effect of the proposal on availability of orphans, and ultimately availability of information about which ones *were* orphans. Google focused on the fifth of those five years.

That's why the secrecy thing had to be fixed. And it was fixed, but in my opinion, it's still not as good as it needs to be. I'm happy that in five years (from the approval of the settlement and implementation of the business model) there will (we take on faith) be some sort of way to pull together which books have not been claimed and more or less know what's orphaned of those works that were published in the 20th century. But the process by which a book is claimed needs to be transparent. If the public will not know whether claimants meet rigorous or absurdly simple criteria for proving their claims, confidence in the outcome of the process will fail. This has the potential to be very powerful -- or a joke. Maybe the court won't accept this aspect of the deal unless the transparency of the process through which claimants come forward and their claims are vetted improves. Imagine if the process of registering a copyright at the Copyright Office were secret and only the result, that a copyright was registered, were available. No actual registration, no basis for disputing whether a claim is valid.

Many people anticipate a slew of murky claims to be disputed by various claimants (where, for example, no one is sure whether rights reverted, or sales of assets were not accompanied by clear copyright titles, etc.), but the whole idea of orphan works is that there's no one around to claim the work. This could make spurious claims easy to perpetrate because of the likelihood that there's no one to take you to task for fraudulently claiming. This worries me.

I want this process to work. I think it has a much better chance of working than that piece of, uh, than that piece of legislation that nearly passed earlier this fall. It doesn't give us an answer today and it *only* deals with books, so it's not a comprehensive solution, but it might serve as an example of what works, assuming it does work. But libraries can still do their own research on individual titles that they think may be orphans while we wait for this deal's market incentives to do their job, and for it to become clear that transparency is in the owners' best interests as well as the public's.

For example, I believe that the OCLC's Copyright Evidence Registry is just as important today as it was 5 days ago before Google announced this deal. Although the publisher/author Registry has potential to be definitive, there will be need for multiple sources of information about the copyright status of works until the publisher/author Registry earns its keep. No source that wants to be definitive can be so if it can't be trusted. In the absence of trust, we will absolutely need to view it as just one source of information, to be accumulated with other, hopefully more trustworthy sources, and then make our own decisions, based on our own risk tolerance levels, about what we're comfortable treating as orphan and what we're not.

Speculation is fun. But this deal offers a real living, breathing experiment for bringing orphan works to a new audience, and for bringing information about what works are orphans to light as well. The settlement is not written in stone. I know from working with Google as a Book Search Partner that Google doesn't work at the level of its contractual commitments. It sees those commitments as starting points and works up from there. If there are aspects of the settlement that threaten its value, they will be addressed. I think the transparency of the Registry process and outcomes is one of those elements.

February 10, 2010

No Fair Use For E-Reserves Or Online Courses?

Many of you, like me, have been watching the publishers' (plaintiffs) lawsuit against Georgia State University (GSU) concerning the amount of copyrighted material posted in the University's electronic reserves and online course management systems, pursuant to fair use. The burning question for me is how much was too much for the publishers?

Before the lawsuit was filed, materials on GSU's e-reserves could be viewed by anyone, enabling the publisher plaintiffs to acquire a great deal of data. Fifteen of their works were selected for inclusion in their complaint as illustrative of uses far in excess of fair use and, thus, requiring permission. The complaint names the work and the amount (pages or chapters) posted in e-reserves. [No specific examples were given for works used in online courses since those courses were access-protected].

What we all want to know, however, and what I was unable to locate in any coverage of this lawsuit, is how much of a work is so far beyond fair that it would trigger a lawsuit? In meaningful terms, like a percentage. Number of pages used is less than useful without knowing the total number of pages in the work. Oddly enough, the complaint does not supply that information at all. From the complaint, there is no way of evaluating what publishers consider fair or how far apart publishers and universities might be on this issue.

OK. You can figure it out yourself; it just takes more time. Dividing the amounts stated in the complaint by the total number of pages in each work, I obtained percentages for all but one older work.

[Note: Since I was working through Amazon, occasionally I only knew the number of pages up to the last chapter. This means that any errors favor the publishers, since I was using a total that was actually less than the true total.]
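For anyone who wants to repeat the exercise, here is a minimal sketch in Python of the calculation described above. The function name and the page counts are hypothetical, chosen only for illustration; they are not taken from the complaint. The sketch shows why an undercounted total page count pushes the percentage up, which is the sense in which any error favors the publishers.

```python
def percent_used(pages_posted: int, total_pages: int) -> float:
    """Share of a work posted, as a percentage rounded to one decimal place."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return round(100 * pages_posted / total_pages, 1)


# Hypothetical numbers for illustration only (not from the complaint).
pages_posted = 30
known_total = 240  # page count visible only up to the last chapter
true_total = 260   # actual length, including pages after the last chapter

with_known_total = percent_used(pages_posted, known_total)
with_true_total = percent_used(pages_posted, true_total)

# A smaller denominator yields a larger percentage, so using the
# undercounted total overstates how much of the work was posted.
print(with_known_total, with_true_total)
```

With these made-up figures, the undercounted total makes the use look about a point larger than it really is.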

Are you sitting down? Here are the percentages used by GSU E-Reserves that resulted in a lawsuit; they are in ascending order rather than tracking the complaint.
4.6%, 9%, 9.9%, 11.5%, 12%, 12.4%, 12.5%, 13%, 13%, 15.7%, 18%, 22%, 26%, and 26.4%.
The average is 14.7%
The median is 12.75%
The mode is either 12 or 13%, depending on rounding.
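The summary figures can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module on the fourteen percentages listed above; the variable names are my own.

```python
import statistics

# The fourteen percentages listed above, in ascending order.
percentages = [4.6, 9, 9.9, 11.5, 12, 12.4, 12.5, 13, 13, 15.7, 18, 22, 26, 26.4]

average = statistics.mean(percentages)    # arithmetic mean
median = statistics.median(percentages)   # midpoint of the two middle values
mode = statistics.mode(percentages)       # most frequent unrounded value

print(f"count:   {len(percentages)}")
print(f"average: {average:.1f}%")
print(f"median:  {median}%")
print(f"mode:    {mode}%")
```

On the unrounded values the only repeated figure is 13; rounding each percentage to a whole number first changes which value is most frequent, which is why the mode depends on rounding.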

I confess to finding these numbers remarkable. They speak for themselves. However, I might suggest that one reasonable view of these figures would be that, according to these publisher plaintiffs, there IS no fair use for e-reserves (or, by extrapolation, online courses). Meaning every use requires permission.

Does that bother you?

peggy

February 26, 2010

Library Copyright Alliance Enters the Online Video Discussion

I meant to post this information when it came out recently but better late than never. For those of you following the conversation on whether or not the law permits educational institutions to stream entire movies or videos within an online course, the Library Copyright Alliance has joined the discussion with an issue brief accessible from this site: http://www.arl.org/news/pr/Streaming-Films-19feb10.shtml

One of the brief's authors is CIP's own Peter Jaszi, and the brief is certainly a valuable contribution to the debate. Some of the press reports I have seen about it, however, almost suggest that the brief settles the matter and institutions can rely on it for both policy and practice purposes. It is being stated that the library associations have determined that this practice is well within the law and the light is green.

It will be interesting to see whether policies or practices at institutions change in reliance on this issue brief - I would caution that, as usual, the press may be overstating the conclusions or analysis presented.

Peggy

About Library Digitization

This page contains an archive of all entries posted to ©ollectanea in the Library Digitization category. They are listed from oldest to newest.

Creative Commons License
This weblog is licensed under a Creative Commons License.