Harvard Converts Millions of Legal Documents into Open Data: Three hundred and sixty years of United States caselaw

Harvard Law School set out to scan all federal and state court cases and get them online (for free) in a machine-readable format (not just PDFs!), with open APIs for anyone to use. And, earlier this week, case.law officially launched, with 6.4 million cases, some going back as far as 1658. There are still some limitations, some of them placed on the project by its funding partner, Ravel, which was acquired by LexisNexis last year (though the structure of the deal means some of these restrictions will likely decrease over time).

Also, the focus right now is really on providing this setup as a tool for others to build on, rather than as a straight-up interface for anyone to use. As it stands, you can access the data either via the site’s API or by doing bulk downloads. Of course, the bulk downloads are, unfortunately, part of what’s limited by the Ravel/LexisNexis restrictions. Bulk downloads are available for cases in Illinois and Arkansas, but that’s only because both of those states already make cases available online. Still, even with the Ravel/LexisNexis limitation, individual users can download up to 500 cases per day.

The real question is what others will build with the API. The site has launched with four sample applications that are all pretty cool.

  • H2O is a tool that law professors can use to easily create casebooks for students in various areas of law. Anything published on H2O gets a Creative Commons license and can then be shared widely. I wonder if professors like Eric Goldman, who offers an Internet Law Casebook, or James Grimmelmann, who has a different Internet Law Casebook, will eventually port them over to a platform like H2O.
  • A wordcloud app that currently shows the “most used words” in California cases in various years. Here, for example, are the word clouds in California cases from 1871… and 2012. See if you can tell which one’s which.
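For a rough sense of what a "most used words" tally involves, here is a minimal Python sketch. It is not the wordcloud app's actual code: the function name `most_used_words`, the tiny stop-word list, and the sample opinions are all made up for illustration, and a real app would pull opinion text from the case.law data rather than from hard-coded strings.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (an assumption for this
# sketch; a real word cloud would likely filter many more common words).
STOP_WORDS = {
    "the", "of", "and", "to", "a", "in", "that", "is", "was",
    "by", "for", "on", "it", "as", "be", "this", "with",
}


def most_used_words(opinions, top_n=10):
    """Return the top_n (word, count) pairs across a list of opinion texts."""
    counts = Counter()
    for text in opinions:
        # Lowercase the text and keep runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        # Skip stop words and very short tokens before counting.
        counts.update(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Made-up sample text standing in for real case opinions.
    sample = [
        "The court held that the contract was void.",
        "The contract and the court.",
    ]
    for word, count in most_used_words(sample, top_n=5):
        print(word, count)
```

Feeding the counts for one year's cases into any word cloud renderer would give a picture along the lines of the ones described above; comparing two years is just a matter of running the tally twice.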

Author: user
