Teraflop AI is excited to support the Caselaw Access Project and the Harvard Library Innovation Lab in the release of over 6.6 million state and federal court decisions published throughout U.S. history.

The Caselaw Access Project

In collaboration with Ravel Law, @hlslib digitized over 40 million pages of U.S. court decisions, comprising 6.7 million cases from the last 360 years, into a widely accessible dataset. You can access a bulk download of the data through the Caselaw Access Project API (CAPAPI):

https://case.law/caselaw/

You can find more information about accessing written state and federal common-law court decisions in the bulk data service documentation:

https://case.law/docs/

It is important to democratize fair access to data for the public, the legal community, and researchers. You can find a processed and cleaned version of the data available on @huggingface here:

You can learn more about the Caselaw Access Project and all of the phenomenal work done by Jack Cushman, @leppert, and @macargnelutti here:

https://case.law/about/

You can watch a live stream of the release here:

https://lil.law.harvard.edu/about/cap-celebration/stream

Post-processing

OCR introduced errors into these texts during digitization. To prepare each text for model training, we post-processed it to fix problems with encoding, normalization, repetition, redundancy, parsing, and formatting.
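
To make the kinds of fixes listed above concrete, here is a minimal Python sketch of a cleanup pass over OCR output. It is an illustration only, not Teraflop AI's actual pipeline: the function name `clean_ocr_text` and the specific rules (NFKC normalization, control-character removal, rejoining words hyphenated across a line break, whitespace collapsing, and dropping repeated lines) are assumptions chosen for the example.

```python
import re
import unicodedata


def clean_ocr_text(text: str) -> str:
    """Illustrative cleanup pass (hypothetical; not the production pipeline)."""
    # Normalization: fold compatibility characters (such as the "fi" ligature)
    # into their plain equivalents.
    text = unicodedata.normalize("NFKC", text)
    # Encoding debris: drop control characters, keeping newlines and tabs.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Parsing: rejoin words hyphenated across a line break ("juris-\ndiction").
    # Assumption: this also merges genuine compounds split at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Formatting: collapse runs of spaces and tabs, and trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    # Repetition: drop a line that exactly repeats the previous non-empty line
    # (for example, a page header duplicated by the scanner).
    cleaned = []
    previous = None
    for line in lines:
        if line and line == previous:
            continue
        cleaned.append(line)
        if line:
            previous = line
    # Redundancy: collapse three or more consecutive newlines into one blank line.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(cleaned)).strip()
```

A real pipeline would need rules tuned to the corpus; for instance, NFKC also rewrites superscript footnote markers as ordinary digits, which may or may not be desirable for legal text.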

Teraflop AI’s data engine allows for the massively parallel processing of web-scale datasets into cleaned text. Our one-click deployment allowed us to easily split the computation across thousands of nodes on our managed infrastructure.
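
The general idea of splitting work across many workers can be sketched in a few lines of Python. This is a single-machine illustration under stated assumptions, not a description of the Teraflop AI data engine: threads stand in for nodes, and the helper names `split_into_shards` and `process_in_parallel` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_into_shards(docs: List[str], num_shards: int) -> List[List[str]]:
    """Split docs into at most num_shards contiguous, order-preserving shards."""
    size = max(1, -(-len(docs) // num_shards))  # ceiling division
    return [docs[i:i + size] for i in range(0, len(docs), size)]


def process_in_parallel(
    docs: List[str],
    fn: Callable[[str], str],
    num_workers: int = 4,
) -> List[str]:
    """Apply fn to every document, one shard per worker, keeping input order."""
    shards = split_into_shards(docs, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map yields results in the same order as the input shards.
        results = list(pool.map(lambda shard: [fn(d) for d in shard], shards))
    # Flatten the per-shard results back into a single list.
    return [doc for shard_result in results for doc in shard_result]
```

For example, `process_in_parallel(texts, str.strip, num_workers=8)` would trim every string in `texts` using eight worker threads. In a distributed setting each shard would instead be sent to a separate machine, but the shard-then-merge structure is the same.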

Nomic Atlas

Thank you to @nomic_ai for providing us with Atlas research credits to store and visualize each of the jurisdictions in this dataset. You can view a Nomic Atlas map of New York state court decisions here:

https://atlas.nomic.ai/data/teraflop-ai/ny-law/map

You can access the New York jurisdiction map and all of the other @nomic_ai Atlas maps on @huggingface here: