A Dataset Showing a Century of Evolution in the Complexity of the United States Legal Code

Bibliographic Details
Main Author:	Dawoon Jeong (22382975) (author)
Other Authors:	James Holehouse (21698807) (author), Jisung Yoon (13043034) (author), Christopher P. Kempes (21698853) (author), Geoffrey B. West (21698858) (author), Hyejin Youn (21698921) (author)
Published:	2025
Subjects:	Legal institutions (incl. courts and justice systems); Legal practice, lawyering and the legal profession; Litigation, adjudication and dispute resolution; U.S. Code Dataset; Legal System Dynamics; Regulatory Complexity

Description
Summary:	We leverage OCR and Generative AI techniques to recover and clean printed historical editions of the U.S. Code. This enables computational analysis of federal law even in periods before web-based digital access. The processing pipeline includes:
- Contents of U.S. Code: word counts, unique word counts, entropy, scaling exponents, etc.
- Hierarchical Structure: Subtitle → Part → Chapter → Section → Subsection...
- Cross-Reference Relationships: title-to-title citation relationships
Due to repository size constraints, this GitHub repository includes:
- A sample OCR text page (ocr_processing_gemini) for demonstration
- Web-based U.S. Code text from 1994 for structural parsing (Data Set 2)
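
The content metrics named in the summary (word counts, unique word counts, entropy) could be computed roughly as follows. This is a minimal sketch, not the dataset's pipeline: the record does not state how text is tokenized or how entropy is defined, so the regex tokenizer, the use of Shannon entropy over word frequencies, and the function name text_metrics are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def text_metrics(text: str) -> dict:
    """Return word count, unique word count, and Shannon entropy in bits.

    Hypothetical helper: tokenization and entropy definition are assumptions,
    not taken from the dataset's own processing code.
    """
    # Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(words)
    total = len(words)

    # Shannon entropy of the empirical word-frequency distribution.
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)

    return {
        "words": total,
        "unique_words": len(counts),
        "entropy_bits": entropy,
    }
```

Calling text_metrics on the text of one title for each edition year would give a per-year series of these three quantities, which is the kind of input a later scaling-exponent fit would need.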
