Distributed Checksum Calculation System

Distributed Checksum Calculation System for Kora

A distributed checksum calculation system was created as a senior capstone project in Michigan State University's Computer Science Collaborative Design course. The system allows organizations to comply with current digital archiving standards within an acceptable timespan. Download a free copy today and try it out.

A checksum is a small value, computed from a piece of data such as a file using binary operations, that can be used to test the integrity of that data.

About

MATRIX is Michigan State University's Center for Humane Arts, Letters and Social Sciences Online. According to its mission statement, MATRIX “seeks to advance critical understanding and promote access to knowledge through world-class research in humanities technology.” In keeping with this mission, MATRIX is currently developing KORA, a digital archiving system with capabilities similar to those of a content management system.

KORA must satisfy some specific requirements in order to be labeled a digital archive. One of these requirements is an event recording system that also handles data integrity checks. Although this seems like a trivial problem at first, the system must scale to thousands of files and hundreds or thousands of gigabytes of information. That volume of information poses interesting challenges for the system's designers.

An accepted way to help ensure the reliability of data is to store metadata, or data about the data. One form of metadata is a digital fingerprint of the data, such as a checksum.

A checksum can be computed using several different algorithms; MATRIX is inclined to use the Secure Hash Algorithm (SHA) standard developed by the National Security Agency. Because a checksum is generally much smaller than the data it describes, comparing checksums is much faster than comparing entire files.
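As an illustration only, a SHA checksum of a file might be computed as in the sketch below. This is not the project's code: the function name `file_checksum`, the default of SHA-256, and the 1 MiB chunk size are all assumptions made for the example, since the page does not say which SHA variant or which language the system uses.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`.

    `algorithm` and `chunk_size` are illustrative defaults, not values
    taken from the original system.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files can then be compared by checking whether their digests are equal, which involves only a few dozen characters regardless of how large the files are.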

The Event Logging system operates as follows: it takes in data from a database, computes a checksum for it, and stores the checksum along with some identifying information, such as a filename, in a database. The point of the system is to ensure that the data does not degrade over time. To accomplish this, the system intermittently recomputes the checksum of each data element and creates a log file noting any additions, removals, or errors in the archive.
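The comparison step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the function name `audit`, the use of plain dictionaries mapping an identifier to a checksum, and the log line wording are all hypothetical.

```python
def audit(stored, current):
    """Compare two {identifier: checksum} mappings and return log lines.

    `stored` stands in for the checksums previously saved in the database;
    `current` stands in for checksums freshly computed from the archive.
    Both structures are assumptions made for this example.
    """
    events = []
    # Identifiers present now but never recorded before.
    for name in sorted(current.keys() - stored.keys()):
        events.append(f"ADDED {name}")
    # Identifiers recorded before but missing now.
    for name in sorted(stored.keys() - current.keys()):
        events.append(f"REMOVED {name}")
    # Identifiers present in both whose checksum has changed.
    for name in sorted(stored.keys() & current.keys()):
        if stored[name] != current[name]:
            events.append(f"ERROR {name}: checksum mismatch")
    return events
```

Running such a check on a schedule and appending the returned lines to a log file would give the kind of record of additions, removals, and errors that the paragraph above describes.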

Software Requirements

License

This software is under the MIT License: Copyright © 2008, Michigan State University

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Team Members