ACL Data and Code Repository

From ACL Wiki
Revision as of 06:29, 4 June 2008 by Pdturney (Talk | contribs)

The ACL Data and Code Repository is a repository of data (e.g., hand-labeled text, hand-parsed text, feature vectors for machine learning, etc.) and source code (e.g., taggers, parsers, chunkers, etc.) for computational linguistics and natural language processing. The goal of the repository is to make it easier for researchers to replicate each other's work and to compare different approaches using the same benchmarks.

If you are contributing source code, you should consider SourceForge or another code hosting service, instead of the ACL Data and Code Repository. These services have features, such as version control and bug tracking, that are not available here. The ACL Data and Code Repository is more suitable for static, archival, historical source code, rather than dynamic, evolving source code.

The ACL Data and Code Repository is experimental. There are limits to the size of an upload. If you exceed the limit or encounter other problems, let us know.


Instructions

  • Metadata: Each item in the repository must have an associated metadata entry in the ACL Wiki. Choose a good name for your contribution and add "(Repository)" to the name (e.g., "Wall Street Journal Corpus (Repository)"). Add a link below, under Data or Source Code, to a new ACL Wiki entry, using this name. Click on the link to begin editing the metadata for your new entry.
  • Depositor: The first entry in the metadata should be the name of the person who is depositing the data or code. Include contact information, such as a link to your personal web page. Include the date the deposit was made.
  • Copyright: The second entry in the metadata should state who owns the copyright for the data or code, and the date of the copyright. If you (the depositor) do not own the copyright, then you must explicitly state that the owner of the copyright has granted you permission to contribute the data or code to the ACL Data and Code Repository. Contributions that violate copyright will be deleted.
  • Licensing: State the terms under which the data or code may be used. For data, we suggest one of the Creative Commons licenses. For code, we suggest one of the GNU licenses. You may use any other widely recognized license, but you should not write your own custom license. Contributions without license terms will be deleted.
  • Citation: State how (or whether) you would like the item to be cited or acknowledged.
  • Description: Briefly describe the data or code. A longer description and documentation should be included in the download package.
  • Packaging: The item should be a .zip or .gz file. If you have multiple files (and you should have at least the data or code file plus the documentation file), bundle them together into a single .zip or .gz file.
  • Uploading: Upload here.
  • Linking: Add a link to the uploaded file in the metadata page.
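
The Packaging step can be sketched in Python as follows. This is only a minimal illustration, assuming the contribution consists of a few files in one place; the function name bundle and the file names in the usage note are hypothetical and are not part of the repository. It uses the standard zipfile module to write the files into a single .zip archive.

```python
import zipfile
from pathlib import Path


def bundle(files, archive_path):
    """Write the given files into one .zip archive, stored under their base names.

    Hypothetical helper for illustration; the repository does not provide it.
    """
    files = [Path(f) for f in files]

    # The instructions ask for at least the data or code file plus documentation.
    if len(files) < 2:
        raise ValueError("expected at least a data/code file and a documentation file")

    missing = [str(f) for f in files if not f.is_file()]
    if missing:
        raise FileNotFoundError(f"files not found: {missing}")

    # Files are stored by base name, so two inputs must not share a name.
    names = [f.name for f in files]
    if len(set(names)) != len(names):
        raise ValueError("two input files share the same base name")

    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f, arcname=f.name)

    return Path(archive_path)
```

For example, bundle(["corpus.txt", "README.txt"], "my_corpus.zip") would produce one archive that is ready to upload, provided both (hypothetical) files exist. The upload size limit is not checked here, since the page does not state it.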


Data

Source Code