Difference between revisions of "Hypergraph Format"
Revision as of 04:38, 8 November 2010
JSON
Pro:
- Implementations in nearly every language (often packaged with the language itself)
- Human readable
- Already used in CDec for forest output
Con:
- Space inefficiency
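
The trade-off above can be seen with a minimal sketch using Python's standard json module. The forest layout here (the "nodes" and "edges" keys and their fields) is a hypothetical illustration, not the schema cdec actually emits.

```python
import json

# Hypothetical toy forest; the key names are illustrative assumptions,
# not an established hypergraph schema.
forest = {
    "nodes": [
        {"id": 0, "label": "NP"},
        {"id": 1, "label": "VP"},
        {"id": 2, "label": "S"},
    ],
    "edges": [
        # One hyperedge: head node 2 built from tail nodes 0 and 1.
        {"head": 2, "tails": [0, 1], "rule": "S -> NP VP", "weight": -1.25},
    ],
}

# Human-readable form: easy to inspect, but every key is repeated per record.
pretty = json.dumps(forest, indent=2)

# Compact form: same data with whitespace removed, still text-based.
compact = json.dumps(forest, separators=(",", ":"))

# Round trip back into plain Python dicts and lists.
restored = json.loads(pretty)
```

Even the compact form spells out each field name for every node and edge, which is where most of the space overhead comes from on large forests.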
Protocol Buffers
Pro:
- Conversion to and from JSON (protobuf-json)
- Very fast to read (particularly in C++ and Java, hopefully soon in Python)
- Very space efficient
- Implementations in every language (although each requires a separate library)
- Automatically generates typed stubs
Con:
- "It's really easy to get up to some of the data size limits that are in place to prevent malicious data from having the PB parser allocate too much memory". Some of these limits are described in the SetTotalBytesLimit section of the C++ coded_stream reference: http://code.google.com/apis/protocolbuffers/docs/reference/cpp/google.protobuf.io.coded_stream.html
- "You typically have to create a full hypergraph protocol buffer object before you can serialize it, so you either have to use the PB data structures internally in your code or you have to copy your data structure. While doing this copy, you can end up with two copies of the forest in memory, which is bad for memory usage."
Variation of SLF (Standard Lattice Format)
Pro:
- Blindingly fast.
- Could be implemented to work lazily, as a stream.
Con:
- Requires a custom format
- Probably needs specialized language bindings.
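
To illustrate what lazy/streaming reading could look like, here is a rough sketch of a generator that yields one hyperedge at a time. The line layout (head, space-separated tails, and weight, separated by tabs) is purely an assumption made for this example; it is not SLF and not a proposed specification, and the names Edge and read_edges are hypothetical.

```python
import io
from typing import Iterator, NamedTuple, TextIO, Tuple


class Edge(NamedTuple):
    """One hyperedge in the hypothetical line-oriented format."""
    head: int
    tails: Tuple[int, ...]
    weight: float


def read_edges(stream: TextIO) -> Iterator[Edge]:
    """Lazily yield edges; only the current line is held in memory."""
    for line in stream:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines starting with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        # Assumed layout: head <TAB> tail ids separated by spaces <TAB> weight
        head, tails, weight = line.split("\t")
        yield Edge(int(head), tuple(int(t) for t in tails.split()), float(weight))


# Usage: an in-memory stream stands in for a file opened with open(path).
sample = "# head\ttails\tweight\n2\t0 1\t-1.25\n3\t2\t-0.5\n"
edges = read_edges(io.StringIO(sample))
first = next(edges)  # parses only up to the first edge line
```

Because the reader never materializes the whole forest, a consumer can start processing edges before the file has been fully read, which is the property the JSON and Protocol Buffers options do not offer out of the box.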