Stream support #60
@bhilburn Great talk at GRCon17. That is what has brought me here.

I was very excited to hear about SigMF because it's very close to solving a problem I have. I also have experience with other standards (VITA49, Midas Blue, custom) that don't quite solve my problem. However, I was very sad to see it's limited to flat files and doesn't seem to have a provision for stream transport. My problem is that I would like to "plug in" a stream and be able to figure out how to process it via embedded metadata. I believe SigMF could do this with some very small allowances in the spec:

1. Add a `length` field in the core namespace, representing the length of the dataset.
2. Allow metadata and dataset to be interleaved in a single stream: `[METADATA][DATASET][METADATA][DATASET][METADATA][DATASET]`

EDIT: removed my point number 3 because the spec already has all JSON in a top-level object. (I originally missed this.)

There are a lot of nice things about this. It's agnostic of your transport, and the overhead is application dependent. Some people might want infinite dataset length with one metadata header at the front. Some might want to tune `length` to repeat the metadata periodically, maybe every 10 ms, so when a stream is attached it can discover the metadata and start processing in a timeframe that is appropriate for the application.
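A minimal sketch of what a writer for this proposal might look like, assuming `core:length` as the field name and bytes as its unit (both assumptions for illustration; none of this is in the spec):

```python
import json


def stream_with_periodic_metadata(stream, metadata, samples, chunk_bytes):
    """Write [METADATA][DATASET][METADATA][DATASET]... to a byte stream.

    A sketch of the proposal above, not spec: each JSON metadata
    object carries a hypothetical "core:length" field (in bytes)
    describing the dataset segment that follows it, so a consumer
    that attaches mid-stream can wait for the next metadata object
    and start processing within a bounded time.
    """
    for offset in range(0, len(samples), chunk_bytes):
        chunk = samples[offset:offset + chunk_bytes]
        frame_meta = {**metadata, "core:length": len(chunk)}
        stream.write(json.dumps(frame_meta).encode("utf-8"))
        stream.write(chunk)
```

For example, at 1 MSPS with 8-byte cf32 samples, `chunk_bytes = 80_000` would repeat the metadata every 10 ms.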
Hey @drewlio! Thanks so much for jumping in and getting involved! Sorry for the delayed response - as you might imagine, I really didn't get much done last week other than GRCon, and am catching up, now =) Okay, this is a really interesting comment. So, one of the top questions we get about SigMF is "why didn't you just use VITA49?". I'm really interested to know why VITA49 doesn't work for you, especially since the format you suggested (the interleaved `[METADATA][DATASET]` stream) looks a lot like it. Is there a way we could do this differently that would make it useful to you? I want to make sure I understand the difference =)
Quite honestly I don't have any powerful reasons why VITA49 couldn't be made to work in my situation. (Similarly, you have probably considered the question "Can't I just stream the VITA49 protocol to a file and call it SigMF?" Maybe it's possible, but that doesn't mean we want to do that.) But here are the not-powerful reasons: I want a lightweight streaming/block protocol whose full-spec encoder/decoder I can freely and easily recreate in native languages (ECMAScript, Python, C, etc.). I want a format that feels right when passed via file, pipe, or, most importantly, through a microservices architecture (I believe what Jonathan Corgan was calling "client/server"). For instance, what if a short recording needs to pass through a cache (like Redis or Memcached)? Combining the metadata and data into a single blob is good here. Another microservices example is using Nginx as a load balancer and sending snapshots via POST with the application/octet-stream MIME type (a sketch of this follows below). Also, websockets. Currently, I would not use VITA49 for these examples because of its design decisions. I would come up with my own custom protocol that would look very much like SigMF. In fact, I could use SigMF as-is and put a `length` field in my own namespace. So my two suggested changes above were to:

1. Add a `length` field in the core namespace, representing the length of the dataset.
2. Allow metadata and dataset to be interleaved in a single stream: `[METADATA][DATASET][METADATA][DATASET]`
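To make the single-blob idea concrete, here is a minimal sketch. The endpoint URL, field names, framing, and use of the requests library are all illustrative assumptions, not anything from the spec:

```python
import json
import struct

import numpy as np
import requests  # any HTTP client would do

# Hypothetical metadata for a short snapshot (SigMF-style field names).
meta = {"core:datatype": "cf32_le", "core:sample_rate": 1_000_000}
samples = np.zeros(100_000, dtype=np.complex64)  # 100 ms at 1 MSPS

# Pack everything into one blob: a uint64 metadata length, the UTF-8
# JSON, then the raw samples.  (Illustrative framing only.)
meta_bytes = json.dumps(meta).encode("utf-8")
blob = struct.pack("<Q", len(meta_bytes)) + meta_bytes + samples.tobytes()

# One POST carries the whole recording through a load balancer; the
# same bytes could just as easily be SET into Redis or Memcached.
requests.post(
    "http://example.com/snapshots",  # hypothetical endpoint
    data=blob,
    headers={"Content-Type": "application/octet-stream"},
)
```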
I think these two items (number 2 is most valuable) open up a lot of possible use cases. An interest area of mine is how to leverage the horizontal scaling tech that the DevOps crowd developed in the last ~5 years in response to cheap cloud VMs. I think this meshes well with what I heard at the conference about a possible microservices ("client/server") architecture and could be an enabling technology for horizontally scaling GNU Radio based systems.
Why do we need `length`?
@mbr0wn I could be overlooking something, but in the streaming case (`[METADATA][DATASET][METADATA][DATASET]`) the decoder has to know where the dataset ends so it can find the next metadata object. Currently in the SigMF spec there is no field carrying the dataset length; it is implied by the size of the data file. This has different levels of importance with different implementations. For instance, if you're using something raw like pipes or UDP, or some framer like E1/T1, or a one-way optical link, this is very helpful. If you have some wrapper like DDS, ZeroMQ messages, or FileMQ, the impact of not having `length` is much smaller, because the transport already delimits messages.
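To make that trade-off concrete, here is a minimal sketch of a reader pulling interleaved segments off an unframed byte stream such as a pipe; the framing and the `core:length` field name are assumptions carried over from the proposal above, not spec:

```python
import json


def read_interleaved(stream):
    """Yield (metadata, dataset) pairs from an unframed byte stream.

    Assumes the hypothetical [METADATA][DATASET]... framing, where
    each metadata object has a "core:length" field giving the size
    of the following dataset in bytes.  Without a framing transport,
    that field is the only way to know where the dataset ends and
    the next metadata object begins.
    """
    while True:
        # Naive brace-balancing to find the end of the JSON object.
        # (Does not handle braces inside JSON strings; assumes the
        # stream starts on a metadata boundary.)
        depth, buf = 0, bytearray()
        while True:
            b = stream.read(1)
            if not b:
                return  # end of stream
            buf += b
            if b == b"{":
                depth += 1
            elif b == b"}":
                depth -= 1
                if depth == 0:
                    break
        metadata = json.loads(buf.decode("utf-8"))
        dataset = stream.read(metadata["core:length"])
        yield metadata, dataset
```

The byte-by-byte scan for the end of the JSON is exactly the cost that the length-prefix idea later in this thread avoids.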
I think @drewlio is getting to a really fundamental point about SigMF. Do we want to support streaming formats, or not? When we first started the SigMF effort, it was specifically to support the portability of datasets and metadata. It needed to be able to support applications (i.e., Readers and Writers) streaming data to disk, but that's quite different from supporting streaming data between applications. In short, SigMF was not originally designed to support streaming metadata (like VITA49 was), and that's apparent in its design, as @drewlio points out. At GRCon, I heard from a lot of people who wanted this functionality. Indeed, here at DeepSig, we are talking about doing the same thing. One of the main reasons we haven't gone after streaming support in the past is that VITA49 exists and "already does that"; since the goal of SigMF is not to just be another standard for the sake of having one, and we hadn't identified streaming support as a primary goal, we weren't focused on enabling it. I think it's time to circle back around to this question and re-debate it. Specifically, is it a goal of SigMF to create a standard that can be used for streaming data? If we want the answer to this to be "yes", we need to decide what that requires of the spec. @drewlio provided some interesting insight above as to why he wouldn't use VITA49.
I've heard from a lot of people who don't like VITA49, especially at GRCon, but that doesn't necessarily mean we should try to re-invent it. I would really like to hear opinions on this topic. What are your thoughts? Suggestions? Is there something VITA49 can't do, or does poorly, that you would like to see in SigMF? Why should this be something we address?
Just some thoughts, not really for/against either one, but these might be the pain points behind the grumblings you hear from GR users. The people who said they want an alternative to VITA49 probably don't want to replicate all of VITA49's functionality, but probably do want these:
I would encourage you to look at SigMF as an easy and open standard for storing and passing a finite amount of signal data and metadata in a medium/transport-agnostic way. It's good for files, blobs, streams; it's just payload in your medium/transport. That leaves a clear distinction between SigMF and VITA49. VITA49 is better for low-level (i.e., FPGA) integration due to its fixed packet structure, and it natively handles the streaming protocol, including acknowledgements, as well as a fixed device-control lexicon. Its stable ANSI spec is better for long development timelines. I see them as serving two distinct purposes, each with its place. In the extreme case, a fully streaming SigMF could have a partial capability overlap with VITA49. If the prospect of fragmenting users in that case is enough of a concern to limit SigMF to file pairs only, then that's an OK decision to make.
We have used something similar to the following in practice and it was effective: a uint64 giving the byte length of the metadata, followed by the JSON metadata itself, followed by the data.
One thing that's nice about prepending the metadata with its length is that you don't have to read the stream char by char and try to watch for balanced object braces on the fly. In our case, we did have the dataset length in the metadata as well. That way you don't have to parse the JSON on-the-fly; you can just stream both directly into a file. It would be an extremely minimalist extension to SigMF that would facilitate efficient streaming.
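As I read @djanderson's description, the framing could be implemented along these lines (a sketch; the byte order, the uint64 width at the front, and the `core:length` field name are assumptions):

```python
import json
import struct


def write_record(stream, metadata, dataset):
    # Prepend the JSON metadata with its byte length as a uint64,
    # then append the dataset.  (Sketch of the framing described
    # above; little-endian byte order is an assumption.)
    meta_bytes = json.dumps(metadata).encode("utf-8")
    stream.write(struct.pack("<Q", len(meta_bytes)))
    stream.write(meta_bytes)
    stream.write(dataset)


def read_record(stream):
    # Read the uint64 prefix, then exactly that many metadata bytes;
    # no brace-watching required.  The dataset length comes from the
    # metadata itself (the hypothetical "core:length" field).
    prefix = stream.read(8)
    if len(prefix) < 8:
        return None  # end of stream
    (meta_len,) = struct.unpack("<Q", prefix)
    metadata = json.loads(stream.read(meta_len).decode("utf-8"))
    dataset = stream.read(metadata["core:length"])
    return metadata, dataset
```

The fixed-width prefix means a consumer can relay metadata and data straight into a file or socket without inspecting a single byte of the JSON.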
@djanderson Yeah, I get that. It seems like that would have its advantages. It's one step toward having all fixed-width fields like VITA49. There is a trade-off for the designers between high-level structure for convenience and low-level structure for performance.
The assumption I'm making is that if someone is choosing streaming, they've already identified a performance or latency bottleneck that they're trying to address. Most of the use cases you specified fit into that category: caches, load balancers, sockets. In our case, we POSTed files via TCP if we could, but streamed the data over a UDP socket if low latency was more important than data integrity. I don't know if forcing the metadata/data stream to be less than 18 quintillion bytes (the uint64 maximum) before another metadata message could be considered moving toward "fixed-width fields" :) but it does mean you can't just have a single metadata file followed by an infinite stream of data. I don't think we should strive to provide that. I was just throwing it out there because it would allow us to provide for the streaming scenario (which I support) without actually baking it into the metadata format itself, because I agree with you: SigMF's simplicity is one of its biggest strengths.
By fixed fields I meant the leading uint64 preceding the JSON, but that's a minor point. You know, actually, when talking about all the caches and stuff, I was thinking more of API options than performance. If the data is in a single blob then it's an easy fit for stuff like Mongo, Redis, POST, ZeroMQ. These are technologies that work well with horizontal scaling, so it's closely tied to the performance conversation.
@drewlio, for that use case, have you seen our "archive" format, which packages metadata and data together in a single blob-like archive? We're also working on compression support for them in #68. That doesn't address streaming them back-to-back, but you can break an acquisition into arbitrarily many captures and archive them up as a single blob.
The uncompressed archive format could work. It covers the two suggestions, which were:

1. A `length` for the dataset (tar member headers record each member's size).
2. Metadata and dataset combined in a single blob.
Tar is a bit bloated, but the data portion will always be so much bigger that I think it is reasonable (even for a small snapshot like 100 ms at 1 MSPS, the tar overhead of a few 512-byte header and padding blocks against hundreds of kilobytes of samples is <1%). There is a usability consideration and one important caveat:

Usability
IMO this loses some of the simplicity. In fact, I might rather just use what you suggested earlier (the uint64-prefixed metadata).

Caveat

If you were to put this requirement (that metadata comes first) in the archive spec, that would be a pain for a lot of people who might be creating multi-signal archives on the command line with a glob. It's even a pain for single-signal archives because "sigmf-data" is alphabetically before "sigmf-meta", so a naive tar command would create the archive in the wrong order. So in summary, for several reasons it's hard (but not impossible) to make the archive format work well for the stream-wise processing case, but it is reasonable for the blob-wise processing case.

EDIT: And I think blob-wise processing is where the big wins are in leveraging horizontal scaling technologies, so I would be agreeable to adopting the uncompressed tar-based archive format as the suggested blob-wise solution.
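One way around the glob-ordering pain would be for tooling to add the members explicitly rather than relying on shell expansion; a minimal sketch with Python's tarfile (the filenames are hypothetical):

```python
import tarfile

# Build a .sigmf archive with the metadata member guaranteed to come
# first, regardless of alphabetical order ("sigmf-data" sorts before
# "sigmf-meta", so a shell glob would get this backwards).
with tarfile.open("recording.sigmf", "w") as archive:
    archive.add("recording.sigmf-meta")  # metadata first
    archive.add("recording.sigmf-data")  # then the dataset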
This discussion was really finalized by @drewlio back in 2017, and has since been addressed in several other Issues & PRs. Cleaning up this issue.