YouTube Ingest by Hand (Part 1)

Has anyone ever actually got YouTube's MPEG-DASH live streaming ingest thing to work? I've got all of the API stuff handled correctly. I'm producing an MPD that YouTube's API accepts without error. And I'm pushing the media MP4 files which are also accepted without error, but the live stream never advances from "ready" to "active". I don't get it.

I've got the presentation time starting from zero. It is contiguous across the segmented media files. The MPD is updated every 45 seconds with the current media sequence number. What else am I missing?


Okay. I'm making progress. At one point yesterday when I was stuck, I asked ChatGPT to tell me about the initialization segment format, and it described a big-endian 4-byte length field preceding each of the SPS and PPS values. Turns out that was not entirely accurate....
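For reference: in the AVCDecoderConfigurationRecord from ISO/IEC 14496-15 (the payload of the avcC box), each SPS and PPS is preceded by a 2-byte big-endian length. The 4-byte lengths normally appear in front of the NAL units inside the media samples instead. Whether that was the exact inaccuracy here is an assumption. Below is a minimal Python sketch with a hypothetical helper name (build_avcc). It handles exactly one SPS and one PPS and skips the extra fields that High-profile streams carry.

```python
import struct

def build_avcc(sps: bytes, pps: bytes) -> bytes:
    """Build an AVCDecoderConfigurationRecord (avcC payload).

    sps and pps are raw NAL units, without start codes or length prefixes.
    Assumes one SPS and one PPS, and 4-byte NAL lengths in the samples.
    """
    header = bytes([
        1,         # configurationVersion
        sps[1],    # AVCProfileIndication (profile_idc from the SPS)
        sps[2],    # profile_compatibility
        sps[3],    # AVCLevelIndication (level_idc from the SPS)
        0xFC | 3,  # 6 reserved bits + lengthSizeMinusOne (3 means 4-byte lengths)
        0xE0 | 1,  # 3 reserved bits + numOfSequenceParameterSets
    ])
    body = struct.pack(">H", len(sps)) + sps   # 2-byte length, then the SPS
    body += bytes([1])                         # numOfPictureParameterSets
    body += struct.pack(">H", len(pps)) + pps  # 2-byte length, then the PPS
    return header + body
```

The 4-byte figure still matters: lengthSizeMinusOne = 3 tells the decoder that every NAL unit in the media data is prefixed with a 4-byte length.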

So now I need to figure out how to make an empty ISO BMFF (mp4) file with AVAssetWriter, which doesn't seem to like writing empty files. I could try throwing in a key frame but YouTube imposes a 100KB limit, so, hmm.


Gave up on that and just used the key frame from the most recent GOP. That got AVAssetWriter producing a valid single-frame MP4 file. Sent that off to YouTube. The stream still won't become active. No errors.


I clicked "Copy code" in ChatGPT for the first time. I feel forever tainted.

It was just bit-shifting code. I can stop any time.


I had previously removed the audio track from my tests so that I could focus on getting video working. Maybe YouTube really does mean it when they say that the stream must have audio? Seems like a strange requirement. Alrighty. Let's work audio back into the test.


YouTube, in its infinite wisdom, does not accept H.265 (HEVC) via MPEG-DASH, so I have to run the camera's video stream through a transcoder. Prior to this, I've got an array of audio and video packets all commingled ("muxed"). The GOP size going into the video transcoder is not going to be the same as the GOP size coming out, so ... I need to hang onto the audio packets (and their timing information) and take a handful off the queue equivalent to the duration of the transcoded GOP (once the transcoder spits it out). Seems straightforward enough....
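The queueing idea can be sketched roughly like this in Python. This is not the app's actual code: the AudioPacket type, the take_audio_for_gop name, and the use of integer timescale ticks for timestamps are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class AudioPacket:
    pts: int       # presentation time, in timescale ticks
    duration: int  # packet duration, in timescale ticks
    data: bytes

def take_audio_for_gop(queue: deque, gop_end_pts: int) -> list:
    """Pop every queued audio packet that starts before the GOP ends.

    Packets are assumed to be queued in presentation order. Anything that
    starts at or after gop_end_pts stays queued for the next GOP.
    """
    taken = []
    while queue and queue[0].pts < gop_end_pts:
        taken.append(queue.popleft())
    return taken
```

Integer ticks avoid the rounding drift that floating-point seconds would accumulate over a long live stream. A packet that straddles the GOP boundary goes with the GOP in which it starts, so audio and video segment durations will differ by up to one audio packet.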


Okay. I did that. VLC confirms that my little single-GOP media files have an audio track, and my ears confirm that it is the sound of an open audio jack. Still no dice with YouTube's MPEG-DASH ingestion service though. Now what?


Half my kingdom for an error message of any sort....


Dammit. I just clicked ChatGPT's "Copy code" button again. In my defense, it's a really long switch-statement that I don't want to type. :-/


This MPD looks good, right? What am I missing?


I was hardcoding the segment duration and bandwidths. I'm calculating those now. No change. Hmm. I'm running out of ideas. Or maybe it's just a cruel ploy and MPEG-DASH for YouTube doesn't actually work at all?
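One plausible way to calculate those values, sketched in Python. The function name and inputs are hypothetical. The MPD bandwidth attribute is in bits per second, and using a segment's average bitrate for it is an approximation rather than the exact definition in the DASH spec.

```python
def segment_stats(sample_durations, timescale, size_bytes):
    """Return (duration_seconds, bits_per_second) for one media segment.

    sample_durations: per-sample durations in timescale ticks
    timescale:        ticks per second for the track
    size_bytes:       size of the segment file in bytes
    """
    ticks = sum(sample_durations)
    seconds = ticks / timescale
    bandwidth = round(size_bytes * 8 / seconds)
    return seconds, bandwidth
```

For example, 60 frames of 1500 ticks each at a 90000 timescale is exactly one second, so a 250000-byte segment works out to 2000000 bits per second.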


Instead of providing any sort of helpful error messages, they do at least provide a handy diagram of ISO BMFF, in case, you know, you're going to whip that up by hand or something. Thanks.


I had a Mac app a while back that let me inspect the structure of MPEG files. I need to find that.

Well, I found one such app (not the one I'm thinking of). "Only" $100. I'm not quite ready for that yet. Authored by @synalysis.

I was searching for Mac apps, and the app I was thinking about was an iOS app. Found it!

Quite ugly, but there's the Atom structure of my MPEG-DASH initialization segment. I wonder how much money the author got from that banner ad at the bottom. @hoolr

I only see one TRAK (for video), and no TRAK for audio. I didn't bother to put any audio data in the initialization segment MP4, which is probably why. So, let's try that....


Now I have two TRAKs. Hurray for that. Continuing on this journey, the initialization segment MP4 has the MVHD atom, but not the IODS or the MVEX atoms (as per that helpful diagram I mentioned earlier). Perhaps that's the issue. Or one of the issues.


I'm waiting for brew to update approximately a million (103) packages so that I can try ffprobe to (hopefully) verify that those atoms in fact do not exist in that file. It's compiling gcc right now. Ugh.
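While waiting on a build like that, the atom tree can also be listed with a few lines of Python, since every ISO BMFF box starts with a 4-byte big-endian size and a 4-byte type. This is a rough sketch, not a full parser: the walk_boxes name is made up, and the set of container boxes it descends into is a small assumed subset.

```python
import struct

# Boxes whose payload is just more boxes (an assumed subset, not exhaustive).
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl", b"mvex", b"moof", b"traf"}

def walk_boxes(data, start=0, end=None, depth=0):
    """Yield (depth, type, offset, size) for each box found in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:                      # 64-bit size follows the type
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:                    # box runs to the end of its parent
            size = end - pos
        if size < header or pos + size > end:
            break                          # malformed box, stop walking
        yield depth, box_type.decode("latin-1"), pos, size
        if box_type in CONTAINERS:
            yield from walk_boxes(data, pos + header, pos + size, depth + 1)
        pos += size
```

Usage, with a hypothetical file name:

```python
with open("init.mp4", "rb") as f:
    for depth, name, offset, size in walk_boxes(f.read()):
        print("  " * depth + name, size)
```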


Okay. Maybe I'm finally figuring this out now. "A series of these segments can be appended to an Initialization segment to produce a valid and complete multiplexed ISO BMFF stream." So, my full-and-complete MP4 media files (probably) aren't correct. They should instead be these SIDX + MOOF + MDAT things. Interesting.


Ha ha. I'm asking ChatGPT how to write out a SIDX atom with AVAssetWriter. Oh, that's easy: just use the mp4box command line tool. Right.

That $99 @synalysis is looking a bit more appealing at this point. You guys want to trade for @streamieapp? So, assuming that my MP4 files contain TRAFs in the TRAK, it seems like my only path forward is to manually parse them out to make this segmented media file.

OR! I wonder if I could just skip that entirely, and manually write the SIDX + MOOF + TRAF atoms with the already-encoded video frames that I have as input.
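To give a sense of what writing these atoms by hand involves, here is a Python sketch of a generic box writer plus a version-0 SIDX with a single reference. The field layout follows ISO/IEC 14496-12 as best I know it. The function names and the choice to mark the segment as starting with a type-1 SAP (a key frame) are assumptions. The MOOF and TRAF boxes are more involved and are not attempted here.

```python
import struct

def box(box_type: bytes, payload: bytes) -> bytes:
    """Wrap a payload in a box header: 4-byte big-endian size, 4-byte type."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def sidx(reference_id, timescale, earliest_pts, referenced_size, duration):
    """Build a version-0 sidx with one reference to the moof+mdat that follows.

    referenced_size is the byte size of that moof+mdat pair.
    earliest_pts and duration are in timescale ticks.
    """
    payload = struct.pack(">I", 0)                    # version 0, flags 0
    payload += struct.pack(">IIII", reference_id, timescale,
                           earliest_pts, 0)           # first_offset = 0
    payload += struct.pack(">HH", 0, 1)               # reserved, reference_count
    payload += struct.pack(">I", referenced_size & 0x7FFFFFFF)  # reference_type 0 (media)
    payload += struct.pack(">I", duration)            # subsegment_duration
    payload += struct.pack(">I", (1 << 31) | (1 << 28))  # starts_with_SAP=1, SAP_type=1
    return box(b"sidx", payload)
```

Version 0 stores earliest_presentation_time in 32 bits, which a 90000-tick timescale exhausts after roughly 13 hours. A long-running live stream would need the version-1 layout with 64-bit time and offset fields.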

Hmm.


"Does the TRAF atom typically contain a single video frame?" Apparently the answer is explicitly both yes and no.

"Each Track Box SHOULD contain a sample table, but its sample count MUST be zero." So, even if I can write out these fragmented MP4 files, my initialization segment (with a single video frame to make AVAssetWriter happy) is wrong according to the spec.

Time for dinner. To be continued.
