Android Extract Decode Encode Mux Audio

Question

Nov 26, 2015, 04:47 AM

android audio mediacodec multiplexing mediamuxer

Android Extract Decode Encode Mux Audio

Ich versuche, den Code in @ anzupass ExtractDecodeEditEncodeMuxTest.java, um Audio- und Videodaten von einem über das Gerät.capture.captureVideo von Cordova aufgenommenen MP3 zu extrahieren, das Audio zu dekodieren, die dekodierten Audio-Samples zu bearbeiten, das Audio zu kodieren und das Audio mit dem Video zurückzuspielen und erneut als MP3 zu speichern.

Mein erster Versuch besteht einfach darin, Audio zu extrahieren, zu decodieren, zu codieren und zu muxen, ohne zu versuchen, eines der Audio-Samples zu bearbeiten. Wenn ich das kann, bin ich mir ziemlich sicher, dass ich die decodierten Samples wie gewünscht bearbeiten kann. Ich muss das Video nicht bearbeiten, daher gehe ich davon aus, dass ich MediaExtractor einfach zum Extrahieren und Muxen der Videospur verwenden kann.

Das Problem, das ich habe, ist jedoch, dass ich den Audio-Dekodierungs- / Kodierungsprozess scheinbar nicht richtig ausführen kann. Was weiterhin passiert, ist, dass der Muxer die mp4 aus der extrahierten Videospur und der extrahierten -> decodierten -> codierten Audiospur erstellt. Während das Video gut abgespielt wird, beginnt das Audio mit einem kurzen Rauschstoß Einige Sekunden, in denen Audiodaten normal wiedergegeben werden (aber am Anfang des Videos), danach wird das restliche Video stummgeschaltet.

Einige der relevanten Felder:

private MediaFormat audioFormat;
private MediaFormat videoFormat;
private int videoTrackIndex = -1;
private int audioTrackIndex = -1;
private static final int MAX_BUFFER_SIZE = 256 * 1024;

// parameters for the audio encoder
private static final String OUTPUT_AUDIO_MIME_TYPE = "audio/mp4a-latm"; // Advanced Audio Coding
private static final int OUTPUT_AUDIO_CHANNEL_COUNT = 2; // Must match the input stream. not using this, getting from input format
private static final int OUTPUT_AUDIO_BIT_RATE = 128 * 1024;
private static final int OUTPUT_AUDIO_AAC_PROFILE = MediaCodecInfo.CodecProfileLevel.AACObjectHE; //not using this, getting from input format 
private static final int OUTPUT_AUDIO_SAMPLE_RATE_HZ = 44100; // Must match the input stream
private static final String TAG = "vvsLog";
private static final Boolean DEBUG = false;
private static final Boolean INFO = true;
/** How long to wait for the next buffer to become available. */
private static final int TIMEOUT_USEC = 10000;
private String videoPath;

Der Code zur Konfiguration von Decoder, Encoder und Muxer:

MediaCodecInfo audioCodecInfo = selectCodec(OUTPUT_AUDIO_MIME_TYPE);
    if (audioCodecInfo == null) {
        // Don't fail CTS if they don't have an AAC codec (not here, anyway).
        Log.e(TAG, "Unable to find an appropriate codec for " + OUTPUT_AUDIO_MIME_TYPE);
        return;
    }

    MediaExtractor videoExtractor = null;
    MediaExtractor audioExtractor = null;
    MediaCodec audioDecoder = null;
    MediaCodec audioEncoder = null;
    MediaMuxer muxer = null;

    try {

        /**
         * Video
         * just need to configure the extractor, no codec processing required
         */
        videoExtractor = createExtractor(originalAssetPath);
        String vidMimeStartsWith = "video/";
        int videoInputTrack = getAndSelectTrackIndex(videoExtractor, vidMimeStartsWith);
        videoFormat = videoExtractor.getTrackFormat(videoInputTrack);

        /**
         * Audio
         * needs an extractor plus an audio decoder and encoder
         */
        audioExtractor = createExtractor(originalAssetPath);
        String audMimeStartsWith = "audio/";
        int audioInputTrack = getAndSelectTrackIndex(audioExtractor, audMimeStartsWith);
        audioFormat = audioExtractor.getTrackFormat(audioInputTrack);
        audioFormat.setInteger(MediaFormat.KEY_SAMPLE_RATE,OUTPUT_AUDIO_SAMPLE_RATE_HZ);

        MediaFormat outputAudioFormat = MediaFormat.createAudioFormat(OUTPUT_AUDIO_MIME_TYPE,
                audioFormat.getInteger(MediaFormat.KEY_SAMPLE_RATE),
                audioFormat.getInteger(MediaFormat.KEY_CHANNEL_COUNT));
        outputAudioFormat.setInteger(MediaFormat.KEY_AAC_PROFILE, audioFormat.getInteger(MediaFormat.KEY_AAC_PROFILE));
        outputAudioFormat.setInteger(MediaFormat.KEY_BIT_RATE, OUTPUT_AUDIO_BIT_RATE);

        // Create a MediaCodec for the decoder, based on the extractor's format, configure and start it.
        audioDecoder = createAudioDecoder(audioFormat);
        // Create a MediaCodec for the desired codec, then configure it as an encoder and start it.
        audioEncoder = createAudioEncoder(audioCodecInfo, outputAudioFormat);

        //create muxer to overwrite original asset path
        muxer = createMuxer(originalAssetPath);

        //add the video and audio tracks
        /**
         * need to wait to add the audio track until after the first encoder output buffer is created
         * since the encoder changes the MediaFormat at that time
         * and the muxer needs the correct format, including the correct Coded Specific Data (CSD) ByteBuffer
         */

        doExtractDecodeEditEncodeMux(
                videoExtractor,
                audioExtractor,
                audioDecoder,
                audioEncoder,
                muxer);

    }

Das Monster doExtractDecodeEditEncodeMux Methode:

private void doExtractDecodeEditEncodeMux(
        MediaExtractor videoExtractor,
        MediaExtractor audioExtractor,
        MediaCodec audioDecoder,
        MediaCodec audioEncoder,
        MediaMuxer muxer) {

    ByteBuffer videoInputBuffer = ByteBuffer.allocate(MAX_BUFFER_SIZE);
    MediaCodec.BufferInfo videoBufferInfo = new MediaCodec.BufferInfo();

    ByteBuffer[] audioDecoderInputBuffers = null;
    ByteBuffer[] audioDecoderOutputBuffers = null;
    ByteBuffer[] audioEncoderInputBuffers = null;
    ByteBuffer[] audioEncoderOutputBuffers = null;
    MediaCodec.BufferInfo audioDecoderOutputBufferInfo = null;
    MediaCodec.BufferInfo audioEncoderOutputBufferInfo = null;

    audioDecoderInputBuffers = audioDecoder.getInputBuffers();
    audioDecoderOutputBuffers =  audioDecoder.getOutputBuffers();
    audioEncoderInputBuffers = audioEncoder.getInputBuffers();
    audioEncoderOutputBuffers = audioEncoder.getOutputBuffers();
    audioDecoderOutputBufferInfo = new MediaCodec.BufferInfo();
    audioEncoderOutputBufferInfo = new MediaCodec.BufferInfo();

    /**
     * sanity checks
     */
    //frames
    int videoExtractedFrameCount = 0;
    int audioExtractedFrameCount = 0;
    int audioDecodedFrameCount = 0;
    int audioEncodedFrameCount = 0;
    //times
    long lastPresentationTimeVideoExtractor = 0;
    long lastPresentationTimeAudioExtractor = 0;
    long lastPresentationTimeAudioDecoder = 0;
    long lastPresentationTimeAudioEncoder = 0;

    // We will get these from the decoders when notified of a format change.
    MediaFormat decoderOutputAudioFormat = null;
    // We will get these from the encoders when notified of a format change.
    MediaFormat encoderOutputAudioFormat = null;
    // We will determine these once we have the output format.
    int outputAudioTrack = -1;
    // Whether things are done on the video side.
    boolean videoExtractorDone = false;
    // Whether things are done on the audio side.
    boolean audioExtractorDone = false;
    boolean audioDecoderDone = false;
    boolean audioEncoderDone = false;
    // The audio decoder output buffer to process, -1 if none.
    int pendingAudioDecoderOutputBufferIndex = -1;

    boolean muxing = false;

    /**
     * need to wait to add the audio track until after the first encoder output buffer is created
     * since the encoder changes the MediaFormat at that time
     * and the muxer needs the correct format, including the correct Coded Specific Data (CSD) ByteBuffer
     * muxer.start();
     * muxing = true;
     */

    MediaMetadataRetriever retrieverTest = new MediaMetadataRetriever();
    retrieverTest.setDataSource(videoPath);
    String degreesStr = retrieverTest.extractMetadata(MediaMetadataRetriever.METADATA_KEY_VIDEO_ROTATION);
    if (degreesStr != null) {
        Integer degrees = Integer.parseInt(degreesStr);
        if (degrees >= 0) {
            muxer.setOrientationHint(degrees);
        }
    }

    while (!videoExtractorDone || !audioEncoderDone) {
        if (INFO) {
            Log.d(TAG, String.format("ex:%d at %d | de:%d at %d | en:%d at %d ",
                    audioExtractedFrameCount, lastPresentationTimeAudioExtractor,
                    audioDecodedFrameCount, lastPresentationTimeAudioDecoder,
                    audioEncodedFrameCount, lastPresentationTimeAudioEncoder
                    ));
        }
        /**
         * Extract and mux video
         */
        while (!videoExtractorDone && muxing) {

            try {
                videoBufferInfo.size = videoExtractor.readSampleData(videoInputBuffer, 0);
            } catch (Exception e) {
                e.printStackTrace();
            }

            if (videoBufferInfo.size < 0) {
                videoBufferInfo.size = 0;
                videoExtractorDone = true;
            } else {
                videoBufferInfo.presentationTimeUs = videoExtractor.getSampleTime();
                lastPresentationTimeVideoExtractor = videoBufferInfo.presentationTimeUs;
                        videoBufferInfo.flags = videoExtractor.getSampleFlags();
                muxer.writeSampleData(videoTrackIndex, videoInputBuffer, videoBufferInfo);
                videoExtractor.advance();
                videoExtractedFrameCount++;
            }
        }

        /**
         * Extract, decode, watermark, encode and mux audio
         */

        /** Extract audio from file and feed to decoder. **/
        while (!audioExtractorDone  && (encoderOutputAudioFormat == null || muxing)) {
            int decoderInputBufferIndex = audioDecoder.dequeueInputBuffer(TIMEOUT_USEC);
            if (decoderInputBufferIndex == MediaCodec.INFO_TRY_AGAIN_LATER) {
                break;
            }
            if (DEBUG) {
                Log.d(TAG, "audio decoder: returned input buffer: " + decoderInputBufferIndex);
            }
            ByteBuffer decoderInputBuffer = audioDecoderInputBuffers[decoderInputBufferIndex];
            int size = audioExtractor.readSampleData(decoderInputBuffer, 0);
            long presentationTime = audioExtractor.getSampleTime();
            lastPresentationTimeAudioExtractor = presentationTime;
            if (DEBUG) {
                Log.d(TAG, "audio extractor: returned buffer of size " + size);
                Log.d(TAG, "audio extractor: returned buffer for time " + presentationTime);
            }
            if (size >= 0) {
                audioDecoder.queueInputBuffer(
                        decoderInputBufferIndex,
                        0,
                        size,
                        presentationTime,
                        audioExtractor.getSampleFlags());
            }
            audioExtractorDone = !audioExtractor.advance();
            if (audioExtractorDone) {
                if (DEBUG) Log.d(TAG, "audio extractor: EOS");
                audioDecoder.queueInputBuffer(
                        decoderInputBufferIndex,
                        0,
                        0,
                        0,
                        MediaCodec.BUFFER_FLAG_END_OF_STREAM);
            }
            audioExtractedFrameCount++;
            // We extracted a frame, let's try something else next.
            break;
        }

        /**
         * Poll output frames from the audio decoder.
         * Do not poll if we already have a pending buffer to feed to the encoder.
         */
        while (!audioDecoderDone && pendingAudioDecoderOutputBufferIndex == -1 && (encoderOutputAudioFormat == null || muxing)) {
            int decoderOutputBufferIndex =
                    audioDecoder.dequeueOutputBuffer(
                            audioDecoderOutputBufferInfo, TIMEOUT_USEC);
            if (decoderOutputBufferIndex == MediaCodec.INFO_TRY_AGAIN_LATER) {
                if (DEBUG) Log.d(TAG, "no audio decoder output buffer");
                break;
            }
            if (decoderOutputBufferIndex == MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED) {
                if (DEBUG) Log.d(TAG, "audio decoder: output buffers changed");
                audioDecoderOutputBuffers = audioDecoder.getOutputBuffers();
                break;
            }
            if (decoderOutputBufferIndex == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
                decoderOutputAudioFormat = audioDecoder.getOutputFormat();
                if (DEBUG) {
                    Log.d(TAG, "audio decoder: output format changed: "
                            + decoderOutputAudioFormat);
                }
                break;
            }
            if (DEBUG) {
                Log.d(TAG, "audio decoder: returned output buffer: "
                        + decoderOutputBufferIndex);
            }
            if (DEBUG) {
                Log.d(TAG, "audio decoder: returned buffer of size "
                        + audioDecoderOutputBufferInfo.size);
            }
            ByteBuffer decoderOutputBuffer =
                    audioDecoderOutputBuffers[decoderOutputBufferIndex];
            if ((audioDecoderOutputBufferInfo.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG)
                    != 0) {
                if (DEBUG) Log.d(TAG, "audio decoder: codec config buffer");
                audioDecoder.releaseOutputBuffer(decoderOutputBufferIndex, false);
                break;
            }
            if (DEBUG) {
                Log.d(TAG, "audio decoder: returned buffer for time "
                        + audioDecoderOutputBufferInfo.presentationTimeUs);
            }
            if (DEBUG) {
                Log.d(TAG, "audio decoder: output buffer is now pending: "
                        + pendingAudioDecoderOutputBufferIndex);
            }
            pendingAudioDecoderOutputBufferIndex = decoderOutputBufferIndex;
            audioDecodedFrameCount++;
            // We extracted a pending frame, let's try something else next.
            break;
        }

        // Feed the pending decoded audio buffer to the audio encoder.
        while (pendingAudioDecoderOutputBufferIndex != -1) {
            if (DEBUG) {
                Log.d(TAG, "audio decoder: attempting to process pending buffer: "
                        + pendingAudioDecoderOutputBufferIndex);
            }
            int encoderInputBufferIndex = audioEncoder.dequeueInputBuffer(TIMEOUT_USEC);
            if (encoderInputBufferIndex == MediaCodec.INFO_TRY_AGAIN_LATER) {
                if (DEBUG) Log.d(TAG, "no audio encoder input buffer");
                break;
            }
            if (DEBUG) {
                Log.d(TAG, "audio encoder: returned input buffer: " + encoderInputBufferIndex);
            }
            ByteBuffer encoderInputBuffer = audioEncoderInputBuffers[encoderInputBufferIndex];
            int size = audioDecoderOutputBufferInfo.size;
            long presentationTime = audioDecoderOutputBufferInfo.presentationTimeUs;
            lastPresentationTimeAudioDecoder = presentationTime;
            if (DEBUG) {
                Log.d(TAG, "audio decoder: processing pending buffer: "
                        + pendingAudioDecoderOutputBufferIndex);
            }
            if (DEBUG) {
                Log.d(TAG, "audio decoder: pending buffer of size " + size);
                Log.d(TAG, "audio decoder: pending buffer for time " + presentationTime);
            }
            if (size >= 0) {
                ByteBuffer decoderOutputBuffer =
                        audioDecoderOutputBuffers[pendingAudioDecoderOutputBufferIndex]
                                .duplicate();
                decoderOutputBuffer.position(audioDecoderOutputBufferInfo.offset);
                decoderOutputBuffer.limit(audioDecoderOutputBufferInfo.offset + size);
                encoderInputBuffer.position(0);
                encoderInputBuffer.put(decoderOutputBuffer);
                audioEncoder.queueInputBuffer(
                        encoderInputBufferIndex,
                        0,
                        size,
                        presentationTime,
                        audioDecoderOutputBufferInfo.flags);
            }
            audioDecoder.releaseOutputBuffer(pendingAudioDecoderOutputBufferIndex, false);
            pendingAudioDecoderOutputBufferIndex = -1;
            if ((audioDecoderOutputBufferInfo.flags
                    & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                if (DEBUG) Log.d(TAG, "audio decoder: EOS");
                audioDecoderDone = true;
            }
            // We enqueued a pending frame, let's try something else next.
            break;
        }

        // Poll frames from the audio encoder and send them to the muxer.
        while (!audioEncoderDone && (encoderOutputAudioFormat == null || muxing)) {
            int encoderOutputBufferIndex = audioEncoder.dequeueOutputBuffer(
                    audioEncoderOutputBufferInfo, TIMEOUT_USEC);
            if (encoderOutputBufferIndex == MediaCodec.INFO_TRY_AGAIN_LATER) {
                if (DEBUG) Log.d(TAG, "no audio encoder output buffer");
                break;
            }
            if (encoderOutputBufferIndex == MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED) {
                if (DEBUG) Log.d(TAG, "audio encoder: output buffers changed");
                audioEncoderOutputBuffers = audioEncoder.getOutputBuffers();
                break;
            }
            if (encoderOutputBufferIndex == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
                encoderOutputAudioFormat = audioEncoder.getOutputFormat();
                if (DEBUG) {
                    Log.d(TAG, "audio encoder: output format changed");
                }
                if (outputAudioTrack >= 0) {
                    Log.e(TAG,"audio encoder changed its output format again?");
                }
                break;
            }
            if (DEBUG) {
                Log.d(TAG, "audio encoder: returned output buffer: "
                        + encoderOutputBufferIndex);
                Log.d(TAG, "audio encoder: returned buffer of size "
                        + audioEncoderOutputBufferInfo.size);
            }
            ByteBuffer encoderOutputBuffer =
                    audioEncoderOutputBuffers[encoderOutputBufferIndex];
            if ((audioEncoderOutputBufferInfo.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG)
                    != 0) {
                if (DEBUG) Log.d(TAG, "audio encoder: codec config buffer");
                // Simply ignore codec config buffers.
                audioEncoder.releaseOutputBuffer(encoderOutputBufferIndex, false);
                break;
            }
            if (DEBUG) {
                Log.d(TAG, "audio encoder: returned buffer for time "
                        + audioEncoderOutputBufferInfo.presentationTimeUs);
            }
            if (audioEncoderOutputBufferInfo.size != 0) {
                lastPresentationTimeAudioEncoder = audioEncoderOutputBufferInfo.presentationTimeUs;
                muxer.writeSampleData(
                        audioTrackIndex, encoderOutputBuffer, audioEncoderOutputBufferInfo);
            }
            if ((audioEncoderOutputBufferInfo.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM)
                    != 0) {
                if (DEBUG) Log.d(TAG, "audio encoder: EOS");
                audioEncoderDone = true;
            }
            audioEncoder.releaseOutputBuffer(encoderOutputBufferIndex, false);
            audioEncodedFrameCount++;
            // We enqueued an encoded frame, let's try something else next.
            break;
        }

        if (!muxing && (encoderOutputAudioFormat != null)) {

            Log.d(TAG, "muxer: adding video track.");
            videoTrackIndex = muxer.addTrack(videoFormat);

            Log.d(TAG, "muxer: adding audio track.");
            audioTrackIndex = muxer.addTrack(encoderOutputAudioFormat);

            Log.d(TAG, "muxer: starting");
            muxer.start();
            muxing = true;
        }
    }
    /**
     * Done processing audio and video
     */
    Log.d(TAG,"encoded and decoded audio frame counts should match. decoded:"+audioDecodedFrameCount+" encoded:"+audioEncodedFrameCount);

    Log.d(TAG,"decoded frame count should be less than extracted frame coun. decoded:"+audioDecodedFrameCount+" extracted:"+audioExtractedFrameCount);
    Log.d(TAG,"no audio frame should be pending "+pendingAudioDecoderOutputBufferIndex);

    PluginResult result = new PluginResult(PluginResult.Status.OK, videoPath);
    result.setKeepCallback(false);
    callbackContext.sendPluginResult(result);

}

Ich sehe diesen ACodec-Fehler für die ersten mehreren hundert extrahierten Audio-Frames:

11-25 20:49:58.497   9807-13101/com.vvs.VVS430011 E/ACodec﹕ OMXCodec::onEvent, OMX_ErrorStreamCorrupt
11-25 20:49:58.497   9807-13101/com.vvs.VVS430011 W/AHierarchicalStateMachine﹕ Warning message AMessage(what = 'omx ', target = 8) = {
    int32_t type = 0
    int32_t node = 7115
    int32_t event = 1
    int32_t data1 = -2147479541
    int32_t data2 = 0
    } unhandled in root state.

Hier ist ein Pastebin des gesamten logcat, das einige Protokolldateien für die Überprüfung der Datenintegrität im Format von: @ enthä

D/vvsLog﹕ ex:{extracted frame #} at {presentationTime} | de:{decoded frame #} at {presentationTime} | en:{encoded frame #} at {presentationTime}

Die presentationTime von codierten und decodierten Frames scheint zu schnell zu werden, während diese OMX_ErrorStreamCorrupt-Nachrichten angezeigt werden. Wenn sie anhalten, scheint die presentationTime für die decodierten und codierten Frames wieder "normal" zu sein und scheint auch mit dem tatsächlichen "guten" Audio übereinzustimmen, das ich am Anfang des Videos höre - das "gute" Audio von Ende der ursprünglichen Audiospur.

Ich hoffe, dass jemand mit viel mehr Erfahrung mit diesen Android-Multimedia-APIs auf niedriger Ebene, als ich habe, mir helfen kann, zu verstehen, warum dies geschieht. Denken Sie daran, ich bin mir bewusst, dass dieser Code nicht optimiert ist, in separaten Threads usw. ausgeführt wird. - Ich werde überarbeiten, um die Dinge zu bereinigen, sobald ich ein funktionierendes Beispiel für den grundlegenden Befehl extract-> decode-> edit-> encode- habe. > Mux-Prozess.

Vielen Dank