Series:
We can already render video frames on Android, but the sound is still missing. In this post we add an audio playback module to the player.
So that audio and video can be decoded and played independently, we need to refactor the previous code and decouple reading the media stream from decoding it:
MediaReader reads AVPackets from the file stream and hands them to VideoStreamDecoder and AudioStreamDecoder for video and audio decoding. We also add thread safety to MediaReader so that video and audio can each be decoded on their own worker thread.
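A minimal sketch of what that decoupling can look like follows; this is an illustration rather than the exact code of the series, and it assumes MediaReader keeps an opened AVFormatContext plus a mutex and per-stream packet queues as members:
#include <map>
#include <mutex>
#include <queue>

// Assumed members of MediaReader:
//   AVFormatContext *mFormatContext;                    // the opened input
//   std::mutex mMutex;                                  // guards av_read_frame
//   std::map<int, std::queue<AVPacket *>> mPacketQueue; // packets buffered per stream
AVPacket *MediaReader::NextPacket(int streamIndex) {
    std::lock_guard<std::mutex> lock(mMutex); // only one thread reads at a time
    // hand out a previously buffered packet for this stream, if any
    if (!mPacketQueue[streamIndex].empty()) {
        AVPacket *packet = mPacketQueue[streamIndex].front();
        mPacketQueue[streamIndex].pop();
        return packet;
    }
    // otherwise keep reading, buffering packets that belong to other streams
    while (true) {
        AVPacket *packet = av_packet_alloc();
        if (av_read_frame(mFormatContext, packet) < 0) {
            av_packet_free(&packet); // end of stream or read error
            return NULL;
        }
        if (packet->stream_index == streamIndex) {
            return packet;
        }
        mPacketQueue[packet->stream_index].push(packet);
    }
}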
Planar vs. packed audio
The data field of a decoded AVFrame holds either video pixel data or raw PCM audio data, and the linesize field holds either the aligned length of a picture line or the size of an audio plane:
/**
* For video, size in bytes of each picture line.
* For audio, size in bytes of each plane.
*
* For audio, only linesize[0] may be set. For planar audio, each channel
* plane must be the same size.
*
* For video the linesizes should be multiples of the CPUs alignment
* preference, this is 16 or 32 for modern desktop CPUs.
* Some code requires such alignment other code can be slower without
* correct alignment, for yet other it makes no difference.
*
* @note The linesize may be larger than the size of usable data -- there
* may be extra padding present for performance reasons.
*/
int linesize[AV_NUM_DATA_POINTERS];
The video side was covered in an earlier post. For audio, note that only linesize[0] is set, and if there are multiple planes each plane has the same size.
To understand this plane size, you first need to know the two layouts in which audio data can be stored: planar and packed. Take common stereo audio as an example:
In the planar layout the left and right channels are stored separately, the left channel in data[0] and the right channel in data[1]; each of those buffers has size linesize[0].
In the packed layout the samples are interleaved as LRLRLR... in data[0], and that single buffer has size linesize[0].
AVSampleFormat enumerates the sample formats; the ones with a P suffix are planar:
AV_SAMPLE_FMT_U8P, ///< unsigned 8 bits, planar
AV_SAMPLE_FMT_S16P, ///< signed 16 bits, planar
AV_SAMPLE_FMT_S32P, ///< signed 32 bits, planar
AV_SAMPLE_FMT_FLTP, ///< float, planar
AV_SAMPLE_FMT_DBLP, ///< double, planar
The formats without a P suffix are packed:
AV_SAMPLE_FMT_U8, ///< unsigned 8 bits
AV_SAMPLE_FMT_S16, ///< signed 16 bits
AV_SAMPLE_FMT_S32, ///< signed 32 bits
AV_SAMPLE_FMT_FLT, ///< float
AV_SAMPLE_FMT_DBL, ///< double
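To make the difference concrete, here is a small sketch of how the two layouts are traversed for stereo float audio (AV_SAMPLE_FMT_FLTP vs AV_SAMPLE_FMT_FLT); ProcessSample is a hypothetical consumer:
// frame is a decoded AVFrame containing stereo float samples
if (av_sample_fmt_is_planar((AVSampleFormat) frame->format)) {
    // planar: one plane per channel, left in data[0], right in data[1]
    float *left = (float *) frame->data[0];
    float *right = (float *) frame->data[1];
    for (int i = 0; i < frame->nb_samples; i++) {
        ProcessSample(left[i], right[i]);
    }
} else {
    // packed: samples interleaved as LRLRLR... in data[0]
    float *samples = (float *) frame->data[0];
    for (int i = 0; i < frame->nb_samples; i++) {
        ProcessSample(samples[2 * i], samples[2 * i + 1]);
    }
}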
The actual length of the audio data
There is a pitfall here that the note in the header spells out clearly: the size given by linesize may be larger than the actual audio data, because extra padding may be present:
* @note The linesize may be larger than the size of usable data -- there
* may be extra padding present for performance reasons.
So the actual length of the audio data has to be computed from the audio parameters:
int channelCount = audioStreamDecoder.GetChannelCount();
int bytePerSample = audioStreamDecoder.GetBytePerSample();
int size = frame->nb_samples * channelCount * bytePerSample;
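The two getters are helpers of this demo's AudioStreamDecoder; one plausible implementation (a sketch, assuming the mCodecContext and mSampleFormat members used elsewhere in this post) is:
int AudioStreamDecoder::GetChannelCount() {
    return mCodecContext->channels;
}

int AudioStreamDecoder::GetBytePerSample() {
    // size in bytes of one sample of one channel in the target format
    return av_get_bytes_per_sample(mSampleFormat);
}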
Audio format conversion
The earlier demo already plays video with OpenGL, and the audio can be handed to OpenSL ES. I have written about the details before in《OpenSL ES 學(xué)習(xí)筆記》(my OpenSL ES study notes), so I won't repeat them here and will simply reuse that code.
However, OpenSL ES only supports a handful of packed PCM formats:
#define SL_PCMSAMPLEFORMAT_FIXED_8 ((SLuint16) 0x0008)
#define SL_PCMSAMPLEFORMAT_FIXED_16 ((SLuint16) 0x0010)
#define SL_PCMSAMPLEFORMAT_FIXED_20 ((SLuint16) 0x0014)
#define SL_PCMSAMPLEFORMAT_FIXED_24 ((SLuint16) 0x0018)
#define SL_PCMSAMPLEFORMAT_FIXED_28 ((SLuint16) 0x001C)
#define SL_PCMSAMPLEFORMAT_FIXED_32 ((SLuint16) 0x0020)
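For reference, the OpenSL ES PCM descriptor matching the target format we pick below would look roughly like this (a sketch assuming 16-bit packed stereo at 44.1 kHz; the real setup lives in the OpenSL ES post mentioned above):
SLDataFormat_PCM formatPcm = {
    SL_DATAFORMAT_PCM,
    2,                           // channel count
    SL_SAMPLINGRATE_44_1,        // sample rate, in milliHz
    SL_PCMSAMPLEFORMAT_FIXED_16, // bits per sample
    SL_PCMSAMPLEFORMAT_FIXED_16, // container size
    SL_SPEAKER_FRONT_LEFT | SL_SPEAKER_FRONT_RIGHT, // channel mask
    SL_BYTEORDER_LITTLEENDIAN    // endianness
};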
So we specify AV_SAMPLE_FMT_S16 as AudioStreamDecoder's target format, and if the source audio format is different we convert it:
audioStreamDecoder.Init(reader, audioIndex, AVSampleFormat::AV_SAMPLE_FMT_S16);
bool AudioStreamDecoder::Init(MediaReader *reader, int streamIndex, AVSampleFormat sampleFormat) {
...
bool result = StreamDecoder::Init(reader, streamIndex);
if (sampleFormat == AVSampleFormat::AV_SAMPLE_FMT_NONE) {
mSampleFormat = mCodecContext->sample_fmt;
} else {
mSampleFormat = sampleFormat;
}
if (mSampleFormat != mCodecContext->sample_fmt) {
mSwrContext = swr_alloc_set_opts(
NULL,
mCodecContext->channel_layout,
mSampleFormat,
mCodecContext->sample_rate,
mCodecContext->channel_layout,
mCodecContext->sample_fmt,
mCodecContext->sample_rate,
0,
NULL);
swr_init(mSwrContext);
// Although swr_alloc_set_opts has already been given these parameters,
// the receiving AVFrame will get no data unless they are set on it too:
// swr_convert_frame allocates the output buffer via av_frame_get_buffer,
// and av_frame_get_buffer needs these fields to compute the buffer size.
mSwrFrame = av_frame_alloc();
mSwrFrame->channel_layout = mCodecContext->channel_layout;
mSwrFrame->sample_rate = mCodecContext->sample_rate;
mSwrFrame->format = mSampleFormat;
}
return result;
}
AVFrame *AudioStreamDecoder::NextFrame() {
AVFrame *frame = StreamDecoder::NextFrame();
if (NULL == frame) {
return NULL;
}
if (NULL == mSwrContext) {
return frame;
}
swr_convert_frame(mSwrContext, mSwrFrame, frame);
return mSwrFrame;
}
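The audio worker thread then just keeps pulling converted frames. A hedged sketch of that loop, where EnqueuePcm stands in for whatever sink actually plays the data (e.g. an OpenSL ES buffer queue) and mStopped is a hypothetical stop flag:
while (!mStopped) {
    AVFrame *frame = audioStreamDecoder.NextFrame();
    if (NULL == frame) {
        break; // stream finished
    }
    // compute the real payload size as shown earlier
    int size = frame->nb_samples
            * audioStreamDecoder.GetChannelCount()
            * audioStreamDecoder.GetBytePerSample();
    EnqueuePcm(frame->data[0], size); // packed S16: all samples live in data[0]
}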
The conversion itself is done with swr_convert_frame:
int swr_convert_frame(SwrContext *swr,      // conversion context
                      AVFrame *output,      // the converted data is written into this AVFrame
                      const AVFrame *input  // the original input AVFrame
);
This function requires both the input and output AVFrame to have channel_layout, sample_rate and format set; it then calls av_frame_get_buffer to allocate the output buffer:
/**
* ...
*
* Input and output AVFrames must have channel_layout, sample_rate and format set.
*
* If the output AVFrame does not have the data pointers allocated the nb_samples
* field will be set using av_frame_get_buffer()
* is called to allocate the frame.
* ...
*/
int swr_convert_frame(SwrContext *swr,
AVFrame *output, const AVFrame *input);
SwrContext is the conversion context. It is created with swr_alloc_set_opts and swr_init, which take the channel_layout, sample_rate and format of both the source and the target audio:
struct SwrContext *swr_alloc_set_opts(struct SwrContext *s,
int64_t out_ch_layout, enum AVSampleFormat out_sample_fmt, int out_sample_rate,
int64_t in_ch_layout, enum AVSampleFormat in_sample_fmt, int in_sample_rate,
int log_offset, void *log_ctx);
int swr_init(struct SwrContext *s);
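The matching cleanup is not shown above, but would plausibly look like this (a sketch assuming the members used in Init):
AudioStreamDecoder::~AudioStreamDecoder() {
    if (NULL != mSwrContext) {
        swr_free(&mSwrContext); // frees the context and NULLs the pointer
    }
    if (NULL != mSwrFrame) {
        av_frame_free(&mSwrFrame);
    }
}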
Video format conversion
In the previous demo we simply reported an error if the video format was not AV_PIX_FMT_YUV420P. Here, following the audio example, if the source format is not AV_PIX_FMT_YUV420P we convert it with sws_scale:
bool VideoStreamDecoder::Init(MediaReader *reader, int streamIndex, AVPixelFormat pixelFormat) {
...
bool result = StreamDecoder::Init(reader, streamIndex);
if (AVPixelFormat::AV_PIX_FMT_NONE == pixelFormat) {
mPixelFormat = mCodecContext->pix_fmt;
} else {
mPixelFormat = pixelFormat;
}
if (mPixelFormat != mCodecContext->pix_fmt) {
int width = mCodecContext->width;
int height = mCodecContext->height;
mSwrFrame = av_frame_alloc();
// Option 1: allocate the buffer with av_frame_get_buffer; av_frame_free releases it automatically
mSwrFrame->width = width;
mSwrFrame->height = height;
mSwrFrame->format = mPixelFormat;
av_frame_get_buffer(mSwrFrame, 0);
// Option 2: attach external storage with av_image_fill_arrays; we must create and release it ourselves with av_malloc/av_free
// unsigned char* buffer = (unsigned char *)av_malloc(
// av_image_get_buffer_size(mPixelFormat, width, height, 16)
// );
// av_image_fill_arrays(mSwrFrame->data, mSwrFrame->linesize, buffer, mPixelFormat, width, height, 16);
mSwsContext = sws_getContext(
mCodecContext->width, mCodecContext->height, mCodecContext->pix_fmt,
width, height, mPixelFormat, SWS_BICUBIC,
NULL, NULL, NULL
);
}
return result;
}
AVFrame *VideoStreamDecoder::NextFrame() {
AVFrame *frame = StreamDecoder::NextFrame();
if (NULL == frame) {
return NULL;
}
if (NULL == mSwsContext) {
return frame;
}
sws_scale(mSwsContext, frame->data,
frame->linesize, 0, mCodecContext->height,
mSwrFrame->data, mSwrFrame->linesize);
return mSwrFrame;
}
Despite its name, sws_scale not only scales but also converts the pixel format; the conversion parameters are supplied by the SwsContext:
struct SwsContext *sws_getContext(
int srcW, // source image width
int srcH, // source image height
enum AVPixelFormat srcFormat, // source image format
int dstW, // destination image width
int dstH, // destination image height
enum AVPixelFormat dstFormat, // destination image format
int flags, // scaling algorithm; can be ignored for now
SwsFilter *srcFilter, // can be ignored for now
SwsFilter *dstFilter, // can be ignored for now
const double *param // can be ignored for now
);
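As on the audio side, the video decoder needs matching cleanup; a sketch assuming the members used in Init above:
VideoStreamDecoder::~VideoStreamDecoder() {
    if (NULL != mSwsContext) {
        sws_freeContext(mSwsContext);
        mSwsContext = NULL;
    }
    if (NULL != mSwrFrame) {
        av_frame_free(&mSwrFrame);
    }
}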
sws_scale supports slice-based conversion: you can convert the whole picture in one call as our demo does, or cut the picture into several slices and convert them separately, which makes it easy to use multiple threads to speed up the conversion:
int sws_scale(
struct SwsContext *c, // conversion context
const uint8_t *const srcSlice[], // pixel data of the source slice, i.e. the source AVFrame's data field
const int srcStride[], // strides of the source slice, i.e. the source AVFrame's linesize field
int srcSliceY, // Y coordinate at which the slice starts, used to locate it in the destination image
int srcSliceH, // number of rows in the slice, used to locate it in the destination image
uint8_t *const dst[], // storage for the converted image, i.e. the destination AVFrame's data field
const int dstStride[] // storage for the converted strides, i.e. the destination AVFrame's linesize field
);
srcSlice and srcStride hold the image data of one region of the source picture; srcSliceY and srcSliceH tell the scaler which rows that region covers, so it can compute the offset at which the converted result is stored in dst and dstStride.
For example, the following code converts a complete picture in two halves, top and bottom:
int halfHeight = mCodecContext->height / 2;
// convert the top half of the picture
uint8_t *dataTop[AV_NUM_DATA_POINTERS] = {
frame->data[0],
frame->data[1],
frame->data[2]
};
sws_scale(mSwsContext, dataTop,
frame->linesize, 0,
halfHeight,
mSwrFrame->data, mSwrFrame->linesize);
// convert the bottom half of the picture; note that in 4:2:0 formats the
// chroma planes have only half as many rows as the luma plane, so their
// offsets use halfHeight / 2 rather than halfHeight
uint8_t *dataBottom[AV_NUM_DATA_POINTERS] = {
frame->data[0] + (frame->linesize[0] * halfHeight),
frame->data[1] + (frame->linesize[1] * halfHeight / 2),
frame->data[2] + (frame->linesize[2] * halfHeight / 2),
};
sws_scale(mSwsContext, dataBottom,
frame->linesize, halfHeight,
mCodecContext->height - halfHeight,
mSwrFrame->data, mSwrFrame->linesize);
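If you actually want the two halves on different threads, note that a single SwsContext is not thread-safe; each thread needs its own context created with the same parameters. A sketch, where ctxTop and ctxBottom are assumed to be two such contexts:
#include <thread>

std::thread top([&] {
    sws_scale(ctxTop, dataTop, frame->linesize, 0, halfHeight,
            mSwrFrame->data, mSwrFrame->linesize);
});
std::thread bottom([&] {
    sws_scale(ctxBottom, dataBottom, frame->linesize, halfHeight,
            mCodecContext->height - halfHeight,
            mSwrFrame->data, mSwrFrame->linesize);
});
top.join();
bottom.join();
// the two calls write disjoint rows of mSwrFrame, so no further locking is needed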
AVFrame memory management
We created a new AVFrame to receive the converted picture:
mSwrFrame = av_frame_alloc();
// Option 1: allocate the buffer with av_frame_get_buffer; av_frame_free releases it automatically
mSwrFrame->width = width;
mSwrFrame->height = height;
mSwrFrame->format = mPixelFormat;
av_frame_get_buffer(mSwrFrame, 0);
// Option 2: attach external storage with av_image_fill_arrays; we must create and release the buffer ourselves with av_malloc/av_free
// int bufferSize = av_image_get_buffer_size(mPixelFormat, width, height, 16);
// unsigned char* buffer = (unsigned char *)av_malloc(bufferSize);
// av_image_fill_arrays(mSwrFrame->data, mSwrFrame->linesize, buffer, mPixelFormat, width, height, 16);
An AVFrame created by av_frame_alloc is only an empty shell; we have to supply the memory that actually stores the pixel data and strides. As shown above there are two ways:
1. Allocate the storage with av_frame_get_buffer; the memory behind the data pointers is actually provided by buf[0]->data:
LOGD("mSwrFrame --> buf : 0x%X~0x%X, data[0]: 0x%X, data[1]: 0x%X, data[2]: 0x%X",
mSwrFrame->buf[0]->data,
mSwrFrame->buf[0]->data + mSwrFrame->buf[0]->size,
mSwrFrame->data[0],
mSwrFrame->data[1],
mSwrFrame->data[2]
);
// mSwrFrame --> buf : 0x2E6E8AC0~0x2E753F40, data[0]: 0x2E6E8AC0, data[1]: 0x2E7302E0, data[2]: 0x2E742100
2. Attach external storage with av_image_fill_arrays; the data pointers then point into the external buffer we supplied, and the buf member stays NULL:
LOGD("mSwrFrame --> buffer : 0x%X~0x%X, buf : 0x%X, data[0]: 0x%X, data[1]: 0x%X, data[2]: 0x%X",
buffer,
buffer + bufferSize,
mSwrFrame->buf[0],
mSwrFrame->data[0],
mSwrFrame->data[1],
mSwrFrame->data[2]
);
// FFmpegDemo: mSwrFrame --> buffer : 0x2DAE4DC0~0x2DB4D5C0, buf : 0x0, data[0]: 0x2DAE4DC0, data[1]: 0x2DB2A780, data[2]: 0x2DB3BEA0
Internally, av_frame_free releases the memory referenced by the frame's buf; for the data pointers it merely resets them to 0. So storage allocated with av_frame_get_buffer is released automatically, while an external buffer attached with av_image_fill_arrays has to be released manually with av_free.
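Put side by side, the two release paths look like this (a sketch; buffer is the external memory from option 2):
// Option 1: buf[0] owns the storage, so this releases everything
av_frame_free(&mSwrFrame);

// Option 2: the frame never owned buffer, so it must be released separately
av_frame_free(&mSwrFrame); // only resets the frame's data pointers
av_free(buffer);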
align
You may also have noticed that av_image_get_buffer_size and av_image_fill_arrays are both passed an align of 16. This is the linesize byte alignment discussed earlier: the data is padded so that linesize becomes a multiple of 16 or 32:
@param align the value used in src for linesize alignment
If you pass 0 here the fill fails, and with 1 no padding is added, so the linesize can end up different from the one the decoder actually uses and the picture comes out corrupted.
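The effect is easy to observe with av_image_get_buffer_size; for an odd-width YUV420P picture the aligned buffer comes out larger (a sketch, the exact numbers depend on the format):
int size1 = av_image_get_buffer_size(AV_PIX_FMT_YUV420P, 1918, 1080, 1);
int size16 = av_image_get_buffer_size(AV_PIX_FMT_YUV420P, 1918, 1080, 16);
// size16 >= size1 because every row is padded up to a multiple of 16 bytes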
av_frame_get_buffer is friendlier: it recommends that you pass 0 and let it choose the right alignment for the current CPU by itself:
* @param align Required buffer size alignment. If equal to 0, alignment will be
* chosen automatically for the current CPU. It is highly
* recommended to pass 0 here unless you know what you are doing.
Complete code
The complete demo code is available on Github; interested readers can download it and take a look.