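MP3 file format:
MP3 is short for MPEG Audio Layer 3, an efficient audio coding scheme. It compresses audio at a high ratio into a much smaller file with the .mp3 extension while largely preserving the quality of the original. MP3 is part of the ISO/MPEG standard, which describes audio compression using high-performance perceptual coding. The standard has been revised over time in pursuit of "high quality, small size" and defines three audio coding schemes: Layer 1, Layer 2, and Layer 3. Layer 3 reaches compression ratios of about 10:1 to 12:1: roughly 1 MB of MP3 plays for 1 minute, whereas 1 minute of CD-quality WAV (44100 Hz, 16 bit, two channels, 60 seconds) takes about 10 MB. By that estimate, a 650 MB disc of MP3 files plays for more than 10 hours, while an audio CD of the same capacity plays for about 70 minutes.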
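The size figures above can be checked with a little arithmetic. The following is a minimal sketch; the function names `pcm_bytes` and `minutes_on_disc` are made up for this illustration and do not come from any library.

```cpp
#include <cstdint>

// Size in bytes of uncompressed PCM audio with the given parameters.
constexpr std::uint64_t pcm_bytes(std::uint32_t sample_rate,
                                  std::uint32_t bits_per_sample,
                                  std::uint32_t channels,
                                  std::uint32_t seconds)
{
    return static_cast<std::uint64_t>(sample_rate) * (bits_per_sample / 8)
           * channels * seconds;
}

// Playback time in minutes for a disc of disc_mb megabytes when the audio
// consumes mb_per_minute megabytes per minute.
constexpr double minutes_on_disc(double disc_mb, double mb_per_minute)
{
    return disc_mb / mb_per_minute;
}
```

`pcm_bytes(44100, 16, 2, 60)` gives 10,584,000 bytes, which is the "about 10 MB per minute" quoted for WAV, and `minutes_on_disc(650, 1.0)` gives 650 minutes, a little under 11 hours.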
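File structure:
An MP3 file consists of roughly three parts: TAG_V2, the frame set, and TAG_V1.
TAG_V2 (the ID3v2 tag) usually sits at the start of the file and holds information such as title, artist, and album. Its length is not fixed; a dedicated header indicates it.
The frame set usually sits in the middle of the file, immediately after TAG_V2. It is a series of audio frames, each made up of a 4-byte frame header followed by variable-length compressed media data.
TAG_V1 (the ID3v1 tag) usually sits at the end of the file and holds information such as title, artist, and album. Its length is fixed at 128 bytes.
Strictly speaking, only the frame set is required, so to keep this article focused I only give a basic introduction to the frame information here.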
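To locate the frame set in a file, a reader first has to skip the tags. The sketch below shows one way to detect them in a buffer that holds the whole file. It relies on ID3 details the text does not spell out, so treat them as assumptions: an ID3v2 tag starts with the bytes "ID3" and a 10-byte header whose last four bytes encode the payload size with 7 bits per byte, and an ID3v1 tag starts with "TAG" in the last 128 bytes. The function names are hypothetical, and the optional ID3v2 footer is ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Total size in bytes of an ID3v2 tag at the start of buf (10-byte header
// plus payload), or 0 if buf does not start with a valid tag header.
std::size_t id3v2_tag_size(const std::uint8_t* buf, std::size_t len)
{
    if (len < 10 || std::memcmp(buf, "ID3", 3) != 0)
        return 0;
    // Bytes 6..9 hold the payload size; the top bit of each byte must be 0.
    for (int i = 6; i < 10; ++i)
        if (buf[i] & 0x80)
            return 0;
    const std::size_t payload = (static_cast<std::size_t>(buf[6]) << 21)
                              | (static_cast<std::size_t>(buf[7]) << 14)
                              | (static_cast<std::size_t>(buf[8]) << 7)
                              |  static_cast<std::size_t>(buf[9]);
    return 10 + payload;
}

// True if the last 128 bytes of buf start with the ID3v1 marker "TAG".
bool has_id3v1_tag(const std::uint8_t* buf, std::size_t len)
{
    return len >= 128 && std::memcmp(buf + len - 128, "TAG", 3) == 0;
}
```

With these two helpers the frame set would span from offset `id3v2_tag_size(...)` to `len - 128` when an ID3v1 tag is present, or to `len` otherwise.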
Frame structure:
Each MP3 audio frame has two parts: a 4-byte frame header and variable-length compressed media data. The table below lists the meaning of each bit of the header:
--------------------------------------------------------------------------------------------------------
| Length (bits) | Position (bits) | Description
--------------------------------------------------------------------------------------------------------
| 11 | 31~21 | Frame sync, all 11 bits set to 1
| 2 | 20~19 | MPEG audio version ID; 11 means MPEG-1, the usual value for MP3
| 2 | 18~17 | Layer; for MP3 the value is 01, meaning Layer III
| 1 | 16 | Protection bit; usually 1, meaning no CRC
| 4 | 15~12 | Bit rate index; for MP3, 0101 means 64 kbps and 1001 means 128 kbps; look up other values in the bit rate table
| 2 | 11~10 | Sample rate; for MP3: 00 - 44.1 kHz, 01 - 48 kHz, 10 - 32 kHz, 11 - reserved
| 1 | 9 | Padding bit; 0 means no padding, 1 means padding; for MP3, padding adds 1 byte
| 1 | 8 | Private bit; free for application-specific use
| 2 | 7~6 | Channel mode: 00 - stereo, 01 - joint stereo, 10 - dual channel, 11 - mono
| 2 | 5~4 | Mode extension (only meaningful for joint stereo)
| 1 | 3 | Copyright: 0 - not copyrighted, 1 - copyrighted
| 1 | 2 | Original: 0 - copy of the original, 1 - original
| 2 | 1~0 | Emphasis; usually 00
--------------------------------------------------------------------------------------------------------
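The bit layout in the table maps directly onto shifts and masks. Below is a minimal sketch that splits a 32-bit header word into the fields listed above. The struct and function names are made up for this illustration, the header is assumed to have been read from the file in big-endian byte order, and the two lookup tables cover only MPEG-1 Layer III.

```cpp
#include <cstdint>

// Fields of the 4-byte frame header, named after the rows of the table.
struct Mp3FrameHeader {
    unsigned version_id;        // bits 20~19
    unsigned layer;             // bits 18~17
    unsigned protection;        // bit 16
    unsigned bitrate_index;     // bits 15~12
    unsigned sample_rate_index; // bits 11~10
    unsigned padding;           // bit 9
    unsigned private_bit;       // bit 8
    unsigned channel_mode;      // bits 7~6
    unsigned mode_extension;    // bits 5~4
    unsigned copyright;         // bit 3
    unsigned original;          // bit 2
    unsigned emphasis;          // bits 1~0
};

// Splits header word h into its fields. Returns false when the 11 sync
// bits are not all 1, in which case out is left untouched.
bool parse_frame_header(std::uint32_t h, Mp3FrameHeader& out)
{
    if ((h >> 21) != 0x7FFu)
        return false;
    out.version_id        = (h >> 19) & 0x3u;
    out.layer             = (h >> 17) & 0x3u;
    out.protection        = (h >> 16) & 0x1u;
    out.bitrate_index     = (h >> 12) & 0xFu;
    out.sample_rate_index = (h >> 10) & 0x3u;
    out.padding           = (h >> 9) & 0x1u;
    out.private_bit       = (h >> 8) & 0x1u;
    out.channel_mode      = (h >> 6) & 0x3u;
    out.mode_extension    = (h >> 4) & 0x3u;
    out.copyright         = (h >> 3) & 0x1u;
    out.original          = (h >> 2) & 0x1u;
    out.emphasis          = h & 0x3u;
    return true;
}

// Bit rate in kbps for MPEG-1 Layer III; 0 for "free" (0000) or invalid (1111).
int mpeg1_layer3_bitrate_kbps(unsigned index)
{
    static const int table[16] = {0, 32, 40, 48, 56, 64, 80, 96,
                                  112, 128, 160, 192, 224, 256, 320, 0};
    return index < 16 ? table[index] : 0;
}

// Sample rate in Hz for MPEG-1; 0 for the reserved index 11.
int mpeg1_sample_rate_hz(unsigned index)
{
    static const int table[4] = {44100, 48000, 32000, 0};
    return index < 4 ? table[index] : 0;
}
```

For example, the header bytes FF FB 90 00 decode to MPEG-1, Layer III, no CRC, 128 kbps, 44.1 kHz, no padding, stereo.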
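The compressed media data follows the frame header. Its length is not stated directly in the header, but it can be computed from other parameters with this formula:
Frame length (bytes) = ((samples per frame / sample rate) * bit rate) / 8 + padding
For MP3, a frame usually holds 1152 samples. With a bit rate of 128 kbps and a sample rate of 44.1 kHz, the formula gives a frame length of 417 bytes (rounded down, without padding). The result is the length of the whole frame, including the 4-byte header.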
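The formula can be evaluated in integer arithmetic by dividing the samples per frame by 8 first (1152 / 8 = 144), which avoids floating-point rounding. This is a small sketch; `mp3_frame_length` is a name chosen for this example.

```cpp
// Frame length in bytes, header included.
// samples_per_frame is 1152 for the usual MP3 case, bit_rate is in bits per
// second, and padding is the padding bit from the frame header (0 or 1).
constexpr int mp3_frame_length(int samples_per_frame, int sample_rate,
                               int bit_rate, int padding)
{
    // Integer division rounds down, matching the figure in the text.
    return (samples_per_frame / 8) * bit_rate / sample_rate + padding;
}
```

`mp3_frame_length(1152, 44100, 128000, 0)` yields 417, and the same call with the padding bit set yields 418. At 48 kHz the division is exact and every 128 kbps frame is 384 bytes.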
Related APIs:
Data structures:
Audio sample format definition:
enum AVSampleFormat
{
AV_SAMPLE_FMT_NONE = -1,
AV_SAMPLE_FMT_U8, ///< unsigned 8 bits
AV_SAMPLE_FMT_S16, ///< signed 16 bits
AV_SAMPLE_FMT_S32, ///< signed 32 bits
AV_SAMPLE_FMT_FLT, ///< float
AV_SAMPLE_FMT_DBL, ///< double
AV_SAMPLE_FMT_U8P, ///< unsigned 8 bits, planar
AV_SAMPLE_FMT_S16P, ///< signed 16 bits, planar
AV_SAMPLE_FMT_S32P, ///< signed 32 bits, planar
AV_SAMPLE_FMT_FLTP, ///< float, planar
AV_SAMPLE_FMT_DBLP, ///< double, planar
AV_SAMPLE_FMT_S64, ///< signed 64 bits
AV_SAMPLE_FMT_S64P, ///< signed 64 bits, planar
AV_SAMPLE_FMT_NB ///< Number of sample formats. DO NOT USE if linking dynamically
};
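The definition covers two dimensions: sample precision and channel arrangement. Sample precision includes unsigned 8-bit, signed 16-bit, signed 32-bit, 32-bit float, and so on. Channel arrangement is either interleaved or non-interleaved, that is, packed format or planar format. The distinction matters a great deal when processing audio data: in packed format the samples of all channels are laid out as A1B1A2B2... in one contiguous block of memory, whereas in planar format each channel has its own block of memory, so the layout is A1A2..., B1B2... .
Codec structure definition: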
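The difference between the two layouts is easiest to see in a conversion routine. The following sketch converts 16-bit samples between planar and packed form using plain `std::vector` containers. It does not use FFmpeg; the function names are made up for this example, and all planes are assumed to have the same length.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Planar -> packed: {A1 A2 ...}, {B1 B2 ...} becomes A1 B1 A2 B2 ...
std::vector<std::int16_t> interleave(
    const std::vector<std::vector<std::int16_t>>& planes)
{
    if (planes.empty())
        return {};
    const std::size_t channels = planes.size();
    const std::size_t samples = planes[0].size();
    std::vector<std::int16_t> packed(channels * samples);
    for (std::size_t s = 0; s < samples; ++s)
        for (std::size_t c = 0; c < channels; ++c)
            packed[s * channels + c] = planes[c][s];
    return packed;
}

// Packed -> planar: A1 B1 A2 B2 ... becomes {A1 A2 ...}, {B1 B2 ...}
std::vector<std::vector<std::int16_t>> deinterleave(
    const std::vector<std::int16_t>& packed, std::size_t channels)
{
    if (channels == 0)
        return {};
    const std::size_t samples = packed.size() / channels;
    std::vector<std::vector<std::int16_t>> planes(
        channels, std::vector<std::int16_t>(samples));
    for (std::size_t s = 0; s < samples; ++s)
        for (std::size_t c = 0; c < channels; ++c)
            planes[c][s] = packed[s * channels + c];
    return planes;
}
```

An encoder that only accepts planar input, such as one reporting s16p, needs data in the second form, so packed samples read from a WAV file would have to go through something like `deinterleave` first.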
struct AVCodec
{
const char *name;
const char *long_name;
enum AVMediaType type;
enum AVCodecID id;
const AVRational *supported_framerates;
const enum AVPixelFormat *pix_fmts;
const int *supported_samplerates;
const enum AVSampleFormat *sample_fmts;
const uint64_t *channel_layouts;
const char *bsfs;
const uint32_t *codec_tags;
int (*init)(struct AVCodecContext *);
/**
* Encode data to an AVPacket.
*
* @param avctx codec context
* @param avpkt output AVPacket
* @param[in] frame AVFrame containing the raw data to be encoded
* @param[out] got_packet_ptr encoder sets to 0 or 1 to indicate that a
* non-empty packet was returned in avpkt.
* @return 0 on success, negative error code on failure
*/
int (*encode2)(struct AVCodecContext *avctx, struct AVPacket *avpkt,
const struct AVFrame *frame, int *got_packet_ptr);
/**
* Decode picture or subtitle data.
*
* @param avctx codec context
* @param outdata codec type dependent output struct
* @param[out] got_frame_ptr decoder sets to 0 or 1 to indicate that a
* non-empty frame or subtitle was returned in
* outdata.
* @param[in] avpkt AVPacket containing the data to be decoded
* @return amount of bytes read from the packet on success, negative error
* code on failure
*/
int (*decode)(struct AVCodecContext *avctx, void *outdata,
int *got_frame_ptr, struct AVPacket *avpkt);
int (*close)(struct AVCodecContext *);
/**
* Encode API with decoupled frame/packet dataflow. This function is called
* to get one output packet. It should call ff_encode_get_frame() to obtain
* input data.
*/
int (*receive_packet)(struct AVCodecContext *avctx, struct AVPacket *avpkt);
/**
* Decode API with decoupled packet/frame dataflow. This function is called
* to get one output frame. It should call ff_decode_get_packet() to obtain
* input data.
*/
int (*receive_frame)(struct AVCodecContext *avctx, struct AVFrame *frame);
/**
* Flush buffers.
* Will be called when seeking
*/
void (*flush)(struct AVCodecContext *);
};
struct AVCodecContext
{
enum AVMediaType codec_type; /* see AVMEDIA_TYPE_xxx */
const struct AVCodec *codec;
enum AVCodecID codec_id; /* see AV_CODEC_ID_xxx */
unsigned int codec_tag;
int64_t bit_rate;
int global_quality;
int compression_level;
AVRational time_base;
int width, height;
int coded_width, coded_height;
int gop_size;
enum AVPixelFormat pix_fmt;
int max_b_frames;
int has_b_frames;
/* audio only */
int sample_rate; ///< samples per second
int channels; ///< number of audio channels
enum AVSampleFormat sample_fmt; ///< sample format
uint64_t channel_layout;
AVRational framerate;
enum AVPixelFormat sw_pix_fmt;
};
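AVCodec is an abstract structure. Every encoder and decoder implementation is built on it and supplies its own encoding or decoding logic. In everyday application development there is no need to care much about the details of a particular implementation; it is enough to call the relevant functions as documented.
AVCodecContext holds the context of an encoding or decoding session, such as the chosen sample format, channel layout, and sample rate. It also keeps the intermediate state of the stream being converted.
Frame structures before and after encoding: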
struct AVPacket
{
AVBufferRef *buf;
int64_t pts;
int64_t dts;
uint8_t *data;
int size;
int stream_index;
int64_t duration;
int64_t pos;
AVRational time_base;
};
struct AVFrame
{
#define AV_NUM_DATA_POINTERS 8
uint8_t *data[AV_NUM_DATA_POINTERS];
int linesize[AV_NUM_DATA_POINTERS];
int width, height;
int nb_samples;
int format;
int key_frame;
enum AVPictureType pict_type;
AVRational sample_aspect_ratio;
int64_t pts;
int64_t pkt_dts;
int64_t pkt_duration;
int64_t pkt_pos;
int quality; // (between 1 (good) and FF_LAMBDA_MAX (bad))
int sample_rate;
uint64_t channel_layout;
AVDictionary *metadata;
int channels;
int pkt_size;
};
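AVPacket represents the data after encoding (a compressed packet), and AVFrame represents the data before encoding (a raw frame). Encoding and decoding convert between the two: encoding goes AVFrame -> AVPacket, and decoding goes the other way. See the example code for how the fields are used.
Functions:
Find an encoder, by ID or by name: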
const AVCodec *avcodec_find_encoder(enum AVCodecID id);
const AVCodec *avcodec_find_encoder_by_name(const char *name);
Encoder context operations. The relevant parameters must be set before calling avcodec_open2 to open the encoder context:
AVCodecContext *avcodec_alloc_context3(const AVCodec *codec);
int avcodec_open2(AVCodecContext *avctx, const AVCodec *codec, AVDictionary **options);
void avcodec_free_context(AVCodecContext **avctx);
Allocate and free the packet structure that holds encoded data:
AVPacket *av_packet_alloc(void);
void av_packet_free(AVPacket **pkt);
Allocate and free the frame structure that holds raw data:
AVFrame *av_frame_alloc(void);
void av_frame_free(AVFrame **frame);
Allocate buffers according to the parameters set in the frame structure:
int av_frame_get_buffer(AVFrame *frame, int align);
Make sure the buffers in the frame structure are writable:
int av_frame_make_writable(AVFrame *frame);
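A pair of functions does the encoding: avcodec_send_frame sends raw data to the context, and avcodec_receive_packet receives encoded data from it. Note that, because of encoder implementation details, not every call to avcodec_send_frame is followed by a packet from avcodec_receive_packet. The context may still hold buffered data, so the encoder has to be flushed before finishing.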
int avcodec_send_frame(AVCodecContext *avctx, const AVFrame *frame);
int avcodec_receive_packet(AVCodecContext *avctx, AVPacket *avpkt);
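Why a final flush is needed becomes clearer with a toy model. The class below is not FFmpeg code and does not reflect how any real encoder is implemented. `ToyEncoder` is a hypothetical stand-in that buffers input and emits fixed-size "packets", so it shows the same send/receive behaviour: a `send` may produce no output, and a null `send` releases whatever is still buffered.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Toy stand-in for an encoder with internal buffering (NOT an FFmpeg API).
class ToyEncoder {
public:
    explicit ToyEncoder(std::size_t packet_samples)
        : packet_samples_(packet_samples) {}

    // Plays the role of avcodec_send_frame(): nullptr means "no more input".
    void send(const std::vector<int>* frame)
    {
        if (frame)
            pending_.insert(pending_.end(), frame->begin(), frame->end());
        else
            draining_ = true;
    }

    // Plays the role of avcodec_receive_packet(): returns false when no
    // packet is available yet, or when everything has been flushed.
    bool receive(std::vector<int>& packet)
    {
        std::size_t n = packet_samples_;
        if (pending_.size() < n) {
            if (!draining_ || pending_.empty())
                return false;
            n = pending_.size(); // final, shorter packet
        }
        const auto end = pending_.begin() + static_cast<std::ptrdiff_t>(n);
        packet.assign(pending_.begin(), end);
        pending_.erase(pending_.begin(), end);
        return true;
    }

private:
    std::size_t packet_samples_;
    std::deque<int> pending_;
    bool draining_ = false;
};
```

With a packet size of 4 and frames of 3 samples, the first `send` yields nothing, the second yields one packet, and the remaining 2 samples only come out after `send(nullptr)`.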
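Code example:
The following example generates a 1 kHz pure tone and encodes it into a raw MP3 file without tags. The code is not explained in detail; the key parts are commented: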
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>
extern "C"
{
#include <libavcodec/avcodec.h>
#include <libavutil/channel_layout.h>
}
// Encode one frame and write the resulting packets to the file
bool encode(AVCodecContext* pCodecCTX, const AVFrame* pFrame, AVPacket* pPacket, FILE* pFile);
int main(int argc, char* argv[])
{
// Look up the encoder by name
const AVCodec* pCodec = avcodec_find_encoder_by_name("libmp3lame");
if (pCodec == NULL)
{
printf("not support mp3 encoder! \n");
return 0;
}
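// Print the sample formats, channel layouts, and sample rates this encoder supports
printf("support sample formats: \n");
const enum AVSampleFormat* pSampleFMT = pCodec->sample_fmts;
// The list ends with AV_SAMPLE_FMT_NONE (-1), not 0; testing for non-zero
// would print the terminator as "-1 - (null)" and read past the array
while (pSampleFMT && *pSampleFMT != AV_SAMPLE_FMT_NONE)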
{
printf("\t %d - %s \n", *pSampleFMT, av_get_sample_fmt_name(*pSampleFMT));
++pSampleFMT;
}
printf("support layouts: \n");
const uint64_t* pLayout = pCodec->channel_layouts;
while (pLayout && *pLayout)
{
int nb_channels = av_get_channel_layout_nb_channels(*pLayout);
char sBuf[128] = {0};
av_get_channel_layout_string(sBuf, sizeof(sBuf), nb_channels, *pLayout);
printf("\t %d - %s \n", nb_channels, sBuf);
++pLayout;
}
printf("support sample rates: \n");
const int* pSampleRate = pCodec->supported_samplerates;
while (pSampleRate && *pSampleRate)
{
printf("\t %dHz \n", *pSampleRate);
++pSampleRate;
}
// Allocate an encoder context for this encoder, filled with default values
AVCodecContext* pCodecCTX = avcodec_alloc_context3(pCodec);
if (pCodecCTX == NULL)
{
printf("alloc mp3 encoder context failed! \n");
return 0;
}
// Set the key audio encoding parameters: sample format, channel count and layout, sample rate
pCodecCTX->sample_fmt = AV_SAMPLE_FMT_S16P;
pCodecCTX->channel_layout = AV_CH_LAYOUT_STEREO;
pCodecCTX->channels = 2;
pCodecCTX->sample_rate = 44100;
// Open the encoder with the parameters set above
int rc = avcodec_open2(pCodecCTX, pCodec, NULL);
if (rc < 0)
{
char sError[128] = {0};
av_strerror(rc, sError, sizeof(sError));
printf("avcodec_open2() ret:[%d:%s] \n", rc, sError);
return -1;
}
AVPacket* pPacket = av_packet_alloc();
AVFrame* pFrame = av_frame_alloc();
// Set the audio frame parameters: sample format, channel count and layout, number of samples
pFrame->format = pCodecCTX->sample_fmt;
pFrame->channel_layout = pCodecCTX->channel_layout;
pFrame->channels = pCodecCTX->channels;
pFrame->nb_samples = pCodecCTX->frame_size;
// Allocate the frame buffers according to the parameters above
rc = av_frame_get_buffer(pFrame, 0);
if (rc < 0)
{
char sError[128] = {0};
av_strerror(rc, sError, sizeof(sError));
printf("av_frame_get_buffer() ret:[%d:%s] \n", rc, sError);
return -1;
}
FILE* pFile = fopen("test.mp3", "wb");
if (pFile == NULL)
{
printf("open test.mp3 failed! \n");
return -1;
}
// Phase step per sample for a 1 kHz sine wave
float fXStep = 2 * 3.1415926 * 1000 / pCodecCTX->sample_rate;
for (int n = 0; n < pCodecCTX->sample_rate * 10; )
{
av_frame_make_writable(pFrame);
// Fill the frame in planar format: one buffer per channel
for (int c = 0; c < pFrame->channels; ++c)
{
int16_t* pData = (int16_t*)pFrame->data[c];
for (int i = 0; i < pFrame->nb_samples; ++i)
{
pData[i] = (int16_t)(sin(fXStep * (n + i)) * 1000);
}
}
// Encode the frame into compressed data
if (!encode(pCodecCTX, pFrame, pPacket, pFile))
{
printf("encode() fatal! \n");
exit(-1);
}
n += pFrame->nb_samples;
}
// Flush the encoder and write the remaining packets
encode(pCodecCTX, NULL, pPacket, pFile);
printf("test.mp3 write succuss! \n");
fclose(pFile);
av_frame_free(&pFrame);
av_packet_free(&pPacket);
avcodec_free_context(&pCodecCTX);
return 0;
}
// Encode one frame and write the resulting packets to the file
bool encode(AVCodecContext* pCodecCTX, const AVFrame* pFrame, AVPacket* pPacket, FILE* pFile)
{
int rc = avcodec_send_frame(pCodecCTX, pFrame);
if (rc < 0)
{
char sError[128] = {0};
av_strerror(rc, sError, sizeof(sError));
printf("avcodec_send_frame() ret:[%d:%s] \n", rc, sError);
return false;
}
while (true)
{
rc = avcodec_receive_packet(pCodecCTX, pPacket);
if (rc < 0)
{
if (rc == AVERROR(EAGAIN) || rc == AVERROR_EOF)
return true;
char sError[128] = {0};
av_strerror(rc, sError, sizeof(sError));
printf("avcodec_receive_packet() ret:[%d:%s] \n", rc, sError);
return false;
}
fwrite(pPacket->data, 1, pPacket->size, pFile);
av_packet_unref(pPacket);
}
return true;
}
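The tone-generation loop in the example above can also be tried on its own, without FFmpeg. The sketch below produces the samples of a sine tone for one channel. `make_tone` is a hypothetical helper, and unlike the example it keeps the phase in `double` and rounds with `std::lround`, which is a choice made here and not part of the original code.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns `count` samples of a sine tone of `freq_hz` sampled at
// `sample_rate` Hz, scaled to `amplitude` and rounded to 16-bit integers.
std::vector<std::int16_t> make_tone(double freq_hz, int sample_rate,
                                    std::size_t count, double amplitude)
{
    const double pi = std::acos(-1.0);
    // Phase advance per sample, the same quantity as fXStep in the example.
    const double step = 2.0 * pi * freq_hz / sample_rate;
    std::vector<std::int16_t> samples(count);
    for (std::size_t i = 0; i < count; ++i) {
        const double value = std::sin(step * static_cast<double>(i)) * amplitude;
        samples[i] = static_cast<std::int16_t>(std::lround(value));
    }
    return samples;
}
```

Calling `make_tone(1000.0, 44100, 1152, 1000.0)` yields one frame's worth of samples with the same frequency and amplitude as the example.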
Compile:
g++ -o encode_mp3 encode_mp3.cpp -I/usr/local/ffmpeg/include -L/usr/local/ffmpeg/lib -lavformat -lavcodec -lavutil
Run; the output is as follows:
$./encode_mp3
support sample formats:
7 - s32p
8 - fltp
6 - s16p
-1 - (null)
support layouts:
1 - mono
2 - stereo
support sample rates:
44100Hz
48000Hz
32000Hz
22050Hz
24000Hz
16000Hz
11025Hz
12000Hz
8000Hz
test.mp3 write succuss!
Inspect the media information of test.mp3:
$ ffprobe.exe -i test.mp3
ffprobe version N-104465-g08a501946f Copyright (c) 2007-2021 the FFmpeg developers
built with gcc 11.2.0 (Rev6, Built by MSYS2 project)
configuration: --prefix=/usr/local/ffmpeg --enable-shared --enable-libmp3lame
libavutil 57. 7.100 / 57. 7.100
libavcodec 59. 12.100 / 59. 12.100
libavformat 59. 8.100 / 59. 8.100
libavdevice 59. 0.101 / 59. 0.101
libavfilter 8. 16.101 / 8. 16.101
libswscale 6. 1.100 / 6. 1.100
libswresample 4. 0.100 / 4. 0.100
[mp3 @ 000001f10bddcbc0] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from 'test.mp3':
Duration: 00:00:10.03, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s
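The line "Estimating duration from bitrate" appears because the file has no tags or other metadata stating its length. For a constant bit rate stream, such an estimate can be obtained from the file size alone. The sketch below shows that calculation; `estimate_duration_seconds` is a name made up here, and this is not a claim about how ffprobe computes its figure internally.

```cpp
#include <cstdint>

// Estimated duration in seconds of a constant bit rate stream, given the
// file size in bytes and the bit rate in bits per second.
constexpr double estimate_duration_seconds(std::uint64_t file_bytes,
                                           std::uint64_t bit_rate)
{
    return bit_rate == 0
        ? 0.0
        : static_cast<double>(file_bytes) * 8.0 / static_cast<double>(bit_rate);
}
```

By this relation, a duration of about 10 seconds at 128 kb/s corresponds to a file of roughly 160 KB.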