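I wanted to play around with Baidu speech recognition, but it only accepts audio in a fixed format (for details, see the official documentation for the Baidu speech recognition SDK). So I searched and searched online, but the converted files were either corrupted and unplayable, or Baidu could not recognize them, meaning the output format was wrong. I eventually came across a solution from an overseas post that finally worked. Enough talk: the full code comes first, the commentary afterwards.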
Code
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import org.json.JSONObject;
import com.baidu.aip.speech.AipSpeech;
public class MP3ToWav {
    /**
     * Generates a WAV file from the bytes of an MP3 file.
     *
     * @param sourceBytes raw bytes of the MP3 file
     * @param targetPath  path of the WAV file to write
     * @return true if the conversion succeeded, false otherwise
     */
    public static boolean byteToWav(byte[] sourceBytes, String targetPath) {
        if (sourceBytes == null || sourceBytes.length == 0) {
            System.out.println("Illegal Argument passed to this method");
            return false;
        }
        try (final ByteArrayInputStream bais = new ByteArrayInputStream(sourceBytes);
             final AudioInputStream sourceAIS = AudioSystem.getAudioInputStream(bais)) {
            AudioFormat sourceFormat = sourceAIS.getFormat();
            // Intermediate format: 16-bit PCM at the MP3's own sample rate and channel count
            AudioFormat mp3tFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, sourceFormat.getSampleRate(), 16, sourceFormat.getChannels(), sourceFormat.getChannels() * 2, sourceFormat.getSampleRate(), false);
            // Format required by Baidu speech recognition: 16 kHz, 16-bit, mono
            AudioFormat pcmFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 16000, 16, 1, 2, 16000, false);
            try (
                    // First decode the MP3 to PCM so that the stream's format information is complete
                    final AudioInputStream mp3AIS = AudioSystem.getAudioInputStream(mp3tFormat, sourceAIS);
                    // Then convert to the stream Baidu needs
                    final AudioInputStream pcmAIS = AudioSystem.getAudioInputStream(pcmFormat, mp3AIS)) {
                // Write the WAV file to the target path
                AudioSystem.write(pcmAIS, AudioFileFormat.Type.WAVE, new File(targetPath));
            }
            return true;
        } catch (IOException | UnsupportedAudioFileException e) {
            System.out.println("File conversion failed: " + e.getMessage());
            return false;
        }
    }
    /**
     * Reads a file into a byte array.
     *
     * @param filePath path of the file to read
     * @return the file contents, or null if the file could not be read
     */
    private static byte[] getBytes(String filePath) {
        byte[] buffer = null;
        // try-with-resources closes both streams even if reading fails
        try (FileInputStream fis = new FileInputStream(new File(filePath));
             ByteArrayOutputStream bos = new ByteArrayOutputStream(1000)) {
            byte[] b = new byte[1000];
            int n;
            while ((n = fis.read(b)) != -1) {
                bos.write(b, 0, n);
            }
            buffer = bos.toByteArray();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return buffer;
    }
    public static void main(String[] args) {
        String filePath = "E:/data/storage/public/1111.mp3";
        String targetPath = "E:/data/storage/public/2222.wav";
        // Skip the recognition call if the conversion failed
        if (!byteToWav(getBytes(filePath), targetPath)) {
            return;
        }
        AipSpeech client = new AipSpeech("XXXXXX", "XXXXXXXX", "XXXXXXXX");
        JSONObject asrRes = client.asr(targetPath, "wav", 16000, null);
        System.out.println(asrRes);
        // opt returns null instead of throwing when the response has no "result" key
        System.out.println(asrRes.opt("result"));
    }
}
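Before handing the file to Baidu, it can be useful to confirm that the WAV header really says 16 kHz, 16-bit, mono. The following is only a rough sketch of such a check: the class `WavTargetCheck` and its methods are hypothetical names that are not part of the code above, and the target values are simply the ones used in `pcmFormat`. It generates a silent WAV in memory so it can be tried without any input file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper for illustration; not part of MP3ToWav
final class WavTargetCheck {

    /** True if the format is 16 kHz, 16-bit, mono, signed little-endian PCM. */
    static boolean matchesTarget(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleRate() == 16000f
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1
                && !f.isBigEndian();
    }

    /** Parses WAV bytes and checks the header; unreadable data counts as a mismatch. */
    static boolean wavMatchesTarget(byte[] wavBytes) {
        try (AudioInputStream ais =
                     AudioSystem.getAudioInputStream(new ByteArrayInputStream(wavBytes))) {
            return matchesTarget(ais.getFormat());
        } catch (UnsupportedAudioFileException | IOException e) {
            return false;
        }
    }

    /** Builds an in-memory WAV of silence so the check can run without a file. */
    static byte[] silentWav(float sampleRate, int channels, int frames) {
        AudioFormat format = new AudioFormat(sampleRate, 16, channels, true, false);
        byte[] pcm = new byte[frames * channels * 2]; // 2 bytes per 16-bit sample
        try (AudioInputStream ais =
                     new AudioInputStream(new ByteArrayInputStream(pcm), format, frames)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            AudioSystem.write(ais, AudioFileFormat.Type.WAVE, out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // One WAV in the target format, one in a typical CD-style format
        System.out.println(wavMatchesTarget(silentWav(16000f, 1, 1600)));
        System.out.println(wavMatchesTarget(silentWav(44100f, 2, 1600)));
    }
}
```

For a real file, the bytes could be read with `java.nio.file.Files.readAllBytes` and passed to `wavMatchesTarget`.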
Commentary
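As the code shows, the audio classes imported here all come from the JDK, so there is nothing extra to hunt down; the methods I found online all require downloading other jars, which is a hassle. (One caveat: the JDK's built-in audio readers do not decode MP3, so if AudioSystem.getAudioInputStream throws UnsupportedAudioFileException, an MP3 decoder provider is probably missing from the classpath.) The org.json and Baidu imports are only needed for the speech recognition call itself. While I'm at it, here is the Maven dependency: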
<dependency>
    <groupId>com.baidu.aip</groupId>
    <artifactId>java-sdk</artifactId>
    <version>4.4.0</version>
</dependency>
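Note that this uses a JDK 7 feature, try-with-resources: resources declared in the try header are closed automatically when the block finishes, and exceptions can still be caught as usual. (I didn't know this feature existed before either [covers face!])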
try (final ByteArrayInputStream bais = new ByteArrayInputStream(sourceBytes);
     final AudioInputStream sourceAIS = AudioSystem.getAudioInputStream(bais)) {
    // some processing...
}
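To see what try-with-resources does with several resources, here is a small self-contained sketch. `Tracked` is a made-up resource class used only for this demonstration; it records when it is opened and closed so the order becomes visible.

```java
import java.util.ArrayList;
import java.util.List;

final class TryWithResourcesDemo {

    /** Made-up resource that logs when it is opened and closed. */
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    /** Opens two resources in one try header and returns the recorded events. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (Tracked outer = new Tracked("outer", log);
             Tracked inner = new Tracked("inner", log)) {
            log.add("body");
        }
        return log;
    }

    public static void main(String[] args) {
        // prints [open outer, open inner, body, close inner, close outer]
        System.out.println(run());
    }
}
```

Resources are closed in the reverse of the order they were declared, which is why in the conversion code the stream wrapping the byte array is closed before the byte array stream itself.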
Two target formats are defined here, and two conversions are performed further down. Why? The input is already MP3, so why convert it to an "MP3" format again?
This is actually where the pitfall lies.
// Intermediate format: 16-bit PCM at the MP3's own sample rate and channel count
AudioFormat mp3tFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, sourceFormat.getSampleRate(), 16, sourceFormat.getChannels(), sourceFormat.getChannels() * 2, sourceFormat.getSampleRate(), false);
// Format required by Baidu speech recognition: 16 kHz, 16-bit, mono
AudioFormat pcmFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 16000, 16, 1, 2, 16000, false);
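Inspecting sourceFormat in the debugger shows two telling phrases: "unknown bits per sample" and "unknown frame size". Because of them, converting straight to a different format fails, while converting to PCM at the source's own sample rate and channel count works fine. The 16 and sourceFormat.getChannels() * 2 in the format above fill in exactly those two missing values: 16 bits per sample, and a frame size of 2 bytes per channel. After this first conversion the stream's format information is complete, and only then can it be converted to WAV successfully.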
AudioFormat sourceFormat = sourceAIS.getFormat();
//sourceFormat
//MPEG2L3 22050.0 Hz, unknown bits per sample, mono, unknown frame size, 38.28125 frames/second,
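The "unknown" values in that debug output correspond to `AudioSystem.NOT_SPECIFIED` in the `AudioFormat` fields. Below is a small sketch of how one might detect such an incomplete format and derive the intermediate PCM format from it. `FormatInspector`, `isIncomplete`, and `toPcm16` are hypothetical names, and the MP3-like format in `main` is constructed by hand as a stand-in for what a decoder reports, so no MP3 file or decoder is needed to run it.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper for illustration; not part of MP3ToWav
final class FormatInspector {

    /** True when the sample size or the frame size was left unspecified. */
    static boolean isIncomplete(AudioFormat f) {
        return f.getSampleSizeInBits() == AudioSystem.NOT_SPECIFIED
                || f.getFrameSize() == AudioSystem.NOT_SPECIFIED;
    }

    /** 16-bit signed little-endian PCM at the source's own rate and channel count. */
    static AudioFormat toPcm16(AudioFormat source) {
        int channels = source.getChannels();
        int frameSize = channels * 2; // 2 bytes per 16-bit sample, per channel
        return new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, source.getSampleRate(),
                16, channels, frameSize, source.getSampleRate(), false);
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the format reported for the MP3 in the debug output
        AudioFormat mp3Like = new AudioFormat(new AudioFormat.Encoding("MPEG2L3"),
                22050f, AudioSystem.NOT_SPECIFIED, 1,
                AudioSystem.NOT_SPECIFIED, 38.28125f, false);

        AudioFormat pcm = toPcm16(mp3Like);
        System.out.println(isIncomplete(mp3Like)); // prints true
        System.out.println(isIncomplete(pcm));     // prints false
        System.out.println(pcm.getFrameSize());    // prints 2
    }
}
```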