Meeting Recording Transcription Script

A Python script that transcribes .wav files using the Google Speech API

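The script is written as a Jupyter Notebook. The recommended environment is Visual Studio Code with Python 3.6 or later; a Conda environment also works. Install the required libraries with pip or conda as needed. The transcription is saved as a .csv file at the location chosen in a file dialog opened through the tkinter library.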

A tool for meeting the demand for verbatim, transcript-level minutes without working overtime.

Importing the required libraries

In [ ]:

import wave
import struct
from numpy import int16  # SciPy's top-level re-exports of NumPy names are deprecated
import numpy as np
import os
import math
import pandas as pd
import speech_recognition as sr
import tkinter.filedialog

The wav_to_text function simply sends the given audio file to the Google Speech API for recognition and returns the recognized text.

In [ ]:

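def wav_to_text(wavfile):
    """Transcribe one .wav file as Japanese speech and return the text."""
    r = sr.Recognizer()

    # Read the whole file into memory as a single AudioData object
    with sr.AudioFile(wavfile) as source:
        audio = r.record(source)
    text = r.recognize_google(audio, language='ja-JP')

    # Show which chunk was processed and what was recognized
    print(os.path.abspath(wavfile))
    print(text)
    return text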
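
A meeting recording usually contains silent stretches. `recognize_google` raises `sr.UnknownValueError` when it cannot make out any speech and `sr.RequestError` when the request itself fails, so one bad chunk can end a long run. The following is a minimal sketch of a wrapper that returns an empty string instead. The name `transcribe_or_blank`, its parameters, and the stand-in `fake_transcribe` are assumptions made for illustration and are not part of the original notebook.

```python
def transcribe_or_blank(transcribe, wavfile, skip_errors=(Exception,)):
    """Return transcribe(wavfile), or '' if it raises one of skip_errors."""
    try:
        return transcribe(wavfile)
    except skip_errors as exc:
        # Log the failure and keep going so one bad chunk does not end the run
        print(f'{wavfile}: no transcript ({exc!r})')
        return ''


# Hypothetical stand-in for wav_to_text so the sketch runs without audio or network
def fake_transcribe(wavfile):
    if wavfile == 'silent.wav':
        raise ValueError('no speech found')
    return 'hello'


results = [transcribe_or_blank(fake_transcribe, name, skip_errors=(ValueError,))
           for name in ['0.wav', 'silent.wav']]
print(results)
```

With the notebook's own function, the call would presumably look like `transcribe_or_blank(wav_to_text, outf, skip_errors=(sr.UnknownValueError, sr.RequestError))`.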

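The cut_wav function exists because the Google Speech API rejects audio files that are too long. It first cuts the file into short chunks, runs each chunk through wav_to_text, and saves the results to a CSV file.

Parameters

filename: The audio file to transcribe. Prompted for in a file dialog at run time. It must be a .wav file; anything else raises an error.
save_file_name: The name of the CSV file that receives the transcription. Prompted for in a file dialog at run time. Give it a .csv extension.
time: The length of each chunk, in seconds. It affects recognition accuracy; roughly 30 to 60 seconds works well.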
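
To make the effect of the chunk length concrete, here is a small sketch of the boundary arithmetic: it computes where each chunk starts and ends, with a short overlap so that a word straddling a cut appears in both chunks. The function name `chunk_ranges` and the `overlap_seconds` parameter are assumptions for illustration. The sketch counts frames (one frame holds one sample per channel) and does not reproduce the notebook's exact arithmetic.

```python
def chunk_ranges(n_frames, framerate, seconds, overlap_seconds=0.0):
    """Return (start, end) frame indices for consecutive chunks of a recording."""
    step = int(framerate * seconds)
    overlap = int(framerate * overlap_seconds)
    if step <= 0:
        raise ValueError('seconds must be positive')
    ranges = []
    start = 0
    while start < n_frames:
        # Every chunk after the first reaches back by `overlap` frames
        ranges.append((max(0, start - overlap), min(n_frames, start + step)))
        start += step
    return ranges


# 95 s of audio at 16 kHz, cut every 30 s with a 2 s overlap
for start, end in chunk_ranges(95 * 16000, 16000, 30, 2):
    print(start / 16000, end / 16000)
```

For this input the chunks cover 0 to 30 s, 28 to 60 s, 58 to 90 s, and 88 to 95 s.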

In [ ]:

def cut_wav(filename, save_file_name, time):
    # Directory for the chunk files. 'hogehoge' is a placeholder:
    # replace it with a path that suits your environment.
    out_dir = os.path.abspath('hogehoge')
    os.makedirs(out_dir, exist_ok=True)
    # Overlap prepended to every chunk after the first. The unit is samples,
    # not seconds, so the overlap in seconds depends on the sample rate.
    overlap = 180000

    with wave.open(filename, mode='rb') as wr:
        # wave is part of the standard library; its documentation covers the details.
        ch = wr.getnchannels()
        width = wr.getsampwidth()
        fr = wr.getframerate()
        fn = wr.getnframes()
        if width != 2:
            raise ValueError('cut_wav expects 16-bit PCM audio')

        total_time = fn / fr                    # duration in seconds
        integer = math.floor(total_time * 100)  # duration in 1/100 s, rounded down
        t = int(time * 100)                     # chunk length in 1/100 s
        frames = int(ch * fr * t / 100)         # samples per chunk, all channels
        num_cut = integer // t                  # number of full chunks (floor division)
        data = wr.readframes(fn)
        # np.frombuffer views the raw bytes as an int16 array without copying them
        X = np.frombuffer(data, dtype=np.int16)

    for i in range(num_cut + 1):
        outf = os.path.join(out_dir, str(i) + '.wav')
        # Clamp at 0 so a short chunk never produces a negative slice start
        start_cut = max(0, i * frames - overlap) if i > 0 else 0
        end_cut = i * frames + frames
        Y = X[start_cut:end_cut]

        with wave.open(outf, mode='wb') as ww:
            ww.setnchannels(ch)
            ww.setsampwidth(width)
            ww.setframerate(fr)
            ww.writeframes(Y.tobytes())

    rows = []
    for ii in range(num_cut + 1):
        # Transcribe the saved chunk files in order
        outf = os.path.join(out_dir, str(ii) + '.wav')
        str_out = wav_to_text(outf)
        rows.append([ii, str(ii) + '.wav', str_out])

    df = pd.DataFrame(rows, columns=['no', 'audio file', 'transcription'])
    df.to_csv(save_file_name)
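
The splitting step can also be written with the standard library alone. The sketch below seeks and reads frames through the `wave` module, so it does not depend on the sample width being 16 bits. The name `split_wav` and the `overlap_seconds` parameter are assumptions for illustration; this is an alternative sketch, not the notebook's own method.

```python
import os
import wave


def split_wav(src, out_dir, seconds, overlap_seconds=0.0):
    """Write src as chunk files 0.wav, 1.wav, ... in out_dir and return their paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    with wave.open(src, 'rb') as wr:
        params = wr.getparams()
        step = int(params.framerate * seconds)
        overlap = int(params.framerate * overlap_seconds)
        if step <= 0:
            raise ValueError('seconds must be positive')
        start = 0
        while start < params.nframes:
            # Reach back by `overlap` frames, but never before the file start
            begin = max(0, start - overlap)
            wr.setpos(begin)
            # readframes returns fewer frames when the file ends early
            frames = wr.readframes(start + step - begin)
            path = os.path.join(out_dir, f'{len(paths)}.wav')
            with wave.open(path, 'wb') as ww:
                ww.setnchannels(params.nchannels)
                ww.setsampwidth(params.sampwidth)
                ww.setframerate(params.framerate)
                ww.writeframes(frames)
            paths.append(path)
            start += step
    return paths
```

Because it reads one chunk at a time, this version also avoids holding the whole recording in memory, which may matter for multi-hour meetings.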

The cell below uses the tkinter library to ask, through file dialogs, for the audio file to read and the name of the file to save, and then runs the processing.

In [ ]:

file_type = [("WAV files", "*.wav")]
# 'fugafuga' is a placeholder for the folder the dialogs open in
i_dir = os.path.abspath('fugafuga')
f_name = tkinter.filedialog.askopenfilename(
    title='Please select the target .wav file', filetypes=file_type, initialdir=i_dir)
save_target_file = tkinter.filedialog.asksaveasfilename(
    title='Please enter the name of the CSV file to save', initialdir=i_dir,
    defaultextension='.csv')

time = 30  # chunk length in seconds
cut_wav(f_name, save_target_file, float(time))
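
Both dialogs return an empty value when the user cancels, and nothing stops the user from picking an unexpected file name. The following sketch checks the two results before any processing starts. The helper name `check_dialog_paths` and its behavior (appending `.csv` when it is missing) are assumptions for illustration, not part of the original notebook.

```python
from pathlib import Path


def check_dialog_paths(wav_path, csv_path):
    """Validate the two dialog results and return them as Path objects."""
    if not wav_path or not csv_path:
        raise ValueError('a dialog was cancelled; nothing to do')
    wav = Path(wav_path)
    if wav.suffix.lower() != '.wav':
        raise ValueError(f'expected a .wav file, got {wav.name}')
    csv = Path(csv_path)
    if csv.suffix.lower() != '.csv':
        # Append the extension rather than replace an existing one
        csv = csv.with_name(csv.name + '.csv')
    return wav, csv


wav, csv = check_dialog_paths('meeting.wav', 'minutes')
print(wav.name, csv.name)
```

In the cell above, the checked paths could then be passed on as `cut_wav(str(wav), str(csv), float(time))`.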

cross_entropy_error
