Logistic Regression is a popular classification algorithm. In this episode we discuss how it can be used to determine whether an audio clip represents one of two given speakers. It assumes the log-odds of an output variable (isLinhda) is a linear combination of the available features, which in this episode's discussion are spectral bands.
The model takes this form:

P(isLinhda = 1 | x) = 1 / (1 + exp(-(β₀ + β₁x₁ + … + βₙxₙ)))

The algorithm uses maximum likelihood to find the optimal values for the parameters β. The right side of that equation applies the logistic function to the linear combination of features, squashing it into the range (0, 1), and a threshold (commonly 0.5) is established to define the classification.
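To make the mechanics concrete, here is a minimal sketch of how a single prediction would be computed. The feature values, weights, and intercept below are made-up numbers purely for illustration, not fitted parameters from the episode's model:

```python
import math

# Hypothetical spectral-band features and fitted weights (illustrative values only)
features = [0.8, 0.1, 0.5]
weights = [1.2, -0.7, 2.0]
intercept = -0.5

# Linear combination of features, then the logistic transformation
linear = intercept + sum(w * x for w, x in zip(weights, features))
prob = 1.0 / (1.0 + math.exp(-linear))  # maps any real number into (0, 1)

# A 0.5 cutoff turns the probability into a class label
is_linhda = prob >= 0.5
```

Training amounts to choosing the weights and intercept that maximize the likelihood of the observed labels; prediction is just this dot product, squash, and threshold.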
The figures below are referenced during the episode.
Keep an eye on the dataskeptic.com blog this week as we post more details about this project.
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import math
from scipy.fftpack import fft
import os
import scipy.io.wavfile as wav
x = np.arange(-6, 6, .01)
y = 1.0 / (1.0 + np.exp(-x))  # vectorized logistic function; a bare map() would not plot under Python 3
plt.figure(figsize=(10, 5))
plt.plot(x, y)
plt.title('Logistic curve', fontsize=18)
plt.xlabel('X value (any real number)', fontsize=16)
plt.ylabel('Logistic transformation [0, 1]', fontsize=16)
plt.show()
dname = '../../methods/2017/audio/who-speaking-raw/'
def show_waveform(fname, offset):
    # Plot a 3.5-second window of the waveform starting `offset` seconds in
    rate, data = wav.read(fname)
    plt.figure(figsize=(10, 3))
    plt.plot(data[int(offset * rate):int((offset + 3.5) * rate)])
    plt.show()
The two plots below should look familiar to most readers. They are waveforms from audio of Linh Da and Kyle speaking. Kyle made Linh Da predict which is which in the episode. You can see from the filenames who the actual speaker is, but this test is a bit unfair: identifying a speaker from raw waveforms, while possible, is a formidable challenge for typical speech.
show_waveform(dname + 'linhda.wav', 0)
show_waveform(dname + 'kyle.wav', 1)
The plots below are samples of the frequency spectra for hosts Linh Da and Kyle. While you might not be able to explicitly tell who is who from these plots, can you determine that each is generated from a unique and distinct source? If so, your eye is discriminating enough to solve this problem, so we expect a logistic regression should be as well.
def get_spectra(fname, start, stop, window):
    # Slide a half-second FFT window across [start, stop) seconds of audio,
    # stepping by `window` seconds, keeping the first 1000 frequency bands.
    rate, data = wav.read(fname)
    bands = []
    for i in np.arange(start, stop, window):
        a = data[int(i * rate):int((i + .5) * rate)]
        b = [(ele / 2**8.) * 2 - 1 for ele in a]  # normalize 8-bit samples to [-1, 1]
        c = fft(b)
        d = len(c) // 2                           # keep the positive-frequency half
        f = abs(c[:(d - 1)])
        bands.append(f[0:1000])
    nbands = np.array(bands)
    return np.sqrt(nbands)                        # square-root scaling to boost contrast
lbands = get_spectra(dname + 'linhda.wav', 0, 800, .25)
kbands = get_spectra(dname + 'kyle.wav', 0, 800, .25)
plt.imshow(lbands.transpose(), cmap='cool', interpolation='nearest')
plt.show()
plt.imshow(kbands.transpose(), cmap='cool', interpolation='nearest')
plt.show()
From the transformed frequency data, we can bin ranges of values and use those numeric inputs as our features for the logistic regression model. Essentially, we are asking it to find the best weight (the parameter) for each band such that, when we take a linear combination (the sum of each band times its weight), we can apply the logistic transformation to that value and compare it against a cutoff point to do our classification.
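The fitting step described above can be sketched end to end with a small gradient-ascent implementation of maximum likelihood. The spectral features here are synthetic stand-ins (random draws with different means per speaker), not the episode's actual audio data; the dimensions and learning rate are arbitrary choices for illustration:

```python
import numpy as np

# Synthetic stand-in for the binned spectral-band features: two "speakers"
# whose band energies differ in mean. Real inputs would come from get_spectra.
rng = np.random.default_rng(0)
n, d = 200, 10
X_a = rng.normal(0.0, 1.0, (n, d))   # samples for speaker 0
X_b = rng.normal(2.0, 1.0, (n, d))   # samples for speaker 1
X = np.vstack([X_a, X_b])
y = np.concatenate([np.zeros(n), np.ones(n)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Maximum likelihood via gradient ascent on the log-likelihood
w = np.zeros(d)
b = 0.0
lr = 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)
    w += lr * (X.T @ (y - p)) / len(y)
    b += lr * np.mean(y - p)

# Apply the logistic transformation, then a 0.5 cutoff, to classify
pred = (sigmoid(X @ w + b) >= 0.5).astype(int)
accuracy = np.mean(pred == y)
```

With well-separated synthetic classes the fitted model recovers the labels almost perfectly; real spectral bands overlap far more, which is where the choice of binning matters.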