Gender Recognition by Voice | 05 | Neural Network

Neural Network Method Overview

The use of neural networks for machine learning is a rapidly growing area of data science. While neural networks themselves are not new, recent advancements have made them very attractive for modern problem solving. One of their strengths is their overall flexibility. This flexibility comes at a cost, however: successfully training a network requires large amounts of input data.

In today's world, where large companies have access to enormous amounts of data (pictures, web searches, conversations, etc.), it is very feasible to train neural networks to recognize features from our everyday lives. It is no surprise, then, that Google is a technology leader in this space. This project notebook uses Google's Tensorflow neural network library to create a classification system for our gender recognition data set.

Neural Network

Example of a neural network with 3 hidden layers.

Image from neuralnetworksanddeeplearning.com

The structure of a neural network is a series of layers, each with multiple nodes. The nodes of each layer are connected to all nodes of the previous layer. However, not all of these connections are equally strong. When training a neural network, some of these connections are made stronger or weaker as needed to satisfy the goals of the user. In our case, we want to minimize the error between the network's prediction (male vs female) and the true classification. The specifics of how this is done are discussed throughout this notebook.
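As a minimal illustration (using NumPy, purely for intuition and not part of this notebook's pipeline), each layer computes an activation of a weighted sum of its inputs plus a bias; training adjusts those weights and biases to reduce the prediction error.

import numpy as np

# Toy forward pass through a single layer: 3 inputs -> 2 nodes
x = np.array([0.5, -1.2, 0.3])            # input values
W = np.random.randn(3, 2)                 # connection strengths (weights)
b = np.random.randn(2)                    # biases
layer_out = np.maximum(0, x.dot(W) + b)   # ReLU activation of the weighted sum
print(layer_out)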

Import Libraries

In [1]:
import obj  # project helper module for loading saved objects
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Parameters
sns.set_style('whitegrid')  # set plot backgrounds to white

# Set graphics to appear inline with notebook code
%matplotlib inline

Import scaled data from previous notebook.

In [2]:
data_scale = obj.load('var/data_scale')

Convert data to matrix form

Unlike the machine learning libraries in Scikit-Learn, Google's Tensorflow does not understand data in the form of our Pandas DataFrames, so our data must first be converted to a more basic matrix/array format.

In [3]:
# Convert features to matrix
feat_scale_array = data_scale.drop('label',axis=1).as_matrix()

print(feat_scale_array[0])
[-4.04924806  0.4273553  -4.22490077 -2.57610164 -5.69360723 -0.21477826
  2.29330585  1.76294635 -0.03908279  0.4715753  -2.14121031 -4.04924806
 -1.81203825 -1.0979981   0.56595854 -1.5642046  -0.70840431 -1.43142165
 -1.41913712 -1.45477229]
In [4]:
# Convert labels to array
label_array = data_scale['label'].as_matrix()

# Show label array in string format
print(label_array[:4])
print(label_array[-4:])
['male' 'male' 'male' 'male']
['female' 'female' 'female' 'female']

Convert label strings to binary targets

Further, the neural network system does not work directly with strings. Predictions from a neural network are not verbal declarations, but simply the "intensity" of the output nodes. In our specific case, we are trying to classify voices as either male or female. Therefore, our output layer will consist of just two nodes, one to represent a male classification and one to represent a female classification. In all cases, both nodes will "light up", but in a well-trained model one will be much "brighter" than the other, thus giving us our prediction.

To make our targets compatible with our neural network, we will encode the "male" string to match up with the first output node, and the "female" string to match up with the second. Notice how the output compares to the string output above.

In [5]:
#  "male"  --> [0,1]
# "female" --> [1,0]

target_array = []

for label in label_array:
    if label == 'male':
        target = [0,1]
    else:
        target = [1,0]
    target_array.append(target)
    
# Show target array in binary format
print(target_array[:4])
print(target_array[-4:])
[[0, 1], [0, 1], [0, 1], [0, 1]]
[[1, 0], [1, 0], [1, 0], [1, 0]]
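For reference, the same one-hot encoding can be produced in a single vectorized step; this is just an illustrative alternative (using the imports above), not the code used for the rest of the notebook.

# Equivalent vectorized one-hot encoding:
#   "male" -> index 1 -> [0, 1],  "female" -> index 0 -> [1, 0]
idx = (label_array == 'male').astype(int)
target_array_alt = np.eye(2, dtype=int)[idx]
print(target_array_alt[:4])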

Split data into testing and training sets

As always, it is necessary to separate our data into a set that will train our network, and a separate set to test the final fit.

In [6]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(feat_scale_array, target_array,
                                                    test_size=0.33)
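Note that train_test_split shuffles randomly, so the exact split (and therefore the final accuracy) will vary between runs. One optional refinement, not used here, is to fix the random seed and stratify on the labels so that each split preserves the male/female ratio:

# Hypothetical alternative: reproducible, stratified split
X_train, X_test, y_train, y_test = train_test_split(
    feat_scale_array, target_array,
    test_size=0.33, random_state=42, stratify=label_array)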

Build neural network

The parameters used for the network are based on experience and trial-and-error. There are no hard and fast rules for building a neural network, but various guidelines can be used to get started.

Define network parameters

In [7]:
import tensorflow as tf

# Learning Parameters
rate   =  0.025  # training rate
epochs = 99      # number of full training cycles
batch  = 10      # number of data points to train per batch

# Network Parameters
n_hidden_1 = 10  # number of nodes in hidden layer 1
n_hidden_2 = 10  # number of nodes in hidden layer 2
n_hidden_3 = 10  # number of nodes in hidden layer 3
n_input    = len(feat_scale_array[0])  # 20
n_classes  = 2
n_samples  = len(X_train)  # 2122

Create network model

The mathematics behind neural networks is based on matrix arithmetic and manipulation. Contrary to typical expectations, however, Tensorflow systems do not carry out their actions when they are defined. Instead, they are put into place and only run when called within a Tensorflow session.

When building our system, we will use Tensorflow's linear algebra methods, meaning that all the interactions will be defined up front but will not be executed until we call them to action later on.
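As a small illustration of this deferred behavior (a throwaway example, assuming the Tensorflow version used in this notebook), defining an operation produces a graph node rather than a value; the value only appears once a session runs the operation:

a = tf.constant(2.0)
b = tf.constant(3.0)
total = tf.add(a, b)        # defines the operation; nothing is computed yet
print(total)                # prints a Tensor description, not 5.0

with tf.Session() as sess:
    print(sess.run(total))  # the graph is executed here --> 5.0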

In [8]:
def multilayer_perceptron(X, weights, biases):
    
    # Hidden Layer 1
    layer1 = tf.matmul(X, weights['w1'])
    layer1 = tf.add(layer1, biases['b1'])
    layer1 = tf.nn.relu(layer1)
    
    # Hidden Layer 2
    layer2 = tf.matmul(layer1, weights['w2'])
    layer2 = tf.add(layer2, biases['b2'])
    layer2 = tf.nn.relu(layer2)
    
    # Hidden Layer 3
    layer3 = tf.matmul(layer2, weights['w3'])
    layer3 = tf.add(layer3, biases['b3'])
    layer3 = tf.nn.relu(layer3)
    
    # Output Layer
    out = tf.matmul(layer3, weights['out'])
    out = tf.add(out, biases['out'])
    
    return out

Define inputs

Because we are only building the system and not asking it to perform work, our input values will be placeholders. We will define their type and shape upfront so that Tensorflow knows what it is working with.

In [9]:
# Define input and output
X = tf.placeholder('float', [None,n_input])
y = tf.placeholder('float', [None,n_classes])

# Define weights used to initialize the neural node connections
weights = {
    'w1' : tf.Variable(tf.random_normal([n_input,    n_hidden_1])),
    'w2' : tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'w3' : tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3])),
    'out': tf.Variable(tf.random_normal([n_hidden_3, n_classes ]))
}

# Define some initial biases for the network to work against
biases = {
    'b1' : tf.Variable(tf.random_normal([n_hidden_1])),
    'b2' : tf.Variable(tf.random_normal([n_hidden_2])),
    'b3' : tf.Variable(tf.random_normal([n_hidden_3])),
    'out': tf.Variable(tf.random_normal([n_classes ]))
}

# Place inputs into model
model = multilayer_perceptron(X, weights, biases)

Define cost function and minimization function

The cost function is how our system knows how well or how poorly it is achieving its goal. A high cost signifies a large discrepancy between our output nodes' prediction and our target. A low cost signifies more agreement between the node outputs and our targets.

Because we want the cost function to be as small as possible, we will shape our network with a minimization function. A minimization function propagates through the network and changes the weighted strengths of the neural node connections in an effort to create a better predictive system.
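To make the cost concrete, here is a minimal sketch of softmax cross-entropy for a single sample (using NumPy, for intuition only; Tensorflow's built-in version is used in the cell below):

# Toy softmax cross-entropy for one sample
logits = np.array([2.0, 0.5])                      # raw output node values
target = np.array([0, 1])                          # the true class is the second node
probs  = np.exp(logits) / np.sum(np.exp(logits))   # softmax -> probabilities
cost   = -np.sum(target * np.log(probs))           # cross-entropy
print(probs, cost)                                 # the wrong node is "brighter", so the cost is high (~1.7)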

In [10]:
# Define cost function
f_cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(model, y))

# Define minimization function
f_optimizer = tf.train.AdamOptimizer(learning_rate=rate).minimize(f_cost)

Run Network

Now that our network structure is built and our inputs and training functions are defined, we are ready to run a Tensorflow session to put these pieces into motion.

In [11]:
# Define startup task
init = tf.initialize_all_variables()

# Instantiate session
s = tf.InteractiveSession()

# Start session
s.run(init)

# TRAIN NETWORK
for epoch in range(epochs):
   
    cost_avg = 0.
    
    batch_total = int(n_samples/batch)
    
    count = 0
    for batch_i in range(batch_total):
        X_batch_i = X_train[count : count+batch]
        y_batch_i = y_train[count : count+batch]
        count += batch
        
        _, cost = s.run([f_optimizer,f_cost], feed_dict={X:X_batch_i, y:y_batch_i})
        
        cost_avg += cost / batch_total
        
    print('Epoch {:2}:  cost = {:.20f}'.format(epoch+1, cost_avg))

print('\n')
print('Model trained successfully!')
Epoch  1:  cost = 2.16148978066655539365
Epoch  2:  cost = 0.31210387504772169054
Epoch  3:  cost = 0.23612487749715185825
Epoch  4:  cost = 0.19309472637943611573
Epoch  5:  cost = 0.14380158780909366167
Epoch  6:  cost = 0.09920659690705750655
Epoch  7:  cost = 0.08333123434406514152
Epoch  8:  cost = 0.08224842283998354264
Epoch  9:  cost = 0.07705833141135325082
Epoch 10:  cost = 0.07115213437425864373
Epoch 11:  cost = 0.06417291725963726601
Epoch 12:  cost = 0.06434935897320968878
Epoch 13:  cost = 0.07617914769883245285
Epoch 14:  cost = 0.07582870655583837105
Epoch 15:  cost = 0.05558912286515409890
Epoch 16:  cost = 0.06915610871001152959
Epoch 17:  cost = 0.06202949113024795041
Epoch 18:  cost = 0.05596385894888853546
Epoch 19:  cost = 0.04860098341330373189
Epoch 20:  cost = 0.05916827563933452216
Epoch 21:  cost = 0.05710971891129965755
Epoch 22:  cost = 0.05592544540552364662
Epoch 23:  cost = 0.05304792053782295819
Epoch 24:  cost = 0.06591947936451049428
Epoch 25:  cost = 0.06550146132304636437
Epoch 26:  cost = 0.05881894850080983395
Epoch 27:  cost = 0.06064528595869811411
Epoch 28:  cost = 0.05514325070781175842
Epoch 29:  cost = 0.05906365903921918098
Epoch 30:  cost = 0.07083916359276355712
Epoch 31:  cost = 0.05029711077326119023
Epoch 32:  cost = 0.05334434993582540252
Epoch 33:  cost = 0.04999976162392830853
Epoch 34:  cost = 0.06224975353640029019
Epoch 35:  cost = 0.08388706640599272180
Epoch 36:  cost = 0.04918530480059088439
Epoch 37:  cost = 0.04618652272935129532
Epoch 38:  cost = 0.04565292737697741743
Epoch 39:  cost = 0.04247307279197036489
Epoch 40:  cost = 0.04615165952684812528
Epoch 41:  cost = 0.04500292578492643553
Epoch 42:  cost = 0.04713881771898230377
Epoch 43:  cost = 0.07941337382471617967
Epoch 44:  cost = 0.04767651537888428076
Epoch 45:  cost = 0.03702091656444823908
Epoch 46:  cost = 0.06378310072168336120
Epoch 47:  cost = 0.05424597751717433886
Epoch 48:  cost = 0.05237118259160547518
Epoch 49:  cost = 0.04449399770313872821
Epoch 50:  cost = 0.06902149536557762588
Epoch 51:  cost = 0.04427598008611710600
Epoch 52:  cost = 0.04515372661418890626
Epoch 53:  cost = 0.04185347973148571127
Epoch 54:  cost = 0.03638945478225921043
Epoch 55:  cost = 0.04382604312802491908
Epoch 56:  cost = 0.06809841885550758922
Epoch 57:  cost = 0.11563730735663757532
Epoch 58:  cost = 0.06644818862606853560
Epoch 59:  cost = 0.05900326753624730858
Epoch 60:  cost = 0.04755461791971795499
Epoch 61:  cost = 0.04905726239001025057
Epoch 62:  cost = 0.04634823400816751038
Epoch 63:  cost = 0.05169653943814039171
Epoch 64:  cost = 0.05251949699147293038
Epoch 65:  cost = 0.04556430026826525231
Epoch 66:  cost = 0.04545543274705346470
Epoch 67:  cost = 0.04514173506205001324
Epoch 68:  cost = 0.03986249834613113385
Epoch 69:  cost = 0.03791345025516765915
Epoch 70:  cost = 0.05155434202469798277
Epoch 71:  cost = 0.04580700491929598495
Epoch 72:  cost = 0.04163345531511755682
Epoch 73:  cost = 0.09235923355164506188
Epoch 74:  cost = 0.05740519414341785365
Epoch 75:  cost = 0.05609162813577287660
Epoch 76:  cost = 0.04740444222832027044
Epoch 77:  cost = 0.04433827095228896203
Epoch 78:  cost = 0.05967677120590406115
Epoch 79:  cost = 0.05327547462361890279
Epoch 80:  cost = 0.03734860359352386999
Epoch 81:  cost = 0.03340157367047402043
Epoch 82:  cost = 0.03613850623342595297
Epoch 83:  cost = 0.04214318546040157765
Epoch 84:  cost = 0.03461601473779436777
Epoch 85:  cost = 0.15920378924360523154
Epoch 86:  cost = 0.05494406512721236591
Epoch 87:  cost = 0.09672369605539068993
Epoch 88:  cost = 0.04437301319906324787
Epoch 89:  cost = 0.04008231104932064026
Epoch 90:  cost = 0.03908040427458887162
Epoch 91:  cost = 0.04262242066150791209
Epoch 92:  cost = 0.03758141580622819528
Epoch 93:  cost = 0.04189689139852101341
Epoch 94:  cost = 0.04254094385904123171
Epoch 95:  cost = 0.03579114862666478436
Epoch 96:  cost = 0.03452778989238582813
Epoch 97:  cost = 0.03643744121422477084
Epoch 98:  cost = 0.03584829566538550588
Epoch 99:  cost = 0.04579505621469971699


Model trained successfully!
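One design note: the loop above walks through the training batches in the same order every epoch. A common refinement (not applied here) is to reshuffle the training data at the start of each epoch, for example:

# Possible refinement: reshuffle the training set each epoch
perm = np.random.permutation(n_samples)
X_train = X_train[perm]
y_train = [y_train[i] for i in perm]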

Evaluate performance

With our network trained, it is time to evaluate how successfully it can classify our testing set of data.

In [12]:
# Define success: a prediction is correct when the brightest output node matches the target node
correct_target = tf.equal(tf.argmax(model, 1), tf.argmax(y, 1))
correct_target = tf.cast(correct_target, 'float')

# Define accuracy calculation
accuracy = tf.reduce_mean(correct_target)

# Find and print model accuracy
print('Accuracy is {:.2f}%'.format(100*accuracy.eval({X:X_test, y:y_test})))
Accuracy is 96.94%
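Beyond the overall accuracy, the trained network can also be queried for individual predictions. Here is a brief sketch (reusing the session and tensors defined above) that maps the brighter output node back to a label string:

# Predict labels for the first few test samples
pred_idx = tf.argmax(model, 1)                     # index of the "brightest" output node
pred = s.run(pred_idx, feed_dict={X: X_test[:4]})

labels = ['female', 'male']                        # index 0 -> "female", index 1 -> "male" (matches the encoding above)
print([labels[i] for i in pred])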

Conclusion

Given the complexity of creating a neural network, one might expect unbeatable results for any given problem. However, as mentioned at the beginning of this notebook, the strength of neural networks lies in classifying very large and very complicated sets of data. Our data set is fairly small and not especially complex. If we were dealing with a much more complicated set of data, then our other methodologies would likely perform below our standards and the neural network would prove indispensable.

But given our relatively small set of data, the neural network proved moderately successful.