¿Cómo puedo evitar problemas de NAN?

Me estoy poniendoMean of empty slice advertencias de tiempo de ejecución. Cuando imprimo cuáles son mis variables (matrices numpy), varias de ellas contienennan valores. La advertencia de tiempo de ejecución está mirando la línea 58 como el problema. ¿Qué puedo cambiar para que funcione?

A veces el programa se ejecutará sin problemas. La mayoría de las veces no.

Este es un algoritmo K-Means from scratch que agrupa el conjunto de datos del iris. Primero solicita a los usuarios la cantidad de centroides que desean (grupos). Luego genera aleatoriamente dicho número de grupos en el rango dado a partir de los números en el archivo de texto cargado.

Tengo el valor de ruptura en la instrucción else para evitar bucles infinitos.

¿Es porque tengo números que van por debajo de cero cuando resta los Centroides de los puntos de datos en el archivo?

Error que obtengo cuando ejecuto:

How Many Centrouds? 3
Dimensionality of Data:  (150, 4)
Starting Centroiuds:
 [[ 1.4  7.9  0.2  3.4]
 [ 7.8  0.2  4.3  1.4]
 [ 5.7  6.9  3.   6.6]]
t0 :
 [[[-3.7  4.4 -1.2  3.2]
  [ 2.7 -3.3  2.9  1.2]
  [ 0.6  3.4  1.6  6.4]]

 [[-3.5  4.9 -1.2  3.2]
  [ 2.9 -2.8  2.9  1.2]
  [ 0.8  3.9  1.6  6.4]]

 [[-3.3  4.7 -1.1  3.2]
  [ 3.1 -3.   3.   1.2]
  [ 1.   3.7  1.7  6.4]]

 ..., 
 [[-5.1  4.9 -5.   1.4]
  [ 1.3 -2.8 -0.9 -0.6]
  [-0.8  3.9 -2.2  4.6]]

 [[-4.8  4.5 -5.2  1.1]
  [ 1.6 -3.2 -1.1 -0.9]
  [-0.5  3.5 -2.4  4.3]]

 [[-4.5  4.9 -4.9  1.6]
  [ 1.9 -2.8 -0.8 -0.4]
  [-0.2  3.9 -2.1  4.8]]]

Warning (from warnings module):
  File "C:\Python27\lib\site-packages\numpy\core\_methods.py", line 59
    warnings.warn("Mean of empty slice.", RuntimeWarning)
RuntimeWarning: Mean of empty slice.

Warning (from warnings module):
  File "C:\Python27\lib\site-packages\numpy\core\_methods.py", line 68
    ret, rcount, out=ret, casting='unsafe', subok=False)
RuntimeWarning: invalid value encountered in true_divide
---------------
Starting Centroids:

[[ 1.4  7.9  0.2  3.4]
 [ 7.8  0.2  4.3  1.4]
 [ 5.7  6.9  3.   6.6]]


Starting NewMeans:

[[        nan         nan         nan         nan]
 [ 5.84333333  3.054       3.75866667  1.19866667]
 [        nan         nan         nan         nan]]
Starting Centroids Now:

[[        nan         nan         nan         nan]
 [ 5.84333333  3.054       3.75866667  1.19866667]
 [        nan         nan         nan         nan]]


NewMeans now:
[[        nan         nan         nan         nan]
 [ 5.84333333  3.054       3.75866667  1.19866667]
 [        nan         nan         nan         nan]]

Código de Python:

import numpy as np
from pprint import pprint
import random
import sys
import warnings

arglist = sys.argv 

#UNCOMMENT BELOW IN FINAL PROGRAM
'''
NoOfCentroids = int(arglist[2])
dataPointsFromFile = np.array(np.loadtxt(sys.argv[1], delimiter = ','))
'''

dataPointsFromFile = np.array(np.loadtxt('iris.txt', delimiter = ','))

NoOfCentroids = input('How Many Centrouds? ')

dataRange = ([])

#UNCOMMENT BELOW IN FINAL PROGRAM
'''
with open(arglist[1]) as f:
    print 'Points in data set: ',sum(1 for _ in f)
'''
dataRange.append(round(np.amin(dataPointsFromFile),1))
dataRange.append(round(np.amax(dataPointsFromFile),1))
dataRange = np.asarray(dataRange)

dataPoints = np.array(dataPointsFromFile)
print 'Dimensionality of Data: ', dataPoints.shape

randomCentroids = []
data = ([])
templist = []
i = 0

while i<NoOfCentroids:
    for j in range(len(dataPointsFromFile[1,:])):
        cat = round(random.uniform(np.amin(dataPointsFromFile),np.amax(dataPointsFromFile)),1)
        templist.append(cat)
    randomCentroids.append(templist)
    templist = []
    i = i+1

centroids = np.asarray(randomCentroids)

def kMeans(array1, array2):
    ConvergenceCounter = 1
    keepGoing = True
    StartingCentroids = np.copy(centroids)
    print 'Starting Centroiuds:\n {}'.format(StartingCentroids)
    while keepGoing:      
        #--------------Find The new means---------#
        t0 = StartingCentroids[None, :, :] - dataPoints[:, None, :]
        print 't0 :\n {}'.format(t0)
        t1 = np.linalg.norm(t0, axis=-1)
        t2 = np.argmin(t1, axis=-1)
        #------Push the new means to a new array for comparison---------#
        CentroidMeans = []
        for x in range(len(StartingCentroids)):
            CentroidMeans.append(np.mean(dataPoints[t2 == [x]], axis=0))
        #--------Convert to a numpy array--------#
        NewMeans = np.asarray(CentroidMeans)
        #------Compare the New Means with the Starting Means------#
        if np.array_equal(NewMeans,StartingCentroids):
            print ('Convergence has been reached after {} moves'.format(ConvergenceCounter))
            print ('Starting Centroids:\n{}'.format(centroids))
            print ('Final Means:\n{}'.format(NewMeans))
            print ('Final Cluster assignments: {}'.format(t2))
            for x in xrange(len(StartingCentroids)):
                print ('Cluster {}:\n'.format(x)), dataPoints[t2 == [x]]
            for x in xrange(len(StartingCentroids)):
                print ('Size of Cluster {}:'.format(x)), len(dataPoints[t2 == [x]])
            keepGoing = False
        else:
            print 15*'-'
            ConvergenceCounter  = ConvergenceCounter +1
            print 'Starting Centroids:\n'
            print StartingCentroids
            print '\n'
            print 'Starting NewMeans:\n'
            print NewMeans
            StartingCentroids =np.copy(NewMeans)
            print 'Starting Centroids Now:\n'
            print StartingCentroids
            print '\n'
            print 'NewMeans now:'
            print NewMeans
            break


kMeans(centroids, dataPoints)

Respuestas a la pregunta(1)

Su respuesta a la pregunta