One nearest neighbour with awk

Question

Jun 18, 2016, 02:49 PM


This is what I am trying to do in AWK. My main problem is with step 2. I have shown a sample dataset below, but the original dataset consists of 100 fields and 2000 records.

Algorithm

1) Initialize accuracy = 0

2) For each record r

     Find the closest other record, o, in the dataset using the distance formula

To find the nearest neighbour for r0, I need to compare r0 with r1 through r9 and compute the following for each pair: square(abs(r0.c1 - r1.c1)) + square(abs(r0.c2 - r1.c2)) + ... + square(abs(r0.c5 - r1.c5)), then store that distance.

3) Compare the c6 value of r with the c6 value of the record at minimum distance. If the c6 values are equal, increase accuracy by 1.

Repeat this for all records.

4) Finally, get the 1NN accuracy percentage as (accuracy / total_records) * 100;
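
Written out as formulas, the steps above amount to roughly the following sketch. The symbols n (the number of feature columns, 5 in the sample) and nn(r) (the nearest other record to r) are notation introduced here for illustration only. The abs() in the formula above is redundant, since squaring already removes the sign.

```latex
d(r, o) = \sum_{k=1}^{n} \left( r.c_k - o.c_k \right)^2

\mathrm{nn}(r) = \arg\min_{o \neq r} \; d(r, o)

\text{accuracy percentage} = 100 \cdot
  \frac{\#\{\, r : r.c_{n+1} = \mathrm{nn}(r).c_{n+1} \,\}}{\text{total\_records}}
```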

Sample Dataset

        c1   c2   c3   c4   c5   c6  --> Columns
  r0  0.19 0.33 0.02 0.90 0.12 0.17  --> row1 & row7 nearest neighbour in c1
  r1  0.34 0.47 0.29 0.32 0.20 1.00      and same values in c6(0.3) so ++accuracy
  r2  0.37 0.72 0.34 0.60 0.29 0.15 
  r3  0.43 0.39 0.40 0.39 0.32 0.27 
  r4  0.27 0.41 0.08 0.19 0.10 0.18 
  r5  0.48 0.27 0.68 0.23 0.41 0.25 
  r6  0.52 0.68 0.40 0.75 0.75 0.35 
  r7  0.55 0.59 0.61 0.56 0.74 0.76 
  r8  0.04 0.14 0.03 0.24 0.27 0.37 
  r9  0.39 0.07 0.07 0.08 0.08 0.89
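
As a worked example, applying the distance formula from step 2 to r0 and r1 of the sample (columns c1 to c5 only) gives:

```latex
d(r_0, r_1) = (0.19 - 0.34)^2 + (0.33 - 0.47)^2 + (0.02 - 0.29)^2
            + (0.90 - 0.32)^2 + (0.12 - 0.20)^2
            = 0.0225 + 0.0196 + 0.0729 + 0.3364 + 0.0064
            = 0.4578
```

The same sum has to be computed for r0 against each of r2 through r9, and the smallest of the nine values identifies the nearest neighbour of r0.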

Code

# Assumes the data file holds only numeric columns (no header line, no r0..r9 labels)
# and that the last field is the class field (c6 in the sample).
# The file must be given twice, e.g.: awk -f script.awk data.txt data.txt
BEGIN   {
            # initialize accuracy
            accuracy = 0;
        }

NR==FNR {   # First pass: store every field of every record, keyed by (record, field)
            for (i = 1; i <= NF; i++)
            {
                records[FNR, i] = $i;
            }
            total_records = FNR;
            next
        }

        {   # Second pass: find the stored record closest to the current record
            best = 0;
            for (i = 1; i <= total_records; i++)
            {
                if (i == FNR) continue;          # skip the current record itself
                distance = 0;
                for (j = 1; j < NF; j++)         # every field except the class field
                {
                    # square the difference of each field and sum it up
                    distance += (records[i, j] - $j)^2;
                }
                if (best == 0 || distance < min_distance)
                {
                    min_distance = distance;
                    best = i;
                }
            }
            # Compare the class field of the current record with that of its nearest neighbour
            if (best != 0 && records[best, NF] == $NF)
            {
                ++accuracy;
            }
        }

END     {
            if (total_records > 0)
            {
                percentage = 100 * (accuracy / total_records);
                print percentage;
            }
        }