Dice's Coefficient
Posted on Friday, 27 March 2009 at 15:20 by Jaime Lara
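Formula:
- S = (2 * Nt) / (Nx + Ny)
where Nx and Ny are the numbers of bigrams (pairs of adjacent letters) in the first and second word, and Nt is the number of bigrams the two words share. S ranges from 0 (no shared bigrams) to 1 (exactly the same bigrams).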
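As a worked illustration of the formula (the word pair is only an example), compare "night" and "nacht" by splitting each word into its pairs of adjacent letters:

```latex
% night -> ni, ig, gh, ht   so N_x = 4
% nacht -> na, ac, ch, ht   so N_y = 4
% shared bigrams: ht        so N_t = 1
S = \frac{2 N_t}{N_x + N_y} = \frac{2 \cdot 1}{4 + 4} = 0.25
```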
AWK code:
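# Obtain Nt, the number of bigrams in the intersection of Word1 and Word2.
# A bigram that appears several times is counted as often as it appears
# in the word with fewer occurrences.
# Bigram and Nt are declared as extra parameters so they stay local.
function Intersection(BigramsWord1, BigramsWord2,    Bigram, Nt) {
    Nt = 0;
    for (Bigram in BigramsWord1) {
        if (Bigram in BigramsWord2) {
            if (BigramsWord1[Bigram] <= BigramsWord2[Bigram])
                Nt = Nt + BigramsWord1[Bigram];
            else
                Nt = Nt + BigramsWord2[Bigram];
        }
    }
    return Nt;
}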
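Taking the smaller count for each shared bigram matters when a bigram repeats. The sketch below isolates that rule; the shell function name `shared_bigrams` is hypothetical, and it extracts bigrams with substr() instead of the letter array used in the listing:

```sh
# shared_bigrams WORD1 WORD2 (hypothetical helper): print Nt, counting a
# repeated bigram only as many times as it occurs in both words.
shared_bigrams() {
    awk -v w1="$1" -v w2="$2" 'BEGIN {
        # Count every pair of adjacent characters in each word
        for (i = 1; i < length(w1); i++) c1[substr(w1, i, 2)]++
        for (i = 1; i < length(w2); i++) c2[substr(w2, i, 2)]++
        nt = 0
        # For bigrams present in both words, add the smaller of the two counts
        for (b in c1)
            if (b in c2)
                nt += (c1[b] < c2[b]) ? c1[b] : c2[b]
        print nt
    }'
}

shared_bigrams aaaa aa      # "aa" occurs 3 times vs once: prints 1
shared_bigrams night nacht  # only "ht" is shared: prints 1
```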
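# Obtain the bigrams of Word (stored with their counts in BigramsWord)
# and return how many there are. LettersWord holds the letters of Word,
# one per element, as produced by split(Word, LettersWord, "").
# i, Bigram and NumberBigrams are declared as extra parameters so they stay local.
function ObtainBigrams(Word, LettersWord, BigramsWord,    i, Bigram, NumberBigrams) {
    NumberBigrams = 0;
    # Each pair of adjacent letters is one bigram, so a word of
    # n letters yields n - 1 bigrams.
    for (i = 1; i < length(Word); i++) {
        Bigram = LettersWord[i] LettersWord[i + 1];
        BigramsWord[Bigram]++;
        NumberBigrams++;
    }
    return NumberBigrams;
}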
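The letter array is not strictly needed: substr() can take each pair of adjacent characters directly, which also avoids split() with an empty separator (gawk, mawk and BWK awk split into single characters, but POSIX leaves that case unspecified). A sketch of such a variant; the names `ObtainBigramsSubstr` and `bigram_counts` are hypothetical:

```sh
# bigram_counts WORD (hypothetical helper): print each bigram with its
# count, then the total number of bigrams on the last line.
bigram_counts() {
    awk -v w="$1" '
    # Hypothetical substr()-based variant of ObtainBigrams.
    # i and n are declared as extra parameters so they stay local.
    function ObtainBigramsSubstr(Word, BigramsWord,    i, n) {
        n = 0
        for (i = 1; i < length(Word); i++) {
            BigramsWord[substr(Word, i, 2)]++
            n++
        }
        return n
    }
    BEGIN {
        total = ObtainBigramsSubstr(w, counts)
        for (b in counts) print b, counts[b]
        print total
    }'
}

bigram_counts banana   # an 2, na 2, ba 1 (in any order), then 5
```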
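# Obtain Dice's coefficient between two words.
# The letter and bigram arrays are declared as extra parameters so they are
# local and start empty on every call. As globals they would keep the bigrams
# of earlier input lines, which distorts Nt from the second line onwards.
function DICE(Word1, Word2,    LettersWord1, LettersWord2, BigramsWord1, BigramsWord2, Nx, Ny, Nt) {
    split(Word1, LettersWord1, "");
    split(Word2, LettersWord2, "");
    Nx = ObtainBigrams(Word1, LettersWord1, BigramsWord1);
    Ny = ObtainBigrams(Word2, LettersWord2, BigramsWord2);
    Nt = Intersection(BigramsWord1, BigramsWord2);
    # Avoid dividing by zero when neither word has a bigram
    if ((Nx + Ny) > 0)
        return (2 * Nt) / (Nx + Ny);
    else
        return 0;
}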
# Each input line holds two comma-separated words, e.g. night,nacht
BEGIN {
    FS = ",";
}
{
    Word1 = $1;
    Word2 = $2;
    print DICE(Word1, Word2);
}
Example:
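A possible run, with word pairs chosen only for illustration. So that it executes on its own, the block uses a condensed version of the script above (the shell function name `dice` is hypothetical, and bigrams are taken with substr()):

```sh
# dice (hypothetical helper): read "word1,word2" lines on stdin and print
# Dice's coefficient for each line.
dice() {
    awk -F, '
    # Count the bigrams of w into array b and return how many there are
    function bigrams(w, b,    i, n) {
        n = 0
        for (i = 1; i < length(w); i++) {
            b[substr(w, i, 2)]++
            n++
        }
        return n
    }
    {
        # Empty the arrays so bigrams from earlier lines are not reused
        split("", b1)
        split("", b2)
        nx = bigrams($1, b1)
        ny = bigrams($2, b2)
        nt = 0
        for (g in b1)
            if (g in b2)
                nt += (b1[g] < b2[g]) ? b1[g] : b2[g]
        s = (nx + ny > 0) ? 2 * nt / (nx + ny) : 0
        print s
    }'
}

printf 'night,nacht\nhello,hello\nabc,xyz\n' | dice   # prints 0.25, 1, 0
```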