Parent

Included Modules

Statsample::Bivariate::Tetrachoric

Compute tetrachoric correlation.

The tetrachoric correlation is a measure of bivariate association arising when both observed variates are categorical variables that result from dichotomizing the two undelying continuous variables (Drasgow, 2006). The tetrachoric correlation is a good way to measure rater agreement (Uebersax, 2006)

This class uses Brown (1977) algorithm. You can see FORTRAN code on lib.stat.cmu.edu/apstat/116

Usage

With two variables x and y on a crosstab like this:

        -------------
        | y=0 | y=1 |
        -------------
  x = 0 |  a  |  b  |
        -------------
  x = 1 |  c  |  d  |
        -------------

The code will be

  tc=Statsample::Bivariate::Tetrachoric.new(a,b,c,d)
  tc.r # correlation
  tc.se # standard error
  tc.threshold_y # threshold for y variable
  tc.threshold_x # threshold for x variable

Reference:

Constants

RequerimentNotMeet
TWOPI
SQT2PI
RLIMIT
RCUT
UPLIM
CONST
CHALF
CONV
CITER
NITER
X
W

Attributes

r[R]
name[RW]

Public Class Methods

new(a,b,c,d) click to toggle source

Creates a new tetrachoric object for analysis

     # File lib/statsample/bivariate/tetrachoric.rb, line 141
141:       def initialize(a,b,c,d)
142:         @a,@b,@c,@d=a,b,c,d
143:         @name=_("Tetrachoric correlation")
144:         #
145:         #       CHECK IF ANY CELL FREQUENCY IS NEGATIVE
146:         #
147:         raise "All frequencies should be positive" if  (@a < 0 or @b < 0 or @c < 0  or @d < 0)
148:         compute
149:       end
new_with_matrix(m) click to toggle source

Creates a Tetrachoric object based on a 2x2 Matrix.

    # File lib/statsample/bivariate/tetrachoric.rb, line 88
88:       def self.new_with_matrix(m)
89:         Tetrachoric.new(m[0,0], m[0,1], m[1,0],m[1,1])
90:       end
new_with_vectors(v1,v2) click to toggle source

Creates a Tetrachoric object based on two vectors. The vectors are dichotomized previously.

     # File lib/statsample/bivariate/tetrachoric.rb, line 93
 93:       def self.new_with_vectors(v1,v2)
 94:         v1a, v2a=Statsample.only_valid(v1,v2)
 95:         v1a=v1a.dichotomize
 96:         v2a=v2a.dichotomize
 97:         raise RequerimentNotMeet, "v1 have only 0" if v1a.factors==[0]
 98:         raise RequerimentNotMeet, "v2 have only 0" if v2a.factors==[0]
 99:         a,b,c,d = 0,0,0,0
100:         v1a.each_index{|i|
101:           x,y=v1a[i],v2a[i]
102:           a+=1 if x==0 and y==0
103:           b+=1 if x==0 and y==1
104:           c+=1 if x==1 and y==0
105:           d+=1 if x==1 and y==1
106:         }
107:         Tetrachoric.new(a,b,c,d)
108:       end

Public Instance Methods

se() click to toggle source

Standard error

     # File lib/statsample/bivariate/tetrachoric.rb, line 110
110:       def se
111:         @sdr
112:       end
threshold_x() click to toggle source

Threshold for variable x (rows) Point on gauss curve under X rater select cases

     # File lib/statsample/bivariate/tetrachoric.rb, line 115
115:       def threshold_x
116:         @zab
117:       end
threshold_y() click to toggle source

Threshold for variable y (columns) Point on gauss curve under Y rater select cases

     # File lib/statsample/bivariate/tetrachoric.rb, line 122
122:       def threshold_y
123:         @zac
124:       end

Private Instance Methods

calculate_cosine() click to toggle source
     # File lib/statsample/bivariate/tetrachoric.rb, line 416
416:       def calculate_cosine
417:         #
418:         #        WHEN ALL MARGINALS ARE EQUAL THE COSINE FUNCTION IS USED
419:         #
420:         @rr = -Math::cos(TWOPI * @probaa)
421:         @itype = 2
422:         calculate_sdr
423:       end
compute() click to toggle source

Compute the tetrachoric correlation. Called on object creation.

     # File lib/statsample/bivariate/tetrachoric.rb, line 153
153:       def compute
154: 
155:         #
156:         # INITIALIZATION
157:         #
158:         @r = 0
159:         sdzero = 0
160:         @sdr = 0
161:         @itype = 0
162:         @ifault = 0
163: 
164:         #
165:         #       CHECK IF ANY FREQUENCY IS 0.0 AND SET kdelta
166:         #
167:         @kdelta = 1
168:         delta  = 0
169:         @kdelta  = 2 if (@a == 0 or @d == 0)
170:         @kdelta += 2 if (@b == 0 or @c == 0)
171:         #
172:         #        kdelta=4 MEANS TABLE HAS 0.0 ROW OR COLUMN, RUN IS TERMINATED
173:         #
174: 
175:         raise RequerimentNotMeet, "Rows and columns should have more than 0 items" if @kdelta==4
176: 
177:         #      GOTO (4, 1, 2 , 92), kdelta
178:         #
179:         #        delta IS 0.0, 0.5 OR -0.5 ACCORDING TO WHICH CELL IS 0.0
180:         #
181: 
182:         if(@kdelta==2)
183:           # 1
184:           delta=0.5
185:           @r=1 if (@a==0 and @d==0)
186:         elsif(@kdelta==3)
187:           # 2
188:           delta=0.5
189:           @r=1 if (@b==0 and @c==0)
190:         end
191:         # 4
192:         if @r!=0
193:           @itype=3
194:         end
195: 
196:         #
197:         #        STORE FREQUENCIES IN  AA, BB, CC AND DD
198:         #
199:         @aa = @a + delta
200:         @bb = @b - delta
201:         @cc = @c - delta
202:         @dd = @d + delta
203:         @tot = @aa+@bb+@cc+@dd
204:         #
205:         #        CHECK IF CORRELATION IS NEGATIVE, 0.0, POSITIVE
206:         #        IF (AA * DD - BB * CC) 7, 5, 6
207: 
208:         corr_dir=@aa * @dd - @bb * @cc
209:         if(corr_dir < 0)
210:           # 7
211:           @probaa = @bb.quo(@tot)
212:           @probac = (@bb + @dd).quo(@tot)
213:           @ksign = 2
214:           # ->  8
215:         else
216:           if (corr_dir==0)
217:             # 5
218:             @itype=4
219:           end
220:           # 6
221:           #
222:           #        COMPUTE PROBABILITIES OF QUADRANT AND OF MARGINALS
223:           #        PROBAA AND PROBAC CHOSEN SO THAT CORRELATION IS POSITIVE.
224:           #        KSIGN INDICATES WHETHER QUADRANTS HAVE BEEN SWITCHED
225:           #
226: 
227:           @probaa = @aa.quo(@tot)
228:           @probac = (@aa+@cc).quo(@tot)
229:           @ksign=1
230:         end
231:         # 8
232: 
233:         @probab = (@aa+@bb).quo(@tot)
234: 
235:         #
236:         #        COMPUTE NORMAL DEVIATES FOR THE MARGINAL FREQUENCIES
237:         #        SINCE NO MARGINAL CAN BE 0.0, IE IS NOT CHECKED
238:         #
239:         @zac = Distribution::Normal.p_value(@probac)
240:         @zab = Distribution::Normal.p_value(@probab)
241:         @ss = Math::exp(0.5 * (@zac ** 2 + @zab ** 2)).quo(TWOPI)
242:         #
243:         #        WHEN R IS 0.0, 1.0 OR -1.0, TRANSFER TO COMPUTE SDZERO
244:         #
245:         if (@r != 0 or @itype > 0)
246:           compute_sdzero
247:           return true
248:         end
249:         #
250:         #        WHEN MARGINALS ARE EQUAL, COSINE EVALUATION IS USED
251:         #
252:         if (@a == @b and @b == @c)
253:           calculate_cosine
254:           return true
255:         end
256:         #
257:         #        INITIAL ESTIMATE OF CORRELATION IS YULES Y
258:         #
259:         @rr = ((Math::sqrt(@aa * @dd) - Math::sqrt(@bb * @cc)) ** 2)  / (@aa * @dd - @bb * @cc).abs
260:         @iter = 0
261:         begin
262:           #
263:           #        IF RR EXCEEDS RCUT, GAUSSIAN QUADRATURE IS USED
264:           #
265:           #10
266:           if @rr>RCUT
267:             gaussian_quadrature
268:             return true
269:           end
270:           #
271:           #        TETRACHORIC SERIES IS COMPUTED
272:           #
273:           #        INITIALIZATION
274:           #
275:           va=1.0
276:           vb=@zac.to_f
277:           wa=1.0
278:           wb=@zab.to_f
279:           term = 1.0
280:           iterm = 0.0
281:           @sum = @probab * @probac
282:           deriv = 0.0
283:           sr = @ss
284:           #15
285:           begin
286:             if(sr.abs<=CONST)
287:               #
288:               #        RESCALE TERMS TO AVOID OVERFLOWS AND UNDERFLOWS
289:               #
290:               sr = sr  / CONST
291:               va = va * CHALF
292:               vb = vb * CHALF
293:               wa = wa * CHALF
294:               wb = wb * CHALF
295:             end
296:             #
297:             #        FORM SUM AND DERIVATIVE OF SERIES
298:             #
299:             #  20
300:             dr = sr * va * wa
301:             sr = sr * @rr / term
302:             cof = sr * va * wa
303:             #
304:             #        ITERM COUNTS NO. OF CONSECUTIVE TERMS  <  CONV
305:             #
306:             iterm+=  1
307:             iterm=0 if (cof.abs > CONV)
308:             @sum = @sum + cof
309:             deriv += dr
310:             vaa = va
311:             waa = wa
312:             va = vb
313:             wa = wb
314:             vb = @zac * va - term * vaa
315:             wb = @zab * wa - term * waa
316:             term += 1
317:           end while (iterm < 2 or term < 6)
318:           #
319:           #        CHECK IF ITERATION CONVERGED
320:           #
321:           if((@sum-@probaa).abs <= CITER)
322:             @itype=term
323:             calculate_sdr
324:             return true
325:           end
326:           #
327:           #        CALCULATE NEXT ESTIMATE OF CORRELATION
328:           #
329:           #25
330:           @iter += 1
331:           #
332:           #        IF TOO MANY ITERATlONS, RUN IS TERMINATED
333:           #
334:           delta = (@sum - @probaa) /  deriv
335:           @rrprev = @rr
336:           @rr = @rr - delta
337:           @rr += 0.5 * delta if(@iter == 1)
338:           @rr= RLIMIT if (@rr > RLIMIT)
339:           @rr =0 if (@rr  <  0.0)
340:         end while @iter < NITER
341:         raise "Too many iteration"
342:         #  GOTO 10
343:       end
compute_sdzero() click to toggle source

85

       COMPUTE SDZERO
     # File lib/statsample/bivariate/tetrachoric.rb, line 452
452:       def compute_sdzero
453:         @sdzero = Math::sqrt(((@aa + @bb) * (@aa + @cc) * (@bb + @dd) * (@cc + @dd)).quo(@tot)).quo(@tot ** 2 * @ss)
454:         @sdr = @sdzero if (@r == 0)
455:       end
gaussian_quadrature() click to toggle source

GAUSSIAN QUADRATURE 40

     # File lib/statsample/bivariate/tetrachoric.rb, line 346
346:       def gaussian_quadrature
347:         if(@iter==0)
348:           # INITIALIZATION, IF THIS IS FIRST ITERATION
349:           @sum=@probab*@probac
350:           @rrprev=0
351:         end
352: 
353:         # 41
354:         sumprv = @probab - @sum
355:         @prob = @bb.quo(@tot)
356:         @prob = @aa.quo(@tot) if (@ksign == 2)
357:         @itype = 1
358:         #
359:         # LOOP TO FIND ESTIMATE OF CORRELATION
360:         #  COMPUTATION OF INTEGRAL (SUM) BY QUADRATURE
361:         #
362:         # 42
363: 
364:         begin
365:           rrsq = Math::sqrt(1 - @rr ** 2)
366:           amid = 0.5 * (UPLIM + @zac)
367:           xlen = UPLIM - amid
368:           @sum = 0
369:           (1..16).each do |iquad|
370:             xla = amid + X[iquad] * xlen
371:             xlb = amid - X[iquad] * xlen
372: 
373: 
374:             #
375:             #       TO AVOID UNDERFLOWS, TEMPA AND TEMPB ARE USED
376:             #
377:             tempa = (@zab - @rr * xla) / rrsq
378:             if (tempa >= 6.0)
379:               @sum = @sum + W[iquad] * Math::exp(0.5  * xla ** 2) * Distribution::Normal.cdf(tempa)
380:             end
381:             tempb = (@zab - @rr * xlb) / rrsq
382: 
383:             if (tempb >= 6.0)
384:               @sum = @sum + W[iquad] * Math::exp(0.5 * xlb ** 2) * Distribution::Normal.cdf(tempb)
385:             end
386:           end # 44 ~ iquad
387:           @sum=@sum*xlen / SQT2PI
388:           #
389:           # CHECK IF ITERATION HAS CONVERGED
390:           #
391:           if ((@prob - @sum).abs <= CITER)
392:             calculate_sdr
393:             return true
394:           end
395:           # ESTIMATE CORRELATION FOR NEXT ITERATION BY LINEAR INTERPOLATION
396: 
397:           rrest = ((@prob -  @sum) * @rrprev - (@prob - sumprv) * @rr) / (sumprv - @sum)
398:           rrest = RLIMIT if (rrest > RLIMIT)
399:           rrest = 0 if (rrest < 0)
400:           @rrprev = @rr
401:           @rr = rrest
402:           sumprv = @sum
403:           #
404:           #        if estimate has same value on two iterations, stop iteration
405:           #
406:           if @rr == @rrprev
407:             calculate_sdr
408:             return true
409:           end
410: 
411: 
412:         end while @iter < NITER
413:         raise "Too many iterations"
414:         # ir a 42
415:       end

Disabled; run with --debug to generate this.

[Validate]

Generated with the Darkfish Rdoc Generator 1.1.6.