Logic Regression - examples

I ran the currently available options for logic regression on a very simple simulated data set. The data have 500 cases, and 20 binary variables. Each predictor k is simulated from an independent Bernoulli(pk) random variables, with success probabilities pk between 0.1 and 0.9. The response variable is simulated from the model

Y = 3 + 1 L1 - 2 L2 + N(0,1),

where L1=(X1 or X2) and L2=(X3 or X4). The data are available in the R logic regression package as logreg.testdat. So the task is to use the linear model in the logic regression framework to find L1 and L2. Follow the script below to see how we ran this example. The first two steps are to verify that there is signal in the data. Then we find the best scoring models for various sizes, and carry out the two options for model selection. It should be obvious that the best model has two trees, and a total of four leaves. In between we added some lines that are not really necessary, but illustrate how to use the code and grab the objects it generates. You can also download the code below as a separate file. If you use R, we recommend using the option paper="letter" if you create a postscript file and do not want the default 'A4' size.

Make sure to check out the documents about the methodolgy, so you know what's going on here.



### LOAD THE LOGIC REGRESSION PACKAGE ################################
library(LogicReg)    # for Splus, use the attach() function



### FIND THE BEST SCORING MODEL (BIG SIZE) ##############################

# SET THE ANNEALING PARAMETERS
myanneal <- logreg.anneal.control(start=-1,end=-4,iter=25000,update=1000)

# FIND THE BEST SCORING MODEL USING UP TO TWO TREES AND EIGHT LEAVES
# PER TREE (CHECK THE HELP FILES FOR THE DEFAULT SETTINGS)
fit1 <- logreg(resp=logreg.testdat[,1],bin=logreg.testdat[,2:21], 
               type=2,select=1,ntrees=2,anneal.control=myanneal)

# GENERATE POSTSCRIPT FILES OF THE TWO TREES
plot(fit1,pscript=T)

# CUSTOMIZE THE PLOT OF THE SECOND TREE
plot.logregtree(fit1$model$trees[[2]],info=T,coef=F,nms=LETTERS)



### NULL MODEL TEST FOR SIGNAL IN THE DATA ##############################

# TURN OFF THE ITERATION UPDATES ON THE SCREEN
myanneal2 <- logreg.anneal.control(start=-1,end=-4,iter=25000,update=0)

# A PERMUTATION TEST FOR SIGNAL IN THE DATA, 20 REPLICATES
fit4 <- logreg(select=4,anneal.control=myanneal2,oldfit=fit1,nrep=20)

# GENERATE A POSTSCRIPT FILE OF THE REFERENCE DISTRIBUTION
plot(fit4,pscript=T)



### FIND THE BEST SCORING MODEL FOR VARIOUS SIZES #######################

# FIND THE BEST SCORING MODELS OF VARIOUS SIZES, ALLOWING ONE OR TWO 
# TREES, AND BETWEEN ONE AND SEVEN LEAVES PER TREE
fit2 <- logreg(resp=logreg.testdat[,1],bin=logreg.testdat[,2:21],
               type=2,select=2,ntrees=c(1,2),nleaves=c(1,7),
               anneal.control=myanneal2)

# AN EASIER WAY (EQUIVALENT TO THE ABOVE)
fit2 <- logreg(oldfit=fit1,select=2,ntrees=c(1,2),nleaves=c(1,7),
               anneal.control=myanneal2)

# LOOK AT THE SCORES
plot(fit2)

# GENERATE POSTSCRIPT FILES OF THE SCORES AND ALL TREES
plot(fit2,pscript=T)



### CROSS VALIDATION ####################################################

# 10-FOLD CROSS-VALIDATION ON THE TREES IN FIT2
fit3 <- logreg(select=3,oldfit=fit2)

# GENERATE POSTSCRIPT FILES OF THE TRAINING AND TEST SCORES
plot(fit3,pscript=T)

# PRINT A TABLE WITH THE SCORES
n1<-seq(10,dim(fit3$cvscores)[1],10);n2<-c(1,2,6,8)
print(round(fit3$cvscores[n1,n2],3))



### RANDOMIZATION TEST FOR MODEL SELECTION ############################

# PERMUTATION TEST FOR MODEL SELECTION, 10 REPLICATES
fit5 <- logreg(select=5,oldfit=fit2,nrep=10)

# GENERATE A POSTSCRIPT FILE OF THE REFERENCE DISTRIBUTIONS
plot(fit5,pscript=T)



### SUMMARY ############################################################

# print all scores
print(fit2$allscores)

# plot the trees for the best model - two trees, four leaves
postscript("final.ps")
par(mfrow=c(1,2),mar=rep(0,4))
plot.logregtree(fit2$alltrees[[10]]$trees[[1]],
indents=c(0.5,0.2,0.2,0.2),coef.rd=2)
plot.logregtree(fit2$alltrees[[10]]$trees[[2]],
indents=c(0.5,0.2,0.2,0.2),coef.rd=2)
dev.off()





For questions please contact Ingo Ruczinski or Charles Kooperberg.