Advanced R course exercises

Exercise 1

Read in the data we were given

setwd("o:/Training/Advanced R/Advanced R Data/")
read.delim("methylation.txt") -> methylation
read.delim("expression.txt") -> expression

Remove any lines from the methylation data where the promoter methylation equals -1 (ie not actually measured), and see what effect this has on the number of rows of data we have

nrow(methylation)
## [1] 25077
methylation[methylation$Promoter_meth != -1,] -> methylation
nrow(methylation)
## [1] 23974

Deduplicate the expression data based on probe names. We’re going to leave one instance of each duplicated name behind.

expression[!duplicated(expression$Probe),] -> expression

Find out how many olfactory receptors we have in our expression data

Exercise 2

sum(grepl("Olfr",expression$Probe))
## [1] 1118

Change the chromosome names from 1,2,3 etc, to chr1,chr2,chr3 etc.

expression$Chromosome <- paste("chr",expression$Chromosome,sep="")

Find out which probes are present in the methylation data, but not the expression data

methylation$Probe[!(methylation$Probe %in% expression$Probe)]
##   [1] AC027184.1      AC044807.1      AC044864.2      AC069308.1     
##   [5] AC087117.2      AC087556.1      AC087898.1      AC093346.1     
##   [9] AC099934.1      AC100600.1      AC102240.1      AC102463.1     
##  [13] AC102525.1      AC102689.1      AC102815.1      AC103359.1     
##  [17] AC103368.1      AC107711.2      AC107770.1      AC108409.1     
##  [21] AC108434.1      AC109138.1      AC109196.1      AC110211.1     
##  [25] AC110235.1      AC113244.1      AC113316.1      AC113598.1     
##  [29] AC114657.1      AC114668.1      AC115005.1      AC115697.1     
##  [33] AC117657.1      AC119813.1      AC120136.3      AC120540.1     
##  [37] AC121599.1      AC121600.1      AC121784.1      AC121784.4     
##  [41] AC122247.1      AC122490.1      AC122520.2      AC122537.1     
##  [45] AC122546.1      AC122860.1      AC123610.1      AC123616.1     
##  [49] AC123882.1      AC124180.1      AC124466.1      AC124472.1     
##  [53] AC124489.1      AC124584.1      AC124613.1      AC124807.1     
##  [57] AC125201.1      AC126679.1      AC126690.2      AC127419.1     
##  [61] AC132367.1      AC133195.1      AC133646.1      AC134596.1     
##  [65] AC136021.1      AC137708.1      AC138311.2      AC139241.1     
##  [69] AC139317.1      AC140327.1      AC141881.8      AC142212.1     
##  [73] AC147101.1      AC148990.1      AC149593.1      AC149599.1     
##  [77] AC150277.1      AC152062.1      AC152063.1      AC152453.1     
##  [81] AC152826.1      AC153556.1      AC154187.1      AC154247.1     
##  [85] AC154473.1      AC154495.1      AC156026.1      AC156032.1     
##  [89] AC156953.1      AC157822.1      AC158667.1      AC158898.1     
##  [93] AC158995.1      AC159379.1      AC159624.1      AC160757.1     
##  [97] AC161246.1      AC161366.1      AC162464.1      AC162799.1     
## [101] AC163215.1      AC163703.1      AC164408.1      AC165361.1     
## [105] AC166172.1      AC166290.1      AC166574.2      AC167242.1     
## [109] AC167245.1      AC170189.1      AI324046        AL589670.1     
## [113] AL590969.1      AL591146.1      AL591712.1      AL591905.1     
## [117] AL596123.1      AL596258.1      AL603708.1      AL603828.1     
## [121] AL606908.1      AL626775.1      AL626784.1      AL645470.1     
## [125] AL645971.1      AL646096.2      AL663027.1      AL669947.1     
## [129] AL669982.1      AL671866.1      AL671882.1      AL671913.1     
## [133] AL672070.1      AL672076.1      AL672246.1      AL683882.1     
## [137] AL713978.1      AL732564.1      AL732626.1      AL772271.1     
## [141] AL772271.4      AL772272.1      AL772349.1      AL772404.1     
## [145] AL806517.1      AL807811.1      AL831731.1      AL844480.1     
## [149] AL844840.2      AL844869.1      AL844871.1      AL845174.1     
## [153] AL928544.2      AL929011.1      AL929371.2      AL954388.2     
## [157] Art2a-ps        BX005053.1      BX470216.1      BX537299.1     
## [161] BX537327.1      BX649539.1      Cdk3-ps         CPEB3_ribozyme 
## [165] CR956641.1      CR974466.1      CT010479.2      CT025673.1     
## [169] CT030240.1      CT030723.1      CT485616.1      CT573086.1     
## [173] Gm10895         Gm10907         Gm13892         Gm13893        
## [177] Gm13909         Gm13934         Gm13950         Gm14030        
## [181] Gm16603         Gm16611         Gm16612         Gm16886        
## [185] Gm16888         Gm16891         Gm16931         Gm16971        
## [189] Gm16979         Gm17002         Gm17004         Gm17011        
## [193] Gm17015         Gm17347         Gm5074          H2-Bl          
## [197] H2-T10          Ighg            Ighg1           Ighm           
## [201] Ighv3-3         Igkc            Igkv12-38       Igkv12-89      
## [205] Igkv18-36       Igkv6-32        Mir1197         Mir1199        
## [209] Mir1247         Mir1249         Mir124a-3       Mir126         
## [213] Mir129-2        Mir132          Mir137          Mir138-2       
## [217] Mir149          Mir154          Mir1893         Mir1901        
## [221] Mir1905         Mir1932         Mir1938         Mir1945        
## [225] Mir1964         Mir1983         Mir210          Mir212         
## [229] Mir218-1        Mir218-2        Mir219-2        Mir26a-1       
## [233] Mir320          Mir323          Mir339          Mir341         
## [237] Mir375          Mir411          Mir425          Mir431         
## [241] Mir434          Mir486          Mir543          Mir574         
## [245] Mir598          Mir666          Mir680-1        Mir702         
## [249] Mir707          Mir718          Mir760          Mir762         
## [253] Mir770          Mir92b          Mir99a          Mir99b         
## [257] mmu-mir-2134-1  mmu-mir-2134-2  mmu-mir-2134-4  mmu-mir-689-2  
## [261] mt-Rnr2         n-R5-8s1        n-R5s109        n-R5s151       
## [265] n-R5s156        n-R5s172        n-R5s175        n-R5s193       
## [269] n-R5s194        n-R5s2          n-R5s28         n-R5s33        
## [273] n-R5s56         n-R5s71         n-R5s92         n-R5s96        
## [277] Nlrp1c-ps       NRON            Ptcra           RNase_MRP      
## [281] Scarna10        Scarna13        SCARNA13        Scarna2        
## [285] Scnm1           Serpinb10-ps    Slc22a13b       SNORA53        
## [289] Snora61         SNORA71         Snora73a        Snora73b       
## [293] Snora78         Snord104        Snord13         snoU54         
## [297] Tcra-V8         Tcrg-C          Tcrg-V4         Tcrg-V5        
## [301] Tcrg-V6         Tcrg-V7         Telomerase-vert Trac           
## [305] Trav1           Trav10n         Trav13-2        Trav17         
## [309] Trav18          Trav2           Trav3-1         Trav3-3        
## [313] Trav3-4         Trav6d-3        Trav6d-5        Trav7-6        
## [317] Trav8d-1        Trbc1           Trbc2           Trbv1          
## [321] Trbv13-3        Trbv16          Trbv19          Trbv2          
## [325] Trbv31          Trbv4           Trbv5           Trdv1          
## [329] Trdv2-1         Trdv5           U11             U12            
## [333] U2              U3              U4atac         
## 25077 Levels: 0610005C13Rik 0610007C21Rik 0610007L01Rik ... Zzz3

Merge the expression and methylation datasets together. The only column name in common is Probe, which is what we want to merge on anyway.

merge(expression,methylation) -> merged.data

Exercise 3

Loop over the expression/methylation columns to calculate the range of values in each.

apply(merged.data[,6:8],2,range)
##      Expression Gene_body_meth Promoter_meth
## [1,]  -9.743026              0             0
## [2,]  11.321041            100           100

Calculate the mean gene body methylation for each chromosome, then plot this as a barplot

tapply(merged.data$Gene_body_meth,merged.data$Chromosome,mean)
##     chr1    chr10    chr11    chr12    chr13    chr14    chr15    chr16 
## 43.58484 46.10247 39.33736 45.42464 42.77284 41.64486 40.85106 44.89734 
##    chr17    chr18    chr19     chr2     chr3     chr4     chr5     chr6 
## 42.08896 43.12389 43.66956 39.22779 40.02630 41.98396 48.95195 40.50904 
##     chr7     chr8     chr9    chrMT     chrX 
## 41.69334 47.86813 43.31258 16.70353 40.79321
barplot(tapply(merged.data$Gene_body_meth,merged.data$Chromosome,mean))

Exercise 4

Set up a function to calculate the SEM

sem <- function (x,absent=NA) {
  
  if (sum(is.na(x)) > 0) return (absent)
  
  return (sd(x)/sqrt(length(x)))
}

Calculate the gene body methylation SEM for each chromosome.

tapply(merged.data$Gene_body_meth,merged.data$Chromosome,sem)
##      chr1     chr10     chr11     chr12     chr13     chr14     chr15 
## 0.8997813 1.0365418 0.7527817 1.2446569 1.1330741 1.1697267 1.1343304 
##     chr16     chr17     chr18     chr19      chr2      chr3      chr4 
## 1.2531691 0.9920431 1.4073080 1.1942370 0.7315191 0.9978256 0.8167108 
##      chr5      chr6      chr7      chr8      chr9     chrMT      chrX 
## 0.8783216 0.9507895 0.7829240 0.9606002 0.9308829 3.7439534 1.0516666

Exercise 5

Install the vioplot library and use it to draw violin plots for the 3 datasets we have.

library(vioplot)
par(mfrow=c(1,3))
sapply(colnames(merged.data)[6:8],function(x) {
  vioplot(merged.data[[x]])
  title(x)
}
)