---
title: 'Stat7350: Experimental Design 2 - Examples'
author: "AC Gerstein"
date: '2019-03-07'
output:
  pdf_document:
    latex_engine: xelatex
header-includes:
  - \usepackage[fontsize=12pt]{scrextend}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(width=40)
library(tidyverse)
library(ggmosaic)
```

```{r wrap-hook, echo = FALSE}
library(knitr)
hook_output = knit_hooks$get('output')
knit_hooks$set(output = function(x, options) {
  # this hook is used only when the linewidth option is not NULL
  if (!is.null(n <- options$linewidth)) {
    x = knitr:::split_lines(x)
    # any lines wider than n should be wrapped
    if (any(nchar(x) > n)) x = strwrap(x, width = n)
    x = paste(x, collapse = '\n')
  }
  hook_output(x, options)
})
```

# Learning Objectives

* See how simple biological experiments can lead to important insights (with a statistical controversy thrown in)
* Walk-through of 'typical' experiments that test for an association between two categorical variables
* Walk-through of 'typical' experiments that seek to compare population means

# Genetics: Mendel's pea plants

**Genetics**: the branch of biology that deals with heredity and variation of organisms.

In eukaryotic organisms (animals, plants, fungi), chromosomes carry the hereditary information (genes), which is written with only four bases (A, C, T, G) that combine in different ways to code information.

Gregor Mendel, an Austrian monk, performed experiments that demonstrated how the laws of inheritance work. Before Mendel's experiments, it was thought that traits were passed on through a blending process, where offspring inherited a mix of both parental characteristics.

Mendel looked at seven different traits (phenotypes) from pea plants:

![](Mendel_traits.png)

Credit: Rupali Raju

Source: CK-12 Foundation

Mendel is credited as the first biologist to use mathematics to quantitatively explain his results. From his pea breeding experiments he predicted:

- the concept of genes as the unit of heredity
- that genes occur in pairs (i.e., there are two alleles that occupy the same locus (the same position on a strand of DNA) on homologous chromosomes (matching chromosomes) and influence the same trait)
- that one gene of each pair is present in the gametes (i.e., you get one from each parent)

---

## Some terms:

**genotype** - the genetic makeup

**phenotype** - the physical appearance (genotype + environmental effects)

**homozygous** - having the same two alleles of a gene (or position within a gene)

**heterozygous** - having two different alleles

**dominant** - the allele of a gene that masks or suppresses the expression of an alternative allele

**recessive** - an allele that is masked by a dominant allele

---

## Monohybrid cross

A *monohybrid cross* is a genetic cross involving parents that differ in only a single trait (single gene).

*P* = Parental generation

*$F_1$* = First filial generation; the first set of offspring from a genetic cross

*$F_2$* = Second filial generation of a genetic cross

Mendel conducted a monohybrid cross between parents that were tall (genotype: TT) and dwarf (genotype: tt). The genotype of all $F_1$ generation plants is Tt. Phenotypically, all plants were tall.

```{r, echo=FALSE, out.width = '100%'}
knitr::include_graphics("F1_punnett")
```

If you let the $F_1$ generation self-fertilize, the next monohybrid cross is $Tt \times Tt$.

```{r, echo=FALSE, out.width = '100%'}
knitr::include_graphics("F2_punnett")
```

You can also use this information to perform a 'test cross' to elucidate an unknown genotype of one parent.

```{r, echo=FALSE, out.width = '100%'}
knitr::include_graphics("test_cross")
```

---

These experiments led to:

**(1) The principle of dominance**: one allele is dominant over another.

**(2) The principle of segregation**: when gametes are formed, each sex cell (e.g., sperm/egg) receives only one copy of each gene.

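The Punnett squares for the $F_1$ self-cross (Tt with Tt) and the test cross can also be enumerated in code. The following is a minimal sketch, not part of the original notes: `cross_monohybrid()` is a hypothetical helper that assumes each parent is written as a two-letter genotype and that the upper-case allele is dominant.

```{r}
# Hypothetical helper (an illustration, not from the original notes):
# enumerate the four equally likely offspring of a one-gene cross.
cross_monohybrid <- function(parent1, parent2) {
  gametes1 <- strsplit(parent1, "")[[1]]  # alleles parent 1 can pass on
  gametes2 <- strsplit(parent2, "")[[1]]  # alleles parent 2 can pass on
  # every pairing of one gamete per parent = the cells of a Punnett square
  combos <- expand.grid(a = gametes1, b = gametes2, stringsAsFactors = FALSE)
  # write each genotype with the upper-case (dominant) allele first
  genotype <- apply(combos, 1, function(g) {
    paste(g[order(g != toupper(g))], collapse = "")
  })
  # an offspring shows the dominant phenotype if it carries any upper-case allele
  phenotype <- ifelse(genotype != tolower(genotype), "dominant", "recessive")
  list(genotype = table(genotype), phenotype = table(phenotype))
}

f2 <- cross_monohybrid("Tt", "Tt")         # F1 self-fertilization
f2$genotype
f2$phenotype

testCross <- cross_monohybrid("Tt", "tt")  # test cross with a heterozygous parent
testCross$phenotype
```

Under these assumptions the self-cross gives genotypes in a 1:2:1 ratio and phenotypes in a 3:1 ratio, while a test cross against a heterozygous parent gives a 1:1 phenotype ratio.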
Traits that follow strict dominance/recessive relationships (one gene, 0/1 phenotype) are referred to as Mendelian. Many traits are not Mendelian, though some diseases (such as cystic fibrosis) are.

## Dihybrid cross

The second type of cross was a **dihybrid cross** involving two traits.

P cross: One parent has a phenotype of round yellow seeds (genotype: RRYY) and the other parent has a phenotype of wrinkled green seeds (genotype: rryy).

$F_1$ cross: All have a phenotype of round yellow seeds (genotype: RrYy).

***

### CHALLENGE

What is the ratio of genotypes and phenotypes in the offspring of the dihybrid cross?

***

These experiments led Mendel to postulate:

**(3) The principle of independent assortment**: members of one gene pair segregate independently from other gene pairs during gamete formation.

That genes get "shuffled", so that many combinations are formed, is one of the major advantages of sexual reproduction.

The original paper (published in 1866) was initially poorly received (3 citations in the first 35 years) before it was simultaneously rediscovered by multiple researchers. Shortly thereafter Mendel was accused of falsifying his data, first by Oxford biologist W. F. R. Weldon and then by R. A. Fisher, because the results were too close to expectation when analyzed statistically.

Many words (an entire book) have been written about this. Here are a few suggestions for additional reading:

Gregory Radick. Beyond the Mendel-Fisher controversy. Science Vol. 350, Issue 6257, pp. 159-160 (2015)
http://science.sciencemag.org.uml.idm.oclc.org/content/350/6257/159

Ana M. Pires and João A. Branco. A Statistical Model to Explain the Mendel–Fisher Controversy. Statist. Sci. Volume 25, Number 4, 545-565 (2010).
https://projecteuclid.org/euclid.ss/1300108237

Additional source: https://hemantmore.org.in/science/biology/monohybrid-cross/10676/

---

# Association tests between categorical variables

## Cancer and Aspirin: 2x2 Contingency Table

Whitlock & Schluter Example 9.2: Aspirin has been thought to reduce the risk of stroke and heart attack in susceptible people. An experimental study was designed to test this. 39,876 women were randomly assigned to two treatments: 19,934 received 100 mg of aspirin every other day while 19,942 women received a placebo. The experiment was single-blind: the women did not know which treatment group they were in. During the study, 1438 women on aspirin and 1427 of those on the placebo were diagnosed with cancer.

Source: https://jamanetwork.com/journals/jama/fullarticle/10.1001/jama.294.1.47

---

```{r, fig.width=4, fig.height = 3}
# one row per woman: treatment group and cancer outcome
cancer <- read_csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter09/chap09e2AspirinCancer.csv"), col_types = cols())

# mosaic plot of cancer outcome within each treatment group
ggplot(data = cancer) +
  geom_mosaic(aes(x = product(cancer, aspirinTreatment), fill = cancer), na.rm = TRUE, show.legend = FALSE) +
  xlab("") +
  ylab("") +
  theme_bw()

# chi-squared contingency test without the continuity correction
chiTest <- chisq.test(cancer$cancer, cancer$aspirinTreatment, correct = FALSE)
chiTest
```

## Parasites

Whitlock & Schluter Example 9.3: Many parasites require more than one host species to complete their life cycle. Trematodes of the species *Euhaplorchis californiensis* use three hosts during their life cycle. The worms mature in birds and lay eggs that pass out in the bird's feces. The horn snail *Cerithidea californica* eats the eggs, which hatch and castrate the snail. When an infected snail is eaten by the killifish *Fundulus parvipinnis*, the parasite develops further and encysts in the fish's brain. Finally, when the killifish is eaten by a bird, the worm becomes a mature adult and the cycle starts again.

Lafferty and Morris (1996) tested whether infected fish are more likely to be eaten by a bird than uninfected fish. They set up an experiment: a large, open, outdoor tank was stocked with three types of killifish: uninfected, lightly infected, and highly infected. Foraging birds were able to eat fish directly from the tank.

https://esajournals-onlinelibrary-wiley-com.uml.idm.oclc.org/doi/abs/10.2307/2265536

```{r, echo=FALSE, out.width = '50%', fig.align = "center"}
knitr::include_graphics("trematode")
```

Source: https://theethogram.com/2018/06/12/creature-feature-euhaplorchis-californiensis/

---

```{r, fig.width=8, fig.align = "center"}
# one row per fish: infection level and fate
worm <- read_csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter09/chap09e4WormGetsBird.csv"), col_types = cols())

# order the infection levels from least to most infected
worm$infection <- factor(worm$infection, levels = c("uninfected", "lightly", "highly"))

ggplot(data = worm) +
  geom_mosaic(aes(x = product(fate, infection), fill = fate), na.rm = TRUE, show.legend = FALSE) +
  xlab("") +
  ylab("") +
  theme_bw()

# chi-squared contingency test of fate against infection level
chiTest2 <- chisq.test(worm$fate, worm$infection, correct = FALSE)
chiTest2
```

\newpage

# Comparing Population Means

## Human body temperature

Whitlock & Schluter example 11.3: It has often been reported that the average human body temperature is 37 degrees Celsius. This stems from a book published in 1868 by German physician Carl Reinhold August Wunderlich, who reported an analysis of over one million temperature readings from 24,000 patients. In 1996 Allen Shoemaker published data from 130 body temperature readings for 65 males and 65 females. Are these data consistent with a mean body temperature of 37 degrees?

http://jse.amstat.org/v4n2/datasets.shoemaker.html

```{r}
temp <- read_csv("bodyTemp.txt", col_types = cols())

ggplot(temp, aes(temp)) +
  geom_histogram() +
  theme_bw() +
  xlab("temperature")
```

```{r}
# one-sample t-test against the hypothesized mean of 37 degrees
t.test(temp$temp, mu = 37)
```

What other information might you like about this dataset?

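To see what `t.test()` with the `mu` argument computes, the one-sample t statistic can be worked out by hand. The following is a minimal sketch on simulated readings: `simTemp` and its mean and standard deviation are made-up stand-ins chosen for illustration, not Shoemaker's measurements.

```{r}
# Simulated stand-in data: these are NOT Shoemaker's measurements.
set.seed(1)
simTemp <- rnorm(130, mean = 36.8, sd = 0.4)

mu0 <- 37                                  # hypothesized population mean
n <- length(simTemp)
stdErr <- sd(simTemp) / sqrt(n)            # standard error of the sample mean
tStat <- (mean(simTemp) - mu0) / stdErr    # one-sample t statistic
degFree <- n - 1                           # degrees of freedom
pValue <- 2 * pt(-abs(tStat), degFree)     # two-sided p-value

# compare the hand calculation with the built-in test
builtIn <- t.test(simTemp, mu = mu0)
c(byHand = tStat, builtIn = unname(builtIn$statistic))
c(byHand = pValue, builtIn = builtIn$p.value)
```

The two versions should agree, since the statistic is just the distance between the sample mean and the hypothesized mean, measured in standard errors.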
\newpage

## Horned lizard spikes

Whitlock & Schluter example 12.3: The horned lizard *Phrynosoma mcallii* is named for the fringe of spikes surrounding its head. A group of herpetologists tested whether the long spikes help protect horned lizards from being eaten. They took advantage of the behaviour of one of the lizard's main natural predators, the loggerhead shrike *Lanius ludovicianus*. The loggerhead shrike skewers its victims on thorns or barbed wire to save them for future eating. Young et al. (2004) wanted to test whether horn length influenced the likelihood of successful predation. To do this they measured the horn length of 30 horned lizards that they found killed by shrikes. For comparison, they measured the horn length of 154 horned lizards that were still alive in the same area.

http://science.sciencemag.org.uml.idm.oclc.org/content/304/5667/65

```{r, echo=FALSE, out.width = '50%'}
knitr::include_graphics("HornedLizard")
```

```{r}
lizard <- read_csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter12/chap12e3HornedLizards.csv"), col_types = cols())

# drop rows with missing values before plotting
lizard2 <- lizard %>%
  na.omit()

# overlaid histograms and densities of horn length by survival status
ggplot(lizard2, aes(squamosalHornLength, fill = Survival)) +
  geom_histogram(alpha = 0.5, position = "identity")

ggplot(lizard2, aes(squamosalHornLength, fill = Survival)) +
  geom_density(alpha = 0.5, position = "identity")

# two-sample t-test assuming equal variances
t.test(squamosalHornLength ~ Survival, data = lizard, var.equal = TRUE)
```

\newpage

## Sexual cannibalism in sagebrush crickets

During mating in the sagebrush cricket, *Cyphoderris strepitans*, the male offers his fleshy hind wings to the female to eat. These wounds are not fatal, but a male whose wings have already been nibbled is less likely to be chosen by a female for subsequent mating. Since females get some nutrition from this process, Johnson et al. (1999) decided to test whether hungry females were more likely to mate than satiated females.
https://academic.oup.com/beheco/article/10/3/227/201476

To test this, they randomly divided 24 females into two groups: one group (n = 11) was starved for at least two days and the second group (n = 13) was fed during the same period. Each female was then put separately in a cage with a single (new) male, and the waiting time to mating was recorded.

```{r}
cannibalism <- read_csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter13/chap13e5SagebrushCrickets.csv"), col_types = cols())

# histogram of waiting times, one panel per feeding group
ggplot(cannibalism, aes(timeToMating)) +
  geom_histogram(position = "identity", colour = "black", binwidth = 20, boundary = 0) +
  facet_wrap(~feedingStatus) +
  scale_x_continuous(breaks = c(0, 20, 40, 60, 80, 100)) +
  theme_bw()
```

Neither group looks normally distributed and the sample size is small, so we use a Wilcoxon rank-sum test.

```{r}
wilcox.test(timeToMating ~ feedingStatus, data = cannibalism)
```

This is also the type of data we could run a permutation test on.

```{r}
# observed difference between the two group means
cricketMeans <- cannibalism %>%
  group_by(feedingStatus) %>%
  summarise(mean_timeToMating = mean(timeToMating))
diffMeans <- cricketMeans$mean_timeToMating[2] - cricketMeans$mean_timeToMating[1]

nPerm <- 10000
permResult <- numeric(nPerm) # preallocate the result vector
for(i in 1:nPerm){
  # step 1: permute the times to mating
  permSample <- sample(cannibalism$timeToMating, replace = FALSE)
  # step 2: calculate difference between means
  permMeans <- tapply(permSample, cannibalism$feedingStatus, mean)
  permResult[i] <- permMeans[2] - permMeans[1]
}

# null distribution, with an arrow marking the observed difference
hist(permResult, right = FALSE, breaks = 100, main = "permutation results")
arrows(diffMeans, 25, diffMeans, 15, col = "red", lwd = 2)
```

Use the null distribution to calculate an approximate p-value. Count the permuted differences in means that fall at or below `diffMeans`, divide by the number of permutations, and double the result for a two-tailed test.

```{r}
# two-tailed p-value
2 * (sum(as.numeric(permResult <= diffMeans)) / nPerm)
```
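Doubling one tail is one way to get a two-tailed p-value. A common alternative, sketched below, counts the permuted differences whose absolute value is at least as large as the observed one. This is an illustration rather than the analysis used in these notes: `perm_diff_means()` is a hypothetical helper, and `simTime` and `simGroup` are simulated stand-ins, not the cricket measurements.

```{r}
# Hypothetical helper (not part of the original analysis):
# permutation test for a difference in means between two groups.
perm_diff_means <- function(values, groups, n_perm = 10000) {
  groups <- factor(groups)
  stopifnot(nlevels(groups) == 2, length(values) == length(groups))
  # difference in means, second factor level minus first
  diff_of_means <- function(v) {
    m <- tapply(v, groups, mean)
    unname(m[2] - m[1])
  }
  observed <- diff_of_means(values)
  # shuffle the values n_perm times while keeping the group labels fixed
  permuted <- replicate(n_perm, diff_of_means(sample(values)))
  list(observed = observed,
       permuted = permuted,
       # proportion of shuffles at least as extreme as the observed difference
       p_two_sided = mean(abs(permuted) >= abs(observed)))
}

# Simulated stand-in data: these are NOT the cricket measurements.
set.seed(7350)
simTime <- c(rexp(11, rate = 1 / 20), rexp(13, rate = 1 / 40))
simGroup <- rep(c("starved", "fed"), times = c(11, 13))

res <- perm_diff_means(simTime, simGroup, n_perm = 2000)
res$observed
res$p_two_sided
```

Because the group labels are sorted alphabetically, the difference here is "starved" minus "fed", the same order as `diffMeans` above. The absolute-value version does not require the null distribution to be symmetric, which is one reason it is sometimes preferred over doubling a single tail.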