---
title: "R Notebook: 26-01"
output: html_notebook
---

Create a 2 × 3 matrix of zeros:

```{r}
matrix(0, nrow = 2, ncol = 3)
```

Arrange a vector of 12 values into a 3 × 4 matrix, filling it by rows:

```{r}
v <- 1:12
m <- matrix(v, nrow = 3, ncol = 4, byrow = TRUE)
m
```

Sum of every row in a matrix:

```{r}
rowSums(m)
```

Recall: how to draw a sample of size 10 from the values 0 and 1.

```{r}
sample(c(0, 1), 10, replace = TRUE)
```

Experiment from the lecture: toss a coin 10 times and repeat this sequence 10000 times. Each row of the matrix is one experiment of 10 tosses, with heads coded as 1:

```{r}
tosses <- 10
samples <- 10000
dat <- matrix(sample(c(0, 1), tosses * samples, replace = TRUE),
              ncol = tosses, byrow = TRUE)
```

Calculate `phats`, the proportion of heads in each experiment:

```{r}
phats <- rowSums(dat) / tosses
hist(phats, breaks = tosses, xlim = c(0, 1))
```

Test $H_0: p = 0.5$:

```{r}
binom.test(3, 10, p = 0.5)  # 3 out of 10 - a fair coin?
binom.test(2, 10)           # 2 out of 10 - a fair coin? (p defaults to 0.5)
binom.test(1, 10)           # 1 out of 10 - a fair coin?
```

Load a dataset and check a null hypothesis with the binomial test.

```{r}
df <- read.csv("https://raw.githubusercontent.com/LingData2019/LingData/master/seminars/26-01/poetry_last_in_lines.csv", sep = "\t")
```

Suggest your hypotheses about the proportion $p$ of nouns. Look at the absolute and relative frequencies:

```{r}
table(df$UPoS)
table(df$UPoS) / sum(table(df$UPoS))
```

Is this enough to draw conclusions? No, proceed to formal tests:

```{r}
# select lines with nouns
nouns <- df[df$UPoS == "NOUN", ]
total <- nrow(df)      # number of trials
nnouns <- nrow(nouns)  # number of successes
```

```{r}
# H0: p = 0.6
binom.test(nnouns, total, p = 0.6)
# H0 is not rejected.
# Note: "not rejected" does not mean "accepted", and it does not prove that H0 holds.
```

```{r}
# H0: p = 0.4
binom.test(nnouns, total, p = 0.4)
```

```{r}
# choose lines with one-syllable words at the end
one_syll <- df[df$RhymedNsyl == 1, ]
n_one <- nrow(one_syll)                                   # number of trials
n_one_nouns <- nrow(one_syll[one_syll$UPoS == "NOUN", ])  # number of successes
n_one
n_one_nouns
# the original notebook hard-coded these counts as binom.test(32, 43, p=0.6)
binom.test(n_one_nouns, n_one, p = 0.6)
```

```{r}
# you can run the same test on your own for every number of syllables
table(df$RhymedNsyl)
```
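
The per-syllable tests suggested above could be automated roughly as follows. This is only a minimal sketch: `binom_by_syllables` is a hypothetical helper that is not part of the seminar code, it assumes the columns `RhymedNsyl` and `UPoS` are present as in the dataset loaded earlier, and the default $p_0 = 0.6$ is simply the value used in the tests above.

```{r}
# Hypothetical helper (an assumption, not part of the original notebook):
# run the binomial test for the share of nouns within each syllable count.
binom_by_syllables <- function(data, p0 = 0.6) {
  # one data frame per observed value of RhymedNsyl; empty groups are dropped
  groups <- split(data, data$RhymedNsyl, drop = TRUE)
  rows <- lapply(names(groups), function(k) {
    g <- groups[[k]]
    n <- nrow(g)                              # number of trials
    x <- sum(g$UPoS == "NOUN", na.rm = TRUE)  # number of successes
    bt <- binom.test(x, n, p = p0)            # H0: p = p0, two-sided
    data.frame(syllables = k, trials = n, nouns = x,
               share = x / n, p_value = bt$p.value)
  })
  do.call(rbind, rows)                        # one row per syllable count
}
```

Calling `binom_by_syllables(df)` after the loading chunk should return one row per syllable count. Since several tests are run at once, the resulting p-values are probably best read with a multiple-comparison correction such as `p.adjust()` in mind.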