R - STID, 2nd year
heart = read.table("donnees/heart.txt", header = TRUE)
head(heart)
age sexe type_douleur pression cholester sucre electro taux_max
1 70 masculin D 130 322 A C 109
2 67 feminin C 115 564 A C 160
3 57 masculin B 124 261 A A 141
4 64 masculin D 128 263 A A 105
5 74 feminin B 120 269 A C 121
6 65 masculin D 120 177 A A 140
angine depression pic vaisseau coeur
1 non 2.4 2 D presence
2 non 1.6 2 A absence
3 non 0.3 1 A presence
4 oui 0.2 2 B absence
5 oui 0.2 1 B absence
6 non 0.4 1 A absence
# The first 35 lines of the file are documentation; skip them before the header row
dh = read.table("donnees/Detroit_homicide.txt",
                skip = 35, header = TRUE)
head(dh)
FTP UEMP MAN LIC GR CLEAR WM NMAN GOV HE WE
1 260.35 11.0 455.5 178.15 215.98 93.4 558724 538.1 133.9 2.98 117.18
2 269.80 7.0 480.2 156.41 180.48 88.5 538584 547.6 137.6 3.09 134.02
3 272.04 5.2 506.1 198.02 209.57 94.4 519171 562.8 143.6 3.23 141.68
4 272.96 4.3 535.8 222.10 231.67 92.0 500457 591.0 150.3 3.33 147.98
5 272.51 3.5 576.0 301.92 297.65 91.0 482418 626.1 164.3 3.46 159.85
6 261.34 3.2 601.7 391.22 367.62 87.4 465029 659.8 179.5 3.60 157.19
HOM ACC ASR
1 8.60 39.17 306.18
2 8.90 40.27 315.16
3 8.52 45.31 277.53
4 8.89 49.51 234.07
5 13.07 55.05 230.84
6 14.57 53.90 217.99
dim(dh)
[1] 13 14
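The effect of `skip` can be checked without the original file. The sketch below passes a made-up extract through the `text` argument of `read.table`; the two description lines and the three columns are illustrative assumptions, not the real layout of Detroit_homicide.txt.

```r
# Hypothetical miniature file: 2 free-text lines, then a header row and data
txt = "Some free-text description
More description
FTP UEMP HOM
260.35 11.0 8.60
269.80 7.0 8.90"

# skip = 2 discards the description lines; header = TRUE reads the column names
mini = read.table(text = txt, skip = 2, header = TRUE)
dim(mini)     # number of rows, number of columns
names(mini)
```

Without `skip`, `read.table` would try to parse the description lines as data and would most likely fail on the inconsistent number of fields.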
# Missing values are coded "?" in this file
hep = read.table("donnees/hepatitis.TXT",
                 header = TRUE, na.strings = "?")
head(hep)
AGE SEX STEROID ANTIVIRALS FATIGUE MALAISE ANOREXIA LIVER_BIG
1 30 male no no no no no no
2 50 female no no yes no no no
3 78 female yes no yes no no yes
4 31 female <NA> yes no no no yes
5 34 female yes no no no no yes
6 34 female yes no no no no yes
LIVER_FIRM SPLEEN_PALPABLE SPIDERS ASCITES VARICES BILIRUBIN
1 no no no no no 1.0
2 no no no no no 0.9
3 no no no no no 0.7
4 no no no no no 0.7
5 no no no no no 1.0
6 no no no no no 0.9
ALK_PHOSPHATE SGOT ALBUMIN PROTIME HISTOLOGY Class
1 85.00 18 4.0 61.85 no LIVE
2 135.00 42 3.5 61.85 no LIVE
3 96.00 32 4.0 61.85 no LIVE
4 46.00 52 4.0 80.00 no LIVE
5 105.33 200 4.0 61.85 no LIVE
6 95.00 28 4.0 75.00 no LIVE
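To see what `na.strings` does, here is a minimal sketch on a made-up three-row extract (the rows are assumptions chosen for illustration, not taken from hepatitis.TXT). The `?` entry becomes `NA`, which can then be counted per column.

```r
# Hypothetical extract with one value coded "?"
txt = "AGE SEX STEROID
30 male no
31 female ?
34 female yes"

mini_hep = read.table(text = txt, header = TRUE, na.strings = "?")

# Which values of STEROID are missing, and how many NAs per column
is.na(mini_hep$STEROID)
colSums(is.na(mini_hep))
```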
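# Comma-separated file without a header row; missing values appear as " ?" (leading space included)
adult = read.table("donnees/adult.data",
                   sep = ",", na.strings = " ?")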
head(adult)
V1 V2 V3 V4 V5 V6
1 39 State-gov 77516 Bachelors 13 Never-married
2 50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse
3 38 Private 215646 HS-grad 9 Divorced
4 53 Private 234721 11th 7 Married-civ-spouse
5 28 Private 338409 Bachelors 13 Married-civ-spouse
6 37 Private 284582 Masters 14 Married-civ-spouse
V7 V8 V9 V10 V11 V12 V13
1 Adm-clerical Not-in-family White Male 2174 0 40
2 Exec-managerial Husband White Male 0 0 13
3 Handlers-cleaners Not-in-family White Male 0 0 40
4 Handlers-cleaners Husband Black Male 0 0 40
5 Prof-specialty Wife Black Female 0 0 40
6 Exec-managerial Wife White Female 0 0 40
V14 V15
1 United-States <=50K
2 United-States <=50K
3 United-States <=50K
4 United-States <=50K
5 Cuba <=50K
6 United-States <=50K
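The leading space in `na.strings = " ?"` matters because the fields are separated by a comma followed by a space, and `read.table` does not strip white space by default. A small sketch on two made-up rows (an assumption for illustration) compares three settings; `strip.white = TRUE` is shown only as a possible alternative, not as the approach used above.

```r
# Hypothetical rows in the same ", "-separated style as adult.data
txt = "39, State-gov, 77516
50, ?, 83311"

with_space = read.table(text = txt, sep = ",", na.strings = " ?")
no_space   = read.table(text = txt, sep = ",", na.strings = "?")
stripped   = read.table(text = txt, sep = ",", na.strings = "?",
                        strip.white = TRUE)

# Compare which variants detect the missing value in the second column
is.na(with_space$V2)
is.na(no_space$V2)
is.na(stripped$V2)
```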
names(adult)
[1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9" "V10" "V11"
[12] "V12" "V13" "V14" "V15"
# The variable descriptions ("name: values") follow the first 96 lines of adult.names
adult.names = read.table("donnees/adult.names",
                         skip = 96, sep = ":",
                         stringsAsFactors = FALSE)
adult.names
V1
1 age
2 workclass
3 fnlwgt
4 education
5 education-num
6 marital-status
7 occupation
8 relationship
9 race
10 sex
11 capital-gain
12 capital-loss
13 hours-per-week
14 native-country
V2
1 continuous.
2 Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
3 continuous.
4 Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
5 continuous.
6 Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
7 Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
8 Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
9 White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
10 Female, Male.
11 continuous.
12 continuous.
13 continuous.
14 United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
adult.names$V1
[1] "age" "workclass" "fnlwgt" "education"
[5] "education-num" "marital-status" "occupation" "relationship"
[9] "race" "sex" "capital-gain" "capital-loss"
[13] "hours-per-week" "native-country"
# adult.names describes only 14 variables; the 15th column is named "class" here
names(adult) = c(adult.names$V1, "class")
head(adult)
age workclass fnlwgt education education-num
1 39 State-gov 77516 Bachelors 13
2 50 Self-emp-not-inc 83311 Bachelors 13
3 38 Private 215646 HS-grad 9
4 53 Private 234721 11th 7
5 28 Private 338409 Bachelors 13
6 37 Private 284582 Masters 14
marital-status occupation relationship race sex
1 Never-married Adm-clerical Not-in-family White Male
2 Married-civ-spouse Exec-managerial Husband White Male
3 Divorced Handlers-cleaners Not-in-family White Male
4 Married-civ-spouse Handlers-cleaners Husband Black Male
5 Married-civ-spouse Prof-specialty Wife Black Female
6 Married-civ-spouse Exec-managerial Wife White Female
capital-gain capital-loss hours-per-week native-country class
1 2174 0 40 United-States <=50K
2 0 0 13 United-States <=50K
3 0 0 40 United-States <=50K
4 0 0 40 United-States <=50K
5 0 0 40 Cuba <=50K
6 0 0 40 United-States <=50K
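The renaming step can be reproduced on a toy example. The sketch below reads a made-up "name: values" description with `sep = ":"` and applies the names to a small data frame; all values are assumptions for illustration. It also shows that a hyphenated name such as `education-num` is not a syntactic R name, so it needs backticks or `[[ ]]`, and that `make.names` is one way to obtain syntactic names if preferred.

```r
# Hypothetical description lines in the "name: values" style
txt = "age: continuous.
education-num: continuous.
sex: Female, Male."
nm = read.table(text = txt, sep = ":", stringsAsFactors = FALSE)

# Toy data frame with default column names, as after reading without a header
df = data.frame(V1 = c(39, 50), V2 = c(13, 9),
                V3 = c("Male", "Female"), V4 = c("<=50K", ">50K"))
names(df) = c(nm$V1, "class")

# Hyphenated names require backticks or [[ ]]
df$`education-num`
df[["education-num"]]

# Optional: convert to syntactic names
make.names(names(df))
```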
Return to the import of the "heart.txt" file (see above) and answer the following questions by extending the code written previously.
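# Logical indicator: TRUE when heart disease is present
heart$indicatrice = heart$coeur == "presence"
# Number of the four variables below that are equal to "A" (sum of logicals)
heart$nbA = (heart$type_douleur == "A") +
  (heart$sucre == "A") +
  (heart$electro == "A") +
  (heart$vaisseau == "A")
# Same count, computed with rowSums() on a logical matrix
heart$nbAbis =
  rowSums(heart[c("type_douleur", "sucre", "electro", "vaisseau")] == "A")
# Labelled factor built from the indicator (FALSE -> "Absence", TRUE -> "Présence")
heart$ind2 = factor(heart$indicatrice, labels = c("Absence", "Présence"))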
head(heart)
age sexe type_douleur pression cholester sucre electro taux_max
1 70 masculin D 130 322 A C 109
2 67 feminin C 115 564 A C 160
3 57 masculin B 124 261 A A 141
4 64 masculin D 128 263 A A 105
5 74 feminin B 120 269 A C 121
6 65 masculin D 120 177 A A 140
angine depression pic vaisseau coeur indicatrice nbA nbAbis ind2
1 non 2.4 2 D presence TRUE 1 1 Présence
2 non 1.6 2 A absence FALSE 2 2 Absence
3 non 0.3 1 A presence TRUE 3 3 Présence
4 oui 0.2 2 B absence FALSE 2 2 Absence
5 oui 0.2 1 B absence FALSE 1 1 Absence
6 non 0.4 1 A absence FALSE 3 3 Absence
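The two counting approaches and the factor recoding can be checked on a toy data frame. This is a minimal sketch with made-up values and shortened column names (both are assumptions). It also gives `levels` explicitly in `factor()`, a slightly more defensive variant that still works if the indicator happens to contain only one of the two values.

```r
# Hypothetical data with two categorical columns and an outcome
df = data.frame(
  douleur = c("D", "C", "A"),
  sucre   = c("A", "A", "A"),
  coeur   = c("presence", "absence", "presence"),
  stringsAsFactors = FALSE
)

ind = df$coeur == "presence"

# Summing logicals counts the TRUE values row by row
n1 = (df$douleur == "A") + (df$sucre == "A")
# rowSums on the logical matrix gives the same counts
n2 = rowSums(df[c("douleur", "sucre")] == "A")
all(n1 == n2)

# Explicit levels make the FALSE/TRUE to label mapping unambiguous
f = factor(ind, levels = c(FALSE, TRUE), labels = c("Absence", "Présence"))
table(f)
```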
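# Patients under 60: three ways to select the rows (equivalent when age has no missing values)
heart1a = heart[heart$age < 60,]
heart1b = heart[which(heart$age < 60),]
heart1 = subset(heart, age < 60)
# Split the under-60 subset by sex
heart1f = subset(heart1, sexe == "feminin")
heart1m = subset(heart1, sexe == "masculin")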
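The three selection styles differ only when the condition evaluates to `NA`. The sketch below uses a made-up data frame with one missing age (an assumption; nothing above indicates that heart.txt contains missing ages) to show the difference: plain logical indexing keeps a row of `NA`s, whereas `which()` and `subset()` drop it.

```r
# Hypothetical data with one missing age
df = data.frame(age  = c(45, NA, 70, 58),
                sexe = c("feminin", "masculin", "masculin", "feminin"))

s1 = df[df$age < 60, ]         # the NA condition yields an all-NA row
s2 = df[which(df$age < 60), ]  # which() keeps only the TRUE positions
s3 = subset(df, age < 60)      # subset() also ignores NA conditions

nrow(s1)
nrow(s2)
nrow(s3)
```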
attributes(dh)
$names
[1] "FTP" "UEMP" "MAN" "LIC" "GR" "CLEAR" "WM" "NMAN"
[9] "GOV" "HE" "WE" "HOM" "ACC" "ASR"
$class
[1] "data.frame"
$row.names
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13
attr(dh, "names")
[1] "FTP" "UEMP" "MAN" "LIC" "GR" "CLEAR" "WM" "NMAN"
[9] "GOV" "HE" "WE" "HOM" "ACC" "ASR"
# Store the first 19 lines of the file (the free-text description) as an "info" attribute
attr(dh, "info") =
  paste(readLines("donnees/Detroit_homicide.txt", n = 19), collapse = "\n")
cat(attr(dh, "info"))
This is the data set called `DETROIT' in the book `Subset selection in
regression' by Alan J. Miller published in the Chapman & Hall series of
monographs on Statistics & Applied Probability, no. 40. The data are
unusual in that a subset of three predictors can be found which gives a
very much better fit to the data than the subsets found from the Efroymson
stepwise algorithm, or from forward selection or backward elimination.
The original data were given in appendix A of `Regression analysis and its
application: A data-oriented approach' by Gunst & Mason, Statistics
textbooks and monographs no. 24, Marcel Dekker. It has caused problems
because some copies of the Gunst & Mason book do not contain all of the data,
and because Miller does not say which variables he used as predictors and
which is the dependent variable. (HOM was the dependent variable, and the
predictors were FTP ... WE)
The data were collected by J.C. Fisher and used in his paper: "Homicide in
Detroit: The Role of Firearms", Criminology, vol.14, 387-400 (1976)
The data are on the homicide rate in Detroit for the years 1961-1973.
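The same pattern can be tried on a temporary file. This is a minimal sketch in which the file content is entirely made up; it shows that `readLines(..., n = )` returns the leading lines, that `paste(..., collapse = "\n")` merges them into a single string, and that the custom attribute then appears alongside `names`, `class` and `row.names`.

```r
# Hypothetical file: 2 description lines, then a header row and one data row
path = tempfile(fileext = ".txt")
writeLines(c("First description line",
             "Second description line",
             "x y",
             "1 2"), path)

df = read.table(path, skip = 2, header = TRUE)

# Attach the description to the data frame as a single string
attr(df, "info") = paste(readLines(path, n = 2), collapse = "\n")

cat(attr(df, "info"), "\n")
names(attributes(df))
```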
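# Lines 20 to 34 of the file hold the variable descriptions
noms = tail(readLines("donnees/Detroit_homicide.txt", n = 34), 15)
# Drop empty lines
noms = noms[noms != ""]
# Variable name in characters 1 to 6, description from character 10 onward
attr(dh, "info.var") = data.frame(
  var = trimws(substr(noms, 1, 6)),
  descriptif = substr(noms, 10, 100),
  stringsAsFactors = FALSE
)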
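The fixed-width extraction can be checked on a made-up file. In this sketch the layout (name in the first characters, a dash at character 8, description starting at character 10) and the variable names `AAA` and `BBBBB` are assumptions chosen to mimic the `substr` positions used above, not the actual content of Detroit_homicide.txt.

```r
# Hypothetical file: title, blank line, two variable descriptions, blank line, data
path = tempfile(fileext = ".txt")
writeLines(c("Title line",
             "",
             "AAA    - first variable",
             "BBBBB  - second variable",
             "",
             "AAA BBBBB",
             "1 2"), path)

# Read the first 5 lines and keep the last 3 of them (lines 3 to 5)
noms = tail(readLines(path, n = 5), 3)
noms = noms[noms != ""]

info.var = data.frame(
  var = trimws(substr(noms, 1, 6)),
  descriptif = substr(noms, 10, 100),
  stringsAsFactors = FALSE
)
info.var
```

Combining `readLines(n = )` with `tail()` is a simple way to extract a block of lines from the middle of a file without reading the whole file.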