id name abstract data_type default_task attribute_type n_instances n_attrs y area url
1 Abalone Predict the age of abalone from physical measurements Multivariate Classification Categorical, Integer, Real 4177 8 1995 Life http://archive.ics.uci.edu/ml/datasets/Abalone
2 Adult Predict whether income exceeds $50K/yr based on census data. Also known as 'Census Income' dataset. Multivariate Classification Categorical, Integer 48842 14 1996 Social http://archive.ics.uci.edu/ml/datasets/Adult
3 Annealing Steel annealing data Multivariate Classification Categorical, Integer, Real 798 38 Physical http://archive.ics.uci.edu/ml/datasets/Annealing
4 Anonymous Microsoft Web Data Log of anonymous users of www.microsoft.com; predict areas of the web site a user visited based on data on other areas the user visited. Recommender-Systems Categorical 37711 294 1998 Computer http://archive.ics.uci.edu/ml/datasets/Anonymous+Microsoft+Web+Data
5 Arrhythmia Distinguish between the presence and absence of cardiac arrhythmia and classify it in one of the 16 groups. Multivariate Classification Categorical, Integer, Real 452 279 1998 Life http://archive.ics.uci.edu/ml/datasets/Arrhythmia
6 Artificial Characters Dataset artificially generated by using first order theory which describes structure of ten capital letters of English alphabet Multivariate Classification Categorical, Integer, Real 6000 7 1992 Computer http://archive.ics.uci.edu/ml/datasets/Artificial+Characters
7 Audiology (Original) Nominal audiology dataset from Baylor Multivariate Classification Categorical 226 1987 Life http://archive.ics.uci.edu/ml/datasets/Audiology+%28Original%29
8 Audiology (Standardized) Standardized version of the original audiology database Multivariate Classification Categorical 226 69 1992 Life http://archive.ics.uci.edu/ml/datasets/Audiology+%28Standardized%29
9 Auto MPG Revised from CMU StatLib library, data concerns city-cycle fuel consumption Multivariate Regression Categorical, Real 398 8 1993 Other http://archive.ics.uci.edu/ml/datasets/Auto+MPG
10 Automobile From 1985 Ward's Automotive Yearbook Multivariate Regression Categorical, Integer, Real 205 26 1987 Other http://archive.ics.uci.edu/ml/datasets/Automobile
11 Badges Badges labeled with a '+' or '-' as a function of a person's name Univariate, Text Classification 294 1 1994 Other http://archive.ics.uci.edu/ml/datasets/Badges
12 Balance Scale Balance scale weight & distance database Multivariate Classification Categorical 625 4 1994 Social http://archive.ics.uci.edu/ml/datasets/Balance+Scale
13 Balloons Data previously used in cognitive psychology experiment; 4 data sets represent different conditions of an experiment Multivariate Classification Categorical 16 4 Social http://archive.ics.uci.edu/ml/datasets/Balloons
14 Breast Cancer Breast Cancer Data (Restricted Access) Multivariate Classification Categorical 286 9 1988 Life http://archive.ics.uci.edu/ml/datasets/Breast+Cancer
15 Breast Cancer Wisconsin (Original) Original Wisconsin Breast Cancer Database Multivariate Classification Integer 699 10 1992 Life http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29
16 Breast Cancer Wisconsin (Prognostic) Prognostic Wisconsin Breast Cancer Database Multivariate Classification, Regression Real 198 34 1995 Life http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Prognostic%29
17 Breast Cancer Wisconsin (Diagnostic) Diagnostic Wisconsin Breast Cancer Database Multivariate Classification Real 569 32 1995 Life http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29
18 Pittsburgh Bridges Bridges database that has original and numeric-discretized datasets Multivariate Classification Categorical, Integer 108 13 1990 Other http://archive.ics.uci.edu/ml/datasets/Pittsburgh+Bridges
19 Car Evaluation Derived from simple hierarchical decision model, this database may be useful for testing constructive induction and structure discovery methods. Multivariate Classification Categorical 1728 6 1997 Other http://archive.ics.uci.edu/ml/datasets/Car+Evaluation
20 Census Income Predict whether income exceeds $50K/yr based on census data. Also known as 'Adult' dataset. Multivariate Classification Categorical, Integer 48842 14 1996 Social http://archive.ics.uci.edu/ml/datasets/Census+Income
21 Chess (King-Rook vs. King-Knight) Knight Pin Chess End-Game Database Creator Multivariate, Data-Generator Classification Categorical, Integer 22 1988 Game http://archive.ics.uci.edu/ml/datasets/Chess+%28King-Rook+vs.+King-Knight%29
22 Chess (King-Rook vs. King-Pawn) King+Rook versus King+Pawn on a7 (usually abbreviated KRKPA7). Multivariate Classification Categorical 3196 36 1989 Game http://archive.ics.uci.edu/ml/datasets/Chess+%28King-Rook+vs.+King-Pawn%29
23 Chess (King-Rook vs. King) Chess Endgame Database for White King and Rook against Black King (KRK). Multivariate Classification Categorical, Integer 28056 6 1994 Game http://archive.ics.uci.edu/ml/datasets/Chess+%28King-Rook+vs.+King%29
24 Chess (Domain Theories) 6 different domain theories for generating legal moves of chess Domain-Theory Game http://archive.ics.uci.edu/ml/datasets/Chess+%28Domain+Theories%29
25 Bach Chorales Time-series data based on chorales; challenge is to learn generative grammar; data in Lisp Univariate, Time-Series Categorical, Integer 100 6 Other http://archive.ics.uci.edu/ml/datasets/Bach+Chorales
26 Connect-4 Contains connect-4 positions Multivariate, Spatial Classification Categorical 67557 42 1995 Game http://archive.ics.uci.edu/ml/datasets/Connect-4
27 Credit Approval This data concerns credit card applications; good mix of attributes Multivariate Classification Categorical, Integer, Real 690 15 Financial http://archive.ics.uci.edu/ml/datasets/Credit+Approval
28 Japanese Credit Screening Includes domain theory (generated by talking to Japanese domain experts); data in Lisp Multivariate, Domain-Theory Classification Categorical, Real, Integer 125 1992 Financial http://archive.ics.uci.edu/ml/datasets/Japanese+Credit+Screening
29 Computer Hardware Relative CPU Performance Data, described in terms of its cycle time, memory size, etc. Multivariate Regression Integer 209 9 1987 Computer http://archive.ics.uci.edu/ml/datasets/Computer+Hardware
30 Contraceptive Method Choice Dataset is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey. Multivariate Classification Categorical, Integer 1473 9 1997 Life http://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice
31 Covertype Forest CoverType dataset Multivariate Classification Categorical, Integer 581012 54 1998 Life http://archive.ics.uci.edu/ml/datasets/Covertype
32 Cylinder Bands Used in decision tree induction for mitigating process delays known as 'cylinder bands' in rotogravure printing Multivariate Classification Categorical, Integer, Real 512 39 1995 Physical http://archive.ics.uci.edu/ml/datasets/Cylinder+Bands
33 Dermatology Aim for this dataset is to determine the type of Erythemato-Squamous Disease. Multivariate Classification Categorical, Integer 366 33 1998 Life http://archive.ics.uci.edu/ml/datasets/Dermatology
34 Diabetes This diabetes dataset is from AIM '94 Multivariate, Time-Series Categorical, Integer 20 Life http://archive.ics.uci.edu/ml/datasets/Diabetes
35 DGP2 - The Second Data Generation Program Generates application domains based on specific parameters, number of features, and proportion of positive to negative examples Data-Generator Real Other http://archive.ics.uci.edu/ml/datasets/DGP2+-+The+Second+Data+Generation+Program
36 Document Understanding Five concepts, expressed as predicates, to be learned 1994 Other http://archive.ics.uci.edu/ml/datasets/Document+Understanding
37 EBL Domain Theories Assorted small-scale domain theories Computer http://archive.ics.uci.edu/ml/datasets/EBL+Domain+Theories
38 Echocardiogram Data for classifying if patients will survive for at least one year after a heart attack Multivariate Classification Categorical, Integer, Real 132 12 1989 Life http://archive.ics.uci.edu/ml/datasets/Echocardiogram
39 Ecoli This data contains protein localization sites Multivariate Classification Real 336 8 1996 Life http://archive.ics.uci.edu/ml/datasets/Ecoli
40 Flags From Collins Gem Guide to Flags, 1986 Multivariate Classification Categorical, Integer 194 30 1990 Other http://archive.ics.uci.edu/ml/datasets/Flags
41 Function Finding Cases collected mostly from investigations in physical science; intention is to evaluate function-finding algorithms Function-Learning Real 352 1990 Physical http://archive.ics.uci.edu/ml/datasets/Function+Finding
42 Glass Identification From USA Forensic Science Service; 6 types of glass; defined in terms of their oxide content (i.e. Na, Fe, K, etc) Multivariate Classification Real 214 10 1987 Physical http://archive.ics.uci.edu/ml/datasets/Glass+Identification
43 Haberman's Survival Dataset contains cases from study conducted on the survival of patients who had undergone surgery for breast cancer Multivariate Classification Integer 306 3 1999 Life http://archive.ics.uci.edu/ml/datasets/Haberman%27s+Survival
44 Hayes-Roth Topic: human subjects study Multivariate Classification Categorical 160 5 1989 Social http://archive.ics.uci.edu/ml/datasets/Hayes-Roth
45 Heart Disease 4 databases: Cleveland, Hungary, Switzerland, and the VA Long Beach Multivariate Classification Categorical, Integer, Real 303 75 1988 Life http://archive.ics.uci.edu/ml/datasets/Heart+Disease
46 Hepatitis From G.Gong: CMU; Mostly Boolean or numeric-valued attribute types; Includes cost data (donated by Peter Turney) Multivariate Classification Categorical, Integer, Real 155 19 1988 Life http://archive.ics.uci.edu/ml/datasets/Hepatitis
47 Horse Colic Well documented attributes; 368 instances with 28 attributes (continuous, discrete, and nominal); 30% missing values Multivariate Classification Categorical, Integer, Real 368 27 1989 Life http://archive.ics.uci.edu/ml/datasets/Horse+Colic
48 Housing Taken from StatLib library Multivariate Regression Categorical, Integer, Real 506 14 1993 Other http://archive.ics.uci.edu/ml/datasets/Housing
49 ICU Data set prepared for the use of participants for the 1994 AAAI Spring Symposium on Artificial Intelligence in Medicine. Multivariate, Time-Series Real Life http://archive.ics.uci.edu/ml/datasets/ICU
50 Image Segmentation Image data described by high-level numeric-valued attributes, 7 classes Multivariate Classification Real 2310 19 1990 Other http://archive.ics.uci.edu/ml/datasets/Image+Segmentation
51 Internet Advertisements This dataset represents a set of possible advertisements on Internet pages. Multivariate Classification Categorical, Integer, Real 3279 1558 1998 Computer http://archive.ics.uci.edu/ml/datasets/Internet+Advertisements
52 Ionosphere Classification of radar returns from the ionosphere Multivariate Classification Integer, Real 351 34 1989 Physical http://archive.ics.uci.edu/ml/datasets/Ionosphere
53 Iris Famous database; from Fisher, 1936 Multivariate Classification Real 150 4 1988 Life http://archive.ics.uci.edu/ml/datasets/Iris
54 ISOLET Goal: Predict which letter-name was spoken--a simple classification task. Multivariate Classification Real 7797 617 1994 Computer http://archive.ics.uci.edu/ml/datasets/ISOLET
55 Kinship Relational dataset Relational Relational-Learning Categorical 104 12 1990 Social http://archive.ics.uci.edu/ml/datasets/Kinship
56 Labor Relations From Collective Bargaining Review Multivariate Categorical, Integer, Real 57 16 1988 Social http://archive.ics.uci.edu/ml/datasets/Labor+Relations
57 LED Display Domain From Classification and Regression Trees book; We provide here 2 C programs for generating sample databases Multivariate, Data-Generator Classification Categorical 7 1988 Computer http://archive.ics.uci.edu/ml/datasets/LED+Display+Domain
58 Lenses Database for fitting contact lenses Multivariate Classification Categorical 24 4 1990 Other http://archive.ics.uci.edu/ml/datasets/Lenses
59 Letter Recognition Database of character image features; try to identify the letter Multivariate Classification Integer 20000 16 1991 Computer http://archive.ics.uci.edu/ml/datasets/Letter+Recognition
60 Liver Disorders BUPA Medical Research Ltd. database donated by Richard S. Forsyth Multivariate Categorical, Integer, Real 345 7 1990 Life http://archive.ics.uci.edu/ml/datasets/Liver+Disorders
61 Logic Theorist All code for Logic Theorist Domain-Theory Computer http://archive.ics.uci.edu/ml/datasets/Logic+Theorist
62 Lung Cancer Lung cancer data; no attribute definitions Multivariate Classification Integer 32 56 1992 Life http://archive.ics.uci.edu/ml/datasets/Lung+Cancer
63 Lymphography This lymphography domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. (Restricted access) Multivariate Classification Categorical 148 18 1988 Life http://archive.ics.uci.edu/ml/datasets/Lymphography
64 Mechanical Analysis Fault diagnosis problem of electromechanical devices; also PUMPS DATA SET is newer version with domain theory and results Multivariate Classification Categorical, Integer, Real 209 8 1990 Computer http://archive.ics.uci.edu/ml/datasets/Mechanical+Analysis
65 Meta-data Meta-Data was used in order to give advice about which classification method is appropriate for a particular dataset (taken from results of Statlog project). Multivariate Classification Categorical, Integer, Real 528 22 1996 Other http://archive.ics.uci.edu/ml/datasets/Meta-data
66 Mobile Robots Learning concepts from sensor data of a mobile robot; set of data sets Domain-Theory Categorical, Integer, Real 1995 Computer http://archive.ics.uci.edu/ml/datasets/Mobile+Robots
67 Molecular Biology (Promoter Gene Sequences) E. Coli promoter gene sequences (DNA) with partial domain theory Sequential, Domain-Theory Classification Categorical 106 58 1990 Life http://archive.ics.uci.edu/ml/datasets/Molecular+Biology+%28Promoter+Gene+Sequences%29
68 Molecular Biology (Protein Secondary Structure) From CMU connectionist bench repository; Classifies secondary structure of certain globular proteins Sequential Classification Categorical 128 Life http://archive.ics.uci.edu/ml/datasets/Molecular+Biology+%28Protein+Secondary+Structure%29
69 Molecular Biology (Splice-junction Gene Sequences) Primate splice-junction gene sequences (DNA) with associated imperfect domain theory Sequential, Domain-Theory Classification Categorical 3190 61 1992 Life http://archive.ics.uci.edu/ml/datasets/Molecular+Biology+%28Splice-junction+Gene+Sequences%29
70 MONK's Problems A set of three artificial domains over the same attribute space; Used to test a wide range of induction algorithms Multivariate Classification Categorical 432 7 1992 Other http://archive.ics.uci.edu/ml/datasets/MONK%27s+Problems
71 Moral Reasoner Horn-clause model that qualitatively simulates moral reasoning; Theory includes negated literals Domain-Theory 202 1994 Computer http://archive.ics.uci.edu/ml/datasets/Moral+Reasoner
72 Multiple Features This dataset consists of features of handwritten numerals ('0'-'9') extracted from a collection of Dutch utility maps Multivariate Classification Integer, Real 2000 649 Computer http://archive.ics.uci.edu/ml/datasets/Multiple+Features
73 Mushroom From Audubon Society Field Guide; mushrooms described in terms of physical characteristics; classification: poisonous or edible Multivariate Classification Categorical 8124 22 1987 Life http://archive.ics.uci.edu/ml/datasets/Mushroom
74 Musk (Version 1) The goal is to learn to predict whether new molecules will be musks or non-musks Multivariate Classification Integer 476 168 1994 Physical http://archive.ics.uci.edu/ml/datasets/Musk+%28Version+1%29
75 Musk (Version 2) The goal is to learn to predict whether new molecules will be musks or non-musks Multivariate Classification Integer 6598 168 1994 Physical http://archive.ics.uci.edu/ml/datasets/Musk+%28Version+2%29
76 Nursery Nursery Database was derived from a hierarchical decision model originally developed to rank applications for nursery schools. Multivariate Classification Categorical 12960 8 1997 Social http://archive.ics.uci.edu/ml/datasets/Nursery
77 Othello Domain Theory Used in research to generate features for an inductive learning system Domain-Theory 1991 Game http://archive.ics.uci.edu/ml/datasets/Othello+Domain+Theory
78 Page Blocks Classification The problem consists of classifying all the blocks of the page layout of a document that has been detected by a segmentation process. Multivariate Classification Integer, Real 5473 10 1995 Computer http://archive.ics.uci.edu/ml/datasets/Page+Blocks+Classification
79 Pima Indians Diabetes From National Institute of Diabetes and Digestive and Kidney Diseases; Includes cost data (donated by Peter Turney) Multivariate Classification Integer, Real 768 8 1990 Life http://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes
80 Optical Recognition of Handwritten Digits Two versions of this database available; see folder Multivariate Classification Integer 5620 64 1998 Computer http://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits
81 Pen-Based Recognition of Handwritten Digits Digit database of 250 samples from 44 writers Multivariate Classification Integer 10992 16 1998 Computer http://archive.ics.uci.edu/ml/datasets/Pen-Based+Recognition+of+Handwritten+Digits
82 Post-Operative Patient Dataset of patient features Multivariate Classification Categorical, Integer 90 8 1993 Life http://archive.ics.uci.edu/ml/datasets/Post-Operative+Patient
83 Primary Tumor From Ljubljana Oncology Institute Multivariate Classification Categorical 339 17 1988 Life http://archive.ics.uci.edu/ml/datasets/Primary+Tumor
84 Prodigy Assorted domains like blocksworld, eightpuzzle, and schedworld. Domain-Theory Other http://archive.ics.uci.edu/ml/datasets/Prodigy
85 Qualitative Structure Activity Relationships Two sets of datasets are given: pyrimidines and triazines Domain-Theory Physical http://archive.ics.uci.edu/ml/datasets/Qualitative+Structure+Activity+Relationships
86 Quadruped Mammals The file animals.c is a data generator of structured instances representing quadruped animals Multivariate, Data-Generator Classification Real 72 1992 Life http://archive.ics.uci.edu/ml/datasets/Quadruped+Mammals
87 Servo Data was from a simulation of a servo system Multivariate Regression Categorical, Integer 167 4 1993 Computer http://archive.ics.uci.edu/ml/datasets/Servo
88 Shuttle Landing Control Tiny database; all nominal values Multivariate Classification Categorical 15 6 1988 Physical http://archive.ics.uci.edu/ml/datasets/Shuttle+Landing+Control
89 Solar Flare Each class attribute counts the number of solar flares of a certain class that occur in a 24 hour period Multivariate Regression Categorical 1389 10 1989 Physical http://archive.ics.uci.edu/ml/datasets/Solar+Flare
90 Soybean (Large) Michalski's famous soybean disease database Multivariate Classification Categorical 307 35 1988 Life http://archive.ics.uci.edu/ml/datasets/Soybean+%28Large%29
91 Soybean (Small) Michalski's famous soybean disease database Multivariate Classification Categorical 47 35 1987 Life http://archive.ics.uci.edu/ml/datasets/Soybean+%28Small%29
92 Challenger USA Space Shuttle O-Ring Task: predict the number of O-rings that experience thermal distress on a flight at 31 degrees F given data on the previous 23 shuttle flights Multivariate Regression Integer 23 4 1993 Physical http://archive.ics.uci.edu/ml/datasets/Challenger+USA+Space+Shuttle+O-Ring
93 Low Resolution Spectrometer From IRAS data -- NASA Ames Research Center Multivariate Classification Integer, Real 531 102 1988 Physical http://archive.ics.uci.edu/ml/datasets/Low+Resolution+Spectrometer
94 Spambase Classifying Email as Spam or Non-Spam Multivariate Classification Integer, Real 4601 57 1999 Computer http://archive.ics.uci.edu/ml/datasets/Spambase
95 SPECT Heart Data on cardiac Single Photon Emission Computed Tomography (SPECT) images. Each patient classified into two categories: normal and abnormal. Multivariate Classification Categorical 267 22 2001 Life http://archive.ics.uci.edu/ml/datasets/SPECT+Heart
96 SPECTF Heart Data on cardiac Single Photon Emission Computed Tomography (SPECT) images. Each patient classified into two categories: normal and abnormal. Multivariate Classification Integer 267 44 2001 Life http://archive.ics.uci.edu/ml/datasets/SPECTF+Heart
97 Sponge Data on sponges; Attributes in Spanish Multivariate Clustering Categorical, Integer 76 45 Life http://archive.ics.uci.edu/ml/datasets/Sponge
98 Statlog Project Various Databases: Vehicle silhouettes, Landsat Satellite, Shuttle, Australian Credit Approval, Heart Disease, Image Segmentation, German Credit 1992 Other http://archive.ics.uci.edu/ml/datasets/Statlog+Project
99 Student Loan Relational Student Loan Relational Domain Domain-Theory 1000 1993 Social http://archive.ics.uci.edu/ml/datasets/Student+Loan+Relational
100 Teaching Assistant Evaluation The data consist of evaluations of teaching performance; scores are 'low', 'medium', or 'high' Multivariate Classification Categorical, Integer 151 5 1997 Other http://archive.ics.uci.edu/ml/datasets/Teaching+Assistant+Evaluation
101 Tic-Tac-Toe Endgame Binary classification task on possible configurations of tic-tac-toe game Multivariate Classification Categorical 958 9 1991 Game http://archive.ics.uci.edu/ml/datasets/Tic-Tac-Toe+Endgame
102 Thyroid Disease 10 separate databases from Garavan Institute Multivariate, Domain-Theory Classification Categorical, Real 7200 21 1987 Life http://archive.ics.uci.edu/ml/datasets/Thyroid+Disease
103 Trains 2 data formats (structured, one-instance-per-line) Multivariate Classification Categorical 10 32 1994 Other http://archive.ics.uci.edu/ml/datasets/Trains
104 University Data in original (LISP-readable) form Multivariate Classification Categorical, Integer 285 17 1988 Other http://archive.ics.uci.edu/ml/datasets/University
105 Congressional Voting Records 1984 United States Congressional Voting Records; Classify as Republican or Democrat Multivariate Classification Categorical 435 16 1987 Social http://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records
106 Water Treatment Plant Multiple classes predict plant state Multivariate Clustering Integer, Real 527 38 1993 Physical http://archive.ics.uci.edu/ml/datasets/Water+Treatment+Plant
107 Waveform Database Generator (Version 1) CART book's waveform domains Multivariate, Data-Generator Classification Real 5000 21 1988 Physical http://archive.ics.uci.edu/ml/datasets/Waveform+Database+Generator+%28Version+1%29
108 Waveform Database Generator (Version 2) CART book's waveform domains Multivariate, Data-Generator Classification Real 5000 40 1988 Physical http://archive.ics.uci.edu/ml/datasets/Waveform+Database+Generator+%28Version+2%29
109 Wine Using chemical analysis determine the origin of wines Multivariate Classification Integer, Real 178 13 1991 Physical http://archive.ics.uci.edu/ml/datasets/Wine
110 Yeast Predicting the Cellular Localization Sites of Proteins Multivariate Classification Real 1484 8 1996 Life http://archive.ics.uci.edu/ml/datasets/Yeast
111 Zoo Artificial, 7 classes of animals Multivariate Classification Categorical, Integer 101 17 1990 Life http://archive.ics.uci.edu/ml/datasets/Zoo
112 Undocumented Various datasets without documentation (feel free to explore!) Other http://archive.ics.uci.edu/ml/datasets/Undocumented
113 Twenty Newsgroups This data set consists of 20000 messages taken from 20 newsgroups. Text 20000 1999 Other http://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups
114 Australian Sign Language signs This data consists of sample of Auslan (Australian Sign Language) signs. Examples of 95 signs were collected from five signers with a total of 6650 sign samples. Multivariate, Time-Series Classification Categorical, Real 6650 15 1999 Other http://archive.ics.uci.edu/ml/datasets/Australian+Sign+Language+signs
115 Australian Sign Language signs (High Quality) This data consists of sample of Auslan (Australian Sign Language) signs. 27 examples of each of 95 Auslan signs were captured from a native signer using high-quality position trackers Multivariate, Time-Series Classification Real 2565 22 2002 Other http://archive.ics.uci.edu/ml/datasets/Australian+Sign+Language+signs+%28High+Quality%29
116 US Census Data (1990) The USCensus1990raw data set contains a one percent sample of the Public Use Microdata Samples (PUMS) person records drawn from the full 1990 census sample. Multivariate Clustering Categorical 2458285 68 Social http://archive.ics.uci.edu/ml/datasets/US+Census+Data+%281990%29
117 Census-Income (KDD) This data set contains weighted census data extracted from the 1994 and 1995 current population surveys conducted by the U.S. Census Bureau. Multivariate Classification Categorical, Integer 299285 40 2000 Social http://archive.ics.uci.edu/ml/datasets/Census-Income+%28KDD%29
118 Coil 1999 Competition Data This data set is from the 1999 Computational Intelligence and Learning (COIL) competition. The data contains measurements of river chemical concentrations and algae densities. Multivariate Categorical, Real 340 17 1999 Physical http://archive.ics.uci.edu/ml/datasets/Coil+1999+Competition+Data
119 Corel Image Features This dataset contains image features extracted from a Corel image collection. Four sets of features are available based on the color histogram, color histogram layout, color moments, and co-occurrence Multivariate Real 68040 89 1999 Other http://archive.ics.uci.edu/ml/datasets/Corel+Image+Features
120 E. Coli Genes Data giving characteristics of each ORF (potential gene) in the E. coli genome. Sequence, homology (similarity to other genes) and structural information, and function (if known) are provided. Relational 2001 Life http://archive.ics.uci.edu/ml/datasets/E.+Coli+Genes
121 EEG Database This data arises from a large study to examine EEG correlates of genetic predisposition to alcoholism. It contains measurements from 64 electrodes placed on the scalp sampled at 256 Hz Multivariate, Time-Series Categorical, Integer, Real 122 4 1999 Life http://archive.ics.uci.edu/ml/datasets/EEG+Database
122 El Nino The data set contains oceanographic and surface meteorological readings taken from a series of buoys positioned throughout the equatorial Pacific. Spatio-temporal Integer, Real 178080 12 1999 Physical http://archive.ics.uci.edu/ml/datasets/El+Nino
123 Entree Chicago Recommendation Data This data contains a record of user interactions with the Entree Chicago restaurant recommendation system. Transactional, Sequential Recommender-Systems Categorical 50672 2000 Other http://archive.ics.uci.edu/ml/datasets/Entree+Chicago+Recommendation+Data
124 CMU Face Images This data consists of 640 black and white face images of people taken with varying pose (straight, left, right, up), expression (neutral, happy, sad, angry), eyes (wearing sunglasses or not), and size Image Classification Integer 640 1999 Other http://archive.ics.uci.edu/ml/datasets/CMU+Face+Images
125 Insurance Company Benchmark (COIL 2000) This data set used in the CoIL 2000 Challenge contains information on customers of an insurance company. The data consists of 86 variables and includes product usage data and socio-demographic data Multivariate Regression, Description Categorical, Integer 9000 86 2000 Social http://archive.ics.uci.edu/ml/datasets/Insurance+Company+Benchmark+%28COIL+2000%29
126 Internet Usage Data This data contains general demographic information on internet users in 1997. Multivariate Categorical, Integer 10104 72 1999 Computer http://archive.ics.uci.edu/ml/datasets/Internet+Usage+Data
127 IPUMS Census Database This data set contains unweighted PUMS census data from the Los Angeles and Long Beach areas for the years 1970, 1980, and 1990. Multivariate Categorical, Integer 256932 61 1999 Social http://archive.ics.uci.edu/ml/datasets/IPUMS+Census+Database
128 Japanese Vowels This dataset records 640 time series of 12 LPC cepstrum coefficients taken from nine male speakers. Multivariate, Time-Series Classification Real 640 12 Other http://archive.ics.uci.edu/ml/datasets/Japanese+Vowels
129 KDD Cup 1998 Data This is the data set used for The Second International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-98 Multivariate Regression Categorical, Integer 191779 481 1998 Other http://archive.ics.uci.edu/ml/datasets/KDD+Cup+1998+Data
130 KDD Cup 1999 Data This is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99 Multivariate Classification Categorical, Integer 4000000 42 1999 Computer http://archive.ics.uci.edu/ml/datasets/KDD+Cup+1999+Data
131 M. Tuberculosis Genes Data giving characteristics of each ORF (potential gene) in the M. tuberculosis bacterium. Sequence, homology (similarity to other genes) and structural information, and function (if known) are provided Relational 2001 Life http://archive.ics.uci.edu/ml/datasets/M.+Tuberculosis+Genes
132 Movie This data set contains a list of over 10000 films including many older, odd, and cult films. There is information on actors, casts, directors, producers, studios, etc. Multivariate, Relational 10000 1999 Other http://archive.ics.uci.edu/ml/datasets/Movie
133 MSNBC.com Anonymous Web Data This data describes the page visits of users who visited msnbc.com on September 28, 1999. Visits are recorded at the level of URL category (see description) and are recorded in time order. Sequential Categorical 989818 Computer http://archive.ics.uci.edu/ml/datasets/MSNBC.com+Anonymous+Web+Data
134 NSF Research Award Abstracts 1990-2003 This data set consists of (a) 129,000 abstracts describing NSF awards for basic research, (b) bag-of-word data files extracted from the abstracts, (c) a list of words used for indexing the bag-of-word Text 129000 2003 Other http://archive.ics.uci.edu/ml/datasets/NSF+Research+Award+Abstracts+1990-2003
135 Pioneer-1 Mobile Robot Data This dataset contains time series sensor readings of the Pioneer-1 mobile robot. The data is broken into 'experiences' in which the robot takes action for some period of time and experiences a control Multivariate, Time-Series Categorical, Real 1999 Computer http://archive.ics.uci.edu/ml/datasets/Pioneer-1+Mobile+Robot+Data
136 Pseudo Periodic Synthetic Time Series This data set is designed for testing indexing schemes in time series databases. The data appears highly periodic, but never exactly repeats itself. Univariate, Time-Series 100000 1999 Other http://archive.ics.uci.edu/ml/datasets/Pseudo+Periodic+Synthetic+Time+Series
137 Reuters-21578 Text Categorization Collection This is a collection of documents that appeared on Reuters newswire in 1987. The documents were assembled and indexed with categories. Text Classification Categorical 21578 5 1997 Other http://archive.ics.uci.edu/ml/datasets/Reuters-21578+Text+Categorization+Collection
138 Robot Execution Failures This dataset contains force and torque measurements on a robot after failure detection. Each failure is characterized by 15 force/torque samples collected at regular time intervals Multivariate, Time-Series Classification Integer 463 90 1999 Physical http://archive.ics.uci.edu/ml/datasets/Robot+Execution+Failures
139 Synthetic Control Chart Time Series This data consists of synthetically generated control charts. Time-Series Classification, Clustering Real 600 1999 Other http://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series
140 Syskill and Webert Web Page Ratings This database contains HTML source of web pages plus the ratings of a single user on these web pages. Web pages are on four separate subjects (Bands- recording artists; Goats; Sheep; and BioMedical) Multivariate, Text Classification Categorical 332 5 1998 Computer http://archive.ics.uci.edu/ml/datasets/Syskill+and+Webert+Web+Page+Ratings
141 UNIX User Data This file contains 9 sets of sanitized user data drawn from the command histories of 8 UNIX computer users at Purdue over the course of up to 2 years. Text, Sequential Computer http://archive.ics.uci.edu/ml/datasets/UNIX+User+Data
142 Volcanoes on Venus - JARtool experiment The JARtool project was a pioneering effort to develop an automatic system for cataloging small volcanoes in the large set of Venus images returned by the Magellan spacecraft. Image Classification Physical http://archive.ics.uci.edu/ml/datasets/Volcanoes+on+Venus+-+JARtool+experiment
143 Statlog (Australian Credit Approval) This file concerns credit card applications. This database exists elsewhere in the repository (Credit Screening Database) in a slightly different form Multivariate Classification Categorical, Integer, Real 690 14 Financial http://archive.ics.uci.edu/ml/datasets/Statlog+%28Australian+Credit+Approval%29
144 Statlog (German Credit Data) This dataset classifies people described by a set of attributes as good or bad credit risks. Comes in two formats (one all numeric). Also comes with a cost matrix Multivariate Classification Categorical, Integer 1000 20 1994 Financial http://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29
145 Statlog (Heart) This dataset is a heart disease database similar to a database already present in the repository (Heart Disease databases) but in a slightly different form Multivariate Classification Categorical, Real 270 13 Life http://archive.ics.uci.edu/ml/datasets/Statlog+%28Heart%29
146 Statlog (Landsat Satellite) Multi-spectral values of pixels in 3x3 neighbourhoods in a satellite image, and the classification associated with the central pixel in each neighbourhood Multivariate Classification Integer 6435 36 1993 Physical http://archive.ics.uci.edu/ml/datasets/Statlog+%28Landsat+Satellite%29
147 Statlog (Image Segmentation) This dataset is an image segmentation database similar to a database already present in the repository (Image segmentation database) but in a slightly different form. Multivariate Classification Real 2310 19 1990 Other http://archive.ics.uci.edu/ml/datasets/Statlog+%28Image+Segmentation%29
148 Statlog (Shuttle) The shuttle dataset contains 9 attributes all of which are numerical. Approximately 80% of the data belongs to class 1 Multivariate Classification Integer 58000 9 Physical http://archive.ics.uci.edu/ml/datasets/Statlog+%28Shuttle%29
149 Statlog (Vehicle Silhouettes) 3D objects within a 2D image by application of an ensemble of shape feature extractors to the 2D silhouettes of the objects. Multivariate Classification Integer 946 18 Other http://archive.ics.uci.edu/ml/datasets/Statlog+%28Vehicle+Silhouettes%29
150 Connectionist Bench (Nettalk Corpus) The file 'nettalk.data' contains a list of 20,008 English words, along with a phonetic transcription for each word. The task is to train a network to produce the proper phonemes Multivariate Categorical 20008 4 Other http://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+%28Nettalk+Corpus%29
151 Connectionist Bench (Sonar, Mines vs. Rocks) The task is to train a network to discriminate between sonar signals bounced off a metal cylinder and those bounced off a roughly cylindrical rock. Multivariate Classification Real 208 60 Physical http://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+%28Sonar%2C+Mines+vs.+Rocks%29
152 Connectionist Bench (Vowel Recognition - Deterding Data) Speaker independent recognition of the eleven steady state vowels of British English using a specified training set of lpc derived log area ratios. Classification Real 528 10 Other http://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+%28Vowel+Recognition+-+Deterding+Data%29
153 Economic Sanctions Domain Theory on Economic Sanctions; Undocumented Domain-Theory Financial http://archive.ics.uci.edu/ml/datasets/Economic+Sanctions
154 Protein Data Undocumented Life http://archive.ics.uci.edu/ml/datasets/Protein+Data
155 Cloud Little Documentation Multivariate Real 1024 10 1989 Physical http://archive.ics.uci.edu/ml/datasets/Cloud
156 CalIt2 Building People Counts This data comes from the main door of the CalIt2 building at UCI. 
Multivariate, Time-Series Categorical, Integer 10080 4 2006 Other http://archive.ics.uci.edu/ml/datasets/CalIt2+Building+People+Counts 157 Dodgers Loop Sensor Loop sensor data was collected for the Glendale on ramp for the 101 North freeway in Los Angeles Multivariate, Time-Series Categorical, Integer 50400 3 2006 Other http://archive.ics.uci.edu/ml/datasets/Dodgers+Loop+Sensor 158 Poker Hand Purpose is to predict poker hands Multivariate Classification Categorical, Integer 1025010 11 2007 Game http://archive.ics.uci.edu/ml/datasets/Poker+Hand 159 MAGIC Gamma Telescope Data are MC generated to simulate registration of high energy gamma particles in an atmospheric Cherenkov telescope Multivariate Classification Real 19020 11 2007 Physical http://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope 160 UJI Pen Characters Data consists of written characters in a UNIPEN-like format Multivariate, Sequential Classification Integer 1364 2007 Computer http://archive.ics.uci.edu/ml/datasets/UJI+Pen+Characters 161 Mammographic Mass Discrimination of benign and malignant mammographic masses based on BI-RADS attributes and the patient's age. Multivariate Classification Integer 961 6 2007 Life http://archive.ics.uci.edu/ml/datasets/Mammographic+Mass 162 Forest Fires This is a difficult regression task, where the aim is to predict the burned area of forest fires, in the northeast region of Portugal, by using meteorological and other data (see details at: http://www.dsi.uminho.pt/~pcortez/forestfires). Multivariate Regression Real 517 13 2008 Physical http://archive.ics.uci.edu/ml/datasets/Forest+Fires 163 Reuters Transcribed Subset This dataset is created by reading out 200 files from the 10 largest Reuters classes and using an Automatic Speech Recognition system to create corresponding transcriptions. 
Text Classification 200 2008 Business http://archive.ics.uci.edu/ml/datasets/Reuters+Transcribed+Subset 164 Bag of Words This data set contains five text collections in the form of bags-of-words. Text Clustering Integer 8000000 100000 2008 Other http://archive.ics.uci.edu/ml/datasets/Bag+of+Words 165 Concrete Compressive Strength Concrete is the most important material in civil engineering. The concrete compressive strength is a highly nonlinear function of age and ingredients. Multivariate Regression Real 1030 9 2007 Physical http://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength 166 Hill-Valley Each record represents 100 points on a two-dimensional graph. When plotted in order (from 1 through 100) as the Y co-ordinate, the points will create either a Hill (a “bump” in the terrain) or a Valley (a “dip” in the terrain). Sequential Classification Real 606 101 2008 Other http://archive.ics.uci.edu/ml/datasets/Hill-Valley 167 Arcene ARCENE's task is to distinguish cancer versus normal patterns from mass-spectrometric data. This is a two-class classification problem with continuous input variables. This dataset is one of 5 datasets of the NIPS 2003 feature selection challenge. Multivariate Classification Real 900 10000 2008 Life http://archive.ics.uci.edu/ml/datasets/Arcene 168 Dexter DEXTER is a text classification problem in a bag-of-word representation. This is a two-class classification problem with sparse continuous input variables. This dataset is one of five datasets of the NIPS 2003 feature selection challenge. Multivariate Classification Integer 2600 20000 2008 Other http://archive.ics.uci.edu/ml/datasets/Dexter 169 Dorothea DOROTHEA is a drug discovery dataset. Chemical compounds represented by structural molecular features must be classified as active (binding to thrombin) or inactive. This is one of 5 datasets of the NIPS 2003 feature selection challenge. 
Multivariate Classification Integer 1950 100000 2008 Life http://archive.ics.uci.edu/ml/datasets/Dorothea 170 Gisette GISETTE is a handwritten digit recognition problem. The problem is to separate the highly confusible digits '4' and '9'. This dataset is one of five datasets of the NIPS 2003 feature selection challenge. Multivariate Classification Integer 13500 5000 2008 Computer http://archive.ics.uci.edu/ml/datasets/Gisette 171 Madelon MADELON is an artificial dataset, which was part of the NIPS 2003 feature selection challenge. This is a two-class classification problem with continuous input variables. The difficulty is that the problem is multivariate and highly non-linear. Multivariate Classification Real 4400 500 2008 Other http://archive.ics.uci.edu/ml/datasets/Madelon 172 Ozone Level Detection Two ground ozone level data sets are included in this collection. One is the eight hour peak set (eighthr.data), the other is the one hour peak set (onehr.data). Those data were collected from 1998 to 2004 at the Houston, Galveston and Brazoria area. Multivariate, Sequential, Time-Series Classification Real 2536 73 2008 Physical http://archive.ics.uci.edu/ml/datasets/Ozone+Level+Detection 173 Abscisic Acid Signaling Network The objective is to determine the set of boolean rules that describe the interactions of the nodes within this plant signaling network. The dataset includes 300 separate boolean pseudodynamic simulations using an asynchronous update scheme. Multivariate Causal-Discovery Integer 300 43 2008 Life http://archive.ics.uci.edu/ml/datasets/Abscisic+Acid+Signaling+Network 174 Parkinsons Oxford Parkinson's Disease Detection Dataset Multivariate Classification Real 197 23 2008 Life http://archive.ics.uci.edu/ml/datasets/Parkinsons 175 Character Trajectories Multiple, labelled samples of pen tip trajectories recorded whilst writing individual characters. All samples are from the same writer, for the purposes of primitive extraction. 
Only characters with a single pen-down segment were considered. Time-Series Classification, Clustering Real 2858 3 2008 Computer http://archive.ics.uci.edu/ml/datasets/Character+Trajectories 176 Blood Transfusion Service Center Data taken from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan -- this is a classification problem. Multivariate Classification Real 748 5 2008 Business http://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center 177 UJI Pen Characters (Version 2) A pen-based database with more than 11k isolated handwritten characters Multivariate, Sequential Classification Integer 11640 2009 Computer http://archive.ics.uci.edu/ml/datasets/UJI+Pen+Characters+%28Version+2%29 178 Semeion Handwritten Digit 1593 handwritten digits from around 80 persons were scanned, stretched in a rectangular box 16x16 in a gray scale of 256 values. Multivariate Classification Integer 1593 256 2008 Computer http://archive.ics.uci.edu/ml/datasets/Semeion+Handwritten+Digit 179 SECOM Data from a semi-conductor manufacturing process Multivariate Classification, Causal-Discovery Real 1567 591 2008 Computer http://archive.ics.uci.edu/ml/datasets/SECOM 180 Plants Data has been extracted from the USDA plants database. It contains all plants (species and genera) in the database and the states of USA and Canada where they occur. Multivariate Clustering Categorical 22632 70 2008 Life http://archive.ics.uci.edu/ml/datasets/Plants 181 Libras Movement The data set contains 15 classes of 24 instances each. Each class refers to a hand movement type in LIBRAS (Portuguese name 'LÍngua BRAsileira de Sinais', official Brazilian sign language). Multivariate, Sequential Classification, Clustering Real 360 91 2009 Other http://archive.ics.uci.edu/ml/datasets/Libras+Movement 182 Concrete Slump Test Concrete is a highly complex material. The slump flow of concrete is not only determined by the water content, but is also influenced by other concrete ingredients.
Multivariate Regression Real 103 10 2009 Computer http://archive.ics.uci.edu/ml/datasets/Concrete+Slump+Test 183 Communities and Crime Communities within the United States. The data combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR. Multivariate Regression Real 1994 128 2009 Social http://archive.ics.uci.edu/ml/datasets/Communities+and+Crime 184 Acute Inflammations The data was created by a medical expert as a data set to test the expert system, which will perform the presumptive diagnosis of two diseases of the urinary system. Multivariate Classification Categorical, Integer 120 6 2009 Life http://archive.ics.uci.edu/ml/datasets/Acute+Inflammations 185 Wine Quality Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. The goal is to model wine quality based on physicochemical tests (see [Cortez et al., 2009], http://www3.dsi.uminho.pt/pcortez/wine/). Multivariate Classification, Regression Real 4898 12 2009 Business http://archive.ics.uci.edu/ml/datasets/Wine+Quality 186 URL Reputation Anonymized 120-day subset of the ICML-09 URL data containing 2.4 million examples and 3.2 million features. Multivariate, Time-Series Classification Integer, Real 2396130 3231961 2009 Computer http://archive.ics.uci.edu/ml/datasets/URL+Reputation 187 p53 Mutants The goal is to model mutant p53 transcriptional activity (active vs inactive) based on data extracted from biophysical simulations. Multivariate Classification Real 16772 5409 2010 Life http://archive.ics.uci.edu/ml/datasets/p53+Mutants 188 Parkinsons Telemonitoring Oxford Parkinson's Disease Telemonitoring Dataset Multivariate Regression Integer, Real 5875 26 2009 Life http://archive.ics.uci.edu/ml/datasets/Parkinsons+Telemonitoring 189 Demospongiae Marine sponges of the Demospongiae class classification domain. 
Multivariate Classification Integer 503 2010 Life http://archive.ics.uci.edu/ml/datasets/Demospongiae 190 Opinosis Opinion ⁄ Review This dataset contains sentences extracted from user reviews on a given topic. Example topics are “performance of Toyota Camry” and “sound quality of ipod nano”. Text 51 2010 Computer http://archive.ics.uci.edu/ml/datasets/Opinosis+Opinion+%26frasl%3B+Review 191 Breast Tissue Dataset with electrical impedance measurements of freshly excised tissue samples from the breast. Multivariate Classification Real 106 10 2010 Life http://archive.ics.uci.edu/ml/datasets/Breast+Tissue 192 Cardiotocography The dataset consists of measurements of fetal heart rate (FHR) and uterine contraction (UC) features on cardiotocograms classified by expert obstetricians. Multivariate Classification Real 2126 23 2010 Life http://archive.ics.uci.edu/ml/datasets/Cardiotocography 193 Wall-Following Robot Navigation Data The data were collected as the SCITOS G5 robot navigates through the room following the wall in a clockwise direction, for 4 rounds, using 24 ultrasound sensors arranged circularly around its 'waist'. Multivariate, Sequential Classification Real 5456 24 2010 Computer http://archive.ics.uci.edu/ml/datasets/Wall-Following+Robot+Navigation+Data 194 Spoken Arabic Digit This dataset contains timeseries of mel-frequency cepstrum coefficients (MFCCs) corresponding to spoken Arabic digits. Includes data from 44 male and 44 female native Arabic speakers. Multivariate, Time-Series Classification Real 8800 13 2010 Other http://archive.ics.uci.edu/ml/datasets/Spoken+Arabic+Digit 195 Localization Data for Person Activity Data contains recordings of five people performing different activities. Each person wore four sensors (tags) while performing the same scenario five times. 
Univariate, Sequential, Time-Series Classification Real 164860 8 2010 Life http://archive.ics.uci.edu/ml/datasets/Localization+Data+for+Person+Activity 196 AutoUniv AutoUniv is an advanced data generator for classification tasks. The aim is to reflect the nuances and heterogeneity of real data. Data can be generated in .csv, ARFF or C4.5 formats. Multivariate Classification Categorical, Integer, Real 2010 Other http://archive.ics.uci.edu/ml/datasets/AutoUniv 197 Steel Plates Faults A dataset of steel plates’ faults, classified into 7 different types. The goal was to train machine learning for automatic pattern recognition. Multivariate Classification Integer, Real 1941 27 2010 Physical http://archive.ics.uci.edu/ml/datasets/Steel+Plates+Faults 198 MiniBooNE particle identification This dataset is taken from the MiniBooNE experiment and is used to distinguish electron neutrinos (signal) from muon neutrinos (background). Multivariate Classification Real 130065 50 2010 Physical http://archive.ics.uci.edu/ml/datasets/MiniBooNE+particle+identification 199 YearPredictionMSD Prediction of the release year of a song from audio features. Songs are mostly western, commercial tracks ranging from 1922 to 2011, with a peak in the 2000s. Multivariate Regression Real 515345 90 2011 Other http://archive.ics.uci.edu/ml/datasets/YearPredictionMSD 200 PEMS-SF 15 months' worth of daily data (440 daily records) that describes the occupancy rate, between 0 and 1, of different car lanes of the San Francisco bay area freeways across time. Multivariate, Time-Series Classification Real 440 138672 2011 Computer http://archive.ics.uci.edu/ml/datasets/PEMS-SF 201 OpinRank Review Dataset This data set contains user reviews of cars and hotels collected from Tripadvisor (~259,000 reviews) and Edmunds (~42,230 reviews).
Text 2011 Computer http://archive.ics.uci.edu/ml/datasets/OpinRank+Review+Dataset 202 Relative location of CT slices on axial axis The dataset consists of 384 features extracted from CT images. The class variable is numeric and denotes the relative location of the CT slice on the axial axis of the human body. Domain-Theory Regression Real 53500 386 2011 Computer http://archive.ics.uci.edu/ml/datasets/Relative+location+of+CT+slices+on+axial+axis 203 Online Handwritten Assamese Characters Dataset This is a dataset of 8235 online handwritten assamese characters. The “online” process involves capturing of data as text is written on a digitizing tablet with an electronic pen. Multivariate, Sequential Classification Integer 8235 2011 Computer http://archive.ics.uci.edu/ml/datasets/Online+Handwritten+Assamese+Characters+Dataset 204 PubChem Bioassay Data These highly imbalanced bioassay datasets are from the differing types of screening that can be performed using HTS technology. 21 datasets were created from 12 bioassays. Multivariate Classification Integer, Real 2011 Life http://archive.ics.uci.edu/ml/datasets/PubChem+Bioassay+Data 205 Record Linkage Comparison Patterns Element-wise comparison of records with personal data from a record linkage setting. The task is to decide from a comparison pattern whether the underlying records belong to one person. Multivariate Classification Real 5749132 12 2011 Other http://archive.ics.uci.edu/ml/datasets/Record+Linkage+Comparison+Patterns 206 Communities and Crime Unnormalized Communities in the US. 
Data combines socio-economic data from the '90 Census, law enforcement data from the 1990 Law Enforcement Management and Admin Stats survey, and crime data from the 1995 FBI UCR Multivariate Regression Real 2215 147 2011 Social http://archive.ics.uci.edu/ml/datasets/Communities+and+Crime+Unnormalized 207 Vertebral Column Data set containing values for six biomechanical features used to classify orthopaedic patients into 3 classes (normal, disk hernia or spondylolisthesis) or 2 classes (normal or abnormal). Multivariate Classification Real 310 6 2011 http://archive.ics.uci.edu/ml/datasets/Vertebral+Column 208 EMG Physical Action Data Set The Physical Action Data Set includes 10 normal and 10 aggressive physical actions that measure the human activity. The data have been collected from 4 subjects using the Delsys EMG wireless apparatus. Time-Series Classification Real 10000 8 2011 Physical http://archive.ics.uci.edu/ml/datasets/EMG+Physical+Action+Data+Set 209 Vicon Physical Action Data Set The Physical Action Data Set includes 10 normal and 10 aggressive physical actions that measure the human activity. The data have been collected from 10 subjects using the Vicon 3D tracker. Time-Series Classification Real 3000 27 2011 Physical http://archive.ics.uci.edu/ml/datasets/Vicon+Physical+Action+Data+Set 210 Amazon Commerce reviews set The dataset is used for authorship identification in online Writeprint which is a new research field of pattern recognition. Multivariate, Text, Domain-Theory Classification Real 1500 10000 2011 Physical http://archive.ics.uci.edu/ml/datasets/Amazon+Commerce+reviews+set 211 Amazon Access Samples Amazon's InfoSec is getting smarter about the way Access data is leveraged. This is an anonymized sample of access provisioned within the company.
Time-Series, Domain-Theory Regression, Clustering, Causal-Discovery 30000 20000 2011 Business http://archive.ics.uci.edu/ml/datasets/Amazon+Access+Samples 212 Reuter_50_50 The dataset is used for authorship identification in online Writeprint which is a new research field of pattern recognition. Multivariate, Text, Domain-Theory Classification, Clustering Real 2500 10000 2011 Computer http://archive.ics.uci.edu/ml/datasets/Reuter_50_50 213 Farm Ads This data was collected from text ads found on twelve websites that deal with various farm animal related topics. The binary labels are based on whether or not the content owner approves of the ad. Text Classification 4143 54877 2011 Business http://archive.ics.uci.edu/ml/datasets/Farm+Ads 214 DBWorld e-mails It contains 64 e-mails which I have manually collected from DBWorld mailing list. They are classified in: 'announces of conferences' and 'everything else'. Text Classification 64 4702 2011 Computer http://archive.ics.uci.edu/ml/datasets/DBWorld+e-mails 215 KEGG Metabolic Relation Network (Directed) KEGG Metabolic pathways modeled as directed relation network. Variety of graphical features presented. Multivariate, Univariate, Text Classification, Regression, Clustering Integer, Real 53414 24 2011 Life http://archive.ics.uci.edu/ml/datasets/KEGG+Metabolic+Relation+Network+%28Directed%29 216 KEGG Metabolic Reaction Network (Undirected) KEGG Metabolic pathways modeled as un-directed reaction network. Variety of graphical features presented. Multivariate, Univariate, Text Classification, Regression, Clustering Integer, Real 65554 29 2011 Life http://archive.ics.uci.edu/ml/datasets/KEGG+Metabolic+Reaction+Network+%28Undirected%29 217 Bank Marketing The data is related with direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict if the client will subscribe a term deposit (variable y). 
Multivariate Classification Real 45211 17 2012 Business http://archive.ics.uci.edu/ml/datasets/Bank+Marketing 218 YouTube Comedy Slam Preference Data This dataset provides user vote data on which video from a pair of videos is funnier collected on YouTube Comedy Slam. The task is to automatically predict this preference based on video metadata. Text Classification 1138562 3 2012 Computer http://archive.ics.uci.edu/ml/datasets/YouTube+Comedy+Slam+Preference+Data 219 Gas Sensor Array Drift Dataset This archive contains 13910 measurements from 16 chemical sensors utilized in simulations for drift compensation in a discrimination task of 6 gases at various levels of concentrations. Multivariate Classification Real 13910 128 2012 Computer http://archive.ics.uci.edu/ml/datasets/Gas+Sensor+Array+Drift+Dataset 220 ILPD (Indian Liver Patient Dataset) This data set contains 10 variables that are age, gender, total Bilirubin, direct Bilirubin, total proteins, albumin, A/G ratio, SGPT, SGOT and Alkphos. Multivariate Classification Integer, Real 583 10 2012 Life http://archive.ics.uci.edu/ml/datasets/ILPD+%28Indian+Liver+Patient+Dataset%29 221 OPPORTUNITY Activity Recognition The OPPORTUNITY Dataset for Human Activity Recognition from Wearable, Object, and Ambient Sensors is a dataset devised to benchmark human activity recognition algorithms (classification, automatic data segmentation, sensor fusion, feature extraction, etc). Multivariate, Time-Series Classification Real 2551 242 2012 Computer http://archive.ics.uci.edu/ml/datasets/OPPORTUNITY+Activity+Recognition 222 Nomao Nomao collects data about places (name, phone, localization...) from many sources. Deduplication consists in detecting what data refer to the same place. Instances in the dataset compare 2 spots. 
Univariate Classification Real 34465 120 2012 Computer http://archive.ics.uci.edu/ml/datasets/Nomao 223 SMS Spam Collection The SMS Spam Collection is a public set of SMS labeled messages that have been collected for mobile phone spam research. Multivariate, Text, Domain-Theory Classification, Clustering Real 5574 2012 Computer http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection 224 Skin Segmentation The Skin Segmentation dataset is constructed over B, G, R color space. Skin and Nonskin dataset is generated using skin textures from face images of diversity of age, gender, and race people. Univariate Classification Real 245057 4 2012 Computer http://archive.ics.uci.edu/ml/datasets/Skin+Segmentation 225 Planning Relax The dataset concerns with the classification of two mental stages from recorded EEG signals: Planning (during imagination of motor act) and Relax state. Univariate Classification Real 182 13 2012 Computer http://archive.ics.uci.edu/ml/datasets/Planning+Relax 226 PAMAP2 Physical Activity Monitoring The PAMAP2 Physical Activity Monitoring dataset contains data of 18 different physical activities, performed by 9 subjects wearing 3 inertial measurement units and a heart rate monitor. Multivariate, Time-Series Classification Real 3850505 52 2012 Computer http://archive.ics.uci.edu/ml/datasets/PAMAP2+Physical+Activity+Monitoring 227 Restaurant & consumer data The dataset was obtained from a recommender system prototype. The task was to generate a top-n list of restaurants according to the consumer preferences. 
Multivariate 138 47 2012 Computer http://archive.ics.uci.edu/ml/datasets/Restaurant+%26+consumer+data 228 CNAE-9 This is a data set containing 1080 documents of free text business descriptions of Brazilian companies categorized into a subset of 9 categories Multivariate, Text Classification Integer 1080 857 2012 Business http://archive.ics.uci.edu/ml/datasets/CNAE-9 229 Individual household electric power consumption Measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years. Different electrical quantities and some sub-metering values are available. Multivariate, Time-Series Regression, Clustering Real 2075259 9 2012 Physical http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption 230 seeds Measurements of geometrical properties of kernels belonging to three different varieties of wheat. A soft X-ray technique and GRAINS package were used to construct all seven, real-valued attributes. Multivariate Classification, Clustering Real 210 7 2012 Life http://archive.ics.uci.edu/ml/datasets/seeds 231 Northix Northix is designed to be a schema matching benchmark problem for data integration of two entity relationship databases. Multivariate, Univariate, Text Classification Integer, Real 115 200 2012 Computer http://archive.ics.uci.edu/ml/datasets/Northix 232 QtyT40I10D100K Since there is no numerical sequential data stream available in standard data sets, this data set is generated from the original T40I10D100K data set Sequential Integer 3960456 4 2012 http://archive.ics.uci.edu/ml/datasets/QtyT40I10D100K 233 Legal Case Reports A textual corpus of 4000 legal cases for automatic summarization and citation analysis. For each document we collect catchphrases, citations sentences, citation catchphrases and citation classes. 
Text Classification 2012 Other http://archive.ics.uci.edu/ml/datasets/Legal+Case+Reports 234 Human Activity Recognition Using Smartphones Human Activity Recognition database built from the recordings of 30 subjects performing activities of daily living (ADL) while carrying a waist-mounted smartphone with embedded inertial sensors. Multivariate, Time-Series Classification, Clustering 10299 561 2012 Computer http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones 235 One-hundred plant species leaves data set Sixteen samples of leaf each of one-hundred plant species. For each sample, a shape descriptor, fine scale margin and texture histogram are given. Classification Real 1600 64 2012 Life http://archive.ics.uci.edu/ml/datasets/One-hundred+plant+species+leaves+data+set 236 Energy efficiency This study looked into assessing the heating load and cooling load requirements of buildings (that is, energy efficiency) as a function of building parameters. Multivariate Classification, Regression Integer, Real 768 8 2012 Computer http://archive.ics.uci.edu/ml/datasets/Energy+efficiency 237 Yacht Hydrodynamics Delft data set, used to predict the hydrodynamic performance of sailing yachts from dimensions and velocity. Multivariate Regression Real 308 7 2013 Physical http://archive.ics.uci.edu/ml/datasets/Yacht+Hydrodynamics 238 Fertility 100 volunteers provide a semen sample analyzed according to the WHO 2010 criteria. Sperm concentration is related to socio-demographic data, environmental factors, health status, and life habits Multivariate Classification, Regression Real 100 10 2013 Life http://archive.ics.uci.edu/ml/datasets/Fertility 239 Daphnet Freezing of Gait This dataset contains the annotated readings of 3 acceleration sensors at the hip and leg of Parkinson's disease patients that experience freezing of gait (FoG) during walking tasks.
Multivariate, Time-Series Classification Real 237 9 2013 Life http://archive.ics.uci.edu/ml/datasets/Daphnet+Freezing+of+Gait 240 3D Road Network (North Jutland, Denmark) 3D road network with highly accurate elevation information (+-20cm) from Denmark used in eco-routing and fuel/Co2-estimation routing algorithms. Sequential, Text Regression, Clustering Real 434874 4 2013 Computer http://archive.ics.uci.edu/ml/datasets/3D+Road+Network+%28North+Jutland%2C+Denmark%29 241 ISTANBUL STOCK EXCHANGE Data set includes returns of the Istanbul Stock Exchange along with seven other international indices: SP, DAX, FTSE, NIKKEI, BOVESPA, MSCE_EU, MSCI_EM from Jun 5, 2009 to Feb 22, 2011. Multivariate, Univariate, Time-Series Classification, Regression Real 536 8 2013 Business http://archive.ics.uci.edu/ml/datasets/ISTANBUL+STOCK+EXCHANGE 242 Buzz in social media This data-set contains examples of buzz events from two different social networks: Twitter, and Tom's Hardware, a forum network focusing on new technology with more conservative dynamics. Time-Series, Multivariate Regression, Classification Integer, Real 140000 77 2013 Computer http://archive.ics.uci.edu/ml/datasets/Buzz+in+social+media+ 243 First-order theorem proving Given a theorem, predict which of five heuristics will give the fastest proof when used by a first-order prover. A sixth prediction declines to attempt a proof, should the theorem be too difficult. Multivariate Classification Real 6118 51 2013 Computer http://archive.ics.uci.edu/ml/datasets/First-order+theorem+proving 244 Wearable Computing: Classification of Body Postures and Movements (PUC-Rio) A dataset with 5 classes (sitting-down, standing-up, standing, walking, and sitting) collected on 8 hours of activities of 4 healthy subjects. We also established a baseline performance index.
Sequential Classification Integer, Real 165632 18 2013 Computer http://archive.ics.uci.edu/ml/datasets/Wearable+Computing%3A+Classification+of+Body+Postures+and+Movements+%28PUC-Rio%29 245 Gas sensor arrays in open sampling settings The dataset contains 18000 time-series recordings from a chemical detection platform at six different locations in a wind tunnel facility in response to ten high-priority chemical gaseous substances Multivariate, Time-Series Classification Real 18000 1950000 2013 Computer http://archive.ics.uci.edu/ml/datasets/Gas+sensor+arrays+in+open+sampling+settings 246 Climate Model Simulation Crashes Given Latin hypercube samples of 18 climate model input parameter values, predict climate model simulation crashes and determine the parameter value combinations that cause the failures. Multivariate Classification Real 540 18 2013 Physical http://archive.ics.uci.edu/ml/datasets/Climate+Model+Simulation+Crashes 247 MicroMass A dataset to explore machine learning approaches for the identification of microorganisms from mass-spectrometry data. Multivariate Classification Real 931 1300 2013 Life http://archive.ics.uci.edu/ml/datasets/MicroMass 248 QSAR biodegradation Data set containing values for 41 attributes (molecular descriptors) used to classify 1055 chemicals into 2 classes (ready and not ready biodegradable). Multivariate Classification Integer, Real 1055 41 2013 Other http://archive.ics.uci.edu/ml/datasets/QSAR+biodegradation 249 BLOGGER In this paper, we look for to recognize the causes of users tend to cyber space in Kohkiloye and Boyer Ahmad Province in Iran Multivariate Classification 100 6 2013 Computer http://archive.ics.uci.edu/ml/datasets/BLOGGER 250 Daily and Sports Activities The dataset comprises motion sensor data of 19 daily and sports activities each performed by 8 subjects in their own style for 5 minutes. Five Xsens MTx units are used on the torso, arms, and legs. 
Multivariate, Time-Series Classification, Clustering Real 9120 5625 2013 Computer http://archive.ics.uci.edu/ml/datasets/Daily+and+Sports+Activities 251 User Knowledge Modeling It is the real dataset about the students' knowledge status about the subject of Electrical DC Machines. The dataset had been obtained from Ph.D. Thesis. Multivariate Classification, Clustering Integer 403 5 2013 Computer http://archive.ics.uci.edu/ml/datasets/User+Knowledge+Modeling 252 Reuters RCV1 RCV2 Multilingual, Multiview Text Categorization Test collection This test collection contains feature characteristics of documents originally written in five different languages and their translations, over a common set of 6 categories. Multivariate Classification Real 111740 2013 Life http://archive.ics.uci.edu/ml/datasets/Reuters+RCV1+RCV2+Multilingual%2C+Multiview+Text+Categorization+Test+collection 253 NYSK NYSK (New York v. Strauss-Kahn) is a collection of English news articles about the case relating to allegations of sexual assault against the former IMF director Dominique Strauss-Kahn (May 2011). Multivariate, Sequential, Text Clustering 10421 7 2013 Social http://archive.ics.uci.edu/ml/datasets/NYSK 254 Turkiye Student Evaluation This data set contains a total 5820 evaluation scores provided by students from Gazi University in Ankara (Turkey). There is a total of 28 course specific questions and additional 5 attributes. Multivariate Classification, Clustering 5820 33 2013 Other http://archive.ics.uci.edu/ml/datasets/Turkiye+Student+Evaluation 255 ser Knowledge Modeling Data (Students' Knowledge Levels on DC Electrical Machines) The dataset is about the users' learning activities and knowledge levels on subjects of DC Electrical Machines. The dataset had been obtained from online web-courses and reported in my Ph.D. Thesis. 
Multivariate Classification Real 403 5 2013 Computer http://archive.ics.uci.edu/ml/datasets/ser+Knowledge+Modeling+Data+%28Students%27+Knowledge+Levels+on+DC+Electrical+Machines%29 256 EEG Eye State The data set consists of 14 EEG values and a value indicating the eye state. Multivariate, Sequential, Time-Series Classification Integer, Real 14980 15 2013 Life http://archive.ics.uci.edu/ml/datasets/EEG+Eye+State 257 Physicochemical Properties of Protein Tertiary Structure This is a data set of Physicochemical Properties of Protein Tertiary Structure. The data set is taken from CASP 5-9. There are 45730 decoys, with sizes varying from 0 to 21 angstroms. Multivariate Regression Real 45730 9 2013 Life http://archive.ics.uci.edu/ml/datasets/Physicochemical+Properties+of+Protein+Tertiary+Structure 258 seismic-bumps The data describe the problem of forecasting high-energy (higher than 10^4 J) seismic bumps in a coal mine. Data come from two longwalls located in a Polish coal mine. Multivariate Classification Real 2584 19 2013 Other http://archive.ics.uci.edu/ml/datasets/seismic-bumps 259 banknote authentication Data were extracted from images that were taken for the evaluation of an authentication procedure for banknotes. Multivariate Classification Real 1372 5 2013 Computer http://archive.ics.uci.edu/ml/datasets/banknote+authentication 260 USPTO Algorithm Challenge, run by NASA-Harvard Tournament Lab and TopCoder Problem: Pat Data used for USPTO Algorithm Competition. Contains drawing pages from US patents with manually labeled figure and part labels. Domain-Theory Classification Integer 306 5 2013 Other http://archive.ics.uci.edu/ml/datasets/USPTO+Algorithm+Challenge%2C+run+by+NASA-Harvard+Tournament+Lab+and+TopCoder++++Problem%3A+Pat 261 YouTube Multiview Video Games Dataset This dataset contains about 120k instances, each described by 13 feature types, with class information, especially useful for exploring multiview topics (cotraining, ensembles, clustering, ...).
Multivariate, Text Classification, Clustering Integer, Real 120000 1000000 2013 Computer http://archive.ics.uci.edu/ml/datasets/YouTube+Multiview+Video+Games+Dataset 262 Gas Sensor Array Drift Dataset at Different Concentrations This archive contains 13910 measurements from 16 chemical sensors exposed to 6 different gases at various concentration levels. Multivariate, Time-Series Classification, Regression, Clustering, Causa Real 13910 129 2013 Computer http://archive.ics.uci.edu/ml/datasets/Gas+Sensor+Array+Drift+Dataset+at+Different+Concentrations 263 Activities of Daily Living (ADLs) Recognition Using Binary Sensors This dataset comprises information regarding the ADLs performed by two users on a daily basis in their own homes. Multivariate, Sequential, Time-Series Classification, Clustering 2747 2013 Computer http://archive.ics.uci.edu/ml/datasets/Activities+of+Daily+Living+%28ADLs%29+Recognition+Using+Binary+Sensors 264 SkillCraft1 Master Table Dataset This data was used in Thompson et al. (2013). A list of possible game actions is discussed in Thompson, Blair, Chen, & Henrey (2013). Multivariate Regression Integer, Real 3395 20 2013 Game http://archive.ics.uci.edu/ml/datasets/SkillCraft1+Master+Table+Dataset 265 Weight Lifting Exercises monitored with Inertial Measurement Units Six young healthy subjects were asked to perform 5 variations of the biceps curl weight lifting exercise. One of the variations is the one predicted by the health professional. Multivariate Classification Real 39242 152 2013 Physical http://archive.ics.uci.edu/ml/datasets/Weight+Lifting+Exercises+monitored+with+Inertial+Measurement+Units 266 SML2010 This dataset was collected from a monitoring system mounted in a domotic house. It corresponds to approximately 40 days of monitoring data.
Multivariate, Sequential, Time-Series, Text Regression Real 4137 24 2014 Computer http://archive.ics.uci.edu/ml/datasets/SML2010 267 Bike Sharing Dataset This dataset contains the hourly and daily count of rental bikes between years 2011 and 2012 in Capital bikeshare system with the corresponding weather and seasonal information. Univariate Regression Integer, Real 17389 16 2013 Social http://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset 268 Predict keywords activities in a online social media The data from Twitter was collected during 360 consecutive days. It was done by querying 1497 English keywords sampled from Wikipedia. This dataset is proposed in a Learning to rank setting. Multivariate, Sequential, Time-Series Integer, Real 51 35 2013 Computer http://archive.ics.uci.edu/ml/datasets/Predict+keywords+activities+in+a+online+social+media 269 Thoracic Surgery Data The data is dedicated to classification problem related to the post-operative life expectancy in the lung cancer patients: class 1 - death within one year after surgery, class 2 - survival. Multivariate Classification Integer, Real 470 17 2013 Life http://archive.ics.uci.edu/ml/datasets/Thoracic+Surgery+Data 270 EMG dataset in Lower Limb 3 different exercises: sitting, standing and walking in the muscles: biceps femoris, vastus medialis, rectus femoris and semitendinosus addition to goniometry in the exercises. Multivariate, Time-Series Real 132 5 2014 Computer http://archive.ics.uci.edu/ml/datasets/EMG+dataset+in+Lower+Limb 271 SUSY This is a classification problem to distinguish between a signal process which produces supersymmetric particles and a background process which does not. Classification Real 5000000 18 2014 Physical http://archive.ics.uci.edu/ml/datasets/SUSY 272 HIGGS This is a classification problem to distinguish between a signal process which produces Higgs bosons and a background process which does not. 
Classification Real 11000000 28 2014 Physical http://archive.ics.uci.edu/ml/datasets/HIGGS 273 Qualitative_Bankruptcy Predict the Bankruptcy from Qualitative parameters from experts. Multivariate Classification 250 7 2014 Computer http://archive.ics.uci.edu/ml/datasets/Qualitative_Bankruptcy 274 LSVT Voice Rehabilitation 126 samples from 14 participants, 309 features. Aim: assess whether voice rehabilitation treatment lead to phonations considered 'acceptable' or 'unacceptable' (binary class classification problem). Multivariate Classification Real 126 309 2014 Life http://archive.ics.uci.edu/ml/datasets/LSVT+Voice+Rehabilitation 275 Dataset for ADL Recognition with Wrist-worn Accelerometer Recordings of 16 volunteers performing 14 Activities of Daily Living (ADL) while carrying a single wrist-worn tri-axial accelerometer. Multivariate, Time-Series Classification, Clustering 3 2014 Computer http://archive.ics.uci.edu/ml/datasets/Dataset+for+ADL+Recognition+with+Wrist-worn+Accelerometer 276 Wilt High-resolution Remote Sensing data set (Quickbird). Small number of training samples of diseased trees, large number for other land cover. Testing data set from stratified random sample of image. Multivariate Classification 4889 6 2014 Life http://archive.ics.uci.edu/ml/datasets/Wilt 277 User Identification From Walking Activity The dataset collects data from an Android smartphone positioned in the chest pocket from 22 participants walking in the wild over a predefined path. Univariate, Sequential, Time-Series Classification, Clustering Real 2014 Other http://archive.ics.uci.edu/ml/datasets/User+Identification+From+Walking+Activity 278 Activity Recognition from Single Chest-Mounted Accelerometer The dataset collects data from a wearable accelerometer mounted on the chest. The dataset is intended for Activity Recognition research purposes. 
Univariate, Sequential, Time-Series Classification, Clustering Real 2014 Other http://archive.ics.uci.edu/ml/datasets/Activity+Recognition+from+Single+Chest-Mounted+Accelerometer 279 Leaf This dataset consists in a collection of shape and texture features extracted from digital images of leaf specimens originating from a total of 40 different plant species. Multivariate Classification Real 340 16 2014 Computer http://archive.ics.uci.edu/ml/datasets/Leaf 280 Dresses_Attribute_Sales This dataset contain Attributes of dresses and their recommendations according to their sales.Sales are monitor on the basis of alternate days. Text Classification, Clustering 501 13 2014 Computer http://archive.ics.uci.edu/ml/datasets/Dresses_Attribute_Sales 281 Tamilnadu Electricity Board Hourly Readings This data can be effectively produced the result to fewer parameter of the Load profile can be reduced in the Database Multivariate Classification, Regression, Clustering Real 45781 5 2013 Life http://archive.ics.uci.edu/ml/datasets/Tamilnadu+Electricity+Board+Hourly+Readings 282 Airfoil Self-Noise NASA data set, obtained from a series of aerodynamic and acoustic tests of two and three-dimensional airfoil blade sections conducted in an anechoic wind tunnel. Multivariate Regression Real 1503 6 2014 Physical http://archive.ics.uci.edu/ml/datasets/Airfoil+Self-Noise 283 Wholesale customers The data set refers to clients of a wholesale distributor. It includes the annual spending in monetary units (m.u.) on diverse product categories Multivariate Classification, Clustering Integer 440 8 2014 Business http://archive.ics.uci.edu/ml/datasets/Wholesale+customers 284 Twitter Data set for Arabic Sentiment Analysis This problem of Sentiment Analysis (SA) has been studied well on the English language but not Arabic one. Two main approaches have been devised: corpus-based and lexicon-based. 
Text Classification 2000 2 2014 Social http://archive.ics.uci.edu/ml/datasets/Twitter+Data+set+for+Arabic+Sentiment+Analysis 285 Combined Cycle Power Plant The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006-2011), when the plant was set to work with full load. Multivariate Regression Real 9568 4 2014 Computer http://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant 286 Urban Land Cover Classification of urban land cover using high resolution aerial imagery. Intended to assist sustainable urban planning efforts. Multivariate Classification 168 148 2014 Physical http://archive.ics.uci.edu/ml/datasets/Urban+Land+Cover 287 Diabetes 130-US hospitals for years 1999-2008 This data has been prepared to analyze factors related to readmission as well as other outcomes pertaining to patients with diabetes. Multivariate Classification, Clustering Integer 100000 55 2014 Life http://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008 288 Bach Choral Harmony The data set is composed of 60 chorales (5665 events) by J.S. Bach (1685-1750). Each event of each chorale is labelled with one of 101 chord labels and described by 14 features. Sequential Classification 5665 17 2014 Other http://archive.ics.uci.edu/ml/datasets/Bach+Choral+Harmony 289 StoneFlakes Stone flakes are waste products of stone tool production in the prehistoric era. The variables are means of geometric and stylistic features of the flakes contained in different inventories. Multivariate Classification, Clustering, Causal-Discovery Real 79 8 2014 Other http://archive.ics.uci.edu/ml/datasets/StoneFlakes 290 Tennis Major Tournament Match Statistics This is a collection of 8 files containing the match statistics for both women and men at the four major tennis tournaments of the year 2013. Each file has 42 columns and a minimum of 76 rows.
Multivariate Classification, Regression, Clustering Integer, Real 127 42 2014 Other http://archive.ics.uci.edu/ml/datasets/Tennis+Major+Tournament+Match+Statistics 291 Parkinson Speech Dataset with Multiple Types of Sound Recordings The training data belongs to 20 Parkinson's Disease (PD) patients and 20 healthy subjects. From all subjects, multiple types of sound recordings (26) are taken. Multivariate Classification, Regression Integer, Real 1040 26 2014 Life http://archive.ics.uci.edu/ml/datasets/Parkinson+Speech+Dataset+with++Multiple+Types+of+Sound+Recordings 292 Gesture Phase Segmentation The dataset is composed by features extracted from 7 videos with people gesticulating, aiming at studying Gesture Phase Segmentation. It contains 50 attributes divided into two files for each video. Multivariate, Sequential, Time-Series Classification, Clustering Real 9900 50 2014 Other http://archive.ics.uci.edu/ml/datasets/Gesture+Phase+Segmentation 293 Perfume Data This data consists of odors of 20 different perfumes. Data was obtained by using a handheld odor meter (OMX-GR sensor) per second for 28 seconds period. Univariate, Domain-Theory Classification, Clustering Integer 560 2 2014 Computer http://archive.ics.uci.edu/ml/datasets/Perfume+Data 294 BlogFeedback Instances in this dataset contain features extracted from blog posts. The task associated with the data is to predict how many comments the post will receive. 
Multivariate Regression Integer, Real 60021 281 2014 Social http://archive.ics.uci.edu/ml/datasets/BlogFeedback 295 REALDISP Activity Recognition Dataset The REALDISP dataset is devised to evaluate techniques dealing with the effects of sensor displacement in wearable activity recognition as well as to benchmark general activity recognition algorithms. Multivariate, Time-Series Classification Real 1419 120 2014 Computer http://archive.ics.uci.edu/ml/datasets/REALDISP+Activity+Recognition+Dataset 296 Newspaper and magazine images segmentation dataset Dataset is well suited for segmentation tasks. It contains 101 scanned pages from different newspapers and magazines in Russian with ground truth pixel-based masks. Classification 101 2014 Computer http://archive.ics.uci.edu/ml/datasets/Newspaper+and+magazine+images+segmentation+dataset 297 AAAI 2014 Accepted Papers This data set comprises the metadata for the 2014 AAAI conference's accepted papers, including paper titles, authors, abstracts, and keywords of varying granularity. Multivariate Clustering 399 6 2014 Computer http://archive.ics.uci.edu/ml/datasets/AAAI+2014+Accepted+Papers 298 Gas sensor array under flow modulation The data set contains 58 time series acquired from 16 chemical sensors under gas flow modulation conditions. The sensors were exposed to different gaseous binary mixtures of acetone and ethanol. Multivariate, Time-Series Classification, Regression Real 58 120432 2014 Computer http://archive.ics.uci.edu/ml/datasets/Gas+sensor+array+under+flow+modulation 299 Gas sensor array exposed to turbulent gas mixtures A chemical detection platform composed of 8 chemoresistive gas sensors was exposed to turbulent gas mixtures generated naturally in a wind tunnel. The acquired time series of the sensors are provided.
Multivariate, Time-Series Classification, Regression Real 180 150000 2014 Computer http://archive.ics.uci.edu/ml/datasets/Gas+sensor+array+exposed+to+turbulent+gas+mixtures 300 UJIIndoorLoc The UJIIndoorLoc is a Multi-Building Multi-Floor indoor localization database for testing Indoor Positioning Systems that rely on WLAN/WiFi fingerprints. Multivariate Classification, Regression Integer, Real 21048 529 2014 Computer http://archive.ics.uci.edu/ml/datasets/UJIIndoorLoc 301 Sentence Classification Contains sentences from the abstract and introduction of 30 articles annotated with a modified Argumentative Zones annotation scheme. These articles come from biology, machine learning and psychology. Text Classification Integer 2014 Other http://archive.ics.uci.edu/ml/datasets/Sentence+Classification 302 Dow Jones Index This dataset contains weekly data for the Dow Jones Industrial Index. It has been used in computational investing research. Time-Series Classification, Clustering Integer, Real 750 16 2014 Business http://archive.ics.uci.edu/ml/datasets/Dow+Jones+Index 303 sEMG for Basic Hand movements The “sEMG for Basic Hand movements” includes 2 databases of surface electromyographic signals of 6 hand movements using Delsys' EMG System. Healthy subjects conducted six daily life grasps. Time-Series Classification Real 3000 2500 2014 Life http://archive.ics.uci.edu/ml/datasets/sEMG+for+Basic+Hand+movements 304 AAAI 2013 Accepted Papers This data set comprises the metadata for the 2013 AAAI conference's accepted papers (main track only), including paper titles, abstracts, and keywords of varying granularity. Multivariate Clustering 150 5 2014 Computer http://archive.ics.uci.edu/ml/datasets/AAAI+2013+Accepted+Papers 305 Geographical Original of Music Instances in this dataset contain audio features extracted from 1059 wave files. The task associated with the data is to predict the geographical origin of music.
Multivariate Classification, Regression Real 1059 68 2014 Other http://archive.ics.uci.edu/ml/datasets/Geographical+Original+of+Music 306 Condition Based Maintenance of Naval Propulsion Plants Data have been generated from a sophisticated simulator of a Gas Turbines (GT), mounted on a Frigate characterized by a COmbined Diesel eLectric And Gas (CODLAG) propulsion plant type. Multivariate Regression Real 11934 16 2014 Computer http://archive.ics.uci.edu/ml/datasets/Condition+Based+Maintenance+of+Naval+Propulsion+Plants 307 Grammatical Facial Expressions This dataset supports the development of models that make possible to interpret Grammatical Facial Expressions from Brazilian Sign Language (Libras). Multivariate, Sequential Classification, Clustering Real 27965 100 2014 Computer http://archive.ics.uci.edu/ml/datasets/Grammatical+Facial+Expressions 308 NoisyOffice Corpus intended to do cleaning (or binarization) and enhancement of noisy grayscale printed text images using supervised learning methods. Noisy images and their corresponding ground truth provided. Multivariate Classification, Regression Real 216 216 2015 Computer http://archive.ics.uci.edu/ml/datasets/NoisyOffice 309 MHEALTH Dataset The MHEALTH (Mobile Health) dataset is devised to benchmark techniques dealing with human behavior analysis based on multimodal body sensing. Multivariate, Time-Series Classification Real 120 23 2014 Computer http://archive.ics.uci.edu/ml/datasets/MHEALTH+Dataset 310 Student Performance Predict student performance in secondary education (high school). Multivariate Classification, Regression Integer 649 33 2014 Social http://archive.ics.uci.edu/ml/datasets/Student+Performance 311 ElectricityLoadDiagrams20112014 This data set contains electricity consumption of 370 points/clients. 
Time-Series Regression, Clustering Real 370 140256 2015 Computer http://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014 312 Gas sensor array under dynamic gas mixtures The data set contains the recordings of 16 chemical sensors exposed to two dynamic gas mixtures at varying concentrations. For each mixture, signals were acquired continuously during 12 hours. Multivariate, Time-Series Classification, Regression Real 4178504 19 2015 Computer http://archive.ics.uci.edu/ml/datasets/Gas+sensor+array+under+dynamic+gas+mixtures 313 microblogPCU MicroblogPCU data is crawled from sina weibo microblog[http://weibo.com/]. This data can be used to study machine learning methods as well as do some social network research. Multivariate, Univariate, Sequential, Text Classification, Causal-Discovery Integer, Real 221579 20 2015 Computer http://archive.ics.uci.edu/ml/datasets/microblogPCU 314 Firm-Teacher_Clave-Direction_Classification The data are binary attack-point vectors and their clave-direction class(es) according to the partido-alto-based paradigm. Multivariate Classification 10800 20 2015 Other http://archive.ics.uci.edu/ml/datasets/Firm-Teacher_Clave-Direction_Classification 315 Dataset for Sensorless Drive Diagnosis Features are extracted from motor current. The motor has intact and defective components. This results in 11 different classes with different conditions. Multivariate Classification Real 58509 49 2015 Computer http://archive.ics.uci.edu/ml/datasets/Dataset+for+Sensorless+Drive+Diagnosis 316 TV News Channel Commercial Detection Dataset TV Commercials data set consists of standard audio-visual features of video shots extracted from 150 hours of TV news broadcast of 3 Indian and 2 international news channels ( 30 Hours each). 
Multivariate Classification, Clustering Real 129685 12 2015 Computer http://archive.ics.uci.edu/ml/datasets/TV+News+Channel+Commercial+Detection+Dataset 317 Phishing Websites This dataset collected mainly from: PhishTank archive, MillerSmiles archive, Google’s searching operators. Classification Integer 2456 30 2015 Computer Security http://archive.ics.uci.edu/ml/datasets/Phishing+Websites 318 Greenhouse Gas Observing Network Design an observing network to monitor emissions of a greenhouse gas (GHG) in California given time series of synthetic observations and tracers from weather model simulations. Multivariate, Time-Series Regression Real 2921 5232 2015 Physical http://archive.ics.uci.edu/ml/datasets/Greenhouse+Gas+Observing+Network 319 Diabetic Retinopathy Debrecen Data Set This dataset contains features extracted from the Messidor image set to predict whether an image contains signs of diabetic retinopathy or not. Multivariate Classification Integer, Real 1151 20 2014 Life http://archive.ics.uci.edu/ml/datasets/Diabetic+Retinopathy+Debrecen+Data+Set 320 HIV-1 protease cleavage The data contains lists of octamers (8 amino acids) and a flag (-1 or 1) depending on whether HIV-1 protease will cleave in the central position (between amino acids 4 and 5). Multivariate Classification Categorical 6590 1 2015 Life http://archive.ics.uci.edu/ml/datasets/HIV-1+protease+cleavage 321 Sentiment Labelled Sentences The dataset contains sentences labelled with positive or negative sentiment. Text Classification 3000 2015 Other http://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences 322 Online News Popularity This dataset summarizes a heterogeneous set of features about articles published by Mashable in a period of two years. The goal is to predict the number of shares in social networks (popularity). 
Multivariate Classification, Regression Integer, Real 39797 61 2015 Business http://archive.ics.uci.edu/ml/datasets/Online+News+Popularity 323 Forest type mapping Multi-temporal remote sensing data of a forested area in Japan. The goal is to map different forest types using spectral data. Multivariate Classification 326 27 2015 Life http://archive.ics.uci.edu/ml/datasets/Forest+type+mapping 324 wiki4HE Survey of faculty members from two Spanish universities on teaching uses of Wikipedia Multivariate Regression, Clustering, Causal-Discovery 913 53 2015 Social http://archive.ics.uci.edu/ml/datasets/wiki4HE 325 Online Video Characteristics and Transcoding Time Dataset The dataset contains a million randomly sampled video instances listing 10 fundamental video characteristics along with the YouTube video ID. Multivariate Regression Integer, Real 168286 11 2015 Computer http://archive.ics.uci.edu/ml/datasets/Online+Video+Characteristics+and+Transcoding+Time+Dataset 326 Chronic_Kidney_Disease This dataset can be used to predict the chronic kidney disease and it can be collected from the hospital nearly 2 months of period. Multivariate Classification Real 400 25 2015 Other http://archive.ics.uci.edu/ml/datasets/Chronic_Kidney_Disease 327 Machine Learning based ZZAlpha Ltd. Stock Recommendations 2012-2014 The data here are the ZZAlpha® machine learning recommendations made for various US traded stock portfolios the morning of each day during the 3 year period Jan 1, 2012 - Dec 31, 2014. Sequential, Time-Series Classification Real 314080 0 2015 Business http://archive.ics.uci.edu/ml/datasets/Machine+Learning+based+ZZAlpha+Ltd.+Stock+Recommendations+2012-2014 328 Folio 20 photos of leaves for each of 32 different species. 
Multivariate Classification, Clustering 637 20 2015 Other http://archive.ics.uci.edu/ml/datasets/Folio 329 Taxi Service Trajectory - Prediction Challenge, ECML PKDD 2015 An accurate dataset describing trajectories performed by all the 442 taxis running in the city of Porto, in Portugal. Multivariate, Sequential, Time-Series, Domain-Theory Clustering, Causal-Discovery Real 1710671 9 2015 Computer http://archive.ics.uci.edu/ml/datasets/Taxi+Service+Trajectory+-+Prediction+Challenge%2C+ECML+PKDD+2015 330 Cuff-Less Blood Pressure Estimation This Data set provides preprocessed and cleaned vital signals which can be used in designing algorithms for cuff-less estimation of the blood pressure. Multivariate Classification, Regression Real 12000 3 2015 Life http://archive.ics.uci.edu/ml/datasets/Cuff-Less+Blood+Pressure+Estimation 331 Smartphone-Based Recognition of Human Activities and Postural Transitions Activity recognition data set built from the recordings of 30 subjects performing basic activities and postural transitions while carrying a waist-mounted smartphone with embedded inertial sensors. Multivariate, Time-Series Classification Real 10929 561 2015 Life http://archive.ics.uci.edu/ml/datasets/Smartphone-Based+Recognition+of+Human+Activities+and+Postural+Transitions 332 Mice Protein Expression Expression levels of 77 proteins measured in the cerebral cortex of 8 classes of control and Down syndrome mice exposed to context fear conditioning, a task used to assess associative learning. Multivariate Classification, Clustering Real 1080 82 2015 Life http://archive.ics.uci.edu/ml/datasets/Mice+Protein+Expression 333 UJIIndoorLoc-Mag The UJIIndoorLoc-Mag is an indoor localization database to test Indoor Positioning System that rely on Earth's magnetic field variations. 
Multivariate, Sequential, Time-Series Classification, Regression, Clustering Integer, Real 40000 13 2015 Computer http://archive.ics.uci.edu/ml/datasets/UJIIndoorLoc-Mag 334 Heterogeneity Activity Recognition The Heterogeneity Human Activity Recognition (HHAR) dataset from Smartphones and Smartwatches is a dataset devised to benchmark human activity recognition algorithms (classification, automatic data segmentation, sensor fusion, feature extraction, etc.) in real-world contexts; specifically, the dataset is gathered with a variety of different device models and use-scenarios, in order to reflect sensing heterogeneities to be expected in real deployments. Multivariate, Time-Series Classification, Clustering Real 43930257 16 2015 Computer http://archive.ics.uci.edu/ml/datasets/Heterogeneity+Activity+Recognition 335 Educational Process Mining (EPM): A Learning Analytics Data Set Educational Process Mining data set is built from the recordings of 115 subjects' activities through a logging application while learning with an educational simulator. Multivariate, Sequential, Time-Series Classification, Regression, Clustering Integer 230318 13 2015 Computer http://archive.ics.uci.edu/ml/datasets/Educational+Process+Mining+%28EPM%29%3A+A+Learning+Analytics+Data+Set 336 HEPMASS The search for exotic particles requires sorting through a large number of collisions to find the events of interest. This data set challenges one to detect a new particle of unknown mass. Multivariate Classification Real 10500000 28 2016 Physical http://archive.ics.uci.edu/ml/datasets/HEPMASS 337 Indoor User Movement Prediction from RSS data This dataset contains temporal data from a Wireless Sensor Network deployed in real-world office environments. The task is intended as real-life benchmark in the area of Ambient Assisted Living. 
Multivariate, Sequential, Time-Series Classification Real 13197 4 2016 Computer http://archive.ics.uci.edu/ml/datasets/Indoor+User+Movement+Prediction+from+RSS+data 338 Open University Learning Analytics dataset Open University Learning Analytics Dataset contains data about courses, students and their interactions with Virtual Learning Environment for seven selected courses and more than 30000 students. Multivariate, Sequential, Time-Series Classification, Regression, Clustering Integer 2015 Computer http://archive.ics.uci.edu/ml/datasets/Open+University+Learning+Analytics+dataset 339 default of credit card clients This research aimed at the case of customers’ default payments in Taiwan and compares the predictive accuracy of probability of default among six data mining methods. Multivariate Classification Integer, Real 30000 24 2016 Business http://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients 340 Mesothelioma’s disease data set Mesothelioma’s disease data set were prepared at Dicle University Faculty of Medicine in Turkey. Three hundred and twenty-four Mesothelioma patient data. In the dataset, all samples have 34 features. Multivariate Classification Real 324 34 2016 Computer http://archive.ics.uci.edu/ml/datasets/Mesothelioma%E2%80%99s+disease+data+set+ 341 Online Retail This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail. Multivariate, Sequential, Time-Series Classification, Clustering Integer, Real 541909 8 2015 Business http://archive.ics.uci.edu/ml/datasets/Online+Retail 342 SIFT10M In SIFT10M, each data point is a SIFT feature which is extracted from Caltech-256 by the open source VLFeat library. The corresponding patches of the SIFT features are provided. 
Multivariate Causal-Discovery Integer 11164866 128 2016 Computer http://archive.ics.uci.edu/ml/datasets/SIFT10M 343 GPS Trajectories The dataset was fed by an Android app called Go!Track, which is available on the Google Play Store (https://play.google.com/store/apps/details?id=com.go.router). Multivariate Classification, Regression Real 163 15 2016 Computer http://archive.ics.uci.edu/ml/datasets/GPS+Trajectories 344 Detect Malacious Executable(AntiVirus) Features were extracted from malicious and non-malicious executables to create a training dataset for an SVM classifier. The dataset consists of unknown executables, to be classified as either a virus or a normal, safe executable. Multivariate Classification Real 373 513 2016 Computer http://archive.ics.uci.edu/ml/datasets/Detect+Malacious+Executable%28AntiVirus%29 345 STUDENT ALCOHOL CONSUMPTION The result also provides the correlation between alcohol usage and the social, gender and study time attributes for each student. Multivariate Classification Integer 1044 32 2016 Social http://archive.ics.uci.edu/ml/datasets/STUDENT+ALCOHOL+CONSUMPTION 346 Occupancy Detection Experimental data used for binary classification (room occupancy) from Temperature, Humidity, Light and CO2. Ground-truth occupancy was obtained from time stamped pictures that were taken every minute. Multivariate, Time-Series Classification Real 20560 7 2016 Computer http://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+ 347 Improved Spiral Test Using Digitized Graphics Tablet for Monitoring Parkinson’s Disease Handwriting database consists of 25 PWP (People with Parkinson) and 15 healthy individuals. Three types of recordings (Static Spiral Test, Dynamic Spiral Test and Stability Test) are taken.
Multivariate Classification, Regression, Clustering Real 40 7 2016 Computer http://archive.ics.uci.edu/ml/datasets/Improved+Spiral+Test+Using+Digitized+Graphics+Tablet+for+Monitoring+Parkinson%E2%80%99s+Disease 348 News Aggregator References to news pages collected from a web aggregator in the period from 10-March-2014 to 10-August-2014. The resources are grouped into clusters that represent pages discussing the same story. Multivariate Classification, Clustering 422937 5 2016 Other http://archive.ics.uci.edu/ml/datasets/News+Aggregator 349 Air Quality Contains the responses of a gas multisensor device deployed in the field in an Italian city. Hourly response averages are recorded along with gas concentration references from a certified analyzer. Multivariate, Time-Series Regression Real 9358 15 2016 Computer http://archive.ics.uci.edu/ml/datasets/Air+Quality 350 Twin gas sensor arrays 5 replicates of an 8-MOX gas sensor array were exposed to different gas conditions (4 volatiles at 10 concentration levels each). Multivariate, Time-Series, Domain-Theory Classification, Regression Real 640 480000 2016 Computer http://archive.ics.uci.edu/ml/datasets/Twin+gas+sensor+arrays 351 Gas sensors for home activity monitoring 100 recordings of a sensor array under different conditions in a home setting: background, wine and banana presentations. The array includes 8 MOX gas sensors, and humidity and temperature sensors. Multivariate, Time-Series Classification Real 919438 11 2016 Computer http://archive.ics.uci.edu/ml/datasets/Gas+sensors+for+home+activity+monitoring 352 Facebook Comment Volume Dataset Instances in this dataset contain features extracted from Facebook posts. The task associated with the data is to predict how many comments the post will receive. 
Multivariate Regression Integer, Real 40949 54 2016 Other http://archive.ics.uci.edu/ml/datasets/Facebook+Comment+Volume+Dataset 353 Smartphone Dataset for Human Activity Recognition (HAR) in Ambient Assisted Living (AAL) This data is an addition to an existing dataset on UCI. We collected more data to improve the accuracy of our human activity recognition algorithms applied in the domain of Ambient Assisted Living. Time-Series Classification Real 5744 561 2016 Computer http://archive.ics.uci.edu/ml/datasets/Smartphone+Dataset+for+Human+Activity+Recognition+%28HAR%29+in+Ambient+Assisted+Living+%28AAL%29 354 Polish companies bankruptcy data The dataset is about bankruptcy prediction of Polish companies. The bankrupt companies were analyzed in the period 2000-2012, while the still operating companies were evaluated from 2007 to 2013. Multivariate Classification Real 10503 64 2016 Business http://archive.ics.uci.edu/ml/datasets/Polish+companies+bankruptcy+data 355 Activity Recognition system based on Multisensor data fusion (AReM) This dataset contains temporal data from a Wireless Sensor Network worn by an actor performing the activities: bending, cycling, lying down, sitting, standing, walking. Multivariate, Sequential, Time-Series Classification Real 42240 6 2016 Computer http://archive.ics.uci.edu/ml/datasets/Activity+Recognition+system+based+on+Multisensor+data+fusion+%28AReM%29 356 Dota2 Games Results Dota 2 is a popular computer game with two teams of 5 players. At the start of the game each player chooses a unique hero with different strengths and weaknesses. Multivariate Classification 102944 116 2016 Game http://archive.ics.uci.edu/ml/datasets/Dota2+Games+Results 357 Facebook metrics Facebook performance metrics of a renowned cosmetics brand's Facebook page. 
Multivariate Regression Integer 500 19 2016 Business http://archive.ics.uci.edu/ml/datasets/Facebook+metrics 358 UbiqLog (smartphone lifelogging) UbiqLog is a smartphone lifelogging tool that ran on the smartphones of 35 users for about 2 months. Multivariate Causal-Discovery 9782222 2016 Computer http://archive.ics.uci.edu/ml/datasets/UbiqLog+%28smartphone+lifelogging%29 359 Amazon book reviews 213,335 book reviews for 8 different books. There are books which are scored very negatively in general and books which are scored very positively. Multivariate, Text Classification, Clustering Integer, Real 213335 4 2016 Computer http://archive.ics.uci.edu/ml/datasets/Amazon+book+reviews 360 NIPS Conference Papers 1987-2015 This data set contains the distribution of words in the full text of the NIPS conference papers published from 1987 to 2015. Text Clustering Integer 11463 5812 2016 Computer http://archive.ics.uci.edu/ml/datasets/NIPS+Conference+Papers+1987-2015