Multivariate Analysis & Statistical Learning

About this content

Multivariate analysis is the branch of statistics that generalizes the methods of inferential statistics so that a population can be characterized through a finite collection of random explanatory variables. As in classical inferential statistics, the main idea is to estimate parameters or draw useful conclusions about a multivariate population from the information in a sample; in this case, however, the information is multidimensional.
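As a minimal illustration of inferring population parameters from a multidimensional sample, the sketch below computes the sample mean vector and the unbiased sample covariance matrix. It uses plain Python so that it runs without dependencies; the data and the function names `mean_vector` and `covariance_matrix` are invented for this example and are not taken from the notebooks.

```python
# Sketch: estimate the mean vector and covariance matrix of a
# multivariate population from a small sample (toy data, hypothetical).

def mean_vector(rows):
    """Column-wise sample mean of a list of equal-length observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n = len(rows)
    mu = mean_vector(rows)
    # Center every observation on the sample mean.
    centered = [[x - m for x, m in zip(row, mu)] for row in rows]
    p = len(mu)
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


# Hypothetical sample: 4 observations of 2 variables.
sample = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
x_bar = mean_vector(sample)
S = covariance_matrix(sample)
print(x_bar)
print(S)
```

In practice a numerical library would normally replace these loops; the hand-written version is only meant to make the two estimators explicit.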

It is often necessary to predict the future behavior of one or several variables in terms of random vectors; to infer which population a random vector belongs to, when several populations share the same explanatory variables but follow different distributions; or to find cluster boundaries and structures, when populations are mixed and the membership of each vector is unknown. For these situations and others, results from multivariate analysis provide methods that yield approximate rather than exact theoretical solutions; this is called statistical learning. When a computational perspective is added (algorithms, complexity, cost, data structures, and so on), the resulting set of techniques is known as machine learning.
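The second situation above, deciding which population a vector comes from, can be sketched with a minimum-distance rule: assign the vector to the group whose sample mean is closest. This is a deliberately simple stand-in, not the logistic regression or support vector machine covered in the notebooks, and the groups, labels, and the `classify` helper are hypothetical.

```python
import math

# Sketch: nearest-mean (minimum-distance) classification on toy data.

def mean_vector(rows):
    """Column-wise sample mean of a list of equal-length observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def classify(x, groups):
    """Return the label whose sample mean is closest to x (Euclidean distance)."""
    centroids = {label: mean_vector(rows) for label, rows in groups.items()}
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Hypothetical labeled samples from two populations sharing the same variables.
groups = {
    "A": [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0]],
    "B": [[6.0, 6.5], [7.0, 6.0], [6.5, 7.0]],
}

label_near_a = classify([2.0, 2.0], groups)
label_near_b = classify([6.0, 6.0], groups)
print(label_near_a, label_near_b)
```

The rule ignores the spread of each group; methods that account for the covariance structure would generally behave differently on overlapping populations.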

About this material

The material presented here was developed with free software. Most of the code in the notebooks is original and was written for academic and teaching purposes.

Software used to prepare the content:

Apache Spark 2.0.1 https://spark.apache.org/
Jupyter notebooks http://jupyter.org/
Python 3 https://www.python.org/download/releases/3.0/
Docker https://hub.docker.com/r/jupyter/all-spark-notebook/

Sklearn Contents

Topic | SubTopic | Notebook
Multivariate Data Sets | Multivariate data sets and descriptive statistics | View notebook
Multivariate Data Sets | Linear transformations over data matrix | View notebook
Multivariate Distributions | Multivariate normal distribution | View notebook
Dimensionality Reduction | Principal component analysis | View notebook
Dimensionality Reduction | Factor analysis | View notebook
Dimensionality Reduction | Multidimensional scaling | View notebook
Discriminant Analysis | Logistic regression | View notebook
Discriminant Analysis | ROC curve | View notebook
Discriminant Analysis | Support vector machine | View notebook
Go to github repository

Spark Contents

Topic | SubTopic | Notebook
Multivariate Data Sets | Multivariate data sets and descriptive statistics | View notebook
Multivariate Data Sets | Linear transformations over data matrix | View notebook
Multivariate Distributions | Multivariate normal distribution | View notebook
Dimensionality Reduction | Principal component analysis | View notebook
Dimensionality Reduction | Factor analysis | View notebook
Dimensionality Reduction | Multidimensional scaling | View notebook
Discriminant Analysis | Logistic regression | View notebook
Go to github repository

References

Most of the theoretical content is not original; it is a compendium of classic results and examples drawn from diverse sources.

"Multivariate Statistics" John I. Marden, Department of Statistics, University of Illinois at Urbana-Champaign.
"Nuevos Métodos de Análisis Multivariante" Carles M. Cuadras, © C. M. Cuadras, CMC Editions, Manacor 30, 08023 Barcelona, Spain.
"Methods of Multivariate Analysis" Alvin C. Rencher, Brigham Young University, John Wiley & Sons, Inc., United States of America.
"Applied Multivariate Statistical Analysis" Richard A. Johnson, Dean W. Wichern, Pearson Prentice Hall, United States of America.
"Applied Multivariate Statistical Analysis" Penn State University, online course STAT 505.