Topological Data Analysis
An Overview
A central motivation of topological data analysis (TDA) is to study the shape of data. It can summarize high-dimensional data when visualizations aren’t useful or possible. TDA aims to extract topological features from data, such as connected components, holes, and voids. These features act as a form of dimension reduction on which further (statistical) analysis can be done.
There are four major concepts to cover in going from data to results: creating complexes from data, extracting features from complexes with homology, using persistence to identify important features, and using stability to ensure results are robust.
Creating complexes from data is how we take a point cloud (a sample of individual data points) and create a mesh that approximates a manifold.[1] We can use methods like triangulation to create these meshes, which are called simplicial complexes. Read about it here.
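As a rough illustration of turning a point cloud into a simplicial complex, the sketch below builds a Vietoris-Rips complex, which is a different construction from the triangulation mentioned above: a set of points forms a simplex whenever all of its points are pairwise within a chosen distance. The function name `rips_complex` and the parameters `epsilon` and `max_dim` are assumptions made for this example, and the brute-force loop is only suitable for tiny inputs.

```python
from itertools import combinations
from math import dist


def rips_complex(points, epsilon, max_dim=2):
    """Return the simplices (sorted tuples of point indices) of a
    Vietoris-Rips complex at scale `epsilon`, up to dimension `max_dim`."""
    n = len(points)

    def close(i, j):
        return dist(points[i], points[j]) <= epsilon

    # Every point is a 0-simplex.
    simplices = [(i,) for i in range(n)]
    # A k-point subset is a simplex when all its pairs are within epsilon.
    for k in range(2, max_dim + 2):
        for combo in combinations(range(n), k):
            if all(close(i, j) for i, j in combinations(combo, 2)):
                simplices.append(combo)
    return simplices


# Four corners of a unit square (illustrative data).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
complex_ = rips_complex(square, epsilon=1.1)
print(complex_)
```

At `epsilon=1.1` the sides of the square become edges but the diagonals do not, so the complex is a hollow loop of four vertices and four edges.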
With a simplicial complex in hand, we extract features such as holes and voids using homology.[2] Since the complex approximates a manifold, and a homeomorphism[3] preserves topological properties, we can treat the topological properties of the simplicial complex as those of the manifold. Read about it here.
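To make the idea of extracting holes with homology more concrete, here is a minimal sketch that computes Betti numbers (the counts of connected components, loops, and so on) from the ranks of boundary matrices. It assumes coefficients in GF(2), simplices given as sorted tuples of vertex indices, and a complex that contains every face of each simplex; the helper names `rank_gf2` and `betti_numbers` are made up for this example.

```python
from itertools import combinations


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        # Clear that bit from every remaining row.
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def betti_numbers(simplices, max_dim):
    """Betti numbers b_0..b_max_dim of a simplicial complex over GF(2)."""
    by_dim = {d: [s for s in simplices if len(s) == d + 1]
              for d in range(max_dim + 2)}

    def boundary_rank(d):
        # Rank of the boundary map from d-simplices to (d-1)-simplices.
        if d == 0 or not by_dim[d]:
            return 0
        index = {s: i for i, s in enumerate(by_dim[d - 1])}
        rows = []
        for s in by_dim[d]:
            mask = 0
            for face in combinations(s, d):
                mask |= 1 << index[face]
            rows.append(mask)
        return rank_gf2(rows)

    # b_d = dim(kernel of boundary_d) - rank(boundary_{d+1})
    return [len(by_dim[d]) - boundary_rank(d) - boundary_rank(d + 1)
            for d in range(max_dim + 1)]


# A hollow square: four vertices and four edges, no triangles.
hollow = [(0,), (1,), (2,), (3,), (0, 1), (1, 2), (2, 3), (0, 3)]
# The same square filled in with a diagonal and two triangles.
filled = hollow + [(0, 2), (0, 1, 2), (0, 2, 3)]

print(betti_numbers(hollow, 1))  # one component, one loop
print(betti_numbers(filled, 1))  # one component, no loop
```

Filling in the square removes the loop, which is the kind of change in shape that homology detects.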
Persistence is how we determine which features in the simplicial complex are important. Since we only have an approximation of a smooth surface, features that are noise, or that don’t truly represent the shape of the phenomenon our sample describes, can be removed or ignored. Read about it here.
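One simple way to see persistence at work is to track connected components only (dimension 0) as a distance threshold grows. In the sketch below, every point starts as its own component at scale 0, and a component "dies" at the edge length where it merges into another. The function name `h0_persistence`, the sample points, and the lifetime cutoff of 1.0 are all assumptions chosen for illustration; higher-dimensional features need a more involved algorithm.

```python
from itertools import combinations
from math import dist, inf


def h0_persistence(points):
    """Return (birth, death) pairs for connected components as the
    distance threshold grows from 0 (union-find over sorted edges)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # two components merge; one of them dies
            pairs.append((0.0, length))
    pairs.append((0.0, inf))  # the last component never dies
    return pairs


# Two tight clusters far apart (illustrative data).
cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 0.0), (5.1, 0.0)]
pairs = h0_persistence(cloud)
# Keep only features that live longer than the hypothetical cutoff.
significant = [p for p in pairs if p[1] - p[0] > 1.0]
print(pairs)
print(significant)
```

The short-lived pairs come from points merging within a cluster, while the two long-lived features correspond to the two clusters themselves.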
Lastly, stability (or robustness) can be thought of along the same lines as persistence and statistical significance. We want our results to be resistant to noise, outliers, and other artifacts present in the data. Read about it here.
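A small numerical experiment can give a feel for stability. The sketch below, which is an illustration rather than a proof, moves every point of a random cloud by at most `delta` and compares the merge scales of connected components before and after. Because each pairwise distance changes by at most `2 * delta`, the sorted merge scales in this dimension-0 setting also change by at most `2 * delta`. The names `h0_deaths` and `perturb`, the random seed, and the sample sizes are assumptions for this example.

```python
import random
from itertools import combinations
from math import cos, dist, pi, sin


def h0_deaths(points):
    """Ascending scales at which connected components merge."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    deaths = []
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths


def perturb(points, delta, rng):
    """Move each 2D point by a random vector of length at most delta."""
    moved = []
    for x, y in points:
        r, theta = rng.uniform(0, delta), rng.uniform(0, 2 * pi)
        moved.append((x + r * cos(theta), y + r * sin(theta)))
    return moved


rng = random.Random(0)
cloud = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(30)]
delta = 0.01
noisy = perturb(cloud, delta, rng)

# Largest change between corresponding merge scales.
shift = max(abs(a - b) for a, b in zip(h0_deaths(cloud), h0_deaths(noisy)))
print(shift, "<=", 2 * delta)
```

Small perturbations of the input produce correspondingly small changes in the summary, which is the behavior stability is meant to capture.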
Footnotes
[1] A manifold is a topological space that locally resembles Euclidean space. It is a shape that can be described by coordinates in a way that is similar to how we describe points in space.
[2] Homology has multiple usages, all closely related. For our purposes, homology of a topological space is the most relevant usage, but homology of a chain complex will also arise. See this Wikipedia article for more information.
[3] A homeomorphism is a continuous function between topological spaces that has a continuous inverse. It is a way to show that two topological spaces are equivalent in terms of their topological properties. Note: homeomorphism also appears in graph theory, with some overlap in application; see here for more information.