Designing a Markov Chain

$$ %---- MACROS FOR SETS ----% \newcommand{\znz}[1]{\mathbb{Z} / #1 \mathbb{Z}} \newcommand{\twoheadrightarrowtail}{\mapsto\mathrel{\mspace{-15mu}}\rightarrow} % popular set names \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\I}{\mathbb{I}} % popular vector space notation \newcommand{\V}{\mathbb{V}} \newcommand{\W}{\mathbb{W}} \newcommand{\B}{\mathbb{B}} \newcommand{\D}{\mathbb{D}} %---- MACROS FOR FUNCTIONS ----% % linear algebra \newcommand{\T}{\mathrm{T}} \renewcommand{\ker}{\mathrm{ker}} \newcommand{\range}{\mathrm{range}} \renewcommand{\span}{\mathrm{span}} \newcommand{\rref}{\mathrm{rref}} \renewcommand{\dim}{\mathrm{dim}} \newcommand{\col}{\mathrm{col}} \newcommand{\nullspace}{\mathrm{null}} \newcommand{\row}{\mathrm{row}} \newcommand{\rank}{\mathrm{rank}} \newcommand{\nullity}{\mathrm{nullity}} \renewcommand{\det}{\mathrm{det}} \newcommand{\proj}{\mathrm{proj}} \renewcommand{\H}{\mathrm{H}} \newcommand{\trace}{\mathrm{trace}} \newcommand{\diag}{\mathrm{diag}} \newcommand{\card}{\mathrm{card}} \newcommand\norm[1]{\left\lVert#1\right\rVert} % differential equations \newcommand{\laplace}[1]{\mathcal{L}\{#1\}} \newcommand{\F}{\mathrm{F}} % misc \newcommand{\sign}{\mathrm{sign}} \newcommand{\softmax}{\mathrm{softmax}} \renewcommand{\th}{\mathrm{th}} \newcommand{\adj}{\mathrm{adj}} \newcommand{\hyp}{\mathrm{hyp}} \renewcommand{\max}{\mathrm{max}} \renewcommand{\min}{\mathrm{min}} \newcommand{\where}{\mathrm{\ where\ }} \newcommand{\abs}[1]{\vert #1 \vert} \newcommand{\bigabs}[1]{\big\vert #1 \big\vert} \newcommand{\biggerabs}[1]{\Bigg\vert #1 \Bigg\vert} \newcommand{\equivalent}{\equiv} \newcommand{\cross}{\times} % statistics \newcommand{\cov}{\mathrm{cov}} \newcommand{\var}{\mathrm{var}} \newcommand{\bias}{\mathrm{bias}} \newcommand{\E}{\mathrm{E}} \newcommand{\prob}{\mathrm{prob}} \newcommand{\unif}{\mathrm{unif}} \newcommand{\invNorm}{\mathrm{invNorm}} \newcommand{\invT}{\mathrm{invT}} % real analysis \renewcommand{\sup}{\mathrm{sup}} \renewcommand{\inf}{\mathrm{inf}} %---- MACROS FOR ALIASES AND REFORMATTING ----% % logic \newcommand{\forevery}{\ \forall\ } \newcommand{\OR}{\lor} \newcommand{\AND}{\land} \newcommand{\then}{\implies} % set theory \newcommand{\impropersubset}{\subseteq} \newcommand{\notimpropersubset}{\nsubseteq} \newcommand{\propersubset}{\subset} \newcommand{\notpropersubset}{\not\subset} \newcommand{\union}{\cup} \newcommand{\Union}[2]{\bigcup\limits_{#1}^{#2}} \newcommand{\intersect}{\cap} \newcommand{\Intersect}[2]{\bigcap\limits_{#1}^{#2}} \newcommand{\intersection}[2]{\bigcap\limits_{#1}^{#2}} \newcommand{\Intersection}[2]{\bigcap\limits_{#1}^{#2}} \newcommand{\closure}{\overline} \newcommand{\compose}{\circ} % linear algebra \newcommand{\subspace}{\le} \newcommand{\angles}[1]{\langle #1 \rangle} \newcommand{\identity}{\mathbb{1}} \newcommand{\orthogonal}{\perp} \renewcommand{\parallel}[1]{#1^{||}} % calculus \newcommand{\integral}[2]{\int\limits_{#1}^{#2}} \newcommand{\limit}[1]{\lim\limits_{#1}} \newcommand{\approaches}{\rightarrow} \renewcommand{\to}{\rightarrow} \newcommand{\convergesto}{\rightarrow} % algebra \newcommand{\summation}[2]{\sum\limits_{#1}^{#2}} \newcommand{\product}[2]{\prod\limits_{#1}^{#2}} \newcommand{\by}{\times} \newcommand{\integral}[2]{\int_{#1}^{#2}} % exists commands \newcommand{\notexist}{\nexists\ } \newcommand{\existsatleastone}{\exists\ } \newcommand{\existsonlyone}{\exists!} \newcommand{\existsunique}{\exists!} 
\let\oldexists\exists \renewcommand{\exists}{\ \oldexists\ } % statistics \newcommand{\distributed}{\sim} \newcommand{\onetoonecorresp}{\sim} \newcommand{\independent}{\perp\!\!\!\perp} \newcommand{\conditionedon}{\ |\ } \newcommand{\given}{\ |\ } \newcommand{\notg}{\ngtr} \newcommand{\yhat}{\hat{y}} \newcommand{\betahat}{\hat{\beta}} \newcommand{\sigmahat}{\hat{\sigma}} \newcommand{\muhat}{\hat{\mu}} \newcommand{\transmatrix}{\mathrm{P}} \renewcommand{\choose}{\binom} % misc \newcommand{\infinity}{\infty} \renewcommand{\bold}{\textbf} \newcommand{\italics}{\textit} $$

Stationary distribution

As stated in the introduction, a Markov chain is useful for representing stochastic systems because we can infer the long-run behavior of the system by finding a stationary distribution, $\pi$, of the Markov chain such that

$$\pi \transmatrix = \pi$$

where $\pi$ is a row vector (it's also a distribution vector since its entries sum to 1) and $\transmatrix$ is a row-stochastic matrix. So the goal here is to find a distribution vector that is left unchanged after being transformed by the matrix. The equation above should look eerily similar to the equation for finding eigenvectors and eigenvalues

$$Mx = \lambda x$$

especially when you take the transpose of $\pi \transmatrix = \pi$, which is equivalent to $\transmatrix^T \pi^T = \pi^T$. Under this interpretation, $\pi^T$ is an eigenvector of $\transmatrix^T$ associated with an eigenvalue of 1. So if we can find the eigenvectors of $\transmatrix^T$ associated with eigenvalue 1, then we've found the stationary distributions of the chain. This essentially makes $\pi$ a left eigenvector of $\transmatrix$.
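As a quick illustration (a minimal numpy sketch with a made-up 3-state transition matrix), we can recover $\pi$ as the left eigenvector of $\transmatrix$ associated with eigenvalue 1:

```python
import numpy as np

# A made-up 3-state row-stochastic transition matrix (each row sums to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.4, 0.4]])

# Right eigenvectors of P^T are left eigenvectors of P.
eigenvalues, eigenvectors = np.linalg.eig(P.T)

# Grab the eigenvector whose eigenvalue is (numerically) 1...
idx = np.argmin(np.abs(eigenvalues - 1))
pi = np.real(eigenvectors[:, idx])

# ...and rescale it so its entries sum to 1, making it a distribution vector.
pi = pi / pi.sum()

print(pi)      # the stationary distribution
print(pi @ P)  # equals pi, up to floating-point error
```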

Setting up a finite Markov chain

When designing a Markov chain, you decide which states are being considered, i.e. the state space, and the probabilities of moving between states. Once this Markov chain/transition matrix is built, you then decide at which state a sample should start, which we encode as an initial distribution vector, $\phi$. If the initial distribution vector were $(0, 0, 1, 0)$, then a sample would start at state 3 with 100% probability. Since this vector is a distribution, you can assign probabilities of starting in any given state, so long as the entries sum to 1.
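For example (a small sketch with numpy and a made-up 4-state transition matrix, matching the $(0, 0, 1, 0)$ example above), sampling a trajectory means drawing the start state from $\phi$ and then each next state from the current state's row of $\transmatrix$:

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up 4-state row-stochastic transition matrix.
P = np.array([[0.7, 0.3, 0.0, 0.0],
              [0.2, 0.5, 0.3, 0.0],
              [0.0, 0.3, 0.4, 0.3],
              [0.0, 0.0, 0.6, 0.4]])

# Initial distribution: start at state 3 (index 2 when 0-indexed)
# with 100% probability.
phi = np.array([0.0, 0.0, 1.0, 0.0])

# Draw the start state from phi, then each next state from the
# row of P belonging to the current state.
state = rng.choice(4, p=phi)
trajectory = [state]
for _ in range(10):
    state = rng.choice(4, p=P[state])
    trajectory.append(state)

print(trajectory)
```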

If you assign equal probability to all states and design the chain in such a way that each sample must move to another state at each time step, then such a Markov chain is called a simple random walk. If you instead bias the Markov chain so that the probabilities between states aren't equal, and there's a chance that a sample remains at its present state at any given time step, then the Markov chain is called a lazy biased random walk.
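As a sketch (assuming a walk on a cycle of five states, which is just one convenient setting), the two variants differ only in how each row of the transition matrix spreads its probability:

```python
import numpy as np

n = 5  # states arranged in a cycle: 0 - 1 - 2 - 3 - 4 - 0

# Simple random walk: move to either neighbor with equal probability;
# the sample must leave its current state at every time step.
simple = np.zeros((n, n))
for i in range(n):
    simple[i, (i - 1) % n] = 0.5
    simple[i, (i + 1) % n] = 0.5

# Lazy biased random walk: unequal probabilities between states, plus
# a chance of remaining at the present state.
lazy_biased = np.zeros((n, n))
for i in range(n):
    lazy_biased[i, i] = 0.5            # lazy: stay put half the time
    lazy_biased[i, (i + 1) % n] = 0.3  # biased toward one neighbor
    lazy_biased[i, (i - 1) % n] = 0.2

# Both matrices are row stochastic.
assert np.allclose(simple.sum(axis=1), 1)
assert np.allclose(lazy_biased.sum(axis=1), 1)
```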

States of a Markov chain can be classified in a variety of ways, and some classifications depend on where in the chain these states are located (like boundary states). If a state is transient, then there is a nonzero probability that a sample starting at that state will never return to it (the Markov chain will only visit that state finitely many times). If a state is recurrent, then a sample starting there returns to it with probability 1, so the Markov chain is guaranteed to visit it infinitely many times (a Markov chain naturally wants to settle in a recurrent state).

The other three types of states can be considered when looking at the boundary states of your Markov chain (each is made concrete in the sketch after this list):

  1. You can make the state absorbing, where once a sample has landed on this state, it never leaves
  2. You can make the state reflecting, which forces the sample to go back to the state it came from
  3. Or you can make the state semi-reflecting, where there’s a 50% chance the sample stays and a 50% chance the sample goes back to the state it came from (this is a hybrid of the absorbing and reflecting conditions)
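Here is a minimal sketch of those three boundary rules, written as rows of a transition matrix for a walk on a line of five states (the right boundary is left absorbing just to keep the function short):

```python
import numpy as np

n = 5  # states 0..4 on a line; interior states step left/right equally

def line_walk(left_boundary):
    """Build the walk's transition matrix with the chosen rule at state 0.

    left_boundary: 'absorbing', 'reflecting', or 'semi-reflecting'.
    """
    P = np.zeros((n, n))
    for i in range(1, n - 1):
        P[i, i - 1] = 0.5
        P[i, i + 1] = 0.5
    P[n - 1, n - 1] = 1.0      # right boundary: absorbing, for brevity
    if left_boundary == "absorbing":
        P[0, 0] = 1.0          # once landed, the sample never leaves
    elif left_boundary == "reflecting":
        P[0, 1] = 1.0          # forced back to the state it came from
    elif left_boundary == "semi-reflecting":
        P[0, 0] = 0.5          # 50% chance the sample stays...
        P[0, 1] = 0.5          # ...50% chance it is sent back
    return P

print(line_walk("semi-reflecting"))
```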

Possible design issues

When designing a Markov chain, there are a couple issues you can run into. One of these issues is reducibility, which happens when the Markov chain has more than one communicating class (i.e. a maximal set of states such that every pair of states communicate with each other). This results in the communicating classes operating independently of one another, with each recurrent class contributing its own unique stationary distribution (meaning you'll get multiple eigenvectors associated with the eigenvalue of 1). And as your time steps approach infinity, transient states die off, and eventually a sample falls into one of the recurrent classes (a Markov chain is guaranteed to have at least one) and stays there forever.
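To see the symptom numerically, here is a sketch of a reducible chain: a block-diagonal matrix whose two communicating classes never interact, so the eigenvalue 1 shows up twice:

```python
import numpy as np

# Two communicating classes, {0, 1} and {2, 3}, with no transitions
# between them: the chain is reducible.
P = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.4, 0.6, 0.0, 0.0],
              [0.0, 0.0, 0.7, 0.3],
              [0.0, 0.0, 0.2, 0.8]])

eigenvalues, eigenvectors = np.linalg.eig(P.T)
is_one = np.isclose(eigenvalues, 1)

# Two eigenvectors for eigenvalue 1: one stationary distribution
# per recurrent class.
print(is_one.sum())  # 2
```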

The second issue deals with the topic of periodicity. We say a state i has period k ≥ 1 if k is the greatest common divisor of the number of steps in which a chain starting at state i can return to it with positive probability. If k = 1, then the state is aperiodic, whereas if k > 1, then the state is periodic. So this second issue occurs when a Markov chain has a periodic recurrent class, i.e. a recurrent class that has a period ≥ 2 (every recurrent class has a period of at least 1). In such a case, the recurrent class still has a unique stationary distribution, but the Markov chain will oscillate around it. So instead, we can take the average of those oscillations. And if you don't feel like doing that, you can instead fix the root cause of these oscillations by making at least one state lazy (i.e. give the sample a chance to remain at its present state); however, note that adding laziness to a state will slow down convergence of the Markov chain.
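A two-state sketch shows both the oscillation and the lazy fix (the 50/50 laziness here is just one choice):

```python
import numpy as np

# Period-2 chain: the sample swaps states at every time step.
periodic = np.array([[0.0, 1.0],
                     [1.0, 0.0]])

# Lazy version: give each state a 50% chance of staying put,
# which makes the chain aperiodic.
lazy = 0.5 * np.eye(2) + 0.5 * periodic

phi = np.array([1.0, 0.0])
for name, P in [("periodic", periodic), ("lazy", lazy)]:
    dist = phi
    for t in range(4):
        dist = dist @ P
        print(name, t + 1, dist)

# The periodic chain keeps bouncing between (1, 0) and (0, 1);
# averaging those oscillations gives the stationary distribution
# (0.5, 0.5), which the lazy chain converges to directly.
```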

On the topic of periodicity, it might be useful to know how to find the period of a recurrent class. The following are four ways to go about it (the eigenvalue-based approach is sketched after the list):

  • find the number of persistent (non-vanishing, oscillating) eigenvectors and add one to that number to account for the stationary distribution
  • find the period of any state in the recurrent class
  • find the number of eigenvalues whose absolute value equals 1 (for a single recurrent class of period k, these are the k-th roots of unity)
  • find the number of distributions a sample is oscillating through in the long run
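As a sketch of the eigenvalue-counting route (assuming the chain is irreducible, so there is exactly one recurrent class), the period is the number of eigenvalues whose absolute value is 1:

```python
import numpy as np

# An irreducible chain that cycles 0 -> 1 -> 2 -> 0, so its period is 3.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

eigenvalues = np.linalg.eigvals(P)

# For period k, the eigenvalues on the unit circle are the k-th
# roots of unity; counting them recovers the period.
on_unit_circle = np.isclose(np.abs(eigenvalues), 1)
print(on_unit_circle.sum())  # 3
```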

Perron-Frobenius theorem

The Perron-Frobenius Theorem, applied to Markov chains, states that if a transition matrix, $\transmatrix$, is irreducible, aperiodic, and row stochastic, then there exists a unique stationary distribution, $\pi$, such that for every initial distribution, $\phi$,

$$\phi \transmatrix^t \approaches \pi \quad \text{as } t \approaches \infinity$$

The big idea here is that as the number of steps (or time) increases, only eigenvectors associated with eigenvalues of absolute value 1 matter, because the components along eigenvectors whose eigenvalues have absolute value less than 1 decay to 0 as time approaches infinity; we call the eigenvectors associated with these shrinking eigenvalues vanishing eigenvectors, aptly named because they become insignificant over time.
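A quick numerical check (a sketch reusing an irreducible, aperiodic made-up matrix): repeatedly applying $\transmatrix$ to any initial distribution drives it toward $\pi$, because the components along the vanishing eigenvectors shrink geometrically:

```python
import numpy as np

# Irreducible and aperiodic (all entries positive), so the theorem applies.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.4, 0.4]])

dist = np.array([1.0, 0.0, 0.0])  # any initial distribution phi works
for _ in range(50):
    dist = dist @ P

print(dist)  # the unique stationary distribution pi
```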

Remember that the eigenvalue 1 of the transition matrix corresponds to the recurrent classes: it has one (suitably chosen) eigenvector per recurrent class, so it will have multiple eigenvectors associated with it if there are multiple recurrent classes. And the support of each such eigenvector, i.e. the set of states where its entries are positive, corresponds to the states in one recurrent class.

A proof of the Perron-Frobenius theorem and its application to Google's PageRank algorithm can be found in this pdf.