Introduction to Stochastic Models

$$ %---- MACROS FOR SETS ----% \newcommand{\znz}[1]{\mathbb{Z} / #1 \mathbb{Z}} \newcommand{\twoheadrightarrowtail}{\mapsto\mathrel{\mspace{-15mu}}\rightarrow} % popular set names \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\I}{\mathbb{I}} % popular vector space notation \newcommand{\V}{\mathbb{V}} \newcommand{\W}{\mathbb{W}} \newcommand{\B}{\mathbb{B}} \newcommand{\D}{\mathbb{D}} %---- MACROS FOR FUNCTIONS ----% % linear algebra \newcommand{\T}{\mathrm{T}} \renewcommand{\ker}{\mathrm{ker}} \newcommand{\range}{\mathrm{range}} \renewcommand{\span}{\mathrm{span}} \newcommand{\rref}{\mathrm{rref}} \renewcommand{\dim}{\mathrm{dim}} \newcommand{\col}{\mathrm{col}} \newcommand{\nullspace}{\mathrm{null}} \newcommand{\row}{\mathrm{row}} \newcommand{\rank}{\mathrm{rank}} \newcommand{\nullity}{\mathrm{nullity}} \renewcommand{\det}{\mathrm{det}} \newcommand{\proj}{\mathrm{proj}} \renewcommand{\H}{\mathrm{H}} \newcommand{\trace}{\mathrm{trace}} \newcommand{\diag}{\mathrm{diag}} \newcommand{\card}{\mathrm{card}} \newcommand\norm[1]{\left\lVert#1\right\rVert} % differential equations \newcommand{\laplace}[1]{\mathcal{L}\{#1\}} \newcommand{\F}{\mathrm{F}} % misc \newcommand{\sign}{\mathrm{sign}} \newcommand{\softmax}{\mathrm{softmax}} \renewcommand{\th}{\mathrm{th}} \newcommand{\adj}{\mathrm{adj}} \newcommand{\hyp}{\mathrm{hyp}} \renewcommand{\max}{\mathrm{max}} \renewcommand{\min}{\mathrm{min}} \newcommand{\where}{\mathrm{\ where\ }} \newcommand{\abs}[1]{\vert #1 \vert} \newcommand{\bigabs}[1]{\big\vert #1 \big\vert} \newcommand{\biggerabs}[1]{\Bigg\vert #1 \Bigg\vert} \newcommand{\equivalent}{\equiv} \newcommand{\cross}{\times} % statistics \newcommand{\cov}{\mathrm{cov}} \newcommand{\var}{\mathrm{var}} \newcommand{\bias}{\mathrm{bias}} \newcommand{\E}{\mathrm{E}} \newcommand{\prob}{\mathrm{prob}} \newcommand{\unif}{\mathrm{unif}} \newcommand{\invNorm}{\mathrm{invNorm}} \newcommand{\invT}{\mathrm{invT}} % real analysis \renewcommand{\sup}{\mathrm{sup}} \renewcommand{\inf}{\mathrm{inf}} %---- MACROS FOR ALIASES AND REFORMATTING ----% % logic \newcommand{\forevery}{\ \forall\ } \newcommand{\OR}{\lor} \newcommand{\AND}{\land} \newcommand{\then}{\implies} % set theory \newcommand{\impropersubset}{\subseteq} \newcommand{\notimpropersubset}{\nsubseteq} \newcommand{\propersubset}{\subset} \newcommand{\notpropersubset}{\not\subset} \newcommand{\union}{\cup} \newcommand{\Union}[2]{\bigcup\limits_{#1}^{#2}} \newcommand{\intersect}{\cap} \newcommand{\Intersect}[2]{\bigcap\limits_{#1}^{#2}} \newcommand{\intersection}[2]{\bigcap\limits_{#1}^{#2}} \newcommand{\Intersection}[2]{\bigcap\limits_{#1}^{#2}} \newcommand{\closure}{\overline} \newcommand{\compose}{\circ} % linear algebra \newcommand{\subspace}{\le} \newcommand{\angles}[1]{\langle #1 \rangle} \newcommand{\identity}{\mathbb{1}} \newcommand{\orthogonal}{\perp} \renewcommand{\parallel}[1]{#1^{||}} % calculus \newcommand{\integral}[2]{\int\limits_{#1}^{#2}} \newcommand{\limit}[1]{\lim\limits_{#1}} \newcommand{\approaches}{\rightarrow} \renewcommand{\to}{\rightarrow} \newcommand{\convergesto}{\rightarrow} % algebra \newcommand{\summation}[2]{\sum\limits_{#1}^{#2}} \newcommand{\product}[2]{\prod\limits_{#1}^{#2}} \newcommand{\by}{\times} \newcommand{\integral}[2]{\int_{#1}^{#2}} % exists commands \newcommand{\notexist}{\nexists\ } \newcommand{\existsatleastone}{\exists\ } \newcommand{\existsonlyone}{\exists!} \newcommand{\existsunique}{\exists!} 
\let\oldexists\exists \renewcommand{\exists}{\ \oldexists\ } % statistics \newcommand{\distributed}{\sim} \newcommand{\onetoonecorresp}{\sim} \newcommand{\independent}{\perp\!\!\!\perp} \newcommand{\conditionedon}{\ |\ } \newcommand{\given}{\ |\ } \newcommand{\notg}{\ngtr} \newcommand{\yhat}{\hat{y}} \newcommand{\betahat}{\hat{\beta}} \newcommand{\sigmahat}{\hat{\sigma}} \newcommand{\muhat}{\hat{\mu}} \newcommand{\transmatrix}{\mathrm{P}} \renewcommand{\choose}{\binom} % misc \newcommand{\infinity}{\infty} \renewcommand{\bold}{\textbf} \newcommand{\italics}{\textit} $$

Stochastic models have a variety of applications, including Google’s PageRank algorithm and reinforcement learning. These models can be thought of as mathematical processes or systems described by a graph of states connected by transitions. In short, any process or system with an element of randomness to it is considered stochastic.

In mathematics, there are two types of processes:

  1. a deterministic process, where moving to a new state is completely determined by the present and past states (i.e. it’s pre-determined)
  2. and a stochastic process, where moving to a new state is governed by a probability distribution (a quick side-by-side comparison follows this list)
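
To make the contrast concrete, here is one informal way to write the two side by side (the notation below is my own shorthand, not anything defined later in this post), assuming a discrete-time process:

$$
\begin{aligned}
&\text{deterministic:} && x_{n+1} = f(x_n, x_{n-1}, \dots, x_0) \\
&\text{stochastic:} && X_{n+1} \distributed \prob(\,\cdot \conditionedon X_n, X_{n-1}, \dots, X_0)
\end{aligned}
$$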

I really like Noah Burger’s layman’s explanation of stochastic processes, in which he states:

> A stochastic process can be thought of as a description of the movement of an object over time. At every step in time, the object could assume one of many possible states, and each state has a probability associated with it. So while we can’t determine the exact path the object will take, we can make inferences about the path it might take based on those probabilities.

In this series of posts, we will be focusing on Markov processes, a type of stochastic process that is memoryless: its predictions about the future depend only on the present state, not on the sequence of states that preceded it. More specifically, we’ll be looking at Markov chains, Markov processes with a finite or countably infinite state space evolving over discrete time.
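
Written out, the memoryless (Markov) property says that conditioning on the entire history gives the same transition probabilities as conditioning on the present state alone:

$$
\prob(X_{n+1} = j \conditionedon X_n = i,\ X_{n-1} = i_{n-1},\ \dots,\ X_0 = i_0) = \prob(X_{n+1} = j \conditionedon X_n = i)
$$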

Now, in the context of linear algebra, we’re used to writing vectors as columns and multiplying them on the right of a matrix. When dealing with stochastic models, however, it’s more convenient to represent vectors as row vectors and multiply them on the left. This convention pays off later when interpreting transition matrices: the entry $P_{ij}$ can be read as the probability of moving from state $i$ to state $j$.
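
As a quick illustration (a minimal sketch using numpy, with a made-up two-state chain), multiplying a row vector of state probabilities on the left of the transition matrix gives the distribution one step later:

```python
import numpy as np

# Hypothetical two-state transition matrix (row stochastic):
# P[i, j] = probability of moving from state i to state j
P = np.array([
    [0.9, 0.1],   # from state 0: stay with prob 0.9, move to 1 with prob 0.1
    [0.5, 0.5],   # from state 1: move to 0 with prob 0.5, stay with prob 0.5
])

# Row vector of probabilities over states at the current time step
x = np.array([1.0, 0.0])  # start in state 0 with certainty

# Multiplying from the left gives the distribution after one step
x_next = x @ P
print(x_next)  # [0.9 0.1]
```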

Lastly, we can introduce a few basic definitions before jumping in. We say that a square matrix with nonnegative entries is stochastic if each of its columns sums to 1. In most of our examples, however, we’ll be working with a row stochastic matrix, where each row sums to 1 instead.

A Markov chain can be represented by a matrix, typically called a transition matrix, where the entry in row $i$ and column $j$ is the probability of moving from state $i$ to state $j$.

In a Markov chain, we say that two states $i$ and $j$ communicate if there exist $m, n \ge 0$ such that the probability of moving from state $i$ to state $j$ in $m$ steps is greater than 0 and the probability of moving from state $j$ to state $i$ in $n$ steps is also greater than 0. Trivially, every state communicates with itself (take $m = n = 0$).
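
To tie these definitions together, here is a small sketch (the three-state matrix below is invented purely for illustration) that checks whether a matrix is row stochastic and whether two states communicate, using the fact that the $(i, j)$ entry of $P^m$ is the probability of going from state $i$ to state $j$ in $m$ steps:

```python
import numpy as np

def is_row_stochastic(P, tol=1e-9):
    """Check that P has nonnegative entries and each row sums to 1."""
    P = np.asarray(P, dtype=float)
    return bool(np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0, atol=tol))

def communicates(P, i, j, max_steps=None):
    """Check whether states i and j communicate: each is reachable from the
    other with positive probability in some number of steps."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    if max_steps is None:
        max_steps = n  # n steps are enough to detect reachability in an n-state chain
    # Accumulate P^0 + P^1 + ... + P^max_steps; a positive (a, b) entry means
    # state b is reachable from state a in at most max_steps steps.
    reach = np.eye(n)
    Pk = np.eye(n)
    for _ in range(max_steps):
        Pk = Pk @ P
        reach += Pk
    return bool(reach[i, j] > 0 and reach[j, i] > 0)

# Invented three-state transition matrix for illustration
P = np.array([
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.0, 1.0],   # state 2 is absorbing: once entered, it never leaves
])

print(is_row_stochastic(P))   # True
print(communicates(P, 0, 1))  # True: 0 -> 1 and 1 -> 0 both have positive probability
print(communicates(P, 0, 2))  # False: state 2 never returns to state 0
```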