Computational Finance Journal

Wednesday, September 22, 2004

Reading Sources:

Financial aspects:

"Options, Futures, and Other Derivative Securities", John C. Hull.

"The Mathematics of Financial Derivatives: A Student Introduction",
Paul Wilmott, Sam Howison, Jeff Dewynne

Statistical Foundations (courses)

  • Stochastic Calculus for Derivatives

  • Statistical Portfolio Methods and Risk Management

  • Advanced Statistical Inference


Dirichlet priors used by Borodin et al.

On the Dirichlet Prior and Bayesian Regularization:

The Problem: To understand how Bayesian regularization using a Dirichlet prior over the model parameters affects the learned model structure.

Motivation & Previous Work: A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. Regularization is essential when learning from finite data sets. It provides not only smoother estimates of the model parameters compared to maximum likelihood but also guides the selection of model structures. In the Bayesian approach, regularization is achieved by specifying a prior distribution over the parameters and subsequently averaging over the posterior distribution. In domains comprising discrete variables with a multinomial distribution, the Dirichlet distribution is the most commonly used prior over the parameters, for two reasons: first, the Dirichlet distribution is the conjugate prior to the multinomial distribution and hence permits analytical calculations, and second, the Dirichlet prior is intimately tied to the desirable likelihood-equivalence property of network structures [1, 3]. The so-called equivalent sample size measures the strength of the prior belief. In [3], it was pointed out that a very strong prior belief can degrade the predictive accuracy of the learned model due to severe regularization of the parameter estimates. In contrast, the dependence of the learned network structure on the prior strength has not received much attention in the literature, despite its relevance for recovering the true network structure underlying the data.

Approach: Our work focuses on the effects of prior strength on the regularization of the learned network structure; in particular, we consider the class of Bayesian network (belief network) models. Surprisingly, it turns out that a weak prior, in the sense of a small equivalent sample size, leads to a strong regularization of the model structure (a sparse graph) given a sufficiently large data set. In particular, the empty graph is obtained in the limit of vanishing prior strength, independent of any dependencies implied by the (sufficiently large) data set. This is diametrically opposite to what one may expect in this limit, namely the complete graph from an (unregularized) maximum likelihood estimate.

This surprising effect is a consequence of the Dirichlet prior distribution. In the limit of a vanishing prior strength, the Dirichlet prior converges to a discrete distribution over the parameter simplex in the sense that the probability mass concentrates on the corners of the simplex. This is due to the vanishing hyper-parameters of the Dirichlet prior.
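To see this concentration effect concretely, here is a small numerical sketch (our own illustration, not taken from the paper; the simplex dimension and hyper-parameter values are arbitrary) that samples from a symmetric Dirichlet and measures how often essentially all of the probability mass lands on a single corner:

import numpy as np

rng = np.random.default_rng(0)
k = 4  # dimension of the parameter simplex (illustrative choice)

for alpha in (10.0, 1.0, 0.1, 0.01):
    samples = rng.dirichlet([alpha] * k, size=10_000)
    # Fraction of samples whose mass sits almost entirely on one corner.
    near_corner = np.mean(samples.max(axis=1) > 0.99)
    print(f"alpha = {alpha:>5}: P(max component > 0.99) ~ {near_corner:.3f}")

If the claim above holds, the fraction reported for the smallest hyper-parameter should be far larger than for the others, i.e. most samples are (near-)corner points of the simplex.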

In the other extreme case, where the prior strength is very large, a very dense graph structure is typically obtained. Between these two extreme cases, there is a gradual transition from sparser to denser graph structures as the prior strength increases. This implies that regularization of network structure diminishes with a growing prior strength. Surprisingly, this is in the opposite direction to the regularization of parameters, as the latter behaves as expected, i.e., parameter regularization increases with a growing prior strength. Hence, the strength of the prior belief balances the trade-off between regularizing the parameters and the structure of the Bayesian network model.
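As a rough illustration of this transition, here is a sketch (our own construction, not the authors' code) that scores two candidate structures over a pair of weakly dependent binary variables -- the empty graph versus the single edge X -> Y -- using the standard BDeu marginal likelihood at several equivalent sample sizes (ESS); the data-generating process, seed, and ESS values are arbitrary illustrative choices:

import numpy as np
from scipy.special import gammaln

def bdeu_family_score(counts, ess):
    """BDeu log marginal likelihood of one node given its parents.

    counts[j, k] = number of cases with parent configuration j and node state k.
    """
    q, r = counts.shape
    a_j, a_jk = ess / q, ess / (q * r)      # Dirichlet pseudo-counts
    n_j = counts.sum(axis=1)
    return (np.sum(gammaln(a_j) - gammaln(a_j + n_j))
            + np.sum(gammaln(a_jk + counts) - gammaln(a_jk)))

# Weakly dependent synthetic data: Y agrees with X 55% of the time.
rng = np.random.default_rng(0)
n = 500
x = rng.integers(0, 2, size=n)
y = np.where(rng.random(n) < 0.55, x, 1 - x)

counts_x = np.bincount(x, minlength=2)[None, :]               # X, no parents
counts_y = np.bincount(y, minlength=2)[None, :]               # Y, no parents
counts_y_given_x = np.array([[np.sum((x == j) & (y == k))     # Y with parent X
                              for k in (0, 1)] for j in (0, 1)])

for ess in (1e-6, 1e-3, 1.0, 10.0, 100.0):
    empty = bdeu_family_score(counts_x, ess) + bdeu_family_score(counts_y, ess)
    edge = bdeu_family_score(counts_x, ess) + bdeu_family_score(counts_y_given_x, ess)
    print(f"ESS={ess:>7}: log-score(edge) - log-score(empty) = {edge - empty:+.2f}")

Consistent with the discussion above, letting the ESS shrink towards zero drives the score difference towards minus infinity, so the empty graph is eventually preferred regardless of the dependence present in the data; conversely, the preference for the denser structure grows with the prior strength.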

When learning Bayesian networks from data, a careful choice of prior strength is hence necessary in order to achieve a (close to) optimal trade-off. The extreme cases do not provide useful insight into the statistical dependencies among the variables in a domain: the limit of a vanishing prior strength entails that, given a sufficiently large data set, the parameters pertaining to each individual variable are estimated in a maximum-likelihood manner, independently of all the other variables (empty graph); in the other extreme case, while a very strong prior belief can entail the complete graph, the estimated parameters are so severely smoothed that the resulting model predicts a uniform distribution over all the variables in the domain (assuming an uninformative prior over the parameters).

Impact: Our work shows that the prior strength does not determine the degree of regularization of the model as a whole; instead, it determines the trade-off between regularizing the parameters vs. the structure of the model. Not only does this surprising finding enhance the theoretical understanding of Bayesian regularization using a Dirichlet prior, but it also has a major impact on practical applications of learning Bayesian network models in domains with discrete variables: the prior strength has to be chosen with great care in order to achieve an optimal trade-off, enabling one to recover the true network structure underlying the data.

This was derived from http://www.csail.mit.edu/research/abstracts/abstracts03/machine-learning/17steck.pdf

Trying to learn winners

Let's digress to a portfolio selection algorithm. The most direct approach to expert learning and portfolio selection is a "reward-based weighted average prediction" algorithm, which adaptively computes a weighted average of experts by gradually increasing (by multiplicative or additive factors) the relative weights of the more successful experts.

Consider the exponentiated gradient (EG) algorithm of Helmbold et al.:

b_{t+1}(j) = \frac{b_t(j)\,\exp\{\eta\, x_t(j) / (b_t \cdot x_t)\}}{\sum_{k=1}^{m} b_t(k)\,\exp\{\eta\, x_t(k) / (b_t \cdot x_t)\}}

where $\eta$ is a learning-rate parameter that is proportional to $x_{\min}$ and $\sqrt{\log m}$, and inversely proportional to $\sqrt{n}$ (here $m$ is the number of assets and $n$ the number of trading periods).

Setting $\eta = 0$, for instance, reduces the algorithm to the uniform CBAL (constant-rebalanced portfolio) and hence is not universal.
Combining a small learning rate with a "reasonably balanced" market, we expect the performance of EG to be similar to that of the uniform CBAL, and this is confirmed by experiments.
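To make the update concrete, here is a short sketch of one EG step and a toy run (our own code, not the authors' implementation; the function name, the synthetic price relatives, and the choice of $\eta = 0.05$ are ours):

import numpy as np

def eg_update(b_t, x_t, eta):
    """One exponentiated-gradient step: tilt the portfolio towards assets
    that performed well relative to the portfolio's own return."""
    returns_ratio = x_t / np.dot(b_t, x_t)        # x_t(j) / (b_t . x_t)
    weights = b_t * np.exp(eta * returns_ratio)   # multiplicative reweighting
    return weights / weights.sum()                # renormalize onto the simplex

# Toy run: start from the uniform portfolio and update once per period.
m, n = 3, 5                                       # assets and periods (toy sizes)
rng = np.random.default_rng(1)
x = 1.0 + 0.05 * rng.standard_normal((n, m))      # synthetic price relatives
b = np.full(m, 1.0 / m)
for t in range(n):
    b = eg_update(b, x[t], eta=0.05)
print(b)

Note that with $\eta = 0$ the exponential factors are all 1, so the portfolio never moves away from the uniform weights, which is exactly the uniform CBAL case mentioned above.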

First meeting

We discussed the following papers:



The first paper introduces the intuitive random walk model and modifies it to fit market data while retaining the Markovian property. More on this in later posts.
The second paper tests the Markovian property by examining a statistical property common to variants of the generalized Wiener process (and arguably to all Markovian processes), namely that $\ln P_t = \ln P_{t-1} + \mu + \epsilon_t$, where $\mu$ is an arbitrary drift parameter and $\epsilon_t$ is a random disturbance term with $E[\epsilon_t] = 0$ for all $t$, i.e. the disturbance has zero mean.
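As a quick sanity check of what this model looks like, here is a tiny simulation sketch (our own, with arbitrary drift and noise parameters) that generates a log-price path with zero-mean disturbances:

import numpy as np

rng = np.random.default_rng(2)
n_steps, mu, sigma = 250, 0.0005, 0.01                 # illustrative parameters
eps = rng.normal(0.0, sigma, size=n_steps)             # zero-mean disturbances
log_prices = np.log(100.0) + np.cumsum(mu + eps)       # start from P_0 = 100
prices = np.exp(log_prices)
print(prices[:5])

Averaging many such paths would leave the drift $\mu$ as the only systematic component, since the disturbances have zero mean.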

Definitive Post

In this blog we shall document our journey through computational finance.