
**NEW APPROACHES TO STOCHASTIC MODELING OF SPEECH**
Institute for Signal and Information Processing
Department of Electrical, Computer, and Systems Engineering

**ABSTRACT**
Hidden Markov Models and n-gram language modeling have been the dominant approach in continuous speech recognition for almost 15 years. Though successes have been well-documented, fundamental limitations of this paradigm surface at both the acoustic and language modeling ends of the speech recognition problem.

Although acoustic models based on linear statistical assumptions have led to steadily improved performance on speech collected in benign environments, they are still sorely lacking on spontaneous data encountered in the field. Similarly, robust parsing of dialogs and unconstrained man-machine communications is a serious problem for today’s [...]

In this session, we attempt to stimulate a discussion on new approaches in statistical modeling. Researchers from both inside and outside of the speech community are invited to present new perspectives on how complex behavior can be modeled in a parsimonious manner. Our panel discussion will attempt to identify and debate a handful of promising new directions in statistical modeling of speech.

**1. INTRODUCTION**
Currently, the most successful speech recognition systems use detailed models of local context with large numbers of parameters trained in a limited domain. In acoustic modeling, context-dependent hidden Markov models (HMMs) have become a standard approach for handling the variability due to local phonetic context, where local context may be a window of 3-5 phones. In language modeling, trigrams have dominated the field, with major improvements coming from use of higher-order n-grams. For problems where training data has been steadily increasing these models show steady improvement, as demonstrated in the NAB benchmarks where word error rates of less than 10% have been achieved on an open vocabulary task. However, the same technology has provided minimal advances on less constrained tasks, specifically on the spontaneous conversational speech found in the Switchboard corpus, where performance has not moved much beyond 50% word accuracy in the past two to three years. In addition, commercial researchers often observe error rate increases of factors of 3-4 when [...]

Of course, both trigram language models and hidden Markov acoustic models have made tremendous contributions to progress in speech recognition. They clearly established the value of automatic training algorithms and the usefulness of Markov assumptions in simplifying both recognition and training complexity. In addition, the successes provide important evidence that local context (neighboring phonemes in acoustic modeling and neighboring words in language modeling) provides the most important information for speech recognition. Conditioning on local context seems to be an important attribute of a good statistical model. The question raised here is whether sufficient progress can be made simply by increasing the number of parameters in these models. Adaptation certainly helps improve performance, but it is predicated on reasonably accurate baseline performance. Progress in speech recognition is ultimately limited by the sophistication of statistical models, and current technology is unlikely to provide the capability for computers to really converse with humans. New models are needed to better capture variability at a local level and/or to model trends operating at a higher level.

Nonlinear systems theory has been an active area of research in the last twenty years in the field of dynamics. What is new is that it promises to be an active area of engineering in the next twenty years as a new wave of mathematics begins moving from the laboratory to the field. Much as linear system theory provided tools for scientists to analyze classes of problems previously thought too complicated, nonlinear system theory offers hope of providing tools to unlock the mysteries of a wide range of important biological signals such as speech.

Similarly, computational research in language has spanned decades. In the late 1950s, a hierarchy of grammatical formalisms was defined in an attempt to document the complexity of language. As HMMs were introduced in speech recognition, great excitement was generated by the fact that both acoustic models and language models could be represented as state machines. Researchers were quick to see, however, that this was simply the first step in representing the entire speech recognition problem as a formal language theory problem. HMMs were shown to be equivalent to regular grammars, and shown to simply be one step in a progression towards context-sensitive grammars. Today, we find systems routinely implementing context-free grammars and left regular grammars. True context-sensitive grammars, however, have so far been impractical for speech recognition and understanding applications. In addition, experiments in understanding spontaneous speech (e.g. in the ATIS task) have shown that conventional parsing techniques are ill-suited to processing spontaneous spoken language, and various robust parsing algorithms are now being explored.

In sum, research in both acoustic and language modeling currently benefits from the power of context-sensitive statistics, but both are also limited in not moving beyond the local level. Too many parameters are dedicated to the local structure at the expense of capturing global structure. As has been noted, “Knowing the microscopic laws of how things move still leaves us in the dark as to their larger consequences.” One of the attractions of nonlinear systems is the hope of modeling the coarse behavior of a system in which a detailed analysis is not required, a very common problem in statistical mechanics. Similarly, one of the attractions of grammatical language models is the potential for capturing the higher level structure inherent to language.

It is clear that our current formalisms are not adequate for the difficult recognition tasks at hand. Here, we take a look at some new directions that may offer a means of overcoming limitations of existing statistical models.

**2. SESSION OVERVIEW**
This session consists of four invited panelists:

• Simon Haykin, “Chaotic Signal Processing”
• Tony Robinson, “Some New Uses of EM in Acoustic Modeling”
• Ted Briscoe, Cambridge University, “Language Modeling or Statistical Parsing?”
• Fred Jelinek, “A Context-Free Headword Language Model”

The panelists were selected to provide perspectives on two key dimensions of the speech understanding problem: acoustic modeling and language modeling. On each of these topics, we have included a speaker drawn from outside the normal speech research community, and a speaker representing a somewhat more mainstream viewpoint.

The first two talks deal with the issue of nonlinear acoustic modeling. The piecewise constant model has been a staple of digital speech processing since the early 1970s. A multivariate Gaussian model of observation vectors has been employed in hidden Markov model-based speech recognition systems since the early 1980s. Though neural network-based approaches have been researched since the mid-1980s, only recently has the performance of such models rivaled conventional technology.

Simon Haykin suggests a new approach to signal modeling based on chaotic signals. His research is representative of a new body of science devoted to the application of nonlinear dynamics to conventional classification problems. Classification of signals into deterministic and stochastic ignores an important class of signals, known as chaotic signals, that are deterministic by nature yet random in appearance. While direct modeling of the speech signal as output from a nonlinear system has not proven to provide enhancements over conventional analyses, recent research suggests these techniques are applicable to the statistical modeling problem that is the core of the acoustic modeling problem. In his talk, Haykin advocates an architecture that employs neural networks to perform the actual prediction/detection task. This is not unlike many hybrid speech recognition systems that now use a combination of hidden [...]

In a companion talk, Tony Robinson discusses issues in acoustic modeling in the context of connectionist/HMM systems, which use neural networks to estimate posterior distributions for HMM states, and can be thought of as a non-linear extension of current HMM technology. While his general approach is based on non-linear models, the theme of his talk is new applications of one of the most powerful tools behind HMM technology: the expectation-maximization (EM) algorithm. Robinson examines the notion of a hidden component of the process, e.g. the state in an HMM. He explores how this state, which is currently used to capture contextual and temporal phonetic variability, can improve aspects of speech recognition from feature extraction to posterior distribution modeling to channel or outlier (goat) identification.

The subsequent two talks deal with issues related to the language modeling problem, which many people believe will be the source of most of the improvement in speech understanding systems in the next few years. We have seen many attempts at improving recognition through more sophisticated language modeling, from higher-order and variable-order n-grams to the introduction of grammatical structure, but such techniques have generally resulted in modest improvements in performance at the expense of significant increases in complexity. However, new work aimed at combining the advantages of grammatical structure and local context is emerging in various forms of lexicalized grammars. This combined approach may offer the most hope for performance gains, and one theme of the two language modeling talks is lexicalization and its [...]

Ted Briscoe suggests that interpretation of text requires a framework beyond n-gram models, due to a need of such systems to evaluate the relative likelihoods of complex grammatical relationships, which are hierarchical and therefore not easily captured by n-gram models. He advocates the use of statistical parse selection models over stochastic context free grammars, but quickly notes that integration of expectation-maximization (EM) techniques into these formalisms is challenging. Such formalisms do not lend themselves to the treatment of conditional probabilities and maximum likelihood calculations involving ambiguous-in-time candidate partial parses. Research to date in formalisms beyond CFGs has been disappointing. Yet, it is clear that such formalisms are needed to deal with the complex language models required for spontaneous [...]

In a companion talk, Fred Jelinek introduces a lexicalized stochastic context-free language model that takes advantage of a parser to define the phrase structure of a word string. The authors use the notion of a phrase headword to relate non-terminals directly to lexical items, and use the given parse structure to reduce the cost of computing the probability of a word string. The problem of sparse data in parameter estimation is addressed by defining word classes as would be used in a class grammar, but here the classes simply define a smoothing hierarchy. This approach is representative of a growing trend towards lexicalized [...]

**3. DISCUSSION**
Perhaps the most important aspect of this session will be the panel discussion held after the plenary talks. Some of the issues we feel are an outgrowth of the papers presented in [...]

• What will be the impact on the number of parameters required in a speech recognition system based on [...]

If more parameters are required, then the benefits of such an approach might vanish for small training sets. If the number of parameters decreases, perhaps sensitivity to speaker or channel will increase.

• If linear models work well for variations within a phone as long as the time span is sufficiently small, might that argue for the use of non-linear models to represent the [...]

New and improved spectral estimation techniques have often not proven to provide substantial gains in recognition performance, perhaps leading us to believe the hidden component of the acoustic model is the area needing a better statistical model. However, non-linear techniques are also motivated by articulatory models, where non-linearities are usually placed at the output level rather than embedded within the internal structure [...]

• Are the language models presented here only practical in an n-best rescoring framework, or can we envision using them to reduce the search space? Are there better ways to integrate new language models with acoustic modeling to produce more efficient recognition systems?

When recognition performance is poor, or the search space is large, n-best outputs can be quite limiting, forcing one to sift through large numbers of competing hypotheses that look very similar. New language models must find a way to limit the number of hypotheses passed [...]

**REFERENCES**
S. Young, “Large Vocabulary Continuous Speech Recognition: A Review,” presented at the 1995 IEEE Automatic Speech Recognition Workshop, Snowbird, Utah, [...]

D.S. Pallett, *et al.*, “1994 Benchmark Tests for the ARPA Spoken Language Program,” in *Proceedings of the ARPA Spoken Language Systems Technology Workshop*, Austin, Texas, USA, January 1995, pp. 5-36.

F. Jelinek, R. Mercer and S. Roukos, “Principles of Lexical Language Modeling for Speech Recognition,” in *Readings in Speech Recognition*, ed. A. Waibel and K.-F. Lee, Morgan Kaufmann Publishers, 1990.

A summary of recent SWITCHBOARD results is available at the URL: http://cspjhu.ece.jhu.edu.

J.R. Deller, J.G. Proakis, and J.H.L. Hansen, *Discrete Time Processing of Speech Signals*, MacMillan, New [...]

H.O. Peitgen, H. Jurgens, and D. Saupe, *Chaos and Fractals: New Frontiers of Science*, Springer-Verlag, [...]

A.P. Dempster, N.M. Laird and D.B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” *Journal of the Royal Statistical Society*, Vol. 39, [...]

**CHAOTIC SIGNAL PROCESSING**
Traditionally, signals have been classified into two basic types: deterministic and stochastic. This classification ignores an important family of signals known as chaotic signals, which are deterministic by nature and yet exhibit many of the characteristics that are normally associated with stochastic signals.

In this talk, we begin by reviewing some important aspects of nonlinear dynamics. This will lead naturally into a discussion of chaotic systems, how they arise, physical phenomena that are known to be chaotic, and their practical applications.

The second half of the talk will be devoted to the characterization of chaotic signals and the theory of embedology, with emphasis on time series analysis.

Specifically, we will describe the following notions:
• Attractor dimension, and the correlation dimension
• Minimum embedding dimension, and its estimation using the method of false nearest neighbors
• Lyapunov spectrum, and its estimation
• Recursive prediction, and how to implement it using [...]
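
To make the first two notions in the list above concrete, here is a minimal sketch of delay embedding and a correlation-sum estimate of the correlation dimension. It is not taken from the talk: the logistic map used as test signal, the max-norm distance, the two radii, and all function names are assumptions chosen for illustration.

```python
import math


def logistic_series(n, r=4.0, x0=0.3):
    """Generate n samples of the logistic map x -> r*x*(1-x) (illustrative chaotic signal)."""
    samples = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        samples.append(x)
    return samples


def delay_embed(series, dim, lag=1):
    """Build delay vectors (x[t], x[t+lag], ..., x[t+(dim-1)*lag])."""
    count = len(series) - (dim - 1) * lag
    return [tuple(series[t + k * lag] for k in range(dim)) for t in range(count)]


def correlation_sum(vectors, radius):
    """Fraction of distinct vector pairs whose max-norm distance is below radius."""
    n = len(vectors)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if max(abs(a - b) for a, b in zip(vectors[i], vectors[j])) < radius:
                close += 1
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(series, dim, r_small, r_large, lag=1):
    """Two-point slope of log C(r) against log r: a crude dimension estimate."""
    vectors = delay_embed(series, dim, lag)
    c_small = correlation_sum(vectors, r_small)
    c_large = correlation_sum(vectors, r_large)
    return (math.log(c_large) - math.log(c_small)) / (math.log(r_large) - math.log(r_small))


signal = logistic_series(400)
estimate = correlation_dimension(signal, dim=2, r_small=0.01, r_large=0.1)
```

A careful estimate would fit the slope over a range of radii and repeat the calculation for increasing embedding dimensions until the estimate stops changing; the two-point slope here only shows the mechanics.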

**SOME NEW USES OF EM IN ACOUSTIC MODELING**
This talk will raise some problems with the current techniques used in acoustic modeling and suggest some directions for future research. Firstly, the connectionist/HMM system known as ABBOT will be briefly introduced. The talk will then progress to suggest a series of new and largely untested applications of the EM algorithm in acoustic vector and acoustic model estimation for automatic speech recognition. These topics are under investigation at Cambridge University and it is hoped that they will contribute to the ABBOT system in the future.

It is acknowledged that the current acoustic vectors used in speech recognition systems are a poor representation of the speech signal. This is clear from speech coding work whereby a standard LPC coder (e.g. LPC10e) may produce unintelligible output in the case of certain speaking styles or mild background noise.

Drawing from speech coding, we can aim to model the parameters of a source-filter model such as LPC. In such a model the source is Gaussian white noise or an impulse train. In conventional applications the LPC filter parameters of voiced speech are estimated assuming the white noise source, but it has been shown that an application of the EM algorithm can provide a maximum likelihood estimate to both the LPC parameters and the excitation parameters (the [...]

Drawing from speech perception we know that formant locations are important to vowel identity and that formant frequencies are determined by vocal tract length and are speaker dependent. The simplest speaker invariant parameter is a formant ratio. However, we are not close to incorporating this knowledge in current ASR systems as they generally work in the power spectra or cepstral domain. A start in this direction is to estimate the power spectral density as a Gaussian mixture and then to model the [...]

Currently HMMs model acoustic vector densities. We have shown that statistical models of posterior probabilities can also be used. However, the conversion of the posterior probabilities to likelihoods involves some approximations, which means that, as currently implemented, the training algorithm is not an EM algorithm. These approximations aside, we have shown that we can train on posterior probabilities and that this results in better models over the [...]

It is interesting to consider the connectionist architecture within the EM framework. We consider each unit as estimating an indicator variable which has values of “fire” or “not fire”. We can estimate the MAP probability of firing if we know both the input and the output to the network [...] Although this work is currently computationally constrained by an exhaustive search, it does propose approximations applicable to large networks or the use of [...]

A recent improvement has been the modeling of context dependent phones. Here we assume an indicator variable not only for the phone class at a given time but the phone context given the phone class. We have been able to use connectionist models to estimate this variable, which has resulted in improved speech recognition accuracy and speed [...]

Another promising candidate for acoustic modeling is the hierarchical mixture of experts. This is essentially a decision tree with a probability of branching associated with each node. The EM algorithm may be used to reestimate the parameters of the system. There are several practical aspects of this architecture that need to be addressed before it can be applied to large speech tasks. The HME can either be applied as a static pattern classifier and a Markov model used to model the dynamics in much the same way as connectionist/HMM hybrid systems, or the dynamics can be directly incorporated.

Finally, an acknowledged problem in speech recognition is that some speakers are much easier to recognize than others. Out of ten speakers in an unlimited vocabulary read speech evaluation it is not uncommon for the best speaker to have an order of magnitude lower error than the worst speaker. Hence the overall error rate is dominated by a few outliers (the goats). A proposed “EM” solution to this problem is to label every speaker with an indicator variable (sheep or goat) and use the observed recognition rate to estimate the probability of being a goat. By weighting the training set by these probabilities it is expected that the available modeling power can be better used and the expected error rate decreased.

Another viewpoint on the same scenario is that the goatiness factor is determined by the channel conditions.

We consider broadcast speech as a major source of acoustic data for future speech systems. By considering the confidence that the decoded speech came from a clean source rather than a dirty source we hope to filter the unending supply of broadcast speech and train in proportion to the sequential MAP estimate of sheepishness. We hope that this will liberate us from the very significant resources required to construct today’s speech corpora and hence result in significantly better speech systems.
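
The sheep/goat labelling described above can be read as a two-component mixture fitted by EM. The sketch below is one possible reading, not the formulation used in the talk. It assumes each speaker contributes a count of word errors out of a number of scored words, that sheep and goats each have a single binomial error rate, and that the starting values are reasonable; every name and number in it is hypothetical.

```python
import math


def clamp(p, eps=1e-6):
    """Keep a probability strictly inside (0, 1) so the logarithms stay finite."""
    return min(max(p, eps), 1.0 - eps)


def log_binomial(errors, words, rate):
    """Log-likelihood of the error count at a given error rate (constant term dropped)."""
    return errors * math.log(rate) + (words - errors) * math.log(1.0 - rate)


def em_sheep_goat(counts, iterations=50, rate_sheep=0.05, rate_goat=0.30, prior_goat=0.5):
    """counts: list of (errors, words) per speaker.

    Returns (goat posteriors, sheep error rate, goat error rate, goat prior).
    """
    posteriors = []
    for _step in range(iterations):
        # E-step: posterior probability that each speaker is a goat.
        posteriors = []
        for errors, words in counts:
            log_goat = math.log(prior_goat) + log_binomial(errors, words, rate_goat)
            log_sheep = math.log(1.0 - prior_goat) + log_binomial(errors, words, rate_sheep)
            top = max(log_goat, log_sheep)
            w_goat = math.exp(log_goat - top)
            w_sheep = math.exp(log_sheep - top)
            posteriors.append(w_goat / (w_goat + w_sheep))
        # M-step: re-estimate both error rates and the goat prior from soft counts.
        goat_words = sum(g * n for g, (_e, n) in zip(posteriors, counts))
        sheep_words = sum((1.0 - g) * n for g, (_e, n) in zip(posteriors, counts))
        if goat_words > 0:
            goat_errors = sum(g * e for g, (e, _n) in zip(posteriors, counts))
            rate_goat = clamp(goat_errors / goat_words)
        if sheep_words > 0:
            sheep_errors = sum((1.0 - g) * e for g, (e, _n) in zip(posteriors, counts))
            rate_sheep = clamp(sheep_errors / sheep_words)
        prior_goat = clamp(sum(posteriors) / len(posteriors))
    return posteriors, rate_sheep, rate_goat, prior_goat


# Invented data: eight speakers near 5% word error, two far worse.
example_counts = [(50, 1000), (45, 1000), (60, 1000), (55, 1000), (40, 1000),
                  (52, 1000), (48, 1000), (58, 1000), (400, 1000), (350, 1000)]
goat_posteriors, sheep_rate, goat_rate, goat_prior = em_sheep_goat(example_counts)
```

The returned posteriors are the per-speaker goat probabilities that the text proposes to use as training weights; how the weights would enter acoustic model training is not specified in the abstract and is left out here.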

**REFERENCES**
Burshtein 1990: “Joint maximum likelihood estimation of Pitch and AR parameters using the EM algorithm”, pages 797-800, *Proceedings of ICASSP 90*, IEEE.

Zolfaghari and Robinson 1995: Cambridge University Engineering Department Technical Report.

Bourlard and Morgan 1994: *Connectionist Speech Recognition: A Hybrid Approach*, Kluwer Academic Publishers.

Robinson, Hochberg and Renals 1995: “The use of recurrent networks in continuous speech recognition”, chapter 19 in *Automatic Speech and Speaker Recognition - Advanced Topics*, Editors C. H. Lee, K. K. Paliwal and F. K. Soong, Kluwer Academic Publishers.

[...] retraining of recurrent neural networks”, *Advances in Neural Information Processing Systems*, Vol. 8, Morgan Kaufmann.

Cook and Robinson 1995: “Training MLPs via the Estimation-Maximisation algorithm”, *Proceedings of the IEE conference on Artificial Neural Networks*.

Jordan and Jacobs 1994: “Hierarchical mixtures of experts and the EM algorithm”, *Neural Computation*, Vol. 6, pp. 181-214.

Kershaw, Hochberg and Robinson 1995: “Context-Dependent Modeling in the ABBOT LVCSR System,” presented at the 1995 IEEE Automatic Speech Recognition Workshop, Snowbird, Utah, U.S.A., Dec. 1995.

Waterhouse and Robinson 1995: “Pruning and Growing Hierarchical Mixtures of Experts”, *Proceedings of the IEE conference on Artificial Neural Networks*.

**LANGUAGE MODELING OR STATISTICAL PARSING?**
In *language modeling*, a corpus of sentences is treated as a set of observed outputs of an unknown stochastic generation model, and the task is to find the model which maximises the probability of the observations. When the model is non-deterministic and contains hidden states, the probability of a sentence is the sum of the probabilities of each distinct derivation (sequence of hidden states) which could have generated it. For speech recognition, language modeling is an appropriate tool for evaluating the plausibility of different possible continuations (word candidates) for a partially recognized utterance. For speech or text understanding the derivations themselves are crucial to distinguish different interpretations. For example:

(c) ?Charlotte’s father gets played with a lot
(d) (S(NP Charlotte) (VP is (VP (V playing with) [...]
(e) (S(NP Charlotte) (VP(VP is (VP playing)) [...]

In (a), there is an ambiguity between an interpretation in which Charlotte is playing accompanied by her (pet) rabbit and one in which her plaything is a (toy) rabbit. In the latter case *play with* is best analyzed as a phrasal verb and *the rabbit* as a direct object, predicting for example the possibility of passivization in (b).

However, given the former interpretation, *with* is best analyzed as introducing an adverbial prepositional phrase modifier of the verb *play*, predicting the accompaniment interpretation of *with* (as one possibility). The oddity of (c), in which we are forced to interpret Charlotte’s father as a plaything, is then a consequence of the fact that passivization only applies to direct object noun phrases and [...]

From the perspective of speech recognition, the alternative derivations for (a) are not as important as the relative likelihood with which *rabbit* may follow *play with* (though accurate assessment of the likelihood with which a noun denoting a plaything rather than playmate will be followed by *is/was/gets/... played with* might well require both modeling the distinct derivations and passivization).

For interpretation, the relative likelihood of the distinct derivations is crucial. Furthermore, the derivation must encode hierarchical constituency (i.e. bracketing) to be useful. Thus, for (a) we need to know whether the preposition *with* combines with *play* to form a phrasal transitive verb (d) or whether it combines with *her rabbit* to form a prepositional phrase which in turn combines with the intransitive verb *play* (e), because recognition of which verb we are dealing with determines the difference of [...]

N-gram models cannot directly encode such differences of hierarchical organization, which is why most stochastic approaches to text understanding have employed stochastic context-free grammars (SCFGs) or feature/unification-based abbreviations of them. Within this framework it is possible to treat the grammar as a language model and return the most likely derivation as the basis for [...]

However, there are a number of problems with this approach. Firstly, SCFGs and their feature-based variants associate a global probability with each grammar rule rather than a set of conditional probabilities that a given rule will apply in different parse contexts. A number of experiments by different groups have independently confirmed that modeling aspects of the parse context improves the accuracy with which the correct derivation is selected by up to 30%. Most researchers within the text understanding community are now experimenting with statistical parse selection models rather than language models.

Secondly, SCFGs and close variants, unlike n-gram models, are not easily lexicalized, in the sense that the contribution of individual words to the likelihood of a given derivation is not captured. Those models which are lexicalized use word forms rather than word senses to condition the probability of different derivations. It is likely that considerable improvement could be obtained by utilizing broad semantic class information (such as that *rabbit* can denote an animal or toy) to evaluate the plausibility of different predicate-argument combinations.

Selecting a correct derivation is only one aspect of the problem of robust practical parsing for text or speech understanding. Another, bigger problem is ensuring that the grammar covers the (sub)language. Language models offer one [...] expectation-maximisation (EM) techniques can be utilized not only to find the (locally) optimal probabilities for grammar rules but also to find the optimal set of rules (by, for example, removing rules re-estimated to some “floor” threshold probability). However, it is not clear that EM techniques can be coherently utilized in this way with statistical parse selection models which cannot be interpreted as language models.

In the talk I will discuss the issues of parse selection vs. language models, lexicalization of models and integrated approaches to statistical rule induction as well as ranking in further detail.
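
The contrast drawn above between using an SCFG as a language model and using it for parse selection can be sketched with a toy grammar. Under an SCFG the probability of a derivation is the product of its rule probabilities, and the sentence probability is obtained by summing over derivations. The sentence “Charlotte is playing with her rabbit”, the rule set, and every probability below are assumptions invented for illustration; only the two bracketing patterns follow (d) and (e).

```python
from math import prod

# Hypothetical rule probabilities (rules with the same left-hand side sum to 1).
RULE_PROB = {
    "S -> NP VP": 1.0,
    "VP -> is VP": 0.4,
    "VP -> V NP": 0.3,       # phrasal-verb reading, as in bracketing (d)
    "VP -> VP PP": 0.2,      # adverbial-PP reading, as in bracketing (e)
    "VP -> playing": 0.1,
    "V -> playing with": 1.0,
    "PP -> with NP": 1.0,
    "NP -> Charlotte": 0.5,
    "NP -> her rabbit": 0.5,
}

# Each derivation is the list of rules it uses, one entry per rule application.
DERIVATIONS = {
    "d": ["S -> NP VP", "NP -> Charlotte", "VP -> is VP", "VP -> V NP",
          "V -> playing with", "NP -> her rabbit"],
    "e": ["S -> NP VP", "NP -> Charlotte", "VP -> VP PP", "VP -> is VP",
          "VP -> playing", "PP -> with NP", "NP -> her rabbit"],
}


def derivation_probability(rules):
    """Product of the global rule probabilities used in one derivation."""
    return prod(RULE_PROB[rule] for rule in rules)


def sentence_probability(derivations):
    """Language-model view: sum over all derivations of the same word string."""
    return sum(derivation_probability(rules) for rules in derivations.values())


def most_likely_derivation(derivations):
    """Parse-selection view: keep only the single best derivation."""
    return max(derivations, key=lambda name: derivation_probability(derivations[name]))


best = most_likely_derivation(DERIVATIONS)
total = sentence_probability(DERIVATIONS)
```

Because each rule carries one global probability, the ranking of (d) against (e) cannot depend on the words or the parse context, which is the first limitation the abstract raises.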

**A CONTEXT FREE HEADWORD LANGUAGE MODEL**
*Ciprian Chelba, Anna Corazza, and Frederick Jelinek*
Center for Language and Speech Processing
{chelba, corazza, jelinek}@cspjhu.ece.jhu.edu

This is an attempt to base a language model on a context-free lexicalized grammar. A language model must be statistical, and therefore simple enough so its parameters can be estimated from data. Since it should reflect message [...]

The lexical non-terminals will be related directly to the words belonging to a vocabulary *V* and will correspond to the intuitive notion of *headwords*. Headwords are thought to have *inheritance* properties, and so the production rules should basically have the form (*s* denotes the unique sentence non-terminal located at the root of the tree) [...]

In these production rules, *v*(*H*) denotes the unique word *v* which has the *same* name as the headword *H*.

Figure 1 illustrates a possible derivation of the sentence *Her step-mother, kissing her again, seemed charmed*. Note that attached to the root node of the tree is the headword corresponding to the main verb *seemed* of the sentence.

We use the automatic transformational (AT) parser of Brill to parse a large amount of text and thus provide a basis, in conjunction with *headword inheritance rules*¹, for the collection of headword production statistics. The availability of an AT parser solves in principle the problems of scarcity of data and of transportability. Any domain with sufficient text data provides sufficient parse data.

It is interesting to observe that a *bigram* language model can be considered to have the special headword form [...]

Comparing this special form with the general production rules, we see that the general headword language model has essentially the same parameter complexity as the *bigram* language model. A headword language model whose power could be compared to that of the *trigram* language model would have

$$
\begin{aligned}
H \angle LB(H) = F &\to HG \\
H \angle RB(H) = F &\to HG \\
H \angle LB(H) = F &\to GH \\
H \angle LB(H) = F &\to v(H) \\
H \angle RB(H) = F &\to v(H)
\end{aligned}
$$

where the notation $H \angle LB(H) = F \to HG$ means that *F* is the *left brother* of *H* and that *HG* is generated from *H*. Naturally, $RB(H) = F$ means that *F* is the *right brother*.

A language model is a device that provides to the [...] hypothesis about the string of the preceding *k* − 1 words.

1. That is, rules of the type: *the headword of a simple noun phrase is the last noun;* or *the headword of a verb phrase is its first verb*, etc. Such rules have been derived by linguists for use in the IBM statistical direct parser. Headword inheritance rules can also be derived automatically from data by use of information theoretic principles.

Figure 1. An illustration of a possible derivation of the sentence “Her step-mother, kissing her again, seemed charmed.”

For context free grammars these probabilities can be computed by the method of Jelinek and Lafferty. The headword language model is particularly suited to *N best resolution* when we use the hypothesis that the word strings were produced by the process corresponding to the parse specified by the AT parser applied to these strings. The computation of such a probability involves *one* derivation only and is therefore linear with the length of the sentence.

Clearly, a large amount of parsed data is necessary to adequately estimate headword production probabilities. In fact, many more productions of the types $H \to HG$ and $H \to GH$ will have non-zero probabilities than would [...] $P(v(G) \mid v(H))$ [...] $= v(H)$, $w = v(G)$ [...] $H \to HG$ and $H \to GH$ the headwords *H* and [...] may correspond to words that are quite separated in the text [...]

Therefore, we must smooth efficiently the relative frequencies obtained in the collection process. Let $c(H)$ denote an appropriate *class* of the headword *H*. Then the appropriate [...]

$$P(H \to HG) = \lambda\, f(H \to HG) + [\ldots]\,\bigl(c(H) \to c(H)\, c(G)\bigr)$$

We have derived the classes $c(H)$ based on the method of [...]

**REFERENCES**
Henry James: *What Maisie Knew*, Penguin Books, New [...]

Eric Brill: *A Corpus-Based Approach to Language Learning*, Ph.D. dissertation, Department of Computer and Information Science, University of Pennsylvania, [...]

F. Jelinek, et al.: Decision tree parsing using a hidden derivation model, in *Proceedings of the ARPA Spoken Language Systems Technology Workshop*, [...]

F. Jelinek and J. Lafferty: Computation of the probability of initial substring generation by stochastic context free grammars, *Computational Linguistics*, Vol. 17, [...]

P.F. Brown, et al.: Class-based n-gram models of natural language, *Computational Linguistics*, Vol. 18, No. 4, pp. 467-480, December 1992.

Source: http://www.isip.piconepress.com/publications/conference_proceedings/1995/ieee_asrw/stochastic_modeling/session_overview.pdf
