Archived Seminars by Year

2014

December 2, 2014 | noon

TBA

Joel Tetreault, Yahoo! Labs

November 18, 2014 | 12:00 p.m.

TBA

Yann LeCun, Facebook

November 14, 2014 | noon

TBA

Jill Burstein, Educational Testing Service

November 7, 2014 | 12:00 p.m.

“Learning to Generate Understandable Animations of American Sign Language”

Matt Huenerfauth, Rochester Institute of Technology

[abstract] [biography]

Abstract

Standardized testing has revealed that many deaf adults in the U.S. have lower levels of English literacy; therefore, providing American Sign Language (ASL) on websites can make information and services more accessible. Unfortunately, video recordings of human signers are difficult to update when information changes, and there is no way to support just-in-time generation of web content from a query. Software is needed that can automatically synthesize understandable animations of a virtual human performing ASL, based on an easy-to-update script as input. The challenge is for this software to select the details of such animations so that they are linguistically accurate, understandable, and acceptable to users. This software can also serve as the final surface realization component in future ASL generation or translation technologies. This talk will discuss Huenerfauth's research at the intersection of the fields of computer accessibility, human computer interaction, and computational linguistics. His methodology includes: experimental evaluation studies with native ASL signers, motion-capture data collection from signers to collect a corpus of ASL, linguistic annotation of this corpus, statistical modeling techniques, and animation synthesis. In this way, his laboratory has found models that underlie the accurate and natural movements of virtual human characters performing ASL. His recent focus has been on how signers use 3D points in space and how this affects the hand-movements required for ASL verb signs, and in upcoming work, he is investigating technologies for supporting students who are learning ASL.

Speaker Biography

Matt Huenerfauth is an associate professor at The Rochester Institute of Technology (RIT) in the Golisano College of Computer and Information Sciences; his research focuses on the design of computer technology to benefit people who are deaf or have low levels of written-language literacy. He is an editor-in-chief of the ACM Transactions on Accessible Computing, the major journal in the field of computer accessibility for people with disabilities. Since 2007, Huenerfauth has secured over $2.1 million in external research funding to support his work, including a National Science Foundation CAREER Award in 2008. He has authored 40 peer-reviewed scientific journal articles, book chapters, and conference papers, and he has twice received the Best Paper Award at the ACM SIGACCESS Conference on Computers and Accessibility, the major computer science conference on assistive technology for people with disabilities. He served as the general chair for this conference in 2012 and is a member of the steering committee for the conference series. He received his PhD from the University of Pennsylvania in 2006.

October 31, 2014 | 12:00 p.m.

“Graph Grammars and Automata for NLP”

David Chiang, Notre Dame

[abstract] [biography]

Abstract

All natural language processing systems implicitly or explicitly depend on some representation of linguistic objects and some framework for defining (probabilistic) transformations of those representations. Two such representations that have been extraordinarily successful are sequences (and finite-state machines over sequences) and trees (and finite-state machines over trees). For representing linguistic meaning, however, there is currently much interest in going beyond sequences or trees to graphs (for example, abstract meaning representations), and in this talk I will present some recent and current work on graphs, graph automata, and graph grammars. Hyperedge replacement grammar (HRG) is a formalism for generating graphs whose kinship with context-free grammars immediately leads to some important results: a probabilistic version, a synchronous version, and efficient algorithms for searching or summing packed "forests." However, until recently there was no practical algorithm for finding, given an input graph, an HRG derivation of the graph. An algorithm due to Lautemann was known to be polynomial-time for graphs that are connected and of bounded degree. I will describe an optimization of this algorithm and some important implementation details, resulting in an algorithm that is practical for natural-language applications. This is joint work with Kevin Knight and the ISI summer interns of 2013. DAG acceptors are another formalism, much simpler than HRG, which were recently extended by Quernheim and Knight, but still lacked efficient algorithms for searching or summing over the derivations of an input graph, and lacked a well-defined probability model. I will present solutions to both of these problems, and discuss the problems that still remain. This is joint work with Frank Drewes, Dan Gildea, Adam Lopez, Giorgio Satta, and other participants in the JHU summer workshop in 2014.

Speaker Biography

David Chiang (PhD, University of Pennsylvania, 2004) is an associate professor in the Department of Computer Science and Engineering at the University of Notre Dame. His research is on computational models for learning human languages, particularly how to translate from one language to another. His work on applying formal grammars and machine learning to translation has been recognized with two best paper awards (at ACL 2005 and NAACL HLT 2009) and has transformed the field of machine translation. He has received research grants from DARPA, NSF, and Google, has served on the executive board of NAACL and the editorial board of Computational Linguistics and JAIR, and is currently on the editorial board of Transactions of the ACL.

October 28, 2014 | 12:00 p.m.

“Using Tree Structures for Improved Dependency Parsing Algorithms”

Emily Pitler, Google

[abstract] [biography]

Abstract

Dependency parse trees have been found to be particularly useful for machine translation, question answering, and other practical applications. The two most common search spaces are the set of projective trees or the set of all directed spanning trees. The first requires sacrificing coverage of non-projective structures, which commonly occur in natural language; the second introduces significant computational challenges for many scoring models. In this talk we show how milder assumptions about output tree structures can help us to design parsers that can produce the vast majority of natural language structures while at the same time introducing only constant-time overhead over projective parsers. This talk will mostly focus on graph-based non-projective parsing. We introduce 1-Endpoint-Crossing trees; this property covers 95.8% or more of dependency parses across a variety of languages. We then introduce a "crossing-sensitive" generalization of a third-order factorization that trades off complexity in the model structure (i.e., scoring with features over multiple edges) with complexity in the output structure (i.e., producing crossing edges). The third-order 1-Endpoint-Crossing parser has the same asymptotic run-time as the third-order projective parser and is significantly more accurate under many experimental settings and significantly less accurate on none. The same narrative applies to transition-based parsing: by making similar assumptions on the structures of crossing dependencies, we can design a transition-based parser that is sound and complete for a broad class of trees while adding only constant overhead in run-time compared to popular projective systems.
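
For readers unfamiliar with the constraint, here is a minimal sketch (not part of the talk) of what the 1-Endpoint-Crossing property requires: for every edge of the tree, all edges that cross it must share a common vertex. The edge representation and the example tree are invented for illustration.

```python
# Check the 1-Endpoint-Crossing property over edges given as (parent, child)
# pairs of word positions. This is only the property check, not the parser.

def crosses(e1, e2):
    """Two edges cross if exactly one endpoint of e2 lies strictly inside e1's span."""
    (a, b), (c, d) = sorted(e1), sorted(e2)
    inside = sum(a < x < b for x in (c, d))
    return inside == 1 and len({a, b, c, d}) == 4

def is_1_endpoint_crossing(edges):
    for e in edges:
        crossing = [f for f in edges if f is not e and crosses(e, f)]
        if crossing:
            # all edges crossing e must share at least one common endpoint
            common = set(crossing[0])
            for f in crossing[1:]:
                common &= set(f)
            if not common:
                return False
    return True

# Example: edges (1, 4) and (2, 5) cross, and each is crossed only by the other.
print(is_1_endpoint_crossing([(1, 4), (2, 5), (0, 1)]))  # True
```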

Speaker Biography

Emily Pitler is a Research Scientist at Google, where she works on natural language parsing and its applications. She received her Ph.D. in Computer and Information Science from the University of Pennsylvania in 2013. She received her undergraduate degree in Computer Science from Yale University.

October 21, 2014 | 12:00 p.m.

“Improving Access to Clinical Data Locked in Narrative Reports: An Informatics Approach”

Wendy Chapman, University of Utah

[abstract] [biography]

Abstract

What symptoms are associated with the patient's genotype? Did patients treated with medication fare better than patients treated surgically? Which patients are more likely to be readmitted to the hospital? Many of the pressing problems in health care today require access to detailed information locked in narrative reports. Natural language processing offers better access to symptoms, risk factors, diagnoses, and treatment outcomes described in text; however, accurately identifying this information requires a plethora of resources and tools. In this talk I will describe the work our research lab has done to help NLP developers, clinicians and researchers, informaticists, and patients better access the rich data contained in narrative reports.

Speaker Biography

Dr. Chapman earned her Bachelor's degree in Linguistics and her PhD in Medical Informatics from the University of Utah in 2000. From 2000-2010 she was a National Library of Medicine postdoctoral fellow and then a faculty member at the University of Pittsburgh. She joined the Division of Biomedical Informatics at the University of California, San Diego in 2010. In 2013, Dr. Chapman became the chair of the Department of Biomedical Informatics at the University of Utah.

October 14, 2014 | 12:00 p.m.

“Duolingo: Improving Language Education with Data”

Burr Settles, Duolingo

[abstract] [biography]

Abstract

Duolingo is a free online education service that allows people to learn new languages while helping to translate the World Wide Web. Since launching two years ago, Duolingo has grown to more than 40 million students from all over the world, and our mobile apps were awarded top honors from both Apple and Google in 2013. In this talk, I will discuss our architecture and present some examples of how we use a data-driven approach to improve the system, drawing on various disciplines including psychometrics, natural language processing, and machine learning.

Speaker Biography

Burr Settles is a lead scientist and software engineer at Duolingo, an award-winning language education website and mobile app. Previously, he was a postdoc in machine learning at Carnegie Mellon University, and earned a PhD in computer sciences from the University of Wisconsin-Madison. His book "Active Learning", an introduction to learning algorithms that "ask questions", was published by Morgan & Claypool in 2012. In his spare time, Burr gets around by bike and plays guitar in the Pittsburgh pop band Delicious Pastries.

October 7, 2014 | 12:00 p.m.

“Single-Channel Mixed Speech Recognition Using Deep Neural Networks”

Dong Yu, Microsoft Research

[abstract] [biography]

Abstract

While significant progress has been made in improving the noise robustness of speech recognition systems, recognizing speech in the presence of a competing talker remains one of the most challenging unsolved problems in the field. In this talk, I will present our first attempt in attacking this problem using deep neural networks (DNNs). Our approach adopted a multi-style training strategy using artificially mixed speech data. I will discuss the strengths and weaknesses of several different setups that we have investigated including a WFST-based two-talker decoder to work with the trained DNNs. Experiments on the 2006 speech separation and recognition challenge task demonstrate that the proposed DNN-based system has remarkable robustness to the interference of a competing speaker. The best setup of our proposed systems achieves an overall WER of 18.8% which improves upon the results obtained by the state-of-the-art IBM superhuman system by 2.8% absolute, with fewer assumptions.
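
As a rough illustration of the artificially mixed training data mentioned above, the sketch below mixes a target and an interfering utterance at a chosen target-to-masker ratio. The signals and the exact mixing recipe are assumptions for illustration, not the setup used in the reported experiments.

```python
# Mix two utterances at a target signal-to-masker ratio (in dB).
import numpy as np

def mix_at_snr(target, masker, snr_db):
    """Scale `masker` so the target/masker energy ratio equals snr_db, then add."""
    n = min(len(target), len(masker))
    target, masker = target[:n], masker[:n]
    p_t = np.mean(target ** 2)
    p_m = np.mean(masker ** 2) + 1e-12
    scale = np.sqrt(p_t / (p_m * 10 ** (snr_db / 10.0)))
    return target + scale * masker

# e.g. a 0 dB mixture of two random stand-in "utterances"
mixture = mix_at_snr(np.random.randn(16000), np.random.randn(16000), snr_db=0.0)
```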

Speaker Biography

Dr. Dong Yu is a principal researcher at the Microsoft speech and dialog research group. His research interests include speech processing, robust speech recognition, discriminative training, and machine learning. He has published over 130 papers in these areas and is the co-inventor of more than 50 granted/pending patents. His recent work on the context-dependent deep neural network hidden Markov model (CD-DNN-HMM) has been shaping the direction of research on large vocabulary speech recognition and was recognized by the IEEE SPS 2013 best paper award.

October 3, 2014 | 12:00 p.m.

“Computational Language Analyses for Health and Psychological Discovery”

Hansen Andrew Schwartz, University of Pennsylvania

[abstract] [biography]

Abstract

What can language analyses reveal about human health and well-being? I build on computational linguistics, typically focused on a better understanding of language, to better understand people -- their health and psychological characteristics -- as revealed through Facebook status updates, tweets, and other personal discourse. With colleagues from psychology and medicine, we found the language people use, captured in word collocations and latent Dirichlet allocation topics, is highly predictive of personality, gender, age, and depression. Similarly, the language in Tweets from different counties predicts the local life satisfaction, HIV prevalence, and heart disease rates, often more accurately than standard socio-behavioral predictors (e.g. rates of coronary heart disease were predicted above and beyond a combination of demographics, socio-economics, smoking rates, and hypertension rates). Beyond prediction, our language-based analyses yield data-driven insights. For example, language variation by personality is both face valid (e.g. extroverts mention "party", neurotic people mention "depression", and conscientious people talk more about the future) and revealing (e.g. introverts are disproportionately interested in Japanese culture, emotionally stable individuals mention topics associated with an active life, and conscientious individuals don't just talk more about "work" but also about vacations and relaxation).
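
The following is a hedged sketch of the general style of analysis described above -- LDA topic proportions used as predictors of an outcome -- written with scikit-learn. The toy texts and trait scores are invented; the actual studies use far larger corpora and richer feature sets.

```python
# Derive LDA topic proportions from short personal texts and regress a
# (made-up) trait score on them.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import Ridge

texts = ["love going to parties with friends",
         "feeling down and alone again today",
         "planning next week's work and vacation"]
trait_scores = [4.5, 1.8, 3.9]          # hypothetical outcome values

counts = CountVectorizer().fit_transform(texts)
topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)
model = Ridge().fit(topics, trait_scores)  # topic usage -> trait prediction
print(model.predict(topics))
```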

Speaker Biography

Andy Schwartz is a Visiting Assistant Professor in Computer & Information Science at the University of Pennsylvania and he will begin as an Assistant Professor at Stony Brook University (SUNY) in the Fall of 2015. His interdisciplinary research focuses on large and scalable language analyses for health and social sciences. Utilizing natural language processing and machine learning techniques, he seeks to discover new behavioral and psychological factors of health and well-being as manifested through language in social media. He received his PhD in Computer Science from the University of Central Florida in 2011 with research on acquiring lexical semantic knowledge from the Web. His recent work has been featured in The Atlantic and The Washington Post.

September 26, 2014 | 12:00 p.m.

“Opportunities of Social Media in Personal and Societal Well-Being”

Munmun De Choudhury, Georgia Tech

[abstract] [biography]

Abstract

People are increasingly using social media platforms, such as Twitter and Facebook, to share their thoughts and opinions with their contacts. Consequently, there has been a corresponding surge of interest in utilizing continuing streams of evidence from social media on posting activity to reflect on people's psyches and social milieus. In this talk, I will discuss how the ubiquitous use of social media as well as the abundance and growing repository of such data bears potential to provide a new type of "lens" for inferring mental and behavioral health challenges in individuals and populations, namely, postpartum depression, unipolar depression, eating disorders, and post-traumatic stress and anxiety. I will also discuss clinical and design implications, as well as our social and ethical responsibilities around interpretation and automatic inference of health states of people from their online social activities.

Speaker Biography

Munmun De Choudhury is currently an assistant professor at the School of Interactive Computing, Georgia Tech and a faculty associate with the Berkman Center for Internet and Society at Harvard. Munmun's research interests are in computational social science, with a specific focus on reasoning about our health behaviors and wellbeing from social digital footprints. At Georgia Tech, she directs the Social Dynamics and Wellbeing Lab. Munmun is a recipient of the Grace Hopper Scholarship, an IBM Emergent Leaders in Multimedia award, the ACM SIGCHI 2014 best paper award, and ACM SIGCHI honorable mention awards in 2012 and 2013. Previously, Munmun was a postdoctoral researcher at Microsoft Research, and obtained her PhD from Arizona State University in 2011.

September 5, 2014 | 12:00 p.m.

“Robust Automatic Speech Recognition in the 21st Century”

Richard M. Stern, Carnegie Mellon University

[abstract] [biography]

Abstract

Over the past decade, speech recognition technology has become increasingly commonplace in consumer, enterprise, and government applications. As higher expectations and greater demands are being placed on speech recognition as the technology matures, robustness in recognition is becoming increasingly important. This talk will review and discuss several classical and contemporary approaches that render the performance of automatic speech recognition systems and related technology robust to changes and degradations in the acoustical environment within which they operate. While distortions produced by quasi-stationary additive noise and quasi-stationary linear filtering can be largely ameliorated by "classical" techniques such as cepstral high-pass filtering as well as by techniques that develop statistical models of the distortion (such as vector Taylor series expansion), these types of approaches fail to provide much useful improvement when speech is degraded by transient or non-stationary noise such as background music or speech, or in environments that include nonlinear distortion. We describe and compare the effectiveness in difficult acoustical environments of techniques based on missing-feature compensation, combination of complementary streams of information, multiple microphones, physiologically-motivated auditory processing, and specialized techniques directed at compensation for nonlinearities, with a focus on how these techniques are applied to the practical problems facing us today.
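
As one concrete example of the "classical" techniques mentioned above, cepstral mean normalization is a simple form of cepstral high-pass filtering: subtracting the per-utterance mean cepstrum removes the stationary channel component. The sketch below is illustrative only, with a random array standing in for real MFCC features.

```python
# Cepstral mean normalization over one utterance's feature matrix.
import numpy as np

def cepstral_mean_normalize(cepstra):
    """cepstra: (num_frames, num_coeffs) array of cepstral features."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

features = np.random.randn(300, 13)          # stand-in MFCCs for one utterance
normalized = cepstral_mean_normalize(features)
```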

Speaker Biography

Richard M. Stern received the S.B. degree from the Massachusetts Institute of Technology in 1970, the M.S. from the University of California, Berkeley, in 1972, and the Ph.D. from MIT in 1977, all in electrical engineering. He has been on the faculty of Carnegie Mellon University since 1977, where he is currently a Professor in the Department of Electrical and Computer Engineering, the Department of Computer Science, and the Language Technologies Institute, and a Lecturer in the School of Music. Much of Dr. Stern's current research is in spoken language systems, where he is particularly concerned with the development of techniques with which automatic speech recognition can be made more robust with respect to changes in environment and acoustical ambience. In addition to his work in speech recognition, Dr. Stern has worked extensively in psychoacoustics, where he is best known for theoretical work in binaural perception. Dr. Stern is a Fellow of the IEEE, the Acoustical Society of America, and the International Speech Communication Association (ISCA). He was the ISCA 2008-2009 Distinguished Lecturer, a recipient of the Allen Newell Award for Research Excellence in 1992, and he served as the General Chair of Interspeech 2006. He is also a member of the Audio Engineering Society.

July 30, 2014 | 10am-11am

“Grammar Factorization by Tree Decomposition”

Dan Gildea, University of Rochester

[abstract]

Abstract

We describe the application of the graph-theoretic property known as treewidth to the problem of finding efficient parsing algorithms. This method, similar to the junction tree algorithm used in graphical models for machine learning, allows automatic discovery of efficient algorithms such as the O(n^4) algorithm for bilexical grammars of Eisner and Satta (1999). We examine the complexity of applying this method to parsing algorithms for general Linear Context-Free Rewriting Systems (LCFRS). We show that any polynomial-time algorithm for this problem would imply an improved approximation algorithm for the well-studied treewidth problem on general graphs.

July 29, 2014 | 10am-11am

“Deep, Long and Wide Artificial Neural Networks in Processing of Speech”

Hynek Hermansky, Johns Hopkins University

[abstract]

Abstract

Until recently, automatic recognition of speech (ASR) proceeded in a single stream: from a speech signal, through a feature extraction module and a pattern classifier, into a search for the best word sequence. Features were mostly hand-crafted and represented relatively short (10-20 ms) instantaneous snapshots of the speech signal. The introduction of artificial neural nets (ANNs) into speech processing allowed for much more ambitious and more effective schemes. Today's speech features for ASR are derived from large amounts of speech data, often using complex deep neural net architectures. The talk argues for ANNs that are not only deep but also wide (i.e., processing information in multiple parallel processing streams) and long (i.e., extracting information from speech segments much longer than 10-20 ms). Support comes from psychophysics and physiology of speech perception, as well as from speech data itself. The talk reviews the history of the gradual shift towards nonlinear multi-stream extraction of information from the spectral dynamics of speech, and shows some advantages of this approach in ASR.
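
One ingredient of the "long" networks argued for above is feeding the classifier a temporal context far wider than a single 10-20 ms frame. The sketch below splices a window of frames into one input vector per frame; the array shapes and context size are illustrative assumptions, not the architectures discussed in the talk.

```python
# Splice a long temporal context of feature frames (+/- 15 frames, roughly
# 310 ms at a 10 ms frame rate) into a single input vector per frame.
import numpy as np

def splice(frames, context=15):
    """frames: (T, D). Returns (T, (2*context+1)*D) with edge frames repeated."""
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    return np.stack([padded[t:t + 2 * context + 1].ravel()
                     for t in range(len(frames))])

features = np.random.randn(200, 23)     # stand-in critical-band energies
spliced = splice(features)              # shape (200, 31 * 23)
```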

July 28, 2014 | 10am-11am

“Toward more linguistically-informed translation models”

Adam Lopez, Johns Hopkins University

[abstract]

Abstract

Modern translation systems model translation as simple substitution and permutation of word tokens, sometimes informed by syntax. Formally, these models are probabilistic relations on regular or context-free sets, a poor fit for many of the world's languages. Computational linguists have developed more expressive mathematical models of language that exhibit high empirical coverage of annotated language data, correctly predict a variety of important linguistic phenomena in many languages, explicitly model semantics, and can be processed with efficient algorithms. I will discuss some ways in which such models can be used in machine translation, focusing particularly on combinatory categorial grammar (CCG).

July 25, 2014 | 10am-11am

“'About' attitudes”

Kyle Rawlins, Johns Hopkins University

[abstract]

Abstract

A central problem in linguistic semantics is the grammar (lexical semantics, compositional semantics, syntactic behavior) of clause-embedding predicates, such as 'know', 'tell', 'wonder', and 'think'. I present an investigation of this problem through the lens of the interaction with 'about'-phrases. I argue that the best account of this interaction involves the verbs being neo-Davidsonian eventuality predicates that characterize events and states with 'content' (following recent work by Kratzer, Hacquard, and others). Arguments and modifiers of attitude verbs function to characterize the content, leading to a clean separation of syntactic argument structure and event structure; 'about'-phrases in particular indirectly characterize content via a notion of aboutness adapted from work by David Lewis.

July 24, 2014 | 10am-11am

“SEMANTICS Lecture (iVectors)”

Lukas Burget, Brno University of Technology

[abstract]

Abstract

Lukas will relate neural-network (NN) approaches to generating iVectors to the current pursuits of the ASR team: eliminating unknown unknowns.

July 24, 2014 | 2pm-3pm

“Kernels for Relational Learning from Text Pairs”

Alessandro Moschitti, Qatar Computing Research Institute

[abstract] [biography]

Abstract

Linguistic relation learning is a pervasive research area in Natural Language Processing, which ranges from syntactic relations captured by syntactic parsers to semantic relations, e.g., modeled with Semantic Role Labeling, coreference resolution, discourse structure approaches or more directly with systems for relation extraction applied to pairs of entities. Such methods typically target constituents spanning one or multiple sentences. An even more challenging class regards relational learning from pairs of entire (short) texts, which, to be captured, requires the joint analysis of the relations between the different constituents in both texts. Typical examples of such relations are: textual entailment, paraphrasing, correct vs. incorrect association of a question with its target answer passage, correct/incorrect translation between a text and its translation, etc. Given the complexity of providing a theory modeling such relations, researchers rely on machine learning methods. Such models define vectors of features for training relational classifiers, which are based on several textual similarities. The latter are computed using different representations applied to the two texts. This talk will show a different approach to relational learning from text pairs, which is based on structural kernels: first, a structural/linguistic representation of the text is provided, e.g., using syntactic parsing or semantic role labeling. Then, semantic links between the constituents of the two texts are automatically derived, e.g., using string matching or lexical similarity. Finally, the obtained structures are processed by structural kernels, which automatically map them into feature spaces, where learning algorithms can learn the target relation encoded by the data labels. The talk will show results using different representations for passage reranking in question answering systems.

Speaker Biography

Alessandro Moschitti is a Senior Research Scientist at the Qatar Computing Research Institute (QCRI) and a tenured professor at the Computer Science (CS) Department of the University of Trento, Italy. He obtained his PhD in CS from the University of Rome in 2003. He has been the only non-US faculty member to participate in the IBM Watson Jeopardy! challenge. He has significant expertise in both theoretical and applied ML for NLP, IR and Data Mining. He has devised innovative kernels for advanced syntactic/semantic processing with support vector and other kernel-based machines. He is an author or co-author of more than 190 scientific articles in many different areas of NLP, ranging from Semantic Role Labeling to Opinion Mining. He has been an area chair for the semantics track at ACL and IJCNLP conferences and for machine learning track at ACL and ECML. Additionally, he has been PC chair of other important conferences and workshops for the ML and ACL communities. Currently, he is the General Chair of EMNLP 2014 and he is on the editorial board of JAIR, JNLE and JoDS. He has received three IBM Faculty Awards, one Google Faculty Award and three best paper awards.

July 23, 2014 | 10am-11am

“Synchronous Rewriting for Natural Language Processing”

Giorgio Satta, University of Padua

[abstract]

Abstract

In synchronous rewriting two or more rewriting processes, typically context-free, can be carried out in a synchronous way. Synchronous rewriting systems are exploited in machine translation and the syntax/semantics interface, as well as in parsing applications where one needs to model syntactic structures based on discontinuous phrases or on non-projective dependency trees. In this presentation I give an overview of some formalisms using synchronous rewriting, including the linear context-free rewriting systems of Vijay-Shanker, Weir, and Joshi (1987), and discuss several computational problems that arise in the context of the above-mentioned applications.

July 22, 2014 | 10am-11am

“Relating human perceptual data to speech representations through cognitive modeling”

Naomi Feldman, University of Maryland

[abstract] [biography]

Abstract

Abstract of the talk to be announced shortly.

Speaker Biography

Naomi Feldman is an assistant professor in the Department of Linguistics and the Institute of Advanced Computer Studies at the University of Maryland, and a member of the computational linguistics and information processing lab. She works primarily on computational models of human language, using techniques from machine learning and statistics to formalize models of how people learn and process language. She received her Ph.D. in cognitive science at Brown University in 2011.

July 21, 2014 | 10am-11am

“Generation or transfer: this is still the question”

Nianwen Xue, Brandeis University

[abstract]

Abstract

I will discuss what makes AMR abstract, comparing it with syntactic structures as they are annotated in the treebanks. I will also discuss how similar ideas can be implemented in the alignment of parallel parse trees, sharing our experience working on a hierarchically aligned Chinese-English parallel treebank. Finally I will speculate on the relative strengths and weaknesses of these two types of resources and the MT approaches they support. Some references: L. Banarescu, C. Bonial, S. Cai, M. Georgescu, K. Griffitt, U. Hermjakob, K. Knight, P. Koehn, M. Palmer, and N. Schneider. 2013. "Abstract Meaning Representation for Sembanking". Proc. Linguistic Annotation Workshop, 2013. Nianwen Xue, Ondrej Bojar, Jan Hajic, Martha Palmer, Zdenka Uresova and Xiuhong Zhang. 2014. Not an Interlingua, but Close: Comparison of English AMRs to Chinese and Czech. Proceedings of LREC-2014. Reykjavik, Iceland. Dun Deng and Nianwen Xue. 2014 (to appear). Aligning Chinese-English Parallel Parse Trees: Is it Feasible? Proceedings of LAW VIII 2014. Dublin, Ireland. Dun Deng and Nianwen Xue. 2014 (to appear). Building a Hierarchically Aligned Parallel Chinese-English TreeBank. Proceedings of COLING-2014. Dublin, Ireland.
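
For readers new to AMR, the snippet below shows a standard example from the AMR literature ("The boy wants to go") as nested data, to highlight the reentrancy that makes an AMR a graph rather than a tree. It is included only as background, not as material from the talk.

```python
# The boy variable b is the ARG0 of both want-01 and go-01: a reentrancy,
# which is what distinguishes the graph from a syntactic tree.
amr = {
    "w / want-01": {
        ":ARG0": "b / boy",
        ":ARG1": {"g / go-01": {":ARG0": "b"}},   # re-entrant reference to b
    }
}
# PENMAN notation for the same graph:
#   (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))
```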

July 18, 2014 | 10am-11am

“Reconsidering Source Characteristics as Keys to Meaning”

Jordan Cohen, Spelamode Consulting

[abstract] [biography]

Abstract

The source signal in conversational speech carries syntactic, semantic, and emotional information. While the source/filter theory of speech synthesis has a long history, the technology used has failed to separate the source from the filter in a convincing way. New methods in speech morphing promise a cleaner separation between the source and filter using STFT techniques. I will explain the voice morphing system, and demonstrate how to derive the source signal from speech. Details of the source signal itself can be examined for clues to the semantic representation of the resulting utterances.
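
As background on the source/filter idea only (not the STFT-based morphing method described in the talk), the sketch below estimates a vocal-tract filter with LPC and inverse-filters the signal to approximate the source. The file name and filter order are placeholders.

```python
# Classical LPC inverse filtering: the prediction residual approximates the source.
import librosa
import scipy.signal

y, sr = librosa.load("utterance.wav", sr=16000)       # placeholder file name
a = librosa.lpc(y, order=16)                          # all-pole filter estimate, a[0] == 1
source_estimate = scipy.signal.lfilter(a, [1.0], y)   # inverse filtering -> residual
```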

Speaker Biography

Jordan Cohen holds a BSEE from the University of Massachusetts, Amherst (1968), an MSEE from the University of Illinois, Urbana-Champaign (1970), an MS in Linguistics from the University of Connecticut, Storrs (1976), and a PhD in Linguistics from the University of Connecticut (1982). He is currently the Founder and Chief Technologist at Spelamode Consulting, engaged in technical and intellectual property pursuits. He serves as the Co-CTO of Kextil, an emerging field service support company. He was previously the Principal Investigator for GALE at SRI International, and the CTO of Voice Signal Technologies. He was a research staff member at IDA and at IBM Research, and worked at NSA as a research engineer. He served in the USAF from 1970 to 1974. He is engaged in the application of speech recognition for practical systems, and in various aspects of intellectual property pursuit. He is a co-author of 13 US patents.

July 17, 2014 | 10am-11am

“(Dis)similarities between the Tectogrammatical Representation of Meaning and the AMR”

Jan Hajic, Charles University in Prague

[abstract]

Abstract

Tectogrammatical Representation is the meaning representation used in the family of Prague Dependency Treebanks, including the parallel English-Czech dependency treebank based on the WSJ texts used for the Penn Treebank. In the talk, the tectogrammatical representation (TR) and its basic principles will be briefly introduced, and the main features will be demonstrated on examples from a 100-sentence corpus annotated both at the TR level and with AMRs. Differences between the two representations will be highlighted and discussed, together with their possible influence on (meaning-based) MT systems.

July 16, 2014 | 10am-11am

“Applying Physiologically-Motivated Models of Auditory Processing to Automatic Speech Recognition: Promise and Progress”

Richard Stern, Carnegie Mellon University

[abstract] [biography]

Abstract

For many years the human auditory system has been an inspiration for developers of automatic speech recognition systems because of its ability to interpret speech accurately in a wide variety of difficult acoustical environments. This talk will discuss the application of physiologically-motivated and psychophysically-motivated approaches to signal processing that facilitates robust automatic speech recognition. The talk will begin by reviewing selected aspects of auditory processing that are believed to be especially relevant to speech perception, and that had been components of signal processing schemes that were proposed in the 1980s. We will review and discuss the motivation for, and the structure of, classical and contemporary computational models of auditory processing that have been applied to speech recognition, and we will evaluate and compare their impact on improving speech recognition accuracy. We will discuss some of the general observations and results that have been obtained during the renaissance of activity in auditory-based features over the past 15 years. Finally, we will identify certain attributes of auditory processing that we believe to be generally helpful, and share insights that we have gleaned about auditory processing from recent work at Carnegie Mellon.

Speaker Biography

Richard M. Stern received the S.B. degree from the Massachusetts Institute of Technology in 1970, the M.S. from the University of California, Berkeley, in 1972, and the Ph.D. from MIT in 1977, all in electrical engineering. He has been on the faculty of Carnegie Mellon University since 1977, where he is currently a Professor in the Department of Electrical and Computer Engineering, the Department of Computer Science, and the Language Technologies Institute, and a Lecturer in the School of Music. Much of Dr. Stern's current research is in spoken language systems, where he is particularly concerned with the development of techniques with which automatic speech recognition can be made more robust with respect to changes in environment and acoustical ambience. In addition to his work in speech recognition, Dr. Stern has worked extensively in psychoacoustics, where he is best known for theoretical work in binaural perception. Dr. Stern is a Fellow of the IEEE, the Acoustical Society of America, and the International Speech Communication Association (ISCA). He was the ISCA 2008-2009 Distinguished Lecturer, a recipient of the Allen Newell Award for Research Excellence in 1992, and he served as the General Chair of Interspeech 2006. He is also a member of the Audio Engineering Society.

July 15, 2014 | 10am-11am

“Neural Networks in Machine Translation”

David Chiang, University of Southern California

[abstract]

Abstract

Having brought dramatic improvements in speech recognition, neural networks are now beginning to make an impact in machine translation as well. I will give a survey of some recent advances, including our own work at USC/ISI on fast training and decoding with neural language models, work at BBN on neural translation models, and other approaches as well.

July 14, 2014 | 10am-11am

“Manifold Constrained Deep Neural Networks for ASR”

Richard Rose, McGill University

[abstract] [biography]

Abstract

This presentation investigates the application of manifold learning approaches to acoustic modeling in automatic speech recognition (ASR). Acoustic models in ASR are defined over high dimensional feature vectors which can be represented by a graph with nodes corresponding to the feature vectors and weights describing the local relationships between feature vectors. This representation underlies manifold learning approaches which assume that high dimensional feature representations lie on a low dimensional embedded manifold. A manifold based regularization framework is presented for deep neural network (DNN) training of tandem bottle-neck feature extraction networks for ASR. It is argued that this framework has the effect of preserving the underlying low dimensional manifold based relationships that exist among speech feature vectors within the hidden layers of the DNN. This is achieved by imposing manifold based locality preserving constraints on the outputs of the network. The ASR word error rates obtained using these networks are evaluated for speech in noise tasks and compared to those obtained using DNN bottle-neck networks trained without manifold constraints.
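
A minimal numpy sketch of the kind of locality-preserving penalty described above follows: hidden-layer outputs of feature vectors that are close on the graph (large w_ij) are penalized for drifting apart. The affinity construction and array shapes are illustrative assumptions; in the actual work such a term is added to the DNN training objective, whereas here we only compute the penalty for given activations.

```python
# Graph-based locality-preserving penalty over hidden activations.
import numpy as np

def manifold_penalty(hidden, weights):
    """hidden: (N, D) layer outputs; weights: (N, N) graph affinities w_ij."""
    diffs = hidden[:, None, :] - hidden[None, :, :]          # pairwise h_i - h_j
    return 0.5 * np.sum(weights * np.sum(diffs ** 2, axis=-1))

h = np.random.randn(8, 4)                                     # toy activations
w = np.exp(-np.square(np.linalg.norm(h[:, None] - h[None, :], axis=-1)))  # toy affinities
print(manifold_penalty(h, w))
```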

Speaker Biography

Richard Rose received B.S. and M.S. degrees from the Electrical and Computer Engineering Department at the University of Illinois, and obtained a Ph.D. degree in electrical engineering from what is now known as the Center for Signal and Image Processing (CSIP) at the Georgia Institute of Technology in 1988, with a thesis in speech coding and speech analysis. From 1980 to 1984, he was with Bell Laboratories, now a division of Lucent Technologies, working on signal processing and digital switching systems. From 1988 to 1992, he was a member of the Speech Systems and Technology group, now called the Information Systems Technology Group, at MIT Lincoln Laboratory, working on speech recognition and speaker recognition. He was with AT&T from 1992 to 2003, after 1996 in the Speech and Image Processing Services Laboratory at AT&T Labs - Research in Florham Park, NJ. Currently, he is an associate professor of Electrical and Computer Engineering at McGill University in Montreal, Quebec. An IEEE Fellow, he served as a member of the IEEE Signal Processing Society Technical Committee on Digital Signal Processing from 1990 to 1995, and was on the organizing committee of the 1990 and 1992 DSP workshops. He has also served as an adjunct faculty member of the Georgia Institute of Technology, was elected an at-large member of the Board of Governors of the Signal Processing Society for the period from 1995 to 1997, and served as membership coordinator during that time. Prof. Rose also spent the spring of 1996 at the Furui Laboratory at NTT in Tokyo; Sadaoki Furui now has a laboratory at the Tokyo Institute of Technology. He served as an associate editor for the IEEE Transactions on Speech and Audio Processing from 1997 to 1999, served as a member of the IEEE SPS Speech Technical Committee (STC) and was the founding editor of the STC Newsletter from 2002 through 2005, and has also served as an associate editor of the IEEE Transactions on Audio, Speech, and Language Processing.

July 11, 2014 | 11am-12pm

“Designing Abstract Meaning Representations for Machine Translation”

Martha Palmer, University of Colorado

[abstract]

Abstract

Abstract Meaning Representations (AMRs) are rooted, directional and labeled graphs that abstract away from morpho-syntactic idiosyncrasies such as word category (verbs and nouns), word order, and function words (determiners, some prepositions). They also make explicit many implicit semantic and pragmatic relations. Because syntactic idiosyncrasies and different choices about what to make explicit account for many cross-lingual differences, it is worth exploring whether this representation can serve as a useful, minimally divergent transfer layer in machine translation. This talk will present some of the challenges of semantic representations and discuss the contributions AMRs provide over and above other current representation schemes.

July 11, 2014 | 3:30pm-4:30pm

“Common Sense and Language”

Benjamin van Durme, Johns Hopkins University

[abstract]

Abstract

It is widely assumed that natural language understanding requires a significant amount of general world knowledge, or 'common sense'. I will first review various expressions of this claim, and define common sense (Common Sense for Language). I then will describe two approaches to automatically acquiring this knowledge, Common Sense from Language, either from the generalization over multiple situational descriptions, or in the direct interpretation of generic sentences. I will claim that both lead to the same roadblock: we can acquire common sense in the form of generic-like statements, but standard text corpora on their own do not easily, explicitly relay the underlying quantifier domain restrictions, nor quantifier strengths, that are required for full generic interpretation. Moving from a 'possible' to a 'probable' interpretation of generics is then the major obstacle in acquiring general world knowledge for NLU (if we wish to rely exclusively on text-based acquisition).

July 10, 2014 | 11am-12pm

“Perceptual Semantics and Coordination in Dialogue”

Staffan Larsson, University of Gothenburg

[abstract]

Abstract

A formal semantics for low-level perceptual aspects of meaning is presented, tying these together with the logical-inferential aspects of meaning traditionally studied in formal semantics. The key idea is to model perceptual meanings as classifiers of perceptual input. Furthermore, we show how perceptual aspects of meaning can be updated as a result of observing language use in interaction, thereby enabling fine-grained semantic plasticity and semantic coordination. This requires a framework where intensions are (1) represented independently of extensions, and (2) structured objects which can be modified as a result of learning. We use Type Theory with Records (TTR), a formal semantics framework which starts from the idea that information and meaning are founded on our ability to perceive and classify the world, i.e., to perceive objects and situations as being of types. As an example of our approach, we show how a simple classifier of spatial information based on the Perceptron can be cast in TTR. Time permitting, we will also outline preliminary accounts of compositionality and vagueness of perceptual meanings, the latter using probabilistic TTR.
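
In the spirit of the Perceptron example mentioned above, here is a toy classifier for one spatial meaning ("left of") trained on invented relative-position features. It is a sketch of the general idea only, not the TTR formulation used in the talk.

```python
# A bare-bones perceptron deciding whether object A is "left of" object B
# from relative coordinates; data and features are invented for illustration.
import numpy as np

def train_perceptron(X, y, epochs=20):
    w = np.zeros(X.shape[1] + 1)
    Xb = np.hstack([X, np.ones((len(X), 1))])          # add bias input
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):                      # y in {-1, +1}
            if yi * np.dot(w, xi) <= 0:
                w += yi * xi                           # perceptron update
    return w

X = np.array([[-2.0, 0.0], [-0.5, 0.1], [1.0, 0.0], [2.5, -0.2]])  # (xA - xB, yA - yB)
y = np.array([1, 1, -1, -1])                                       # "left of" or not
w = train_perceptron(X, y)
print(np.sign(np.hstack([[-1.0, 0.0], [1]]) @ w))   # classify a new relative position
```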

July 10, 2014 | 3:30pm-4:30pm

“The State of the Art in Semantic Parsing”

Percy Liang, Stanford University

[abstract]

Abstract

Semantic parsing, the task of mapping utterances to semantic representations (e.g. logical forms), has its roots in the early natural language understanding systems of the 1960s. These rule-based systems were representationally sophisticated, but brittle, and thus fell out of favor as the statistical revolution swept NLP. Since the late 1990s, however, there has been a resurgence of interest in semantic parsing from the statistical perspective, where the representations are logical but the learning is not. Most recently, there are efforts to learn these logical forms automatically from denotations, a much more realistic but also challenging setting. The learning perspective has not only led to practical large-scale semantic parsers, but, interestingly, also has implications for the semantic representations.

July 9, 2014 | 11am-12pm

“Bayesian Pragmatics”

Dan Lassiter, Stanford University

[abstract]

Abstract

An influential body of work in recent cognitive science makes use of structured Bayesian models to understand diverse cognitive activities in which uncertainty plays a critical role, such as reasoning, vision, learning, social cognition, and syntactic processing. Similar problems arise in semantics and pragmatics, where ambiguity, vagueness, and context-sensitivity are commonplace, and rich pragmatic interpretations are massively underdetermined by the linguistic signal. In recent work Noah Goodman and I have developed a framework combining structured Bayesian models with compositional model-theoretic semantics, incorporating insights from Gricean and game-theoretic pragmatics. We have argued that this gives new insight into how and why context and background knowledge influence interpretation. I will describe the approach at a high level and consider three test cases: (1) predictions about the effects of background expectations on ambiguity resolution, (2) how the model derives graded implicatures as probabilistic inferences about speaker intentions, and (3) how context influences the information conveyed by vague and context-sensitive language.
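
The numpy sketch below implements a toy Rational-Speech-Acts-style chain (literal listener, speaker, pragmatic listener) of the general kind associated with this line of work; the two-word, two-world lexicon and the rationality parameter are invented for illustration and are not the models presented in the talk.

```python
# Toy pragmatic-reasoning chain over a two-word, two-world lexicon.
import numpy as np

lexicon = np.array([[1.0, 1.0],    # "glasses" is true of world1 and world2
                    [0.0, 1.0]])   # "hat" is true only of world2
prior = np.array([0.5, 0.5])
alpha = 1.0                        # speaker rationality

L0 = lexicon * prior
L0 = L0 / L0.sum(axis=1, keepdims=True)             # literal listener P(world | word)
S1 = np.exp(alpha * np.log(L0.T + 1e-12))
S1 = S1 / S1.sum(axis=1, keepdims=True)             # speaker P(word | world)
L1 = S1.T * prior
L1 = L1 / L1.sum(axis=1, keepdims=True)             # pragmatic listener P(world | word)
print(L1)   # hearing "glasses" now favors world1, an implicature-like strengthening
```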

July 9, 2014 | 3:30pm-4:30pm

“The Problem of Reference”

David McAllester, Toyota Technological Institute at Chicago

[abstract]

Abstract

This talk will present an approach to the semantics of natural language focusing on the problem of reference. Phrases of natural language refer to things in the world such as "Obama", "Air Force One", "the disappeared Malaysian airliner" or "the annexation of Crimea". Sampled sentences will be used to argue that resolving reference is essential to any treatment of semantics. The discussion of natural language semantics will include both philosophical considerations, such as the notion of "a thing in the world" and the problem of grounding, as well as concrete engineering problems such as achieving good performance on the CoNLL 2012 shared task coreference evaluation. A new grammar formalism for modeling reference --- entity grammars --- will also be presented.

July 8, 2014 | 11am-12pm

“A Rich Probabilistic Type Theory for the Semantics of Natural Language”

Shalom Lappin, King's College London

[abstract]

Abstract

In a classical semantic theory meaning is defined in terms of truth conditions. The meaning of a sentence is built up compositionally through a sequence of functions from the semantic values of constituent expressions to the value of the expression formed from these syntactic elements (Montague, 1974). In this framework the type system is categorical. A type T identifies a set of possible denotations for expressions from the values of its constituents. There are at least two problems with this framework. First, it cannot represent the gradience of semantic properties that is pervasive in speakers' judgements concerning truth, predication, and meaning relations. Second, it offers no account of semantic learning. It is not clear how a reasonable account of semantic learning could be constructed on the basis of the categorical type systems that a classical semantic theory assumes. Such a system does not appear to be efficiently learnable from the primary linguistic data (with weak learning biases), nor is there much psychological data to suggest that it expresses biologically determined constraints on semantic learning. A semantic theory that assigns probability rather than truth conditions to sentences is in a better position to deal with both of these issues. Gradience is intrinsic to the theory by virtue of the fact that speakers assign values to declarative sentences in the continuum of real numbers [0,1], rather than Boolean values in {0,1}. Moreover, a probabilistic account of semantic learning is facilitated if the target of learning is a probabilistic representation of meaning. We consider two strategies for constructing a probabilistic semantics. One is a top-down approach where one sustains classical (categorical) type and model theories, and then specifies a function that assigns probability values to the possible worlds that the model provides. The probability value of a sentence relative to a model M is the sum of the probabilities of the worlds in which it is true. The other is a bottom-up approach where one defines a probabilistic type theory and characterizes the probability value of an Austinian proposition relative to a set of situation types (Cooper 2012). This proposition is the output of the function that applies to the probabilistic semantic type judgements associated with the syntactic constituents of the proposition.
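
A tiny sketch of the top-down option described above, with invented worlds and truth conditions: the probability of a sentence is the sum of the probabilities of the worlds in which it is true.

```python
# Sum world probabilities over the worlds where the sentence holds.
worlds = {"w1": 0.2, "w2": 0.5, "w3": 0.3}                 # P(world)
true_in = {"It is raining": {"w1", "w3"}}                  # worlds where the sentence is true

def probability(sentence):
    return sum(p for w, p in worlds.items() if w in true_in[sentence])

print(probability("It is raining"))   # 0.2 + 0.3 = 0.5
```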

July 8, 2014 | 3:30pm-4:30pm

“Semantics, Science, and 10-year Olds”

Oren Etzioni, University of Washington, Allen Institute

[abstract]

Abstract

The Allen Institute for AI (AI2) is building software that aims to achieve proficiency on standardized science and math tests. My talk will introduce AI2 and its research methodology, and describe a series of semantic challenges and associated data sets that we are sharing with the community. Unlike other speakers, I'm offering problems, not solutions.

July 7, 2014 | 3:30pm-4:30pm

“Distinguishing "possible" from "probable" meaning shifts: How distributions impact linguistic theory”

James Pustejovsky, Brandeis University

[abstract]

Abstract

In this talk, I discuss the changing role of data in modeling natural language, as captured in linguistic theories. The generative tradition of introducing data using only "evaluation procedures", rather than "discovery procedures", promoted by Chomsky in the 1950s, is slowly being unraveled by the exploitation of significant language datasets that were unthinkable in the 1960s. Evaluation procedures focus on possible generative devices in language without constraints from actual (probable) occurrences of the constructions. After showing how both procedures are natural to scientific inquiry, I describe the natural tension between data and the theory that aims to model it, with specific reference to the nature of the lexicon and semantic selection. The seeming chaos of organic data inevitably violates our theoretical assumptions. But in the end, it is restrictions apparent in the data that call for postulating structure within a revised theoretical model.

May 2, 2014 | 12:00 p.m.

“Latent Semantic Analysis and Concept Inventories: Discovering and Classifying Student Misconceptions in STEM Education”   Video Available

Isidoros Doxas, BAE Systems

[abstract] [biography]

Abstract

Latent Semantic Analysis (LSA) is a vector-based bag-of-words model originally intended for use in information retrieval systems, which has found use in a wide range of pure and applied settings, from providing feedback to pilots on landing technique to diagnosing mental disorders from prose. Education applications of LSA include selecting instructional materials for individual students, grading student essays, improving student reading comprehension, and facilitating Concept Inventory construction. Concept Inventories are multiple choice instruments that can provide researchers with a map of students' conceptual understanding of a field. They are extensively used in all areas of Science, Technology, Engineering and Mathematics (STEM) education, but their construction is significantly hampered by the need to discover students' particular misconceptions in each field. I will give a short introduction to LSA and to the use and construction of Concept Inventories, and I will describe how we use LSA to discover and classify student misconceptions, accelerating the construction and validation of these expensive instruments. Work performed at the University of Colorado, Boulder. Current Address: BAE Systems, Columbia, MD.
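
For readers unfamiliar with LSA, the sketch below shows its core step -- a bag-of-words term-document matrix reduced by truncated SVD, after which responses can be compared by cosine similarity in the latent space. The toy student responses and the scikit-learn pipeline are illustrative assumptions, not the instruments described in the talk.

```python
# Minimal LSA: count matrix -> truncated SVD -> pairwise similarity.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = ["force equals mass times acceleration",
        "heavier objects fall faster than light ones",   # a common misconception
        "acceleration is the same for all falling objects"]

X = CountVectorizer().fit_transform(docs)
latent = TruncatedSVD(n_components=2).fit_transform(X)
print(cosine_similarity(latent))      # pairwise similarity of the responses
```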

Speaker Biography

Isidoros Doxas received a B.Sc. in Physics from Queen Mary College, University of London in 1981, an M.A. in Physics from Columbia University in New York in 1983, and a Ph.D. in Plasma Physics from the University of Texas at Austin in 1988. He spent 19 years at the University of Colorado, Boulder, first at the Astrophysical, Planetary and Atmospheric Sciences Department, and then at the Center for Integrated Plasma Studies, where he was a Fellow. He moved to BAE Systems in 2008. He has worked on plasma turbulence, nonlinear dynamics and chaos in fusion devices and space plasmas. His work is mainly analytical and numerical, and he has developed or co-developed novel numerical techniques for high performance computing in turbulence, forecasting, and detector technology. He has worked on science education since 1996. Since 2001 he has worked on intelligent tutoring systems and machine learning, which has led to his current interest in the geometry and dynamics of discourse and of people's visual experience.

April 29, 2014 | 12:00 p.m.

“Summarizing the Patient Record at the Point of Patient Care”

Noémie Elhadad, Columbia University

[abstract] [biography]

Abstract

The Electronic Health Record (EHR) comes laden with an ambitious array of promises: at the point of patient care, it will improve quality of documentation, reduce cost of care, and promote patient safety; in parallel, as more and more data are collected about patients, the EHR gives rise to exciting opportunities for mining patient characteristics and holds out the hope of compiling comprehensive phenotypic information. Leveraging the information present in the EHR is not a trivial step, however, especially when it comes to the information conveyed in clinical notes. In this talk I will focus on one of the challenges faced by the EHR and its users: information overload. With ever-growing longitudinal health records, it is difficult for physicians to keep track of what is salient to their information needs when treating individual patients. As for text mining purposes, it is not clear that more data is always better. I will report and discuss our ongoing efforts in generating patient record summaries for clinicians.

Speaker Biography

Noemie Elhadad is an assistant professor in Biomedical Informatics at Columbia University. Her research is in informatics, natural language processing, and data mining. She investigates ways in which large, unstructured clinical datasets (e.g., patient records) and health consumer datasets (e.g., online health communities) can be processed automatically to enhance access to relevant information for clinicians, patients and health researchers alike.

April 25, 2014 | 12:00 p.m.

“Beyond Left-to-Right: Multiple Decomposition Structures for SMT”

Kristina Toutanova, Microsoft Research

[abstract] [biography]

Abstract

Standard phrase-based translation models do not explicitly model context dependence between translation units. As a result, they rely on large phrase pairs and target language models to recover contextual effects in translation. In this work, we explore n-gram models over Minimal Translation Units (MTUs) to explicitly capture contextual dependencies across phrase boundaries. We examine the independence assumptions entailed by the direction of the n-gram decomposition order, and explore multiple static alternatives to the standard left-to-right decomposition. Additionally, we implement and test a dynamic bidirectional decomposition order, in which each translation unit can select its most predictive context. The resulting models are evaluated in an intrinsic task of lexical selection for MT as well as in a full MT system, through n-best reranking. These experiments demonstrate that additional contextual modeling does indeed benefit a phrase-based system and that the direction of conditioning is important. Integrating multiple conditioning orders provides consistent benefit, and the most important directions differ by language pair. Joint work with Hui Zhang, Chris Quirk, and Jianfeng Gao.
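
To make the modeling idea concrete (this is an illustrative toy, not the features or training of the work above), the sketch below scores one sequence of minimal translation units under a bigram model in two static decomposition orders, left-to-right and right-to-left; the probabilities are invented stand-ins for estimates from word-aligned data.

    # Toy bigram model over Minimal Translation Units (MTUs), scored under two
    # static decomposition orders. All probabilities are made up for illustration.
    import math

    # Each MTU is a (source phrase, target phrase) pair.
    mtus = [("je", "I"), ("veux", "want"), ("partir", "to leave")]

    # Hypothetical bigram probabilities P(unit | previous unit) for each direction.
    p_l2r = {
        (None, ("je", "I")): 0.4,
        (("je", "I"), ("veux", "want")): 0.5,
        (("veux", "want"), ("partir", "to leave")): 0.3,
    }
    p_r2l = {
        (None, ("partir", "to leave")): 0.2,
        (("partir", "to leave"), ("veux", "want")): 0.6,
        (("veux", "want"), ("je", "I")): 0.7,
    }

    def score(units, probs):
        """Sum of log P(u_i | u_{i-1}) along the given decomposition order."""
        logp, prev = 0.0, None
        for u in units:
            logp += math.log(probs.get((prev, u), 1e-6))  # floor for unseen bigrams
            prev = u
        return logp

    print("left-to-right :", score(mtus, p_l2r))
    print("right-to-left :", score(list(reversed(mtus)), p_r2l))
    # A dynamic bidirectional model would let each unit pick its more predictive context.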

Speaker Biography

Kristina Toutanova is a researcher at Microsoft Research, Redmond and an affiliate assistant professor at the University of Washington. She obtained her Ph.D. from the Computer Science Department at Stanford University with Christopher Manning. She has been active in research on modeling the structure of natural language using machine learning, especially in the areas of machine translation, syntactic and semantic parsing, and morphological analysis. She is a Program Co-chair for ACL 2014, a member of the Computational Linguistics editorial board as well as an action editor for TACL.

April 15, 2014 | 12:00 p.m.

“Finding Information in Disfluencies”

Mari Ostendorf, University of Washington

[abstract] [biography]

Abstract

One of the characteristics of spontaneous speech that distinguishes it from written text is the presence of disfluencies, including filled pauses (um, uh), repetitions, and self corrections. In spoken language processing applications, disfluencies are typically thought of as "noise" in the speech signal. However, there are several systematic patterns associated with where disfluencies occur that can be leveraged to automatically detect them and to improve natural language processing. Further, rates of different types of disfluencies appear to depend on multiple levels of speech production planning and to vary depending on the individual speaker and the social context. Thus, detecting different disfluency types provides additional information about spoken interactions -- beyond the literal meaning of the words. In this talk, we describe both computational models for multi-domain disfluency detection and analyses of different corpora that provide some insights into inter- and intra-speaker variability in both high-stakes and casual contexts.

Speaker Biography

Mari Ostendorf is a Professor of Electrical Engineering at the University of Washington. After receiving her PhD in electrical engineering from Stanford University, she worked at BBN Laboratories, then Boston University, and then joined the University of Washington (UW) in 1999. At UW, she is an Endowed Professor of System Design Methodologies in Electrical Engineering and an Adjunct Professor in Computer Science and Engineering and in Linguistics. From 2010-2012, she served as the Associate Dean for Research and Graduate Studies in the College of Engineering. She has previously been a visiting researcher at the ATR Interpreting Telecommunications Laboratory and at the University of Karlsruhe, a Scottish Informatics and Computer Science Alliance Distinguished Visiting Fellow, and an Australia-America Fulbright Scholar at Macquarie University. Prof. Ostendorf's research interests are in dynamic and linguistically-motivated statistical models for speech and language processing. Her work has resulted in over 200 publications and 2 paper awards. Prof. Ostendorf has served as co-Editor of Computer Speech and Language, as the Editor-in-Chief of the IEEE Transactions on Audio, Speech and Language Processing, and she is currently the VP Publications for the IEEE Signal Processing Society. She is also a member of the ISCA Advisory Council. She is a Fellow of IEEE and ISCA, a recipient of the 2010 IEEE HP Harriett B. Rigas Award, and a 2013-2014 IEEE Signal Processing Society Distinguished Lecturer.

April 8, 2014 | 12:00 p.m.

“Scalable Topic Models and Applications to Machine Translation”

Ke Zhai, University of Maryland, College Park

[abstract] [biography]

Abstract

Topic models are powerful tools for statistical analysis in text processing. Despite their success, application to large datasets is hampered by the difficulty of scaling inference to large parameter spaces. In this talk, we describe two ways to speed up topic models: parallelization and streaming. We propose a scalable and flexible implementation using variational inference on MapReduce. We further demonstrate two extensions of this model: using informed priors to incorporate word correlations, and extracting topics from a multilingual corpus. An alternative approach to achieving scalability is streaming, where the algorithm sees a small part of the data at a time and updates the model gradually. Although many streaming algorithms have been proposed for topic models, they all overlook a fundamental but challenging problem: the vocabulary is constantly evolving over time. We propose an online topic model with an infinite vocabulary, which addresses this missing piece, and show that our algorithm is able to discover new words and refine topics on the fly. In addition, we examine how topic models are helpful in acquiring domain knowledge and improving machine translation.

Speaker Biography

Ke Zhai is a PhD candidate in the Department of Computer Science at the University of Maryland, College Park, working with Prof. Jordan Boyd-Graber. He is expected to receive his PhD degree in Fall 2014. He works in the areas of Machine Learning and Natural Language Processing, with an additional focus on scalability and cloud computing. He has also worked on several research projects applying probabilistic Bayesian models to image processing and dialogue modelling. He has open-sourced several libraries, including Mr. LDA, a package for large-scale topic modeling that has been adopted in both research and industry.

April 4, 2014 | 12:00 p.m.

“Speech Morphing - A Signal Processing Solution”

Jordan Cohen

[abstract] [biography]

Abstract

Speech Morphing (or Voice Morphing) is changing one person's voice to sound like another person, or like something else. There are many issues to be dealt with, including the size and shape of the vocal apparatus, the pitch of each speaker's voice, the particular habits of the two speakers, the accents of the two people, and other linguistic elements. This talk focuses only on the first of these issues: changing the size and shape of the vocal tract and the pitch of the apparent talker. Since the advent of vocoding, and its promise to separate the source and filter of the speech signal from each other, this manipulation has been an unfulfilled promise. Speech synthesizers have likewise failed to solve these issues. Jordan will discuss a modern signal processing solution which promises to rectify the difficulties of past approaches and is practical to compute.

Speaker Biography

Jordan Cohen is the technologist at Spelamode, a consulting firm specializing in speech and language technology and in the intellectual property of these and associated fields. He is the co-CTO of Kextil, and the Chief Scientist of Speech Morphing, Inc. Jordan was the Principal Investigator for GALE at SRI, served as the CTO of Voice Signal, was the Director of Business Relations at Dragon, and was on the technical staffs of IDA, IBM, and NSA. He has a PhD in Linguistics from the University of Connecticut, and an MS in Electrical Engineering from the University of Illinois. He currently lives three blocks from the ocean in Kure Beach, North Carolina.

March 25, 2014 | 12:00 p.m.

“Towards Large-Scale Natural Language Inference with Distributional Semantics”   Video Available

Jackie CK Cheung, University of Toronto

[abstract] [biography]

Abstract

Language understanding and semantic inference are crucial for solving complex natural language applications, from intelligent personal assistants to automatic summarization systems. However, current systems often require hand-coded information about the domain of interest, an approach that will not scale up to the large array of possible domains and topics in text collections today. In this talk, I demonstrate the potential of distributional semantics (DS), an approach to modeling meaning by using the contexts in which a word or phrase appears, to assist in acquiring domain knowledge and to support the desired inference. I present a method that integrates phrasal DS representations into a probabilistic model in order to learn about the important events and slots in a domain, resulting in state-of-the-art performance on template induction and multi-document summarization for systems that do not rely on hand-coded domain knowledge. I also propose to evaluate DS representations by their ability to support inference, the hallmark of any semantic formalism. These results demonstrate the utility of DS for current natural language applications, and provide a principled framework for measuring progress towards automated inference in any domain.
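
As background, a bare-bones version of the distributional hypothesis fits in a few lines of Python: represent each word by its co-occurrence counts within a context window and compare words by cosine similarity. The sketch below is a generic illustration on a toy corpus, not the phrasal DS representations or the probabilistic template-induction model described above.

    # Minimal distributional semantics: window-based co-occurrence vectors + cosine.
    # Toy corpus for illustration only.
    from collections import defaultdict
    import math

    corpus = "the bomb exploded in the market . the device detonated near the market".split()
    window = 2

    vectors = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(corpus):
        for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
            if i != j:
                vectors[w][corpus[j]] += 1

    def cosine(u, v):
        dot = sum(u[k] * v[k] for k in set(u) & set(v))
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    # Words that occur in similar contexts receive similar vectors.
    print(cosine(vectors["exploded"], vectors["detonated"]))
    print(cosine(vectors["exploded"], vectors["market"]))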

Speaker Biography

Jackie CK Cheung is a PhD candidate at the University of Toronto. His research interests span several areas of natural language processing, including computational semantics, automatic summarization, and natural language generation. His work is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), as well as a Facebook Fellowship.

March 11, 2014 | 12:00 p.m.

“Deep Learning of Generative Models”   Video Available

Yoshua Bengio, University of Montreal

[abstract] [biography]

Abstract

Deep learning has been highly successful in recent years mostly thanks to progress in algorithms for training deep but supervised feedforward neural networks. These deep neural networks have become the state-of-the-art in speech recognition, object recognition, and object detection. What's next for deep learning? We argue that progress in unsupervised deep learning algorithms is a key to progress on a number of fronts, such as better generalization to new classes from only one or few labeled examples, domain adaptation, transfer learning, etc. It would also be key to extend the output spaces from simple classification tasks to structured outputs, e.g., for machine translation or speech synthesis. This talk discusses some of the challenges involved in unsupervised learning of models with latent variables for AI tasks, in particular the difficulties due to the partition function, mixing between modes, and the potentially huge number of real or spurious modes. The manifold view of deep learning and experimental results suggest that many of these challenges could be greatly reduced by performing the hard work in the learned, higher-level, more abstract spaces discovered by deep learning, rather than in the space of visible variables. Further gains are sought by exploiting the idea behind GSNs (Generative Stochastic Networks) and denoising auto-encoders: learning a Markov chain operator that generates the desired distribution rather than parametrizing that distribution directly. The advantage is that each step of the Markov chain transition involves fewer modes, i.e., a partition function that can be more easily approximated.

Speaker Biography

Yoshua Bengio (CS PhD, McGill University, 1991) was a post-doc with Michael Jordan at MIT and worked at AT&T Bell Labs before becoming a professor at the University of Montreal. He has written two books and around 200 papers, the most cited being in the areas of deep learning, recurrent neural networks, probabilistic learning, NLP and manifold learning. Among the most cited Canadian computer scientists and one of the scientists responsible for reviving neural networks research with deep learning in 2006, he has served on the editorial boards of top ML journals and on the board of the NIPS foundation, holds a Canada Research Chair and an NSERC chair, is a Fellow of CIFAR and has been program/general chair for NIPS. He is driven by his quest for AI through machine learning, involving fundamental questions on learning of deep representations, the geometry of generalization in high dimensions, manifold learning, biologically inspired learning, and challenging applications of ML. As of February 2014, Google Scholar finds almost 16,000 citations to his work, yielding an h-index of 55.

February 25, 2014 | 12:00 p.m.

“Opportunities from Social Media Data for Public Health”   Video Available

Mark Dredze, Johns Hopkins Human Language Technology Center of Excellence

[abstract] [biography]

Abstract

Twitter and other social media sites contain a wealth of information about populations and have been used to track sentiment towards products, measure political attitudes, and study social linguistics. In this talk, we investigate the potential for Twitter and social media to impact public health research. Broadly, we explore a range of applications for which social media may hold relevant data, including disease surveillance, public safety, and drug usage patterns. To uncover them, we develop new statistical models that can reveal trends and patterns of interest to public health from vast quantities of data. Our results suggest that social media has broad applicability for public health research.

Speaker Biography

Mark Dredze is an Assistant Research Professor in Computer Science at Johns Hopkins University and a research scientist at the Human Language Technology Center of Excellence. He is also affiliated with the Center for Language and Speech Processing and the Center for Population Health Information Technology. His research in natural language processing and machine learning has focused on graphical models, semi-supervised learning, information extraction, large-scale learning, and speech processing. His recent work includes health information applications, such as information extraction from social media and from biomedical and clinical texts. He obtained his PhD from the University of Pennsylvania in 2009.

February 18, 2014 | 12:00 p.m.

“Modeling Consonant and Vowel Perception in Human Listeners”

Naomi Feldman, University of Maryland

[abstract] [biography]

Abstract

Cross-linguistic differences in speech perception arise early in development and persist into adulthood. These differences are typically attributed to listeners' early experience with their native language, but the precise connection between phonetic knowledge and perceptual abilities has remained unclear. In this talk I describe a quantitative framework for connecting listeners' phonetic knowledge with behavior that can be measured in perceptual discrimination tasks. The framework is shown to account for a number of results from the speech perception literature and to provide a unified account of both strong categorical effects in consonants and weak categorical effects in vowels. I conclude by outlining a novel method, currently under development, that can enable us to predict listeners' perceptual patterns based on the data found in natural speech corpora.

Speaker Biography

Naomi Feldman is an assistant professor in the Department of Linguistics at the University of Maryland and a member of the computational linguistics and information processing lab. She works primarily on computational models of human language, using techniques from machine learning and statistics to formalize models of how people learn and process language. She received her Ph.D. in cognitive science at Brown University in 2011.

February 11, 2014 | 12:00 p.m.

“Hearing Without Listening: Voice Authentication with Privacy”   Video Available

Bhiksha Raj, Carnegie Mellon University

[abstract] [biography]

Abstract

Speech processing systems require access to recordings of the speaker's voice. A person's voice carries information about their gender, nationality, etc., all of which become accessible to the system, which could abuse this knowledge. In this talk we discuss the issue of privacy in speech processing systems, and approaches that may be employed to address it. Specifically, we will consider voice authentication systems. A user's voice prints may be stolen, or used without authorization to detect the user's voice in unintended scenarios. They may even be used to impersonate the user elsewhere. In order to avoid this, the system must not possess an interpretable voice print for the user, and yet be able to authenticate them. We will discuss how this can be achieved through cryptographic methods, the limitations of these solutions, and an alternate approach based on a modified version of locality sensitive hashing known as secure binary embeddings that may actually enable a practical solution.
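
The flavor of the hashing step can be illustrated with ordinary random-projection binary hashing: project a voice-derived feature vector onto random directions and keep only the signs, so that similar voices yield similar bit strings while the bits are hard to invert back to the original features. The secure binary embeddings discussed in the talk modify this scheme (adding quantization with secret randomness) so that even less is revealed; the sketch below shows only the plain sign-based variant, with made-up feature dimensions.

    # Random-projection binary hashing: nearby vectors -> similar bit strings.
    # A toy stand-in for the embeddings discussed above; dimensions are arbitrary.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n_bits = 64, 32
    A = rng.standard_normal((n_bits, d))      # shared random projection matrix

    def binary_hash(x):
        return (A @ x > 0).astype(np.uint8)   # keep only the signs

    def hamming(a, b):
        return int(np.count_nonzero(a != b))

    enrolled = rng.standard_normal(d)                   # "voice print" feature vector
    same_user = enrolled + 0.1 * rng.standard_normal(d)
    impostor = rng.standard_normal(d)

    print("same user :", hamming(binary_hash(enrolled), binary_hash(same_user)))
    print("impostor  :", hamming(binary_hash(enrolled), binary_hash(impostor)))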

Speaker Biography

Bhiksha Raj is an associate professor in Carnegie Mellon University's Language Technologies Institute, with additional affiliations to the Electrical and Computer Engineering, Machine Learning and Music Technology departments. Dr. Raj's research interests include automatic speech recognition, audio processing, machine learning, and privacy. In particular, he worries about the casualness with which people give away samples of their voice without concern about its implications. Some of his latest research includes investigations into whether it can be made possible for people to use voice-based systems without giving away their voice.

February 7, 2014 | 12:00 p.m.

“Get Cooking with Words!: Mining Actionable Information from User Generated Content”

Bo Pang, Google

[abstract] [biography]

Abstract

People turn to the Web to seek answers to a wide variety of questions, a significant portion of which are "how to" questions. For a given "how to" question, rather than one single canonical set of instructions that satisfies everyone, there can be variations catering to the different needs of different people. Indeed, there are a growing number of popular websites where users submit and review instructions as varied as building a table and baking a pie. In addition to providing their subjective evaluation, reviewers often provide actionable refinements. These refinements clarify, correct, improve, or provide alternatives to the original instructions. However, identifying and reading all relevant reviews is a daunting task for a user. In this work, we propose a generative model that jointly identifies user-proposed refinements in instruction reviews at multiple granularities, and aligns them to the appropriate steps in the original instructions. We view this as the first step towards addressing the more general task of identifying actionable information from unrestricted sources and helping users consume such information in a way that best suits their personal needs.

Speaker Biography

Bo Pang is a research scientist at Google. She obtained her PhD in Computer Science from Cornell University in 2006. Her primary research interests are in natural language processing. Her past work includes sentiment analysis and opinion mining, paraphrasing, query log analysis, bridging structured and unstructured data, personalized text consumption, and computational advertising.

January 28, 2014 | 12:00 p.m.

“Natural Language Processing for Health”   Video Available

Guergana Savova, Harvard

[abstract] [biography]

Abstract

There is an abundance of health-related free text that can be used for a variety of immediate biomedical applications: phenotyping for Genome-Wide Association Studies (GWAS), clinical point of care, patient-powered applications, and biomedical research. The presentation will cover current research problems in Natural Language Processing (NLP) relevant to health applications, such as event and temporal expression discovery and the linking of events to create timelines of patients' clinical histories. Applications of NLP to biomedical problems will be discussed within the framework of national networks such as Electronic Medical Records and Genomics (eMERGE), the Pharmacogenomics Research Network (PGRN), Informatics for Integrating Biology and the Bedside (i2b2), and the Patient-Centered Outcomes Research Institute (PCORI).

Speaker Biography

Dr. Guergana Savova is an Assistant Professor at Harvard Medical School and Boston Children's Hospital. Her research interests are in natural language processing (NLP), especially as applied to the text generated by physicians (the clinical narrative). This is usually referred to as clinical NLP. She has been creating gold standard annotated resources based on computable definitions and developing methods for computable solutions. The focus of Dr. Savova's research is higher-level semantic and discourse processing of the clinical narrative, which includes tasks such as named entity recognition, event recognition, and relation detection and classification, including coreference and temporal relations. The methods are mostly machine learning, spanning supervised, lightly supervised and completely unsupervised approaches. Dr. Savova's research with her collaborators has led to the creation of the clinical Text Analysis and Knowledge Extraction System (cTAKES; ctakes.apache.org). cTAKES is an information extraction system comprising a number of NLP components. cTAKES has been applied in a number of biomedical use cases, such as i2b2, PGRN and eMERGE, to mine the data within the clinical narrative. Within Informatics for Integrating Biology and the Bedside (i2b2), cTAKES has been used to extract patient characteristics for determining their status related to a specific phenotype (Multiple Sclerosis, Inflammatory Bowel Disease, Type 2 Diabetes). Within the Pharmacogenomics Research Network (PGRN), cTAKES has been applied to automatically determine patients' disease activity and to detect responders versus non-responders to a specific treatment. Within the Electronic Medical Records and Genomics (eMERGE) network, cTAKES has been applied to automatically discover patients with Peripheral Arterial Disease.


2013

December 6, 2013 | 12:00 p.m.

“Aren't You Tired of Gradable Adjectives? They are Fascinating: Automatically Deriving Adjectival Scales”   Video Available

Marie-Catherine de Marneffe, Ohio State University

[abstract] [biography]

Abstract

In this talk, I will discuss how to automatically derive the orderings and meanings of gradable adjectives (such as okay < good < great < wonderful). To determine whether the intended answer is "yes" or "no" in a dialogue such as "Was the movie wonderful? It was worth seeing", we need to evaluate how "worth seeing" relates to "wonderful". Can we automatically learn from real texts the scalar orderings people assign to these modifiers? I will show how we can exploit the availability of large amounts of text on the web (such as online review ratings) to approximate these orderings. Then I will turn to neural network language models. I will show that continuous space word representations extracted from such models can be used to derive adjectival scales of high quality, emphasizing that neural network language models do capture semantic regularities. I evaluate the quality of the adjectival scales on several datasets. Next, I will briefly turn to biomedical data: what does it mean to show "severe symptoms of cardiac disease" or "mild pulmonary symptoms"? I will outline work in progress targeting the meaning of gradable adjectives in that domain. Not only do we want to get an ordering between such adjectives, but we also want to learn what counts as "severe" or "mild" symptoms of a disease.
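
One simple baseline for deriving such an ordering from review text is to attach to each adjective the star ratings of the reviews in which it occurs and to sort adjectives by their mean rating. The sketch below illustrates that baseline on invented data; it is a hypothetical simplification, not the method evaluated in the work above.

    # Order gradable adjectives by the mean star rating of reviews containing them.
    # Ratings and review texts are invented for illustration.
    from collections import defaultdict

    reviews = [
        (2, "the plot was okay but forgettable"),
        (4, "a good film with great performances"),
        (5, "wonderful, simply wonderful"),
        (3, "good in places, okay overall"),
        (5, "a great and wonderful experience"),
    ]
    adjectives = {"okay", "good", "great", "wonderful"}

    ratings = defaultdict(list)
    for stars, text in reviews:
        for token in text.replace(",", " ").split():
            if token in adjectives:
                ratings[token].append(stars)

    scale = sorted(adjectives, key=lambda a: sum(ratings[a]) / len(ratings[a]))
    print(" < ".join(scale))   # e.g. okay < good < great < wonderful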

Speaker Biography

Marie-Catherine de Marneffe is an assistant professor in Linguistics at The Ohio State University. She received her PhD from Stanford University in December 2012 under the supervision of Christopher D. Manning. She is developing computational linguistic methods that capture what is conveyed by speakers beyond the literal meaning of the words they say. Primarily she wants to ground meanings in corpus data, and show how such meanings can drive pragmatic inference. She has also worked on Recognizing Textual Entailment and contributed to defining the Stanford Dependencies representation, which is designed to be a practical representation of grammatical relations and predicate argument structure.

November 19, 2013 | 12:00 p.m.

“Pursuit of Low-dimensional Structures in High-dimensional Data”   Video Available

Yi Ma, Microsoft Research

[abstract] [biography]

Abstract

In this talk, we will discuss a new class of models and techniques that can effectively model and extract rich low-dimensional structures in high-dimensional data such as images and videos, despite nonlinear transformation, gross corruption, or severely compressed measurements. This work leverages recent advancements in convex optimization for recovering low-rank or sparse signals that provide both strong theoretical guarantees and efficient and scalable algorithms for solving such high-dimensional combinatorial problems. These results and tools actually generalize to a large family of low-complexity structures whose associated (convex) regularizers are decomposable. We illustrate how these new mathematical models and tools could bring disruptive changes to solutions to many challenging tasks in computer vision, image processing, and pattern recognition. We will also illustrate some emerging applications of these tools to other data types such as web documents, image tags, microarray data, audio/music analysis, and graphical models. This is joint work with John Wright of Columbia, Emmanuel Candes of Stanford, Zhouchen Lin of Peking University, and my students Zhengdong Zhang, Xiao Liang of Tsinghua University, Arvind Ganesh, Zihan Zhou, Kerui Min and Hossein Mobahi of UIUC.
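
The convex programs behind these low-rank and sparse models are commonly solved with two simple proximal operators: entrywise soft-thresholding for the l1 (sparse) term and singular-value thresholding for the nuclear-norm (low-rank) term. The sketch below implements just these two building blocks and applies them to a synthetic low-rank-plus-sparse matrix; it is a generic illustration, not any of the specific algorithms from the talk.

    # The two shrinkage operators at the core of sparse + low-rank recovery.
    import numpy as np

    def soft_threshold(X, tau):
        """Proximal operator of tau * ||X||_1: shrink every entry toward zero."""
        return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

    def singular_value_threshold(X, tau):
        """Proximal operator of tau * ||X||_*: shrink the singular values."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    # Synthetic data: a rank-5 matrix plus sparse gross corruption (sizes are arbitrary).
    rng = np.random.default_rng(0)
    L = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 50))
    S = np.where(rng.random((50, 50)) < 0.05, 10.0, 0.0)
    M = L + S

    print(np.linalg.matrix_rank(singular_value_threshold(M, tau=5.0)))  # rank after shrinkage
    print(int(np.count_nonzero(soft_threshold(M - L, tau=1.0))))        # surviving large entries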

Speaker Biography

Yi Ma has been a Principal Researcher and the Research Manager of the Visual Computing group at Microsoft Research Asia in Beijing since January 2009. Before that he was a professor in the Electrical & Computer Engineering Department of the University of Illinois at Urbana-Champaign. His main research interests are in computer vision, high-dimensional data analysis, and systems theory. He is the first author of the popular vision textbook "An Invitation to 3-D Vision," published by Springer in 2003. Yi Ma received his Bachelor's degree in Automation and Applied Mathematics from Tsinghua University (Beijing, China) in 1995, a Master of Science degree in EECS in 1997, a Master of Arts degree in Mathematics in 2000, and a PhD degree in EECS in 2000, all from the University of California at Berkeley. Yi Ma received the David Marr Best Paper Prize at the International Conference on Computer Vision 1999, the Longuet-Higgins Best Paper Prize at the European Conference on Computer Vision 2004, and the Sang Uk Lee Best Student Paper Award with his students at the Asian Conference on Computer Vision in 2009. He also received the CAREER Award from the National Science Foundation in 2004 and the Young Investigator Award from the Office of Naval Research in 2005. He was an associate editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) from 2007 to 2011. He is currently an associate editor of the International Journal of Computer Vision (IJCV), the IMA Journal on Information and Inference, the SIAM Journal on Imaging Sciences, and the IEEE Transactions on Information Theory. He has served as the chief guest editor for special issues of the Proceedings of the IEEE and the IEEE Signal Processing Magazine. He will also serve as Program Chair for ICCV 2013 and General Chair for ICCV 2015. He is a Fellow of IEEE.

November 15, 2013 | 12:00 pm

“Scalable Training for Machine Translation Made Successful for the First Time”   Video Available

Liang Huang, CUNY

[abstract] [biography]

Abstract

While large-scale discriminative training has triumphed in many NLP problems, its definite success on machine translation has been largely elusive. Most recent efforts along this line are not scalable: they only train on the small dev set with an impoverished set of rather “dense” features. We instead present a very simple yet theoretically motivated approach by extending my recent framework of “violation-fixing perceptron” to the latent variable setting, and use forced decoding to compute the target derivations. Our method allows structured learning to scale, for the first time, to a large portion of the training data, which enables a rich set of sparse, lexicalized, and non-local features. Extensive experiments show very significant gains in BLEU (by at least +2.0) over MERT and PRO baselines with the help of over 20M sparse features.

Speaker Biography

Liang Huang is currently an Assistant Professor at the City University of New York (CUNY). He graduated in 2008 from Penn and has worked as a Research Scientist at Google and a Research Assistant Professor at USC/ISI. His work is mainly on the theoretical aspects (algorithms and formalisms) of computational linguistics, and related theoretical problems in machine learning. He has received a Best Paper Award at ACL 2008, several best paper nominations (ACL 2007, EMNLP 2008, and ACL 2010), two Google Faculty Research Awards (2010 and 2013), and a University Graduate Teaching Prize at Penn (2005).

November 12, 2013 | 12:00 p.m.

“Bayesian Models for Social Interactions”   Video Available

Katherine Heller, Duke University

[abstract] [biography]

Abstract

A fundamental part of understanding human behavior is understanding social interactions between people. We would like to be able to make better predictions about social behavior so that we can improve people's social interactions or somehow make them more beneficial. This is very relevant in light of the fact that an increasing number of interactions are happening in online environments which we design, but is also useful for offline interactions such as structuring interactions in the workplace, or even being able to advise people about their individual health based on who they've come into contact with. I will focus on two recent projects. In the first, we use nonparametric Bayesian methods to predict group structure in social networks based on the social interactions of individuals over time, using actual events (emails, conversations, etc.) instead of declared relationships (e.g., Facebook friends). The time series of events is modeled using Hawkes processes, while relational grouping is done via the Infinite Relational Model. In the second, we use Graph-Coupled Hidden Markov Models to predict the spread of infection in a college dormitory. This is done by looking at a social network of students living in the dorm, and leveraging mobile phone data which reports on students' locations and daily health symptoms.
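
For reference, the conditional intensity of a univariate Hawkes process with an exponential kernel is lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i)): every past event temporarily raises the rate of future events, which is what makes the model a natural fit for bursty reply and conversation dynamics. The snippet below simply evaluates that intensity for a toy event sequence; it is not the grouped, relational model described in the talk, and all parameter values are arbitrary.

    # Conditional intensity of a univariate Hawkes process with exponential kernel:
    #   lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))
    import math

    def hawkes_intensity(t, events, mu=0.2, alpha=0.8, beta=1.5):
        return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)

    emails = [0.0, 0.3, 0.5, 4.0]     # toy event times (e.g., messages between two people)
    for t in (0.6, 1.0, 3.0, 4.1):
        print(f"t={t}: intensity {hawkes_intensity(t, emails):.3f}")
    # Bursts of recent events raise the intensity; it decays back toward mu afterwards.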

Speaker Biography

Katherine Heller received a B.S. in Computer Science and Applied Math and Statistics from the State University of New York at Stony Brook, followed by an M.S. in Computer Science from Columbia University. In 2008 she received her Ph.D. from the Gatsby Computational Neuroscience Unit at University College London in the UK, and went on to do postdoctoral research in the Engineering Department at the University of Cambridge, and the Brain and Cognitive Science department at MIT. In 2012 she joined the Department of Statistical Science and Center for Cognitive Neuroscience at Duke University. She is the recipient of an NSF graduate research fellowship, an EPSRC postdoctoral fellowship, and an NSF postdoctoral fellowship. Her current research interests include Bayesian statistics, machine learning, and computational cognitive science.

November 5, 2013 | 12:00 p.m.

“Submodularity and Big Data”   Video Available

Jeff Bilmes, University of Washington

[abstract] [biography]

Abstract

The amount of data available today is a problem not only for humans but also for computer consumers of information. At the same time, bigger is different, and discovering how is an important challenge in big data sciences. In this talk, we will discuss how submodular functions can address these problems. After giving a brief background on submodularity, we will first discuss document summarization, and how one can achieve optimal results on a number of standard benchmarks using very efficient algorithms. Next we will discuss data subset selection for speech recognition systems, and how choosing a good subset has many advantages, showing results on both the TIMIT and the Fisher corpora. We will also discuss data selection for machine translation systems. Lastly, we will discuss similar problems in computer vision. The talk will include sufficient background to make it accessible to everyone.
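
A standard way to see why submodularity pays off here: coverage-style summarization objectives are monotone submodular, so the simple greedy algorithm below enjoys the classical (1 - 1/e) approximation guarantee while staying fast. The example uses a toy word-coverage objective and a two-sentence budget, not the objectives or corpora from the talk.

    # Greedy selection for a monotone submodular coverage objective (toy example).
    def coverage(selected, sentences):
        return len(set().union(*[sentences[i] for i in selected])) if selected else 0

    def greedy_summary(sentences, budget):
        selected = []
        while len(selected) < budget:
            best, best_gain = None, 0
            for i in range(len(sentences)):
                if i in selected:
                    continue
                gain = coverage(selected + [i], sentences) - coverage(selected, sentences)
                if gain > best_gain:
                    best, best_gain = i, gain
            if best is None:          # no remaining sentence adds new coverage
                break
            selected.append(best)
        return selected

    docs = [
        {"earthquake", "struck", "coast", "tuesday"},
        {"earthquake", "magnitude", "seven"},
        {"rescue", "teams", "reached", "coast"},
        {"magnitude", "seven", "struck"},
    ]
    print(greedy_summary(docs, budget=2))   # indices of the chosen sentences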

Speaker Biography

Jeff Bilmes is a professor at the Department of Electrical Engineering at the University of Washington, Seattle Washington, and also an adjunct professor in Computer Science & Engineering and the department of Linguistics. He received his Ph.D. from the Computer Science Division of the department of Electrical Engineering and Computer Science, University of California in Berkeley. He was a 2001 NSF Career award winner, a 2002 CRA Digital Government Fellow, a 2008 NAE Gilbreth Lectureship award recipient, and a 2012/2013 ISCA Lecturer.

November 1, 2013 | 12:00 p.m.

“Low-Pass Semantics (with a Bit of Discourse)”   Video Available

Fernando Pereira, Google

[abstract] [biography]

Abstract

Advances in statistical and machine learning approaches to natural-language analysis have yielded a wealth of methods and applications in information retrieval, speech recognition, machine translation, and information extraction. Yet, even as we enjoy these advances, we recognize that our successes are to a large extent the result of clever exploitation of redundancy in language structure and use, allowing our algorithms to eke out a few useful bits that we can put to work in applications. By focusing on applications that extract a limited amount of information from the text, finer structures such as word order or syntactic structure could be largely ignored in information retrieval or speech recognition. However, by leaving out those finer details, our language-processing systems have been stuck in an "idiot savant" stage where they can find everything but cannot understand anything. The main language processing challenge of the coming decade is to create robust, accurate, efficient methods that learn to understand the main entities and concepts discussed in any text, and the main claims made. That will enable our systems to answer questions more precisely, to verify and update knowledge bases, and to trace arguments for and against claims throughout the written record. I will argue with examples from our recent research that we need deeper levels of linguistic analysis to do this. But I will also argue that it is possible to do much that is useful even with our very partial understanding of linguistic and computational semantics, by taking (again) advantage of distributional regularities and redundancy in large text collections to learn effective analysis and understanding rules. Thus low-pass semantics: our scientific knowledge is very far from being able to map the full spectrum of meaning, but by combining signals from the whole Web, our systems are learning to read the simplest factual information reliably.

Speaker Biography

Fernando Pereira is research director at Google. His previous positions include chair of the Computer and Information Science department of the University of Pennsylvania, head of the Machine Learning and Information Retrieval department at AT&T Labs, and research and management positions at SRI International. He received a Ph.D. in Artificial Intelligence from the University of Edinburgh in 1982, and he has over 120 research publications on computational linguistics, machine learning, bioinformatics, speech recognition, and logic programming, as well as several patents. He was elected AAAI Fellow in 1991 for contributions to computational linguistics and logic programming, and ACM Fellow in 2010 for contributions to machine-learning models of natural language and biological sequences. He was president of the Association for Computational Linguistics in 1993.

October 22, 2013 | 12:00 p.m.

“Recent Progress in Acoustic Speaker and Language Recognition”   Video Available

Alan McCree, Johns Hopkins Human Language Technology Center of Excellence

[abstract] [biography]

Abstract

In this talk, I give an overview of recent progress in the fields of speaker and language recognition, with emphasis on our current work at the JHU HLTCOE. After a brief review of modern GMM subspace methods, in particular i-vectors, I will present approaches for pattern classification using these features, with an emphasis on simple Gaussian probabilistic models. For language recognition, these are quite effective, but our recent work has shown that discriminative training can improve performance. As a bonus, this also provides meaningful probability outputs without requiring a separate calibration process. For speaker recognition, on the other hand, classification is more difficult due to the limited enrollment data per speaker, and Bayesian methods have been successful. I will discuss a number of such methods, including the popular PLDA approach. Finally, I'll describe our recent successes in adapting these Gaussian parameters to new domains when labeled training data is not available.
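
As a minimal illustration of the "simple Gaussian probabilistic models" part of the story (not of PLDA, discriminative training, or the domain adaptation work), the sketch below fits per-language Gaussians with a shared covariance to i-vector-like inputs, which reduces to a linear classifier with probability outputs; the vectors here are random placeholders rather than real i-vectors, and the dimensionality is kept small for speed.

    # Gaussian classifier with shared covariance on synthetic stand-ins for i-vectors.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    dim, per_class = 100, 200                       # real i-vectors are typically a few hundred dims
    means = {"eng": rng.standard_normal(dim), "spa": rng.standard_normal(dim)}

    X = np.vstack([means[l] + 0.8 * rng.standard_normal((per_class, dim)) for l in means])
    y = np.array(["eng"] * per_class + ["spa"] * per_class)

    clf = LinearDiscriminantAnalysis()              # shared-covariance Gaussian model
    clf.fit(X, y)

    trial = means["spa"] + 0.8 * rng.standard_normal(dim)
    print(clf.predict([trial]))                     # predicted language for the trial vector
    print(clf.predict_proba([trial]).round(3))      # probability outputs for the two classes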

Speaker Biography

Alan McCree is a Principal Research Scientist at the JHU HLTCOE, where his primary interest is in the theory and application of speaker and language recognition. His research in speech and signal processing at the COE, and previously at MIT Lincoln Laboratory, Texas Instruments, AT&T Bell Laboratories, and Linkabit, has found applications in international speech coding standards, digital answering machines, talking toys, and cellular telephones. He has an extensive publication and patent portfolio, and was named an IEEE Fellow in 2005. He received his PhD from Georgia Tech in 1992 after undergraduate and graduate degrees from Rice University.

October 17, 2013 | 10:45

“Using Semantics to help learn Phonetic Categories”   Video Available

Stella Frank, University of Edinburgh

[abstract] [biography]

Abstract

Computational models of language acquisition seek to replicate human linguistic learning capabilities, such as an infant's ability to identify the relevant sound categories in a language, given similar inputs. In this talk I will present some on-going work which extends a Bayesian model of phonetic categorisation (Feldman et al., 2013). The original model learns a lexicon as well as phonetic categories, incorporating the constraint that phonemes appear in word contexts. However, it has trouble separating minimal pairs (such as 'cat'/'caught'/'kite'). The proposed extension adds further information via situational context information, a form of weak semantics or world knowledge, to disambiguate potential minimal pairs. I will present our current results and discuss potential next steps.

Speaker Biography

Stella Frank is currently a postdoc at the University of Edinburgh, from whence she received a PhD in Informatics in 2013. Her research interests lie in computational modelling of language acquisition using unsupervised Bayesian modelling techniques.

October 8, 2013 | 12:00 p.m.

“Cross-Stream Event Detection”   Video Available

Miles Osborne, University of Edinburgh

[abstract] [biography]

Abstract

Social Media (especially Twitter) is widely seen as a source of real-time breaking news. For example, when Osama Bin Laden was killed by US forces, the news was first made public on Twitter. Rapidly finding all breaking news has clear economic and humanitarian benefits. Finding all such breaking news presents hard computational challenges. We need to detect news-related novelty in massive streams (upwards of two thousand posts per second) as quickly as possible. Efficiency is not the only consideration, however, and we also need to confront the enormous quantity of irrelevant posts. In this talk I will outline how we tackle the first problem using Locality Sensitive Hashing, taking constant time per post. In tandem I will mention how we use Storm to parallelise this computation, yielding a system capable of processing 2k tweets per second. The second problem is tackled by intersecting the Twitter stream with Wikipedia page requests, filtering out spurious first stories. Taken together, this results in processing more than 250 million items per day. Finally I will consider the question of whether Twitter really does lead Newswire for breaking news. Joint work with Sasa Petrovic (Edinburgh), Craig MacDonald (Glasgow), Iadh Ounis (Glasgow) and Richard McCreadie (Glasgow).
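
The core of the constant-time detection step can be sketched quickly: hash each incoming post with random-hyperplane LSH and compare it only against earlier posts that landed in the same bucket, declaring it novel if no close neighbour is found. The snippet below is a toy single-table version on random vectors; the real system described above uses many hash tables, Storm parallelism, and the Wikipedia cross-stream filter.

    # Toy streaming first-story detection with random-hyperplane LSH (one hash table).
    import numpy as np
    from collections import defaultdict

    rng = np.random.default_rng(0)
    dim, n_bits, threshold = 50, 8, 0.5
    planes = rng.standard_normal((n_bits, dim))
    buckets = defaultdict(list)                  # hash key -> vectors seen so far

    def bucket_key(v):
        return tuple((planes @ v > 0).astype(int))

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def is_first_story(vec):
        """True if no sufficiently similar post was previously seen in this bucket."""
        k = bucket_key(vec)
        nearest = max((cosine(vec, old) for old in buckets[k]), default=-1.0)
        buckets[k].append(vec)
        return nearest < threshold

    stream = [rng.standard_normal(dim) for _ in range(5)]
    stream.append(stream[0] + 0.01 * rng.standard_normal(dim))   # near-duplicate post
    print([is_first_story(v) for v in stream])
    # With a single table the near-duplicate is usually (not always) caught; real
    # systems use multiple tables to drive the miss probability down.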

Speaker Biography

Miles Osborne is a Reader in Informatics at Edinburgh, with research interests in Machine Translation, Social Media and large scale processing of natural language.  He received his PhD from the University of York in 1994 and had travelled the land, carrying out Post Docs at Cambridge and Groningen prior to being in Edinburgh.  He spent a sabbatical at Google in 2006 working within their Machine Translation group and for 2013-2014, will be spending a sabbatical at the Johns Hopkins University.

October 1, 2013 | 12:00 p.m.

“Modeling "Bootstrapping" in Language Acquisition”   Video Available

Sharon Goldwater, University of Edinburgh

[abstract] [biography]

Abstract

The term "bootstrapping" appears frequently in the literature on child language acquisition, but is often defined vaguely (if at all) and can mean different things to different people. In this talk, I define bootstrapping as the use of structured correspondences between different levels of linguistic structure as a way to aid learning, and discuss how probabilistic models can be used to investigate the nature of these correspondences and how they might help the child learner. I will discuss two specific examples, showing 1) that using correspondences between acoustic and syntactic information can help with syntactic learning ("prosodic bootstrapping") and 2) that using correspondences between syntactic and semantic information in a joint learning model can help with learning both syntax and semantics while also simulating important findings from the child language acquisition literature.

Speaker Biography

Sharon Goldwater is a Reader (≈ US Associate Professor) in the Institute for Language, Cognition and Computation at the University of Edinburgh's School of Informatics, and is currently a Visiting Associate Professor in the Department of Cognitive Science at Johns Hopkins University. She worked as a researcher in the Artificial Intelligence Laboratory at SRI International from 1998-2000 before starting her Ph.D. at Brown University, supervised by Mark Johnson. She completed her Ph.D. in 2006 and spent two years as a postdoctoral researcher at Stanford University before moving to Edinburgh. Her current research focuses on unsupervised learning for automatic natural language processing and computer modeling of language acquisition in children. She is particularly interested in Bayesian approaches to the induction of linguistic structure, ranging from phonemic categories to morphology and syntax.

September 24, 2013 | 12:00 p.m.

“Language as Influence: Power and Memorability”   Video Available

Lillian Lee, Cornell University

[abstract] [biography]

Abstract

What effect does language have on people, and what effect do people have on language? You might say in response, "Who are you to discuss these problems?" and you would be right to do so; these are Major Questions that science has been tackling for many years. But as a field, I think natural language processing and computational linguistics have much to contribute to the conversation, and I hope to encourage the community to further address these issues. To this end, I'll describe two efforts I've been involved in. The first project provides evidence that in group discussions, power differentials between participants are subtly revealed by how much one individual immediately echoes the linguistic style of the person they are responding to. We consider multiple types of power: status differences (which are relatively static), and dependence (a more "situational" relationship). Using a precise probabilistic formulation of the notion of linguistic coordination, we study how conversational behavior can reveal power relationships in two very different settings: discussions among Wikipedians and arguments before the U.S. Supreme Court. Our second project is motivated by the question of what information achieves widespread public awareness. We consider whether, and how, the way in which the information is phrased --- the choice of words and sentence structure --- can affect this process. We introduce an experimental paradigm that seeks to separate contextual from language effects, using movie quotes as our test case. We find that there are significant differences between memorable and non-memorable quotes in several key dimensions, even after controlling for situational and contextual factors. One example is lexical distinctiveness: in aggregate, memorable quotes use less common word choices (as measured by statistical language models), but at the same time are built upon a scaffolding of common syntactic patterns. Joint work with Justin Cheng, Cristian Danescu-Niculescu-Mizil, Jon Kleinberg, and Bo Pang.
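
A simplified, non-probabilistic version of the coordination measure can be computed directly from counts: for a given stylistic marker class (say, articles), compare how often B uses the marker when replying to an A utterance that contained it against B's baseline rate of using the marker. The snippet below shows that difference-of-rates version on invented exchanges; the work above formulates this probabilistically and aggregates over many marker classes and speaker pairs.

    # Toy linguistic-coordination score for one marker class (articles).
    MARKERS = {"a", "an", "the"}

    def has_marker(utterance):
        return any(tok in MARKERS for tok in utterance.lower().split())

    # (utterance by A, reply by B) pairs -- invented examples.
    exchanges = [
        ("we should review the brief", "the brief is ready"),
        ("send it today", "will do"),
        ("did you read the ruling", "yes, the majority opinion is long"),
        ("thoughts?", "it seems fine"),
    ]

    replies_to_marked = [b for a, b in exchanges if has_marker(a)]
    all_replies = [b for _, b in exchanges]

    p_given_trigger = sum(map(has_marker, replies_to_marked)) / len(replies_to_marked)
    p_baseline = sum(map(has_marker, all_replies)) / len(all_replies)

    # Positive values mean B echoes A's style more than B's baseline would predict.
    print(f"coordination toward A: {p_given_trigger - p_baseline:+.2f}")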

Speaker Biography

Lillian Lee is a professor of computer science at Cornell University. Her research interests include natural language processing, information retrieval, and machine learning. She is the recipient of the inaugural Best Paper Award at HLT-NAACL 2004 (joint with Regina Barzilay), a citation in "Top Picks: Technology Research Advances of 2004" by Technology Research News (also joint with Regina Barzilay), and an Alfred P. Sloan Research Fellowship; and in 2013, she was named a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI). Her group's work has received several mentions in the popular press, including The New York Times, NPR's All Things Considered, and NBC's The Today Show.

September 20, 2013 | 12:00 p.m.

“Neural Networks for Speech Recognition: From Basic Structures to Piled Higher and Deeper”   Video Available

Nelson Morgan, International Computer Science Institute, UC Berkeley

[abstract] [biography]

Abstract

Artificial neural networks have been applied to speech tasks for well over 50 years. In particular, multilayer perceptrons (MLPs) have been used as components in HMM-based systems for 25 years. This presentation will describe the long journey from early speech classification experiments with MLPs in the 1960s to the present day implementations. There will be an emphasis on hybrid HMM/MLP approaches that have dominated the use of artificial neural networks for speech recognition since the late 1980s, but which have only recently gained mainstream adoption.  
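
The defining computation of the hybrid approach is compact enough to show directly: the MLP produces state posteriors p(state | acoustics), and dividing by the state priors turns them into scaled likelihoods that the HMM decoder can consume in place of GMM scores. The snippet below shows just that conversion for a single frame, with made-up posteriors and priors.

    # Hybrid HMM/MLP: convert MLP state posteriors into scaled likelihoods,
    #   p(x | state) proportional to p(state | x) / p(state)
    import numpy as np

    # Made-up numbers for a single frame over 4 HMM states.
    posteriors = np.array([0.70, 0.15, 0.10, 0.05])   # MLP softmax outputs p(state | x)
    priors     = np.array([0.40, 0.30, 0.20, 0.10])   # state frequencies from training data

    log_scaled_likelihoods = np.log(posteriors) - np.log(priors)
    print(log_scaled_likelihoods.round(3))
    # These scores are what the Viterbi decoder consumes in place of GMM log-likelihoods.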

Speaker Biography

Nelson Morgan has been working on problems in signal processing and pattern recognition since 1974, with a primary emphasis on speech processing. He may have been the first to use neural networks for speech classification in a commercial application. He is a former Editor-in-chief of Speech Communication, and is also a Fellow of the IEEE and of ISCA. In 1997 he received the Signal Processing Magazine best paper award (together with co-author Hervé Bourlard) for an article that described the basic hybrid HMM/MLP approach. He also co-wrote a text (written jointly with Ben Gold) on speech and audio signal processing, with a new (2011) second edition that was revised in collaboration with Dan Ellis of Columbia University. He is the deputy director (and former director) of the International Computer Science Institute (ICSI), and is a Professor-in-residence in the EECS Department at the University of California at Berkeley.  

September 13, 2013 | 12:00 p.m.

“My Adventures With Speech”   Video Available

Hynek Hermansky, Johns Hopkins Center for Language and Speech Processing

[abstract] [biography]

Abstract

I intend to mention some techniques I got involved in during the past 40 years. I will not dwell too much on the details of the techniques. These are documented in various publications.  Rather, I will try to talk about things which we, researchers, may say in private but seldom write about: about personal intuitions and beliefs, our excitements, frustrations, surprises, and interesting encounters on the road, while struggling to understand and emulate one of the most significant achievements of the human race, the ability to communicate by speech.  

Speaker Biography

Hynek Hermansky is the Julian S. Smith Professor in Electrical and Computer Engineering and the Director of the Center for Language and Speech Processing at the Johns Hopkins University in Baltimore, Maryland. He is also a Research Professor at the Brno University of Technology, Czech Republic, and an External Fellow of the International Computer Science Institute at Berkeley, California. He is a Fellow of the International Speech Communication Association and of the Institute of Electrical and Electronics Engineers, and is the recipient of the 2013 International Speech Communication Association Medal for Scientific Achievement. He holds a Dr.Eng. degree from the University of Tokyo and a Dipl.-Ing. degree from the Brno University of Technology, Czech Republic. His main research interests are in acoustic processing for speech recognition.

September 10, 2013 | 12:00 p.m.

“Computing Meaning: What's Semantics Got To Do With It?”

Emily Bender, University of Washington

[abstract] [biography]

Abstract

Recent years have seen a surge of work in natural language understanding which aspires to extract meaning from text or speech inputs for a variety of applications. In this talk, I will address what is meant by "meaning" in that context and the relationship between "meaning" and (linguistic) "semantics". This will lead to a discussion of the role of morphology and syntax in meaning-targeting NLP and how NLU systems can be made more cross-linguistically portable through a typologically and linguistically aware approach.

Speaker Biography

Emily M. Bender is an Associate Professor of Linguistics and Adjunct Associate Professor of Computer Science & Engineering at the University of Washington, where she has been on the faculty since 2003. She earned her PhD (in Linguistics) at Stanford University in 2001. Her primary research interests lie in multi-lingual grammar engineering, including the design of semantic representations, and its applications including to language documentation and typological research. Bender is the lead developer of the Grammar Matrix, a starter-kit for creating precision grammars compatible with DELPH-IN processing tools, and the author of _Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax_ (Morgan & Claypool, 2013).

April 26, 2013 | noon

“I Want to Talk About, Again, My Record On Energy...: Modeling Agendas and Framing in Political Debates and Other Conversations”

Philip Resnik, University of Maryland

[abstract] [biography]

Abstract

Computational social science has been emerging over the last several years as a hotbed of interesting work, taking advantage of, to quote Lazer et al. (Science, v.323), "digital traces that can be compiled into comprehensive pictures of both individual and group behavior, with the potential to transform our understanding of our lives, organizations, and societies." Within that larger setting, I'm interested in how language is used to influence people, with an emphasis on computational modeling of agendas (who is most effectively directing attention, and toward what topics?), framing or "spin" (what underlying perspective does this language seek to encourage?), and sentiment (how does someone feel, as evidenced in the language they use)? These questions are particularly salient in political discourse. In this talk, I'll present recent work looking at political debates and other conversations using Bayesian models to capture relevant aspects of the conversational dynamics, as well as new methods for collecting people's reactions to speeches, debates, and other public conversations on a large scale. This talk includes work done in collaboration with Jordan Boyd-Graber, Viet-An Nguyen, Deborah Cai, Amber Boydstun, Rebecca Glazier, Matthew Pietryka, and Tim Jurka.

Speaker Biography

Philip Resnik is Professor of Linguistics at the University of Maryland, holding a joint appointment at UMD's Institute for Advanced Computer Studies, and the director of UMD's Computational Linguistics and Information Processing (CLIP) Laboratory. He received his Ph.D. in Computer and Information Science at the University of Pennsylvania (1993), and has worked in industry R&D at Bolt Beranek and Newman, IBM T.J. Watson Research Center, and Sun Microsystems Laboratories. His research emphasizes combining linguistic knowledge and statistical methods in computational linguistics, with a focus on applications in machine translation and computational social science. He co-edited The Balancing Act: Combining Symbolic and Statistical Approaches to Language (MIT Press, 1996, with Judith Klavans), and has served on the editorial boards of Computational Linguistics, Cognition, Computers and the Humanities, and Linguistics in Language Technology. As extracurricular activities, he was a technical founder of CodeRyte Inc., a provider of language technology solutions in healthcare (acquired last year by 3M), he has served as lead scientist for Converseon, a leading social media consultancy, and he is currently commercializing React Labs, a mobile platform for real-time polling.

April 19, 2013 | 12 p.m.

“The Latest in DNN Research at IBM: DNN-based features, Low-Rank Matrices for Hybrid DNNs, and Convolutional Neural Networks”   Video Available

Tara Sainath, IBM Research

[abstract] [biography]

Abstract

Deep Neural Networks have become the state-of-the-art for acoustic modeling, showing relative gains of 10-30% compared to Gaussian Mixture Model/Hidden Markov Models. In this talk, I discuss how to improve the performance of these networks further. First, I present work on using these networks to extract NN-based features. I show that NN-based features offer a 10-15% relative improvement on various LVCSR tasks compared to cross-entropy trained hybrid DNNs. Furthermore, NN-based features match the performance of sequence-trained hybrid DNNs while being 2x faster to train. I will also show that if a hybrid DNN is preferred, low-rank matrix factorization can allow for a 50% reduction in parameters and a 2x speedup in training time. Second, I present work on Convolutional Neural Networks (CNNs), an alternative type of neural network that can be used to reduce spectral variations and model spectral correlations which exist in signals. Since speech signals exhibit both of these properties, CNNs are a more effective model for speech compared to DNNs. On a variety of LVCSR tasks, we find that CNN-based features offer an additional 4-12% improvement over DNN-based features.
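
The low-rank idea in the middle of the abstract is easy to make concrete: replace a weight matrix W of size m x n by a product A (m x r) times B (r x n), cutting the parameter count from m*n to r*(m+n). The sketch below counts parameters for illustrative layer sizes (the large softmax layer over context-dependent states is where the savings are biggest) and factors a synthetic near-low-rank matrix with a truncated SVD; the sizes and rank are placeholders, not those of the systems described above.

    # Low-rank factorization of a DNN weight matrix: W (m x n) ~ A (m x r) @ B (r x n).
    import numpy as np

    m, n, r = 2048, 9300, 256                       # hidden size, output targets, rank (illustrative)
    print("full     :", m * n, "parameters")
    print("low-rank :", r * (m + n), "parameters")  # large reduction when r << min(m, n)

    # Factor an (artificially) low-rank matrix via truncated SVD.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((200, 40)) @ rng.standard_normal((40, 300))
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :40] * s[:40]                          # 200 x 40
    B = Vt[:40, :]                                  # 40 x 300
    print("relative reconstruction error:",
          float(np.linalg.norm(W - A @ B) / np.linalg.norm(W)))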

Speaker Biography

Tara Sainath received her B.S (2004), M. Eng (2005) and PhD (2009) in Electrical Engineering and Computer Science all from MIT. The main focus of her PhD work was in acoustic modeling for noise robust speech recognition. She joined the Speech and Language Algorithms group at IBM T.J. Watson Research Center upon completion of her PhD. She organized a Special Session on Sparse Representations at Interspeech 2010, as well as a workshop on Deep Learning at ICML 2013. In addition, she has served as a staff reporter for the IEEE Speech and Language Processing Technical Committee (SLTC) Newsletter. She currently holds over 30 US patents. Her research interests are in acoustic modeling, including deep belief networks and sparse representations.

April 9, 2013 | 12:00 p.m.

“Learning with Marginalized Corrupted Features”   Video Available

Kilian Weinberger, Washington University in St. Louis

[abstract] [biography]

Abstract

If infinite amounts of labeled data are provided, many machine learning algorithms become perfect. With finite amounts of data, regularization or priors have to be used to introduce bias into a classifier. We propose a third option: learning with marginalized corrupted features. We (implicitly) corrupt existing data as a means to generate additional, infinitely many, training samples from a slightly different data distribution -- this is computationally tractable, because the corruption can be marginalized out in closed form. Our framework leads to machine learning algorithms that are fast, generalize well and naturally scale to very large data sets. We showcase this technology as regularization for general risk minimization and for marginalized deep learning for document representations. We provide experimental results on part of speech tagging as well as document and image classification.
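
For the squared loss the marginalization is available in closed form, which is what makes this cheap: under dropout-style blankout corruption (each feature zeroed independently with probability q, no rescaling), E[x~] = (1 - q) x and E[x~ x~^T] = (1 - q)^2 x x^T + q (1 - q) diag(x * x), so the least-squares solution can be computed from these expectations instead of from explicitly corrupted copies. The sketch below works through that specific case on synthetic data; it is a simplified illustration of the idea, not the full framework (which covers other losses and other corrupting distributions).

    # Marginalized blankout corruption for squared loss (simplified illustration).
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, q = 500, 20, 0.3
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true + 0.1 * rng.standard_normal(n)

    # Corruption expectations in closed form (no sampled corrupted copies needed):
    # sum over data of E[x~ x~^T], and sum over data of E[x~] * y.
    S = (1 - q) ** 2 * (X.T @ X) + q * (1 - q) * np.diag((X ** 2).sum(axis=0))
    b = (1 - q) * (X.T @ y)

    w_mcf = np.linalg.solve(S, b)              # corruption acts as a data-dependent regularizer
    w_ols = np.linalg.solve(X.T @ X, X.T @ y)  # plain least squares, for comparison
    print("||w_ols - w_true|| =", round(float(np.linalg.norm(w_ols - w_true)), 3))
    print("||w_mcf - w_true|| =", round(float(np.linalg.norm(w_mcf - w_true)), 3))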

Speaker Biography

Kilian Q. Weinberger is an Assistant Professor in the Department of Computer Science & Engineering at Washington University in St. Louis. He received his Ph.D. from the University of Pennsylvania in Machine Learning under the supervision of Lawrence Saul. Prior to this, he obtained his undergraduate degree in Mathematics and Computer Science at the University of Oxford. During his career he has won several best paper awards at ICML, CVPR and AISTATS. In 2011 he was awarded the AAAI senior program chair award and in 2012 he received the NSF CAREER award. Kilian Weinberger's research is in Machine Learning and its applications. In particular, he focuses on high dimensional data analysis, metric learning, machine learned web-search ranking, transfer- and multi-task learning as well as biomedical applications.

April 5, 2013 | 12 p.m.

“Learning from Speech Production for Improved Recognition”

Karen Livescu, TTI Chicago

[abstract] [biography]

Abstract

Speech production has motivated several lines of work in the speech recognition research community, including using articulator positions predicted from acoustics as additional observations and using discrete articulatory features as lexical units instead of or in addition to phones.  Unfortunately, our understanding of speech production is still quite limited, and articulatory data is scarce.  How can we take advantage of the intuitive usefulness of speech production, without relying too much on noisy information?  This talk will cover recent work exploring several ideas in this area, with the theme of using machine learning to automatically infer information where our knowledge and data are lacking.  The talk will include work on deriving new acoustic features using articulatory data in a multi-view learning setting, as well as lexical access and spoken term detection using hidden articulatory features.
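
As a toy illustration of the multi-view setting described above (plain linear CCA via scikit-learn on synthetic data; the actual work uses richer variants, and the array sizes here are invented), articulatory measurements can be used at training time only, to learn a projection that is applied to acoustics alone at test time:

```python
# Hedged sketch: acoustic and articulatory "views" of the same frames share
# latent structure; CCA learns projections that maximally correlate the views.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_frames = 500
shared = rng.normal(size=(n_frames, 5))                      # common latent structure

acoustic = shared @ rng.normal(size=(5, 39)) + 0.5 * rng.normal(size=(n_frames, 39))
articulatory = shared @ rng.normal(size=(5, 12)) + 0.5 * rng.normal(size=(n_frames, 12))

cca = CCA(n_components=5)
cca.fit(acoustic, articulatory)                               # articulatory data used only here
acoustic_features, _ = cca.transform(acoustic, articulatory)
print(acoustic_features.shape)                                # (500, 5) CCA-derived acoustic features
```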

Speaker Biography

Karen Livescu is an Assistant Professor at TTI-Chicago, where she has been since 2008.  She completed her PhD in 2005 at MIT in the Spoken Language Systems group of the Computer Science and Artificial Intelligence Laboratory. In 2005-2007 she was a post-doctoral lecturer in the MIT EECS department.  Karen's main research interests are in speech and language processing, with a slant toward combining machine learning with knowledge from linguistics and speech science.  She is a member of the IEEE Spoken Language Technical Committee and has organized or co-organized a number of recent workshops, including the ISCA SIGML workshops on Machine Learning in Speech and Language Processing and Illinois Speech Day.  She is co-organizing the upcoming Midwest Speech and Language Days and the Interspeech 2013 Workshop on Speech Production in Automatic Speech Recognition.

March 26, 2013

“Corpora and Statistical Analysis of Non-Linguistic Symbol Systems”

Richard Sproat, Google

[abstract] [biography]

Abstract

We report on the creation and analysis of a set of corpora of non-linguistic symbol systems.  The resource, the first of its kind, consists of data from seven systems, both ancient and modern, with four further systems under development, and several others planned. The systems represent a range of types, including heraldic systems, formal systems, and systems that are mostly or purely decorative. We also compare these systems statistically with a large set of linguistic systems, which also range over both time and type. We show that none of the measures proposed in published work by Rao and colleagues (Rao et al., 2009a; Rao, 2010) or Lee and colleagues (Lee et al., 2010a) works. In particular, Rao’s entropic measures are evidently useless when one considers a wider range of examples of real non-linguistic symbol systems. And Lee’s measures, with the cutoff values they propose, misclassify nearly all of our non-linguistic systems. However, we also show that one of Lee’s measures, with different cutoff values, as well as another measure we develop here, do seem useful. We further demonstrate that they are useful largely because they are both highly correlated with a rather trivial feature:  mean text length.
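
For readers unfamiliar with the statistics at issue, the sketch below computes one representative measure -- the conditional entropy of a symbol given its predecessor -- on two invented toy "corpora"; it is meant only to show the kind of quantity being compared, not to reproduce the paper's experiments:

```python
# Bigram conditional entropy H(next symbol | previous symbol) over a corpus.
import math
from collections import Counter

def bigram_conditional_entropy(texts):
    bigrams, unigrams = Counter(), Counter()
    for seq in texts:
        for prev, nxt in zip(seq, seq[1:]):
            bigrams[(prev, nxt)] += 1
            unigrams[prev] += 1
    total = sum(bigrams.values())
    h = 0.0
    for (prev, nxt), c in bigrams.items():
        p_joint = c / total          # p(prev, next)
        p_cond = c / unigrams[prev]  # p(next | prev)
        h -= p_joint * math.log2(p_cond)
    return h

# Invented toy "systems": one with rigid symbol order, one with free order.
rigid = ["abcabcabc", "abcabc"]
free = ["acbbca", "bacacb", "cabbac"]
print(bigram_conditional_entropy(rigid))   # low: the next symbol is predictable
print(bigram_conditional_entropy(free))    # higher: the next symbol is less constrained
```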

Speaker Biography

Richard Sproat received his Ph.D. in Linguistics from the Massachusetts Institute of Technology in 1985. He has worked at AT&T Bell Labs, at Lucent's Bell Labs and at AT&T Labs -- Research, before joining the faculty of the University of Illinois. From there he moved to the Center for Spoken Language Understanding at the Oregon Health & Science University. In the Fall of 2012 he moved to Google, New York as a Research Scientist. Sproat has worked in numerous areas relating to language and computational linguistics, including syntax, morphology, computational morphology, articulatory and acoustic phonetics, text processing, text-to-speech synthesis, and text-to-scene conversion. Some of his recent work includes multilingual named entity transliteration, the effects of script layout on readers' phonological awareness, and tools for automated assessment of child language. At Google he works on multilingual text normalization and finite-state methods for language processing. He also has a long-standing interest in writing systems and symbol systems more generally.

March 15, 2013

“Reverse Engineering The Brain Computations Involved in Speech Processing”

Nima Mesgarani, University of California, San Francisco

[abstract] [biography]

Abstract

The brain empowers humans and other animals with remarkable abilities to sense and perceive their acoustic environment in highly degraded conditions. These seemingly trivial tasks for humans have proven extremely difficult to model and implement in machines.  One crucial limiting factor has been the need for a deep interaction between two very different disciplines, those of neuroscience and engineering.  In this talk, I will present results of an interdisciplinary research effort to address the following fundamental questions: 1) What computation is performed in the brain when we listen to complex sounds? 2) How could this computation be modeled and implemented in computational systems? 3) How could one build an interface to connect brain signals to machines? I will present results from recent invasive neural recordings in human auditory cortex that show a distributed representation of speech in auditory cortical areas. This representation remains unchanged even when an interfering speaker is added, as if the second voice is filtered out by the brain.  In addition, I will show how this knowledge has been successfully incorporated in novel automatic speech processing applications and used by DARPA and other agencies for their superior performance.  Finally, I will demonstrate how speech can be read directly from the brain, which may eventually allow for communication by people who have lost their ability to speak. This integrated research approach leads to better scientific understanding of the brain, innovative computational algorithms, and a new generation of Brain-Machine interfaces.

Speaker Biography

Nima Mesgarani is a postdoctoral scholar at the neurosurgery department of UC San Francisco. He received his Ph.D. in Electrical Engineering from University of Maryland College Park and was a postdoctoral scholar at Johns Hopkins University prior to joining UCSF. His research interests are in human-like information processing of acoustic signals at the interface of engineering and brain science. His goal is to develop an interdisciplinary research program designed to bridge the gap between these two very different disciplines by reverse engineering the signal processing in the brain, which in turn inspires novel approaches to emulate human abilities in machines. This integrated research approach leads to better scientific understanding of the brain, novel speech processing algorithms for automated systems, and a new generation of Brain-Machine Interface and neural prosthesis.

March 12, 2013 | 12 p.m.

“Mining Online User Behavior: From Improving Search to Detecting Cognitive Impairment”

Eugene Agichtein, Emory University

[abstract] [biography]

Abstract

The increasing reach of the Web enables billions of people around the world to create, share, and find information online. The behavior data created by these activities have been a goldmine for improving nearly all aspects of web search and information retrieval, and are now influencing other domains far beyond search.   I will first describe how mining document authoring behavior data leads to new, more effective retrieval models. Then, I will show how mining search interaction data, such as mouse cursor movements and scrolling, can be used to model the searcher’s attention and interest at scale, with the precision previously only possible in the lab using eye tracking equipment. This enables dramatic improvements to search ranking, presentation, and search quality evaluation. The resulting techniques can be naturally adapted for other applications requiring measuring user attention. A key example is a test measuring the subject's visual novelty preference, widely used in psychology and neuroscience to study visual recognition memory. Degraded performance on this test has been linked to cognitive impairment, in particular Alzheimer's disease. Adapting our techniques allowed us to develop an automatic web-based version of this test, which we are now validating as an accessible and low-cost diagnostic for early detection of Alzheimer's disease.

Speaker Biography

Eugene Agichtein is an Associate Professor of Computer Science at Emory University, where he founded and leads the Emory Intelligent Information Access Laboratory (IR Lab). The active projects in IR Lab include mining searcher behavior and interactions data, modelling social content creation and sharing, and applications to medical informatics.  Eugene obtained a Ph.D. in Computer Science from Columbia University, and did a Postdoc at Microsoft Research. He has published extensively on web search, information retrieval, and web and data mining. Dr. Agichtein's work has been supported by DARPA, NIH, NSF, Yahoo!, Microsoft, and Google, and has been recently recognized with the A.P. Sloan Research Fellowship and the ‘Best Paper’ award at the SIGIR 2011 conference.

March 8, 2013 | 12 p.m.

“Improving the Accuracy, Efficiency and Data Use for Natural Language Parsing”

Shay Cohen, Columbia University

[abstract] [biography]

Abstract

We are facing enormous growth in the amount of information available from various data resources. This growth is even more notable when it comes to text data; the number of pages on the internet, for example, is expected to double every five years, with billions of multilingual webpages already available. In order to make use of this textual data in natural language understanding systems, we need to rely on text analysis that structures this information. Natural language parsing is one such example, a fundamental problem in NLP. It provides the basic structure to text, representing its syntax computationally. This structure is used in most NLP applications that analyze language to understand meaning. I will discuss three important facets of modeling syntax: (a) accuracy of learning; (b) efficiency of parsing unseen sentences; and (c) selection of data to learn from. In this talk, the common theme of these three ideas is the concept of learning from incomplete data. To model syntax more effectively, I will first describe a model called latent-variable probabilistic context-free grammars (L-PCFGs) which, because of the hardness of learning from incomplete data, has until recently been used for learning in tandem with many heuristics and approximations. I will show a much more principled and statistically consistent approach to learning L-PCFGs using spectral algorithms, and will also show how L-PCFGs can parse unseen sentences much more efficiently through the use of tensor decomposition. In addition, I will touch on work with unsupervised language learning, one of the holy grails of NLP, in the Bayesian setting. In this setting, priors are used to guide the learner, compensating for the lack of labeled data. I will survey novel priors that were developed for this setting, and mention how they can be used monolingually and multilingually.

Speaker Biography

Shay Cohen is a postdoctoral research scientist in the Department of Computer Science at Columbia University. He holds a CRA Computing Innovation Fellowship. He received his B.Sc. and M.Sc. from Tel Aviv University in 2000 and 2004, and his Ph.D. from Carnegie Mellon University in 2011. His research interests span a range of topics in natural language processing and machine learning, with a focus on structured prediction. He is especially interested in developing efficient and scalable parsing algorithms as well as learning algorithms for probabilistic grammars.

March 5, 2013 | 12 p.m.

“Auto-Synchronous Analysis of Speech”

Pascal Clark, Human Language Technology Center of Excellence

[abstract] [biography]

Abstract

It is well known that information is embedded in the speech signal as smooth variations over time and frequency. Since the 90's, feature-extraction front-ends have routinely exploited this fact in the form of subband-modulation and spectro-temporal filtering. A key aspect of such methods is averaging over short time-scale structure to estimate smooth, long-term power envelopes. In this talk, I will argue that the short-term structure itself is useful when considered jointly with long-term envelopes. Toward this end, I propose replacing the time-worn concept of pitch, which is based on dubious assumptions of periodicity, with self-similar recurrence, which is statistically flexible and consistent with long-term coherences. Viewing speech in terms of recurrences suggests an intrinsic, stochastic timing reference for what I refer to as "auto-synchronization." I will demonstrate how synchronous estimation is complementary to existing power envelopes, and asymptotically immune to interference from slowly-varying noise. My objective in this talk is to lay the groundwork for further experiments and practical development of robust speech features.

Speaker Biography

Pascal Clark is a post-doctoral researcher at the Johns Hopkins Human Language Technology Center of Excellence. His current work focuses on signal processing for speech applications, including detection of speech in noise, and stochastic modeling for invariances in speech. Prior to joining the HLTCOE, he received his Ph.D. at the University of Washington, where he was also an author of the Modulation Toolbox.

March 1, 2013 | 12:00 p.m.

“Multimodality, Context and Continuous Emotional Dynamics for Recognition and Analysis of Emotional Human States, and Applications to Healthcare”   Video Available

Angeliki Metallinou, University of Southern California

[abstract] [biography]

Abstract

Human expressive communication is characterized by the continuous flow of multimodal information, such as vocal, facial and bodily gestures, which may convey the participant's affect. Additionally, the emotional state of a participant is typically expressed in context, and generally evolves with variable intensity and clarity over the course of an interaction.  In this talk, I will present computational approaches to address such complex aspects of emotional expression, namely multimodality, the use of context and continuous emotional dynamics. Firstly, I will describe hierarchical frameworks that incorporate temporal contextual information for emotion recognition, and demonstrate the utility of such approaches, which are able to exploit typical emotional patterns, for improving recognition performance.  Secondly, extending this notion of emotional evolution, I will describe methods for continuously estimating emotional states, such as the degree of intensity or positivity of a participant's emotion, during dyadic interactions. Such continuous estimates could highlight emotionally salient regions in long interactions. The systems described are multimodal and combine a variety of information such as speech, facial expressions, and full body language in the context of dyadic settings. Finally, I will discuss the utility of computational approaches for healthcare applications by describing ongoing work on facial expression analysis for the quantification of atypicality in affective facial expressions of children with autism spectrum disorders.

Speaker Biography

Angeliki Metallinou received her Diploma in electrical and computer engineering from the National Technical University of Athens, Greece, in 2007, and her Masters degree in Electrical Engineering in 2009 from University of Southern California (USC), where she is currently pursuing her Ph.D. degree. Since Fall 2007 she has been a member of the Signal Analysis and Interpretation Lab (SAIL) at USC, where she has worked on projects regarding multimodal emotion recognition, computational analysis of theatrical performance and computational approaches for autism research. During summer 2012, she interned at Microsoft Research working on belief state tracking for spoken dialog systems. Her research interests include speech and multimodal signal processing, affective computing, machine learning, statistical modeling and dialog systems.

February 22, 2013 | 12 p.m.

“Big Data Goes Mobile”

Kenneth Church, IBM

[abstract] [biography]

Abstract

What is "big"? Time & Space? Expense? Pounds? Power? Size of machine? Size of market? We will discuss many of these dimensions, but focus on throughput and latency (mobility of data). If our clouds can't import and export data at scale, they may turn into roach motels where data can check in, but it can't check out. DataScope is designed to make it easy to import and export hundreds of terabytes of disks.  Amdahl's Laws have stood up remarkably well to the test of time. These laws explain how to balance memory, cycles and IO. There is an opportunity to extend these laws to balance for mobility.
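
As a hedged back-of-the-envelope illustration of the balance-versus-mobility point (the Amdahl ratios below are the commonly quoted rules of thumb of roughly one byte of memory and one bit per second of I/O per instruction per second; the node and link numbers are invented):

```python
# Rough balance arithmetic in the spirit of Amdahl's rules of thumb.
instr_per_sec = 100e9                           # a hypothetical 100-GIPS node

balanced_memory_bytes = instr_per_sec * 1       # ~1 byte of RAM per instruction/s
balanced_io_bits_per_sec = instr_per_sec * 1    # ~1 bit/s of I/O per instruction/s

print(f"balanced memory: {balanced_memory_bytes / 1e9:.0f} GB")
print(f"balanced I/O:    {balanced_io_bits_per_sec / 8e9:.1f} GB/s")

# Mobility: how long does it take to move 100 TB in or out of a cloud?
dataset_bytes = 100e12
for label, bytes_per_sec in [("1 Gb/s WAN link", 1e9 / 8),
                             ("10 Gb/s link", 10e9 / 8),
                             ("balanced node I/O", balanced_io_bits_per_sec / 8)]:
    days = dataset_bytes / bytes_per_sec / 86400
    print(f"{label:>18}: {days:5.1f} days to move 100 TB")
```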

Speaker Biography

Ken is currently at IBM working on Siri-like applications of speech on phones. Before that, he was the Chief Scientist of the HLTCOE at JHU. He has worked at Microsoft and AT&T as well. Education: MIT (undergrad and graduate). He enjoys working with large datasets. Back in the 1980s, we thought that the Associated Press newswire (1 million words per week) was big, but he has since had the opportunity to work with much larger datasets such as AT&T's billing records and Bing's web logs.  He has worked on many topics in computational linguistics including: web search, language modeling, text analysis, spelling correction, word-sense disambiguation, terminology, translation, lexicography, compression, speech (recognition and synthesis), OCR, as well as applications that go well beyond computational linguistics such as revenue assurance and virtual integration (using screen scraping and web crawling to integrate systems that traditionally don't talk together as well as they could, such as billing and customer care). Service: past president of ACL and former president of SIGDAT (the organization that organizes EMNLP).

February 15, 2013

“Advances to machine translation and language understanding”   Video Available

Chris Callison-Burch, Johns Hopkins

[abstract] [biography]

Abstract

Modern approaches to machine translation, like those used in Google's online translation system, are data-driven. Statistical translation models are trained using bilingual parallel texts, which consist of sentences in one language paired with their translation into another language. Probabilistic word-for-word and phrase-for-phrase translation tables are extracted from human-translated parallel texts, and are then used as the basic building blocks in the automatic translation systems. Although these data-driven methods have been successfully applied to a small handful of the world's languages, can they be used to translate all the world's languages? I'll describe cost- and model-focused innovations that make it plausible. I'll also briefly outline how these methods can be used to help with the longstanding artificial intelligence goal of language understanding. In this talk, I will present four research areas that my lab has been working on: (1) Improved translation models: I will demonstrate that syntactic translation models significantly outperform linguistically naive models for Urdu-English. Urdu is a low resource language with a word order that is significantly divergent from English. Syntactic information allows better generalizations to be learned from the bilingual training data. (2) Crowdsourcing: I have been using Amazon's Mechanical Turk crowdsourcing platform to translate large volumes of text at low cost. I will show how we can achieve professional level translation quality using non-professional translators, at a cost that is an order of magnitude cheaper than professional translation. This makes it feasible to collect enough data to train statistical models, which I demonstrate for Arabic dialect translation. (3) Translation without bilingual training data: In addition to using crowdsourcing to reduce costs, I am introducing new methods that remove the dependence on expensive bilingual data by redesigning translation models so that they can be trained using inexpensive monolingual data. I will show end-to-end translation performance for a system trained only using a small bilingual dictionary and two large monolingual texts. (4) Natural language understanding: I will show how the data and methods from translation can be applied to the classic AI problem of understanding language. I will show how to learn paraphrases and other meaning-preserving English transformations using bilingual data. I will demonstrate how these can be used for a variety of monolingual text-to-text generation tasks like sentence compression, simplification, English as a Second Language (ESL) error correction, and poetry generation.
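
As a concrete reminder of what a phrase translation table is, here is a minimal sketch of the relative-frequency estimation step (the aligned phrase pairs below are invented; real systems extract them from word-aligned parallel text):

```python
# Relative-frequency phrase translation probabilities: p(e | f) = count(f, e) / count(f).
from collections import Counter, defaultdict

extracted_phrase_pairs = [            # (foreign phrase, english phrase) -- toy data
    ("maison", "house"), ("maison", "house"), ("maison", "home"),
    ("maison bleue", "blue house"), ("bleue", "blue"),
]

pair_counts = Counter(extracted_phrase_pairs)
source_counts = Counter(f for f, _ in extracted_phrase_pairs)

phrase_table = defaultdict(dict)
for (f, e), c in pair_counts.items():
    phrase_table[f][e] = c / source_counts[f]

print(phrase_table["maison"])   # {'house': 0.666..., 'home': 0.333...}
```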

Speaker Biography

Chris Callison-Burch is currently an Associate Research Professor in the Computer Science Department at Johns Hopkins University, where he has built a research group within the Center for Language and Speech Processing (CLSP). In the fall he will be starting a tenure-track job at the Computer and Information Sciences Department at the University of Pennsylvania. He received his PhD from the University of Edinburgh's School of Informatics in 2008 and his bachelors from Stanford University's Symbolic Systems Program in 2000. His research focuses on statistical machine translation, crowdsourcing, and broad coverage semantics via paraphrasing. He has contributed to the research community by releasing open source software like Moses and Joshua, and by organizing the shared tasks for the annual Workshop on Statistical Machine Translation (WMT). He is the Chair of the North American chapter of the Association for Computational Linguistics (NAACL) and serves on the editorial boards of Computational Linguistics and the Transactions of the ACL (TACL).

February 4, 2013 | 12:00 p.m.

“Google's Speech Internationalization Project: From 1 to 300 Languages and Beyond”

Pedro Moreno, Google

[abstract] [biography]

Abstract

Presentation Slides

The speech team at Google has built speech recognition systems in more than 40 languages in little more than 3 years.  In this talk I will describe the history of this project and the technologies that have been developed to achieve this goal.  I'll also explore some of the acoustic modeling, lexicon, language modeling, infrastructure and even social engineering techniques used to achieve our ultimate goal: to build speech recognition systems in the top 300 languages of the planet as fast as possible.

Speaker Biography

Dr. Pedro J. Moreno leads the speech internationalization engineering group at the Android division of Google.  His team is in charge of the infrastructure, engineering, and research needed to deploy and maintain multilingual speech recognition services worldwide. He joined Google 9 years ago after working as a research scientist at HP Labs.  During his time at HP he worked mostly on audio indexing systems.  Dr. Moreno completed his Ph.D. studies at Carnegie Mellon University under the direction of Prof. Richard Stern. His work there was focused on noise robustness in speech recognition systems.  His Ph.D. studies were sponsored by a Fulbright scholarship.  Before that he completed an Electrical Engineering degree at Universidad Politecnica de Madrid, Spain.

January 29, 2013 | 12:00 p.m.

“Aiding Human Translators”

Philipp Koehn, University of Edinburgh

[abstract] [biography]

Abstract

Despite all the recent successes of machine translation, when it comes to high quality publishable translation, human translators are still unchallenged. Since we can't beat them, can we help them to become more productive? I will talk about some recent work on developing assistance tools for human translators.  You can also check out a prototype at http://www.caitra.org/ and learn about our ongoing European projects CASMACAT at http://www.casmacat.eu/ and MATECAT at http://www.matecat.com/.

Speaker Biography

Philipp Koehn is Professor of Machine Translation at the School of Informatics at the University of Edinburgh, Scotland.  He received his PhD at the University of Southern California and spent a year as postdoctoral researcher at MIT.  He is well-known in the field of statistical machine translation for the leading open source toolkit Moses, the organization of the annual Workshop on Statistical Machine Translation and its evaluation campaign as well as the Machine Translation Marathon. He is founding president of the ACL SIG MT and currently serves as vice president-elect of the ACL SIG DAT.  He has published over 80 papers and the textbook in the field. He manages a number of EU and DARPA funded research projects aimed at morpho-syntactic models, machine learning methods and computer assisted translation tools.

Back to Top

2012

December 7, 2012 | 12:00 p.m.

“Probabilistic Linear Discriminant Analysis of i-Vector Posterior Distributions”   Video Available

Sandro Cumani, Brno University of Technology

[abstract]

Abstract

The i-vector extraction process is characterized by an intrinsic uncertainty represented by the i-vector posterior covariance.  The usual PLDA models, however, ignore such uncertainty and perform speaker inference based only on point estimates of the i-vector distributions. We therefore propose a new PLDA model which takes into account the i-vector uncertainty.  Since utterance length is the main factor affecting i-vector covariances, we designed a set of experiments to compare the proposed model and the classical PLDA model over segments with short and mismatched durations. The results show that the proposed model improves accuracy on short segments while retaining the accuracy of the original PLDA over long utterances.

December 7, 2012 | 12:30 p.m.

“Patrol Team Speaker Identification System for DARPA RATS Evaluation”   Video Available

Oldrich Plchot, Brno University of Technology

[abstract]

Abstract

I will describe the speaker identification (SID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded communication channels. I will describe the general architecture of the system and I will address the issues we are facing in the RATS project. We will also discuss the strategy for the next evaluation and areas where the system can be improved.

November 27, 2012 | 12:00 p.m.

“Bridging the Gap: From Sounds to Words”

Micha Elsner, Ohio State University

[abstract] [biography]

Abstract

During early language acquisition, infants must learn both a lexicon and a model of phonetics that explains how lexical items can vary in pronunciation -- for instance "you" might be realized as 'you' with a full vowel or reduced to 'yeh' with a schwa. Previous models of acquisition have generally tackled these problems in isolation, yet behavioral evidence suggests infants acquire lexical and phonetic knowledge simultaneously. I will present ongoing research on constructing a Bayesian model which can simultaneously group together phonetic variants of the same lexical item, learn a probabilistic language model predicting the next word in an utterance from its context, and learn a model of pronunciation variability based on articulatory features. I will discuss a model which takes word boundaries as given and focuses on clustering the lexical items (published at ACL 2012). I will also give preliminary results for a model which searches for word boundaries at the same time as performing the clustering.

Speaker Biography

Micha Elsner is an Assistant Professor of Linguistics at the Ohio State University, where he started in August. He completed his PhD in 2011 at Brown University, working on models of local coherence. He then worked on Bayesian models of language acquisition as a postdoctoral researcher at the University of Edinburgh.

November 20, 2012 | 12:00 p.m.

“Advances in Deterministic Dependency Parsing”   Video Available

Yoav Goldberg, Google Research

[abstract] [biography]

Abstract

Transition-based dependency parsers are fast, surprisingly accurate and easy to implement. However, many formal aspects of these parsing systems are not well understood. Specifically, little can be said about the effect of individual parsing decisions on the global parse structure. We help bridge this gap by introducing a property which holds for many transition systems (including the popular arc-eager system) and allows us to reason about the global effects of individual parsing actions in these systems. This kind of reasoning paves the path to many interesting applications. I will describe two immediate applications: (1) a novel arc-constrained decoding algorithm ("find a tree that includes the following edges") for transition-based parsers, and (2) a novel oracle which can return a *set* of optimal actions for *any* (configuration, gold-tree) pair, in sharp contrast to traditional oracles that return a single, static sequence of transitions. The new oracle allows for a better training procedure which teaches the parser to respond optimally to non-optimal configurations and helps mitigate error-propagation mistakes. The new oracle and training procedure produce greedy parsers that greatly outperform parsers trained with the traditional, static oracles on a wide range of datasets. This is joint work with Joakim Nivre.
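
For readers unfamiliar with the arc-eager system the talk builds on, here is a stripped-down, unlabeled sketch of its four actions (preconditions and scoring are omitted, and the toy policy below is only for illustration -- it is not the dynamic oracle described above):

```python
# Minimal, unlabeled arc-eager transition system over word indices (0 is the root).
def parse(n_words, choose_action):
    stack, buffer, arcs = [0], list(range(1, n_words + 1)), set()
    while buffer:
        action = choose_action(stack, buffer, arcs)
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "RIGHT-ARC":          # head = stack top, dependent = buffer front
            arcs.add((stack[-1], buffer[0]))
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":           # head = buffer front, dependent = stack top
            arcs.add((buffer[0], stack.pop()))
        elif action == "REDUCE":             # pop a stack item that already has a head
            stack.pop()
    return arcs

# Toy policy for a 3-word sentence whose gold (head, dependent) arcs are given below.
gold = {(0, 2), (2, 1), (2, 3)}
def toy_policy(stack, buffer, arcs):
    if (buffer[0], stack[-1]) in gold:
        return "LEFT-ARC"
    if (stack[-1], buffer[0]) in gold:
        return "RIGHT-ARC"
    if any(h == stack[-1] for (h, d) in gold if d in buffer):
        return "SHIFT"
    return "REDUCE"

print(parse(3, toy_policy))   # recovers the gold arcs {(0, 2), (2, 1), (2, 3)}
```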

Speaker Biography

Yoav Goldberg is a post-doctoral researcher at Google Research NY, working primarily on syntactic parsing and its applications. Prior to that, he completed his PhD at Ben Gurion University, where he worked with Prof. Michael Elhadad on automatic processing of Modern Hebrew, a specimen of a morphologically rich language. He spent a summer at USC/ISI working on Machine Translation with Kevin Knight, David Chiang and Liang Huang. This coming February, Yoav will leave Google to assume a tenure-track senior-lecturer position ("assistant professorship") in Bar Ilan University's Computer Science Department.

November 13, 2012 | 12:00 p.m.

“From Bases to Exemplars, and From Separation to Understanding”   Video Available

Paris Smaragdis, University of Illinois at Urbana-Champaign

[abstract] [biography]

Abstract

Audio source separation is an extremely useful process but most of the time not a goal by itself. Even though most research focuses on better separation quality, ultimately separation is needed so that we can perform tasks such as noisy speech recognition, music analysis, single-source editing, etc.  In this talk I'll present some recent work on audio source separation that extends the idea of basis functions to that of using 'exemplars' and then builds off that idea in order to provide direct computation of some of the above goals without having to resort to an intermediate separation step. In order to do so I'll discuss some of the interesting geometric properties of mixed audio signals and how one can employ massively large decompositions with aggressive sparsity settings in order to achieve the above results.
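
One way to picture the exemplar idea (a hedged sketch in non-negative matrix terms, not necessarily the speaker's formulation, with toy random data) is to fix a dictionary of exemplar spectra and estimate only their activations for a mixture:

```python
# Fixed exemplar dictionary + non-negative activation estimation for a mixture.
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_exemplars, n_frames = 257, 50, 120

W = np.abs(rng.normal(size=(n_freq, 2 * n_exemplars)))   # columns = exemplar spectra
                                                          # (first half "speaker A", second half "speaker B")
V = np.abs(rng.normal(size=(n_freq, n_frames)))           # mixture magnitude spectrogram (toy)
H = np.abs(rng.normal(size=(2 * n_exemplars, n_frames)))  # activations to be estimated

eps = 1e-9
for _ in range(200):
    # Multiplicative update minimizing ||V - W H||_F^2 with W held fixed.
    H *= (W.T @ V) / (W.T @ W @ H + eps)

# A Wiener-like mask built from each source's exemplars gives the "separated" parts.
approx_A = W[:, :n_exemplars] @ H[:n_exemplars]
approx_B = W[:, n_exemplars:] @ H[n_exemplars:]
source_A = V * approx_A / (approx_A + approx_B + eps)
print(source_A.shape)   # (257, 120)
```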

Speaker Biography

Paris Smaragdis is faculty in the Computer Science and the Electrical and Computer Engineering departments at the University of Illinois at Urbana-Champaign. He completed his graduate and postdoctoral studies at MIT, where he conducted research on computational perception and audio processing. Prior to the University of Illinois he was a senior research scientist at Adobe Systems and a research scientist at Mitsubishi Electric Research Labs, during which time he was selected by the MIT Technology Review as one of the top 35 young innovators of 2006. Paris' research interests lie in the intersection of machine learning and signal processing, especially as they apply to audio problems.

November 6, 2012 | 12:00 p.m.

“New Machine Learning Tools for Structured Prediction”   Video Available

Veselin Stoyanov, Johns Hopkins HLTCOE

[abstract] [biography]

Abstract

I am motivated by structured prediction problems in NLP and social network analysis. Markov Random Fields (MRFs) and other Probabilistic Graphical Models (PGMs) are suitable for representing structured prediction: they can model joint distributions and utilize standard inference procedures. MRFs also provide a principled way for incorporating background knowledge and combining multiple systems. Two properties of structured prediction problems make learning challenging. First, structured prediction almost inevitably requires approximations to inference, decoding or model structure. Second, unlike the traditional ML setting that assumes i.i.d. training and test data, structured learning problems often consist of a single example used both for training and prediction. We address the two issues above. First, we argue that the presence of approximations in MRF-based systems requires a novel perspective on training. Instead of maximizing data likelihood, one should seek the parameters that minimize the empirical risk of the entire imperfect system. We show how to locally optimize this risk using error back-propagation and local optimization. On four NLP problems our approach significantly reduces loss on test data compared to choosing approximate MAP parameters. Second, we utilize data imputation in the limited data setting. At test time we use sampling to impute data that is a more accurate approximation of the data distribution. We use our risk minimization techniques to train fast discriminative models on the imputed data. Thus we can: (i) train discriminative models given a single training and test example; (ii) train generative/discriminative hybrids that can incorporate useful priors and learn from semi-supervised data.

Speaker Biography

Veselin Stoyanov is a postdoctoral researcher at the Human Language Technology Center of Excellence (HLT-COE) at Johns Hopkins University (JHU). Previously he spent two years working with Prof. Jason Eisner at JHU's Center for Language and Speech Processing supported by a Computing Innovation Postdoctoral Fellowship. He received the Ph.D. degree from Cornell University under the supervision of Prof. Claire Cardie in 2009 and the Honors B.Sc. from the University of Delaware in 2002. His research interests reside in the intersection of Machine Learning and Computational Linguistics. More precisely, he is interested in using probabilistic models for complex structured problems with applications to knowledge base population, modeling social networks, extracting information from text and coreference resolution. In addition to the CIFellowship, Ves Stoyanov is the recipient of an NSF Graduate Research Fellowship and other academic honors.

October 23, 2012 | 12:00 p.m.

“New Waves of Innovation in Large-Scale Speech Technology Ignited by Deep Learning”   Video Available

Li Deng, Microsoft Research

[abstract] [biography]

Abstract

Semantic information embedded in the speech signal manifests itself in a dynamic process rooted in the deep linguistic hierarchy as an intrinsic part of the human cognitive system. Modeling both the dynamic process and the deep structure for advancing speech technology has been an active pursuit for more than 20 years, but it is only within the past two years that a technological breakthrough has been achieved by a methodology commonly referred to as "deep learning". Deep Belief Nets (DBNs) and related deep neural nets have recently been used to supersede the Gaussian mixture model component in HMM-based speech recognition, and have produced dramatic error rate reductions in both phone recognition and large vocabulary speech recognition of industry scale while keeping the HMM component intact. On the other hand, (constrained) Dynamic Bayesian Networks have been developed for many years to improve the dynamic models of speech, aiming to overcome the IID assumption as a key weakness of the HMM, with a set of techniques commonly known as hidden dynamic/trajectory models or articulatory-like segmental representations. A history of these two largely separate lines of research will be critically reviewed and analyzed in the context of modeling the deep and dynamic linguistic hierarchy for advancing speech recognition technology. The first wave of innovation has successfully unseated the Gaussian mixture model and MFCC-like features --- two of the three main pillars of the 20-year-old technology in speech recognition. Future directions will be discussed and analyzed for supplanting the final pillar --- the HMM --- where frame-level scores are to be enhanced to dynamic-segment scores through new waves of innovation capitalizing on multiple lines of research that have enriched our knowledge of the deep, dynamic process of human speech.

Speaker Biography

Li Deng received the Ph.D. from Univ. Wisconsin-Madison. He was an Assistant (1989-1992), Associate (1992-1996), and Full Professor (1996-1999) at the University of Waterloo, Ontario, Canada. He then joined Microsoft Research, Redmond, where he is currently a Principal Researcher and where he received Microsoft Research Technology Transfer, Goldstar, and Achievement Awards. Prior to MSR, he also worked or taught at Massachusetts Institute of Technology, ATR Interpreting Telecom. Research Lab. (Kyoto, Japan), and HKUST. He has published over 300 refereed papers in leading journals/conferences and 3 books covering broad areas of human language technology and machine learning. He is a Fellow of the Acoustical Society of America, a Fellow of the IEEE, and a Fellow of the International Speech Communication Association. He is an inventor or co-inventor of over 50 granted US, Japanese, or international patents. Recently, he served as Editor-in-Chief for IEEE Signal Processing Magazine (2009-2011), which ranked first in year 2010 and 2011 among all 247 publications within the Electrical and Electronics Engineering Category worldwide in terms of its impact factor, and for which he received the 2011 IEEE SPS Meritorious Service Award. He currently serves as Editor-in-Chief for IEEE Transactions on Audio, Speech and Language Processing. His technical work over the past three years brought the power of deep learning into the speech recognition and signal processing fields.

October 9, 2012 | 12:00 p.m.

“Beyond MaltParser - Recent Advances in Transition-Based Dependency Parsing”   Video Available

Joakim Nivre, Uppsala University

[abstract] [biography]

Abstract

The transition-based approach to dependency parsing has become popular thanks to its simplicity and efficiency. Systems like MaltParser achieve linear-time parsing with projective dependency trees using locally trained classifiers to predict the next parsing action and greedy best-first search to retrieve the optimal parse tree, assuming that the input sentence has been morphologically disambiguated using a part-of-speech tagger. In this talk, I survey recent developments in transition-based dependency parsing that address some of the limitations of the basic transition-based approach. First, I show how globally trained classifiers and beam search can be used to mitigate error propagation and enable richer feature representations. Secondly, I discuss different methods for extending the coverage to non-projective trees, which are required for linguistic adequacy in many languages. Finally, I present a model for joint tagging and parsing that leads to improvements in both tagging and parsing accuracy as compared to the standard pipeline approach.

Speaker Biography

Joakim Nivre is Professor of Computational Linguistics at Uppsala University. He holds a Ph.D. in General Linguistics from the University of Gothenburg and a Ph.D. in Computer Science from Växjö University. Joakim's research focuses on data-driven methods for natural language processing, in particular for syntactic and semantic analysis. He is one of the main developers of the transition-based approach to syntactic dependency parsing, described in his 2006 book Inductive Dependency Parsing and implemented in the MaltParser system. Joakim's current research interests include the analysis of mildly non-projective dependency structures, the integration of morphological and syntactic processing for richly inflected languages, and methods for cross-framework parser evaluation. He has produced over 150 scientific publications, including 3 books, and has given nearly 70 invited talks at conferences and institutions around the world. He is the current secretary of the European Chapter of the Association for Computational Linguistics.

October 2, 2012 | 12:00 p.m.

“Making Computers Good Listeners”   Video Available

Joseph Keshet, TTI Chicago

[abstract] [biography]

Abstract

A typical problem in speech and language processing has a very large number of training examples, is sequential, highly structured, and has a unique measure of performance, such as the word error rate in speech recognition, or the BLEU score in machine translation. The simple binary classification problem typically explored in machine learning is no longer adequate for the complex decision problems encountered in speech and language applications. Binary classifiers cannot handle the sequential nature of these problems, and are designed to minimize the zero-one loss, i.e., correct or incorrect, rather than the desired measure of performance. In addition, the current state-of-the-art models in speech and language processing are generative models that capture some temporal dependencies, such as Hidden Markov Models (HMMs). While such models have been immensely important in the development of accurate large-scale speech processing applications, and in speech recognition in particular, theoretical and experimental evidence has led to a widespread belief that such models have nearly reached a performance ceiling. In this talk, I first present a new theorem stating that a general learning update rule directly corresponds to the gradient of the desired measure of performance. I present a new algorithm for phoneme-to-speech alignment based on this update rule, which surpasses all previously reported results on a standard benchmark. I show a generalization of the theorem to training non-linear models such as HMMs, and present empirical results on a phoneme recognition task which surpass results from HMMs trained with all other training techniques. I will then present the problem of automatic voice onset time (VOT) measurement, one of the most important variables measured in phonetic research and medical speech analysis. I will present a learning algorithm for VOT measurement which outperforms previous work and performs near human inter-judge reliability. I will discuss the algorithm’s implications for tele-monitoring of Parkinson’s disease, and for predicting the effectiveness of chemo-radiotherapy treatment of head and neck cancer.

Speaker Biography

Joseph Keshet received his B.Sc. and M.Sc. degrees in Electrical Engineering in 1994 and 2002, respectively, from Tel Aviv University. He received his Ph.D. in Computer Science from The School of Computer Science and Engineering at The Hebrew University of Jerusalem in 2007. From 1995 to 2002 he was a researcher at IDF, and won the prestigious Israeli award, "Israel Defense Prize", for outstanding research and development achievements. From 2007 to 2009 he was a post-doctoral researcher at IDIAP Research Institute in Switzerland. Since 2009 he has been a research assistant professor at TTI-Chicago, a philanthropically endowed academic computer science institute on the campus of the University of Chicago. Dr. Keshet's research interests are in speech and language processing and machine learning. His current research focuses on the design, analysis and implementation of machine learning algorithms for the domain of speech and language processing.

September 28, 2012 | 12:00 p.m.

“Constrained Conditional Models: Integer Linear Programming Formulations for Natural Language Understanding”   Video Available

Dan Roth, University of Illinois at Urbana-Champaign

[abstract] [biography]

Abstract

Computational approaches to problems in Natural Language Understanding and Information Access and Extraction often involve assigning values to sets of interdependent variables.  Examples of tasks of interest include semantic role labeling (analyzing natural language text at the level of “who did what to whom, when and where”), syntactic parsing, information extraction (identifying events, entities and relations), transliteration of names, and textual entailment (determining whether one utterance is a likely consequence of another).  Over the last few years, one of the most successful approaches to studying these problems involves Constrained Conditional Models (CCMs), an Integer Linear Programming formulation that augments probabilistic models with declarative constraints as a way to support such decisions.   I will present research within this framework, discussing old and new results pertaining to inference issues, learning algorithms for training these global models, and the interaction between learning and inference.
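
A minimal illustration of the CCM recipe -- local model scores combined with declarative constraints at inference time -- is sketched below; brute-force search stands in for the ILP solver, and the scores and constraints are invented for the example:

```python
# Constrained inference: maximize the sum of local scores subject to declarative constraints.
from itertools import product

tokens = ["John", "gave", "Mary", "a", "book"]
labels = ["ARG0", "V", "ARG1", "O"]

# Local scores a learned model might assign, one row per token (made up).
local_scores = [
    {"ARG0": 2.0, "V": 0.1, "ARG1": 1.5, "O": 0.2},   # John
    {"ARG0": 0.1, "V": 2.5, "ARG1": 0.1, "O": 0.3},   # gave
    {"ARG0": 1.4, "V": 0.1, "ARG1": 1.6, "O": 0.2},   # Mary
    {"ARG0": 0.1, "V": 0.1, "ARG1": 0.3, "O": 1.8},   # a
    {"ARG0": 0.2, "V": 0.1, "ARG1": 1.7, "O": 0.9},   # book
]

def satisfies_constraints(assignment):
    # Declarative knowledge stated independently of the learned model.
    return (assignment.count("V") == 1            # exactly one predicate
            and assignment.count("ARG0") <= 1     # at most one agent
            and assignment.count("ARG1") <= 1)    # at most one patient

best = max(
    (a for a in product(labels, repeat=len(tokens)) if satisfies_constraints(a)),
    key=lambda a: sum(s[l] for s, l in zip(local_scores, a)),
)
print(dict(zip(tokens, best)))
# The constraints rule out the locally tempting labeling with two ARG1 arguments.
```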

Speaker Biography

Dan Roth is a Professor in the Department of Computer Science and the Beckman Institute at the University of Illinois at Urbana-Champaign and a University of Illinois Scholar. He is the director of a DHS Center for Multimodal Information Access & Synthesis (MIAS) and holds faculty positions in Statistics, Linguistics and at the School of Library and Information Sciences. Roth is a Fellow of the ACM and of AAAI for his contributions to Machine Learning and to Natural Language Processing. He has published broadly in machine learning, natural language processing, knowledge representation and reasoning, and learning theory, and has developed advanced machine learning based tools for natural language applications that are being used widely by the research community. Prof. Roth has given keynote talks in major conferences, including AAAI, EMNLP and ECML and presented several tutorials in universities and major conferences. Roth was the program chair of AAAI’11, ACL’03 and CoNLL'02, has been on the editorial board of several journals in his research areas and has won several teaching and paper awards.  Prof. Roth received his B.A Summa cum laude in Mathematics from the Technion, Israel, and his Ph.D in Computer Science from Harvard University in 1995.  

September 14, 2012 | 12:00 p.m.

“OUCH (Outing Unfortunate Characteristics of Hidden Markov Models) or What's Wrong with Speech Recognition and What Can We Do About it?”   Video Available

Jordan Cohen, Spelamode

[abstract] [biography]

Abstract

Speech recognition has become a critical part of the user interface in mobile, telephone, and other technology applications. However, current recognition systems consistently underperform their users' and designers' expectations. This talk reports on a project, OUCH, which investigates one aspect of the most commonly used speech recognition algorithms. In most Hidden Markov Model implementations, frame-to-frame independence is assumed by the model, but in fact the frame observations are not independent. This mismatch between the model assumptions and the data has been well known. Following work of Gillick and Wegmann, the OUCH project is measuring and cataloging some of the implications of these assumptions, using a procedure which does not fix the model, but rather which creates speech data which satisfies the model assumptions. (See "Don't Multiply Lightly: Quantifying Problems with the Acoustic Model Assumptions in Speech Recognition," Dan Gillick, Larry Gillick, and Steven Wegmann, ASRU, 2011.) In addition to our work in modeling, we are surveying the field using a snowball technique to document how the researchers and engineers in speech and language technology view the current situation. This talk will review our modeling findings to date, and will offer a preliminary look at our survey.
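
The flavor of the simulation approach can be sketched as follows (toy HMM parameters, not the OUCH project's actual procedure): data generated from the model itself satisfies the frame-independence assumption by construction, which is exactly what real speech does not do:

```python
# Generate pseudo-speech frames from a toy HMM so the model's assumptions hold exactly.
import numpy as np

rng = np.random.default_rng(0)
n_states, dim, n_frames = 3, 13, 200

trans = np.array([[0.8, 0.2, 0.0],        # left-to-right toy transition matrix
                  [0.0, 0.8, 0.2],
                  [0.0, 0.0, 1.0]])
means = rng.normal(size=(n_states, dim))  # per-state Gaussian means
stds = np.ones((n_states, dim))           # per-state Gaussian standard deviations

states, frames = [0], []
for _ in range(n_frames):
    s = states[-1]
    # Each frame is drawn independently given the current state -- the very
    # assumption that real, temporally correlated speech frames violate.
    frames.append(rng.normal(means[s], stds[s]))
    states.append(rng.choice(n_states, p=trans[s]))

pseudo_speech = np.stack(frames)          # (200, 13): model-faithful pseudo data
print(pseudo_speech.shape)
```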

Speaker Biography

Jordan Cohen is a group leader in the OUCH project at Berkeley, and founder and technologist at Spelamode Consulting. He was the principal investigator for GALE at SRI, the CTO of Voice Signal Technologies, the Director of Business Relations at Dragon, and a member of the research staff at IDA and IBM. Dr. Cohen assists companies with technology issues, and he is engaged in intellectual property evaluation and litigation.

September 11, 2012 | 12:00 p.m.

“Weak and Strong Learning of Context-Free Grammars”   Video Available

Alexander Clark, Royal Holloway University of London

[abstract] [biography]

Abstract

Rapid progress has been made in the last few years in the 'unsupervised' learning of context-free grammars using distributional techniques: a core challenge for theoretical linguistics and NLP. However these techniques are on their own of limited value because they are merely weak results -- we learn a grammar that generates the right strings, but not necessarily a grammar that defines the right structures. In this talk we will look at various ways of moving from weak learning algorithms to strong algorithms that can provably learn also the correct structures. Of course in order to do this we need to define a mathematically precise notion of syntactic structure. We will present a new theoretical approach to this based on considering transformations of grammars through morphisms of algebraic structures that interpret grammars. Under this model we can say that the simplest/smallest grammar for a language will always use a certain set of syntactic categories, and a certain set of lexical categories; these categories will be drawn from the syntactic concept lattice, a basis for several weak learning algorithms for CFGs. This means that under mild Bayesian assumptions we can consider only grammars that use these categories; this leads to some nontrivial predictions about the nature of syntactic structure in natural languages.

Speaker Biography

Alexander Clark is in the Department of Computer Science at Royal Holloway, University of London. His research interests are in grammatical inference, theoretical and mathematical linguistics and unsupervised learning. He is currently president of SIGNLL and chair of the steering committee of the ICGI; a book coauthored with Shalom Lappin, 'Linguistic Nativism and the Poverty of the Stimulus' was published by Wiley-Blackwell in 2011.

September 4, 2012 | 12:00 p.m.

“How Geometric Should Our Semantic Models Be?”   Video Available

Katrin Erk, University of Texas

[abstract] [biography]

Abstract

Presentation Slides

Vector space models represent the meaning of a word through the contexts in which it has been observed. Each word becomes a point in a high-dimensional space in which the dimensions stand for observed context items. One advantage of these models is that they can be acquired from corpora in an unsupervised fashion. Another advantage is that they can represent word meaning in context flexibly and without recourse to dictionary senses: Each occurrence gets its own point in space; the points for different occurrences may cluster into senses, but they do not have to. Recently, there have been a number of approaches aiming to extend the vector space success story from word representations to the representation of whole sentences. However, they have a lot of technical challenges to meet (apart from the open question of whether all semantics tasks can be reduced to similarity judgments). An alternative is to combine the depth and rigor of logical form with the flexibility of vector space approaches.
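
As a minimal sketch of the distributional setup described above (toy corpus and raw co-occurrence counts; real models use much larger corpora and weighting schemes such as PMI):

```python
# Word-context co-occurrence vectors and cosine similarity on a toy corpus.
import numpy as np
from collections import Counter, defaultdict

corpus = [
    "the cat chased the mouse", "the dog chased the cat",
    "the judge heard the case", "the lawyer argued the case",
]  # invented toy corpus

window = 2
counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                counts[w][words[j]] += 1        # context word within the window

vocab = sorted({w for s in corpus for w in s.split()})

def vector(word):
    return np.array([counts[word][c] for c in vocab], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(vector("cat"), vector("dog")))     # relatively similar contexts
print(cosine(vector("cat"), vector("lawyer")))  # less similar contexts
```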

Speaker Biography

Katrin Erk is an associate professor in the Department of Linguistics at the University of Texas at Austin. She completed her dissertation on tree description languages and ellipsis at Saarland University in 2002. From 2002 to 2006, she held a researcher position at Saarland University, working on manual and automatic frame-semantic analysis. Her current research focuses on computational models for word meaning and the automatic acquisition of lexical information from text corpora.

July 25, 2012 | 10:30 a.m. to Noon

“How Does the Brain Solve Visual Object Recognition”   Video Available

James DiCarlo, McGovern Institute for Brain Research at MIT

[abstract] [biography]

Abstract

Visual object recognition is a fundamental building block of memory and cognition, but remains a central unsolved problem in systems neuroscience, human psychophysics, and computer vision (engineering). The computational crux of visual object recognition is that the recognition system must somehow be robust to tremendous image variation produced by different “views” of each object -- the so-called, “invariance problem.” The primate brain is an example of a powerful recognition system and my laboratory aims to understand and emulate its solution to this problem. A key step in isolating and constraining the brain’s solution is to first find the patterns of neuronal activity and ways to read that neuronal activity that quantitatively express the brain’s answer to visual recognition. To that end, we have previously shown that a part of the primate ventral visual stream (inferior temporal cortex, IT) rapidly and automatically conveys neuronal population rate codes that qualitatively solve the invariance problem for vision. While this is a good start, it only weakly constrains the brain’s solution. Thus, we have recently set the bar higher -- are such codes quantitatively sufficient to explain behavioral performance? In this talk, I will show how primate systems neuroscience combined with human psychophysics reveals that some (but not all) IT population codes are sufficient to explain human performance on invariant object recognition. This stands in stark contrast to all tested codes in earlier visual areas and computer vision codes, which are all insufficient (falsified by experimental data). These results argue that these rapidly and automatically computed IT population codes are common to primate brains, and that they are the direct substrate of object recognition performance. While this progress constrains and frames the kinds of algorithms we should be searching for in the primate brain, it does not directly reveal their key principles of image encoding or the myriad key “details” of that encoding. While this remains an area of active research, I will conclude by outlining how we aim to combine our experimental results in unsupervised learning with novel computer vision technology to guide us toward discovery of the true underlying cortical algorithm.  

Speaker Biography

DiCarlo joined the McGovern Institute in 2002, and is an associate professor in the Department of Brain and Cognitive Sciences. He received his Ph.D. and M.D. from Johns Hopkins University and did postdoctoral work at Baylor College of Medicine. In 1998, he received the Martin and Carol Macht Young Investigator Research Prize from Johns Hopkins University. In 2002, he received an Alfred P. Sloan Research fellowship and a Pew Scholar Award. He received MIT's Surdna Research Foundation Award and its School of Science Prize for Excellence in Undergraduate Teaching in 2005, and he won a Neuroscience Scholar Award from the McKnight Foundation in 2006.

July 16, 2012

“Clustering Techniques for Phonetic Categories and Their Implications for Phonology”

William Idsardi, University of Maryland

[abstract]

Abstract

I will review some recent work in collaboration with Ewan Dunbar and Brian Dillon on the use of unsupervised clustering techniques to discover vowel categories. The novel and important point of this work is to try to discover categories with predictable variants, i.e. phonemes with their related allophones. We achieve this by finding categories and transforms on the categories rather than first finding a larger set of more detailed categories (phones) and then later grouping the induced categories into more abstract categories (phonemes). A similar approach can be used to cluster "higher-order invariants" for consonants, in this case locus equations. Finally, we will examine some of the implications of this work for other problems in phonology such as speaker variation and incomplete neutralization.  
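
A hedged sketch of the general clustering setup (a plain Gaussian mixture over synthetic F1/F2 values; the work described additionally learns transforms relating allophonic variants to their phoneme categories):

```python
# Unsupervised clustering of vowel tokens in formant space with a Gaussian mixture.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Three synthetic vowel categories with rough (F1, F2) centers in Hz.
centers = np.array([[300, 2300],   # /i/-like
                    [700, 1200],   # /a/-like
                    [300,  800]])  # /u/-like
tokens = np.vstack([c + rng.normal(scale=[40, 120], size=(200, 2)) for c in centers])

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
labels = gmm.fit_predict(tokens)

print(np.round(gmm.means_))   # recovered category centers
print(np.bincount(labels))    # number of tokens assigned to each induced category
```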

July 11, 2012 | 10:30 - Noon

“Motion Magnification and Motion Denoising”   Video Available

William T. Freeman, Massachusetts Institute of Technology

[abstract] [biography]

Abstract

I'll present two topics relating to the analysis and re-display of motion: (1) Motion denoising: We'd like to take a video sequence and break it into different components, corresponding to each different physical process observed in the video sequence. (Then you could modify each component separately and re-combine them.) Here's a first step in that direction: we separate a video sequence into its short-term (motion noise) and longer-term components. The machinery behind this is an MRF. Motion is never explicitly computed, which lets us manipulate sequences where occlusion artifacts would otherwise cause problems. (2) Motion magnification: We've developed a new, simple and fast way to magnify and re-render small motions in videos. It has complementary strengths to the SIGGRAPH 2005 work of Liu et al., and is 100,000 times faster, making a real-time motion microscope practical.
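
As a rough illustration of the temporal-filtering idea behind motion magnification (not the specific algorithm presented in the talk), the sketch below band-pass filters one synthetic pixel's brightness over time and adds back an amplified copy of the filtered signal; all signal parameters are invented.

    # Minimal sketch of motion magnification by temporal filtering: isolate the
    # subtle temporal variation of a pixel, amplify it, and add it back.
    import numpy as np
    from scipy.signal import butter, filtfilt

    fps = 30.0
    t = np.arange(0, 10, 1.0 / fps)
    rng = np.random.default_rng(0)
    # Synthetic "video": one pixel whose brightness wobbles very slightly at 2 Hz.
    pixel = 0.5 + 0.002 * np.sin(2 * np.pi * 2.0 * t) + rng.normal(0, 0.001, t.size)

    # Band-pass around the motion frequency of interest (1-3 Hz here).
    b, a = butter(2, [1.0 / (fps / 2), 3.0 / (fps / 2)], btype="band")
    subtle = filtfilt(b, a, pixel)

    alpha = 50.0                       # magnification factor
    magnified = pixel + alpha * subtle
    print(f"original peak-to-peak:  {np.ptp(pixel):.4f}")
    print(f"magnified peak-to-peak: {np.ptp(magnified):.4f}")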

Speaker Biography

William T. Freeman is Professor of Electrical Engineering and Computer Science at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT, having joined the faculty in 2001. From 1992 to 2001 he worked at Mitsubishi Electric Research Labs (MERL) in Cambridge, MA, most recently as Sr. Research Scientist and Associate Director. He received his PhD from the Massachusetts Institute of Technology in 1992 for work in computer vision, a BS in physics and an MS in electrical engineering from Stanford in 1979, and an MS in applied physics from Cornell in 1981. His current research interests include machine learning applied to computer vision, Bayesian models of visual perception, and computational photography. He received outstanding paper awards at computer vision or machine learning conferences in 1997, 2006 and 2009. Previous research topics include steerable filters and pyramids, the generic viewpoint assumption, color constancy, computer vision for computer games, and bilinear models for separating style and content. He holds 30 patents. From 1981 to 1987, he worked at the Polaroid Corporation. There he co-developed an electronic printer (Polaroid Palette) and developed algorithms for color image reconstruction which are used in Polaroid's electronic camera. In 1987-88, Dr. Freeman was a Foreign Expert at the Taiyuan University of Technology, P. R. of China. Dr. Freeman was an Associate Editor of IEEE Trans. on Pattern Analysis and Machine Intelligence (IEEE-PAMI), and a member of the IEEE PAMI TC Awards Committee. He is active in the program or organizing committees of Computer Vision and Pattern Recognition (CVPR), the International Conference on Computer Vision (ICCV), Neural Information Processing Systems (NIPS), and SIGGRAPH. He was the program co-chair for ICCV 2005, and will be program co-chair for CVPR 2013.

July 3, 2012 | 10:30am

“Under Pressure: Transforming the Way We Think About and Use Water in the Home”   Video Available

Jon Froehlich, University of Maryland, College Park

[abstract] [biography]

Abstract

Cities across the world are facing an escalating demand for potable water and sanitation infrastructure due to growing populations, higher population densities and warmer climates. According to the United Nations, this is one of the most pressing issues of the century. As new sources of water become more environmentally and economically costly to extract, water suppliers and governments are shifting their focus from finding new supplies to using existing supplies more efficiently. One challenge in improving residential efficiency, however, is the lack of awareness that occupants have about their in-home water consumption habits. This disconnect makes it difficult, even for motivated individuals, to make informed decisions about what steps can be taken to conserve. To help address this problem, my research focuses on creating new types of sensors to monitor and infer everyday human activity such as driving to work or taking a shower, then feeding back this sensed information in novel, engaging, and informative ways with the goal of increasing awareness and promoting environmentally responsible behavior. In this talk, I will present HydroSense, a novel, low-cost, and easy-to-install water sensing system that infers usage data at the level of individual water fixtures from a single sensing point, and Reflect2O, a real-time ambient water usage feedback display that leverages HydroSense’s data granularity to inform and promote efficient water usage practices in the home. My talk will emphasize the sensor and inference algorithm development, our two in-home evaluations, and our preliminary evaluations of our feedback visualization designs. Our goal is to reach a 15-20% reduction in water use amongst deployed homes, which, according to the American Water Works Association, would save approximately 2.7 billion gallons per day and more than $2 billion per year.

Speaker Biography

Jon Froehlich is an Assistant Professor in the Department of Computer Science at the University of Maryland, College Park and a member of the Human-Computer Interaction Laboratory (HCIL) and the Institute for Advanced Computer Studies (UMIACS). His research focuses on building and studying interactive technology that addresses high value social issues such as environmental sustainability, computer accessibility, and personal health and wellness. Jon earned his PhD from the University of Washington (UW) in Computer Science in 2011 with a focus on Human-Computer Interaction (HCI) and Ubiquitous Computing (UbiComp). For his doctoral research, Jon was recognized with the Microsoft Research Graduate Fellowship (2008-2010) and the College of Engineering Graduate Student Research Innovator of the Year Award (2010). His work has been published in many top-tier academic venues including CHI, UbiComp, IJCAI, MobiSys and ICSE and has earned a best paper award and two best paper nominations. Jon received his MS in Information and Computer Science in 2004 from the University of California, Irvine.

May 4, 2012 | noon

“I know that voice: an interactive lecture-demonstration of human assisted speaker recognition”   Video Available

John J. Godfrey and Craig S. Greenberg, Department of Defense/ National Institute of Standards and Technology

[abstract] [biography]

Abstract

As we heard from our recent seminar guest Diana Sidtis, the ability to recognize other human voices, but most especially those of our family and close associates, has deep biological roots and an interesting neurological basis, including a sharp difference between familiar and unfamiliar voices. Computers make no such distinction. While we have made enormous progress in enabling computers to recognize voices, we have not paid much attention to how humans do it. We should – we need to know both the limits and the special capabilities of humans, both to improve our modeling and to enable computers to work hand in hand with humans in practical applications like forensics and biometrics. So how good are humans at utilizing automatic speaker recognition technology for performing speaker verification tasks? Don’t believe what you see on CSI or in the papers! And keep an eye on the case surrounding the tragic death of Trayvon Martin in Florida, which is likely to involve such matters. The 2010 NIST Speaker Recognition Evaluation (SRE10) included a test of Human Assisted Speaker Recognition (HASR) in which systems based in whole or in part on human expertise were evaluated on limited sets of trials. Results were submitted for 20 systems from 15 sites in 6 countries. The performance results suggest that the chosen trials were indeed difficult, as is often the case in real-life situations, and that the HASR systems did not appear to perform as well as the best fully automatic systems on these trials. This does not mean that machines are simply, always, everywhere “better” than people at speaker recognition. But what does it mean? This is worth discussing. This lecture-demonstration will provide a live, interactive speaker recognition exercise for the audience, giving everyone a firsthand experience of the task a human forensic examiner often faces. Prepared with such experience, the audience will then hear the highlights of the NIST HASR evaluations.

Speaker Biography

John Godfrey received his PhD in Linguistics from Georgetown University, did a postdoc at AMRL in Dayton, and spent 10 years at UT-Dallas’ Callier Center as an Assist./Assoc. Professor, where he focused on speech perception and psycholinguistics. He later joined the Texas Instruments Speech Research Group where, in addition to phonetics research, he worked on corpus-based evaluation, designing and collecting corpora such as Wall Street Journal, TIMIT, ATIS, and SWITCHBOARD. It is widely acknowledged that these helped drive speech research for the next decade and more. He also served as the first Executive Director of the LDC, creating the infrastructure that has supported evaluation-based “big data” research in HLT ever since. In 1999 he became Chief of HLT Research at NSA, where he oversaw both government and external R&D efforts in Speaker, Language and Speech Recognition, as well as the annual NIST evaluations in these areas. His strategic responsibilities also included liaison with academic and industrial labs, DARPA, IARPA, and NSF. In recent years his research group’s success on classified applications has become widely known and demonstrated in the Intelligence Community. They won the NSA Research Team of the Year award in 2010. As HLT Chief Scientist for NSA Research, he also conducts and oversees research in speaker recognition by man and machine.

Craig Greenberg received his B.A. (Hons.) degree in Logic, Information, & Computation from the University of Pennsylvania (2007), and his B.M. degree from Vanderbilt University (2003). He is currently working toward his M.S. degree (to be awarded in May 2012) in Applied Mathematics at Johns Hopkins University in the Engineering and Applied Science Program for Professionals. He works as a Mathematician at the Gaithersburg, Maryland campus of the National Institute of Standards and Technology (NIST) in the areas of speaker recognition and language recognition. Previous positions he has held include: Computer Scientist Intern at the National Institute of Standards and Technology, Research Assistant for Professor Mitch Marcus at the University of Pennsylvania, Programmer at the Linguistic Data Consortium, and English Language Annotator at the Institute for Research in Cognitive Science. Mr. Greenberg has been a member of the International Speech Communication Association (ISCA) since 2008. He has received two official letters of recognition for his contribution to speaker recognition evaluation.

April 24, 2012 | noon

“Not Just for Kids: Enriching Information Retrieval with Reading Level Metadata”

Kevyn Collins-Thompson, Microsoft Research

[abstract] [biography]

Abstract

A document isn't relevant - at least, not immediately - if you can't understand it, yet search engines have traditionally ignored the problem of finding content at the right level of difficulty as an aspect of relevance. Moreover, little is currently known about the nature of the Web, its users, and how users interact with content when seen through the lens of reading difficulty. I'll present our recent research progress in combining reading difficulty prediction with information retrieval, including models, algorithms and large-scale data analysis. Our results show how the availability of reading level metadata - especially in combination with topic metadata - opens up new and sometimes surprising possibilities for enriching search systems, from personalizing Web search results by reading level to predicting user and site expertise, improving result caption quality, and estimating searcher motivation. This talk includes joint work with Paul N. Bennett, Ryen White, Susan Dumais, Jin Young Kim, Sebastian de la Chica, and David Sontag.
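
A minimal sketch of one standard route to reading-level prediction (scoring a passage under per-level smoothed unigram language models) is given below; the two toy "corpora" and labels are invented, and the production models discussed in the talk are far richer.

    # Minimal sketch of language-model-based readability prediction: score a new
    # passage under add-one-smoothed unigram models built from text labeled by
    # level, and pick the level whose model assigns the highest likelihood.
    import math
    from collections import Counter

    train = {
        "elementary": "the cat sat on the mat . the dog ran to the park .",
        "advanced":   "the committee postponed deliberations pending further analysis .",
    }

    def unigram_model(text, vocab, smoothing=1.0):
        counts = Counter(text.split())
        total = sum(counts.values())
        return {w: (counts[w] + smoothing) / (total + smoothing * len(vocab))
                for w in vocab}

    vocab = set(" ".join(train.values()).split()) | {"<unk>"}
    models = {level: unigram_model(text, vocab) for level, text in train.items()}

    def predict_level(passage):
        scores = {level: sum(math.log(model.get(w, model["<unk>"]))
                             for w in passage.split())
                  for level, model in models.items()}
        return max(scores, key=scores.get)

    print(predict_level("the dog sat on the mat"))           # -> "elementary"
    print(predict_level("the committee postponed analysis")) # -> "advanced"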

Speaker Biography

Kevyn Collins-Thompson is a Researcher in the Context, Learning and User Experience for Search (CLUES) group at Microsoft Research (Redmond).  His research lies in an area combining information retrieval, machine learning, and computational linguistics, and focuses on models, algorithms, and evaluation methods for making search technology more reliable and effective. His recent work has explored algorithms and Web search applications for reading level prediction; optimization strategies that reduce the risk of applying risky retrieval algorithms like personalization and automatic query rewriting; and educational applications of IR such as intelligent tutoring systems.  Kevyn received his Ph.D. and M.Sc. from the Language Technologies Institute at Carnegie Mellon University and B.Math from the University of Waterloo.

April 17, 2012 | noon

“Factored Adaptation for Separating Speaker and Environment Variability”   Video Available

Mike Seltzer, Microsoft Research

[abstract] [biography]

Abstract

Acoustic model adaptation can reduce the degradation in speech recognition accuracy caused by mismatch between the speech seen at runtime and that seen in training. This mismatch is caused by many factors, such as the speaker and the environment. Standard data-driven adaptation techniques address any and all of these differences blindly. While this is a benefit, it can also be a drawback, as it is unknown precisely what mismatch is being compensated. This prevents the transforms from being reliably reused across sessions of an application that can be used in different environments, such as voice search on a mobile phone. In this talk, I'll discuss our recent research in factored adaptation, which jointly compensates for acoustic mismatch in a manner that enables multiple sources of variability to be separated. By performing adaptation in this way, we can increase the utility of the adaptation data and more effectively reuse transforms across user sessions. The effectiveness of the proposed approach will be shown in a series of experiments on a small-vocabulary noisy digits task and a large-vocabulary voice search task.

Speaker Biography

Mike Seltzer received the Sc.B. with honors from Brown University in 1996, and M.S. and Ph.D. degrees from Carnegie Mellon University in 2000 and 2003, respectively, all in electrical engineering.  From 1996 to 1998, he was an applications engineer at Teradyne, Inc., Boston, MA working on semiconductor test solutions for mixed-signal devices.  From 1998 to 2003, he was a member of the Robust Speech Recognition group at Carnegie Mellon University. In 2003, Dr. Seltzer joined the Speech Technology Group at Microsoft Research, Redmond, WA. In 2006, Dr. Seltzer was awarded the Best Young Author paper award from the IEEE Signal Processing Society. From 2006 to 2008, he was a member of the Speech & Language Technical Committee (SLTC) and was the Editor-in-Chief of the SLTC e-Newsletter. He was a general co-chair of the 2008 International Workshop on Acoustic Echo and Noise Control and Publicity Chair of the 2008 IEEE Workshop on Spoken Language Technology. He is currently an Associate Editor of the IEEE Transactions on Audio, Speech and Language Processing. His current research interests include speech recognition in adverse acoustical environments, acoustic model adaptation, acoustic modeling, microphone array processing, and machine learning for speech and audio applications.

April 13, 2012 | noon

“Text Geolocation and Dating: Light-Weight Language Grounding”   Video Available

Jason Baldridge, University of Texas

[abstract] [biography]

Abstract

It used to be that computational linguists had to collaborate with roboticists in order to work on grounding language in the real world. However, since the advent of the internet, and particularly in the last decade, the world has been brought within digital scope. People's social and business interactions are increasingly mediated through a medium that is dominated by text. They tweet from places, express their opinions openly, give descriptions of photos, and generally reveal a great deal about themselves in doing so, including their location, gender, age, social status, relationships and more. In this talk, I'll discuss work on geolocation and dating of texts, that is, identifying the sets of latitude-longitude pairs and time periods that a document is about or related to. These applications and the models developed for them set the stage for deeper investigations into computational models of word meaning that go beyond standard word vectors and into augmented multi-component representations that include dimensions connected to the real world via geographic and temporal values and beyond.
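
The sketch below illustrates the flavor of grid-based text geolocation with a toy cell-level unigram model; the documents, coordinates, and one-degree grid are invented placeholders, not the models from the talk.

    # Minimal sketch of text geolocation over a coarse grid: assign each training
    # document to the cell containing its latitude/longitude, build a per-cell
    # unigram model, and locate a new text at the centroid of the best cell.
    import math
    from collections import Counter, defaultdict

    train = [
        ((30.27, -97.74), "tacos breakfast live music congress bridge bats"),
        ((40.71, -74.01), "subway bodega broadway bagel yankees"),
        ((30.27, -97.74), "barbecue lake trail music festival"),
    ]

    def cell_of(lat, lon, size=1.0):            # 1-degree grid cells
        return (math.floor(lat / size), math.floor(lon / size))

    cell_words, cell_coords = defaultdict(Counter), defaultdict(list)
    for (lat, lon), text in train:
        c = cell_of(lat, lon)
        cell_words[c].update(text.split())
        cell_coords[c].append((lat, lon))

    vocab = {w for counts in cell_words.values() for w in counts}

    def geolocate(text):
        best, best_lp = None, -float("inf")
        for c, counts in cell_words.items():
            total = sum(counts.values())
            lp = sum(math.log((counts[w] + 0.1) / (total + 0.1 * len(vocab)))
                     for w in text.split() if w in vocab)
            if lp > best_lp:
                best, best_lp = c, lp
        lats, lons = zip(*cell_coords[best])
        return sum(lats) / len(lats), sum(lons) / len(lons)

    print(geolocate("music festival on the trail"))   # centroid of the Austin-like cell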

Speaker Biography

Jason Baldridge is an associate professor in the Department of Linguistics at the University of Texas at Austin. He received his Ph.D. from the University of Edinburgh in 2002, where his doctoral dissertation was awarded the 2003 Beth Dissertation Prize from the European Association for Logic, Language, and Information. His main research interests include categorial grammars, parsing, semi-supervised learning, coreference resolution, and georeferencing. He is one of the co-creators of the Apache OpenNLP Toolkit and has been active for many years in the creation and promotion of open source software for natural language processing.

April 6, 2012 | noon

“In The Beginning was the Familiar Voice”   Video Available

Diana Sidtis, New York University

[abstract] [biography]

Abstract

Hearing and sound, compared with vision, are latecomers and second cousins in cultural and scientific history.  Still today, voice scientists are scattered across many disciplines and much of vocal function remains elusive. In modern linguistics, speech sounds have received more attention than voice, and in neuropsychology,  voices have only recently begun to catch up with faces.  Yet vocalization likely played a major role in biological evolution, appearing long before speech, and contributing crucially to survival and social behaviors in numerous species.  Paralinguistic communication by voice, an inborn ability arising from this evolutionary trajectory, has flowered to a prodigious competence in humans.  Voice information is multiplex, signaling affective, attitudinal, linguistic, pragmatic, physiological and psychological characteristics, as well as personal identity. The cues for this long list of characteristics likewise constitute a very large repertory of auditory-acoustic, physiological, perceptual, and speech-like parameters. This many-to-many relationship between characteristics signaled in the voice and the cues to them presents a great challenge to voice research. Further, because of important differences between familiar and unfamiliar voices, the role of the listener is key.  Studies of persons with focal brain damage indicate that perception of unfamiliar and recognition of familiar voices are independent and unordered cerebral abilities.  These and related findings lead to a model of voice perception that posits an interplay between featural analysis and pattern recognition.  From this perspective, the personally familiar voice, viewed as a complex auditory pattern for which idiosyncratic featural attributes arise adventitiously, is preeminent in evolution and in human communication.    

Speaker Biography

Diana Sidtis (formerly Van Lancker) is Professor of Communicative Sciences and Disorders at New York University and performs research at the Nathan Kline Institute for Psychiatric Research. An experienced clinician, her publications include numerous scholarly articles and book chapters.

March 30, 2012 | noon

“Machine Learning in the Loop”

John Langford, Yahoo! Research

[abstract] [biography]

Abstract

The traditional supervised machine learning paradigm is inadequate for a wide array of potential machine learning applications where the learning algorithm decides on an action in the real world and gets feedback about that action. This inadequacy results in kludgy systems (as in ad targeting at internet companies) or in deep systemic mistrust and skepticism (as in personalized medicine or adaptive clinical trials). I will discuss a new formal basis, algorithms, and practical tricks for doing machine learning in this setting.
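
To make the setting concrete, here is a toy sketch of an epsilon-greedy contextual bandit that only sees the reward of the action it actually took, together with an inverse-propensity-score (IPS) estimate computed offline from the logged interactions; the two-context, two-action world is invented and is not the formalism developed in the talk.

    # Minimal sketch of learning "in the loop": act, observe only the chosen
    # action's reward, log the action's propensity, and evaluate another policy
    # offline with inverse propensity scoring.
    import random

    random.seed(0)
    ACTIONS = [0, 1]
    value = {a: [0.0, 0.0] for a in ACTIONS}   # running reward estimate per (action, context)
    counts = {a: [0, 0] for a in ACTIONS}
    epsilon = 0.1
    log = []

    def true_reward(context, action):
        # Hidden ground truth used only to simulate feedback.
        return 1.0 if action == context else 0.2

    for _ in range(5000):
        context = random.randint(0, 1)
        greedy = max(ACTIONS, key=lambda a: value[a][context])
        action = random.choice(ACTIONS) if random.random() < epsilon else greedy
        prob = (1 - epsilon + epsilon / 2) if action == greedy else epsilon / 2
        reward = true_reward(context, action)
        counts[action][context] += 1
        value[action][context] += (reward - value[action][context]) / counts[action][context]
        log.append((context, action, reward, prob))

    # Offline evaluation: IPS estimate of "always pick the action matching the context".
    ips = sum(r / p for (c, a, r, p) in log if a == c) / len(log)
    print("reward estimates:", value)
    print("IPS estimate for the context-matching policy:", round(ips, 3))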

Speaker Biography

John Langford is a computer scientist, working as a senior researcher at Yahoo! Research. He studied Physics and Computer Science at the California Institute of Technology, earning a double bachelor's degree in 1997, and received his Ph.D. from Carnegie Mellon University in 2002. Previously, he was affiliated with the Toyota Technological Institute and IBM's Watson Research Center. He is also the author of the popular machine learning weblog hunch.net and the principal developer of Vowpal Wabbit.

March 27, 2012 | noon

“Linguistic Structure Prediction with AD3”   Video Available

Noah Smith, Carnegie Mellon University

[abstract] [biography]

Abstract

In this talk, I will present AD3 (Alternating Directions Dual Decomposition), an algorithm for approximate MAP inference in loopy graphical models with discrete random variables, including structured prediction problems.  AD3 is simple to implement and well-suited to problems with hard constraints expressed in first-order logic.  It often finds the exact MAP solution, giving a certificate when it does; when it doesn't, it can be embedded within an exact branch and bound technique.  I'll show experimental results on two natural language processing tasks, dependency parsing and frame-semantic parsing.  This work was done in collaboration with Andre Martins, Dipanjan Das, Pedro Aguiar, Mario Figueiredo, and Eric Xing.
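
For intuition about the dual decomposition family that AD3 belongs to, the toy below runs projected-subgradient dual decomposition (AD3 itself uses ADMM-style updates with quadratic terms) on an invented three-variable problem with an exactly-one constraint; agreement between the two subproblems certifies an exact MAP.

    # Minimal dual-decomposition toy.  Objective: maximize sum_i a[i]*x[i] +
    # sum_i b[i]*x[i], where the second factor also enforces "exactly one x[i] is 1".
    import numpy as np

    a = np.array([0.5, -0.2, 0.3])
    b = np.array([-0.4, 0.9, 0.1])
    u = np.zeros(3)                      # Lagrange multipliers tying the two copies

    for it in range(100):
        x = (a + u > 0).astype(float)                  # subproblem 1: independent binaries
        z = np.zeros(3)
        z[np.argmax(b - u)] = 1.0                      # subproblem 2: best one-hot choice
        if np.array_equal(x, z):                       # agreement = certificate of exactness
            break
        u -= (1.0 / (it + 1)) * (x - z)                # subgradient step on the dual

    print("MAP assignment:", z, "after", it + 1, "iterations")
    print("exact answer:  ", np.eye(3)[np.argmax(a + b)])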

Speaker Biography

I am the Finmeccanica Associate Professor of Language Technologies and Machine Learning in the School of Computer Science at Carnegie Mellon University. I received my Ph.D. in Computer Science, as a Hertz Foundation Fellow, from Johns Hopkins University in 2006 and my B.S. in Computer Science and B.A. in Linguistics from the University of Maryland in 2001. My research interests include statistical natural language processing, especially unsupervised methods, machine learning for structured data, and applications of natural language processing. My book, Linguistic Structure Prediction, covers many of these topics. I serve on the editorial board of the journal Computational Linguistics and the Journal of Artificial Intelligence Research and received a best paper award at the ACL 2009 conference. My research group, Noah's ARK, is supported by the NSF (including an NSF CAREER award), DARPA, Qatar NRF, IARPA, ARO, Portugal FCT, and gifts from Google, HP Labs, IBM Research, and Yahoo Research.

March 13, 2012 | noon

“Measuring and Using Speech Production Information”   Video Available

Shri Narayanan, Viterbi School of Engineering/University of Southern California

[abstract] [biography]

Abstract

The human speech signal carries crucial information not only about communication intent but also about affect and emotions. From a basic scientific perspective, understanding how such rich information is encoded in human speech can shed light on the underlying communication mechanisms. From a technological perspective, finding ways for automatically processing and decoding this complex information in speech continues to be of interest for a variety of applications. One line of work in this realm aims to connect these perspectives by creating technological advances to obtain insights about basic speech communication mechanisms and in utilizing direct information about human speech production to inform technology development. Both these engineering problems will be considered in this talk. A longstanding challenge in speech production research has been the ability to examine real-time changes in the shaping of the vocal tract; a goal that has been furthered by imaging techniques such as ultrasound, movement tracking and magnetic resonance imaging. The spatial and temporal resolution afforded by these techniques, however, has limited the scope of the investigations that could be carried out. In this talk, we will highlight recent advances that allow us to perform near real-time investigations on the dynamics of vocal tract shaping during speech. We will also use examples from recent and ongoing research to describe some of the methods and outcomes of processing such data, especially toward facilitating linguistic analysis and modeling, and speech technology development. [Work supported by NIH, ONR, and NSF].

Speaker Biography

Shrikanth (Shri) Narayanan is Andrew J. Viterbi Professor of Engineering at the University of Southern California (USC), where he holds appointments as Professor of Electrical Engineering, Computer Science, Linguistics and Psychology, and as Director of the USC Ming Hsieh Institute. Prior to USC he was with AT&T Bell Labs and AT&T Research. His research focuses on human-centered information processing and communication technologies. He is a Fellow of the Acoustical Society of America, IEEE, and the American Association for the Advancement of Science (AAAS). He is also an Editor for the Computer Speech and Language and an Associate Editor for the IEEE Transactions on Multimedia, IEEE Transactions on Affective Computing, APSIPA Transactions on Signal and Information Processing and the Journal of the Acoustical Society of America. He is a recipient of several honors including Best Paper awards from the IEEE Signal Processing society in 2005 (with Alex Potamianos) and in 2009 (with Chul Min Lee) and selection as a Distinguished Lecturer for the IEEE Signal Processing society for 2010-11. He has published over 475 papers, and has twelve granted US patents.

March 6, 2012 | noon

“Fast, Accurate and Robust Multilingual Syntactic Analysis”   Video Available

Slav Petrov, Google

[abstract] [biography]

Abstract

To build computer systems that can 'understand' natural language, we need to go beyond bag-of-words models and take the grammatical structure of language into account. Part-of-speech tag sequences and dependency parse trees are one form of such structural analysis that is easy to understand and use. This talk will cover three topics. First, I will present a coarse-to-fine architecture for dependency parsing that uses linear-time vine pruning and structured prediction cascades. The resulting pruned third-order model is twice as fast as an unpruned first-order model and compares favorably to a state-of-the-art transition-based parser in terms of speed and accuracy. I will then present a simple online algorithm for training structured prediction models with extrinsic loss functions. By tuning a parser with a loss function for machine translation reordering, we can show that parsing accuracy matters for downstream application quality, producing improvements of more than 1 BLEU point on an end-to-end machine translation task. Finally, I will present approaches for projecting part-of-speech taggers and syntactic parsers across language boundaries, allowing us to build models for languages with no labeled training data. Our projected models significantly outperform state-of-the-art unsupervised models and constitute a first step towards a universal parser. This is joint work with Ryan McDonald, Keith Hall, Dipanjan Das, Alexander Rush, Michael Ringgaard and Kuzman Ganchev (a.k.a. the Natural Language Parsing Team at Google).
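
A minimal sketch of the projection idea (transferring part-of-speech tags across word alignments and reading off a crude tag dictionary) follows; the sentence pair, tags, and alignment are invented, and the talk's models go well beyond this.

    # Minimal sketch of cross-lingual POS projection: aligned target tokens
    # inherit the tags of their source counterparts.
    from collections import Counter, defaultdict

    # Source (English) tokens with tags, target tokens, and alignment pairs
    # (source_index, target_index) that would normally come from a word aligner.
    source = [("the", "DET"), ("house", "NOUN"), ("is", "VERB"), ("red", "ADJ")]
    target = ["das", "haus", "ist", "rot"]
    alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]

    projected = defaultdict(Counter)
    for s_i, t_i in alignment:
        projected[target[t_i]][source[s_i][1]] += 1

    # A trivial "tagger": tag each known word with its most frequently projected tag.
    tag_dict = {word: counts.most_common(1)[0][0] for word, counts in projected.items()}
    print(tag_dict)   # {'das': 'DET', 'haus': 'NOUN', 'ist': 'VERB', 'rot': 'ADJ'}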

Speaker Biography

Slav Petrov is a Senior Research Scientist in Google's New York office. He works on problems at the intersection of natural language processing and machine learning. He is in particular interested in syntactic parsing and its applications to machine translation and information extraction. He also teaches a class on Statistical Natural Language Processing at New York University every Fall. Prior to Google, Slav completed his PhD degree at UC Berkeley, where he worked with Dan Klein. He holds a Master's degree from the Free University of Berlin, and also spent a year as an exchange student at Duke University. Slav was a member of the FU-Fighters team that won the RoboCup 2004 world championship in robotic soccer and recently won a best paper award at ACL 2011 for his work on multilingual syntactic analysis. Slav grew up in Berlin, Germany, but is originally from Sofia, Bulgaria. He therefore considers himself a Berliner from Bulgaria. Whenever Bulgaria plays Germany in soccer, he supports Bulgaria.

March 2, 2012 | noon

“Efficient Search and Learning for Language Understanding and Translation”   Video Available

Liang Huang, Information Sciences Institute/ University of Southern California

[abstract] [biography]

Abstract

What do translating from English into Chinese and compiling C++ into machine code have in common? And yet what are the differences that make the former so much harder for computers? How can computers learn from human translators? This talk sketches an efficient (linear-time) "understanding + rewriting" paradigm for machine translation inspired by both human translators as well as compilers. In this paradigm, a source language sentence is first parsed into a syntactic tree, which is then recursively converted into a target language sentence via tree-to-string rewriting rules. In both "understanding" and "rewriting" stages, this paradigm closely resembles the efficiency and incrementality of both human processing and compiling. We will discuss these two stages in turn. First, for the "understanding" part, we present a linear-time approximate dynamic programming algorithm for incremental parsing that is as accurate as those much slower (cubic-time) chart parsers, while being as fast as those fast but lossy greedy parsers, thus getting the advantages of both worlds for the first time, achieving state-of-the-art speed and accuracy. But how do we efficiently learn such a parsing model with approximate inference from huge amounts of data? We propose a general framework for structured prediction based on the structured perceptron that is guaranteed to succeed with inexact search and works well in practice. Next, the "rewriting" stage translates these source-language parse trees into the target language. But parsing errors from the previous stage adversely affect translation quality. An obvious solution is to use the top-k parses, rather than the 1-best tree, but this only helps a little bit due to the limited scope of the k-best list. We instead propose a "forest-based approach", which translates a packed forest encoding *exponentially* many parses in a polynomial space by sharing common subtrees. Large-scale experiments showed very significant improvements in terms of translation quality, which outperforms the leading systems in the literature. Like the "understanding" part, the translation algorithm here is also linear-time and incremental, thus resembling human translation. We conclude by outlining a few future directions.
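
As a toy illustration of the "rewriting" stage, the sketch below recursively converts a tiny parse tree into a target-language string with one invented tree-to-string reordering rule and a made-up lexicon; it is meant only to show the shape of the computation, not the talk's actual rule formalism.

    # Minimal tree-to-string rewriting: walk the source parse tree, translate
    # leaves through a lexicon, and reorder children where a rule says to.
    lexicon = {"I": "watashi-wa", "apples": "ringo-o", "eat": "taberu"}

    def rewrite(node):
        # A node is either a (label, children) pair or a source word (a string).
        if isinstance(node, str):
            return [lexicon.get(node, node)]
        label, children = node
        parts = [rewrite(child) for child in children]
        if label == "VP":                 # toy rule: VP(V, NP) -> NP V (SOV order)
            parts = parts[::-1]
        return [word for part in parts for word in part]

    tree = ("S", [("NP", ["I"]), ("VP", [("V", ["eat"]), ("NP", ["apples"])])])
    print(" ".join(rewrite(tree)))        # -> "watashi-wa ringo-o taberu"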

Speaker Biography

Liang Huang is a Research Assistant Professor at University of Southern California (USC), and a Research Scientist at USC's Information Sciences Institute (ISI). He received his PhD from the University of Pennsylvania in 2008, and worked as a Research Scientist at Google before moving to USC/ISI. His research focuses on efficient search algorithms for natural language processing, esp. in parsing and machine translation, as well as related structured learning problems. His work received a Best Paper Award at ACL 2008, and three Best Paper Nominations at ACL 2007, EMNLP 2008, and ACL 2010.

February 21, 2012 | noon

“Bayesian Nonparametric Methods for Complex Dynamical Phenomena”   Video Available

Emily Fox, University of Pennsylvania

[abstract] [biography]

Abstract

Markov switching processes, such as hidden Markov models (HMMs) and switching linear dynamical systems (SLDSs), are often used to describe rich classes of dynamical phenomena. They describe complex temporal behavior via repeated returns to a set of simpler models: imagine, for example, a person alternating between walking, running and jumping behaviors, or a stock index switching between regimes of high and low volatility. Traditional modeling approaches for Markov switching processes typically assume a fixed, pre-specified number of dynamical models. Here, in contrast, I develop Bayesian nonparametric approaches that define priors on an unbounded number of potential Markov models. Using stochastic processes including the beta and Dirichlet process, I develop methods that allow the data to define the complexity of inferred classes of models, while permitting efficient computational algorithms for inference. The new methodology also has generalizations for modeling and discovery of dynamic structure shared by multiple related time series. Interleaved throughout the talk are results from studies of the NIST speaker diarization database, stochastic volatility of a stock index, the dances of honeybees, and human motion capture videos.

Speaker Biography

Emily B. Fox received the S.B. degree in 2004, M.Eng. degree in 2005, and E.E. degree in 2008 from the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology (MIT). She is currently an assistant professor in the Wharton Statistics Department at the University of Pennsylvania. Her Ph.D. was advised by Prof. Alan Willsky in the Stochastic Systems Group, and she recently completed a postdoc in the Department of Statistical Science at Duke University working with Profs. Mike West and David Dunson. Emily is a recipient of the National Defense Science and Engineering Graduate (NDSEG), National Science Foundation (NSF) Graduate Research fellowships, and NSF Mathematical Sciences Postdoctoral Research Fellowship. She has also been awarded the 2009 Leonard J. Savage Thesis Award in Applied Methodology, the 2009 MIT EECS Jin-Au Kong Outstanding Doctoral Thesis Prize, the 2005 Chorafas Award for superior contributions in research, and the 2005 MIT EECS David Adler Memorial 2nd Place Master's Thesis Prize. Her research interests are in multivariate time series analysis and Bayesian nonparametric methods.

February 17, 2012 | noon

“Learning to Read the Web”   Video Available

Tom Mitchell, Carnegie Mellon University

[abstract] [biography]

Abstract

We describe our efforts to build a Never-Ending Language Learner (NELL) that runs 24 hours per day, forever, learning to read the web. Each day NELL extracts (reads) more facts from the web, and integrates these into its growing knowledge base of beliefs. Each day NELL also learns to read better than yesterday, enabling it to go back to the text it read yesterday, and extract more facts, more accurately. NELL has now been running 24 hours/day for over two years. The result so far is a collection of 15 million interconnected beliefs (e.g., servedWith(coffee, applePie), isA(applePie, bakedGood)) that NELL is considering at different levels of confidence, along with hundreds of thousands of learned phrasings, morphological features, and web page structures that NELL uses to extract beliefs from the web. The approach implemented by NELL is based on three key ideas: (1) coupling the semi-supervised training of thousands of different functions that extract different types of information from different web sources, (2) automatically discovering new constraints that more tightly couple the training of these functions over time, and (3) a curriculum or sequence of increasingly difficult learning tasks. Track NELL's progress at http://rtw.ml.cmu.edu.
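
The sketch below caricatures one ingredient, pattern-based bootstrapping: seed instances promote extraction patterns, and promoted patterns then propose new instances. The sentences, seeds, and candidate patterns are invented; note how the noisy "from X to Y" pattern could drift toward non-cities, which is the kind of error snowballing that NELL's coupled training is designed to resist.

    # Minimal bootstrapping sketch: promote patterns that co-occur with known
    # instances, then let promoted patterns propose new instances.
    import re

    corpus = [
        "cities such as Paris and Rome attract tourists",
        "cities such as Tokyo and Osaka are expensive",
        "companies such as Acme and Globex reported earnings",
        "he flew from Paris to Tokyo",
    ]
    instances = {"Paris", "Rome"}                     # seed beliefs: isA(x, city)
    candidate_patterns = [r"cities such as (\w+) and (\w+)",
                          r"companies such as (\w+) and (\w+)",
                          r"from (\w+) to (\w+)"]

    for _ in range(2):                                # two bootstrapping rounds
        promoted = [p for p in candidate_patterns
                    if any(set(m) & instances
                           for sent in corpus for m in re.findall(p, sent))]
        for p in promoted:
            for sent in corpus:
                for match in re.findall(p, sent):
                    instances.update(match)

    print("promoted patterns:", promoted)
    print("believed cities:", instances)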

Speaker Biography

Tom M. Mitchell is the E. Fredkin University Professor and founding head of the Machine Learning Department at Carnegie Mellon University. His research interests lie in machine learning, artificial intelligence, and cognitive neuroscience.  Mitchell is a member of the U.S. National Academy of Engineering, a Fellow of the American Association for the Advancement of Science (AAAS), and a Fellow and Past President of the Association for the Advancement of Artificial Intelligence (AAAI).  Mitchell believes the field of machine learning will be the fastest growing branch of computer science during the 21st century.  His web page is http://www.cs.cmu.edu/~tom.

February 7, 2012 | noon

“Extending the search space of the Minimum Bayes-Risk Decoder for Machine Translation”

Shankar Kumar, Google

[abstract] [biography]

Abstract

A Minimum Bayes-Risk (MBR) decoder seeks the hypothesis with the least expected loss for a given task. In the field of machine translation, the technique was originally developed for rescoring k-best lists of hypotheses generated by a statistical model. In this talk, I will present our work on extending the search space of the MBR decoder to very large lattices and hypergraphs that contain on average about 10^81 hypotheses! I will describe conditions on the loss function that enable efficient implementation of the decoder on such large search spaces. I will focus on the BLEU score (Papineni et al.) as the loss function for machine translation. To satisfy the conditions on the loss function, I will introduce a linear approximation to the BLEU score. The MBR decoder under linearized BLEU can be easily implemented using Weighted Finite State Transducers. However, the resulting procedure is computationally expensive for a moderately large lattice. The costly step is the computation of n-gram posterior probabilities. I will next present an approximate algorithm which is much faster than our Weighted Finite State Transducer approach. This algorithm extends to translation hypergraphs generated by systems based on synchronous context free grammars. Inspired by work in speech recognition, I will finally present an exact and yet efficient algorithm to compute n-gram posteriors on both lattices and hypergraphs. The linear approximation to BLEU contains parameters which were initially derived from n-gram precisions seen on our development data. I will describe how we employed Minimum Error Rate training (MERT) to estimate these parameters. In the final part of the talk, I will describe an MBR-inspired scheme to learn a consensus model over the n-gram features of multiple underlying component models. This scheme works on a collection of hypergraphs or lattices produced by syntax- or phrase-based translation systems. MERT is used to train the parameters. The approach outperforms a pipeline of MBR decoding followed by standard system combination while using less total computation. This is joint work with Wolfgang Macherey, Roy Tromble, Chris Dyer, John DeNero, Franz Och and Ciprian Chelba.
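
For readers new to MBR, here is a toy k-best rescoring sketch that picks the hypothesis with the highest expected gain under the model posterior; the hypotheses and probabilities are invented, and a crude unigram-overlap gain stands in for the (linearized) BLEU score used in the actual work.

    # Minimal k-best MBR rescoring: each hypothesis is scored by its expected
    # similarity to the other hypotheses, weighted by their posteriors.
    from collections import Counter

    kbest = [("the cat is on the mat", 0.40),
             ("the cat sat on the mat", 0.35),
             ("a cat sat on a mat",     0.25)]

    def gain(hyp, ref):
        # Clipped unigram overlap; a stand-in for BLEU.
        h, r = Counter(hyp.split()), Counter(ref.split())
        return sum(min(h[w], r[w]) for w in h) / max(len(hyp.split()), 1)

    def mbr_decode(kbest):
        best, best_score = None, -1.0
        for hyp, _ in kbest:
            expected = sum(p * gain(hyp, ref) for ref, p in kbest)
            if expected > best_score:
                best, best_score = hyp, expected
        return best, best_score

    print(mbr_decode(kbest))

Note that the consensus hypothesis chosen by MBR need not be the one with the highest posterior, which is exactly the point of the rescoring.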

Speaker Biography

Shankar Kumar is a researcher in the speech group at Google. Prior to this, he worked on Google’s language translation effort. His current interests are in statistical methods for language processing, with a particular emphasis on speech recognition and translation.

January 31, 2012 | noon

“Scalable Topic Models”   Video Available

David Blei, Princeton University

[abstract] [biography]

Abstract

Probabilistic topic modeling provides a suite of tools for analyzing large collections of documents. Topic modeling algorithms can uncover the underlying themes of a collection and decompose its documents according to those themes. We can use topic models to explore the thematic structure of a corpus and to solve a variety of prediction problems about documents. At the center of a topic model is a hierarchical mixed-membership model, where each document exhibits a shared set of mixture components with individual (per-document) proportions. Our goal is to condition on the observed words of a collection and estimate the posterior distribution of the shared components and per-document proportions. When analyzing modern corpora, this amounts to posterior inference with billions of latent variables. How can we cope with such data? In this talk, I will describe stochastic variational inference, an algorithm for computing with topic models that can handle very large document collections and even endless streams of documents. I will demonstrate the algorithm with models fitted to millions of articles. I will show how stochastic variational inference can be generalized to many kinds of hierarchical models. I will highlight several open questions and outstanding issues. (This is joint work with Francis Bach, Matt Hoffman, John Paisley, and Chong Wang.)
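
A minimal sketch of online (stochastic variational inference) LDA follows, using gensim's implementation of the online variational Bayes algorithm; the four toy documents are invented and far too small to yield meaningful topics.

    # Minimal online LDA sketch with gensim: fit on a tiny corpus, then fold in
    # a new minibatch without refitting from scratch.
    from gensim import corpora, models

    texts = [
        "stock market prices fell sharply".split(),
        "investors traded shares on the market".split(),
        "the team won the championship game".split(),
        "players scored in the final game".split(),
    ]
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]

    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2,
                          passes=10, random_state=0)
    for topic in lda.print_topics(num_words=4):
        print(topic)

    # Streaming update with a new minibatch of documents.
    new_docs = [dictionary.doc2bow("market shares rose".split())]
    lda.update(new_docs)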

Speaker Biography

David Blei is an associate professor of Computer Science at Princeton University. His research interests include probabilistic topic models, graphical models, approximate posterior inference, and Bayesian nonparametrics.

Back to Top

2011

December 2, 2011 | 04:30 pm

“A computational approach to early language bootstrapping”   Video Available

Emmanuel Dupoux, Ecole Normale Supérieure

[abstract]

Abstract

Human infants learn spontaneously and effortlessly the language(s) spoken in their environments, despite the extraordinary complexity of the task. In the past 30 years, tremendous progress has been made regarding the empirical investigation of the linguistic achievements of infants during their first two years of life. In that short period of their life, infants learn in an essentially unsupervised fashion the basic building blocks of the phonetics, phonology, lexical and syntactic organization of their native language (see Jusczyk, 1997). Yet, little is known about the mechanisms responsible for such acquisitions. Do infants rely on general statistical inference principles? Do they rely on specialized algorithms devoted to language? Here, I will present an overview of the early phases of language acquisition and focus on one area where a modeling approach is currently being conducted, using tools of signal processing and automatic speech recognition: the unsupervised acquisition of phonetic categories. It is known that during the first year of life, before they are able to talk, infants construct a detailed representation of the phonemes of their native language and lose the ability to distinguish nonnative phonemic contrasts (Werker & Tees, 1984). It will be shown that the only mechanism that has been proposed so far, that is, unsupervised statistical clustering (Maye, Werker and Gerken, 2002), may not converge on the inventory of phonemes, but rather on contextual allophonic units that are smaller than the phoneme (Varadarajan, 2008). Alternative algorithms will be presented using three sources of information: the statistical distribution of their contexts, the phonetic plausibility of the grouping, and the existence of lexical minimal pairs (Peperkamp et al., 2006; Martin et al., submitted). It is shown that each of the three sources of information can be acquired without presupposing the others, but that they need to be combined to arrive at good performance. Modeling results and experiments in human infants will be presented. The more general proposal is that early language bootstrapping may not rely on learning principles necessarily specific to language. What is presumably unique to language, though, is the way in which these principles are combined in particular ways to optimize the emergence of linguistic categories after only a few months of unsupervised exposure to speech signals. References: Jusczyk, P. (1997). The discovery of spoken language. Cambridge, MA: MIT Press. Martin, A., Peperkamp, S., & Dupoux, E. (submitted). Learning phonemes with a pseudo-lexicon. Maye, J., Werker, J., & Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition, 82, B101-B111. Peperkamp, S., Le Calvez, R., Nadal, J.P. and Dupoux, E. (2006). The acquisition of allophonic rules: statistical learning with linguistic constraints. Cognition, 101, B31-B41. Varadarajan, B., Khudanpur, S. & Dupoux, E. (2008). Unsupervised Learning of Acoustic Subword Units, in Proceedings of ACL-08: HLT, 165-168. Werker, J.F., & Tees, R.C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7, 49-63.

November 22, 2011 | 04:30 pm

“Robust Representation of Attended Speech in Human Brain with Implications for ASR”

Nima Mesgarani, University of California, San Francisco

[abstract] [biography]

Abstract

Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker background. How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and indeed, it is not clear how attended speech is internally represented. Here, using multi-electrode recordings from the cortex of epileptic patients engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in the temporal lobe faithfully encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal salient spectral and temporal features of the attended speaker, as if listening to that speaker alone. Therefore, a simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both local single-electrode and population-level cortical responses. These findings demonstrate that the temporal lobe cortical representation of speech does not merely reflect the external acoustic environment, but instead correlates to the perceptual aspects relevant for the listener's intended goal. An engineering approach for ASR that is inspired by a model of this process is shown to improve recognition accuracy in new noisy conditions.
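
To give a feel for the stimulus-reconstruction analyses mentioned above, the sketch below fits a ridge regression from multi-electrode responses back to a speech spectrogram; the "neural" data are random synthetic stand-ins (spectrogram mixed through a random matrix plus noise), not recordings, and the dimensions are arbitrary.

    # Minimal sketch of linear stimulus reconstruction (decoding) from a
    # population response.
    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    T, n_freq, n_elec = 2000, 16, 64          # time bins, spectrogram channels, electrodes

    spectrogram = rng.random((T, n_freq))                 # stand-in for the attended speech
    mixing = rng.normal(size=(n_freq, n_elec))
    responses = spectrogram @ mixing + 0.5 * rng.normal(size=(T, n_elec))

    train, test = slice(0, 1500), slice(1500, T)
    decoder = Ridge(alpha=1.0).fit(responses[train], spectrogram[train])
    reconstruction = decoder.predict(responses[test])

    corr = np.corrcoef(reconstruction.ravel(), spectrogram[test].ravel())[0, 1]
    print(f"reconstruction correlation on held-out data: {corr:.2f}")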

Speaker Biography

Nima Mesgarani is a postdoctoral scholar in the Department of Neurological Surgery at the University of California, San Francisco. He received his Ph.D. in electrical engineering from the University of Maryland, College Park. He was a postdoctoral fellow at the Center for Language and Speech Processing at Johns Hopkins University prior to joining UCSF. His research interests include studying the representation of speech in the brain and its implications for speech processing technologies.

November 15, 2011 | 04:30 pm

“Object Detection Grammars”   Video Available

David McAllester, Toyota Technological Institute at Chicago

[abstract] [biography]

Abstract

As statistical methods came to dominate computer vision, speech recognition and machine translation, there was a tendency toward shallow models. The late Fred Jelinek is famously quoted as saying that every time he fired a linguist the performance of his speech recognition system improved. A major challenge of modern statistical methods is to demonstrate that deep models can be made to perform better than shallow models. This talk will describe an object detection system which tied for first place in the 2008 and 2009 PASCAL VOC object detection challenge and won a PASCAL "lifetime achievement" award in 2010. The system exploits a grammar model for representing object appearance. This model seems "deeper" than those used in the previous generation of statistically trained object detectors. This object detection system and the associated grammar formalism will be described in detail and future directions discussed.

Speaker Biography

Professor McAllester received his B.S., M.S., and Ph.D. degrees from the Massachusetts Institute of Technology in 1978, 1979, and 1987 respectively. He served on the faculty of Cornell University for the academic year of 1987-1988 and served on the faculty of MIT from 1988 to 1995. He was a member of technical staff at AT&T Labs-Research from 1995 to 2002. Since 2002 he has been Chief Academic Officer at the Toyota Technological Institute at Chicago. He has been a fellow of the American Association for Artificial Intelligence (AAAI) since 1997. A 1988 paper on computer game algorithms influenced the design of the algorithms used in the Deep Blue system that defeated Garry Kasparov. A 1991 paper on AI planning proved to be one of the most influential papers of the decade in that area. A 1998 paper on machine learning theory introduced PAC-Bayesian theorems, which combine Bayesian and non-Bayesian methods. A 2001 paper with Andrew Appel introduced the influential step-index model of recursive types. He is currently part of a team that scored in the top two places in the PASCAL object detection challenge (computer vision) in 2007, 2008 and 2009.

November 11, 2011 | 12:00 pm

“Learning Semantic Parsers for More Languages and with Less Supervision”   Video Available

Luke Zettlemoyer, University of Washington

[abstract] [biography]

Abstract

Recent work has demonstrated effective learning algorithms for a variety of semantic parsing problems, where the goal is to automatically recover the underlying meaning of input sentences. Although these algorithms can work well, there is still a large cost in annotating data and gathering other language-specific resources for each new application. This talk focuses on efforts to address these challenges by developing scalable, probabilistic CCG grammar induction algorithms. I will present recent work on methods that incorporate new notions of lexical generalization, thereby enabling effective learning for a variety of different natural languages and formal meaning representations. I will also describe a new approach for learning semantic parsers from conversational data, which does not require any manual annotation of sentence meaning. Finally, I will sketch future directions, including our recurring focus on building scalable learning techniques while attempting to minimize the application-specific engineering effort. Joint work with Yoav Artzi, Tom Kwiatkowski, Sharon Goldwater, and Mark Steedman

Speaker Biography

Luke Zettlemoyer is an Assistant Professor at the University of Washington. His research interests are in the intersections of natural language processing, machine learning and decision making under uncertainty. He spends much of his time developing learning algorithms that attempt to recover and make use of detailed representations of the meaning of natural language text. He was a postdoctoral research fellow at the University of Edinburgh and received his Ph.D. from MIT.  

November 1, 2011 | 04:30 pm

“Detecting Deceptive On-Line Reviews”

Claire Cardie, Cornell University

[abstract] [biography]

Abstract

Consumers increasingly rate, review, and research products online. Consequently, websites containing consumer reviews are becoming targets of opinion spam. While recent work has focused primarily on manually identifiable instances of opinion spam, this talk describes the first study of "deceptive opinion spam" --- fictitious opinions that have been deliberately written to sound authentic. Integrating work from psychology and computational linguistics, we develop and compare three approaches to detecting deceptive opinion spam, and ultimately develop a classifier that is nearly 90% accurate on our gold-standard opinion spam dataset. Feature analysis of our learned models reveals a relationship between deceptive opinions and imaginative writing. Finally, the talk will describe the results of a preliminary study that uses the opinion spam classifier to estimate the prevalence of fake reviews on two popular hotel review sites.
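
A minimal sketch of the general classification setup (n-gram features plus a linear classifier) appears below; the handful of reviews and labels are invented, and the actual study used a proper gold-standard corpus and additional psycholinguistic features.

    # Minimal sketch of a deceptive-review classifier: TF-IDF word and bigram
    # features feeding a logistic regression model.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    reviews = [
        "my husband and I absolutely loved every luxurious moment of our stay",
        "the experience was amazing, truly the vacation of a lifetime for my family",
        "room was clean, check-in took ten minutes, the bathroom fan was noisy",
        "decent location, breakfast ended at nine, parking cost twenty dollars",
    ]
    labels = ["deceptive", "deceptive", "truthful", "truthful"]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(reviews, labels)
    print(clf.predict(["an unforgettable, magical experience with my husband"]))
    print(clf.predict(["the elevator was slow and the wifi cost ten dollars"]))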

Speaker Biography

Claire Cardie is a Professor in the Computer Science and Information Science departments at Cornell University. She got her B.S. in Computer Science from Yale University and an M.S. and PhD, also in Computer Science, at the University of Massachusetts at Amherst. Her research in the area of Natural Language Processing has focused on the application and development of machine learning methods for information extraction, coreference resolution, digital government applications, the analysis of opinions and subjective text, and, most recently, deception detection. Cardie is a recipient of a National Science Foundation CAREER award, and has served elected terms as an executive committee member of the Association for Computational Linguistics (ACL), an executive council member of the Association for the Advancement of Artificial Intelligence (AAAI), and twice as secretary of the North American chapter of the ACL (NAACL). Cardie is also co-founder and chief scientist of Appinions.com, a start-up focused on extracting and aggregating opinions from on-line text and social media.

October 25, 2011 | 04:30 pm

“Sparse Models of Lexical Variation”   Video Available

Jacob Eisenstein, Carnegie Mellon University

[abstract] [biography]

Abstract

Text analysis involves building predictive models and discovering latent structures in noisy and high-dimensional data. Document classes, latent topics, and author communities are often distinguished by a small number of trigger words or phrases -- needles in a haystack of irrelevant features. In this talk, I describe generative and discriminative techniques for learning sparse models of lexical differences. First, I show how multi-task regression with structured sparsity can identify a small subset of words associated with a range of demographic attributes in social media, yielding new insights about the complex multivariate relationship between demographics and lexical choice. Second, I present SAGE, a novel approach to sparsity in generative models of text, in which we induce sparse deviations from background log probabilities. As a generative model, SAGE can be applied across a range of supervised and unsupervised applications, including classification, topic modeling, and latent variable models.
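
The sketch below shows one simple form of sparse multi-task regression over word counts (scikit-learn's MultiTaskLasso, a group-sparse L2,1 penalty), so that most words can receive exactly zero weight for every attribute; all data are random stand-ins, and SAGE itself is a different, generative formulation.

    # Minimal sketch of group-sparse multi-task regression: jointly predict
    # several attributes from a document-term matrix and inspect which words
    # survive the sparsity penalty.
    import numpy as np
    from sklearn.linear_model import MultiTaskLasso

    rng = np.random.default_rng(0)
    n_docs, n_words, n_attrs = 200, 500, 3
    X = rng.poisson(0.3, size=(n_docs, n_words)).astype(float)   # synthetic word counts

    true_W = np.zeros((n_words, n_attrs))
    true_W[:10] = rng.normal(size=(10, n_attrs))     # only 10 "trigger" words matter
    Y = X @ true_W + 0.1 * rng.normal(size=(n_docs, n_attrs))

    model = MultiTaskLasso(alpha=0.05, max_iter=2000).fit(X, Y)
    selected = np.where(np.abs(model.coef_).sum(axis=0) > 0)[0]
    print("words with nonzero weight for any attribute:", selected)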

Speaker Biography

Jacob Eisenstein is a postdoctoral fellow in the Machine Learning Department at Carnegie Mellon University. His research focuses on machine learning for social media analysis, discourse, and non-verbal communication. Jacob completed his Ph.D. at MIT in 2008, winning the George M. Sprowls dissertation award. In January 2012, Jacob will join Georgia Tech as an Assistant Professor in the School of Interactive Computing.

October 14, 2011 | 12:00 pm

“Probabilistic hashing for similarity searching and machine learning on large datasets in high dimensions”

Ping Li, Cornell University

[abstract]

Abstract

Many applications such as information retrieval make use of efficient (approximate) estimates of set similarity. A number of such estimates have been discussed in the literature: minwise hashing, random projections and compressed sensing. This talk presents an improvement: b-bit minwise hashing. An evaluation on large real-life datasets will show large gains in both space and time. In addition, we will characterize the improvement theoretically, and show that the theory matches the practice. More recently, we realized that (b-bit) minwise hashing can not only be used for similarity matching but also for machine learning. Applying logistic regression and SVMs to large datasets faces numerous practical challenges. As datasets become larger and larger, they take too long to load and may not fit in memory. Training and testing time can become an issue. Error analysis and exploratory data analysis are rarely performed on large datasets because it is too painful to run lots of what-if scenarios and explore lots of high-order interactions (pairwise, 3-way, etc.). The proposed method has been applied to two large datasets: a "smaller" dataset (24GB in 16M dimensions) and a "larger" dataset (200GB in 1B dimensions). Using a single desktop computer, the proposed method takes 3 seconds to train an SVM for the smaller dataset and 30 seconds for the larger dataset.
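
A toy sketch of b-bit minwise hashing follows: it keeps only the lowest b bits of each minhash and corrects the observed match rate under the simplifying assumption that both sets are tiny relative to the universe (so a chance lowest-b-bit collision has probability about 2^-b); the sets, hash family, and parameters are invented.

    # Minimal b-bit minwise hashing sketch: estimate Jaccard similarity from
    # truncated minhash signatures.
    import random

    random.seed(0)
    K, b = 500, 2                              # number of hash functions, bits kept
    PRIME = (1 << 31) - 1                      # modulus for the random hash family

    A = set(range(0, 1000))
    B = set(range(300, 1300))
    true_jaccard = len(A & B) / len(A | B)     # 700 / 1300

    def minhash_lowbits(s, seed):
        rnd = random.Random(seed)
        a, c = rnd.randrange(1, PRIME), rnd.randrange(PRIME)
        return min((a * x + c) % PRIME for x in s) & ((1 << b) - 1)

    matches = sum(minhash_lowbits(A, k) == minhash_lowbits(B, k) for k in range(K))
    p_hat = matches / K
    floor = 2.0 ** -b                          # chance-collision rate for sparse sets
    estimate = (p_hat - floor) / (1.0 - floor)
    print(f"true Jaccard {true_jaccard:.3f}, b-bit estimate {estimate:.3f}")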

October 4, 2011 | 04:30 pm

“Decoding time set by neuronal oscillations locked to the input rhythm: a neglected cortical dimension in models of speech perception”   Video Available

Oded Ghitza, Hearing Research Center & Center for BioDynamics, Boston University

[abstract] [biography]

Abstract

Speech is an inherently rhythmic phenomenon in which the acoustic signal is transmitted in syllabic "packets" and temporally structured so that most of the energy fluctuations occur in the range between 3 and 10 Hz. The premise of our approach is that this rhythmic property reflects some fundamental property, one internal to the brain. We suggest that current models of speech perception, which are driven by acoustic features alone, are incomplete, and that the role of decoding time during memory access must be incorporated to account for the patterns of observed recognition phenomena. It is postulated that decoding time is governed by a cascade of neuronal oscillators, which guide template-matching operations at a hierarchy of temporal scales. Nested neuronal oscillations in the theta, beta and gamma frequency bands are argued to be crucial for speech intelligibility. Intelligibility is high so long as these neuronal oscillations remain phase-locked to the auditory input rhythm. A model (Tempo) is presented which seems capable of emulating recent psychophysical data on the intelligibility of speech sentences as a function of syllabic rate (Ghitza & Greenberg, 2009). The data show that intelligibility of speech that is time-compressed by a factor of 3 (i.e., a high syllabic rate) is poor (above 50% word error rate), but is substantially restored when silence gaps are inserted in between successive 40-ms-long compressed-signal intervals -- a counterintuitive finding, difficult to explain using classical models of speech perception, but emerging naturally from the Tempo architecture. In my talk I will present the architecture of Tempo and discuss the implications of the new dimensions of the model that seem necessary to account for the Ghitza & Greenberg data. Reading material: Ghitza, O. and Greenberg, S. (2009). "On the possible role of brain rhythms in speech perception: Intelligibility of time compressed speech with periodic and aperiodic insertions of silence." Phonetica 66:113--126. doi:10.1159/000208934 Ghitza, O. (2011). "Linking speech perception and neurophysiology: speech decoding guided by cascaded oscillators locked to the input rhythm." Front. Psychology 2:130. doi: 10.3389/fpsyg.2011.00130
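
The stimulus manipulation described above can be sketched roughly as follows, assuming librosa is available; the input file name, chunk length, and gap length are placeholders rather than the exact values used in the study.

    # Rough sketch: time-compress a waveform by 3x, then reinsert silent gaps
    # between short chunks of the compressed signal.
    import numpy as np
    import librosa

    y, sr = librosa.load("speech.wav", sr=16000)             # hypothetical input file
    compressed = librosa.effects.time_stretch(y, rate=3.0)   # 3x faster, pitch preserved

    chunk = int(0.040 * sr)           # 40 ms of compressed speech per packet
    gap = np.zeros(int(0.080 * sr))   # 80 ms silent gap (one choice; the study varied this)
    pieces = []
    for start in range(0, len(compressed), chunk):
        pieces.append(compressed[start:start + chunk])
        pieces.append(gap)
    repackaged = np.concatenate(pieces)
    print(f"original {len(y)/sr:.1f}s, repackaged {len(repackaged)/sr:.1f}s")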

Speaker Biography

ODED GHITZA received the B.Sc., M.Sc. and Ph.D. degrees in Electrical Engineering from Tel-Aviv University, Israel, in 1975, 1977 and 1983, respectively. From 1968 to 1984 he was with the Signal Corps Research Laboratory of the Israeli Defense Forces. During 1984-1985 he was a Bantrell post-doctoral fellow at MIT, Cambridge, Massachusetts, and a consultant with the Speech Systems Technology Group at Lincoln Laboratory, Lexington, Massachusetts. From 1985 to early 2003 he was with the Acoustics and Speech Research Department, Bell Laboratories, Murray Hill, New Jersey, where his research was aimed at developing models of hearing and at creating perception-based signal analysis methods for speech recognition, coding and evaluation. From early 2003 to early 2011 he was with Sensimetrics Corp., Malden, Massachusetts, where he continued to model basic knowledge of auditory physiology and of perception for the purpose of advancing speech, audio and hearing-aid technology. From 2005 to 2008 he was with the Sensory Communication Group at MIT. Since mid-2006 he has been with the Hearing Research Center and with the Center for Biodynamics at Boston University, where he studies the role of brain rhythms in speech perception.

September 30, 2011 | 12:00 pm

“Predicting wh-dependencies: Parsing, interpretation, and learning perspectives”   Video Available

Akira Omaki, Johns Hopkins University

[abstract] [biography]

Abstract

This talk focuses on syntactic prediction and examines its implications for models of sentence processing and language learning. Predicting upcoming syntactic structures reduces processing demand and allows successful comprehension in the presence of noise, but on the other hand, such predictions are risky in that they could potentially lead readers/listeners to wrong analyses (i.e., garden paths) and cause processing difficulties that we often fail to overcome. The goal of the talk is three-fold. First, I will present a series of eye-tracking data with adults to establish that predictive syntactic analyses and interpretations are indeed possible in processing wh-dependencies. Second, I will examine the risks of wh-dependency prediction for adults and children. The first risk factor is revision failure: the failure to revise the initial analysis can lead adults and children to misunderstand sentences with wh-dependencies. Here, I will present comprehension data on adults' and children's revision failures in French, English and Japanese and demonstrate that the degree of revision difficulty can be attenuated by semantic properties of the verbs. The second risk factor concerns consequences for learning: if learners predictively analyze wh-dependencies and always disambiguate the dependencies with a bias, would the input distribution for learners be skewed in such a way that the learners fail to observe certain interpretive possibilities that are allowed in the target language? I will discuss the distribution of wh-dependencies in child-directed speech, and examine how the input distribution will be skewed when we incorporate the experimental findings on children's parsers. I will argue that a simple integration of syntactic prediction could potentially create a learnability problem, but that this problem could be overcome once we allow children to integrate verb information to reanalyze the parse.

Speaker Biography

Akira Omaki is an assistant professor of Cognitive Science at the Johns Hopkins University, and his research focuses on the dynamics of sentence processing and first/second language development. He received his PhD in Linguistics at the University of Maryland, and joined the Cognitive Science faculty after a post-doc at the University of Geneva.

September 27, 2011 | 04:30 pm

“Multilingual Guidance for Unsupervised Linguistic Structure Prediction”   Video Available

Dipanjan Das, Carnegie Mellon University

[abstract] [biography]

Abstract

Learning linguistic analyzers from unannotated data remains a major challenge; can multilingual text help? In this talk, I will describe learning methods that use unannotated data in a target language along with annotated data in more resource-rich "helper" languages. I will focus on two lines of work. First, I will describe a graph-based semi-supervised learning approach that uses parallel data to learn part-of-speech tag sequences through type-level lexical transfer from a helper language. Second, I will examine a more ambitious goal of learning part-of-speech sequences and dependency trees from raw text, leveraging parameter-level transfer from helper languages, but without any parallel data. Both approaches result in significant improvements over strong state-of-the-art monolingual unsupervised baselines.
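As a rough, hedged illustration of the graph-based semi-supervised ingredient, the sketch below runs textbook label propagation over a small similarity graph, with two vertices clamped to labels that a bilingual projection step might have supplied. The graph, labels, and hyperparameters are invented for the example; the talk's actual construction (type-level graphs built from parallel data) is more involved.

```python
import numpy as np

def label_propagation(W, Y_seed, seed_mask, alpha=0.9, iters=50):
    """Propagate label distributions over a similarity graph, clamping the seed vertices.
    W: (n, n) symmetric nonnegative similarities; Y_seed: (n, k) one-hot rows for seeds, zeros elsewhere."""
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)   # row-normalized transition matrix
    Y = Y_seed.copy()
    for _ in range(iters):
        Y = alpha * (P @ Y) + (1 - alpha) * Y_seed            # smooth over neighbors, stay near seeds
        Y[seed_mask] = Y_seed[seed_mask]                      # clamp vertices with known labels
    return Y / np.maximum(Y.sum(axis=1, keepdims=True), 1e-12)

# Toy graph over 5 word types and 3 POS tags; vertices 0 and 4 carry "projected" labels.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
Y_seed = np.zeros((5, 3))
Y_seed[0, 0] = 1.0      # vertex 0: tag 0
Y_seed[4, 2] = 1.0      # vertex 4: tag 2
seed_mask = np.array([True, False, False, False, True])
print(np.round(label_propagation(W, Y_seed, seed_mask), 2))
```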

Speaker Biography

Dipanjan Das is a Ph.D. student at the Language Technologies Institute, School of Computer Science at Carnegie Mellon University. He works on statistical natural language processing under the mentorship of Noah Smith. He finished his M.S. at the same institute in 2008, conducting research on language generation with Alexander Rudnicky. Das completed his undergraduate degree in 2005 from the Indian Institute of Technology, Kharagpur, where he received the best undergraduate thesis award in Computer Science and Engineering and the Dr. B.C. Roy Memorial Gold Medal for best all-round performance in academics and co-curricular activities. He worked at Google Research, New York as an intern in 2010 and received the best paper award at the ACL 2011 conference. He has published and served as program committee member and reviewer at conferences such as ACL, NIPS, NAACL, COLING, and EMNLP during 2008–2011.

September 20, 2011 | 04:30 pm

“When Topic Models Go Bad: Diagnosing and Improving Models for Exploring Large Corpora”   Video Available

Jordan Boyd-Graber, University of Maryland

[abstract] [biography]

Abstract

Imagine you need to get the gist of what's going on in a large text dataset such as all tweets that mention Obama, all e-mails sent within a company, or all newspaper articles published by the New York Times in the 1990s. Topic models, which automatically discover the themes which permeate a corpus, are a popular tool for discovering what's being discussed. However, topic models aren't perfect; errors hamper adoption of the model, performance in downstream computational tasks, and human understanding of the data. Fortunately, humans can easily diagnose and fix these errors. We describe crowdsourcing experiments to detect problematic topics and to determine which models produce comprehensible topics. Next, we present a statistically sound model to incorporate hints and suggestions from humans to iteratively refine topic models to better model large datasets. If time permits, we will also examine how topic models can be used to understand topic control in debates and discussions.
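As a small, hedged companion to the diagnosis theme, the sketch below fits an off-the-shelf LDA model and scores each topic with a UMass-style coherence measure, a common automatic proxy for flagging incoherent topics. It is not the crowdsourcing protocol or the interactive refinement model from the talk; the toy corpus and the choice of coherence measure are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the cat sat on the mat", "dogs and cats are pets", "cats chase dogs",
        "stocks fell as markets opened", "the market rallied on earnings",
        "earnings beat analyst expectations"]

vec = CountVectorizer()
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
vocab = np.array(vec.get_feature_names_out())
Xb = (X > 0).toarray()                                # binary document-term matrix

def umass_coherence(topic_word, top_n=5):
    """UMass coherence: sum of log((D(w_i, w_j) + 1) / D(w_j)) over pairs of a topic's top words."""
    top = np.argsort(topic_word)[::-1][:top_n]
    score = 0.0
    for i in range(1, len(top)):
        for j in range(i):
            co_occur = np.sum(Xb[:, top[i]] & Xb[:, top[j]])
            score += np.log((co_occur + 1.0) / np.sum(Xb[:, top[j]]))
    return score

for k, topic in enumerate(lda.components_):           # a low score flags a "bad" topic
    top_words = vocab[np.argsort(topic)[::-1][:5]]
    print(f"topic {k}: {list(top_words)}  coherence = {umass_coherence(topic):.2f}")
```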

Speaker Biography

Jordan Boyd-Graber is an assistant professor in the College of Information Studies and the Institute for Advanced Computer Studies at the University of Maryland, focusing on the interaction of users and machine learning: how algorithms can better learn from human behaviors and how users can better communicate their needs to machine learning algorithms. Previously, he worked as a postdoc with Philip Resnik at the University of Maryland. Until 2009, he was a graduate student at Princeton University working with David Blei on linguistic extensions of topic models. His current work is supported by NSF, IARPA, and ARL.

September 16, 2011 | 12:00 pm

“Short URLs, Big Data: Machine Learning at Bitly”

Hilary Mason, bit.ly

[abstract] [biography]

Abstract

Bitly is a URL shortening service, gathering hundreds of millions of data points about the links people share every day. I'll discuss the data analysis techniques that we use, giving examples of machine learning problems that we are solving at scale, and talk about the differences between industry, startup, and academic research.

Speaker Biography

Hilary Mason is the Chief Scientist at bit.ly, where she finds sense in vast data sets. Her work involves both pure research and development of product-focused features. She's also a co-founder of HackNY (hackny.org), a non-profit organization that connects talented student hackers from around the world with startups in NYC. Hilary recently started the data science blog Dataists (dataists.com) and is a member of hacker collective NYC Resistor. She has discovered two new species, loves to bake cookies, and asks way too many questions.  

September 6, 2011 | 04:30 pm

“Learning to Describe Images”

Julia Hockenmaier, University of Illinois

[abstract] [biography]

Abstract

How can we create an algorithm that learns to associate images with sentences in natural language that describe the situations depicted in them? This talk will describe ongoing research towards this goal, with a focus on the natural language understanding aspects. Although we believe that this task may benefit from improved object recognition and deeper linguistic analysis, we show that models that rely on simple perceptual cues of color, texture and local feature descriptors on the image side, and on sequence-based features on the text side, can do surprisingly well. We also demonstrate how to leverage the availability of multiple captions for the same image.  

Speaker Biography

Julia Hockenmaier is assistant professor of computer science at the University of Illinois at Urbana-Champaign. She came to Illinois after a postdoc at the University of Pennsylvania and a PhD at the University of Edinburgh. She holds an NSF CAREER award.

September 2, 2011 | 12:00 pm

“Applications of weighted finite state transducers in a speech recognition toolkit”   Video Available

Daniel Povey, Microsoft Research

[abstract] [biography]

Abstract

The open-source speech recognition toolkit "Kaldi" uses weighted finite state transducers (WFSTs) for training and decoding, and uses the OpenFst toolkit as a C++ library. I will give an informal overview of WFSTs and of the standard AT&T recipe for WFST-based decoding, and will mention some problems (in my opinion) with the basic recipe and how we addressed them while developing Kaldi. I will also describe how to use WFSTs to achieve "exact" lattice generation, in a sense that will be explained. This is an interesting application of WFSTs because, unlike most WFST mechanisms, it does not have any obvious non-WFST analog.
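To make the WFST vocabulary concrete, here is a tiny from-scratch composition of two weighted transducers in the tropical semiring (path weights add), with no epsilon handling or optimization. It is only meant to show what "composition" means; it is not OpenFst's implementation, and the two toy transducers are invented for the example.

```python
from collections import defaultdict

# A toy WFST is (arcs, start, finals): arcs maps a state to a list of
# (input_label, output_label, weight, next_state); finals maps final states to final weights.

def compose(fst_a, fst_b):
    """Naive WFST composition: match A's output labels against B's input labels (no epsilons)."""
    arcs_a, start_a, finals_a = fst_a
    arcs_b, start_b, finals_b = fst_b
    arcs = defaultdict(list)
    start = (start_a, start_b)
    stack, seen = [start], {start}
    while stack:
        qa, qb = stack.pop()
        for (i1, o1, w1, na) in arcs_a.get(qa, []):
            for (i2, o2, w2, nb) in arcs_b.get(qb, []):
                if o1 == i2:                                        # labels must agree
                    nxt = (na, nb)
                    arcs[(qa, qb)].append((i1, o2, w1 + w2, nxt))   # tropical: weights add
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    finals = {(fa, fb): wa + wb for fa, wa in finals_a.items() for fb, wb in finals_b.items()}
    return dict(arcs), start, finals

# T1 maps "a b" to "A B"; T2 accepts "A B" with a score of 1.0, so T1 o T2 maps "a b" to "A B" with weight 2.0.
T1 = ({0: [("a", "A", 0.5, 1)], 1: [("b", "B", 0.5, 2)]}, 0, {2: 0.0})
T2 = ({0: [("A", "A", 1.0, 1)], 1: [("B", "B", 0.0, 2)]}, 0, {2: 0.0})
print(compose(T1, T2))
```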

Speaker Biography

Daniel Povey received his Bachelor's (Natural Sciences, 1997), Master's (Computer Speech and Language Processing, 1998) and PhD (Engineering, 2003) from Cambridge University. He is currently a researcher at Microsoft Research, Redmond, Washington, USA. From 2003 to 2008 he worked as a researcher in IBM Research in Yorktown Heights, NY. He is best known for his work on discriminative training for HMM-GMM based speech recognition (i.e. MMI, MPE, and their feature-space variants).

August 12, 2011 | 02:00 pm

“Low-dimensional speech representation based on Factor Analysis and its applications”

Najim Dehak, MIT

[abstract] [biography]

Abstract

We introduce a novel approach to data-driven feature extraction stemming from the field of speaker recognition. In the last five years, statistical methods rooted in factor analysis have greatly enhanced the traditional representation of a speaker using Gaussian Mixture Models (GMMs). In this talk, we build some intuition by outlining the historical development of these methods and then survey the variety of applications made possible by this approach. To begin, we discuss the development of Joint Factor Analysis (JFA), which was motivated by a desire to both model speaker variabilities and compensate for channel/session variabilities at the same time. In doing so, we introduce the notion of a GMM supervector, a high-dimensional vector created by concatenating the mean vectors of each GMM component. JFA assumes that this supervector can be decomposed into a sum of two parts: one containing relevant speaker-specific information and another containing channel-dependent nuisance factors that need to be compensated. We will describe the methods used to estimate these hidden parameters. The success of JFA led to a proposed simplification using just factor analysis for the extraction of speaker-relevant features. The key assumption here is that most of the variabilities between GMM supervectors can be explained by a (much) lower-dimensional space of underlying factors. In this approach, a given utterance of any length is mapped into a single, low-dimensional "total variability" space. We call the resulting vector an i-vector, short for "identity vector" in the speaker recognition sense or "intermediate vector" for its intermediate size between that of a supervector and that of an acoustic feature vector. Unlike in JFA, the total variability approach makes no distinction between speaker and inter-session variabilities in the high-dimensional supervector space; instead, channel compensation occurs in the lower-dimensional i-vector space. The presentation will provide an outline of the process that can be used to build a robust speaker verification system. Though originally proposed for speaker modeling, the i-vector representation can be seen more generally as an elegant framework for data-driven feature extraction. After covering the necessary background theory, we will discuss our recent work in applying this approach to a variety of other audio classification problems, including speaker diarization and language identification.  
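For readers who want to see the "total variability" model M = m + Tw in code, the sketch below collects zero- and first-order statistics against a small GMM and solves for the posterior mean of w, which is the i-vector. Everything here is a toy: the background data are random, the covariances are diagonal, and T is a random matrix standing in for one trained by EM on a large corpus.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy "UBM": a small diagonal-covariance GMM fit to pooled background frames.
background = rng.normal(size=(2000, 13))
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0).fit(background)
C, D = ubm.means_.shape
R = 10                                                # dimension of the total variability space
T = 0.1 * rng.normal(size=(C * D, R))                 # stand-in for a trained T matrix

def extract_ivector(frames):
    """Posterior mean of w in M = m + T w, from Baum-Welch statistics against the UBM."""
    post = ubm.predict_proba(frames)                  # (n_frames, C) component posteriors
    N = post.sum(axis=0)                              # zero-order statistics
    F = post.T @ frames - N[:, None] * ubm.means_     # centered first-order statistics
    Sigma_inv = 1.0 / ubm.covariances_                # (C, D) for diagonal covariances
    L = np.eye(R)                                     # precision of the w posterior
    rhs = np.zeros(R)
    for c in range(C):
        Tc = T[c * D:(c + 1) * D]                     # (D, R) block for component c
        L += N[c] * Tc.T @ (Sigma_inv[c][:, None] * Tc)
        rhs += Tc.T @ (Sigma_inv[c] * F[c])
    return np.linalg.solve(L, rhs)                    # the i-vector

utterance = rng.normal(loc=0.3, size=(300, 13))       # fake acoustic frames for one utterance
print(extract_ivector(utterance).shape)               # (10,)
```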

Speaker Biography

Najim Dehak received his Engineering degree in Artificial Intelligence in 2003 from Universite des Sciences et de la Technologie d'Oran, Algeria, and his MS degree in Pattern Recognition and Artificial Intelligence Applications in 2004 from the Universite de Pierre et Marie Curie, Paris, France. He obtained his Ph.D. degree from Ecole de Technologie Superieure (ETS), Montreal in 2009. During his Ph.D. studies he was also with Centre de recherche informatique de Montreal (CRIM), Canada. In the summer of 2008, he participated in the Johns Hopkins University, Center for Language and Speech Processing, Summer Workshop. During that time, he proposed a new system for speaker verification that uses factor analysis to extract speaker-specific features, thus paving the way for the development of the i-vector framework. Dr. Dehak is currently a research scientist in the Spoken Language Systems (SLS) Group at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). His research interests are in machine learning approaches applied to speech processing and speaker modeling. The current focus of his research involves extending the concept of an i-vector representation into other audio classification problems, such as speaker diarization, language- and emotion-recognition.  

August 10, 2011 | 10:30 am

“Hierarchical modeling and prior information: an example from toxicology”   Video Available

Andrew Gelman, Columbia University

[abstract]

Abstract

We describe a general approach using Bayesian analysis for the estimation of parameters in physiological pharmacokinetic models. The chief statistical difficulty in estimation with these models is that any physiological model that is even approximately realistic will have a large number of parameters, often comparable to the number of observations in a typical pharmacokinetic experiment (e.g., 28 measurements and 15 parameters for each subject). In addition, the parameters are generally poorly identified, as in the well-known ill-conditioned problem of estimating a mixture of declining exponentials. Our modeling includes (a) hierarchical population modeling, which allows partial pooling of information among different experimental subjects; (b) a pharmacokinetic model including compartments for well-perfused tissues, poorly perfused tissues, fat, and the liver; and (c) informative prior distributions for population parameters, which is possible because the parameters represent real physiological variables. We discuss how to estimate the models using Bayesian posterior simulation, a method that automatically includes the uncertainty inherent in estimating such a large number of parameters. We also discuss how to check model fit and sensitivity to the prior distribution using posterior predictive simulation.
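A miniature example of the partial pooling in item (a): with a hierarchical normal model and, for simplicity, known variances, each subject's estimate is shrunk toward the population mean in proportion to how noisy that subject's own data are. This is only a toy analogue of the full pharmacokinetic model with informative priors and posterior simulation described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: one pharmacokinetic-style parameter per subject, observed a handful of noisy times.
pop_mean, pop_sd, noise_sd = 5.0, 1.0, 2.0
subjects = [rng.normal(rng.normal(pop_mean, pop_sd), noise_sd, size=rng.integers(3, 8))
            for _ in range(12)]

y_bar = np.array([s.mean() for s in subjects])        # per-subject means (no pooling)
n = np.array([len(s) for s in subjects])
mu_hat = y_bar.mean()                                 # crude population estimate (complete pooling)

# Partial pooling with known variances: the posterior mean for each subject is a
# precision-weighted compromise between its own sample mean and the population mean.
tau2, sigma2 = pop_sd**2, noise_sd**2
weight_on_population = (sigma2 / n) / (sigma2 / n + tau2)
partially_pooled = weight_on_population * mu_hat + (1 - weight_on_population) * y_bar

for i in range(3):
    print(f"subject {i}: unpooled = {y_bar[i]:.2f}, partially pooled = {partially_pooled[i]:.2f}")
```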

August 3, 2011 | 10:30 am

“Predicting sentence specificity, with applications to news summarization”   Video Available

Ani Nenkova, University of Pennsylvania

[abstract] [biography]

Abstract

A well-written text contains a mix of general statements and sentences that provide specific details. Yet no current work in computational linguistics has addressed the task of predicting the level of specificity of a sentence. In this talk I will present the development and evaluation of an automatic classifier capable of identifying general and specific sentences in news articles. We show that it is feasible to use existing annotations of discourse relations as training data and we validate the resulting classifier on sentences directly judged by multiple annotators. We also provide a task-based evaluation of our classifier on general and specific summaries written by people and demonstrate that the classifier predictions are able to distinguish between the two types of human authored summaries. We also analyze the level of specific and general content in news documents and their human and automatic summaries. We discover that while human abstracts contain a more balanced mix of general and specific content, automatic summaries are overwhelmingly specific. We find that too much specificity adversely affects the quality of the summary. The study of sentence specificity extends our prior work on text quality which I will briefly overview. This is joint work with my student Annie Louis.
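A minimal sketch of the kind of supervised sentence classifier the abstract describes, here with a generic bag-of-words pipeline rather than the features or discourse-derived training data used in the actual work; the six training sentences are invented for illustration.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

sentences = [
    "The economy has been struggling for years.",                      # general
    "Officials say the situation remains uncertain.",                  # general
    "Critics argue the policy has had mixed results.",                 # general
    "Revenue rose 4.2 percent to $1.3 billion in the third quarter.",  # specific
    "The plant in Ohio laid off 212 workers on Tuesday.",              # specific
    "The bill passed the Senate by a vote of 61 to 38 on March 3.",    # specific
]
labels = ["general"] * 3 + ["specific"] * 3

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
clf.fit(sentences, labels)
print(clf.predict(["Profits fell 12 percent last quarter.",
                   "Many analysts remain skeptical."]))
```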

Speaker Biography

Ani Nenkova is an assistant professor of computer and information science at the University of Pennsylvania. Her main areas of research are automatic summarization, discourse and text quality. She obtained her PhD degree in computer science from Columbia University in 2006. She also spent a year and a half as a postdoctoral fellow at Stanford University before joining Penn in Fall 2007.

July 27, 2011 | 10:30 am

“Large Scale Supervised Embedding for Text and Images”   Video Available

Jason Weston, Google

[abstract] [biography]

Abstract

In this talk I will present two related pieces of research for text retrieval and image annotation that both use supervised embedding algorithms over large datasets. Part 1: The first part of the talk presents a class of models that are discriminatively trained to directly map from the word content in a query-document or document-document pair to a ranking score. Like latent semantic indexing (LSI), our models take account of correlations between words (synonymy, polysemy). However unlike LSI, our models are trained with a supervised signal directly on the task of interest, which we argue is the reason for our superior results. We provide an empirical study on Wikipedia documents, using the links to define document-document or query-document pairs, where we beat several baselines. We also describe extensions to the nonlinear case and for dealing with huge dictionary sizes. (Joint work with Bing Bai, David Grangier and Ronan Collobert.) Part 2: Image annotation datasets are becoming larger and larger, with tens of millions of images and tens of thousands of possible annotations. We propose a well-performing method that scales to such datasets by simultaneously learning to optimize precision at k of the ranked list of annotations for a given image and learning a low-dimensional joint embedding space for both images and annotations. Our method both outperforms several baseline methods and, in comparison to them, is faster and consumes less memory. We also demonstrate how our method learns an interpretable model, where annotations with alternate spellings or even languages are close in the embedding space. Hence, even when our model does not predict the exact annotation given by a human labeler, it often predicts similar annotations, a fact that we try to quantify by measuring the "sibling" precision metric, where our method also obtains good results. (Joint work with Samy Bengio and Nicolas Usunier.)
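The first part's scoring model can be caricatured in a few lines: score a query-document pair with a low-rank bilinear form f(q, d) = (Uq)·(Wd) and train U and W with a margin ranking loss so that linked documents outrank random ones. The sketch below is that caricature on synthetic bag-of-words vectors; it is not the paper's training setup, hashing tricks, or the WARP-style image-annotation model of the second part.

```python
import numpy as np

rng = np.random.default_rng(0)
V, K = 1000, 30                            # vocabulary size, embedding dimension
U = 0.01 * rng.normal(size=(K, V))         # query-side projection
W = 0.01 * rng.normal(size=(K, V))         # document-side projection

def score(q, d):
    """Low-rank bilinear relevance score: (U q) . (W d)."""
    return (U @ q) @ (W @ d)

def ranking_sgd_step(q, d_pos, d_neg, lr=0.1, margin=1.0):
    """One SGD step on the margin ranking loss max(0, margin - f(q, d_pos) + f(q, d_neg))."""
    global U, W
    loss = margin - score(q, d_pos) + score(q, d_neg)
    if loss > 0:
        Uq = U @ q
        diff = d_neg - d_pos
        U -= lr * np.outer(W @ diff, q)     # d loss / d U
        W -= lr * np.outer(Uq, diff)        # d loss / d W
    return max(loss, 0.0)

def bow(word_ids):
    v = np.zeros(V)
    v[word_ids] = 1.0
    return v

# The "positive" document shares words with the query; the "negative" one does not.
q, d_pos, d_neg = bow([3, 17, 42]), bow([17, 42, 99]), bow([500, 777])
for _ in range(200):
    ranking_sgd_step(q, d_pos, d_neg)
print(score(q, d_pos) > score(q, d_neg))    # True once the margin is satisfied
```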

Speaker Biography

Jason Weston is a Research Scientist at Google NY since July 2009. He earned his PhD in machine learning at Royal Holloway, University of London and at AT&T Research in Red Bank, NJ (advisor: Vladimir Vapnik) in 2000. From 2000 to 2002, he was a Researcher at Biowulf technologies, New York. From 2002 to 2003 he was a Research Scientist at the Max Planck Institute for Biological Cybernetics, Tuebingen, Germany. From 2003 to June 2009 he was a Research Staff Member at NEC Labs America, Princeton. His interests lie in statistical machine learning and its application to text, audio and images. Jason has published over 80 papers, including best paper awards at ICML and ECML.

July 20, 2011 | 10:30 am

“Distribution Fields for Low Level Vision”   Video Available

Erik Learned-Miller, University of Massachusetts, Amherst

[abstract]

Abstract

Consider the following fundamental problem of low level vision: given a large image I and a patch J from another image, find the "best matching" location of the patch J in image I. We believe the solution to this problem can be significantly improved. A significantly better solution to this problem has the potential to improve a wide variety of low-level vision problems, such as backgrounding, tracking, medical image registration, optical flow, image stitching, and invariant feature definition. We introduce a set of techniques for solving this problem based upon a representation called distribution fields. Distribution fields are an attempt to take the best from a wide variety of low-level vision techniques including geometric blur (Berg), mixture of Gaussians backgrounding (Stauffer), SIFT (Lowe) and HoG (Dalal and Triggs), local color histograms, bilateral filtering, congealing (Learned-Miller) and many other techniques. We show how distribution fields solve this "patch" matching problem, and, in addition to finding the optimum match of patch J to image I with a high success rate, the algorithm produces, as a by-product, a very natural assessment of the quality of that match. We call this algorithm the "sharpening match". Using the sharpening match for tracking yields an extremely simple but state-of-the-art tracker. We also discuss application of these techniques to background subtraction and other low level vision problems.
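A hedged, stripped-down reading of the representation: explode a grayscale image into one channel per intensity bin, blur each channel spatially so that each pixel carries a local distribution over intensities, and match a patch by the L1 distance between its distribution field and the field at each candidate location. The parameters and the exhaustive search below are illustrative assumptions; the "sharpening match" itself, which varies the amount of smoothing, is not implemented here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def distribution_field(img, n_bins=8, sigma=2.0):
    """One channel per intensity bin, spatially smoothed, renormalized to a per-pixel distribution."""
    bins = np.minimum((img * n_bins).astype(int), n_bins - 1)          # img assumed in [0, 1]
    df = np.stack([gaussian_filter((bins == b).astype(float), sigma) for b in range(n_bins)])
    return df / np.maximum(df.sum(axis=0, keepdims=True), 1e-12)

def match_patch(img, patch, n_bins=8, sigma=2.0):
    """Return the (row, col) where the patch's distribution field best matches the image's (L1)."""
    df_img = distribution_field(img, n_bins, sigma)
    df_patch = distribution_field(patch, n_bins, sigma)
    ph, pw = patch.shape
    best, best_pos = np.inf, None
    for r in range(img.shape[0] - ph + 1):
        for c in range(img.shape[1] - pw + 1):
            d = np.abs(df_img[:, r:r + ph, c:c + pw] - df_patch).sum()
            if d < best:
                best, best_pos = d, (r, c)
    return best_pos

rng = np.random.default_rng(0)
I = rng.random((60, 80))
J = I[20:32, 45:61].copy()        # a patch cut from the image itself
print(match_patch(I, J))          # expected to land at or very near (20, 45)
```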

July 13, 2011 | 10:30 am

“Navigating the Interaction Timestream (when your AAC device is a cement block tied to your ankle)”

Jeff Higginbotham, SUNY - Buffalo

[abstract] [biography]

Abstract

Interacting in time is something we all do, most of the time without thought about how it is accomplished. For many individuals who use technology to mediate their interactions, communication success is problematic. These problems entail operating the device within conversational time constraints, as well as coordinating their bodies with their device and their partner as they attempt to produce meaningful utterances. My talk will introduce several types of problems involving time and timing facing augmented speakers and their partners and explore some ways in which NLP and interface design may be helpful in addressing these difficulties.

Speaker Biography

I received my PhD in "comparative studies in human interaction" from the University of Wisconsin - Madison. I'm interested in how augmentative and alternative communication (AAC) devices are used for conversation and other tasks and how these technologies are socially "constructed" by the community in which they are used. I try to use findings from my research to work with designers to build more socially responsive AAC systems.

May 2, 2011 | 04:30 pm

“Multilingual Subjectivity Analysis”

Rada Mihalcea, University of North Texas

[abstract] [biography]

Abstract

There is growing interest in the automatic extraction of opinions, emotions, and sentiments in text (subjectivity), to provide tools and support for various natural language processing applications. Most of the research to date has focused on English, which is mainly explained by the availability of resources for subjectivity analysis, such as lexicons and manually labeled corpora. In this talk, I will describe methods to automatically generate resources for subjectivity analysis for a new target language by leveraging the resources and tools available for English, which in many cases took years of work to complete. Specifically, I will try to provide answers to the following questions. First, can we derive a subjectivity lexicon for a new language using an existing English lexicon and a bilingual dictionary? Second, can we derive subjectivity-annotated corpora in a new language using existing subjectivity analysis tools for English and parallel corpora? Third, can we build tools for subjectivity analysis for a new target language by relying on these automatically generated resources?
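The first question has a very simple baseline answer that is worth seeing in code: walk the English lexicon and transfer each subjective label through a bilingual dictionary. The entries below are made-up stand-ins for the real resources, and the sketch ignores the sense-ambiguity and corpus-projection issues that the talk actually addresses.

```python
# Tiny, invented stand-ins for a real English subjectivity lexicon and bilingual dictionary.
english_lexicon = {"terrible": "strong_negative", "wonderful": "strong_positive",
                   "concern": "weak_negative", "table": None}            # None = objective entry
bilingual_dict = {"terrible": ["teribil"], "wonderful": ["minunat", "splendid"],
                  "concern": ["ingrijorare"], "table": ["masa"]}

target_lexicon = {}
for en_word, label in english_lexicon.items():
    if label is None:
        continue                                  # only subjective entries are projected
    for tgt_word in bilingual_dict.get(en_word, []):
        # Naive transfer: real systems must cope with dictionary sense ambiguity and coverage gaps.
        target_lexicon.setdefault(tgt_word, label)

print(target_lexicon)
```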

Speaker Biography

Rada Mihalcea is an Associate Professor in the Department of Computer Science and Engineering at the University of North Texas. She is currently involved in a number of research projects in computational linguistics, including word sense disambiguation, monolingual and cross-lingual semantic similarity, automatic keyword extraction and text summarization, subjectivity and sentiment analysis, and computational humor. She serves or has served on the editorial boards of the journals Computational Linguistics, Language Resources and Evaluation, Natural Language Engineering, and Research on Language and Computation. Her research has been funded by the National Science Foundation, National Endowment for the Humanities, Google, and the State of Texas. She is the recipient of a National Science Foundation CAREER award (2008) and a Presidential Early Career Award for Scientists and Engineers (2009).

April 26, 2011 | 04:30 pm

“Building Watson: An Overview of DeepQA for the Jeopardy! Challenge”

David Ferrucci, IBM

[abstract] [biography]

Abstract

Computer systems that can directly and accurately answer people's questions over a broad domain of human knowledge have been envisioned by scientists and writers since the advent of computers themselves. Open domain question answering holds tremendous promise for facilitating informed decision making over vast volumes of natural language content. Applications in business intelligence, healthcare, customer support, enterprise knowledge management, social computing, science and government would all benefit from deep language processing. The DeepQA project is aimed at exploring how advancing and integrating Natural Language Processing (NLP), Information Retrieval (IR), Machine Learning (ML), massively parallel computation and Knowledge Representation and Reasoning (KR&R) can greatly advance open-domain automatic Question Answering. An exciting proof-point in this challenge is to develop a computer system that can successfully compete against top human players at the Jeopardy! quiz show (www.jeopardy.com). Attaining champion-level performance at Jeopardy! requires a computer system to rapidly and accurately answer rich open-domain questions, and to predict its own performance on any given category/question. The system must deliver high degrees of precision and confidence over a very broad range of knowledge and natural language content with a 3-second response time. To do this, DeepQA evidences and evaluates many competing hypotheses. A key to success is automatically learning and combining accurate confidences across an array of complex algorithms and over different dimensions of evidence. Accurate confidences are needed to know when to "buzz in" against your competitors and how much to bet. High precision and accurate confidence computations are just as critical for providing real value in business settings where helping users focus on the right content sooner and with greater confidence can make all the difference. The need for speed and high precision demands a massively parallel computing platform capable of generating, evaluating and combining thousands of hypotheses and their associated evidence. In this talk I will introduce the audience to the Jeopardy! Challenge and how we tackled it using DeepQA. www.ibmwatson.com

Speaker Biography

Dr. David Ferrucci is the lead researcher and Principal Investigator (PI) for the Watson/Jeopardy! project. He has been a Research Staff Member at IBM's T.J. Watson Research Center since 1995, where he heads the Semantic Analysis and Integration department. Dr. Ferrucci focuses on technologies for automatically discovering valuable knowledge in natural language content and using it to enable better decision making. As part of his research he led the team that developed UIMA. UIMA is a software framework and open standard widely used by industry and academia for collaboratively integrating, deploying and scaling advanced text and multi-modal (e.g., speech, video) analytics. As chief software architect for UIMA, Dr. Ferrucci led its design and chaired the UIMA standards committee at OASIS. The UIMA software framework is deployed in IBM products and has been contributed to Apache open-source to facilitate broader adoption and development. In 2007, Dr. Ferrucci took on the Jeopardy! Challenge, tasked to create a computer system that can rival human champions at the game of Jeopardy!. As the PI for the exploratory research project dubbed DeepQA, he focused on advancing automatic, open-domain question answering using massively parallel evidence-based hypothesis generation and evaluation. By building on UIMA, on key university collaborations and by taking bold research, engineering and management steps, he led his team to integrate and advance many search, NLP and semantic technologies to deliver results that have out-performed all expectations and have demonstrated world-class performance at a task previously thought insurmountable with the current state-of-the-art. Watson, the computer system built by Ferrucci's team, is now competing with top Jeopardy! champions. Under his leadership they have already begun to demonstrate how DeepQA can make dramatic advances for intelligent decision support in areas including medicine, finance, publishing, government and law. Dr. Ferrucci has been the Principal Investigator (PI) on several government-funded research programs on automatic question answering, intelligent systems and scalable text analytics. His team at IBM consists of 28 researchers and software engineers specializing in the areas of Natural Language Processing (NLP), Software Architecture, Information Retrieval, Machine Learning and Knowledge Representation and Reasoning (KR&R). Dr. Ferrucci graduated from Manhattan College with a BS in Biology and from Rensselaer Polytechnic Institute in 1994 with a PhD in Computer Science specializing in knowledge representation and reasoning. He is published in the areas of AI, KR&R, NLP and automatic question-answering.

April 19, 2011 | 4:30PM

“Integrating history-length interpolation and classes in language modeling”

Hinrich Schuetze, University of Stuttgart

[abstract] [biography]

Abstract

Building on earlier work that integrates different factors in language modeling, we view (i) backing off to a shorter history and (ii) class-based generalization as two complementary mechanisms of using a larger equivalence class for prediction when the default equivalence class is too small for reliable estimation. This view entails that the classes in a language model should be learned from rare events only and should preferably be applied to rare events. We construct such a model and show that both training on rare events and preferential application to rare events improve perplexity when compared to a simple direct interpolation of class-based with standard language models.
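A toy version of the interpolation being compared against: a word-bigram model mixed with a class-bigram model, with a hand-written word-to-class map. In the talk the classes are learned (from rare events only) and the interpolation is history-dependent; none of that is reproduced here.

```python
from collections import Counter

# Hand-written word-to-class map and a tiny corpus, purely for illustration.
word2class = {"the": "DET", "a": "DET", "cat": "NOUN", "dog": "NOUN",
              "sleeps": "VERB", "runs": "VERB", "</s>": "STOP"}
corpus = [["the", "cat", "sleeps", "</s>"], ["a", "dog", "runs", "</s>"],
          ["the", "dog", "sleeps", "</s>"]]

word_bi, word_uni = Counter(), Counter()
class_bi, class_uni, class_emit = Counter(), Counter(), Counter()
for sent in corpus:
    for prev, cur in zip(sent, sent[1:]):
        word_bi[(prev, cur)] += 1
        word_uni[prev] += 1
        cp, cc = word2class[prev], word2class[cur]
        class_bi[(cp, cc)] += 1
        class_uni[cp] += 1
        class_emit[(cc, cur)] += 1

def p_word(cur, prev):
    """Maximum-likelihood word bigram probability (zero for unseen bigrams)."""
    return word_bi[(prev, cur)] / word_uni[prev] if word_uni[prev] else 0.0

def p_class(cur, prev):
    """Class bigram model: p(class(cur) | class(prev)) * p(cur | class(cur))."""
    cp, cc = word2class[prev], word2class[cur]
    if not class_uni[cp]:
        return 0.0
    p_cc = class_bi[(cp, cc)] / class_uni[cp]
    p_emit = class_emit[(cc, cur)] / sum(v for (c, _), v in class_emit.items() if c == cc)
    return p_cc * p_emit

def p_interp(cur, prev, lam=0.7):
    """Simple linear interpolation of the word model with the class model."""
    return lam * p_word(cur, prev) + (1 - lam) * p_class(cur, prev)

print(p_interp("cat", "a"))    # nonzero even though "a cat" never occurs, thanks to the class model
```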

Speaker Biography

Hinrich Schuetze is a professor of computational linguistics in the School of Computer Science and Electrical Engineering at the University of Stuttgart in Germany. He received his PhD in linguistics from Stanford University in 1995 and worked in the areas of text mining and information retrieval at a number of research institutions and startups in Silicon Valley until 2004. His research focuses on natural language processing problems that are important for applications like information retrieval and machine translation and at the same time contribute to our fundamental understanding of language as a cognitive phenomenon. He is a coauthor of Foundations of Statistical Natural Language Processing (MIT Press, with Chris Manning) and Introduction to Information Retrieval (Cambridge University Press, with Chris Manning and Prabhakar Raghavan).

April 12, 2011 | 4:30PM

“Information visualization and its application to machine translation”

Rebecca Hwa, University of Pittsburgh

[abstract] [biography]

Abstract

In this talk, I will present an interactive interface that helps users to explore and understand imperfect outputs from automatic machine translation (MT) systems. The target users of our system are people who do not understand the original (source) language. Through a visualization of multiple linguistic resources, our system enables users to identify potential translation mistakes and make educated guesses as to how to correct them. Experimental results suggest that users of our prototype are able to correct some difficult translation errors that they would have found baffling otherwise. The experiments further suggest adaptive methods to improve standard phrase-based machine translation systems.

Speaker Biography

Rebecca Hwa is an Associate Professor in the Department of Computer Science at the University of Pittsburgh. Before joining Pitt, she was a postdoc at University of Maryland. She received her PhD in Computer Science from Harvard University in 2001 and her B.S. in Computer Science and Engineering from UCLA in 1993. Dr. Hwa's primary research interests include multilingual processing, machine translation, and semi-supervised learning methods. Additionally, she has collaborated with colleagues on information visualization, sentiment analysis, and bioinformatics. She is a recipient of the NSF CAREER Award. Her work has also been supported by NIH and DARPA. Dr. Hwa currently serves as the chair of the executive board of the North American Chapter of the Association for Computational Linguistics.

April 5, 2011 | 4:30PM

“Statistical Topic Models for Computational Social Science”

Hanna Wallach, University of Massachusetts Amherst

[abstract] [biography]

Abstract

In order to draw data-driven conclusions, social scientists need quantitative tools for analyzing massive, complex collections of textual information. I will discuss the development of such tools. I will concentrate on a class of models known as statistical topic models, which automatically infer groups of semantically-related words (topics) from word co-occurrence patterns in documents, without requiring human intervention. The resultant topics can be used to answer a diverse range of research questions, including detecting and characterizing emergent behaviors, identifying topic-based communities, and tracking trends across languages. The foundation of statistical topic modeling is Bayesian statistics, which requires that assumptions, or prior beliefs, are made explicit. Until recently, most statistical topic models relied on two unchallenged prior beliefs. In this talk, I will explain how challenging these beliefs increases robustness to the skewed word frequency distributions common in text. I will also talk about recent work (with Rachel Shorey and Bruce Desmarais) on statistical topic models for studying temporal and textual patterns in formerly-classified government documents.

Speaker Biography

Hanna Wallach is an assistant professor in the Department of Computer Science at the University of Massachusetts Amherst. She is one of five core faculty members involved in UMass's newly-formed computational social science research initiative. Previously, Hanna was a postdoctoral researcher, also at UMass, where she developed Bayesian latent variable models for analyzing complex data regarding communication and collaboration within scientific and technological communities. Her recent work (with Ryan Adams and Zoubin Ghahramani) on infinite belief networks won the best paper award at AISTATS 2010. Hanna has co-organized multiple workshops on both computational social science and Bayesian latent variable modeling. Her tutorial on conditional random fields is widely referenced and used in machine learning courses around the world. As well as her research, Hanna works to promote and support women's involvement in computing. In 2006, she co-founded the annual workshop for women in machine learning, in order to give female faculty, research scientists, postdoctoral researchers, and graduate students an opportunity to meet, exchange research ideas, and build mentoring and networking relationships. In her not-so-spare time, Hanna is a member of Pioneer Valley Roller Derby, where she is better known as Logistic Aggression.

March 29, 2011 | 4:30PM

“Towards a Theory of Collective Social Computation: Connecting Individual Decision-making rules to Collective Patterns through Adaptive Causal Circuit Construction”

Jessica Flack, Santa Fe Institute

[abstract] [biography]

Abstract

I will discuss empirical and computational approaches my collaborators and I have been developing to build adaptive causal circuits that connect individual decision-making rules to collective patterns. This approach requires techniques that permit extraction of decision-making rules from time-series data. A goal of the research I will be discussing is to give an empirically grounded computational account of the emergence of robust aggregate features and hierarchical organization in social evolution.

Speaker Biography

Jessica Flack is Professor at the Santa Fe Institute and Co-Director of the Collective Social Computation Group. Her research program combines dynamical systems and computational perspectives in order to build a theory of how aggregate structure and hierarchy arise in social evolution. Primary goals are to understand the conditions and mechanisms supporting the emergence of slowly changing collective features that feed-down to influence component behavior, the role that conflict plays in this process, and the implications of multiple timescales and overlapping networks for robustness and adaptability in social evolution. Research foci include design principles for robust systems, conflict dynamics and control, the role of uncertainty reduction in the evolution of signaling systems, the implications of higher-order structures for social complexity and innovation, behavioral grammars and adaptive circuit construction. Flack approaches these issues using data on social process collected from animal society model systems, and through comparison of social dynamics with neural, immune, and developmental dynamics. Flack received her PhD in 2004 from Emory University in evolution, cognition and animal behavior. Flack was a Postdoctoral Fellow at SFI before joining the SFI Faculty in 2007.

March 18, 2011 | 12:00PM

“Enhancing ESL Education in India with a Reading Tutor that Listens”

Kalika Bali, Microsoft

[abstract] [biography]

Abstract

In this talk, I will discuss a 10-week pilot of CMU Project Listen's PC-based Reading Tutor program for enhancing English education in India. The pilot focused on low-income elementary school students, a population that has little or no exposure to English outside of school. The students showed measurable improvement on quantitative tests of reading fluency while using the tutor. Post-pilot interviews explored the students' experience of the reading tutor. I will discuss both technical and non-technical factors that might affect the success of such a speech-technology-based tutor for this demographic in India.

Speaker Biography

Kalika Bali is a researcher with the Multilingual Systems group at Microsoft Research Labs India (Bangalore). Her primary research interests are in Speech Technology and Computational Linguistics, especially for Indian languages. A linguist by training, she has taught at the University of the South Pacific as an Assoc. Prof. She has worked in the area of research and development of Language Technology at both start-ups and established companies like Nuance, Simputer, Hewlett-Packard Labs and Microsoft Research. She has also been actively involved in the development of standards related to language technologies. Her current focus is on the use of language technologies for education.

March 8, 2011 | 4:30PM

“Query-focused Summarization Using Text-to-Text Generation: When Information Comes from Multilingual Sources”

Kathy McKeown, Columbia University

[abstract] [biography]

Abstract

The past five years have seen the emergence of robust, scalable natural language processing systems that can summarize and answer questions about online material. One key to the success of such systems is that they re-use text that appeared in the documents rather than generating new sentences from scratch. Re-using text is absolutely essential for the development of robust systems; full semantic interpretation of unrestricted text is beyond the state of the art. Better summaries and answers can be produced, however, if systems can generate new sentences from the input text, fusing relevant phrases and discarding irrelevant ones. When the underlying sources for summarization come from multiple languages, the need for text-to-text generation is even more pronounced. In this talk I first present the concept of text-to-text generation, showing the different kinds of editing that can be done. I then show how it has been used in our research on summarization and open-ended question-answering. Because our sources include informal genres as well as formal genres and draw from English, Arabic and Chinese, editing is critical for improving the intelligibility of responses. In our systems, we exploit information available at question answering time to edit sentences, removing redundant and irrelevant information and correcting errors in translated sentences. We also present new work on machine translation which uses information from multiple systems to post-edit the translations, again using text-to-text generation but within a TAG formalism.

Speaker Biography

Kathleen R. McKeown is the Henry and Gertrude Rothschild Professor of Computer Science at Columbia University. She served as Department Chair from 1998-2003. Her research interests include text summarization, natural language generation, multi-media explanation, digital libraries, concept to speech generation and natural language interfaces. McKeown received the Ph.D. in Computer Science from the University of Pennsylvania in 1982 and has been at Columbia since then. In 1985 she received a National Science Foundation Presidential Young Investigator Award, in 1991 she received a National Science Foundation Faculty Award for Women, in 1994 she was selected as an AAAI Fellow, and in 2003 she was elected as an ACM Fellow. McKeown is also quite active nationally. She serves as a board member of the Computing Research Association and serves as secretary of the board. She served as President of the Association for Computational Linguistics in 1992, Vice President in 1991, and Secretary Treasurer for 1995-1997. She has served on the Executive Council of the Association for Artificial Intelligence and was co-program chair of their annual conference in 1991.

February 22, 2011 | 4:30PM

“Entrainment to the Other in Conversational Speech”

Julia Hirschberg, Columbia University

[abstract] [biography]

Abstract

When people engage in conversation, they adapt the way they speak to the speaking style of their conversational partner in a variety of ways. For example, they may adopt a certain way of describing something based upon the way their conversational partner describes it, or adapt their pitch range or speaking rate to a conversational partner's. They may even align their turn-taking style or use of cue phrases to match their partner's. These types of entrainment have been shown to correlate with various measures of task success and dialogue naturalness. While there is considerable evidence for lexical entrainment from laboratory experiments, less is known about other types of acoustic-prosodic and discourse-level entrainment and little work has been done to examine entrainments in multiple modalities for the same dialogue. I will discuss research in entrainment in multiple dimensions on the Columbia Games Corpus and the Switchboard Corpus. Our goal is to understand how the different varieties of entrainment correlate with one another and to determine which types of entrainment will be both useful and feasible to model in Spoken Dialogue Systems. (This is joint research with Rivka Levitan and Erica Cooper, Columbia University; Agustin Gravano, University of Buenos Aires; Ani Nenkova, University of Pennsylvania; Stefan Benus, Constantine the Philosopher University; and Jens Edlund and Mattias Heldner, KTH.)

Speaker Biography

Julia Hirschberg is a professor in the Department of Computer Science at Columbia University. She received her PhD in Computer Science from the University of Pennsylvania, after previously doing a PhD in sixteenth-century Mexican social history at the University of Michigan and teaching history at Smith. She worked at Bell Laboratories and AT&T Laboratories -- Research from 1985-2003 as a Member of Technical Staff and a Department Head, creating the Human-Computer Interface Research Department there. She has also served as editor-in-chief of Computational Linguistics from 1993-2003 and was an editor-in-chief of Speech Communication from 2003-2006 and is now on the Editorial Board. Julia was on the Executive Board of the Association for Computational Linguistics (ACL) from 1993-2003, has been on the Permanent Council of International Conference on Spoken Language Processing (ICSLP) since 1996, and served on the board of the International Speech Communication Association (ISCA) from 1999-2007 (as President 2005-2007). She is on the board of the CRA-W and has been active in working for diversity at AT&T and at Columbia. Julia has also been a fellow of the American Association for Artificial Intelligence since 1994 and an ISCA Fellow since 2008. She received a Columbia Engineering School Alumni Association (CESAA) Distinguished Faculty Teaching Award in 2009.

February 15, 2011 | 4:30PM

“A Brief History of the Penn Treebank”

Mitch Marcus, University of Pennsylvania

[abstract] [biography]

Abstract

The Penn Treebank, initially released in 1992, was the first richly annotated text corpus widely available within the natural language processing (NLP) community. Its release led within a few years to the development of the first competent English parsers, and helped spark the statistical revolution within NLP. Over the past 20 years, the Penn Treebank has become the de facto standard for training and testing English parsers, and still plays this role nearly two decades after its release. This talk will briefly describe the Penn Treebank and its applications, then discuss the history of the Treebank's development from Fred Jelinek's first proposal of a treebank to DARPA in 1987 through our development of the Treebank from 1989 until the release of Treebank II in 1995. I will attempt to explain the Penn Treebank's motivations and the process of creating it, perhaps explaining why it has some of its more peculiar properties. This talk describes joint work with Beatrice Santorini, Mary Ann Marcinkiewicz, Grace Kim, Ann Bies, and many others.

Speaker Biography

Mitchell Marcus is the RCA Professor of Artificial Intelligence in the Department of Computer and Information Science at the University of Pennsylvania. He was the principal investigator for the Penn Treebank Project through the mid-1990s; he and his collaborators continue to develop hand-annotated corpora for use world-wide as training materials for statistical natural language systems. Other research interests include: statistical natural language processing, human-robot communication, and cognitively plausible models for automatic acquisition of linguistic structure. He has served as chair of Penn's Computer and Information Science Department, as chair of the Penn Faculty Senate, and as president of the Association for Computational Linguistics. He is also a Fellow of the American Association of Artificial Intelligence. He currently serves as chair of the Advisory Committee of the Center of Excellence in Human Language Technology at JHU, as well as serving as a member of the advisory committee for the Department of Computer and Information Science.

February 8, 2011 | 4:30PM

“A Scalable Distributed Syntactic, Semantic and Lexical Language Model”

Shaojun Wang, Wright State University

[abstract] [biography]

Abstract

In this talk, I'll present an attempt at building a large scale distributed composite language model that is formed by seamlessly integrating n-gram, structured language model and probabilistic latent semantic analysis under a directed Markov random field paradigm to simultaneously account for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content. The composite language model has been trained by performing a convergent N-best list approximate EM algorithm and a follow-up EM algorithm to improve word prediction power on corpora with up to a billion tokens and stored on a supercomputer. The large scale distributed composite language model gives drastic perplexity reduction over n-grams and achieves significantly better translation quality measured by the BLEU score and "readability" of translations when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.

Speaker Biography

Shaojun Wang received his B.S. and M.S. in Electrical Engineering at Tsinghua University in 1988 and 1992 respectively, M.S. in Mathematics and Ph.D. in Electrical Engineering at the University of Illinois at Urbana-Champaign in 1998 and 2001 respectively. From 2001 to 2005, he worked at CMU, Waterloo and University of Alberta as a post-doctoral fellow. He joined the Department of Computer Science and Engineering at Wright State University as an assistant professor in 2006. His research interests are statistical machine learning, natural language processing, and cloud computing. He is now mainly focusing on two projects: large scale distributed language modeling and semi-supervised discriminative structured prediction, which are funded by NSF, Google and AFOSR. Both emphasize scalability and parallel/distributed approaches to process extremely large scale datasets.

February 2, 2011 | 4:30PM

“Language Processing in the Web Era”

Kuansan Wang, Microsoft

[abstract] [biography]

Abstract

Natural language processing (NLP) has been dominated by statistical, data-driven approaches. The massive amounts of data available, especially from the Web, have further fueled the progress in this area. In the past decades, it has been widely reported that simple methods can often outperform the most complicated systems when trained with large amounts of data. In deploying many web-scale applications, however, we regularly find that the size of the training data is just one of several factors that contribute to the success of the applications. In this talk, we will use real-world applications to illustrate the important design considerations in web-scale NLP: (1) rudimentary multilingual capabilities to cope with the global nature of the web, (2) versatile modeling of the diverse styles of language used in web documents, (3) fast adaptation to keep pace with the changes of the web, (4) few heuristics to ensure system generalizability and robustness, and (5) possibilities for efficient implementations with minimal manual effort.

Speaker Biography

Dr. Kuansan Wang is a Principal Researcher at Microsoft Research, Redmond WA, where he is currently managing the Human Intelligence Technology Group in the Internet Service Research Center. He joined Microsoft Research in 1998 with the Speech Technology Group, conducting research in spoken language understanding and dialog systems. He was responsible for architecting many speech products from Microsoft, ranging from desktop, embedded and server applications to mobile and cloud based services. His research outcomes, disclosed in more than 60 US and European patents and applications, have been adopted in three ISO, three W3C and four ECMA standards. He has also served as an organizing member/reviewer and panelist at WWW, ICASSP, InterSpeech, ACL and various workshops in speech, language and web research areas. Dr. Wang received his B.S. from National Taiwan University, and his M.S. and PhD from the University of Maryland, College Park, all in Electrical Engineering. Prior to joining Microsoft, he was a Member of Technical Staff at AT&T/Lucent Bell Labs in Murray Hill, NJ, and the NYNEX/Verizon Science and Technology Center in White Plains, NY.

Back to Top

2010

December 7, 2010 | 4:30PM

“Robust Speech Recognition: Principal Approaches and Exemplary Solutions”

Hans-Guenter Hirsch, Niederrhein University of Applied Sciences

[abstract]

Abstract

Most of the approaches to increase the robustness of a speech recognition system in bad acoustic conditions can be classified into two categories. The methods of the first class are based on an extraction of acoustic features that should be independent of the acoustic condition. Alternatively, the parameters of the HMMs can be adapted to the acoustic condition. Both principal approaches will be presented in the talk, as well as some exemplary realizations. The advantages and disadvantages will be shown. One approach will be shown in more detail: adapting the HMM parameters to the condition of hands-free speech input in a reverberant environment. Some demonstrations will be included to illustrate the applied methods.
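As one concrete instance of the first category (features made less dependent on the acoustic condition), the sketch below applies per-utterance cepstral mean and variance normalization, which removes a constant channel offset in the cepstral domain. It is a standard textbook technique offered purely for illustration, not one of the specific methods evaluated in the talk.

```python
import numpy as np

def cmvn(features):
    """Cepstral mean and variance normalization over one utterance (frames x coefficients)."""
    return (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)

# Toy "cepstral" features: the same frames seen through two channels; a fixed spectral tilt
# shows up as an additive offset per coefficient in the cepstral domain.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 13))
channel = clean + rng.normal(scale=2.0, size=13)

print(np.abs(clean - channel).mean())               # large mismatch before normalization
print(np.abs(cmvn(clean) - cmvn(channel)).mean())   # essentially zero afterwards
```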

November 30, 2010 | 4:30PM

“Learning Hierarchies of Features”   Video Available

Yann LeCun, Courant Institute of Mathematical Sciences and Center for Neural Science, NYU

[abstract] [biography]

Abstract

Intelligent perceptual tasks such as vision and audition require the construction of good internal representations. Theoretical and empirical evidence suggests that the perceptual world is best represented by a multi-stage hierarchy in which features in successive stages are increasingly global, invariant, and abstract. An important challenge for Machine Learning, artificial perception, and AI is to devise "deep learning" methods that can automatically learn good feature hierarchies from labeled and unlabeled data. A class of such methods that combine unsupervised sparse coding and supervised refinement will be described. A number of applications will be shown through videos and live demos, including a category-level object recognition system that can be trained on-line, and a trainable vision system for an off-road mobile robot. An implementation of these systems on an FPGA will be shown. It is based on a new programmable and reconfigurable "dataflow" architecture dubbed NeuFlow.
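A minimal sketch of the kind of multi-stage feature hierarchy the abstract describes, written as a small convolutional network: each stage is a filter bank, a nonlinearity, and pooling, so later stages respond to larger, more invariant, more abstract portions of the input. This is only an illustration of the architectural idea; it is not the sparse-coding-plus-refinement training procedure or the NeuFlow hardware discussed in the talk.

```python
import torch
import torch.nn as nn

class FeatureHierarchy(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        def stage(c_in, c_out):      # filter bank -> nonlinearity -> pooling
            return nn.Sequential(nn.Conv2d(c_in, c_out, kernel_size=5, padding=2),
                                 nn.ReLU(),
                                 nn.MaxPool2d(2))
        self.stage1 = stage(1, 16)   # local, low-level features
        self.stage2 = stage(16, 32)  # mid-level features over larger regions
        self.stage3 = stage(32, 64)  # more global, more abstract features
        self.classifier = nn.Linear(64 * 4 * 4, n_classes)   # supervised layer on top

    def forward(self, x):            # x: (batch, 1, 32, 32)
        h = self.stage3(self.stage2(self.stage1(x)))
        return self.classifier(h.flatten(1))

model = FeatureHierarchy()
print(model(torch.randn(2, 1, 32, 32)).shape)   # torch.Size([2, 10])
```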

Speaker Biography

Yann LeCun is Silver Professor of Computer Science and Neural Science at the Courant Institute of Mathematical Sciences and at the Center for Neural Science of New York University. He received an Electrical Engineer Diploma from Ecole Supérieure d'Ingénieurs en Electrotechnique et Electronique (ESIEE), Paris in 1983, and a PhD in Computer Science from Université Pierre et Marie Curie (Paris) in 1987. After a postdoc at the University of Toronto, he joined AT&T Bell Laboratories in Holmdel, NJ, in 1988. He became head of the Image Processing Research Department at AT&T Labs-Research in 1996, and joined NYU in 2003, after a brief period as Fellow at the NEC Research Institute in Princeton. His current interests include machine learning, computer vision, pattern recognition, mobile robotics, and computational neuroscience. He has published over 150 technical papers on these topics as well as on neural networks, handwriting recognition, image processing and compression, and VLSI design. His handwriting recognition technology is used by several banks around the world to read checks. His image compression technology, called DjVu, is used by hundreds of web sites and publishers and millions of users to distribute and access scanned documents on the Web, and his image recognition technique, called Convolutional Network, has been deployed by companies such as Google, Microsoft, NEC, France Telecom and several startup companies for document recognition, human-computer interaction, image indexing, and video analytics. He has been on the editorial board of IJCV, IEEE PAMI, IEEE Trans on Neural Networks, was program chair of CVPR'06, and is chair of the annual Learning Workshop. He is on the science advisory board of Institute for Pure and Applied Mathematics, and is the co-founder of MuseAmi, a music technology company.

November 17, 2010 | 4:30PM

“A Summarization Journey: From Extraction to Abstraction”

Vasudeva Varma, IIIT Hyderabad, India

[abstract] [biography]

Abstract

In this talk, I will share my (and my team's) research journey of building summarization engines that produce various flavors of summaries. Starting from single-document summarization, our experience of building multi-document summarization (MDS), query-focused MDS, update or progressive summarization, and more recently guided summarization, comparison summarization, and personalized summarization systems can be seen as a movement from extraction-based to abstraction-based summary generation. We have used variations of the Relevance Based Language Model (RBLM) along with external knowledge sources in building these systems. I will also share our experience of participating in the Knowledge Base Population (KBP) track of the Text Analysis Conference (TAC) and how our recent approach to this problem is influenced by summarization technology. Since 2006, our team has consistently performed well and ranked among the top teams in various tracks of the Document Understanding Conference (DUC) and Text Analysis Conference (TAC) evaluations conducted by NIST.

Speaker Biography

Vasudeva Varma has been a faculty member at the International Institute of Information Technology, Hyderabad since 2002. His research interests include search (information retrieval), information extraction, information access, knowledge management, cloud computing, and software engineering. He heads the Search and Information Extraction Lab and the Software Engineering Research Lab at IIIT Hyderabad, and has been chair of Post Graduate Programs since 2009. He has published a book on software architecture (Pearson Education) and over seventy technical papers in journals and conferences. In 2004, he received a Young Scientist Award and grant from the Department of Science and Technology, Government of India, for his proposal on personalized search engines. In 2007, he received a Research Faculty Award from AOL Labs. He was a visiting professor at UPV, Valencia, Spain (Summer 2007), UBO, Bretagne, France (Summer 2009), and the Language Technologies Institute, CMU, Pittsburgh, USA (Summer 2010). He obtained his Ph.D. from the Department of Computer and Information Sciences, University of Hyderabad in 1996. Prior to joining IIIT Hyderabad, he was the president of MediaCognition India Pvt. Ltd. and Chief Architect at MediaCognition Inc. (Cupertino, CA). Earlier he was Director of Engineering and Research at InfoDream Corporation, Santa Clara, CA. He also worked for Citicorp and Muze Inc. in New York as a senior consultant.

November 16, 2010 | 4:30PM

“Speech Technologies: Understanding and Coping with Speech Variability”

Carol Espy-Wilson, University of Maryland

[abstract] [biography]

Abstract

There is a great deal yet to be understood about the systematic ways in which speech varies due to coarticulation, speech style, and differences in the articulatory strategies that exist across speakers. Further, we don't fully understand all the mechanisms that humans use to cope with this variability, as well as with the variability introduced by channel effects and everyday noisy environments. This lack of knowledge and models to cope with variability impedes our development of effective and unconstrained speech technologies. This talk will focus on research in the Speech Communication Group that is addressing ways to understand and cope with variability. Our approach involves the study of speech acoustics, speech production and speech perception, and the integration of these studies with linguistics, signal processing, machine learning and other relevant fields. This talk will describe some of our basic research in vocal tract modeling, the development of speech signal representations that capture linguistic and speaker-specific information in the speech signal, and acoustic-to-articulatory mapping, as well as more applied research focused on new paradigms for speech recognition and speech enhancement based on gestural phonology.

Speaker Biography

Carol Espy-Wilson is a Professor in the Electrical and Computer Engineering Department and the Institute for Systems Research at the University of Maryland where she directs the Speech Communication Lab. Dr. Espy-Wilson received a B.S. in Electrical Engineering from Stanford University in 1979, and an M.S., E.E., and Ph.D. in Electrical Engineering from the Massachusetts Institute of Technology in 1981, 1984 and 1987, respectively. Prior to joining the faculty at the University of Maryland, Dr. Espy-Wilson was a faculty member at Boston University. She is the recipient of the NSF Minority Initiation Award (1990-1992), the Clare Boothe Luce Professorship (1990-1995), the NIH Independent Scientist Award (1998-2003), the Honda Initiation Award (2004-2005), and a Radcliffe Fellowship (2008-2009). Dr. Espy-Wilson is a Fellow of the Acoustical Society of America (ASA) and she served as Chair of the Speech Technical Committee of the ASA from 2007 to 2010. She is a Senior Member of IEEE and an elected member of IEEE’s Speech and Language Technical Committee. She is a past Associate Editor of ASA’s magazine “Acoustics Today” and a past member of the NIH Language and Communication Study Section (2001-2004). She is currently an Associate Editor for the Journal of the Acoustical Society of America and a member of the National Advisory Board for Medical Rehabilitation Research at NIH.

November 9, 2010 | 4:30PM

“Language-Universal Speech Modeling: What, Why, When and How”

Chin-Hui Lee, Georgia Institute of Technology

[abstract] [biography]

Abstract

Acoustic segmental modeling (ASM) is an extension of frame-based vector quantization (VQ) to establish a common set of segment-based fundamental speech units to characterize the acoustic universe. ASM has been applied to speech recognition by building an acoustic lexicon for all words in a vocabulary. ASM has also been utilized in spoken language recognition with latent semantic analysis features and vector space modeling to produce high recognition accuracies. With the recently proposed automatic speech attribute transcription (ASAT) paradigm, another set of language-universal speech units based on speech attributes emerges. In contrast to the conventional HMM-based framework, which is top-down in nature, one major goal of the ASAT paradigm is to develop a bottom-up approach to automatic speech recognition via attribute detection and knowledge integration. These two key technologies can also be applied to other applications. In this talk we report on recent studies of language-universal speech characterization in two related tasks, namely: (i) language-universal and cross-language attribute and phone recognition; and (ii) automatic spoken language recognition. We show that language-universal speech attribute models can perform better than language-specific attribute models for attribute detection and phone recognition. We also demonstrate that, by extending ASM-based algorithms to language recognition with two simple sets of manner and place of articulation models, recognition accuracies can outperform those of state-of-the-art spoken language recognition systems. We anticipate that the universal speech attribute modeling tools will provide opportunities to explore future research in multilingual acoustic characterization and speech recognition.

Speaker Biography

Chin-Hui Lee is a professor at the School of Electrical and Computer Engineering, Georgia Institute of Technology. Dr. Lee received the B.S. degree in Electrical Engineering from National Taiwan University, Taipei, in 1973, the M.S. degree in Engineering and Applied Science from Yale University, New Haven, in 1977, and the Ph.D. degree in Electrical Engineering with a minor in Statistics from the University of Washington, Seattle, in 1981. Dr. Lee started his professional career at Verbex Corporation, Bedford, MA, and was involved in research on connected word recognition. In 1984, he became affiliated with Digital Sound Corporation, Santa Barbara, where he engaged in research and product development in speech coding, speech synthesis, speech recognition and signal processing for the development of the DSC-2000 Voice Server. Between 1986 and 2001, he was with Bell Laboratories, Murray Hill, New Jersey, where he became a Distinguished Member of Technical Staff and Director of the Dialogue Systems Research Department. His research interests include multimedia communication, multimedia signal and information processing, speech and speaker recognition, speech and language modeling, spoken dialogue processing, adaptive and discriminative learning, biometric authentication, and information retrieval. From August 2001 to August 2002 he was a visiting professor at the School of Computing, The National University of Singapore. In September 2002, he joined the faculty of the Georgia Institute of Technology. Prof. Lee has participated actively in professional societies. He is a member of the IEEE Signal Processing Society (SPS), the Communication Society, and the International Speech Communication Association (ISCA). In 1991-1995, he was an associate editor for the IEEE Transactions on Signal Processing and Transactions on Speech and Audio Processing. During the same period, he served as a member of the ARPA Spoken Language Coordination Committee. In 1995-1998 he was a member of the Speech Processing Technical Committee and later became the chairman from 1997 to 1998. In 1996, he helped promote the SPS Multimedia Signal Processing Technical Committee, of which he is a founding member. Dr. Lee is a Fellow of the IEEE, has published more than 300 papers, and holds 25 patents. He received the SPS Senior Award in 1994 and the SPS Best Paper Award in 1997 and 1999. In 1997, he was awarded the prestigious Bell Labs President's Gold Award for his contributions to the Lucent Speech Processing Solutions product. Dr. Lee often gives seminal lectures to a wide international audience. In 2000, he was named one of the six Distinguished Lecturers by the IEEE Signal Processing Society. He was also named one of the two inaugural ISCA Distinguished Lecturers in 2007-2008. Recently he won the SPS's 2006 Technical Achievement Award for "Exceptional Contributions to the Field of Automatic Speech Recognition".

November 2, 2010 | 4:30PM

“Temporal Dynamics and Information Retrieval”

Susan Dumais, Microsoft Research

[abstract] [biography]

Abstract

Many digital resources, like the Web, are dynamic and ever-changing collections of information. However, most of the tools that have been developed for interacting with Web content, such as browsers and search engines, focus on a single static snapshot of the information. In this talk, I will present analyses of how Web content changes over time, how people re-visit Web pages over time, and how re-visitation patterns are influenced by changes in user intent and content. These results have implications for many aspects of information retrieval and management including crawling, ranking and information extraction algorithms, result presentation, and evaluation. I will describe a prototype system that supports people in understanding how the information they interact with changes over time, and a new retrieval model that incorporates features about the temporal evolution of content to improve core ranking. Finally, I conclude with an overview of some general challenges that need to be addressed to fully incorporate temporal dynamics into information retrieval systems.

Speaker Biography

Susan Dumais is a Principal Researcher and manager of the Context, Learning and User Experience for Search (CLUES) Group at Microsoft Research. She has been at Microsoft Research since 1997 and has published widely in the areas of human-computer interaction and information retrieval. Her current research focuses on personal information management, user modeling and personalization, novel interfaces for interactive retrieval, and implicit measures of user interest and activity. She has worked closely with several Microsoft groups (Bing, Windows Desktop Search, Live Search, SharePoint Portal Server, and Office Online Help) on search-related innovations. Prior to joining Microsoft Research, she was at Bellcore and Bell Labs for many years, where she worked on Latent Semantic Indexing (a well-known statistical method for concept-based retrieval), combining search and navigation, individual differences, and organizational impacts of new technology. Susan has published more than 200 articles in the fields of information science, human-computer interaction, and cognitive science, and holds several patents on novel retrieval algorithms and interfaces. She is Past-Chair of ACM's Special Interest Group in Information Retrieval (SIGIR), and served on the NRC Committee on Computing and Communications Research to Enable Better Use of Information Technology in Digital Government, and the NRC Board on Assessment of NIST Programs. She is on the editorial boards of ACM Transactions on Information Systems, ACM Transactions on Human Computer Interaction, Human Computer Interaction, Information Processing and Management, Information Retrieval, New Review of Hypermedia and Multimedia, and the Annual Review of Information Science and Technology. She is an associate editor for the first and second editions of the Handbook of Applied Cognition, and serves on program committees for several conferences. She was elected to the CHI Academy in 2005, an ACM Fellow in 2006, and received the SIGIR Gerard Salton Award for Lifetime Achievement in 2009. Susan is an adjunct professor in the Information School at the University of Washington, and has been a visiting faculty member at Stevens Institute of Technology, New York University, and the University of Chicago.

October 26, 2010 | 4:30PM

“Advances in Human Assessment: (i) Tracking Nonverbal Behavior [Carlos Busso] (ii) Speaker Variability for Speaker ID [John H.L. Hansen]”   Video Available

Carlos Busso and John Hansen, University of Texas at Dallas

[abstract] [biography]

Abstract

In this presentation, two perspectives of assessing human interaction are considered: (i) nonverbal behavior, and (ii) variability in speech production for speaker recognition. Part 1: During inter-personal human interaction, speech and gestures are intricately coordinated to express and emphasize ideas, as well as provide suitable feedback to the listener. The tone and intensity of speech, spoken language patterns, facial expressions, head motion and hand movements are all woven together in a nontrivial manner in order to convey intent and desires for natural human communication. A joint analysis of these modalities is necessary to fully decode human communication. Among other things, this is critically needed in designing next generation information technology that attempts to mimic and emulate how humans process and produce communication signals. This talk will summarize our ongoing research in recognizing paralinguistic information conveyed through multiple communication channels during human interaction, with emphasis on social emotional behaviors. Part 2: In addition to differences in multi-modal exchange between human speakers, within-speaker differences also play a major role in altering the performance of automatic speech and speaker recognition systems. In this portion of the talk, we will consider speech production variability including (i) vocal effort (e.g., whisper, soft, neutral, loud, shout), (ii) Lombard Effect (speech produced in noise), and (iii) speech style (read, spontaneous, singing), and how these impact speaker recognition systems, along with potential methods to improve system performance. These studies are intended to develop a more fundamental understanding of how humans interact, and how communication models might contribute to more effective biometrics in identifying and tracking humans.

Speaker Biography

Carlos Busso is an Assistant Professor of Electrical Engineering at The University of Texas at Dallas (UTD). He received his B.S. (2000) and M.S. (2003) degrees from the University of Chile, Santiago, Chile, and his Ph.D. in EE (2008) from the University of Southern California (USC), Los Angeles, USA. Before joining UTD, he was a Postdoctoral Research Associate at the Signal Analysis and Interpretation Laboratory (SAIL), USC. At USC, he received a Provost Doctoral Fellowship from 2003 to 2005 and a Fellowship in Digital Scholarship from 2007 to 2008. His research interests are in digital signal processing, speech and video processing, and multimodal interfaces. He has worked on audio-visual emotion recognition, analysis of emotional modulation in gestures and speech, designing realistic human-like virtual characters, speech source detection using microphone arrays, speaker localization and identification in an intelligent environment, and sensing human interaction in multi-person meetings. John H.L. Hansen is Department Head and Professor of Electrical Engineering at the Univ. of Texas – Dallas. He holds the UTD Endowed Chair in Telecommunications Engineering, and a joint appointment in the UTD School of Behavioral & Brain Sciences (Speech & Hearing). He has published extensively in the fields of Speech Processing and Language Technology, and has supervised 51 PhD/MS thesis students. In 2005, he received the Univ. of Colorado – Boulder Teacher of the Year Award for commitment to education in communication sciences and electrical engineering. He is an IEEE Fellow, an ISCA Fellow, and serves as Chair-Elect of the IEEE Signal Processing Society Speech-Language Technical Committee. He also served as Co-Organizer and Technical Chair for IEEE ICASSP-2010, and Organizer for Interspeech-2002.

October 19, 2010 | 4:30PM

“Domain Adaptation in NLP”   Video Available

Hal Daume, University of Maryland

[abstract] [biography]

Abstract

The Wall Street Journal doesn't look like medical texts, which in turn don't look like tweets. We shouldn't expect statistical models trained on news to do well on other domains, and indeed they don't. The problem of moving a statistical model from one training domain to a different (set of) test domain(s) is the task of domain adaptation. I will discuss two algorithms for domain adaptation: one that works in a batch fashion, and one that works online. The online algorithm naturally adapts to an active setting wherein you can periodically query a human for the labels of data points in the new domains. In both cases I will present some theoretical results that quantify the amount of data necessary to learn (with high probability). This is joint work with Avishek Saha, Abhishek Kumar and Piyush Rai.
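
One widely used batch technique for supervised domain adaptation, feature augmentation, can serve as a concrete illustration of the problem setting; it is offered as a sketch, not necessarily one of the specific algorithms presented in the talk. Each feature is copied into a shared block and a domain-specific block, letting a standard linear learner decide, feature by feature, how much to share across domains.

    def augment_features(features, domain):
        """Feature augmentation for supervised domain adaptation.

        features: dict mapping feature name -> value for one example
        domain:   name of the domain the example came from
        Each feature is duplicated into a shared copy (seen by every domain)
        and a domain-specific copy; features for other domains are implicitly
        zero, so any standard linear learner can consume the result.
        """
        augmented = {}
        for name, value in features.items():
            augmented[("shared", name)] = value   # shared across all domains
            augmented[(domain, name)] = value     # active only for this domain
        return augmented

    # Toy usage: the same unigram feature, observed in two different domains.
    print(augment_features({"w=virus": 1.0}, "news"))
    print(augment_features({"w=virus": 1.0}, "medical"))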

Speaker Biography

Hal Daume III is an assistant professor of Computer Science at the University of Maryland, College Park. He previously held a position in the School of Computing at the University of Utah. His primary research interests are in understanding how to get human knowledge into a machine learning system in the most efficient way possible. In practice, he works primarily in the areas of Bayesian learning (particularly non-parametric methods), structured prediction and domain adaptation (with a focus on problems in language and biology). He associates himself most with conferences like ACL, ICML, NIPS and EMNLP. He earned his PhD at the University of Southern California with a thesis on structured prediction for language (his advisor was Daniel Marcu). He spent the summer of 2003 working with Eric Brill in the machine learning and applied statistics group at Microsoft Research. Prior to that, he studied math (mostly logic) at Carnegie Mellon University. He still likes math and doesn't like to use C (instead he uses O'Caml or Haskell). He doesn't like shoes, but does like activities that are hard on your feet: skiing, badminton, Aikido and rock climbing.

October 12, 2010 | 4:30PM

“Talking and Timing”

Zenzi M. Griffin, University of Texas at Austin

[abstract] [biography]

Abstract

People do not retrieve all of the words they will use in an utterance before beginning to speak. At the same time, they do not appear to blurt out each word as it is prepared. A wealth of experimental and observational data testify to this. But then how do speakers coordinate the preparation and articulation of words over time? In this talk, I will discuss experiments and hypotheses on this issue, which identify different ways of conceptualizing language production.

Speaker Biography

Dr. Griffin studied psychology at Stockholm University for one year before transferring to Michigan State University, where she completed a BA in Psychology. In 1998, she earned a Ph.D. in Cognitive Psychology (with a minor in Linguistics) from the Department of Psychology at the University of Illinois at Urbana-Champaign. There she worked with Dr. Kathryn Bock and Dr. Gary Dell, becoming one of the first researchers to monitor eye movements to study language production. Dr. Griffin then spent three years as an assistant professor in the Department of Psychology at Stanford University before moving to the School of Psychology at Georgia Tech in the summer of 2001. In 2008, she joined the Department of Psychology at the University of Texas at Austin as a full professor. In addition to a wide range of collaborative projects, Dr. Griffin is currently studying the retrieval and use of personal names.

October 5, 2010 | 4:30PM

“Semi-supervised and unsupervised graph-based learning for natural language processing”

Katrin Kirchhoff, University of Washington

[abstract] [biography]

Abstract

The lack of labeled data is one of the key problems in many current natural language processing tasks. Semi-supervised and unsupervised learning techniques have been explored as alternatives to fully-supervised learning; however, most techniques concentrate on classification problems as opposed to learning preferences or structured outputs that characterize a large class of natural language problems. This talk presents recent work on adapting semi-supervised and unsupervised learning schemes to ranking rather than classification tasks. Techniques that will be discussed include discovering better feature representations from unlabeled data, graph-based semi-supervised ranking, and constrained unsupervised ranking. Experimental results will be presented on applications in information retrieval, machine translation of spoken dialogues, and machine translation of multi-party meeting conversations.
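
As a simplified illustration of the graph-based semi-supervised idea (for classification rather than the ranking setting discussed in the talk), the sketch below propagates labels over a similarity graph. The graph and labels are invented, and numpy is assumed.

    import numpy as np

    def label_propagation(W, labels, num_iters=100):
        """Propagate labels over a similarity graph.

        W:      (n, n) symmetric non-negative similarity matrix
        labels: length-n array, +1/-1 for labeled nodes, 0 for unlabeled
        Unlabeled nodes repeatedly take the weighted average of their
        neighbors' current scores; labeled nodes stay clamped.
        """
        scores = labels.astype(float).copy()
        labeled = labels != 0
        degrees = W.sum(axis=1) + 1e-12
        for _ in range(num_iters):
            scores = W @ scores / degrees      # average over neighbors
            scores[labeled] = labels[labeled]  # clamp the labeled nodes
        return scores

    # Toy chain graph: node 0 labeled +1, node 4 labeled -1, rest unlabeled.
    W = np.array([[0, 1, 0, 0, 0],
                  [1, 0, 1, 0, 0],
                  [0, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 0]], dtype=float)
    print(label_propagation(W, np.array([1, 0, 0, 0, -1])))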

Speaker Biography

Katrin Kirchhoff obtained an M.A. in English Linguistics and her PhD in Computer Science from the University of Bielefeld, Germany. Since 1999 she has been with the University of Washington, where she is currently a Research Associate Professor in the Electrical Engineering Department. Her research interests include speech recognition and natural language processing (in particular multilingual applications), machine translation, machine learning, and human-computer interfaces. She has authored over 60 conference and journal publications and is co-editor of a book on "Multilingual Speech Processing". Katrin is a member of the Editorial Boards of the Speech Communication and Computer, Speech and Language journals and an Associate Editor for ACM Transactions on Speech and Language Processing. She is currently serving on the IEEE Speech Technical Committee.

October 1, 2010 | 4:30PM

“Open vocabulary language modeling for binary switch typing interfaces”

Brian Roark, Oregon Health & Science University

[abstract]

Abstract

Locked-in syndrome can result from traumatic brain injury, such as a brain-stem stroke, or from neurodegenerative diseases such as amyotrophic lateral sclerosis (ALS or Lou Gehrig's disease). The condition is characterized by near total paralysis, though the individuals are cognitively intact. While vision is retained, the motor control impairments extend to eye movements. Often the only reliable movement that can be made by an individual is a particular muscle twitch or single eye blink, if that. Typing interfaces for these populations are typically based on a binary response, via blinks, muscle twitches or (more recently) ERP signals captured through EEG signal processing. In this talk, I'll discuss typing interfaces for impaired populations, with a particular focus on the role of language modeling within typing applications. I will contrast language modeling for binary switch response typing interfaces with the more standard use of language models for full sequence disambiguation in applications like speech recognition and machine translation, which has large implications for learning of such models. I will then highlight two key issues for construction of these language models: using Huffman coding versus simpler binary coding tree topologies; and handling of selection error within the model itself. I will present some language modeling results for two very large corpora: newswire text from the Gigaword corpus; and emails from the Enron corpus. I will also present experiments with a binary-switch, static-grid typing interface making use of varying language model contributions, using some novel scanning methods. (Joint work with Jacques de Villiers, Christopher Gibbons and Melanie Fried-Oken.)
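
To make the coding-tree discussion concrete, here is a minimal sketch of building a Huffman code over a next-symbol distribution such as a character language model might supply. The symbols and probabilities are hypothetical, and the talk's comparison of Huffman versus simpler tree topologies, and its handling of selection errors, are not modeled here.

    import heapq
    import itertools

    def huffman_code(probs):
        """Build a binary Huffman code from a dict of symbol -> probability.

        In a binary-switch typing interface, each 0/1 in a symbol's code word
        corresponds to one yes/no selection, so likely symbols need fewer
        switch activations.
        """
        counter = itertools.count()   # tie-breaker so the heap never compares dicts
        heap = [(p, next(counter), {sym: ""}) for sym, p in probs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p1, _, codes1 = heapq.heappop(heap)
            p2, _, codes2 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in codes1.items()}
            merged.update({s: "1" + c for s, c in codes2.items()})
            heapq.heappush(heap, (p1 + p2, next(counter), merged))
        return heap[0][2]

    # Hypothetical next-character distribution from a language model.
    print(huffman_code({"e": 0.4, "t": 0.25, "a": 0.2, "_": 0.1, "q": 0.05}))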

September 28, 2010 | 4:30PM

“From Syntax to Natural Logic”

Lauri Karttunen, PARC

[abstract] [biography]

Abstract

Computational Linguistics is again becoming an exciting field for linguists because it is moving from information retrieval to understanding and reasoning. This requires the integration of syntax and semantics. Natural Logic is a cover term for a family of formal approaches to semantics and textual inferencing as currently practiced by computational linguists. They have in common a proof-theoretic rather than a model-theoretic focus and an overriding concern with feasibility. In this context, the old insights on lexical classes such as implicatives and factives are being resurrected, implemented and extended. Like the term "natural logic" itself, the original work dates back to the 1970s. I will discuss specific problems of textual inference that involve veridicity, the issue of whether the proposition expressed by a complement clause is entailed by the sentence that contains it. I will demonstrate how this issue is handled in the Bridge system built at the Palo Alto Research Center, one of the early promoters of the RTE (Recognizing Textual Entailment) challenge.

Speaker Biography

Lauri Karttunen is known for his seminal contributions to the semantics of discourse referents, implicatives, presuppositions, and questions in the seventies and eighties. In the area of computational linguistics, Karttunen was one of the first pioneers to realize and exploit the potential of finite-state transducers for linguistic applications such as morphological analysis. His early work on the logic of complementation has turned out to be fundamental for computing local textual inferences of complex sentences, the topic of his current interest. In 2007 Karttunen received the Lifetime Achievement Award from the ACL (Association for Computational Linguistics).

September 21, 2010 | 4:30PM

“Searching for Information in Very Large Collections of Spoken Audio”   Video Available

Richard Rose, McGill University

[abstract] [biography]

Abstract

There are a vast number of applications that require the ability to extract information from spoken audio. These include, for example, searching for segments of lectures or videos in large media repositories that may be relevant to a given query. Other examples include topic classification, segmentation, and clustering in audio recordings of meetings, conversations, and broadcast news. While there has been a great deal of work in these areas, there are a number of constraints posed by these applications that limit the range of approaches that might be considered practical. Users demand sub-second response latencies to queries when searching collections which may contain thousands of hours of speech. System designers demand that the search engines be configured using little or no resources taken from the target domain. This presentation will begin with an introduction to the important problems involving search and classification of spoken audio material. An approach involving open vocabulary indexing and search of lattices generated offline by a large vocabulary continuous speech recognition engine will then be presented. The important aspect of the approach is its scalability to extremely large collections. Performance will be presented for a task involving recorded lectures taken from an online media server.

Speaker Biography

Richard Rose received B.S. and M.S. degrees in Electrical Engineering from the University of Illinois, and a Ph.D. degree in Electrical Engineering from the Georgia Institute of Technology. He has served on the technical staff at MIT Lincoln Laboratory working on speech recognition and speaker recognition. He was with AT&T for ten years, first with AT&T Bell Laboratories and then in the Speech and Image Processing Services Laboratory at AT&T Labs - Research. Currently, he is an Associate Professor of Electrical and Computer Engineering at McGill University in Montreal, Quebec. Professor Rose has served in various roles in the IEEE Signal Processing Society. He was a member of the IEEE Signal Processing Society Technical Committee on Digital Signal Processing. He was elected as an at-large member of the Board of Governors for the Signal Processing Society. He has served as an associate editor for the IEEE Transactions on Speech and Audio Processing and again for the IEEE Transactions on Audio, Speech, and Language Processing. He is currently a member of the editorial board for the Speech Communication Journal. He was a member of the IEEE SPS Speech Technical Committee (STC) and was the founding editor of the STC Newsletter. He also served as co-chair of the IEEE 2005 Workshop on Automatic Speech Recognition and Understanding.

September 14, 2010 | 4:30PM

“Linear-time Dynamic Programming for Incremental Parsing”

Liang Huang, University of Southern California / Information Sciences Institute

[abstract] [biography]

Abstract

Incremental parsing techniques such as shift-reduce have gained popularity thanks to their efficiency, but there remains a major problem: the search is greedy and only explores a tiny fraction of the whole space (even with beam search) as opposed to dynamic programming. We show that, surprisingly, dynamic programming is in fact possible and polynomial-time for many shift-reduce parsers, by merging "equivalent" stacks based on feature values. Empirically, our algorithm yields up to a five-fold speedup over a state-of-the-art shift-reduce dependency parser with no loss in accuracy. Better search also leads to better learning, and our final parser outperforms all previously reported dependency parsers for English and Chinese, yet is much faster.
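
A core ingredient of the approach is merging beam items whose feature signatures coincide, so the beam behaves like a dynamic-programming chart rather than a list of distinct histories. The toy sketch below shows only that merging step, under an assumed signature function; the graph-structured bookkeeping needed to recover full derivations, which the talk describes, is omitted.

    def merge_equivalent(beam, signature):
        """Collapse beam items that are equivalent for future decisions.

        beam:      list of (score, state) pairs, higher score is better
        signature: function mapping a state to the tuple of feature values
                   that future shift/reduce decisions can actually see
        Items with the same signature are scored identically from here on,
        so only the best-scoring representative needs to be kept.
        """
        best = {}
        for score, state in beam:
            sig = signature(state)
            if sig not in best or score > best[sig][0]:
                best[sig] = (score, state)
        return list(best.values())

    # Hypothetical signature: only the top two stack words matter to the features.
    sig = lambda state: tuple(state["stack"][-2:])
    beam = [(1.2, {"stack": ["saw", "her", "duck"]}),
            (0.9, {"stack": ["the", "her", "duck"]})]  # same top two words -> merged
    print(merge_equivalent(beam, sig))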

Speaker Biography

Liang Huang is a Research Assistant Professor at the University of Southern California (USC), and a Research Scientist at the USC Information Sciences Institute (ISI). He received his PhD from the University of Pennsylvania in 2008, and worked as a Research Scientist at Google before moving back to ISI, where he had previously done two internships. He is mainly interested in the theoretical aspects of computational linguistics, in particular, efficient algorithms in parsing and machine translation, generic dynamic programming, and formal properties of synchronous grammars. His work received a Best Paper Award at ACL 2008, and Best Paper Nominations at ACL 2007, EMNLP 2008, and ACL 2010.

September 7, 2010 | 4:30PM

“Lifted Message Passing”

Kristian Kersting, University of Bonn

[abstract] [biography]

Abstract

Many AI inference problems arising in a wide variety of fields such as network communication, activity recognition, computer vision, machine learning, and robotics can be solved using message-passing algorithms that operate on factor graphs. Often, however, we face inference problems with symmetries not reflected in the factor graph structure and, hence, not exploitable by efficient message-passing algorithms. In this talk, I will survey lifted message-passing algorithms that exploit additional symmetries. Starting from a given factor graph, they essentially first construct a lifted factor graph of supernodes and superfactors, corresponding to sets of nodes and factors that send and receive the same messages, i.e., that are indistinguishable given the evidence. Then they run a modified message-passing algorithm on the lifted factor graph. In particular, I will present lifted variants of loopy and Gaussian belief propagation as well as warning and survey propagation, and demonstrate that significant efficiency gains are obtainable, often by orders of magnitude. This talk is based on collaborations with Babak Ahmadi, Youssef El Massaoudi, Fabian Hadiji, Sriraam Natarajan, and Scott Sanner.
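
The construction of supernodes can be illustrated with a generic color-refinement sketch: nodes start with a color encoding their local evidence, repeatedly collect their neighbors' colors, and are regrouped whenever those collections differ; nodes that never become distinguishable end up in the same supernode. This is a schematic sketch of the idea, not the authors' implementation, and it ignores the variable/factor distinction for brevity.

    def color_passing(neighbors, initial_color, num_iters=10):
        """Group nodes that are indistinguishable by iterated neighborhood structure.

        neighbors:     dict node -> list of neighboring nodes (the graph edges)
        initial_color: dict node -> hashable label encoding local evidence
        Returns a dict mapping each node to its final group id (its supernode).
        """
        color = dict(initial_color)
        for _ in range(num_iters):
            signatures = {}
            for node in neighbors:
                received = sorted(color[n] for n in neighbors[node])
                signatures[node] = (color[node], tuple(received))
            # Relabel: identical signatures get identical new colors.
            relabel = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
            new_color = {node: relabel[signatures[node]] for node in neighbors}
            if new_color == color:
                break
            color = new_color
        return color

    # Toy graph: a fully symmetric 4-cycle with identical evidence everywhere
    # collapses into a single supernode.
    nbrs = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(color_passing(nbrs, {n: "same" for n in nbrs}))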

Speaker Biography

Kristian Kersting is the head of the "statistical relational activity mining" (STREAM) group at Fraunhofer IAIS, Bonn, Germany, a research fellow of the University of Bonn, Germany, and a research affiliate of the Massachusetts Institute of Technology (MIT), USA. He received his Ph.D. from the University of Freiburg, Germany, in 2006. After a PostDoc at MIT, he joined Fraunhofer IAIS in 2008 to build up the STREAM research group using an ATTRACT Fellowship. His main research interests are statistical relational reasoning and learning (SRL), acting under uncertainty, and robotics. He has published over 60 peer-reviewed papers, has received the ECML Best Student Paper Award in 2006 and the ECCAI Dissertation Award 2006 for the best European dissertation in the field of AI, and is an ERCIM Cor Baayen Award 2009 finalist for the "Most Promising Young Researcher In Europe in Computer Science and Applied Mathematics". He gave several tutorials at top conferences (AAAI, ECML-PKDD, ICAPS, IDA, ICML, ILP) and co-chaired MLG-07, SRL-09 and the recent AAAI-10 workshop on Statistical Relational AI (StarAI-10). He (will) serve(d) as area chair for ECML-06, ECML-07, ICML-10, as Senior PC member at IJCAI-10, and on the PC of several top conferences (IJCAI, AAAI, ICML, KDD, RSS, ECAI, ECML/PKDD, ICML, ILP, ...). He was a guest co-editor for special issues of the Annals of Mathematics and AI (AMAI), the Journal of Machine Learning Research (JMLR), and the Machine Learning Journal (MLJ). Currently, he serves on the editorial board of the Machine Learning Journal (MLJ) and the Journal of Artificial Intelligence Research (JAIR).

August 31, 2010 | 4:30PM

“Heads in the Cloud: How Strangers, Virtual Farmers, and Your Friends From High School are Bringing Artificial Intelligence into the Real World”

Jeffrey Bigham, University of Rochester

[abstract] [biography]

Abstract

The past few decades have seen the development of wonderful new computing technology that serves as sensors onto an inaccessible world for disabled people - as examples, optical character recognition (OCR) makes printed text available to blind people, speech recognition makes spoken language available to deaf people, and way-finding systems help keep people with cognitive impairments on track. Despite advances, this intelligent technology remains both too prone to errors and too limited in the scope of problems it can reliably solve to address the problems faced by disabled people in their everyday lives. In this talk, I'll (i) discuss the potential for always-available human computation to fill in remaining gaps in order to make intelligent user interfaces useful and practical in the everyday lives of disabled people, (ii) show several examples of how my research is combining artificial and human intelligence, and (iii) discuss the potential of this approach to more broadly shape intelligent user interfaces for all in the future.

Speaker Biography

Jeffrey P. Bigham is an Assistant Professor in the Department of Computer Science at the University of Rochester where he heads the ROC HCI Group. His work spans HCI, Access Technology, and Human Computation. Jeffrey received his Ph.D. in 2009 in Computer Science and Engineering from the University of Washington, and has won the Microsoft Imagine Cup Accessible Technology Award, the Andrew W. Mellon Foundation Award for Technology Collaboration, and the Technology Review Top 35 Innovators Under 35 Award. He's also been known to play a mean game of kickball.

July 22, 2010 | 2:30PM

“New Machine Learning Approaches to Speech Recognition”

Alex Acero, Microsoft

[abstract] [biography]

Abstract

In this talk I will describe some new approaches to speech recognition that leverage large amounts of data using techniques from information retrieval and machine learning. Speech recognition has been proposed for many years as a natural way to interact with computer systems, but this has taken longer than originally expected. At first dictation was thought to be the killer speech app, and researchers were waiting for Moore’s law to get the required computational power in low-cost PCs. Then it turned out that, due to social and cognitive reasons, many users do not want to dictate to their computers even if it’s inexpensive and technically feasible. Speech interfaces that users like require not only a sufficiently accurate speech recognition engine, but also many other less well known factors. These include a scenario where speech is the preferred modality, proper end-to-end design, error correction, robust dialog, ease of authoring for non-speech experts, data-driven grammars, a positive feedback loop with users, and robustness to noise. In this talk I will describe some of the work we have done in MSR on building natural user interfaces using speech technology, and will illustrate it with a few scenarios in gaming (Xbox Kinect), speech in automotive environments (Hyundai/Kia UVO, Ford SYNC), etc.

Speaker Biography

Alex Acero is a Research Area Manager at Microsoft Research, directing an organization with 60 engineers working on audio, multimedia, communication, speech, and natural language. He is also an affiliate Professor of Electrical Engineering at the University of Washington. He received an M.S. degree from the Polytechnic University of Madrid, Madrid, Spain, in 1985, an M.S. degree from Rice University, Houston, TX, in 1987, and a Ph.D. degree from Carnegie Mellon University, Pittsburgh, PA, in 1990, all in Electrical Engineering. Dr. Acero worked in Apple Computer’s Advanced Technology Group during 1990-1991. In 1992, he joined Telefonica I+D, Madrid, Spain, as Manager of the speech technology group. Since 1994 he has been with Microsoft Research. Dr. Acero is a Fellow of the IEEE. He has served the IEEE Signal Processing Society as Vice President Technical Directions (2007-2009), Director Industrial Relations (2009-2011), 2006 Distinguished Lecturer, member of the Board of Governors (2004-2005), Associate Editor for IEEE Signal Processing Letters (2003-2005) and IEEE Transactions on Audio, Speech and Language Processing (2005-2007), and member of the editorial board of IEEE Journal of Selected Topics in Signal Processing (2006-2008) and IEEE Signal Processing Magazine (2008-2010). He also served as member (1996–2000) and Chair (2000-2002) of the Speech Technical Committee of the IEEE Signal Processing Society. He was Publications Chair of ICASSP98, Sponsorship Chair of the 1999 IEEE Workshop on Automatic Speech Recognition and Understanding, and General Co-Chair of the 2001 IEEE Workshop on Automatic Speech Recognition and Understanding. Dr. Acero served as member of the editorial board of Computer Speech and Language and member of Carnegie Mellon University Dean’s Leadership Council for College of Engineering. Dr. Acero is author of the books "Acoustical and Environmental Robustness in Automatic Speech Recognition" (Kluwer, 1993) and "Spoken Language Processing" (Prentice Hall, 2001), has written invited chapters in 4 edited books and 200 technical papers. He holds 78 US patents. Since 2004, Dr. Acero, along with co-authors Drs. Huang and Hon, has been using proceeds from their textbook “Spoken Language Processing” to fund the “IEEE Spoken Language Processing Student Travel Grant” for the best ICASSP student papers in the speech area.

July 21, 2010 | 10:30AM

“Two Faces of Active Learning”

Sanjoy Dasgupta, UC San Diego

[abstract] [biography]

Abstract

The active learning model is motivated by scenarios in which it is easy to amass vast quantities of unlabeled data (images and videos off the web, speech signals from microphone recordings, and so on) but costly to obtain their labels. Like supervised learning, the goal is ultimately to learn a classifier. But like unsupervised learning, the data come unlabeled. More precisely, the labels are hidden, and each of them can be revealed only at a cost. The idea is to query the labels of just a few points that are especially informative about the decision boundary, and thereby to obtain an accurate classifier at significantly lower cost than regular supervised learning. There are two distinct narratives for explaining when active learning is helpful. The first has to do with efficient search through the hypothesis space: perhaps one can always explicitly select query points whose labels will significantly shrink the set of plausible classifiers (those roughly consistent with the labels seen so far)? The second argument for active learning has to do with exploiting cluster structure in data. Suppose, for instance, that the unlabeled points form five nice clusters; with luck, these clusters will be "pure" and only five labels will be necessary! Both these scenarios are overly optimistic. But I will show that they each motivate realistic models that can effectively be exploited by active learning algorithms. These algorithms have provable label complexity bounds that are in some cases exponentially lower than for supervised learning. I will also present experiments with these algorithms, to illustrate their behavior and get a sense of the gulf that still exists between the theory and practice of active learning. This is joint work with Alina Beygelzimer, Daniel Hsu, John Langford, and Claire Monteleoni.
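
The label-query loop can be made concrete with a small uncertainty-sampling sketch, a common heuristic that stands in for the theoretically motivated algorithms discussed in the talk. It assumes scikit-learn and a binary task, and the oracle function is a placeholder for the costly human annotator.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def uncertainty_sampling(X, oracle, seed_idx, budget):
        """Pool-based active learning with uncertainty sampling.

        X:        (n, d) array of points whose labels are hidden
        oracle:   function idx -> label in {0, 1} (stands in for the annotator)
        seed_idx: initial indices to label; must cover both classes
        budget:   number of additional label queries allowed
        """
        labeled = {i: oracle(i) for i in seed_idx}
        for _ in range(budget):
            idx = sorted(labeled)
            clf = LogisticRegression().fit(X[idx], [labeled[i] for i in idx])
            probs = clf.predict_proba(X)[:, 1]
            pool = [i for i in range(len(X)) if i not in labeled]
            # Ask about the point the current classifier is least sure of.
            query = min(pool, key=lambda i: abs(probs[i] - 0.5))
            labeled[query] = oracle(query)
        idx = sorted(labeled)
        final_clf = LogisticRegression().fit(X[idx], [labeled[i] for i in idx])
        return final_clf, labeled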

Speaker Biography

Sanjoy Dasgupta is an Associate Professor in the Department of Computer Science and Engineering at UC San Diego. He received his PhD from Berkeley in 2000, and spent two years at AT&T Research Labs before joining UCSD. His area of research is algorithmic statistics, with a focus on unsupervised and minimally supervised learning. He recently completed a textbook, "Algorithms" (with Christos Papadimitriou and Umesh Vazirani).

July 14, 2010 | 10:30AM

“Hybrid Deformable Modeling Methods for Reconstruction, Segmentation, Tracking and Classification”

Dimitris Metaxas, Rutgers

[abstract] [biography]

Abstract

Recent advances in deformable models have led to new classes of methods that borrow the best features from level sets as well as traditional parametric deformable models. In the first part of the talk I will present a new class of such models, termed Metamorphs, whose formulation integrates shape, intensity and texture by borrowing ideas from level sets and traditional parametric deformable models. Further extensions to these models include shape and texture priors. These new models can be used in medical segmentation and registration where organ boundaries are fuzzy and with no assumptions on the noise distribution. Applications include cancer and cardiac detection and reconstruction. In the second part of the talk I will present a real time system for facial and body movement analysis that can be used in ASL recognition, deception and other homeland security applications. Finally, I will highlight some new developments in sparsity theory and their applications to segmentation, sparse learning and medical imaging.

Speaker Biography

Dr. Dimitris Metaxas is a Professor II (Distinguished) in the Computer Science Department at Rutgers. He received his PhD in 1992 from the University of Toronto and was on the faculty at UPENN from 1992 to 2002. He is currently directing the Center for Computational Biomedicine, Imaging and Modeling (CBIM). Dr. Metaxas has been conducting research towards the development of formal methods upon which both computer vision, computer graphics and medical imaging can advance synergistically, as well as on massive data analytics problems. In computer vision, he works on the simultaneous segmentation and fitting of complex objects, shape representation, deterministic and statistical object tracking, learning, ASL and human activity recognition. In medical image analysis, he works on segmentation, registration and classification methods for cardiac and cancer applications. In computer graphics he is working on physics-based special effects methods for animation. He has pioneered the use of Navier-Stokes methods for fluid animations that were used in the movie “Antz” in 1998 by his student Nick Foster. Dr. Metaxas has published over 350 research articles in these areas and has graduated 29 PhD students. His research has been funded by NSF, NIH, ONR, DARPA, AFOSR and the ARO. He is on the Editorial Board of Medical Imaging and an Associate Editor of GMOD and CAD. Dr. Metaxas received several best paper awards for his work in the above areas. He is an ONR YIP and a Fellow of the American Institute of Medical and Biological Engineers. He has been a Program Chair of ICCV 2007, a General Chair of MICCAI 2008 and will be a General Chair of ICCV 2011.

July 7, 2010 | 10:30AM

“The science of sleep mentation and REM sleep: Is dreaming functionally significant?”

Antonio Zadra, Université de Montréal

[abstract] [biography]

Abstract

Although many contemporary dream researchers suggest that dreaming is functionally significant, some argue that dreams are epiphenomenal to neurophysiological activity during REM sleep. This presentation will begin by examining methodological issues in dream research with a focus on challenges in the collection and quantification of dream reports. Next, findings from brain imaging studies of REM sleep are briefly reviewed but it is argued that REM sleep is neither a necessary nor sufficient physiological condition for dreaming to occur. Dreaming is a cognitive achievement that develops throughout childhood and a considerable amount of psychological information can be extracted from dream reports. In adults, dreams show systematic relationships to various dimensions of the dreamer’s waking life. These research findings are consistent with the continuity hypothesis, which posits a relationship between everyday dream content and general waking states and concerns. In the end, dreaming may or may not have a function, but data convincingly show that dream content is a unique and meaningful psychological product of the human brain.

Speaker Biography

Antonio Zadra, Ph.D. is a Professor of Psychology at the Université de Montréal, Director of the university’s Dream Laboratory and a Senior Research Scholar of the Quebec Health Research Fund. His research interests include recurrent dreams, nightmares, somnambulism and the assessment and treatment of parasomnias and dream-related disorders.

June 30, 2010 | 10:30AM

“Probabilistic Knowledge and Uncertain Input in Rational Human Sentence Comprehension”

Roger Levy, UC San Diego

[abstract]

Abstract

Considering the adversity of the conditions under which linguistic communication takes place in everyday life -- ambiguity of the signal, environmental competition for our attention, speaker error, our limited memory, and so forth -- it is perhaps remarkable that we are as successful at it as we are. Perhaps the leading explanation of this success is that (a) the linguistic signal is redundant, (b) diverse information sources are generally available that can help us infer something close to the intended message when comprehending an utterance, and (c) we use these diverse information sources very quickly and to the fullest extent possible. This explanation suggests a theory of language comprehension as a rational, evidential process. In this talk, I describe recent research on how we can use the tools of computational linguistics to formalize and implement such a theory, and to apply it to a variety of problems in human sentence comprehension, including classic cases of garden-path disambiguation as well as processing difficulty in the absence of structural ambiguity. In addition, I address a number of phenomena that remain clear puzzles for the rational approach, due to an apparent failure to use information available in a sentence appropriately in global or incremental inferences about the correct interpretation of a sentence. I argue that the apparent puzzle posed by these phenomena for models of rational sentence comprehension may derive from the failure of existing models to appropriately account for the environmental and cognitive constraints -- in this case, the inherent uncertainty of perceptual input, and humans' ability to compensate for it -- under which comprehension takes place. I present a new probabilistic model of language comprehension under uncertain input and show that this model leads to solutions to the above puzzles. I also present behavioral data in support of novel predictions made by the model. More generally, I suggest that appropriately accounting for environmental and cognitive constraints in probabilistic models can lead to a more nuanced and ultimately more satisfactory picture of key aspects of human cognition.

June 23, 2010 | 10:30AM

“Human Activity Recognition using Simple Direct Sensors”

Henry Kautz, University of Rochester

[abstract]

Abstract

Simple, inexpensive sensors, including RFID-based object touch sensors, GPS location sensors, and cell-phone quality accelerometers, can be used to detect and distinguish a wide range of human activities with surprisingly high accuracy. I will provide an overview of hardware and algorithms for direct sensing, and speculate about how such sensor data could be used for embodied language and task learning.

May 4, 2010 | 4:30PM

“Boosting systems for LVCSR”

George Saon, IBM

[abstract] [biography]

Abstract

Current ASR systems can reach high levels of performance for particular domains, as attested by various government-sponsored speech recognition evaluations. This comes at the expense of an ever-increasing complexity in the recognition architecture. Typically, LVCSR systems employ multiple decoding and rescoring passes with several speaker adaptation passes in-between. Unfortunately, a lot of human intervention is required in choosing which systems are good for combination, knowledge which is often task-dependent and cannot be easily transferred to other domains. Ideally, one would want an automatic procedure for training accurate systems/models which make complementary recognition errors. Boosting is a popular machine learning technique for incrementally building linear combinations of "weak" models to generate an arbitrarily "strong" predictive model. We employ a variant of the popular Adaboost algorithm to train multiple acoustic models such that the aggregate system exhibits improved performance over the individual recognizers. Each model is trained sequentially on re-weighted versions of the training data. At each iteration, the weights are decreased for the frames that are correctly decoded by the current system. These weights are then multiplied with the frame-level statistics for the decision trees and Gaussian mixture components of the next iteration system. The composite system uses a log-linear combination of HMM state observation likelihoods. We report experimental results on several broadcast news transcription setups which differ in the language being spoken (English and Arabic) and in the amounts of training data. Additionally, we study the impact of boosting on ML and discriminatively trained acoustic models. Our findings suggest that significant gains can be obtained for small amounts of training data even after feature and model-space discriminative training.
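
The re-weighting loop described above can be sketched schematically as follows. Here train_acoustic_model and frame_correct are placeholder hooks standing in for the actual acoustic model training and decoding, and the update is a generic AdaBoost-style rule rather than the exact variant used in this work.

    import numpy as np

    def boost_acoustic_models(frames, train_acoustic_model, frame_correct, num_models=4):
        """Schematic AdaBoost-style training of complementary acoustic models.

        frames:               list of training frames (features plus transcripts)
        train_acoustic_model: placeholder: (frames, weights) -> model
        frame_correct:        placeholder: (model, frame) -> True if the frame is
                              decoded correctly by a system built on this model
        Frames the current model already gets right are down-weighted, so the next
        model concentrates on the remaining errors; at decode time the models'
        state likelihoods would be combined log-linearly with the returned weights.
        """
        n = len(frames)
        weights = np.ones(n) / n
        models = []
        for _ in range(num_models):
            model = train_acoustic_model(frames, weights)
            correct = np.array([frame_correct(model, f) for f in frames])
            error = weights[~correct].sum()
            error = min(max(error, 1e-6), 1 - 1e-6)    # keep the update well defined
            beta = error / (1.0 - error)
            weights[correct] *= beta                   # shrink weight of correct frames
            weights /= weights.sum()
            models.append((model, np.log(1.0 / beta))) # model and its combination weight
        return models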

Speaker Biography

George Saon received his M.Sc. and PhD degrees in Computer Science from the Henri Poincare University in Nancy, France in 1994 and 1997. From 1994 to 1998, he worked on two-dimensional stochastic models for off-line handwriting recognition at the Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA). Since 1998, Dr. Saon has been with the IBM T.J. Watson Research Center, where he has tackled a variety of problems spanning several areas of large vocabulary continuous speech recognition such as discriminative feature processing, acoustic modeling, speaker adaptation and large vocabulary decoding algorithms. Some of the techniques that he co-invented are well known to the speech community, such as heteroscedastic discriminant analysis (HDA), implicit lattice discriminative training, lattice-MLLR, feature-space Gaussianization, fast FSM-based Viterbi decoding, etc. Since 2001, Dr. Saon has been a key member of IBM's speech recognition team, which participated in several U.S. government-sponsored evaluations for the EARS, SPINE and GALE programs. In the context of GALE, he is also foraying into statistical machine translation. He has published over 70 conference and journal papers and holds several patents in the field of ASR.

April 27, 2010 | 4:30PM

“Deep learning with multiplicative interactions”   Video Available

Geoffrey E. Hinton, University of Toronto and Canadian Institute for Advanced Research

[abstract]

Abstract

Deep networks can be learned efficiently from unlabeled data. The layers of representation are learned one at a time using a simple learning module that has only one layer of latent variables. The values of the latent variables of one module form the data for training the next module. Although deep networks have been quite successful for tasks such as object recognition, information retrieval, and modeling motion capture data, the simple learning modules do not have multiplicative interactions which are very useful for some types of data. The talk will show how to introduce multiplicative interactions into the basic learning module in a way that preserves the simple rules for learning and perceptual inference. The new module has a structure that is very similar to the simple cell/complex cell hierarchy that is found in visual cortex. The multiplicative interactions are useful for modeling images, image transformations and different styles of human walking. They can also be used to create generative models of spectrograms. The features learned by these generative models are excellent for phone recognition.
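
The greedy layer-wise recipe in the first sentences can be sketched with a stand-in module. Here each module is just a linear (PCA) projection, chosen only because it is compact and self-contained; the modules in the talk are RBM-like and include the multiplicative interactions being introduced.

    import numpy as np

    def fit_linear_module(X, k):
        """Fit a k-dimensional linear module (PCA) and return its encoder function.

        This is a stand-in for a one-layer latent-variable module; it is not
        the RBM-style module discussed in the talk.
        """
        mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        W = Vt[:k].T                                   # (d, k) projection
        return lambda Z: (Z - mean) @ W

    def greedy_layerwise(X, layer_sizes):
        """Train one module at a time; each module's codes are the next module's data."""
        encoders, data = [], X
        for k in layer_sizes:
            encoder = fit_linear_module(data, k)       # learn one layer
            encoders.append(encoder)
            data = encoder(data)                       # latent codes feed the next layer
        return encoders

    # Toy usage: 500 points in 20 dimensions, compressed through a 10-5-2 hierarchy.
    encoders = greedy_layerwise(np.random.randn(500, 20), [10, 5, 2])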

April 20, 2010 | 4:30PM

“The Making of GALE”

Joseph Olive, DARPA

[abstract] [biography]

Abstract

This talk will present the making of DARPA's Global Autonomous Language Exploitation (GALE) initiative, from soup to nuts. I will describe the development of human-language technology, specifically translation and distillation of both speech and text. I will also present the challenges of evaluating such technologies both within the constraints of DARPA's go/no-go requirements and within innovative evaluation paradigms that have arisen over the course of the initiative.

Speaker Biography

Dr. Joseph Olive is a DARPA Program Manager of the Global Autonomous Language Exploitation (GALE) program. He has had over thirty years of experience in research and development at Bell Laboratories and 19 years of experience in management. He has been the world leader in research on text-to-speech synthesis and has managed a world-class team in computer dialogue systems and human-computer communication. In his role as director of speech research and CTO of Lucent's Business Unit, Lucent Speech Solutions, he supervised the productization of Bell Labs' core speech technologies: Automatic Speech Recognition (ASR), Text-to-Speech Synthesis (TTS), and Speaker Verification (SV). He also led the dialogue research team in creating a "next-generation" dialogue system for e-mail reading and navigation. Dr. Olive graduated from the University of Chicago with a Ph.D. in Physics. After leaving the University of Chicago, Dr. Olive combined his interest in computation and his interest in music and began research in acoustics and signal processing.

April 13, 2010 | 4:30PM

“Modeling Cognitive State”

Owen Rambow

[abstract]

Abstract

In the 80s and 90s of the last century, in subdisciplines such as planning, text generation, and dialog systems, there was considerable interest in modeling the cognitive states of interacting autonomous agents. Theories such as Speech Act Theory (Austin 1962), the belief-desire-intentions model of Bratman (1987), and RST (Mann and Thompson 1988) together provide a framework in which to link cognitive state with language use. However, in general natural language understanding, little use was made of such theories, presumably because of the difficulty of some underlying tasks (such as syntactic parsing). In this talk, I propose that it is time to again think about the explicit modeling of cognitive state for participants in discourse, including in an understanding perspective. The perspective of cognitive state can provide a context in which many disparate NLP tasks can be classified and related. I will present three projects at Columbia which relate to the modeling of cognitive state:

* Discourse participants need to model each other's cognitive states, and language makes this possible by providing special morphological, syntactic, and lexical markers. I present results in automatically determining the degree of belief of a speaker in the propositions in his or her utterance.

* In dialog (including written dialog), people pursue communicative intentions and usually cooperate on achieving them. Lack of cooperation can be a sign of lack of solidarity, and an assertion of power. I discuss some preliminary research on identifying dialog acts, communicative intention, and cooperative behavior in dialog.

* A social network is actually a perception on the part of people of how they relate to other people (rather than merely a collection of friends lists on a social networking site), and thus also a component of cognitive state. I report on initial work in extracting a social network from text.

April 6, 2010 | 4:30PM

“Google Search by Voice”

Johan Schalkwyk, Google

[abstract]

Abstract

Google Search by Voice is a free application from Google that allows spoken access to google.com. We currently support Apple iPhone, Android, BlackBerry, and Nokia Series 60 devices. In this talk we will give an overview of Google Search by Voice and our efforts to make speech input on mobile devices truly ubiquitous. We describe the basic technology behind Google Search by Voice, focusing in particular on the unique set of challenges faced while building it, spanning cloud computing, speech recognition, natural language processing, and user interface design, as well as a collection of important open research challenges in these areas.

March 30, 2010 | 4:30PM

“Dynamic Finite-State Transducer Composition with Look-Ahead for Very-Large Scale Speech Recognition”

Mike Riley, Google

[abstract] [biography]

Abstract

This talk describes a weighted finite-state transducer composition algorithm that generalizes the concept of the composition filter and presents look-ahead filters that remove useless epsilon paths and push forward labels and weights along epsilon paths. This filtering permits the composition of very large speech recognition context-dependent lexicons and language models much more efficiently in time and space than previously possible. We present experiments on Broadcast News and a spoken query task that demonstrate a 5-10% overhead for dynamic, runtime composition compared to a static, offline composition of the recognition transducer in an FST-based decoder. In the spoken query task, we give results using LMs varying from 15M to 2G n-grams. To our knowledge, this is the first such system with so little overhead and such large LMs. Joint work with Cyril Allauzen, Ciprian Chelba, Boulos Harb, and Johan Schalkwyk.
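
As background for readers unfamiliar with the operation being optimized here, the following is a minimal sketch (in plain Python rather than OpenFst, with invented toy transducers) of epsilon-free weighted composition in the tropical semiring; the look-ahead filters described in the talk are designed to keep this product construction from generating useless state pairs.

    # Epsilon-free weighted FST composition in the tropical semiring (min, +).
    # Each FST is a dict: state -> list of (input_label, output_label, weight, next_state).
    # This is the naive product construction; look-ahead filters prune useless pairs early.

    def compose(fst_a, fst_b, start_a, start_b):
        start = (start_a, start_b)
        arcs, agenda, seen = {}, [start], {start}
        while agenda:
            qa, qb = agenda.pop()
            out = []
            for (i1, o1, w1, n1) in fst_a.get(qa, []):
                for (i2, o2, w2, n2) in fst_b.get(qb, []):
                    if o1 == i2:                            # match A's output with B's input
                        nxt = (n1, n2)
                        out.append((i1, o2, w1 + w2, nxt))  # tropical semiring: weights add
                        if nxt not in seen:
                            seen.add(nxt)
                            agenda.append(nxt)
            arcs[(qa, qb)] = out
        return arcs, start

    # Toy example: a one-arc "lexicon" composed with a one-arc "grammar".
    L = {0: [("ih", "it", 0.5, 1)]}
    G = {0: [("it", "it", 1.2, 1)]}
    print(compose(L, G, 0, 0))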

Speaker Biography

Michael Riley has a B.S., M.S., and Ph.D. from MIT, all in computer science. He began his career at Bell Labs and AT&T Labs where he, together with Mehryar Mohri and Fernando Pereira, introduced and developed the theory and use of weighted finite-state transducers (WFSTs) in speech and language. He is currently a research scientist at Google, Inc. His interests include speech and natural language processing, machine learning, and information retrieval. He is a principal author of the OpenFst library and the AT&T FSM Library (TM).

March 23, 2010 | 4:30PM

“Exploring Web Scale Language Models for Search Query Processing”

Jianfeng Gao, Microsoft

[abstract] [biography]

Abstract

It has been widely observed that search queries are composed in a very different style from that of the body or the title of a document. Many techniques explicitly accounting for this language style discrepancy have shown promising results for information retrieval, yet a large scale analysis of the extent of the language differences has been lacking. In this paper, we present an extensive study of this issue by examining the language model properties of search queries and the three text streams associated with each web document: the body, the title, and the anchor text. Our information theoretical analysis shows that queries seem to be composed in a way most similar to how authors summarize documents in anchor texts or titles, offering a quantitative explanation for the observations in past work. We apply these web scale n-gram language models to three search query processing (SQP) tasks: query spelling correction, query bracketing and long query segmentation. By controlling the size and the order of different language models, we find the perplexity metric to be a good accuracy indicator for these query processing tasks. We show that using smoothed language models yields significant accuracy gains for query bracketing, for instance, compared to using web counts as in the literature. We also demonstrate that applying web-scale language models can have a marked accuracy advantage over smaller ones.
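
As a toy illustration of the perplexity metric referred to above (the paper's models are web-scale; the corpus, vocabulary, and queries below are invented), an add-one-smoothed bigram model assigns lower perplexity to a query it models well:

    import math
    from collections import Counter

    # Toy add-one-smoothed bigram model; the paper uses web-scale smoothed n-grams.
    corpus = "cheap flights to new york cheap hotels in new york".split()
    V = set(corpus)
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    def bigram_prob(w_prev, w):
        return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + len(V))

    def perplexity(query):
        words = query.split()
        log_p = sum(math.log2(bigram_prob(p, w)) for p, w in zip(words, words[1:]))
        return 2 ** (-log_p / max(1, len(words) - 1))

    print(perplexity("cheap flights to new york"))   # lower perplexity: better modeled
    print(perplexity("york new to flights cheap"))   # scrambled query scores worse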

Speaker Biography

I am a Researcher in the Natural Language Processing Group at Microsoft Research. From 2005 to 2006, I was a software developer in the Natural Interactive Services Division at Microsoft. From 1999 to 2005, I was a researcher in the Natural Language Computing Group at Microsoft Research Asia. My research interests include Web search and mining, information retrieval and statistical natural language processing.

March 9, 2010 | 4:30PM

“TAG-based Structured Prediction Models for Parsing and Machine Translation”

Michael Collins, MIT

[abstract] [biography]

Abstract

In structured prediction problems the goal is to learn a function that maps input points to structured output labels: for example, strings, graphs, or trees. These problems are common in many fields---for example, natural language processing (NLP), computer vision, and computational biology---and have been the focus of a great deal of recent research in machine learning. In this talk I'll describe models for two structured prediction problems in NLP: parsing and machine translation. Central to both approaches is a variant of Tree Adjoining Grammar (TAG) (Joshi et al., 1975), which is computationally efficient, but which also allows the use of relatively rich syntactic representations. The TAG-based parser generalizes a powerful class of discriminative models (conditional random fields) to full syntactic parsing. The TAG-based translation system makes direct use of syntactic structures in modeling differences in word order between different languages, and in modeling the grammaticality of translation output. In both cases we show improvements over state-of-the-art systems. This is joint work with Xavier Carreras and Terry Koo.

Speaker Biography

Michael Collins is an Associate Professor of Computer Science at MIT. His research is focused on topics including statistical parsing, structured prediction problems in machine learning, and applications including machine translation, dialog systems, and speech recognition. His awards include a Sloan fellowship, an NSF career award, and best paper awards at EMNLP (2002 and 2004), UAI (2004 and 2005), and CoNLL 2008.

March 2, 2010 | 4:30PM

“Social Technology”

Marti A. Hearst, UC Berkeley

[abstract] [biography]

Abstract

We are in the midst of extraordinary change in how people interact with one another and with information. A combination of advances in technology and change in people's expectations is altering the way products are sold, scientific problems are solved, software is written, elections are conducted, and government is run. People are social animals, and as Shirky notes, we now have tools that are flexible enough to match our in-built social capabilities. Things can get done that weren't possible before because the right expertise, the missing information, or a large enough group of people can now be gathered together at low cost. These developments open a number of interesting research questions, and potentially change how scientific research should be conducted. In this talk I will attempt to summarize and put some structure around some of these developments.

Speaker Biography

Prof. Marti Hearst is a professor in the School of Information at UC Berkeley, with an affiliate appointment in the Computer Science Division. Her primary research interests are user interfaces for search engines, information visualization, natural language processing, and empirical analysis of social media. She has just completed the first book on Search User Interfaces. Prof. Hearst received BA, MS, and PhD degrees in Computer Science from the University of California at Berkeley, and she was a Member of the Research Staff at Xerox PARC from 1994 to 1997. Prof. Hearst has served on the Advisory Council of NSF's CISE Directorate and is co-chair of the Web Board for CACM. She is a member of the Usage Panel for the American Heritage Dictionary and is on the Edge.org panel of experts. Prof. Hearst is on the editorial boards of ACM Transactions on the Web and ACM Transactions on Computer-Human Interaction and was formerly on the boards of Computational Linguistics, ACM Transactions on Information Systems, and IEEE Intelligent Systems. Prof. Hearst has received an NSF CAREER award, an IBM Faculty Award, a Google Research Award, an Okawa Foundation Fellowship, and two Excellence in Teaching Awards, and has been principal investigator for more than $3M in research grants. Prof. Hearst was for many years a researcher in the QCA group at Xerox PARC, and before that a member of the BAIR group in graduate school.

February 23, 2010 | 4:30PM

“Voice Applications for Low Literate Users”

Roni Rosenfeld, Carnegie Mellon University

[abstract] [biography]

Abstract

In the developing world, critical information, such as in the field of healthcare, can often mean the difference between life and death. While information and communications technologies enable multiple mechanisms for information access by literate users, there are limited options for information access by low literate users. In this talk, I will give an overview of the use of spoken language interfaces by low literate users in the developing world, with a specific focus on health information access by community health workers in Pakistan. I will present results from user studies comparing a variety of information access interfaces for these users, and show that speech interfaces outperform alternative interfaces for both low literate and literate users. I will also describe some of the challenges, both technical and non-technical, that are involved in developing spoken language technologies for low-literate users, and propose solutions for them. One of the technical challenges is the rapid generation and deployment of speech recognizers in resource-poor languages. I will describe our Speech-based Automated Learning of Accent and Articulation Mapping (Salaam) method, which leverages existing off-the-shelf automatic Speech Recognition technology to create robust, speaker-independent, small-vocabulary speech recognition capability with minimal training data requirements. This method is able to reach recognition accuracies of greater than 90% with very little effort and, even more importantly, little speech technology skill. The talk concludes with an exploration of orality as a lens with which to analyze and understand low literate users, as well as recommendations on the design and testing of user interfaces for such users, and a discussion of the potential of voice-based social media in these contexts. Joint work with Jahanzeb Sherwani.

Speaker Biography

Roni Rosenfeld is Professor of Language Technologies, Machine Learning and Computer Science at the School of Computer Science, Carnegie Mellon University, in Pittsburgh, Pennsylvania. He received his B.Sc. in Mathematics and Physics from Tel-Aviv University in 1985, and his M.Sc. and Ph.D. in Computer Science from Carnegie Mellon University in 1990 and 1994, respectively. He is a recipient of the Allen Newell Medal for Research Excellence. His research interests include the evolution of viruses and viral epidemics, and the use of spoken language technologies to aid socio-economic development.

February 16, 2010 | 4:30PM

“Manipulation of Consonants in Natural Speech”   Video Available

Jont Allen, University of Illinois

[abstract]

Abstract

Starting in the 1920s, researchers at AT&T Research characterized speech perception. Until 1950, this work was done by a large group working under Harvey Fletcher, and it resulted in the articulation index, an important tool able to predict average speech scores. In the 1950s a dedicated group of researchers at Haskins Labs in NYC attempted to extend these ideas, and further work was done at MIT under the direction of Ken Stevens, trying to identify the reliable speech cues. Most of this work after 1950 was not successful in finding speech cues, and therefore today many consider it impossible. That is, many believe that there is no direct, unique mapping from the time-frequency plane to consonant and vowel recognition. For example, it has been claimed that context is necessary to successfully identify nonsense consonant-vowel syllables. In fact this is not the case. The post-1950 work mostly used synthetic speech, which was a major flaw with all these studies. Also, only average results were studied, again a major flaw. In 2007 we measured the consonant error for 20 talkers speaking 16 different consonants in two types of variable noise, and in 2009 we measured confusions for 50 hearing-impaired ears. For many consonants, normal human performance is well above chance at -20 dB SNR, and at 0 dB SNR, the score is close to 100% for most sounds. The error patterns for individual sounds are quite different from the average. Vowels perform very differently from consonants. The lesson learned is to carefully study token inhomogeneity. For the hearing-impaired ears, the confusions vary wildly. The present work is a natural extension of these 1950 studies, but this time we have been successful and have determined the mapping. Using extensive psychoacoustic methods, working with a large database of recorded speech sounds, applying newly developed techniques that use a model of the auditory system to predict audible cues in noise, and enlisting a large number of listeners to evaluate the induced confusions, we have precisely identified the acoustic cues for individual utterances and for a large number of consonants. This paper explores the potential use of this new knowledge about perceptual cues of consonant sounds in speech processing. These cues provide deep insight into why Fletcher’s articulation index is successful in predicting average “nonsense” speech syllables. Our analysis of a large number of nonsense consonant-vowel syllables from the LDC database reveals that natural speech, especially stop consonants, often contains conflicting speech cues that are characteristic of confusable sounds. Through the manipulation of these acoustic cues, one phone (a consonant or vowel sound) can be morphed into another. Meaningful sentences can be morphed into nonsense, or into a sentence with a very different meaning. The resulting morphed speech is natural-sounding human speech. These techniques are robust to noise: a weak sound, easily masked by noise, can be converted into a strong one. Results of speech perception experiments on feature-enhanced /ka/ and /ga/ show that any modification of the speech cues significantly changes, and can even improve, the score in noise for both normal and hearing-impaired listeners. The implications for ASR will be discussed. This work forms the basis of the second author’s PhD.

February 2, 2010 | 4:30PM

“Segmental Conditional Random Fields and their Features”

Geoffrey Zweig, Microsoft

[abstract] [biography]

Abstract

Segmental conditional random fields provide an exciting new approach to speech recognition based on the integration of numerous acoustic and linguistic cues. In this approach, acoustic detectors are applied to the audio signal, resulting in a stream of detection events. For example, phoneme, syllable or general acoustic-template detectors may be used. The detection streams are analyzed in word level segments, where long-span features are extracted. A log-linear model, in the form of a segmental conditional random field, integrates these acoustic features with language modeling features to produce a probability distribution over word labels. The talk will describe the theoretical background of this approach, a specific implementation in the form of a publicly available toolkit, and results on both a voice search task and the Wall Street Journal task. This is joint work with Patrick Nguyen. Seminar Video: Video of this lecture is available from the CLSP library upon request.
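
The following is a rough, hypothetical sketch of the log-linear form described above, not the toolkit's actual code: detector-based features and a language-model feature for a word-level segment are combined under one weight vector and normalized, here only over competing word labels for a single segment rather than over full segmentations.

    import math

    # Hypothetical feature functions over a word-level segment: detector events
    # observed in the segment plus a language-model feature for the word.
    def features(word, detections, prev_word):
        return {
            "phone_match": sum(1 for d in detections if d in word),  # crude detector/word overlap
            "num_events": len(detections),
            "lm_bigram:%s_%s" % (prev_word, word): 1.0,
        }

    weights = {"phone_match": 2.0, "num_events": -0.1, "lm_bigram:the_cat": 1.5}

    def score(word, detections, prev_word):
        f = features(word, detections, prev_word)
        return sum(weights.get(k, 0.0) * v for k, v in f.items())

    def posterior(candidates, detections, prev_word):
        # Log-linear (CRF-style) normalization over competing word labels for the segment.
        scores = {w: score(w, detections, prev_word) for w in candidates}
        z = sum(math.exp(s) for s in scores.values())
        return {w: math.exp(s) / z for w, s in scores.items()}

    print(posterior(["cat", "cap", "mat"], ["c", "a", "t"], "the"))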

Speaker Biography

Geoffrey Zweig studied at the University of California at Berkeley, earning a B.A. in physics (Summa Cum Laude) in 1985 and a PhD in computer science in 1998. After graduating he worked for eight years at the IBM T.J. Watson Research Center, where he led the speech recognition efforts in the DARPA EARS and GALE programs and managed the Advanced Large Vocabulary Continuous Speech Recognition group. In 2006 he joined Microsoft Research in Redmond, WA as a Senior Researcher. His work at Microsoft has revolved around acoustic and language modeling techniques for voice search applications, and most recently the development of the segmental CRF approach to ASR. He has published over 50 papers in the area of speech recognition along with numerous patents. In addition to Microsoft, he is on the affiliate faculty of the University of Washington. Dr. Zweig is a member of the ACM and a senior member of the IEEE. He served from 2003 to 2006 as associate editor of the IEEE Transactions on Audio, Speech and Language Processing, and is currently on the editorial board of Computer Speech and Language. More about him can be found at http://research.microsoft.com/en-us/people/gzweig/

January 26, 2010 | 4:30PM

“Exploiting Latent Semantic Mapping for Generic Feature Extraction”

Jerome R. Bellegarda, Apple Inc.

[abstract] [biography]

Abstract

Originally formulated in the context of information retrieval, latent semantic analysis exhibits three main characteristics: (i) words and documents (i.e., discrete entities) are mapped onto a continuous vector space; (ii) this mapping is determined by global correlation patterns; and (iii) dimensionality reduction is an integral part of the process. Because such fairly generic properties may be advantageous in a variety of different contexts, this has sparked interest in a more inclusive interpretation of the underlying paradigm. The outcome is latent semantic mapping, a data-driven framework for modeling global relationships implicit in large volumes of (not necessarily textual) data. The purpose of this talk is to give a broad overview of the framework, highlight the attendant focus shift from semantic classification to more general feature extraction, and underscore the multi-faceted benefits it can bring to a number of problems in speech and language processing. We conclude with a discussion of the inherent trade-offs associated with the approach, and some perspectives on its likely role in information extraction going forward.
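
A compact, illustrative sketch of the mapping step (toy counts, not from the talk): build a word-by-document matrix, take a truncated SVD, and use the resulting low-dimensional vectors as continuous representations of both words and documents.

    import numpy as np

    # Toy word-by-document count matrix (rows: words, columns: documents).
    words = ["speech", "language", "bank", "money"]
    W = np.array([[3, 2, 0],
                  [2, 3, 0],
                  [0, 0, 4],
                  [0, 1, 3]], dtype=float)

    # Truncated SVD: keep k latent dimensions (the "mapping" of LSM).
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    k = 2
    word_vecs = U[:, :k] * s[:k]        # continuous representations of words
    doc_vecs = Vt[:k, :].T * s[:k]      # continuous representations of documents

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine(word_vecs[0], word_vecs[1]))  # "speech" vs "language": high
    print(cosine(word_vecs[0], word_vecs[2]))  # "speech" vs "bank": low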

Speaker Biography

Jerome R. Bellegarda received the Ph.D. degree in Electrical Engineering from the University of Rochester, Rochester, New York, in 1987. From 1988 to 1994 he worked on automatic speech and handwriting recognition at the IBM T.J. Watson Research Center, Yorktown Heights, New York. In 1994 he joined Apple Inc, Cupertino, California, where he is currently Apple Distinguished Scientist in Speech & Language Technologies. His general interests span voice-driven man-machine communications, multiple input/output modalities, and multimedia knowledge management. In these areas he has written approximately 150 publications, and holds over 40 U.S. and foreign patents. He has also served on many international scientific committees, review panels, and editorial boards. In particular, he has worked as Expert Adviser on speech technology for both the National Science Foundation and the European Commission (DGXIII), was Associate Editor for the IEEE Transactions on Audio, Speech and Language Processing, served on the IEEE Signal Processing Society Speech Technical Committee, and is currently a member of the Speech Communication Editorial Advisory Board. He is a Fellow of the IEEE.

2009

December 1, 2009 | 4:30PM

“Communication Disorders and Speech Technology”

Elmar Noeth, Friedrich-Alexander University Erlangen-Nuremberg

[abstract] [biography]

Abstract

In this talk we will give an overview of the different kinds of communication disorders. We will concentrate on communication disorders related to language and speech (i.e., not look at disorders like blindness or deafness). Speech and language disorders can range from simple sound substitution to the inability to understand or use language. Thus, a disorder may affect one or several linguistic levels: A patient with an articulation disorder cannot correctly produce speech sounds (phonemes) because of voice disorders or imprecise placement, timing, pressure, speed, or flow of movement of the lips, tongue, or throat. His speech may be acoustically unintelligible, yet the syntactic, semantic, and pragmatic levels are not affected. With other pathologies, e.g. Wernicke’s aphasia, the acoustics of the speech signal might be intelligible, yet the patient is – due to mix-ups of words (semantic paraphasia) or sounds (phonematic paraphasia) – unintelligible. We will look at what linguistic knowledge has to be modeled in order to analyze different pathologies with speech technology, how difficult the task is, and how speech technology is able to support the speech therapist in the tasks of diagnosis, therapy control, comparison of therapies, and screening. Joint work with Andreas Maier, Tino Haderlein, Stefan Steidl, and Maria Schuster.

Speaker Biography

Elmar Nöth studied in Erlangen and at MIT and obtained his 'Diplom' in Computer Science and his doctoral degree at the Friedrich-Alexander-Universität Erlangen-Nürnberg in 1985 and 1990, respectively. From 1985 to 1990 he was a member of the research staff of the Institute for Pattern Recognition (Lehrstuhl für Mustererkennung), working on the use of prosodic information in automatic speech understanding. Since 1990 he has been an assistant professor, and since 2008 a full professor, at the same institute, where he heads the speech group. He is one of the founders of the Sympalog company, which markets conversational dialog systems. He is on the editorial boards of Speech Communication and the EURASIP Journal on Audio, Speech, and Music Processing and a member of the IEEE SLTC. His current interests are prosody, analysis of pathologic speech, computer-aided language learning and emotion analysis.

November 24, 2009 | 4:30PM

“Hierarchical Phrase-based Translation with Weighted Finite State Transducers”

Bill Byrne, University of Cambridge

[abstract] [biography]

Abstract

HiFST is a lattice-based decoder for hierarchical phrase-based translation and alignment. The decoder is implemented with standard Weighted Finite-State Transducer (WFST) operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search errors, better parameter optimization, and improved translation performance. The direct generation of translation lattices in the target language can improve subsequent rescoring procedures, yielding further gains when applying long-span language models and Minimum Bayes Risk decoding.

Speaker Biography

Bill Byrne is a Reader in Information Engineering at the University of Cambridge.

November 17, 2009 | 4:30PM

“Graph Identification”

Lise Getoor, University of Maryland

[abstract] [biography]

Abstract

Within the machine learning and data mining communities, there has been a growing interest in learning structured models from input data that is itself structured or semi-structured. Graph identification refers to methods that transform observational data described as a noisy input graph into an inferred, "clean" information graph. Examples include inferring social networks from online, noisy communication data, identifying gene regulatory networks from protein-protein interactions, and extracting semantic graphs from noisy and ambiguous co-occurrence information. Some of the key processes in graph identification are entity resolution, collective classification, and link prediction. I will overview algorithms for these tasks and discuss the need to integrate the methods to solve the overall problem jointly. Time permitting, I will also give quick overviews of some of the other research projects in my group.

Speaker Biography

Lise Getoor is an associate professor in the Computer Science Department at the University of Maryland, College Park. She received her PhD from Stanford University in 2001. Her current work includes research on link mining, statistical relational learning and representing uncertainty in structured and semi-structured data. She has also done work on social network analysis and visual analytics. She has published numerous articles in machine learning, data mining, database, and artificial intelligence forums. She was awarded an NSF Career Award, is an action editor for the Machine Learning Journal, is a JAIR associate editor, has been a member of the AAAI Executive Council, and has served on a variety of program committees including AAAI, ICML, IJCAI, ISWC, KDD, SIGMOD, UAI, VLDB, and WWW.

November 10, 2009 | 4:30PM

“We KnowItAll: lessons from a Quarter Century of Web Extraction Research”

Oren Etzioni, University of Washington

[abstract] [biography]

Abstract

For the last quarter century (measured in person years), the KnowItAll project has investigated information extraction at Web scale. If successful, this effort will begin to address the long-standing "Knowledge Acquisition Bottleneck" in Artificial Intelligence, and will enable a new generation of search engines that extract and synthesize information from text to answer complex user queries. To date, we have generalized information extraction methods to process arbitrary Web text, to handle unanticipated concepts, and to leverage the redundancy inherent in the Web corpus, but many challenges remain. One of the most formidable challenges is moving from extracting isolated nuggets of information to capturing a coherent body of knowledge that can support automatic inference. My talk will describe the lessons we have learned and identify directions for future work.

Speaker Biography

Oren Etzioni is the Washington Research Foundation Entrepreneurship Professor at the University of Washington's Computer Science Department. He received his bachelor's degree in Computer Science from Harvard University in June 1986, where he was the first Harvard student to "major" in Computer Science. Etzioni received his Ph.D. from Carnegie Mellon University in January 1991, and joined the University of Washington's faculty in February 1991, where he is now a Professor of Computer Science. Etzioni received a National Young Investigator Award in 1993, and was selected as a AAAI Fellow a decade later. He is the founder and director of the University of Washington's Turing Center. Etzioni is also a Venture Partner at Madrona Venture Group where he chairs the Technology Advisory Board. He was the founder of Farecast, a company that utilizes data mining techniques to anticipate airfare fluctuations. Microsoft acquired Farecast in 2008. He was a co-founder of Clearforest, a text-mining startup, which was acquired by Reuters in 2007. He was the Chief Technology Officer and a board member of Go2net, which was acquired by Infospace in 2000. He also co-founded Netbot, acquired by Excite in 1997. At Netbot, he helped to conceive of and design the web's first major comparison-shopping agent. In 1995, Etzioni and his student Erik Selberg developed MetaCrawler, the web’s premier meta-search engine for several years, now being run by Infospace. Finally, he has served on the board of Performant (acquired by Mercury Interactive in 2003) and has been a consultant or advisor to Askjeeves, Excite, Infospace, Google, Microsoft, Northern Telecom, SAIC, Vivisimo, Zillow, and others.

November 3, 2009 | 4:30PM

“Vector-based Models of Semantic Composition”

Mirella Lapata, University of Edinburgh

[abstract] [biography]

Abstract

Vector-based models of word meaning have become increasingly popular in natural language processing and cognitive science. The appeal of these models lies in their ability to represent meaning simply by using distributional information under the assumption that words occurring within similar contexts are semantically similar. Despite their widespread use, vector-based models are typically directed at representing words in isolation and methods for constructing representations for phrases or sentences have received little attention in the literature. In this talk we propose a framework for representing the meaning of word combinations in vector space. Central to our approach is vector composition which we operationalize in terms of additive and multiplicative functions. Under this framework, we introduce a wide range of composition models which we evaluate empirically on a phrase similarity task. We also propose a novel statistical language model that is based on vector composition and can capture long-range semantic dependencies. Joint work with Jeff Mitchell.
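
As a small illustration of the additive and multiplicative composition functions discussed above (the vectors and context dimensions are invented for the example), addition behaves like a union of the two words' contexts while elementwise multiplication behaves like an intersection:

    import numpy as np

    # Toy distributional vectors; dimensions correspond to invented context words
    # (run, walk, bark, meow, loud).
    vec = {
        "dog":   np.array([4., 3., 6., 0., 2.]),
        "cat":   np.array([3., 3., 0., 6., 1.]),
        "black": np.array([1., 1., 1., 1., 2.]),
    }

    def additive(u, v):
        return u + v              # union-like: keeps contexts either word licenses

    def multiplicative(u, v):
        return u * v              # intersection-like: keeps contexts shared by both words

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Compose "black dog" two ways and compare each composed vector with "cat".
    bd_add = additive(vec["black"], vec["dog"])
    bd_mul = multiplicative(vec["black"], vec["dog"])
    print(cosine(bd_add, vec["cat"]), cosine(bd_mul, vec["cat"]))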

Speaker Biography

Mirella Lapata is a reader (US equivalent to associate professor) in the School of Informatics at the University of Edinburgh. Her research interests are in natural language processing, focusing on semantic interpretation and generation. She obtained a PhD degree in Informatics from the University of Edinburgh in 2001 and spent two years as a faculty member in the Department of Computer Science at the University of Sheffield. She received a B.A. degree in computer science from the University of Athens in 1994 and an M.Sc. degree from Carnegie Mellon University in 1998.

October 27, 2009 | 4:30PM

“A new Golden Age of phonetics?”   Video Available

Mark Liberman, University of Pennsylvania

[abstract] [biography]

Abstract

From the perspective of a linguist, today's vast archives of digital text and speech, along with new analysis techniques from language engineering, look like a wonderful new scientific instrument, a modern equivalent of the 17th-century invention of the telescope and microscope. We can now observe linguistic patterns in space, time, and cultural context, on a scale three to five orders of magnitude greater than in the past, and simultaneously in much greater detail than before. Scientific use of these new instruments remains mainly potential, especially in phonetics and related disciplines, but the next decade is likely to be a new "golden age" of research. This talk will discuss some of the barriers to be overcome, present some successful examples, and speculate about future directions.

Speaker Biography

Biographical information for Mark Liberman is available from http://ling.upenn.edu/~myl

October 20, 2009 | 4:30PM

“Predicting Language Change”   Video Available

Charles Yang, University of Pennsylvania

[abstract] [biography]

Abstract

The parallels between language change and biological changes were noted by none other than Darwin himself. However, the development of a mathematical foundation for evolution has not taken place in the study of language change, even though tools from quantitative genetics have seen applications in the linguistic arena. This work attempts to develop a series of models of language change, drawing insights from population genetics on the one hand, and modern theories of linguistic structures, language acquisition and language processing on the other. The dynamics of language learning over generations turn out to bear strong resemblance to the process of Natural Selection. In some cases, this allows one to quantitatively measure the "fitness" of grammatical hypotheses and thus predict the directionality of language change. I will discuss the general use of population models in language, and present two specific case studies: the word order change from Old French to Modern French, and the cot-caught merger recently documented at the Massachusetts and Rhode Island border. The outcome of both changes is shown to be entirely predictable from the statistical composition of linguistic data in the environment.
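
A greatly simplified, hypothetical sketch of the kind of dynamics at issue (not Yang's actual equations; the fitness values are invented): if each grammar's "fitness" is the share of ambient input it can analyze, a replicator-style update drives the population toward the fitter grammar.

    # Illustrative replicator-style dynamics for two competing grammars G1 and G2.
    # p  = fraction of the population (or of the input) generated by G1.
    # a1, a2 = "fitness" of each grammar, e.g. the share of ambient sentences each
    # grammar can analyze. These numbers are invented, not measured values.

    def next_generation(p, a1, a2):
        return p * a1 / (p * a1 + (1 - p) * a2)

    p, a1, a2 = 0.1, 0.7, 0.6   # G1 starts rare but has higher fitness
    for gen in range(40):
        p = next_generation(p, a1, a2)
    print(round(p, 3))           # G1 approaches fixation: the direction of change follows the fitnesses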

Speaker Biography

Charles Yang teaches linguistics and computer science at the University of Pennsylvania, where he works on language learning, language change, and computational linguistics. He is the author of three books, and is currently finishing a monograph on the computational properties of words.

October 13, 2009 | 4:30PM

“Using speech models for separation in monaural and binaural contexts”

Dan Ellis, Columbia University

[abstract] [biography]

Abstract

When the number of sources exceeds the number of microphones, acoustic source separation is an underconstrained problem that must rely on additional constraints for solution. In a single-channel environment the expected behavior of the source -- i.e. an acoustic model -- is the only feasible basis for separation. I will describe our recent work in monaural speech separation based on fitting parametric "eigenvoice" speaker-adapted models to both voices in a mixture. In a binaural, reverberant environment, the interaural characteristics of an acoustic source exhibit structure that can be used to separate, even without prior knowledge of location or room characteristics. I will present MESSL, our EM-based system for source separation and localization. MESSL's probabilistic foundation facilitates the incorporation of more specific source models; I will also describe MESSL-EV, which incorporates the eigenvoice speech models for improved binaural separation in reverberant environments. Joint work with Ron Weiss and Mike Mandel.

Speaker Biography

Daniel P. W. Ellis received the Ph.D. degree in electrical engineering from the Massachusetts Institute of Technology, Cambridge, where he was a Research Assistant in the Machine Listening Group of the Media Lab. He spent several years as a Research Scientist at the International Computer Science Institute, Berkeley, CA. Currently, he is an Associate Professor with the Electrical Engineering Department, Columbia University, New York. His Laboratory for Recognition and Organization of Speech and Audio (LabROSA) is concerned with all aspects of extracting high-level information from audio, including speech recognition, music description, and environmental sound processing. He also runs the AUDITORY email list of 1700 worldwide researchers in perception and cognition of sound.

October 6, 2009 | 4:30PM

“Generic knowledge: acquisition and representation”   Video Available

Lenhart Schubert, University of Rochester

[abstract] [biography]

Abstract

AI is beginning to make some dents in the "knowledge acquisition bottleneck", the problem of acquiring large amounts of general world knowledge to support language understanding and commonsense reasoning. Two text-based approaches to the problem are (1) to abstract such knowledge from patterns of predication and modification in miscellaneous texts, and (2) to derive such knowledge by direct interpretation of general statements in ordinary language, such as are found in lexicons and resources like Open Mind. I will discuss the status of our efforts in these directions (currently centered around the KNEXT system), and the problems that are encountered. Among these problems are what exactly is meant by generalities such as "Cats land on their feet", and how this meaning should be formalized. One particular difficulty is that such statements typically involve "donkey anaphora". I will suggest a "dynamic Skolemization" approach that leads naturally to script- or frame-like representations, of the sort that have been developed in AI independently of linguistic considerations.

Speaker Biography

Lenhart Schubert is a professor of computer science at the University of Rochester, with primary interests in natural language understanding, knowledge representation and acquisition, reasoning, and self-awareness. He is a fellow of the AAAI, has served as program chair for several AI/KR/CL conferences, and has published over a hundred articles, including ones in philosophical and linguistic handbooks and encyclopedias.

September 29, 2009 | 4:30PM

“Repetition and Language Models and Comparable Corpora”

Ken Church, Johns Hopkins University

[abstract] [biography]

Abstract

I will discuss a couple of non-standard features that I believe could be useful for working with comparable corpora. Dotplots have been used in biology to find interesting DNA sequences. Biology is interested in ordered matches, which show up as (possibly broken) diagonals in dotplots. Information Retrieval is more interested in unordered matches (e.g., cosine similarity), which show up as squares in dotplots. Parallel corpora have both squares and diagonals multiplexed together. The diagonals tell us what is a translation of what, and the squares tell us what is in the same language. There is also an opportunity to take advantage of repetition in comparable corpora. Repetition is very common. Standard bag-of-words models in Information Retrieval do not attempt to model discourse structure such as given/new. The first mention in a news article (e.g., "Manuel Noriega, former President of Panama") is different from subsequent mentions (e.g., "Noriega"). Adaptive language models were introduced in Speech Recognition to capture the fact that probabilities change or adapt. After we see the first mention, we should expect a subsequent mention. If the first mention has probability p, then under standard (bag-of-words) independence assumptions, two mentions ought to have probability p^2, but we find the probability is actually closer to p/2. Adaptation matters more for meaningful units of text. In Japanese, words (meaningful sequences of characters) are more likely to be repeated than fragments (meaningless sequences of characters from words that happen to be adjacent). In newswire, we find more adaptation for content words (proper nouns, technical terminology, out of vocabulary (OOV) words and good keywords for information retrieval), and less adaptation for function words, clichés and ordinary first names. There is more to meaning than frequency. Content words are not only low frequency, but also likely to be repeated.
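
A toy numeric illustration of the adaptation effect described above, contrasting the independence prediction p^2 with a simple cache-style adapted estimate (all numbers invented):

    # Under independence, seeing a rare word twice in one document has probability p * p.
    # Adaptive (cache) models instead boost the probability of a word once it has been seen.
    p = 1e-4                      # prior probability of a rare content word, e.g. "Noriega"
    p_two_independent = p * p     # 1e-08: what a bag-of-words model predicts
    p_repeat_given_seen = 0.5     # the abstract's observation: roughly p/2 overall, i.e. ~0.5 once seen
    p_two_adaptive = p * p_repeat_given_seen

    print(p_two_independent)      # 1e-08
    print(p_two_adaptive)         # 5e-05 (p/2), orders of magnitude more likely

    # A minimal cache-mixture estimate (weights invented for illustration):
    def adapted_prob(word, prior, history, cache_weight=0.3):
        cache = history.count(word) / max(1, len(history))
        return (1 - cache_weight) * prior.get(word, 1e-6) + cache_weight * cache

    prior = {"noriega": 1e-4, "the": 0.05}
    history = "manuel noriega former president of panama said".split()
    print(adapted_prob("noriega", prior, history))   # boosted after the first mention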

Speaker Biography

MIT undergrad (1978) and grad (1983), followed by 20 years at AT&T Bell Labs (1983-2003) and 6 years at Microsoft Research (2003-2009). Currently, at Hopkins as Chief Scientist of the Human Language Technology Center of Excellence as well as Research Professor in Computer Science. Honors: AT&T Fellow. I have worked on many topics in computational linguistics including: web search, language modeling, text analysis, spelling correction, word-sense disambiguation, terminology, translation, lexicography, compression, speech (recognition and synthesis), OCR, as well as applications that go well beyond computational linguistics such as revenue assurance and virtual integration (using screen scraping and web crawling to integrate systems that traditionally don't talk together as well as they could such as billing and customer care). When we were reviving empirical methods in the 1990s, we thought the AP News was big (1 million words per week), but since then I have had the opportunity to work with much larger data sets such as telephone call detail (1-10 billion records per month) and web logs (even bigger).

September 22, 2009 | 4:30PM

“Embracing Language Diversity: Unsupervised Multilingual Learning”   Video Available

Regina Barzilay, MIT

[abstract] [biography]

Abstract

For centuries, the deep connection between human languages has fascinated scholars, and driven many important discoveries in linguistics and anthropology. In this talk, I will show that this connection can empower unsupervised methods for language analysis. The key insight is that joint learning from several languages reduces uncertainty about the linguistic structure of each individual language. I will present multilingual generative unsupervised models for morphological segmentation, part-of-speech tagging, and parsing. In all of these instances we model the multilingual data as arising through a combination of language-independent and language-specific probabilistic processes. This feature allows the model to identify and learn from recurring cross-lingual patterns to improve prediction accuracy in each language. I will also discuss ongoing work on unsupervised decoding of ancient Ugaritic tablets using data from related Semitic languages. This is joint work with Benjamin Snyder, Tahira Naseem and Jacob Eisenstein.

Speaker Biography

Regina Barzilay is an associate professor in the Department of Electrical Engineering and Computer Science at MIT and a member of the Computer Science and Artificial Intelligence Laboratory. Her research interests are in natural language processing. She is a recipient of the NSF Career Award and a Microsoft Faculty Fellowship, and has been named one of the "Top 35 Innovators Under 35" by Technology Review Magazine. She received her Ph.D. in Computer Science from Columbia University in 2003 and spent a year as a postdoc at Cornell University. Regina received her M.S. in 1998 and B.A. in 1992, both from Ben-Gurion University, Israel.

September 15, 2009 | 4:30PM

“EM Works for Pronoun-Anaphora Resolution”   Video Available

Eugene Charniak, Brown University, Department of Computer Science

[abstract] [biography]

Abstract

EM (the Expectation Maximization Algorithm) is a well known technique for unsupervised learning (where one does not have any hand labeled solutions available, but instead one must learn from the raw text). Unfortunately EM is known to fail to find good solutions in many (most?) applications on which it is tried. In this talk we present some recent work on using EM to learn how to resolve pronoun-anaphora, e.g., determining that "the dog" is the antecedent of "he" and "his" in "When Sally fed the dog he wagged his tail". For this application EM works strikingly well, determining tens of thousands of parameters and resulting in a program that produces state of the art performance.
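
The following is a tiny, hypothetical sketch of EM with a latent antecedent choice, far simpler than the actual system: the model learns which pronouns go with which antecedent genders purely from soft counts over unlabeled examples.

    from collections import defaultdict

    # Hypothetical mini-EM for pronoun resolution: the antecedent is a latent choice
    # among candidate mentions; we learn P(pronoun | antecedent gender) from raw text.
    # Data format: (pronoun, [genders of the candidate antecedents]); examples are invented.
    data = [
        ("he",  ["masc", "fem"]),
        ("he",  ["masc", "neut"]),
        ("she", ["fem", "masc"]),
        ("it",  ["neut", "masc"]),
        ("she", ["fem", "neut"]),
    ]

    genders = ["masc", "fem", "neut"]
    pronouns = ["he", "she", "it"]
    prob = {g: {p: 1.0 / len(pronouns) for p in pronouns} for g in genders}  # uniform start

    for _ in range(20):
        counts = defaultdict(lambda: defaultdict(float))
        for pron, candidates in data:
            # E-step: posterior over which candidate is the antecedent.
            weights = [prob[g][pron] for g in candidates]
            z = sum(weights)
            for g, w in zip(candidates, weights):
                counts[g][pron] += w / z
        # M-step: renormalize soft counts into new conditional probabilities.
        for g in genders:
            total = sum(counts[g].values()) or 1.0
            prob[g] = {p: counts[g][p] / total for p in pronouns}

    print({g: {p: round(v, 2) for p, v in prob[g].items()} for g in genders})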

Speaker Biography

Eugene Charniak is University Professor of Computer Science at Brown University and past chair of the department. He received his A.B. degree in Physics from the University of Chicago, and a Ph.D. from M.I.T. in Computer Science. He has published four books, the most recent being Statistical Language Learning. He is a Fellow of the American Association of Artificial Intelligence and was previously a Councilor of the organization. His research has always been in the area of language understanding or technologies which relate to it. Over the last 15 years he has been interested in statistical techniques for many areas of language processing including parsing, discourse and anaphora.

September 8, 2009 | 4:30PM

“Improving Machine Translation by Propagating Uncertainty”   Video Available

Chris Dyer, University of Maryland

[abstract] [biography]

Abstract

NLP systems typically consist of a series of components where the output of one module (e.g., a word segmenter) serves as input to another (e.g., a translator). Integration between the components is often achieved using only the 1-best analysis from an upstream component as the input to a downstream component. Unfortunately, this naive integration strategy results in compounding error propagation (cf. Finkel et al. 2006, Dyer et al. 2008). In this talk, I briefly review the effects of this problem in machine translation, where examples of upstream uncertainty include not only the noisy outputs of statistical preprocessors (such as word segmenters and STT systems), but also "development-time" decisions (such as determining what the appropriate granularity of the lexical units is or how much text normalization to do). I show that by encoding input alternatives in a word lattice, translation quality can be improved over a 1-best baseline, with only a slight runtime performance cost. I then explore in more detail the implications of modeling development-time uncertainty jointly with translation, focusing on the problem of source language word segmentation. I tackle this problem in two ways. First, I present a Markov random field model of word segmentation and describe how to use it to generate lattices appropriate for translation by training it to maximize the (conditional) probability of a collection of segmentation alternatives, rather than maximizing the probability of a single correct analysis. Second, I describe generalized alignment models that align lattices in one language to strings in another, enabling the joint modeling of segmentation (or other noisy processes) and translation. Since lattice inputs break the Markov assumptions that enable the efficient inference made in many common word alignment models, I also present novel Monte Carlo techniques for performing word and lattice alignment.
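
As a minimal illustration of the lattice encoding mentioned above (an invented example, not the systems described in the talk), a small directed acyclic graph can represent several segmentation alternatives of a source string at once, so a downstream component need not commit to a 1-best analysis:

    # A tiny word lattice encoding segmentation alternatives (illustrative only):
    # nodes are positions in the source string, arcs are candidate tokens.
    # For the (invented) string "abc", we allow "abc", "ab"+"c", and "a"+"bc".
    lattice = {
        0: [("abc", 3), ("ab", 2), ("a", 1)],
        1: [("bc", 3)],
        2: [("c", 3)],
        3: [],
    }

    def paths(node=0, prefix=()):
        # Enumerate all segmentations encoded by the lattice; a lattice-aware decoder
        # would instead translate the lattice directly, keeping all alternatives.
        if not lattice[node]:
            yield prefix
        for token, nxt in lattice[node]:
            yield from paths(nxt, prefix + (token,))

    for seg in paths():
        print(seg)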

Speaker Biography

Chris Dyer is a Ph.D. candidate at the University of Maryland, College Park, in the Department of Linguistics under the supervision of Philip Resnik. His research interests include statistical machine translation, computational morphology and phonology, unsupervised learning, and scaling NLP models to deal with larger data sets using the MapReduce programming paradigm. He is graduating this spring and will be joining Noah Smith's lab as a postdoc.

September 6, 2009 | 10:30AM

“Geometric and Event-Based Approaches to Speech Representation and Recognition”

Aren Jansen, University of Illinois

[abstract] [biography]

Abstract

Anyone who has used an automatic speech recognition (ASR) system, either on a customer support line or on their own personal computer, knows firsthand there is vast room for improvement. While state-of-the-art commercial systems perform very well in near-ideal environments, system robustness remains far below human levels. The prevailing hidden Markov model (HMM) based paradigm will undoubtedly see gains in future decades as increased computing capacity admits more complex acoustic models that encompass a range of acoustic environments. In the meantime, there is a wealth of scientific understanding of production and perceptual mechanisms that has yet to be fully exploited by engineers and technologists. In this talk, I will present the main results of a research program that takes scientific inspiration from linguistics, speech perception, and neuroscience as starting points for alternative directions in automatic speech recognition. First, I consider the implications speech production has for the geometric structure of speech sounds and the role this perspective can play in speech technology. Second, I consider the hypothesis that the linguistic content underlying human speech may be more efficiently and robustly coded in the pattern of timings of various acoustic events (landmarks) present in the speech signal. I will present a point process-based statistical framework for phonetic recognition and keyword spotting that matches the performance of equivalent frame-based systems. This approach suggests a new unsupervised adaptation strategy for improving recognizer robustness that outperforms maximum likelihood linear regression adaptation of a continuous density keyword-filler HMM system.
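
A hypothetical sketch of a point-process style keyword score in the spirit of the framework described above (rates, phones, and events are invented): landmark detections inside a candidate window are scored under keyword-specific Poisson rates against a uniform background.

    import math

    # Hypothetical point-process keyword score: phone-landmark times inside a candidate
    # window are scored under keyword-specific Poisson rates vs. background rates.
    # All rates and events here are invented for illustration.

    def poisson_logpmf(k, lam):
        lam = max(lam, 1e-6)
        return k * math.log(lam) - lam - math.lgamma(k + 1)

    def keyword_score(events, window, keyword_rates, background_rate, n_bins=4):
        start, end = window
        width = (end - start) / n_bins
        score = 0.0
        for phone, rates in keyword_rates.items():
            for b in range(n_bins):
                lo, hi = start + b * width, start + (b + 1) * width
                k = sum(1 for (p, t) in events if p == phone and lo <= t < hi)
                # Log-likelihood ratio: keyword-specific rate vs. uniform background.
                score += poisson_logpmf(k, rates[b] * width) - poisson_logpmf(k, background_rate * width)
        return score

    # Landmark detections as (phone, time) pairs; the keyword expects k, then ae, then t.
    events = [("k", 0.05), ("ae", 0.22), ("t", 0.38), ("s", 0.9)]
    rates_cat = {"k": [8, 1, 0.1, 0.1], "ae": [0.1, 8, 1, 0.1], "t": [0.1, 0.1, 1, 8]}
    print(keyword_score(events, (0.0, 0.4), rates_cat, background_rate=1.0))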

Speaker Biography

Aren Jansen has accepted a position as Senior Research Scientist at the Human Language Technology Center of Excellence at JHU and is a candidate for a Research Assistant Professor position in the ECE department at JHU. He received the B.A. degree in physics from Cornell University in 2001. He received the M.S. degree in physics as well as the M.S. and Ph.D. degrees in computer science from the University of Chicago in 2003, 2005, and 2008, respectively, and has undertaken postdoctoral work at the University of Chicago. His research centers on exploring the interface of knowledge-based and statistical approaches to speech representation and recognition.

July 24, 2009 | 10:30AM

“Computational Advertising”

Andrei Broder, Yahoo!

[abstract] [biography]

Abstract

Computational advertising is an emerging new scientific sub-discipline, at the intersection of large scale search and text analysis, information retrieval, statistical modeling, machine learning, classification, optimization, and microeconomics. The central challenge of computational advertising is to find the "best match" between a given user in a given context and a suitable advertisement. The context could be a user entering a query in a search engine ("sponsored search"), a user reading a web page ("content match" and "display ads"), a user watching a movie on a portable device, and so on. The information about the user can vary from scarily detailed to practically nil. The number of potential advertisements might be in the billions. Thus, depending on the definition of "best match", this challenge leads to a variety of massive optimization and search problems, with complicated constraints. This talk will give an introduction to this area focusing on the IR and NLP connections.

Speaker Biography

Andrei Broder is a Fellow and Vice President for Computational Advertising in Yahoo! Research. He also serves as Chief Scientist of Yahoo’s Advertising Technology Group. Previously he was an IBM Distinguished Engineer and the CTO of the Institute for Search and Text Analysis in IBM Research. From 1999 until 2002 he was Vice President for Research and Chief Scientist at the AltaVista Company. He graduated Summa cum Laude from the Technion, and obtained his M.Sc. and Ph.D. in Computer Science at Stanford University. His current research interests are centered on computational advertising, web search, context-driven information supply, and randomized algorithms. Broder is co-winner of the Best Paper award at WWW6 (for his work on duplicate elimination of web pages) and at WWW9 (for his work on mapping the web). He has authored more than ninety papers and was awarded twenty-eight patents. He is an ACM Fellow, an IEEE fellow, and past chair of the IEEE Technical Committee on Mathematical Foundations of Computing.

July 16, 2009 | 10:30AM

“Technosocial Predictive Analytics”

Antonio Sanfilippo, Pacific Northwest National Laboratory

[abstract] [biography]

Abstract

Events occur daily that challenge the security, health and sustainable growth of our nation, and often find our government agencies unprepared for the catastrophic outcomes. These events involve the interaction of complex processes such as climate change, energy reliability, terrorism, nuclear proliferation, natural and man-made disasters, and social/political and economic vulnerability. If we are to help our nation to meet the challenges that emerge from these events, we must develop novel methods for predictive analysis that support a concerted decision-making effort by analysts and policymakers to anticipate and counter strategic surprise. There is now increased awareness among subject-matter experts, analysts, and decision makers that a combined understanding of interacting physical and human factors is essential in addressing strategic surprise proactively. The Technosocial Predictive Analytics (TPA) framework provides an operational advancement of this insight through the development of new methods for anticipatory analysis and response that

· implement a multi-perspective approach to predictive modeling through the integration of human and physical models;

· facilitate the achievement of knowledge/evidence inputs to support the modeling task and promote inferential transparency;

· enable analysts and policymakers to stress-test the quality of their intelligence products and planned responses without waiting for history to prove them right or wrong.

Human Language Technologies (HLT) play an important role in the realization of this framework with specific reference to evidence extraction, but must be augmented to support TPA’s knowledge requirements properly. In presenting TPA, I will discuss an approach which provides such an extension of HLT through the integration of insights from specific domains of expertise and content analysis processes.

Speaker Biography

Dr. Antonio Sanfilippo is Chief Scientist in the Computational and Statistical Analytics Division at Pacific Northwest National Laboratory (PNNL). His research focus is on Computational Linguistics, Content Analysis, Knowledge Technologies and Predictive Analytics with reference to Cognitive, Social, Behavioral and Biomedical Sciences. Dr. Sanfilippo holds a Laurea degree in Foreign Modern Languages awarded cum laude from the University of Palermo in Italy, M.A. and M. Phil. degrees in Anthropological Linguistics from Columbia University, and a Ph.D. in Cognitive Science from the University of Edinburgh (UK). Dr. Sanfilippo is the recipient of the 2008 Laboratory Director’s Award for Exceptional Scientific Achievement at PNNL. For more about Antonio please visit: http://www.linkedin.com/in/antoniosanfilippo

July 8, 2009 | 10:30AM

“Sequence Kernels for Speaker and Speech Recognition”   Video Available

Mark Gales, University of Cambridge

[abstract] [biography]

Abstract

Conceptually, sequence kernels map variable length sequences into a fixed dimensional feature-space. In this feature space, for example, an inner-product can be computed. The ability to handle variable length sequences means that these kernels are suitable for speech signals, which are by nature time varying. In the speech processing area, sequence kernels have been successfully applied in speaker verification, where they are used in combination with support vector machines (SVMs) for classification. This talk will concentrate on a particular class of sequence kernels, generative kernels, and how they can be used for speaker and speech recognition. Generative kernels, and score-spaces, make use of generative models such as hidden Markov models (HMMs) and Gaussian mixture models (GMMs). By taking first and higher-order derivatives of the log likelihood with respect to the model parameters, fixed dimensional feature vectors can be extracted. An example of this form of kernel is the Fisher kernel, successfully applied to a range of biological sequences. The relationship of this form of kernel to schemes such as the GMM mean-supervector kernel, commonly used in speaker verification, will be discussed. In addition, I will look at how these kernels and associated feature-spaces can be used for speech recognition and how they can handle speaker and environment changes.
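
As an illustration of the derivative-based feature extraction described above (a sketch with toy GMM parameters, not the talk's exact setup), the gradient of a GMM's log-likelihood with respect to its means maps a variable-length sequence to a fixed-dimensional vector:

    import numpy as np

    # Fisher-score style feature map: a variable-length observation sequence is mapped
    # to the gradient of a GMM log-likelihood w.r.t. the component means.
    means = np.array([[0.0, 0.0], [3.0, 3.0]])      # 2 components, 2-dim features (toy values)
    var = 1.0
    log_w = np.log(np.array([0.5, 0.5]))

    def responsibilities(x):
        # log N(x; mu_k, var*I) up to a shared constant, then normalize.
        ll = log_w - 0.5 * np.sum((x - means) ** 2, axis=1) / var
        ll -= ll.max()
        g = np.exp(ll)
        return g / g.sum()

    def fisher_score(sequence):
        # d/d mu_k log p(X) = sum_t gamma_k(t) * (x_t - mu_k) / var, flattened to a vector.
        grad = np.zeros_like(means)
        for x in sequence:
            gamma = responsibilities(x)
            grad += gamma[:, None] * (x - means) / var
        return grad.ravel() / len(sequence)          # length-normalized fixed-dim feature

    seq_a = np.array([[0.1, -0.2], [0.3, 0.1], [2.8, 3.1]])
    seq_b = np.array([[3.2, 2.9], [2.7, 3.3]])
    print(fisher_score(seq_a), fisher_score(seq_b))  # fixed dimension regardless of sequence length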

Speaker Biography

Mark Gales studied for the B.A. in Electrical and Information Sciences at the University of Cambridge from 1985-88. Following graduation he worked as a consultant at Roke Manor Research Ltd. In 1991 he took up a position as a Research Associate in the Speech Vision and Robotics group in the Engineering Department at Cambridge University. In 1995 he completed his doctoral thesis: Model-Based Techniques for Robust Speech Recognition supervised by Professor Steve Young. From 1995-1997 he was a Research Fellow at Emmanuel College Cambridge. He was then a Research Staff Member in the Speech group at the IBM T.J.Watson Research Center until 1999 when he returned to Cambridge University Engineering Department as a University Lecturer. He is currently a Reader in Information Engineering and a Fellow of Emmanuel College. Mark Gales is a Senior Member of the IEEE and was a member of the Speech Technical Committee from 2001-2004. He is currently an associate editor for IEEE Signal Processing Letters. Mark Gales was awarded a 1997 IEEE Young Author Paper Award for his paper on Parallel Model Combination and a 2002 IEEE Paper Award for his paper on Semi-Tied Covariance Matrices.

July 1, 2009 | 10:30AM

“How to Make a Billion Dollars: A Guide to Large-Economic-Scale Innovation”

Eric Brill, Microsoft

[abstract] [biography]

Abstract

It’s easy to have a $50 idea. Innovation at a scale large enough to be material to a big company like Microsoft is a completely different story. I’ll discuss some of the interesting challenges and opportunities of innovating at the scale of a billion dollars, and how different types of innovation play a role in driving financial value. I’ll also share other things I’ve learned in my decade at Microsoft, including tips for young scientists on how to have a great lifelong career in a corporate setting, and how to make basic research much more valuable/impactful/profitable.

Speaker Biography

Eric Brill has spent the last 10 years working at Microsoft. He spent 9 years in Microsoft Research, running a research lab that focused primarily on machine learning and data mining techniques for search and online advertising. Last year, he moved to the AdCenter product group to run a multinational applied research lab called AdCenter Labs. Recently, he moved to the AdCenter Garage, a small group working on creating, prototyping, and deploying game-changing innovations. Prior to Microsoft, Eric spent 5 wonderful years as a faculty member at Johns Hopkins, in the Department of Computer Science and the Center for Language and Speech Processing.

June 24, 2009 | 10:30AM

“Looking Behind Verb Classes”

Beth Levin, Stanford

[abstract] [biography]

Abstract

Fillmore's study "The Grammar of Hitting and Breaking" demonstrated the importance of semantically coherent verb classes as descriptive devices for understanding the organization of the verb lexicon and for capturing patterns of shared verb behavior. Much subsequent work has confirmed and extended the findings of this study. For example, in my 1993 book "English Verb Classes and Alternations", verbs are essentially classified in two ways: according to their semantic content (e.g., verbs of manner of motion, verbs of sound emission) and according to their participation in particular argument alternations (e.g., dative alternation, causative alternation). The first approach yields a fairly fine-grained semantic classification, while the second yields a coarser-grained classification, which appears to have more grammatical relevance than the first. This and other work suggests that verb classes should not be taken as primitive, as they have sometimes been. This position is reinforced by the considerable number of verbs which show complex patterns of behavior that have been handled by positing multiple semantic class membership. Previous studies of verb classes, then, raise important questions: What are the most useful dimensions for classifying verbs? What is the appropriate grain size for the description of verb classes? What determines whether a given verb shows multiple class membership? In this talk I ask what is behind verb classes that makes them so appealing as a research tool, yet explains their limitations. I show that many phenomena falling under the label "verb class" can be understood in the context of three levels of linguistic description and the relations between them: (i) the meaning lexicalized by the verb itself (its "root"), (ii) the set of event schemas, and (iii) the morphosyntactic devices that languages make available for the realization of arguments (e.g., grammatical relations, case markers, serial verb constructions). Each provides a way of grouping verbs into classes that can be helpful for certain facets of both language-specific and crosslinguistic studies.

Speaker Biography

Beth Levin is the William H. Bonsall Professor in the Humanities at Stanford University. After receiving her Ph.D. from MIT, she had major responsibility for the MIT Lexicon Project and taught at Northwestern University. Her work investigates the semantic representation of events and the morphosyntactic devices English and other languages use to express events and their participants. Her publications include English Verb Classes and Alternations: A Preliminary Investigation (1993) and, with Malka Rappaport Hovav, Argument Realization (2005) and Unaccusativity: At the Syntax-Lexical Semantics Interface (1995).

April 28, 2009 | 4:30PM

“On Representing Acoustics of Speech for Speech Processing”   Video Available

Bishnu Atal, University of Washington

[abstract] [biography]

Abstract

Proper representation of the acoustic speech signal is crucial for almost every speech processing application. We often use the short-time Fourier transform to convert the time-domain speech waveform to a new signal that is a function of both time and frequency by applying a moving time window of about 20 ms in duration. There are many issues, such as the size and shape of the window, that remain unresolved. The use of a relatively short window is widespread. In the early development of the sound spectrograph, both narrowband and wideband analysis were used, but narrowband analysis faded away. In digital speech coding applications (multipulse and code-excited linear prediction), high-quality speech is produced at low bit rates only when prediction using both short and long intervals is used. Recently, Hermansky and others have argued that the speech window for automatic speech recognition should be long, perhaps extending to as much as 1 s. What are the issues that arise in using a short or a long window? What are the relative advantages or disadvantages? In this talk, we will discuss these topics and present results that suggest that a short-time Fourier transform using long windows has advantages. In most speech representations, the Fourier components are not used directly but converted to their magnitude spectrum; the so-called phase is considered to be irrelevant. There are open questions regarding the use of phase information, and we will discuss this important issue in the talk.
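
For reference, the short-time Fourier transform under discussion applies a sliding analysis window w to the waveform x, so the choice of w is precisely what sets the trade-off at issue:

  X(t, \omega) = \sum_{\tau} w(\tau)\, x(t + \tau)\, e^{-j\omega\tau}.

A 20 ms window gives fine time resolution but coarse frequency resolution; lengthening the window toward 1 s reverses that trade-off, which is the design question the talk takes up.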

Speaker Biography

Bishnu S. Atal is an Affiliate Professor in the Electrical Engineering Department at the University of Washington, Seattle, WA. He retired in March 2002 after working for more than 40 years at Lucent Bell Labs and AT&T Labs. He was a Technical Director at the AT&T Shannon Laboratory, Florham Park, New Jersey, from 1997, where he was engaged in research in speech coding and in automatic speech recognition. He joined the technical staff of AT&T Bell Laboratories in 1961, became head of the Acoustics Research Department in 1985, and became head of the Speech Research Department in 1990. He is internationally recognized for his many contributions to speech analysis, synthesis, and coding. His pioneering work in linear predictive coding of speech established linear prediction as one of the most important speech analysis techniques, leading to many applications in the coding, recognition, and synthesis of speech. His research work is documented in over 90 technical papers, and he holds 17 U.S. and numerous international patents in speech processing. He was elected to the National Academy of Engineering in 1987 and to the National Academy of Sciences in 1993. He is a Fellow of the Acoustical Society of America and the IEEE. He received the IEEE Morris N. Liebmann Memorial Field Award in 1986, the Thomas Edison Patent Award from the R&D Council of New Jersey in 1994, the New Jersey Inventors Hall of Fame Inventor of the Year Award in 2000, and the Benjamin Franklin Medal in Electrical Engineering in 2003. Bishnu and his wife, Kamla, reside in Mukilteo, Washington. They have two daughters, Alka and Namita, two granddaughters, Jyotica and Sonali, and two grandsons, Ananth and Niguel.

April 21, 2009 | 4:30PM

“Integrating Evidence Over Time: A Look at Conditional Models for Speech and Audio Processing”   Video Available

Eric Fosler-Lussier, Ohio State University

[abstract] [biography]

Abstract

Many acoustic events, particularly those associated with speech, can be described in a rich subspace whose dimensions form a sort of decomposition of the original event space. In phonetic terms, we can think of how phonological features can be integrated to determine phonetic identity; for auditory scene analysis, we can look at how features like harmonic energy and cross-channel correlation come together to determine whether a particular frequency corresponds to target speech versus background noise. Some success has been achieved by treating these problems as probabilistic detection of acoustic (sub-)events. However, event detectors are typically local in nature and need to be smoothed out by looking at neighboring events in time. In this talk, I describe current work in the Speech and Language Technologies Lab at OSU, where we are looking at Conditional Random Field models for both automatic speech recognition and computational auditory scene analysis problems. The talk will explore some of the successes and limitations of this log-linear method, which integrates local evidence over time sequences. This is joint work with Jeremy Morris, Ilana Heintz, Rohit Prabhavalkar, and Zhaozhang Jin.
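
For readers unfamiliar with the formalism, a linear-chain conditional random field is the standard log-linear model of this kind: local detector outputs enter as feature functions f_k, and the model integrates them over the whole time sequence,

  p(y \mid x) = \frac{1}{Z(x)} \exp \sum_{t=1}^{T} \sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, x, t),

where x is the acoustic observation sequence, y the label sequence (e.g., phones or phonological-feature states), and Z(x) the normalizer over all label sequences.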

Speaker Biography

Eric Fosler-Lussier is currently an Assistant Professor of Computer Science and Engineering, with an adjunct appointment in Linguistics, at the Ohio State University. He received his Ph.D. in 1999 from the University of California, Berkeley, performing his dissertation research at the International Computer Science Institute under the tutelage of Prof. Nelson Morgan. He has also been a Member of Technical Staff at Bell Labs, Lucent Technologies, and a Visiting Researcher at Columbia University. He is generally interested in integrating linguistic insights as priors in statistical learning systems.

April 14, 2009 | 4:30PM

“Inducing Synchronous Grammars for Machine Translation”   Video Available

Phil Blunsom, University of Edinburgh, UK

[abstract] [biography]

Abstract

In this talk I'll outline current work at the University of Edinburgh to model machine translation (MT) as a probabilistic machine learning problem. Although MT systems have made large gains in translation quality in recent years, most are currently induced using a hand-engineered pipeline of disparate models linked by heuristics. While such techniques are effective for translating between related languages, they fail to capture the latent structure necessary to translate between languages which diverge significantly in word order, such as Chinese and English. I'll present a non-parametric Bayesian model for inducing synchronous context-free grammars capable of learning the latent structure of translation equivalence from a corpus of parallel string pairs. I'll discuss the efficacy of both variational Bayes and Gibbs sampling inference procedures for this model and present experiments demonstrating competitive results on full-scale translation evaluations.
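
To make the formalism concrete (a textbook-style example, not one from the talk), a synchronous context-free grammar pairs source and target rewrites with co-indexed nonterminals, so a single rule can capture a systematic reordering between Chinese and English:

  X \rightarrow \langle\, X_1 \ \textit{de} \ X_2\,,\ \ \text{the } X_2 \text{ of } X_1 \,\rangle.

Inducing such grammars from parallel string pairs amounts to inferring which rule inventory and which derivations best explain the observed sentence pairs, which is what the non-parametric Bayesian model estimates.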

Speaker Biography

Phil Blunsom is a Research Fellow in the Institute for Communicating and Collaborative Systems at the University of Edinburgh. He completed his PhD at the University of Melbourne in 2007. His current research interests focus upon the application of machine learning to complex structured problems in language processing, such as machine translation, language modelling, parsing and grammar induction.

April 7, 2009 | 4:30PM

“The Neural Control of Speech”   Video Available

Frank Guenther, Boston University

[abstract] [biography]

Abstract

Speech production involves coordinated processing in many regions of the brain. To better understand these processes, our laboratory has designed, tested, and refined a neural network model whose components correspond to brain regions involved in speech. Babbling and imitation phases are used to train neural mappings between phonological, articulatory, auditory, and somatosensory representations. After learning, the model can produce syllables and words it has learned by commanding movements of an articulatory synthesizer. Because the model’s components correspond to neurons and are given precise anatomical locations, activity in the model’s cells can be compared to neuroimaging data. Computer simulations of the model account for a wide range of experimental findings, including data on acquisition of speaking skills, articulatory kinematics, and brain activity during speech. "Impaired" versions of the model are being used to investigate several communication disorders, and the model has been used to guide development of a neural prosthesis aimed at restoring speech output to profoundly paralyzed individuals.

Speaker Biography

Frank Guenther, Professor of Cognitive and Neural Systems at Boston University, is a computational and cognitive neuroscientist specializing in speech and motor control. He received an MS in Electrical Engineering from Princeton University in 1987 and PhD in Cognitive and Neural Systems from Boston University in 1993. He is also a faculty member in the Harvard University/MIT Speech and Hearing Bioscience and Technology Program and a research affiliate at Massachusetts General Hospital. His research combines theoretical modeling with behavioral and neuroimaging experiments to characterize the neural computations underlying speech and language. He is also involved in the development of speech prostheses that utilize brain-computer interfaces to restore synthetic speech to paralyzed individuals.

March 31, 2009 | 4:30PM

“Neural Dynamics of Attentive Object Recognition, Scene Understanding, and Decision Making”   Video Available

Stephen Grossberg, Boston University

[abstract]

Abstract

This talk describes three recent models of how the brain visually understands the world. The models use hierarchical and parallel processes within and across the What and Where cortical streams to accumulate information that cannot in principle be fully computed at a single processing stage. The models thereby raise basic questions about the functional brain units that are selected by the evolutionary process, and challenge all models that use non-local information to explain vision. The ARTSCAN model (Fazl, Grossberg, & Mingolla, 2008, Cognitive Psychology) clarifies the following issues: What is an object? How does the brain learn to bind multiple views of an object into a view-invariant object category, during both unsupervised and supervised learning, while scanning its various parts with active eye movements? In particular, how does the brain avoid the problem of erroneously classifying views of different objects as belonging to a single object, and how does the brain direct the eyes to explore an object's surface even before it has a concept of the object? How does the brain coordinate object and spatial attention during object learning and recognition? ARTSCAN proposes an answer to these questions by modeling interactions between cortical areas V1, V2, V3A, V4, ITp, ITa, PPC, LIP, and PFC. The ARTSCENE model (Grossberg & Huang, 2008, Journal of Vision) also uses attentional shrouds. It clarifies the following issues: How do humans rapidly recognize a scene? How can neural models capture this biological competence to achieve state-of-the-art scene classification? ARTSCENE classifies natural scene photographs better than competing models by using multiple spatial scales to efficiently accumulate evidence for gist and texture. The model can incrementally learn and rapidly predict scene identity by gist information alone (defining gist computationally along the way), and then accumulate learned evidence from scenic textures to refine this hypothesis. The MODE model (Grossberg & Pilly, 2008, Vision Research) clarifies the following basic issue: How does the brain make decisions? Speed and accuracy of perceptual decisions covary with certainty in the input, and correlate with the rate of evidence accumulation in parietal and frontal cortical "decision neurons." MODE models interactions within and between Retina/LGN and cortical areas V1, MT, MST, and LIP, gated by the basal ganglia, to simulate dynamic properties of decision-making in response to ambiguous visual motion stimuli used by Newsome, Shadlen, and colleagues in their neurophysiological experiments. The model shows how the brain can carry out probabilistic decisions without using Bayesian mechanisms.

March 24, 2009 | 4:30PM

“Values and Patterns”   Video Available

Alon Orlitsky, University of California, San Diego

[abstract] [biography]

Abstract

Via four applications (distribution modeling, probability estimation, data compression, and classification), we argue that when learning from data, discrete values should be ignored except for their appearance-order pattern. Along the way, we encounter Laplace, Good, Turing, Hardy, Ramanujan, Fisher, Shakespeare, and Shannon. The talk is self-contained and based on work with P. Santhanam, K. Viswanathan, J. Zhang, and others.
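
As a hedged illustration of the idea (mine, not the speaker's notation): the pattern of a sequence replaces each value by the order in which it first appears, so only the repetition structure of the data is retained. A minimal sketch in Python:

  def pattern(seq):
      """Replace each symbol by the order of its first appearance."""
      first_seen = {}
      out = []
      for symbol in seq:
          if symbol not in first_seen:
              first_seen[symbol] = len(first_seen) + 1
          out.append(first_seen[symbol])
      return out

  print(pattern("abracadabra"))  # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]

Two sequences over entirely different alphabets share a pattern whenever they repeat their symbols in the same way, which is exactly the information the talk argues should be kept.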

Speaker Biography

Alon Orlitsky received B.Sc. degrees in Mathematics and Electrical Engineering from Ben Gurion University in 1980 and 1981, and M.Sc. and Ph.D. degrees in Electrical Engineering from Stanford University in 1982 and 1986. From 1986 to 1996 he was with the Communications Analysis Research Department of Bell Laboratories. He spent the following year as a quantitative analyst at D.E. Shaw and Company, an investment firm in New York City. In 1997 he joined the University of California, San Diego, where he is currently a professor of Electrical and Computer Engineering and of Computer Science and Engineering, and directs the Information Theory and Applications Center. Alon's research concerns information theory, statistical modeling, machine learning, and speech recognition. He is a recipient of the 1981 ITT International Fellowship and the 1992 IEEE W.R.G. Baker Paper Award, a co-recipient of the 2006 Information Theory Society Paper Award, a fellow of the IEEE, and holds the Qualcomm Chair for Information Theory and its Applications at UCSD.

March 10, 2009 | 4:30PM

“EBW as a General, Consistent Framework for Parameter Estimation”   Video Available

Dimitri Kanevsky, IBM T.J.Watson Research Center

[abstract] [biography]

Abstract

Several optimization techniques are widely used today in the speech and language community for estimating model parameters. The Extended Baum-Welch (EBW) algorithm is one such technique, extensively used for estimating the parameters of Gaussian mixture models based on a discriminative criterion (such as Maximum Mutual Information). In this talk, we present EBW as a consistent theoretical framework for parameter estimation and show how other common parameter estimation techniques (for example, those based on Constrained Line Search) belong to this family of model update rules. We introduce a general family of parameter updates that generalizes the Baum-Welch recursive process to an arbitrary objective function of Gaussian Mixture Models or Poisson Processes. In the second part of this talk, we introduce an extension of EBW for estimating sparse signals from a sequence of noisy observations. As part of this, the underlying EBW algorithms are compared with recently introduced Kalman filtering-based compressed sensing methods. This is joint work with Avishy Carmi, David Nahamoo, Bhuvana Ramabhadran, and Tara Sainath.
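
For concreteness (the commonly cited form, not necessarily the exact notation of the talk), the EBW update for the mean of Gaussian component m of state j under an MMI-style criterion is

  \hat{\mu}_{jm} = \frac{\sum_t \gamma^{\text{num}}_{jm}(t)\, o_t \;-\; \sum_t \gamma^{\text{den}}_{jm}(t)\, o_t \;+\; D_{jm}\, \mu_{jm}}{\sum_t \gamma^{\text{num}}_{jm}(t) \;-\; \sum_t \gamma^{\text{den}}_{jm}(t) \;+\; D_{jm}},

where the numerator and denominator occupancies come from the reference and competing hypotheses and D_{jm} is a smoothing constant. The talk's claim is that updates of this general shape can be derived for arbitrary objective functions of GMMs or Poisson processes.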

Speaker Biography

Dimitri Kanevsky is a research staff member in the Speech and Language Algorithms department at the IBM T.J. Watson Research Center. Prior to joining IBM, he worked at a number of prestigious centers for higher mathematics, including the Max Planck Institute in Germany and the Institute for Advanced Study in Princeton. At IBM, he has been responsible for developing the first Russian automatic speech recognition system, as well as key projects for embedding speech recognition in automobiles and broadcast transcription systems. He currently holds 110 US patents and received a Master Inventor title at IBM. His conversational-biometrics-based security patent was recognized by MIT Technology Review as one of the five most influential patents of 2003, and his work on the Extended Baum-Welch algorithm in speech was recognized as a 2002 science accomplishment by the Director of Research at IBM.

March 3, 2009 | 4:30PM

“Domain Adaptation in Natural Language Processing”

Hal Daume, University of Utah

[abstract] [biography]

Abstract

Supervised learning technology has led to systems for part-of-speech tagging, parsing, and named entity recognition with accuracies in the high 90% range. Unfortunately, the performance of these systems degrades drastically when they are applied to text outside their training domain (typically newswire). Machine translation systems work fantastically for translating parliamentary proceedings, but fall down when applied to other domains. I'll discuss research that aims to understand what goes wrong when models are applied outside their domain, and some (partial) solutions to this problem. I'll focus on named entity recognition and machine translation tasks, where we'll see a range of different sources of error (some of which are quite counter-intuitive!).

Speaker Biography

Hal Daume is an assistant professor in the School of Computing at the University of Utah. His primary research interests are in Bayesian learning, structured prediction, and domain adaptation (with a focus on problems in language and biology). He earned his PhD at the University of Southern California with a thesis on structured prediction for language (his advisor was Daniel Marcu). He spent the summer of 2003 working with Eric Brill in the machine learning and applied statistics group at Microsoft Research. Prior to that, he studied math (mostly logic) at Carnegie Mellon University. He still likes math and doesn't like to use C (instead he uses O'Caml or Haskell). He doesn't like shoes, but does like activities that are hard on your feet: skiing, badminton, Aikido, and rock climbing.

February 24, 2009 | 4:30PM

“Music Fingerprinting Using Finite State Transducers, a Novel Application of FST's”   Video Available

Pedro Moreno, Google

[abstract] [biography]

Abstract

Over the last few years, finite-state transducer technology has found its way into many speech applications, from text processing for synthesizers to the core search algorithms used in speech recognition systems. In this talk we present a novel application of finite-state transducers and acoustic modeling techniques to the problem of music fingerprinting*. We will show how the power of FSTs can be applied to this problem with great results. In the talk I will also give an overview of current activities in the Google speech team. * This is joint work with Prof. Mehryar Mohri and Eugene Weinstein at NYU.

Speaker Biography

Pedro J. Moreno is a research scientist at Google Inc. working in the New York office. His research interests are speech and multimedia indexing and retrieval, speech and speaker recognition and applications of machine learning. He received a Ph.D. in electrical and computer engineering from Carnegie Mellon University.

February 17, 2009 | 4:30PM

“Coarse-To-Fine Models for Natural Language Processing”   Video Available

Dan Klein, University of California, Berkeley

[abstract] [biography]

Abstract

State-of-the-art NLP models are anything but compact. Parsers have huge grammars, machine translation systems have huge transfer tables, and so on across a range of tasks. With such complexity come two challenges. First, how can we learn highly complex models? Second, how can we efficiently infer optimal structures within them? Hierarchical coarse-to-fine (CTF) methods address both questions. CTF approaches exploit sequences of models which introduce complexity gradually. At the top of the sequence is a trivial model in which learning and inference are both cheap. Each subsequent model refines the previous one, until a final, full-complexity model is reached. Because each refinement introduces only limited complexity, both learning and inference can be done in an incremental fashion. In this talk, I describe several coarse-to-fine NLP systems. In the domain of syntactic parsing, complexity comes from the grammar. I present a latent-variable approach which begins with an X-bar grammar and learns by iteratively splitting grammar symbols. For example, noun phrases might be split into subjects and objects, singular and plural, and so on. This splitting process admits an efficient incremental inference scheme which reduces parsing times by orders of magnitude. I also present a multiscale variant which splits grammar rules rather than grammar symbols. In the multiscale approach, complexity need not be uniform across the entire grammar, providing orders of magnitude of space savings. These approaches produce the best parsing accuracies in a variety of languages, in a fully language-general fashion. In the domain of syntactic machine translation, complexity arises from both the translation model and the language model. In short, there are too many transfer rules and too many target language word types. To manage the translation model, we compute minimizations which drop rules that have high computational cost but low importance. To manage the language model, we translate into target language clusterings of increasing vocabulary size. These approaches give dramatic speed-ups, while actually increasing final translation quality.
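
The pruning step at the heart of such pipelines can be stated compactly (a generic formulation, not Klein's exact notation): a chart item e under the refined grammar G_{i+1} is considered only if its projection to the coarser grammar survived,

  P\big(\pi_i(e) \mid w,\ G_i\big) \;\ge\; \epsilon_i,

where \pi_i maps each refined symbol back to the coarse symbol it was split from, w is the input sentence, and \epsilon_i is a level-specific posterior threshold computed with inside-outside scores.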

Speaker Biography

Dan Klein is an assistant professor of computer science at the University of California, Berkeley (PhD Stanford, MSt Oxford, BA Cornell). His research focuses on statistical natural language processing, including unsupervised learning methods, syntactic parsing, information extraction, and machine translation. Academic honors include a Marshall Fellowship, a Microsoft New Faculty Fellowship, the ACM Grace Murray Hopper award, and best paper awards at the ACL, NAACL, and EMNLP conferences.

February 10, 2009 | 4:30PM

“From Text to Knowledge via Markov Logic”

Pedro Domingos, University of Washington

[abstract] [biography]

Abstract

Language understanding is hard because it requires a lot of knowledge. However, the only cost-effective way to acquire a lot of knowledge is by extracting it from text. The best (only?) hope for solving this "chicken and egg" problem is bootstrapping: start with a small knowledge base, use it to process some text, add the extracted knowledge to the KB, process more text, etc. Doing this requires a modeling language that can incorporate noisy knowledge and seamlessly combine it with statistical NLP algorithms. Markov logic accomplishes this by attaching weights to first-order formulas and viewing them as templates for features of Markov random fields. In this talk, I will describe some of the main inference and learning algorithms for Markov logic, and the progress we have made so far in applying them to NLP. For example, we have developed a system for unsupervised coreference resolution that outperforms state-of-the-art supervised ones on MUC and ACE benchmarks.
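
For readers new to the formalism, the defining equation of a Markov logic network is

  P(X = x) \;=\; \frac{1}{Z} \exp\Big( \sum_i w_i\, n_i(x) \Big),

where n_i(x) is the number of true groundings of first-order formula i in the world x, w_i is that formula's weight, and Z normalizes over all possible worlds; a hard constraint corresponds to an infinite weight, while noisy extracted knowledge simply receives a finite one.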

Speaker Biography

Pedro Domingos is Associate Professor of Computer Science and Engineering at the University of Washington. His research interests are in artificial intelligence, machine learning and data mining. He received a PhD in Information and Computer Science from the University of California at Irvine, and is the author or co-author of over 150 technical publications. He is a member of the advisory board of JAIR, a member of the editorial board of the Machine Learning journal, and a co-founder of the International Machine Learning Society. He was program co-chair of KDD-2003, and has served on numerous program committees. He has received several awards, including a Sloan Fellowship, an NSF CAREER Award, a Fulbright Scholarship, an IBM Faculty Award, and best paper awards at KDD-98, KDD-99 and PKDD-2005.

February 3, 2009 | 4:30PM

“When is a Translation not a Translation?”

Martin Kay, Stanford University

[abstract] [biography]

Abstract

A translation is generally taken to be a text that expresses the same meaning as another text in a different language. But the products of the best translators reflect a different, if more elusive, goal. I will seek a somewhat more adequate characterization of translation as it is actually practiced and discuss its consequences for machine translation.

Speaker Biography

Martin Kay is a professor of linguistics and computer science at Stanford University. For many years, he was also a research fellow at the Xerox Palo Alto Research Center. He made a number of fundamental contributions to computational linguistics, including chart parsing, unification grammar, and applications of finite-state technology, notably in phonology. He has been an intermittent worker on, and skeptical observer of, machine translation since 1958.

January 27, 2009 | 4:30PM

“Modeling Bottom-Up and Top-Down Visual Attention in Humans and Monkeys”   Video Available

Laurent Itti, Department of Computer Science and Neuroscience Graduate Program, University of Southern California

[abstract] [biography]

Abstract

Visual processing of complex natural environments requires animals to combine, in a highly dynamic and adaptive manner, sensory signals that originate from the environment (bottom-up) with behavioral goals and priorities dictated by the task at hand (top-down). Together, bottom-up and top-down influences combine to serve the many tasks that require that we direct attention to the most "relevant" entities in our visual environment. While much progress has been made in investigating experimentally how humans and other primates may operate such goal-based attentional selection, very little is understood about the general mathematical principles and neuro-computational architectures that subserve the observed behavior. I will describe recent computational work that attacks the problem of developing models of visual attentional selection that are more flexible and can be strongly modulated by the task at hand. I will back up the proposed architectures by comparing their predictions to behavioral recordings from humans and monkeys. I will show examples of applications of these models to real-world vision challenges, using complex stimuli from television programs or modern immersive video games.

Speaker Biography

Dr. Laurent Itti received his M.S. in Electrical Engineering with a specialization in Image Processing from the Ecole Nationale Superieure des Telecommunications, Paris, France, in 1994. He received his Ph.D. in Computation and Neural Systems from the California Institute of Technology, Pasadena, California, in 2000. He has since then been an Assistant (2000-2006) and Associate (2006-present) Professor of Computer Science and a voting faculty member of the cross-disciplinary Neuroscience Graduate Program at the University of Southern California (USC), Los Angeles, California. Dr. Itti has authored over 90 peer-reviewed publications in journals, books, and top-ranked conferences. Dr. Itti teaches Artificial Intelligence, Brain Theory and Neural Networks, Introduction to Robotics, Visual Processing, Neuroscience Core Course, Neural Basis for Visually Guided Behavior, and Computational Architectures in Biological Vision. Dr. Itti's laboratory comprises 15 students, postdocs, and engineers, and is the recipient of grants from the National Science Foundation, DARPA, the National Geospatial-Intelligence Agency, the Human Frontier Science Program (HFSP), the Office of Naval Research, the Army Research Office, and the National Institutes of Health. Dr. Itti has been distinguished through a number of awards, including the 2008 Okawa Foundation Research Award, being one of the 16 nationally selected speakers at the 2007 National Academy of Engineering's Frontiers of Engineering Symposium, and serving on Program Committees for several IEEE conferences.

Back to Top

2008

December 2, 2008 | 4:30PM

“Temporal primitives in auditory cognition and speech perception”   Video Available

David Poeppel, University of Maryland

[abstract] [biography]

Abstract

Generating usable internal representations of speech input requires, among other operations, fractionating the signals into temporal units/chunks of the appropriate granularity. Adopting (and adapting) Marr's (1982) approach to vision, a perspective is outlined that formulates linking hypotheses between specific neurobiological mechanisms (for example cortical oscillations and phase-locking) and the representations that underlie auditory cognition (for example syllables). Focusing on the implementational and algorithmic levels of description, I argue that the perception of sound patterns requires a multi-time resolution analysis. In particular, recent experimental data from psychophysics, MEG (Luo & Poeppel 2007), and concurrent EEG/fMRI (Giraud et al. 2007) suggest that there exist two privileged time scales that form the basis for constructing elementary auditory percepts. These 'temporal primitives' permit the construction of the internal representations that mediate the analysis of speech and other acoustic signals.

Speaker Biography

David Poeppel is Professor in the Department of Biology, the Department of Linguistics, and the Neuroscience and Cognitive Science Program at the University of Maryland College Park and the Department of Psychology at New York University. Trained in neurophysiology, cognitive science, and cognitive neuroscience at MIT and UCSF, his lab is focused on the cognitive neuroscience of hearing, speech, and language. Although the lab uses all kinds of techniques, one principal methodology is magnetoencephalography (MEG).

November 25, 2008 | 4:30PM

“Are Linear Models Right for Language?”   Video Available

Fernando Pereira, Google

[abstract]

Abstract

Over the last decade, linear models have become the standard machine learning approach for supervised classification, ranking, and structured prediction in natural language processing. They can handle very high-dimensional problem representations, they are easy to set up and use, and they extend naturally to complex structured problems. But there is something unsatisfying in this work. The geometric intuitions behind linear models were developed with low-dimensional, continuous problems, while natural language problems involve very high-dimensional, discrete representations with long-tailed distributions. Do the original intuitions carry over? In particular, do standard regularization methods make any sense for language problems? I will give recent experimental evidence that there is much to do in making linear model learning more suited to the statistics of language.
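
The regularized objective in question is the usual one (stated here for reference),

  \min_{w} \; \sum_{i=1}^{n} \mathrm{loss}\big(w^{\top}\phi(x_i),\, y_i\big) \;+\; \lambda\, \|w\|_2^2,

and the question the talk raises is whether penalties of this form, whose geometric justification comes from dense, low-dimensional problems, are the right prior for sparse, long-tailed lexical feature spaces.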

November 18, 2008 | 4:30PM

“Acoustic Scene Analysis, Complex Modulations, and a New Form of Filtering”   Video Available

Les Atlas, University of Washington

[abstract] [biography]

Abstract

Be it in a restaurant or other reverberant and noisy environment, normal hearing listeners segregate multiple sources, usually strongly overlapping in frequency, well beyond capabilities expected by current beamforming approaches. What is it that we can learn from this common observation? As is now commonly accepted, the differing dynamical modulation patterns of the sources are key to these powers of separation. But until recently, the theoretical underpinnings for the notion of dynamical modulation patterns have been lacking. We have taken a previously loosely defined concept, called "modulation frequency analysis," and developed a theory which allows for distortion-free separation (filtering) of multiple sound sources with differing dynamics. A key result is that previous assumptions of non-negative and real modulation are not sufficient and, instead, coherent separation approaches are needed to separate different modulation patterns. These results may have an impact in separation and representation of multiple simultaneous sound streams for speech, audio, hearing loss treatment, and underwater acoustic applications. This research also suggests exciting new and potentially important open theoretical questions for general nonstationary signal representations, extending beyond acoustic applications and potentially impacting other areas of engineering and physics.
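
Very roughly, and as my gloss rather than the speaker's formulation: if each subband of the signal is modeled as a product of a slowly varying modulator and a carrier,

  x_k(t) = m_k(t)\, c_k(t),

then incoherent approaches filter the non-negative, real envelope |x_k(t)|, whereas the coherent approach estimates the carrier c_k(t) and filters the complex modulator m_k(t) directly, which is what permits the distortion-free separation of differently modulated sources described above.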

Speaker Biography

Les Atlas received his M.S. and Ph.D. degrees in Electrical Engineering from Stanford University in 1979 and 1984, respectively. He joined the University of Washington in 1984, where he is currently a Professor of Electrical Engineering. His research is in digital signal processing, with specializations in acoustic analysis, time-frequency representations, and signal recognition and coding. Professor Atlas received a National Science Foundation Presidential Young Investigator Award and a 2004 Fulbright Senior Research Scholar Award. He was General Chair of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing, Chair of the IEEE Signal Processing Society Technical Committee on Theory and Methods, and a member-at-large of the Signal Processing Society's Board of Governors. He is a Fellow of the IEEE "for contributions to time-varying spectral analysis and acoustical signal processing."

November 11, 2008 | 4:30PM

“Broadening statistical machine translation with comparable corpora and generalized models”   Video Available

Chris Quirk, Microsoft

[abstract]

Abstract

As we scale statistical machine translation systems to general domains, we face many challenges. This talk outlines two approaches for building better broad-domain systems. First, progress in data-driven translation is limited by the availability of parallel data. A promising strategy for mitigating data scarcity is to mine parallel data from comparable corpora. Although comparable corpora seldom contain parallel sentences, they often contain parallel words or phrases. Recent fragment extraction approaches have shown that including parallel fragments in SMT training data can significantly improve translation quality. We describe efficient and effective generative models for extracting fragments, and demonstrate that these algorithms produce substantial improvements on out-of-domain test data without suffering in-domain degradation. Second, many modern SMT systems are very heavily lexicalized. While such information excels on in-domain test data, quality falls off as the test data broadens. The next section of the talk describes robust generalized models that leverage lexicalization when available, and back off to linguistic generalizations otherwise. Such an approach results in large improvements over baseline phrasal systems when using broad-domain test sets.

November 5, 2008 | 4:30PM

“Densities of Excitation and ISI for Cortical Neurons via Laplace Transforms”

Toby Berger, University of Virginia

[abstract] [biography]

Abstract

For a canonical primary cortical neuron, which we call N, we introduce a mathematically tractable and neuroscientifically meaningful model of how N stochastically converts the excitation intensities it receives from the union of all the neurons in its afferent cohort into the durations of the intervals between its efferent spikes. We assume that N operates to maximize the information that its interspike interval (ISI) durations convey about the history of its afferent excitation intensity per joule of energy N expends to produce and propagate its spikes. We use the calculus of variations and Laplace transforms to determine the probability density functions (pdf's) of said excitation intensities and of said ISI durations. The mathematically derived pdf of the ISI durations is in good agreement with experimental observations. Moreover, the derived pdf of the afferent excitation intensity vanishes below a strictly positive level, which also accords with experimental observations. It is felt that our results argue persuasively that primary cortical neurons employ interspike interval codes (i.e., timing codes as opposed to rate codes).

Speaker Biography

Toby Berger was born in New York, NY on September 4, 1940. He received the B.E. degree in electrical engineering from Yale University, New Haven, CT in 1962, and the M.S. and Ph.D. degrees in applied mathematics from Harvard University, Cambridge, MA in 1964 and 1966. From 1962 to 1968 he was a Senior Scientist at Raytheon Company, Wayland, MA, specializing in communication theory, information theory, and coherent signal processing. From 1968 through 2005 he was a faculty member at Cornell University, Ithaca, NY, where he held the position of Irwin and Joan Jacobs Professor of Engineering. In 2006 he became a professor in the ECE Department of the University of Virginia, Charlottesville, VA. Professor Berger's research interests include information theory, random fields, communication networks, wireless communications, video compression, voice and signature compression and verification, neuroinformation theory, quantum information theory, and coherent signal processing. He is the author of Rate Distortion Theory: A Mathematical Basis for Data Compression and a co-author of Digital Compression for Multimedia: Principles and Standards and Information Measures for Discrete Random Fields. Berger has served as editor-in-chief of the IEEE Transactions on Information Theory and as president of the IEEE Information Theory Group. He has been a Fellow of the Guggenheim Foundation, the Japan Society for Promotion of Science, the Ministry of Education of the People's Republic of China, and the Fulbright Foundation. He received the 1982 Frederick E. Terman Award of the American Society for Engineering Education, the 2002 Shannon Award from the IEEE Information Theory Society, and the IEEE 2006 Leon K. Kirchmayer Graduate Teaching Award. Professor Berger is a Fellow and Life Member of the IEEE, a life member of Tau Beta Pi, a member of the National Academy of Engineering, and an avid amateur blues harmonica player.

October 30, 2008 | 4:30PM

“Active Learning with SVMs for Imbalanced Datasets and a Stopping Criterion Based on Stabilizing Predictions”

Michael Bloodgood

[abstract] [biography]

Abstract

The use of Active Learning (AL) to reduce NLP annotation costs has recently generated considerable interest. There has also been considerable interest in dealing effectively with the class imbalance that NLP problems so often give rise to. Additionally, the use of Support Vector Machines (SVMs) for NLP has become widespread. After explaining relevant background and motivation, I will discuss how to effectively address class imbalance during AL-SVM (AL with SVMs). In particular, I will discuss how to adapt passive learning techniques in order to effectively use asymmetric costs during AL-SVM. In order to realize the performance gains enabled by a strong AL algorithm, an effective stopping criterion is critical. Therefore, I will also present a new stopping criterion based on stabilizing predictions. An evaluation of the proposed techniques will be reported for several Information Extraction and Text Classification tasks.
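
As a hedged sketch of the two ingredients discussed (asymmetric misclassification costs and margin-based querying), not of the speaker's exact algorithm or stopping criterion, one round of pool-based AL-SVM might look like this in Python with scikit-learn:

  # Illustrative only: class-weighted linear SVM plus uncertainty sampling.
  import numpy as np
  from sklearn.svm import SVC

  def active_learning_round(X_labeled, y_labeled, X_pool, batch_size=10):
      # Asymmetric costs: penalize errors on the minority class more heavily.
      clf = SVC(kernel="linear", class_weight="balanced")
      clf.fit(X_labeled, y_labeled)
      # Query the unlabeled examples closest to the separating hyperplane.
      margins = np.abs(clf.decision_function(X_pool))
      query_indices = np.argsort(margins)[:batch_size]
      return clf, query_indices

A stabilizing-predictions stopping criterion would, in addition, track how much the model's predictions on a held-out sample change between successive rounds and halt once those changes fall below a threshold.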

Speaker Biography

Michael Bloodgood is a PhD candidate in the Department of Computer and Information Sciences at the University of Delaware. His thesis research deals with Active Learning with Support Vector Machines to reduce NLP annotation costs. More generally, he is interested in reducing training data annotation burdens via active, transfer, semi-supervised, and unsupervised learning techniques. In addition to his thesis work, Michael has worked on anaphora analysis (at U. of Delaware and at Palo Alto Research Center (PARC)), rapidly adapting POS taggers to new domains (at U. of Delaware), and discriminative training for statistical syntax-based machine translation (at USC/ISI). Michael earned his MS in Computer Science from the University of Delaware and a BS in Computer Science and in Information Systems Management from The College of New Jersey.

October 28, 2008 | 4:30PM

“Direct Models and their application to Word-Alignment and Machine-Translation”   Video Available

Abe Ittycheriah, IBM

[abstract] [biography]

Abstract

I'll present our work at IBM on word-alignment algorithms trained using supervised corpora. Also, I'll demonstrate how improved alignments required changes in machine translation and then present the direct translation model. This work is primarily focused on Arabic to English. I'll review some of the changes since our published papers in both word alignment and machine translation.

Speaker Biography

Abraham Ittycheriah works as a Research Staff Member in the Natural Language System group at the IBM T.J. Watson Research Lab in Yorktown Heights, NY. Over the last four years, his primary focus has been on machine translation and word alignment between Arabic and English. He is also responsible for the Statistical Machine Translation engine used in several government projects. Prior to this assignment, at IBM he has worked on Question Answering and Telephone speech recognition algorithms and interfaces. He obtained his PhD from Rutgers, The State University of New Jersey in 2001.

October 21, 2008 | 4:30PM

“OCP-Place, Similarity, and Multiplicative Interaction”   Video Available

Colin Wilson, Johns Hopkins University

[abstract] [biography]

Abstract

Generative linguistics studies the variation across languages and the laws (or universals) that limit cross-linguistic variation. The OCP-Place constraint, which is violated by sequences of consonants that have different tokens of the same place of articulation, is a good candidate for a linguistic law (Konstantin & Segerer 2007). However, beginning with the introduction of OCP-Place by McCarthy (1988) (building on observations due to Greenberg 1950) many researchers have claimed that the specific form of the constraint varies considerably across languages. In essence, the purported variation centers on how similar two consonants of the same place must be with respect to other features in order for the constraint to register a violation. I argue in this talk that a single definition of similarity --- the natural classes similarity metric introduced by Frisch et al. (2004) --- is consistent with the effects of OCP-Place in the languages that have been studied, and possibly in all languages. Apparent counterexamples, and particularly the recent case study of Muna (Austronesian) by Coetzee & Pater (2008), are shown to be artifacts of an inconsistent statistical method. A multiplicative, or log-linear, model of constraint interaction is able to maintain a universal formulation of OCP-Place and derive apparent variation from independent constraints.
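
The similarity metric at issue is standardly stated (Frisch et al. 2004) as

  \text{similarity}(a, b) \;=\; \frac{|\text{shared natural classes of } a \text{ and } b|}{|\text{shared natural classes}| + |\text{non-shared natural classes}|},

so that two consonants agreeing in place but differing in, say, voicing or continuancy receive an intermediate similarity value, and OCP-Place penalties can scale with that value rather than being all-or-nothing.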

Speaker Biography

Colin Wilson received a Ph.D. in Cognitive Science from Johns Hopkins in 2000. He was a member of the Linguistics department at UCLA from 2000 to 2007, and rejoined the Cognitive Science department as Associate Professor this semester. His most recent work focuses on the typology, gradient interaction, and learning of natural language phonotactics.

October 14, 2008 | 4:30PM

“Unsupervised Training of an HMM-based Speech Recognizer for Topic Classification”   Video Available

Herb Gish, BBN Technologies

[abstract] [biography]

Abstract

We address the problem of performing topic classification of speech when no transcriptions from the speech corpus of interest are available. The approach we take is one of incremental learning about the speech corpus, starting with adaptive segmentation of the speech, leading to the generation of discovered acoustic units and a segmental recognizer for these units, and finally to an initial tokenization of the speech for the training of an HMM speech recognizer. The recognizer trained is BBN's Byblos system. We discuss the performance of this system and also consider the case in which a small amount of transcribed data is available.

Speaker Biography

Dr. Herbert Gish received a Ph.D. in Applied Mathematics from Harvard University in 1967. He is a Principal Scientist at BBN Technologies in Cambridge, Massachusetts in the Speech and Language Processing Department. His most recent work deals with information extraction from speech and text with a focus on problems that have very limited amounts of training data available.

October 7, 2008 | 4:30PM

“Predicting Syntax: Processing Dative Constructions in American and Australian Varieties of English”   Video Available

Joan Bresnan, Stanford University

[abstract] [biography]

Abstract

Traditionally, linguistic variation within different time scales has been the province of different disciplines, each with a distinctive suite of techniques for obtaining and analyzing data. For example, historical linguistics, sociolinguistics, and corpus linguistics study variation between different speaker groups over historical time and across space, while psycholinguistics, phonetics, and computational speech recognition and synthesis study the dynamics of producing and comprehending language in the individual on a scale of milliseconds. Yet there is evidence that linguistic variation at these different time scales is linked, even in the domain of higher-level syntactic choices. This is a primary finding in the present study of dative constructions, illustrated by (1a,b), in Australian and American English.

(1a) Who gave you that wonderful watch? (V NP NP)
(1b) Who gave that wonderful watch to you? (V NP PP)

We use a very accurate multilevel probabilistic model of corpus dative productions (Bresnan, Cueni, Nikitina, and Baayen 2007) to measure the predictive capacities of both American and Australian subjects in three pairs of parallel psycholinguistic experiments involving sentence ratings (Bresnan 2007), decision latencies during reading (Ford 1983), and sentence completion. The experimental items were all sampled together with their contexts from the database of corpus datives, stratified by corpus model probabilities. We find that the Australian subjects share with the American subjects a sensitivity to corpus probabilities. But they also show covarying differences, notably a stronger end-weight effect of the recipient in the ratings task and the absence of a dependency-length effect of the theme argument in the decision latency task (cf. Grodner and Gibson 2005). A unifying explanation for these differences is that decision latencies for 'to' are reduced and naturalness ratings are increased when a PP is consistent with expectation. The Australian group would then be predicted to have a higher expectation of PP than the US group. This prediction is borne out by the sentence completion tasks, which showed that the Australians produced NP PP completions more than the American subjects in the same contexts. These findings suggest that subtle variations in the experiences of the dative construction by historically and spatially divergent speaker groups can create measurable differences in internalized expectations in individuals at the millisecond level.

Bresnan, Joan, Anna Cueni, Tatiana Nikitina, and R. Harald Baayen. 2007. Predicting the dative alternation. In Cognitive Foundations of Interpretation, ed. by G. Boume, I. Kraemer, and J. Zwarts. Amsterdam: Royal Netherlands Academy of Science, pp. 69-94.
Bresnan, Joan. 2007. Is syntactic knowledge probabilistic? Experiments with the English dative alternation. In Roots: Linguistics in Search of its Evidential Base, Studies in Generative Grammar, ed. by Sam Featherston and Wolfgang Sternefeld, pp. 75-96. Berlin: Mouton de Gruyter.
Ford, Marilyn. 1983. A method for obtaining measures of local parsing complexity throughout sentences. Journal of Verbal Learning and Verbal Behavior, 22: 203-218.

Speaker Biography

Joan Bresnan, Ph.D. MIT 1972, is Sadie Dernham Patek Professor in Humanities Emerita at Stanford University and a senior researcher at Stanford's Center for the Study of Language and Information, where she has established her Spoken Syntax Lab. A Fellow of the American Academy of Arts and Sciences and a Fellow and past president of the Linguistics Society of America, she is currently PI of the project "The Dynamics of Probabilistic Grammar" funded by the NSF program in Human Social Dynamics.

September 30, 2008 | 4:30PM

“Computational Detection of Novel Structural RNA in the Genome of the Malaria Parasite”   Video Available

Fernando Pineda, JHU School of Public Health

[abstract] [biography]

Abstract

Structural ribonucleic acid (RNA) molecules play an important role in regulating gene expression in organisms throughout the tree of life. The number of different classes of structural RNA, their possible mechanisms of action, their interaction partners, etc. are poorly understood. Here we consider the challenging computational problem of ab initio detection of novel structural RNA. Ab initio approaches are especially useful for RNA that is not well conserved across species. We focus on the genome of Plasmodium falciparum, where there is evidence that structural RNA plays a dominant role in regulating gene expression. P. falciparum is an important organism to understand since every year it is responsible for 300-500 million clinical cases of Malaria and around a million deaths, of which over 75% occur in African children under 5 years of age. The genome of an organism codes for its biochemical building blocks as well as its regulatory elements. The "language" used to represent information in the genomic sequence is about as "natural" as it gets and it is not clear what are the appropriate features one should use to detect novel structural RNA. After a brief introduction to the salient biology, we will describe a pragmatic and computationally intensive approach based on methods originally developed by others for detecting structural RNAs in very short viral genomes. We describe a pilot study demonstrating the feasibility of the approach, which also highlighted computational limitations, as well as the fact that the signals are deeply buried in noise. We will describe new algorithms that have allowed us to reduce the computational complexity, and probably increase the signal-to-noise, thereby allowing us to scale up this approach to a truly genome-wide level.

Speaker Biography

Dr. Fernando Pineda is Associate Professor of Molecular Microbiology and Immunology at the Johns Hopkins Bloomberg School of Public Health, where he collaborates with laboratory-based colleagues to model biological systems. He also directs the High Performance Scientific Computing Core. He received his PhD in Theoretical Physics from the University of Maryland, College Park. He has served on the editorial boards of several journals, including Neural Computation and IEEE Transactions on Neural Networks. Prior to joining the faculty at the School of Public Health, he was on the Principal Staff at the Johns Hopkins Applied Physics Laboratory. He has also worked at the Jet Propulsion Laboratory and the Harvard-Smithsonian Center for Astrophysics.

September 23, 2008 | 4:30PM

“Extracting and Applying Measurements of Tongue Images”   Video Available

Maureen Stone, University of Maryland

[abstract] [biography]

Abstract

This talk will review our work using several instrumental techniques that image the tongue. These techniques include ultrasound, cine-MRI, tagged-MRI, and DTI. The tongue is of interest because it is the major articulator in the production of speech; it has the most degrees of freedom. In addition it is an unusual structure as it is composed entirely of soft tissue and must move without benefit of bones or joints. This talk will present an overview of work done by us and our colleagues toward the understanding of tongue motor control, and applications of tongue imaging data to the development of a silent speech interface, a FEM of tongue motion, a study of aging in tongue motion, and a study of tongue motion after removal of cancerous tumors.

Speaker Biography

Dr. Maureen Stone measures and models tongue biomechanics and motor control using data from ultrasound and MRI. Dr. Stone is a Professor at the University of Maryland Dental School, and Director of the Vocal Tract Visualization Laboratory. She has written numerous articles on the multi-instrumental approach to studying vocal tract function. She is a Fellow of the Acoustical Society of America.

September 19, 2008 | 11:00AM

“Top-Down Auditory Processing and Why Yahoo Cares”   Video Available

Malcolm Slaney, Yahoo! Research Laboratory

[abstract] [biography]

Abstract

The world we live in is not nearly as clean and orderly as the training data sets of yesteryear. Our (acoustic) world is noisy, filled with unknown people and events. Mark Tilden, lead robotic designer for WowWee/Hasbro toys, said last July, "The cocktail party effect is costing me money." In this talk I will discuss the need for context and top-down considerations in auditory processing and models of auditory perception. I will demonstrate the need with many examples from visual and auditory perception, and show some directions for future research. I'll conclude with a short discussion of why Yahoo cares (basically because the Internet is full of really noisy data and we want to help people find and understand it).

Speaker Biography

Malcolm Slaney is a principal scientist at Yahoo! Research Laboratory. He received his PhD from Purdue University for his work on computed imaging. He is a coauthor, with A. C. Kak, of the IEEE book "Principles of Computerized Tomographic Imaging." This book was recently republished by SIAM in their "Classics in Applied Mathematics" Series. He is coeditor, with Steven Greenberg, of the book "Computational Models of Auditory Function." Before Yahoo!, Dr. Slaney worked at Bell Laboratories, Schlumberger Palo Alto Research, Apple Computer, Interval Research, and IBM's Almaden Research Center. He is also a (consulting) Professor at Stanford's CCRMA where he organizes and teaches the Hearing Seminar. His research interests include auditory modeling and perception, multimedia analysis and synthesis, compressed-domain processing, music similarity and audio search, and machine learning. For the last several years he has led the auditory group at the Telluride Neuromorphic Workshop.

September 16, 2008 | 4:30PM

“Online Large-Margin Training of Syntactic and Structural Translation Features”   Video Available

David Chiang, Information Sciences Institute

[abstract] [biography]

Abstract

Minimum-error-rate training (MERT) is a bottleneck for current development in statistical machine translation (MT) because it has difficulty estimating more than a dozen or two parameters. I will present two classes of features that address deficiencies in the Hiero hierarchical phrase-based translation model but cannot practically be trained using MERT. Instead, we use the MIRA algorithm, introduced by Crammer et al. and previously applied to MT by Watanabe et al. Building on their work, we show that by parallel processing and utilizing more of the parse forest, we can obtain results using MIRA that match those of MERT in terms of both translation quality and computational requirements. We then test the method on the new features: first, simultaneously training a large number of Marton and Resnik's soft syntactic constraints, and, second, introducing a novel structural distortion model based on a large number of features. In both cases we obtain significant improvements in translation performance over the baseline. This talk represents joint work with Yuval Marton and Philip Resnik of the University of Maryland.
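
To make the large-margin flavor of MIRA concrete, here is a minimal sketch of a single MIRA-style (passive-aggressive) update for a linear scoring model. It is not the MT setup described above: the feature vectors, the loss value, and the clip constant C are hypothetical stand-ins, and real systems work with k-best or forest hypotheses rather than a single rival.

```python
import numpy as np

def mira_update(w, feat_oracle, feat_hyp, loss, C=1.0):
    """One MIRA / passive-aggressive style update: move w just enough that
    the oracle hypothesis outscores the rival by (at least) its loss, with
    the step size clipped at C. A toy sketch, not the system in the talk."""
    delta = feat_oracle - feat_hyp               # feature difference vector
    denom = float(np.dot(delta, delta))
    margin = float(np.dot(w, delta))             # current score gap
    violation = loss - margin                    # shortfall of the margin
    if violation <= 0 or denom == 0.0:
        return w                                 # constraint already satisfied
    step = min(C, violation / denom)             # closed-form clipped step size
    return w + step * delta

# toy usage: 3 features, oracle vs. rival hypothesis, BLEU-like loss of 0.5
w = np.zeros(3)
w = mira_update(w, np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, 1.0]), loss=0.5)
print(w)   # roughly [ 0.167 -0.167  0.167]
```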

Speaker Biography

David Chiang is a Research Assistant Professor at the University of Southern California and a Computer Scientist at the USC Information Sciences Institute. He received an AB/SM in Computer Science from Harvard University in 1997, and a PhD in Computer and Information Science from the University of Pennsylvania in 2004. After a research fellowship at the University of Maryland Institute for Advanced Computer Studies, he joined the USC Information Sciences Institute in 2006, where he currently works on formal grammars for statistical machine translation.

September 9, 2008 | 12:00PM

“CoE Quarterly Technical Exchange”

[abstract]

Abstract

CoE Quarterly Technical Exchange

August 6, 2008 | 01:00 pm

“Learning Rules with Adaptor Grammars”

Mark Johnson, Brown University

[abstract]

Abstract

Nonparametric Bayesian methods are interesting because they may provide a way of learning the appropriate units of generalization as well as the generalization's probability or weight. Adaptor Grammars are a framework for stating a variety of hierarchical nonparametric Bayesian models, where the units of generalization can be viewed as kinds of PCFG rules. This talk describes the mathematical and computational properties of Adaptor Grammars and linguistic applications such as word segmentation and syllabification, and describes the MCMC algorithms we use to sample them. Joint work with Sharon Goldwater and Tom Griffiths.
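
The "adaptor" can be pictured as a Chinese-restaurant-style cache over previously generated items: reuse an old item with probability proportional to how often it has been used, otherwise fall back on the base distribution. The sketch below is only that caricature (a plain CRP with concentration alpha, no Pitman-Yor discount, and a toy base distribution standing in for the PCFG), not the full adaptor grammar machinery or its MCMC sampler.

```python
import random
from collections import Counter

class CRPAdaptor:
    """Caricature of an adaptor: with probability proportional to its count,
    reuse a previously generated item; otherwise draw a fresh one from the
    base distribution (here a stand-in for the base PCFG)."""
    def __init__(self, base_sampler, alpha=1.0):
        self.base_sampler = base_sampler   # callable returning a new item
        self.alpha = alpha                 # concentration parameter
        self.counts = Counter()

    def sample(self):
        total = sum(self.counts.values()) + self.alpha
        r = random.uniform(0, total)
        for item, c in self.counts.items():
            r -= c
            if r < 0:                      # reuse a cached item
                self.counts[item] += 1
                return item
        item = self.base_sampler()         # back off to the base distribution
        self.counts[item] += 1
        return item

# toy base distribution over "subtrees" (here just strings)
random.seed(0)
adaptor = CRPAdaptor(lambda: random.choice(["the dog", "a cat", "some fish"]))
print([adaptor.sample() for _ in range(10)])   # repeats become more likely over time
```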

July 30, 2008 | 10:30 am

“Understanding Speech in the Face of Competition”   Video Available

Barbara Shinn-Cunningham, Boston University

[abstract] [biography]

Abstract

In most social settings, competing speech sounds mask one another, causing us to hear only portions of the signal we are trying to understand. Moreover, multiple signals vie for our attention, causing central interference that can also limit what we perceive. Despite such interruptions and interference, we are incredibly adept at communicating in everyday settings. This talk will review recent studies of how it is that we manage to selectively attend to and understand speech despite interruptions and perceptual competition from other sources. Evidence supports the idea that selective attention depends on the formation of auditory objects, and that the processes of forming and attending to objects evolve over time. In addition, top-down knowledge is critical for enabling us to fill in information that is missing from what we do manage to attend to in everyday settings. These results have important implications for listeners with hearing impairment or who are aging, who are likely to experience difficulties with selectively attending in complex settings.

Speaker Biography

Barbara Shinn-Cunningham received her training in electrical engineering at Brown University (Sc.B., 1986) and the Massachusetts Institute of Technology (M.S., 1989; Ph.D., 1994). She joined the faculty of Boston University (BU) in 1996, where she is Director of Graduate Studies and Associate Professor of Cognitive and Neural Systems. She also holds faculty appointments in BU Biomedical Engineering, the BU Program in Neuroscience, the Harvard/MIT Health Sciences and Technology Program, the Harvard/MIT Speech and Hearing Program, and the Naval Post-Graduate School. She serves on the Governing Board of the Boston University Center for Neuroscience and the Board of Directors for the CELEST NSF Science of Learning Center, as well as various committees of professional organizations such as the Acoustical Society of America, and the Association for Research in Otolaryngology. She has received research fellowships from the Alfred P. Sloan Foundation, the Whitaker Foundation, and the National Security Science and Engineering Faculty Fellows program. Her research includes studies of auditory attention, sound source separation, spatial hearing, and perceptual plasticity.

July 22, 2008 | 01:00 pm

“Language as Kluge”   Video Available

Gary Marcus, New York University

[abstract]

Abstract

In fields ranging from reasoning to linguistics, the idea of humans as perfect, rational, optimal creatures is making a comeback - but should it be? Hamlet's musings that the mind was "noble in reason ... infinite in faculty" have their counterparts in recent scholarly claims that the mind consists of an "accumulation of superlatively well-engineered designs" shaped by the process of natural selection (Tooby and Cosmides, 1995), and the 2006 suggestions of Bayesian cognitive scientists Chater, Tenenbaum and Yuille that "it seems increasingly plausible that human cognition may be explicable in rational probabilistic terms and that, in core domains, human cognition approaches an optimal level of performance", as well as in Chomsky's recent suggestions that language is close "to what some super-engineer would construct, given the conditions that the language faculty must satisfy". In this talk, I will argue that this resurgent enthusiasm for rationality (in cognition) and optimality (in language) is misplaced, and that the assumption that evolution tends creatures towards "superlative adaptation" ought to be considerably tempered by recognition of what Stephen Jay Gould called "remnants of history", or what I call evolutionary inertia. The thrust of my argument is that the mind in general, and language in particular, might be better seen as what engineers call a kluge: clumsy and inelegant, yet remarkably effective.

July 16, 2008 | 01:00 pm

“Energy-Based Models and Deep Learning”

Yann LeCun, Computational and Biological Learning Lab, Courant Institute of Mathematical Sciences, New York University

[abstract] [biography]

Abstract

A long-term goal of Machine Learning research is to solve highly complex "intelligent" tasks, such as visual perception, auditory perception, and language understanding. To reach that goal, the ML community must solve two problems: the Partition Function Problem, and the Deep Learning Problem. The normalization problem is related to the difficulty of training probabilistic models over large spaces while keeping them properly normalized. In recent years, the ML and Natural Language communities have devoted considerable efforts to circumventing this problem by developing "un-normalized" learning models for tasks in which the output is highly structured (e.g. English sentences). This class of models was in fact originally developed during the early 90's in the speech and handwriting recognition communities, and resulted in highly successful commercial systems for automatically reading bank checks and other documents. The Deep Learning Problem is related to the issue of training all the levels of a recognition system (e.g. segmentation, feature extraction, recognition, etc.) in an integrated fashion. We first consider "traditional" methods for deep supervised learning, such as multi-layer neural networks and convolutional networks, a learning architecture for image recognition loosely modeled after the architecture of the visual cortex. Several practical applications of convolutional nets will be demonstrated with videos and live demos, including a handwriting recognition system, a real-time human face detector that also estimates the pose of the face, a real-time system that can detect and recognize objects such as airplanes, cars, animals and people in images, and a vision-based navigation system for off-road mobile robots that trains itself on-line to avoid obstacles. Although these methods produce excellent performance, they require many training samples. The next challenge is to devise unsupervised learning methods for deep networks. Inspired by some recent work by Hinton on "deep belief networks", we devised energy-based unsupervised algorithms that can learn deep hierarchies of invariant features for image recognition. We show how such algorithms can dramatically reduce the required number of training samples, particularly for such tasks as the recognition of everyday objects at the category level.
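
As a purely illustrative companion to the description of convolutional networks above, here is a bare-bones numpy sketch of one convolution + subsampling + nonlinearity stage. The kernel, sizes, and tanh nonlinearity are arbitrary choices, not the architecture used in the systems described.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2-D convolution (really cross-correlation) of a
    single-channel image with a single kernel."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def pool2x2(fmap):
    """2x2 average pooling (the 'subsampling' stage of early convnets)."""
    H, W = fmap.shape
    H2, W2 = H // 2 * 2, W // 2 * 2
    f = fmap[:H2, :W2]
    return 0.25 * (f[0::2, 0::2] + f[0::2, 1::2] + f[1::2, 0::2] + f[1::2, 1::2])

# one convolution + pooling + nonlinearity stage on a random "image"
rng = np.random.default_rng(0)
image = rng.standard_normal((28, 28))
kernel = rng.standard_normal((5, 5)) * 0.1
features = np.tanh(pool2x2(conv2d_valid(image, kernel)))
print(features.shape)   # (12, 12)
```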

Speaker Biography

Yann LeCun received an Electrical Engineer Diploma from Ecole Supérieure d'Ingénieurs en Electrotechnique et Electronique (ESIEE), Paris in 1983, and a PhD in Computer Science from Université Pierre et Marie Curie (Paris) in 1987. After a postdoc at the University of Toronto, he joined the Adaptive Systems Research Department at AT&T Bell Laboratories in Holmdel, NJ, in 1988. Following AT&T's split with Lucent Technologies in 1996, he joined AT&T Labs-Research as head of the Image Processing Research Department. In 2002 he became a Fellow at the NEC Research Institute in Princeton. He has been a professor of computer science at NYU's Courant Institute of Mathematical Sciences since 2003. Yann's research interests include computational and biological models of learning and perception, computer vision, mobile robotics, data compression, digital libraries, and the physical basis of computation. He has published over 130 papers in these areas. His image compression technology, called DjVu, is used by numerous digital libraries and publishers to distribute scanned documents on-line, and his handwriting recognition technology is used to process a large percentage of bank checks in the US. He has been general chair of the annual "Learning at Snowbird" workshop since 1997, and program chair of CVPR 2006.

May 6, 2008 | 07:35 am

“Recent Innovations in Dynamic Bayesian Networks for Automatic Speech Recognition”   Video Available

Chris Bartels, University of Washington

[abstract] [biography]

Abstract

Dynamic Bayesian networks (DBNs) are a class of directed graphical models for use on variable length sequences. DBNs have been applied to a number of tasks including automatic speech recognition, language processing, and DNA trace alignment. This talk will begin with a description of my recent work on reducing errors from burst noise in speech recognition using a DBN that combines a conventional phone-based speech recognizer with a classifier that detects syllable locations. The second portion of the talk will introduce several innovations for reducing the computational requirements of probabilistic inference on these types of models.
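
In the simplest case, inference in a dynamic Bayesian network over a frame sequence reduces to the HMM forward recursion; the sketch below shows that base case in log space. The models described in the talk are richer, with multiple interacting variables per frame, so this is only the starting point.

```python
import numpy as np
from scipy.special import logsumexp

def forward_log_likelihood(log_A, log_B, log_pi):
    """Forward algorithm for an HMM, the simplest dynamic Bayesian network:
    log_A[i, j] = log P(state j | state i), log_B[t, j] = log P(obs_t | state j),
    log_pi[j] = log P(state_0 = j). Returns log P(observations)."""
    T, N = log_B.shape
    alpha = log_pi + log_B[0]                          # initialize with frame 0
    for t in range(1, T):
        # alpha[j] = logsum_i (alpha[i] + log_A[i, j]), then add the emission
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[t]
    return logsumexp(alpha)

# toy 2-state model with 4 observed frames and uninformative emissions
log_A = np.log(np.array([[0.9, 0.1], [0.2, 0.8]]))
log_pi = np.log(np.array([0.5, 0.5]))
log_B = np.log(np.full((4, 2), 0.5))
print(forward_log_likelihood(log_A, log_B, log_pi))    # equals 4 * log(0.5)
```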

Speaker Biography

Chris Bartels is a Ph.D. candidate in the Department of Electrical Engineering at the University of Washington. He received his M.S. degree from the University of Washington in 2004 and his B.S. degree in computer engineering from the University of Kansas in 1999. Prior to his graduate studies he developed embedded software for GPS and sonar systems at GARMIN International. His research interests include graphical models in automatic speech recognition and inference in graphical models.

April 29, 2008 | 04:30 pm

“Forest-based Search Algorithms in Parsing and Machine Translation”   Video Available

Liang Huang, University of Pennsylvania

[abstract] [biography]

Abstract

Many problems in Natural Language Processing (NLP) involve an efficient search for the best derivation over (exponentially) many candidates, especially in parsing and machine translation. In these cases, the concept of "packed forest" provides a compact representation of the huge search spaces, where efficient inference algorithms based on Dynamic Programming (DP) are possible. In this talk we address two important problems within this framework: exact k-best inference which is widely used in NLP pipelines such as parse reranking and MT rescoring, and approximate inference when the search space is too big for exact search. We first present a series of fast and exact k-best algorithms on forests, which are orders of magnitude faster than previously used methods on state-of-the-art parsers such as Collins (1999). We then extend these algorithms for approximate search when the forests are too big for exact inference. We will discuss two particular instances of this new method, forest rescoring for MT decoding with integrated language models, and forest reranking for discriminative parsing. In the former, our methods perform orders of magnitude faster than conventional beam search on both state-of-the-art phrase-based and syntax-based systems, with the same level of search error or translation quality. In the latter, faster search also leads to better learning, where our approximate decoding makes whole-Treebank discriminative training practical and results in the best accuracy to date for parsers trained on the Treebank. This talk includes joint work with David Chiang (USC Information Sciences Institute).
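
The engine behind both the exact k-best algorithms and cube pruning is lazy best-first enumeration of combinations of already-sorted child lists with a priority queue. The stripped-down sketch below enumerates the k smallest sums of two sorted cost lists, which is the two-list special case of that idea, not the full forest algorithm.

```python
import heapq

def k_best_sums(a, b, k):
    """Lazily enumerate the k smallest sums a[i] + b[j], where a and b are
    sorted cost lists: the frontier-expansion idea underlying lazy k-best
    extraction and cube pruning."""
    if not a or not b:
        return []
    heap = [(a[0] + b[0], 0, 0)]
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        cost, i, j = heapq.heappop(heap)
        out.append((cost, i, j))
        for ni, nj in ((i + 1, j), (i, j + 1)):        # expand the frontier
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (a[ni] + b[nj], ni, nj))
    return out

print(k_best_sums([1.0, 2.5, 4.0], [0.5, 0.6, 3.0], k=4))
# [(1.5, 0, 0), (1.6, 0, 1), (3.0, 1, 0), (3.1, 1, 1)]
```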

Speaker Biography

Liang Huang will shortly finish his PhD at Penn, and is looking for a postdoctoral position.

April 22, 2008 | 04:30 pm

“Toward Automatic Temporal Interpretation of Texts”   Video Available

Graham Katz, Georgetown

[abstract] [biography]

Abstract

Extracting relational information about times and events referred to in a document has a wide range of applications, from information retrieval to document summarization. While there has been a long history of work on temporal interpretation in computational linguistics, this has been primarily in terms of formal theories of interpretation. The advent of the TimeML language (and the creation of the TIMEBANK resource) has made this area more accessible to empirical methods in NLP and has standardized the task of temporal interpretation. In this talk I will give an overview of the TimeML language, discuss some of its properties, and review the recent TempEval competition. In addition I present three sets of experiments in which we apply machine learning techniques to the problem of determining the temporal relations that hold among the events and times in a text.

Speaker Biography

Graham Katz is an assistant professor of computational linguistics in the Linguistics Department of Georgetown University. He got his Ph.D. in Linguistics and Cognitive Science from the University of Rochester and spent a number of years as a researcher and lecturer in Germany, at the Universities of Tuebingen, Stuttgart, and Osnabrueck. Dr. Katz's research area is computational and theoretical semantics, with a focus on issues in temporal interpretation.

April 15, 2008 | 04:30 pm

“Searching Efficiently for Solutions in Large-Scale NLP Application”

Daniel Marcu, Language Weaver / ISI

[abstract] [biography]

Abstract

As the Natural Language Processing (NLP) and Machine Learning fields mature, the gap between the mathematical equations we write when we model a problem statistically and the manner in which we implement these equations in NLP applications is widening. In this talk, I first review some of the challenges that we face when searching for best solutions in large-scale statistical applications, such as machine translation, and the effect that ignoring these challenges has on end-to-end results. I also present recent developments that have the potential to positively impact a wide range of applications where parameter estimation and search are critical.

Speaker Biography

Daniel Marcu is the Chief Technology Officer of Language Weaver Inc. and an Associate Professor and Project Leader at the Information Sciences Institute, University of Southern California. His published work includes an MIT Press book, "The Theory and Practice of Discourse Parsing and Summarization," and best paper awards, with his ISI colleagues, at AAAI-2000 and ACL-2001 for research on statistical-based summarization and translation. His research has influenced a diverse range of natural language processing fields from discourse parsing to summarization, machine translation, and question answering. His current focus is on efficient learning and decoding/search for statistical machine translation applications.

April 8, 2008 | 04:30 pm

“AXOMs: Asynchronous Cascaded Self-organizing Maps for Language Learning”   Video Available

Vito Pirrelli, CNR

[abstract] [biography]

Abstract

AXOMs are hierarchically-arranged Self-organizing Maps (SOMs) in an asynchronous feed-forward relation. In AXOMs, an incoming input word is sampled on a short time scale, and recoded through the topological activation state of a first-level SOM, called the phonotactic layer, placed at the bottom of the hierarchy. The activation state is eventually projected upwards to the second-level map in the hierarchy (or lexical layer) on a longer time scale. In the talk, we shall provide the formal underpinnings of AXOMs, together with a concrete illustration of their behaviour through two language learning sessions, simulating the acquisition of Italian and English verb forms respectively. The architecture is capable of mimicking two levels of long-term memory chunking: low-level segmentation of phonotactic patterns and higher-level morphemic chunking, together with their feeding relation. It turns out that the topology of second-level maps mirrors a meta-paradigmatic organization of the inflection lexicon, clustering verb paradigms sharing the same conjugation class, based on the principle of formal contrast. Examples of Vito's recent work are available online; these papers may be of particular interest: Calderone, B., I. Herreros, and V. Pirrelli (2007), "Learning Inflection: The Importance of Starting Big," Lingue e Linguaggio, vol. 2; Pirrelli, V., and I. Herreros (2007), "Learning Morphology by Itself," in Proceedings of the Fifth Mediterranean Morphology Meeting.
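
For readers unfamiliar with SOMs, the sketch below shows the basic single-map update that AXOMs build on: each input pulls its best-matching unit and that unit's grid neighbours toward it, so nearby map nodes end up coding similar inputs. The grid size, annealing schedule, and toy data are arbitrary, and nothing here reproduces the asynchronous two-level AXOM architecture itself.

```python
import numpy as np

def som_train(data, grid=(6, 6), epochs=20, lr=0.5, sigma=2.0, seed=0):
    """Minimal online SOM training: find the best-matching unit for each
    input and move it, together with its Gaussian grid neighbourhood,
    toward the input."""
    rng = np.random.default_rng(seed)
    H, W = grid
    weights = rng.standard_normal((H, W, data.shape[1])) * 0.1
    ys, xs = np.mgrid[0:H, 0:W]
    for _ in range(epochs):
        for x in data:
            dists = np.linalg.norm(weights - x, axis=2)          # distance to every node
            bi, bj = np.unravel_index(np.argmin(dists), (H, W))  # best-matching unit
            grid_d2 = (ys - bi) ** 2 + (xs - bj) ** 2
            nbhood = np.exp(-grid_d2 / (2.0 * sigma ** 2))       # Gaussian neighbourhood
            weights += lr * nbhood[:, :, None] * (x - weights)
        lr *= 0.95
        sigma *= 0.95                                            # anneal both over epochs
    return weights

# toy "phonotactic" inputs: random points drawn from two clusters
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 0.1, (50, 3)), rng.normal(1, 0.1, (50, 3))])
print(som_train(data).shape)   # (6, 6, 3)
```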

Speaker Biography

Vito Pirrelli received a laurea degree in the Humanities from the Linguistics Department of Pisa University (Italy) and a PhD in Computational Linguistics from Salford University (UK) with a dissertation on "Morphology, Analogy and Machine Translation". Currently he is Research Director at the CNR Institute for Computational Linguistics in Pisa and teaches "Computer for Humanities" at the Department of Linguistics of Pavia University. Author of two books and several journal and conference articles in Computational and Theoretical Linguistics, his main research interests include: machine language learning, computer models of the mental lexicon, psycho-computational models of morphology learning and processing, hybrid models of language processing, information extraction, and theoretical morphology.

April 1, 2008 | 04:30 pm

“Two Algorithms for Learning Sparse Representations”   Video Available

Tong Zhang, Rutgers

[abstract] [biography]

Abstract

We present two algorithms for sparse learning, where our goal is to estimate a target function that is a sparse linear combination of a set of basis functions. The first method is an online learning algorithm that focuses on scalability and can solve problems with large numbers of features and training data. We propose a general method called truncated gradient that can induce sparsity in the weights of online learning algorithms with convex loss functions. The approach is theoretically motivated, and can be regarded as an online counterpart of the popular L1-regularization method in the batch setting. We prove that small rates of sparsification result in only small additional regret with respect to typical online learning guarantees. Empirical experiments show that the approach works well. The second method is a batch learning algorithm that focuses on effective feature selection. Since this problem is NP-hard in the general setting, approximation solutions are necessary. Two methods that are widely used to solve this problem are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination called FoBa that simultaneously incorporates forward and backward steps in a specific way, and show that the resulting procedure can effectively solve this NP-hard problem under quite reasonable conditions. The first part is joint work with John Langford, Lihong Li, and Alexander Strehl.
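
A minimal sketch of the truncated-gradient idea follows: ordinary stochastic gradient steps, with a shrink-toward-zero truncation applied every K updates so that small weights become exactly zero. The squared loss, schedule, and constants are illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

def truncate(w, gravity, theta):
    """Truncated-gradient shrinkage: pull weights whose magnitude is at most
    theta toward zero by 'gravity', clipping at zero, so small weights become
    exactly 0 while large ones are left alone."""
    small = np.abs(w) <= theta
    w[small] = np.sign(w[small]) * np.maximum(np.abs(w[small]) - gravity, 0.0)
    return w

def sgd_truncated(X, y, eta=0.1, gravity=0.01, theta=0.5, K=10, epochs=5, seed=0):
    """Online squared-loss SGD with truncation applied every K updates."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            err = X[i] @ w - y[i]
            w -= eta * err * X[i]                 # plain stochastic gradient step
            step += 1
            if step % K == 0:
                w = truncate(w, K * gravity, theta)
    return w

# toy data: only the first 2 of 20 features matter
rng = np.random.default_rng(2)
X = rng.standard_normal((200, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.01 * rng.standard_normal(200)
print(np.round(sgd_truncated(X, y), 2))   # most irrelevant weights are exactly 0
```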

Speaker Biography

Tong Zhang received a B.A. in mathematics and computer science from Cornell University in 1994 and a Ph.D. in Computer Science from Stanford University in 1998. After graduation, he worked at IBM T.J. Watson Research Center in Yorktown Heights, New York, and Yahoo Research in New York city. He is currently an associate professor of statistics at Rutgers University. His research interests include machine learning, algorithms for statistical computation, their mathematical analysis and applications.

March 25, 2008 | 10:30 am

“Exploiting the Power of Search Engines for Real-Time Concept Association Tasks”   Video Available

Peter Anick, Yahoo

[abstract] [biography]

Abstract

Web search engines sift through billions of documents to identify those most likely to be relevant to a short natural language query. This functionality can be exploited to relate queries not just to documents but also to other concepts and queries. In this talk, we describe several applications of this principle, including the generation of query refinement suggestions for interactive search assistance and the discovery of alternative descriptors for an advertiser's product space.

Speaker Biography

Peter Anick is a member of the Applied Sciences group at Yahoo! where he currently works on developing infrastructure and tools for supporting online query assistance, such as Yahoo's recently released "Search Assist" product. He received his PhD in computer science from Brandeis University in 1999. Prior to that, he worked for many years in Digital Equipment Corporation's Artificial Intelligence Technology groups on applications of computational linguistics, including online text search for customer support and natural language interfaces for expert and database systems, and subsequently at AltaVista and Overture. His research interests include intelligent information retrieval, user interfaces for exploratory search, text data mining and lexical semantics of nouns and noun compounds. He is a member of ACM SIGIR, former editor of SIGIR Forum and current workshops program chair for SIGIR'08.

March 18, 2008 | 12:00 pm

“Spring Break”

No Speaker

March 4, 2008 | 04:30 pm

“Finding Acoustic Regularities in Speech: From Words to Segments”   Video Available

Jim Glass, MIT

[abstract] [biography]

Abstract

The development of an automatic speech recognizer is typically a highly supervised process involving the specification of phonetic inventories, lexicons, acoustic and language models, along with annotated training corpora. Although some model parameters may be modified via adaptation, the overall structure of the speech recognizer remains relatively static thereafter. While this approach has been effective for problems where there is adequate human expertise and labeled corpora, it is challenged by less-supervised or unsupervised scenarios. It also stands in stark contrast to human processing of speech and language where learning is an intrinsic capability. From a machine learning perspective, a complementary alternative is to discover unit inventories in an unsupervised manner by exploiting the structure of repeating acoustic patterns within the speech signal. In this work we use pattern discovery methods to automatically acquire lexical entities, as well as speaker and topic segmentations directly from an untranscribed audio stream. Our approach to unsupervised word acquisition utilizes a segmental variant of a widely used dynamic programming technique, which allows us to find matching acoustic patterns between spoken utterances. By aggregating information about these matching patterns across audio streams, we demonstrate how to group similar acoustic sequences together to form clusters corresponding to lexical entities such as words and short multi-word phrases. On a corpus of lecture material, we demonstrate that clusters found using this technique exhibit high purity and that many of the corresponding lexical identities are relevant to the underlying audio stream. We have applied the acoustic pattern matching and clustering methods to several important problems in speech and language processing. In addition to showing how this methodology applies across different languages, we demonstrate two methods to automatically determine the identity of speech clusters. Finally, we also show how it can be used to provide an unsupervised segmentation of speakers and topics. Joint work with Alex Park, Igor Malioutov, and Regina Barzilay.
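
The pattern matching at the heart of this approach is dynamic-programming alignment of acoustic feature sequences. The sketch below is plain dynamic time warping between two feature matrices on invented toy data; the segmental variant used for word discovery constrains and normalizes these alignment paths, but the recursion is the same idea.

```python
import numpy as np

def dtw_distance(A, B):
    """Classic dynamic time warping between two feature sequences
    (frames x dims), using Euclidean frame distances."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# toy "MFCC" sequences: B is a time-warped copy of A plus noise, C is unrelated
rng = np.random.default_rng(3)
A = rng.standard_normal((20, 13))
B = np.repeat(A, 2, axis=0) + 0.01 * rng.standard_normal((40, 13))
C = rng.standard_normal((40, 13))
print(dtw_distance(A, B), dtw_distance(A, C))   # the warped copy scores far lower
```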

Speaker Biography

James R. Glass obtained his S.M. and Ph.D. degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology. He is currently a Principal Research Scientist at the MIT Computer Science and Artificial Intelligence Laboratory where he heads the Spoken Language Systems Group. He is also a Lecturer in the Harvard-MIT Division of Health Sciences and Technology. His primary research interests are in the area of speech communication and human-computer interaction, centered on automatic speech recognition and spoken language understanding.

February 26, 2008 | 04:30 pm

“Learning Constraint-Based Grammars from Representative Data”   Video Available

Smaranda Muresan, UMD

[abstract] [biography]

Abstract

Traditional natural language processing systems have focused on modeling the deep, human-like level of text understanding, by integrating syntax and semantics. However, they overlooked a key requirement for scalability: learning. Modern natural language systems on the other hand, have embraced learning methods to ensure scalability, but they remain at a shallow level of text understanding by their inability to successfully model semantics. In this talk I will present a computationally efficient model for deep language understanding that brings together syntax, semantics and learning. I will present a new grammar formalism, Lexicalized Well-Founded Grammar, which integrates syntax and semantics, and is learnable from a small set of representative annotated examples, defining the importance to the model linguistically, and not simply by frequency, as in most previous work. The grammar rules have compositional and ontology constraints that provide access to meaning during parsing. The semantic representation is an ontology query language which allows a deep-level text-to-knowledge acquisition. I have proven that under appropriate assumptions the search space for grammar learning is a complete grammar lattice, which guarantees the uniqueness of the solution. I will show the linguistic relevance of a practical LWFG learning framework and its utility for populating terminological knowledge bases from text in the medical domain.

Speaker Biography

Smaranda Muresan received her PhD degree in Computer Science from Columbia University. She is currently a Postdoctoral Research Associate at the Institute for Advanced Computer Studies at University of Maryland. Her research interests include language learning and understanding, machine translation and relational learning. Her work unifies two separate but central themes in human language technologies: computational formalisms to express language phenomena and induction of knowledge from data.

February 19, 2008 | 04:30 pm

“Prominence in conversational speech: pitch accent, contrast and givenness”   Video Available

Ani Nenkova, University of Pennsylvania

[abstract] [biography]

Abstract

The ability to automatically predict appropriate prominence patterns is considered a key factor for improving the naturalness of text-to-speech synthesis systems. I will present results from a large human preference experiment showing that indeed even simple models of pitch accent and contrast/focus in a TTS system lead to measurable and significant improvements in concatenative synthesis. I will also present a study of prominence in conversational speech based on the Switchboard corpus. The corpus has been richly annotated for binary pitch accent information, as well as for semantically motivated distinctions such as contrast (narrow focus) and givenness (given/new distinctions), allowing for in-depth analysis of the factors involved in prominence assignment. This is joint work with Dan Jurafsky and other colleagues and parts of it have been presented at NAACL-HLT'07, Interspeech'07 and ASRU'07.

Speaker Biography

Ani Nenkova is an assistant professor of computer and information science at the University of Pennsylvania. Prior to this appointment she worked as a postdoctoral fellow with Dan Jurafsky at Stanford University. She holds a Ph.D. degree from Columbia University where she worked on different aspects of multi-document summarization of news.

February 12, 2008 | 04:30 pm

“Combining outputs from multiple machine translation systems”   Video Available

Antti-Veikko Rosti, BBN

[abstract] [biography]

Abstract

The interest in system combination for machine translation has recently increased due to programs involving multiple sites. In programs such as DARPA GALE, the sites develop MT systems independently for the same task. As these systems have different strengths and only a single output for each task is evaluated, several methods to combine the outputs from all systems to leverage their strengths have been explored. The system combination efforts within the AGILE team from the beginning of the GALE program until the recent re-test are presented in this talk. The talk will cover topics from two recent papers presented at the 2007 NAACL-HLT and ACL conferences as well as the latest improvements developed for the GALE Phase 2 re-test. Related papers: http://acl.ldc.upenn.edu/N/N07/N07-1029.pdf http://acl.ldc.upenn.edu/P/P07/P07-1040.pdf

Speaker Biography

Antti-Veikko Rosti received his MSc in information technology from Tampere University of Technology, Finland, and PhD in information engineering from Cambridge University, UK. He joined IBM Research as a postdoctoral researcher in Yorktown Heights, NY in 2004. Since 2005 he has been a scientist at BBN Technologies in Cambridge, MA. His research interests are in statistical signal processing and machine learning with a particular emphasis on their application to audio, speech, and language processing.

February 5, 2008 | 04:30 pm

“Recent Advances in Audio Information Retrieval”   Video Available

Bhuvana Ramabhadran, IBM

[abstract] [biography]

Abstract

Early word-spotting systems processed the audio signal to produce phonetic transcripts without the use of an automatic speech recognition (ASR) system. In the past decade, most of the research efforts on spoken data retrieval have focused on extending classical IR techniques to word transcripts. Some of these have been done in the framework of the NIST TREC Spoken Document Retrieval tracks. The use of word and phonetic transcripts was explored more recently in the context of the Spoken Term Detection (STD) 2006 evaluation conducted by NIST. In this talk, I will begin with IBM's submission to the STD evaluation and cover recent work at IBM to enhance the performance of the end-to-end audio search system. The first technique proposes the use of a similarity measure based on a phonetic confusion matrix that accounts for higher-order phonetic confusions (phone bi-grams and tri-grams) and the second is an application of vector space modeling, particularly Latent Semantic Analysis (LSA), to shortlist the most relevant audio segments, resulting in the same level of performance when using only 3% of the overall collection instead of the entire collection for search.
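
The LSA shortlisting step can be pictured as: build a term-by-segment matrix, take a truncated SVD, fold the query into the reduced space, and rank segments by cosine similarity. The numpy sketch below does exactly that on a made-up toy collection; it is not IBM's implementation.

```python
import numpy as np

def lsa_shortlist(term_doc, query_vec, k=2, top=3):
    """Rank documents (here: audio segments) against a query in a rank-k
    latent semantic space: truncated SVD of the term-by-document matrix,
    standard query fold-in, then cosine ranking."""
    U, S, Vt = np.linalg.svd(term_doc, full_matrices=False)
    Uk, Sk, Vtk = U[:, :k], S[:k], Vt[:k, :]
    docs_k = Vtk.T                                   # documents in the latent space
    q_k = (query_vec @ Uk) / Sk                      # fold the query in
    sims = docs_k @ q_k / (np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k) + 1e-12)
    return np.argsort(-sims)[:top]

# toy term-by-segment counts (rows: terms, columns: audio segments)
term_doc = np.array([[2, 0, 1, 0],
                     [1, 0, 2, 0],
                     [0, 3, 0, 1],
                     [0, 1, 0, 2]], dtype=float)
query = np.array([1, 1, 0, 0], dtype=float)          # query mentions the first two terms
print(lsa_shortlist(term_doc, query))                # segments 0 and 2 come out on top
```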

Speaker Biography

Dr. Bhuvana Ramabhadran is a Research Staff Member in the Multilingual Analytics and User Technologies at the IBM T.J. Watson Research Center. Since joining IBM in 1995, she has made significant contributions to the ViaVoice line of products and served as the Principal Investigator for the NSF-funded project, Multilingual Access to Large Spoken Archives: MALACH and EU-funded project, TC-STAR: Technology and Corpora for Speech-to-Speech Translation. She currently manages a group that focuses on large vocabulary speech transcription, audio information retrieval and text-to-speech synthesis. Her research interests include speech recognition algorithms, statistical signal processing, pattern recognition and biomedical engineering.

Back to Top

2007

December 4, 2007 | 04:30 pm

“Only connect! Two explorations in using graphs for IR and NLP”   Video Available

Lillian Lee, Cornell

[abstract] [biography]

Abstract

Can we create a system that can learn to understand political speeches well enough to determine the speakers' viewpoints? Can we improve information retrieval by using link analysis, as is famously done in Web search, if we are dealing with documents that don't contain hyperlinks? And how can these two questions form the basis of a coherent talk? Answer: graphs! Joint work with Oren Kurland, Bo Pang, and Matt Thomas.

Speaker Biography

Lillian Lee is an associate professor of computer science at Cornell University. She is the recipient of the Best Paper Award at HLT-NAACL 2004 (joint with Regina Barzilay), a citation in "Top Picks: Technology Research Advances of 2004" by Technology Research News (also joint with Regina Barzilay), and an Alfred P. Sloan Research Fellowship.

November 27, 2007 | 04:30 pm

“Elements of inference”   Video Available

Tommi Jaakkola, MIT

[abstract] [biography]

Abstract

Most engineering and science problems involve modeling. We need inference calculations to draw predictions from the models or to estimate them from available measurements. In many cases the inference calculations can be done only approximately as in decoding, sensor networks, or in modeling biological systems. At the core, inference tasks tie together three types of problems: counting (partition function), geometry (valid marginals), and uncertainty (entropy). Most approximate inference methods can be viewed as different ways of simplifying this three-way combination. Much of recent effort has been spent on developing and understanding distributed approximation algorithms that reduce to local operations in an effort to solve a global problem. In this talk I will provide an optimization view of approximate inference algorithms, exemplify recent advances, and outline some of the many open problems and connections that are emerging due to modern applications.

Speaker Biography

Tommi Jaakkola received the M.Sc. degree in theoretical physics from Helsinki University of Technology, Finland, and Ph.D. from MIT in computational neuroscience. Following a postdoctoral position in computational molecular biology (DOE/Sloan fellow, UCSC) he joined the MIT EECS faculty in 1998. He received the Sloan Research Fellowship in 2002. His research interests include many aspects of machine learning, statistical inference and estimation, and algorithms for various modern estimation problems such as those involving multiple predominantly incomplete data sources. His applied research focuses on problems in computational biology such as transcriptional regulation.

November 13, 2007 | 04:30 pm

“When Will Computers Understand Shakespeare?”   Video Available

Jerry Hobbs, University of Southern California

[abstract]

Abstract

In this talk I will examine problems encountered in coming to some kind of understanding of one sonnet by Shakespeare (his 64th), ask what it would take to solve these problems computationally, and suggest routes to the solution. The general conclusion is that we are closer to this goal than one might think. Or are we?

November 6, 2007 | 04:30 pm

“Developing Efficient Models of Intrinsic Speech Variability”   Video Available

Richard Rose, McGill University

[abstract] [biography]

Abstract

There are a variety of modeling techniques used in automatic speech recognition that have been developed with the goal of representing potential sources of intrinsic speech variability in a low dimensional subspace. The focus of much of the research in this area has been on "speaker space" based approaches where it is assumed that statistical models for an unknown speaker lie in a space whose basis vectors represent relevant variation among a set of reference speakers. As an alternative to these largely data driven approaches, more structured feature and model representations have been developed that are based on theories of speech production and acoustic phonetics. The performance improvements obtained by speaker space approaches like eigenvoice modeling, cluster adaptive training, and several others have been reported for speaker adaptation in many ASR task domains where only small amounts of adaptation data are available. The potential of systems based on phonological distinctive features has also been demonstrated on far more constrained task domains. This talk presents discussion and experimental results that attempt to explore the potential advantages of both classes of techniques. We will also focus on the limitations of these techniques in addressing some of the basic problems that still exist in state of the art ASR systems.
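
In speaker-space methods such as eigenvoice adaptation, the new speaker's stacked model means are constrained to lie in the span of a few reference-speaker basis vectors, so adaptation reduces to estimating a handful of coordinates. The sketch below is a least-squares caricature of that projection (real systems estimate the coordinates with EM over frame posteriors); all names, sizes, and data are hypothetical.

```python
import numpy as np

def eigenvoice_adapt(eigenvoices, mean_voice, observed_supervector):
    """Project an observed (noisy, partially estimated) mean supervector onto
    the affine speaker space mean_voice + span(eigenvoices) by least squares.
    A simplified closed-form stand-in for the usual EM estimation."""
    E = eigenvoices                                  # shape (dim, n_basis)
    coords, *_ = np.linalg.lstsq(E, observed_supervector - mean_voice, rcond=None)
    return mean_voice + E @ coords, coords

# toy speaker space: 100-dimensional supervectors, 3 basis directions
rng = np.random.default_rng(4)
dim, n_basis = 100, 3
E = np.linalg.qr(rng.standard_normal((dim, n_basis)))[0]
mean_voice = rng.standard_normal(dim)
true_coords = np.array([1.5, -0.7, 0.3])
observed = mean_voice + E @ true_coords + 0.05 * rng.standard_normal(dim)
adapted, coords = eigenvoice_adapt(E, mean_voice, observed)
print(np.round(coords, 2))                           # close to [1.5, -0.7, 0.3]
```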

Speaker Biography

Richard Rose was a member of the Speech Systems Technology Group at MIT Lincoln Laboratory working on speech recognition and speaker recognition from 1988 to 1992. He was with AT&T from 1992 to 2003, specifically in the Speech and Image Processing Services Laboratory at AT&T Labs – Research in Florham Park, NJ after 1996.   Currently, he is an associate professor of Electrical and Computer Engineering at McGill University in Montreal, Quebec. Professor Rose served as a member of the IEEE Signal Processing Society (SPS) Technical Committee on Digital Signal Processing, as a member of the (SPS) Board of Governors, as associate editor for the IEEE Transactions on Speech and Audio Processing, as a member of the IEEE SPS Speech Technical Committee, on the editorial board for the Speech Communication Journal, and was founding editor of the STC Newsletter. He was also recently the Co-chair of the IEEE 2005 Workshop on Automatic Speech Recognition and Understanding.

October 30, 2007 | 04:30 pm

“Maximum Entropy and Species Distribution Modeling”   Video Available

Rob Schapire, Princeton

[abstract] [biography]

Abstract

Modeling the geographic distribution of a plant or animal species is a critical problem in conservation biology: to save a threatened species, one first needs to know where it prefers to live, and what its requirements are for survival. From a machine-learning perspective, this is an especially challenging problem in which the learner is presented with no negative examples and often only a tiny number of positive examples. In this talk, I will describe the application of maximum-entropy methods to this problem, a set of decades-old techniques that happen to fit the problem very cleanly and effectively. I will describe a version of maxent that we have shown enjoys strong theoretical performance guarantees that enable it to perform effectively even with a very large number of features. I will also describe some extensive experimental tests of the method, as well as some surprising applications. This talk includes joint work with Miroslav Dudík and Steven Phillips.
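
The core of the maxent approach can be stated in a few lines: choose the distribution over map cells whose feature expectations match the empirical averages of the presence records while staying as close to uniform as possible. The unregularized toy version below fits p(cell) proportional to exp(w·f(cell)) by gradient ascent; the actual species-distribution systems described above add regularization and many feature classes, so treat this only as a sketch.

```python
import numpy as np

def fit_maxent(cell_features, presence_cells, iters=500, lr=0.5):
    """Fit p(cell) ∝ exp(w·f(cell)) so that model feature expectations match
    the empirical means over presence cells (unregularized, for illustration)."""
    emp_mean = cell_features[presence_cells].mean(axis=0)    # empirical constraints
    w = np.zeros(cell_features.shape[1])
    for _ in range(iters):
        logits = cell_features @ w
        p = np.exp(logits - logits.max())
        p /= p.sum()                                          # model distribution over cells
        model_mean = p @ cell_features
        w += lr * (emp_mean - model_mean)                     # gradient of the log-likelihood
    return w, p

# toy landscape: 100 cells described by two environmental features
rng = np.random.default_rng(5)
cell_features = rng.uniform(0, 1, size=(100, 2))             # e.g. elevation, rainfall
presence_cells = np.argsort(-cell_features[:, 0])[:10]       # species seen in high-feature-0 cells
w, p = fit_maxent(cell_features, presence_cells)
print(np.round(w, 2))                                         # weight on feature 0 is large and positive
```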

Speaker Biography

Robert Schapire received his ScB in math and computer science from Brown University in 1986, and his SM (1988) and PhD (1991) from MIT under the supervision of Ronald Rivest. After a short post-doc at Harvard, he joined the technical staff at AT&T Labs (formerly AT&T Bell Laboratories) in 1991 where he remained for eleven years. At the end of 2002, he became a Professor of Computer Science at Princeton University. His awards include the 1991 ACM Doctoral Dissertation Award, the 2003 Gödel Prize and the 2004 Kanellakis Theory and Practice Award (both of the last two with Yoav Freund). His main research interest is in theoretical and applied machine learning.

October 23, 2007 | 04:30 pm

“Modelling human syntax by means of probabilistic dependency grammars”   Video Available

Matthias Buch-Kromann, Center for Computational Modelling of Language, Department of Computational Linguistics, Copenhagen Business School

[abstract] [biography]

Abstract

Probabilistic dependency grammars have played an important role in computational linguistics since they were introduced by Collins (1996) and Eisner (1996). In most computational formulations of dependency grammar, a dependency grammar can be viewed as a projective context-free grammar in which all phrases have a lexical head. However, there are many linguistic phenomena that a context-free dependency grammar cannot properly account for, such as non-projective word order (in topicalizations, scramblings, and extrapositions), secondary dependencies (in complex VPs, control constructions, relative clauses, elliptic coordinations and parasitic gaps), and punctuation (which is highly context-sensitive). In the talk, I will present a generative dependency model that can account for these phenomena and others. Although exact probabilistic parsing is NP-hard in this model, heuristic parsing need not be, and I will briefly describe a family of error-driven incremental parsing algorithms with repair that have time complexity O(n log^k(n)) given realistic assumptions about island constraints. In this parsing framework, the dependency model must assign probabilities to partial dependency analyses. I will show one way of doing this and outline how it introduces the need for adding time-dependence into the model in order to support the left-right incremental processing of the text.

Speaker Biography

Matthias Buch-Kromann is head of the Computational Linguistics Group at the Copenhagen Business School (CBS). He is also a member of the Center for Computational Modelling of Language and the Center for Research in Translation and Translation Technology at CBS. His current research interests include dependency treebanks, probabilistic dependency models of texts and translations, and computational models of human parsing and translation. His dr.ling.merc. dissertation (habilitation) from 2006 proposes a dependency-based model of human parsing and language learning. He has been the driving force behind the 100,000 word Danish Dependency Treebank (used in the CoNLL 2006 shared task) and the Copenhagen Danish-English Parallel Dependency Treebank.

October 16, 2007 | 04:30 pm

“OpenFst: a General and Efficient Weighted Finite-State Transducer Library”   Video Available

Michael Riley, Google

[abstract] [biography]

Abstract

We describe OpenFst, an open-source library for weighted finite-state transducers (WFSTs). OpenFst consists of a C++ template library with efficient WFST representations and over twenty-five operations for constructing, combining, optimizing, and searching them. At the shell-command level, there are corresponding transducer file representations and programs that operate on them. OpenFst is designed to be both very efficient in time and space and to scale to very large problems. This library has key applications in speech, image, and natural language processing, pattern and string matching, and machine learning. We give an overview of the library, including an outline of some key algorithms, examples of its use, details of its design that allow customizing the labels, states, and weights, and the lazy evaluation of many of its operations. Further information and a download of the OpenFst library can be obtained from the OpenFst web site. Joint work with: Cyril Allauzen, Johan Schalkwyk, Wojtek Skut and Mehryar Mohri.
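
To illustrate the kind of operation the library provides, here is a toy composition of two epsilon-free weighted transducers over the tropical semiring (path weights add) in plain Python. This is not the OpenFst API, which is C++ with shell-level tools; the tuple representation and the example lexicon are invented purely for illustration.

```python
from collections import defaultdict

def compose(fst1, fst2):
    """Compose two epsilon-free weighted transducers over the tropical
    semiring. Each FST is (start, finals, arcs) with arcs[state] a list of
    (input, output, weight, next_state). The result reads fst1's input
    symbols and writes fst2's output symbols; arc weights add along paths."""
    start = (fst1[0], fst2[0])
    finals = {(f1, f2) for f1 in fst1[1] for f2 in fst2[1]}
    arcs = defaultdict(list)
    stack, seen = [start], {start}
    while stack:
        q1, q2 = stack.pop()
        for i1, o1, w1, n1 in fst1[2].get(q1, []):
            for i2, o2, w2, n2 in fst2[2].get(q2, []):
                if o1 == i2:                        # match middle symbols
                    nxt = (n1, n2)
                    arcs[(q1, q2)].append((i1, o2, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return start, finals, dict(arcs)

# a toy pronunciation transducer composed with a toy one-word acceptor
lexicon = (0, {1}, {0: [("k", "cat", 0.5, 1), ("d", "dog", 0.7, 1)]})
words = (0, {1}, {0: [("cat", "cat", 0.0, 1)]})
print(compose(lexicon, words))
```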

Speaker Biography

Michael Riley received his B.S., M.S. and Ph.D. in computer science from MIT. He joined Bell Labs in Murray Hill, NJ in 1987 and moved to AT&T Labs in Florham Park, NJ in 1996. He is currently a member of the research staff at Google, Inc. in New York City. His interests include speech and natural language processing, text analysis, information retrieval, and machine learning.

October 9, 2007 | 04:30 pm

“New Methods to Capture and Exploit Multiscale Speech Dynamics: From Mathematical Models to Forensic Tools”   Video Available

Patrick Wolfe, Statistics and Information Sciences Laboratory (SISL), Harvard University

[abstract] [biography]

Abstract

The variability inherent in speech waveforms gives rise to powerful temporal and spectral dynamics that evolve across multiple scales, and in this talk we describe new methods to capture and exploit these multiscale dynamics. First we consider the canonical task of formant estimation, formulated as a statistical model-based tracking problem. We extend a recent model of Deng et al. both to account for the uncertainty of speech presence by way of a censored likelihood formulation, as well as to explicitly model formant cross-correlation via a vector autoregression. Our results indicate an improvement of 20-30% relative to benchmark formant analysis tools. In the second part of the talk we present a new adaptive short-time Fourier analysis-synthesis scheme for signal analysis, and demonstrate its efficacy in speech enhancement. While a number of adaptive analyses have previously been proposed to overcome the limitations of fixed time-frequency resolution schemes, we derive here a modified overlap-add procedure that enables efficient resynthesis of the speech waveform. Measurements and listening tests alike indicate the potential of this approach to yield a clear improvement over fixed-resolution enhancement systems currently used in practice.
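
The analysis-synthesis machinery referred to above is windowed short-time Fourier analysis with overlap-add resynthesis. A fixed-resolution numpy version is sketched below to show why reconstruction works; the contribution described in the talk is making the time-frequency resolution adaptive, which this sketch does not attempt.

```python
import numpy as np

def stft(x, win, hop):
    """Short-time Fourier analysis with a given window and hop size."""
    frames = [x[i:i + len(win)] * win for i in range(0, len(x) - len(win) + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

def istft(spec, win, hop, length):
    """Weighted overlap-add resynthesis; dividing by the summed squared
    window compensates for the analysis and synthesis windowing."""
    y = np.zeros(length)
    norm = np.zeros(length)
    for k, frame_spec in enumerate(spec):
        start = k * hop
        y[start:start + len(win)] += np.fft.irfft(frame_spec, n=len(win)) * win
        norm[start:start + len(win)] += win ** 2
    return y / np.maximum(norm, 1e-12)

# reconstruction check with a Hann window and 50% overlap
x = np.random.default_rng(6).standard_normal(4000)
win = np.hanning(512)
spec = stft(x, win, hop=256)
y = istft(spec, win, hop=256, length=len(x))
print(np.max(np.abs(x[512:3000] - y[512:3000])))   # tiny, away from the edges
```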

Speaker Biography

Patrick J. Wolfe is currently Assistant Professor of Electrical Engineering in the School of Engineering and Applied Sciences at Harvard, with appointments in the Department of Statistics and the Harvard-MIT Program in Speech and Hearing Biosciences and Technology. He received a B.S. in Electrical Engineering and a B.Mus. concurrently from the University of Illinois at Urbana-Champaign in 1998, both with honors. He then earned his Ph.D. in Engineering from the University of Cambridge (UK) as an NSF Graduate Research Fellow, working on the application of perceptual criteria to statistical audio signal processing. Prior to founding the Statistics and Information Sciences Laboratory at Harvard in 2004, Professor Wolfe held a Fellowship and College Lectureship jointly in Engineering and Computer Science at New Hall, a University of Cambridge constituent college where he also served as Dean. He has also taught in the Department of Statistical Science at University College, London, and continues to act as a consultant to the professional audio community in government and industry. At Harvard he teaches a variety of courses on advanced topics in inference, information, and statistical signal processing, as well as applied mathematics and statistics at the undergraduate level. In addition to his diverse teaching activities, Professor Wolfe has published in the literatures of engineering, computer science, and statistics, and has received honors from the IEEE, the Acoustical Society of America, and the International Society for Bayesian Analysis. His research group focuses on statistical signal processing for modern high-dimensional data sets such as speech waveforms and color images, and is supported by a number of grants and partnerships, including sponsored projects with NSF, DARPA, and Sony Electronics, Inc. Recent research highlights include a paper award at the 2007 IEEE International Conference on Image Processing for work in color image acquisition, a new approach to speech formant tracking that yields up to 30% improvement relative to benchmark methods, and a set of matrix approximation techniques for spectral methods in machine learning, with error bounds that improve significantly upon known results.

October 2, 2007 | 04:30 pm

“Detecting Deceptive Speech”   Video Available

Julia Hirschberg, Columbia University

[abstract] [biography]

Abstract

This talk will discuss production and perception studies of deceptive speech and the acoustic/prosodic and lexical cues associated with deception. Experiments in which we collected a large corpus of deceptive and non-deceptive speech from naive subjects in the laboratory are described, together with perception experiments of this corpus. Features extracted from this corpus have been used in Machine Learning experiments to predict deception with classification accuracy from 64.0-66.4%, depending upon feature-set and learning algorithm. This performance compares favorably with the performance of human judges on the same data and task, which averaged 58.2%. We also discuss current findings on the role of personality factors in deception detection, speaker-dependent models of deception, and future research. This work was done in collaboration with Frank Enos, Columbia University; Elizabeth Shriberg, Andreas Stolcke, and Martin Graciarena, SRI/ICSI; Stefan Benus, Brown University; and others.

Speaker Biography

Julia Hirschberg is Professor of Computer Science at Columbia University. From 1985-2003 she worked at Bell Labs and AT&T Labs, as member of Technical Staff working on intonation assignment in text-to-speech synthesis and then as Head of the Human Computer Interaction Research Department. Her research focusses on prosody in speech generation and understanding. She currently works on speech summarization, emotional speech, charismatic speech, deceptive speech, and dialogue prosody. Hirschberg was President of the International Speech Communication Association from 2005-2007 and co-editor-in-chief of Speech Communication from 2003-2006. She was editor-in-chief of Computational Linguistics and on the board of the Association for Computational Linguistics from 1993-2003. She has been a fellow of the American Association for Artificial Intelligence since 1994.

September 27, 2007 | 04:30 pm

“Structural Alignment for Finite-State Syntactic Processing”   Video Available

Brian Roark, OGI School of Science and Engineering at OHSU

[abstract] [biography]

Abstract

In this talk we will present some preliminary experiments on using multi-sequence alignment (MSA) techniques for inducing monolingual finite-state tagging models that capture some global sequence information. Such MSA techniques are popular in bio-sequence processing, where key information about long-distance dependencies and three-dimensional structures of protein or nucleotide sequences can be captured without resorting to polynomial complexity context-free models. In the NLP community, such techniques have been used very little -- most notably for aligning paraphrases (Barzilay and Lee, 2003) -- and not at all for monolingual syntactic processing. We discuss key issues in pursuing this approach: syntactic functional alignment; inducing multi-sequence alignments; and using such alignments in tagging. Experiments are preliminary but promising.

Speaker Biography

Brian Roark is a faculty member in the Center for Spoken Language Understanding (CSLU) and Department of Computer Science and Electrical Engineering (CSEE) of the OGI School of Science and Engineering at OHSU. He was in the Speech Algorithms Department at AT&T Labs from 2001-2004. He finished his Ph.D. in the Department of Cognitive and Linguistic Sciences at Brown University in 2001. At Brown he was part of the Brown Laboratory for Linguistic Information Processing.

September 18, 2007 | 04:30 pm

“Transcribing Speech for Language Processing”   Video Available

Mari Ostendorf, University of Washington

[abstract] [biography]

Abstract

With recent advances in automatic speech recognition, there are growing opportunities for natural language processing of speech, including applications such as information extraction, summarization and translation. As speech processing moves from simple word transcription to document processing and analyses of human interactions, it becomes increasingly important to represent structure in spoken language and incorporate structure in performance optimization. In this talk, we consider two types of structure: segmentation and syntax. Virtually all types of language processing technology, having been developed on written text, assumes knowledge of sentence boundaries; hence, sentence segmentation is critical for spoken document processing. Experiments show that sentence segmentation has a significant impact on performance of tasks such as parsing, translation and information extraction. However, optimizing for downstream task performance leads to different operating points for different tasks, which we claim argues for the additional use of subsentence prosodic structure. Parsing itself is an important analysis tool used in many human language technologies, and jointly optimizing speech recognition performance for parse and word error benefits these applications. Moreover, we show that optimizing recognition for parsing performance can benefit subsequent language processing (e.g. translation) even when parse structure is not explicitly used, because of the increased importance placed on constituent headwords. Of course, if parsing is part of the ultimate objective, recognition benefits even more from parsing language models than with simple word error rate criteria. A complication arises in working with conversational speech due to the presence of disfluencies, which reinforces the argument for subsentence prosodic modeling and explicit representation of disfluencies in parsing models.

Speaker Biography

Mari Ostendorf received the Ph.D. in electrical engineering from Stanford University in 1985. After working at BBN Laboratories (1985-1986) and Boston University (1987-1999), she joined the University of Washington (UW) in 1999. She has also served as a visiting researcher at the ATR Interpreting Telecommunications Laboratory in Japan in 1995 and at the University of Karlsruhe in 2005-2006. At UW, she is currently an Endowed Professor of System Design Methodologies in Electrical Engineering and an Adjunct Professor in Computer Science and Engineering and in Linguistics. She teaches undergraduate and graduate courses in signal processing and statistical learning, including a project-oriented freshman course that introduces students to signal processing and communications. Prof. Ostendorf's research interests are in dynamic and linguistically-motivated statistical models for speech and language processing. Her work has resulted in over 160 publications and 2 paper awards. Prof. Ostendorf has served on numerous technical and advisory committees, as co-Editor of Computer Speech and Language (1998-2003), and now as the Editor-in-Chief of the IEEE Transactions on Audio, Speech and Language Processing. She is a Fellow of IEEE and a member of ISCA, ACL, SWE and Sigma Xi.

August 1, 2007

“Field Methods for Natural Language Processing”   Video Available

Kevin Cohen, Center for Computational Pharmacology, University of Colorado, School of Medicine

[abstract]

Abstract

Software testing is a first-class research object in computer science, but so far has not been studied in the context of natural language processing. Testing of language processing applications is qualitatively different from testing other types of applications, because language itself is qualitatively different from other classes of inputs. Nonetheless, a methodology for testing NLP applications already exists. It is theoretically isomorphic with descriptive and structural linguistics, and its praxis is isomorphic with linguistic field methods. In this talk, I present data on the state of software testing for a popular class of text mining application, show the commonalities between software testing and linguistic field methods, and illustrate a number of benefits that accrue from approaching language processing from a software testing perspective in general, and from a descriptive linguistic perspective in particular.

July 25, 2007

“Generalized Principal Components Analysis”   Video Available

Rene Vidal, Johns Hopkins University

[abstract] [biography]

Abstract

Over the past two decades, we have seen tremendous advances on the simultaneous segmentation and estimation of a collection of models from sample data points, without knowing which points correspond to which model. Most existing segmentation methods treat this problem as "chicken-and-egg", and iterate between model estimation and data segmentation. This lecture will show that for a wide variety of data segmentation problems (e.g. mixtures of subspaces), the "chicken-and-egg" dilemma can be tackled using an algebraic geometric technique called Generalized Principal Component Analysis (GPCA). This technique is a natural extension of classical PCA from one to multiple subspaces. The lecture will touch upon a few motivating applications of GPCA in computer vision, such as image/video segmentation, 3-D motion segmentation or dynamic texture segmentation, but will mainly emphasize the basic theory and algorithmic aspects of GPCA.
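
To make the "no chicken-and-egg iteration" point concrete, here is a toy sketch of the GPCA idea for the simplest case of two lines through the origin in the plane; the data, noise level, and clustering threshold are illustrative assumptions, and the general algebraic machinery of GPCA is not reproduced.

```python
# Toy GPCA-style segmentation of two lines through the origin: embed points
# with the degree-2 Veronese map, recover the polynomial vanishing on the data
# from the null space of the embedded data matrix, and use the polynomial's
# gradient (the subspace normal) at each point to segment the data directly.
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(0.3, 2.0, size=100) * rng.choice([-1.0, 1.0], size=100)
line1 = np.outer(t[:50], [1.0, 0.5])                 # points on the line y = x/2
line2 = np.outer(t[50:], [1.0, -2.0])                # points on the line y = -2x
X = np.vstack([line1, line2]) + 0.01 * rng.normal(size=(100, 2))

V = np.column_stack([X[:, 0]**2, X[:, 0]*X[:, 1], X[:, 1]**2])   # Veronese embedding
c = np.linalg.svd(V)[2][-1]                          # coefficients of the vanishing polynomial

def normal_at(x):
    g = np.array([2*c[0]*x[0] + c[1]*x[1],           # gradient of c0*x^2 + c1*xy + c2*y^2
                  c[1]*x[0] + 2*c[2]*x[1]])
    return g / np.linalg.norm(g)

normals = np.array([normal_at(x) for x in X])
labels = (np.abs(normals @ normals[0]) > 0.9).astype(int)   # same normal direction as point 0?
print(labels[:50].sum(), labels[50:].sum())          # expected to print 50 and 0
```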

Speaker Biography

Professor Vidal received his B.S. degree in Electrical Engineering (highest honors) from the Pontificia Universidad Catolica de Chile in 1997 and his M.S. and Ph.D. degrees in Electrical Engineering and Computer Sciences from the University of California at Berkeley in 2000 and 2003, respectively. He was a research fellow at the National ICT Australia from September 2003 and joined The Johns Hopkins University in January 2004 as an Assistant Professor in the Department of Biomedical Engineering and the Center for Imaging Science. His areas of research are biomedical imaging (DTI registration and clustering, heart motion analysis), computer vision (segmentation of static and dynamic scenes, multiple view geometry, omnidirectional vision), machine learning (generalized principal component analysis GPCA, kernel GPCA, dynamic GPCA), vision-based coordination and control of unmanned vehicles, and hybrid systems identification and control. Dr. Vidal is the recipient of the 2005 NSF CAREER Award and the 2004 Best Paper Award Honorable Mention (with Prof. Yi Ma) for his work on "A Unified Algebraic Approach to 2-D and 3-D Motion Segmentation" presented at the European Conference on Computer Vision. He also received the 2004 Sakrison Memorial Prize for "completing an exceptionally documented piece of research", the 2003 Eli Jury award for "outstanding achievement in the area of Systems, Communications, Control, or Signal Processing", the 2002 Student Continuation Award from NASA Ames, the 1998 Marcos Orrego Puelma Award from the Institute of Engineers of Chile, and the 1997 Award of the School of Engineering of the Pontificia Universidad Catolica de Chile to the best graduating student of the school. He is a program chair for PSIVT 2007 and an area chair for CVPR 2005 and ICCV 2007.

June 1, 2007

“Learning Under Differing Training and Test Distributions”   Video Available

Tobias Scheffer, Machine Learning Research Group of the Max Planck Institute for Computer Science

[abstract] [biography]

Abstract

Most learning algorithms are constructed under the assumption that the training data is governed by the exact same distribution to which the model will later be exposed. In practice, control over the data generation process is often less perfect. Training data may consist of a benchmark corpus (e.g., the Penn Treebank) that does not reflect the distribution of sentences that a parser will later be used for. Spam filters may be used by individuals whose distribution of inbound emails diverges from the distribution reflected in public training corpora (e.g., the TREC spam corpus). In the talk, I will analyze the problem of learning classifiers that perform well under a test distribution that may differ arbitrarily from the training distribution. I will discuss the appropriate optimization criterion and solutions, including a kernel logistic regression classifier for differing training and test distributions. In filtering spam, phishing and virus emails, distributions vary greatly over users, IP domains, and over time. Taking into account that spam senders change their email templates in response to the filtering mechanisms employed leads to the related but even more challenging problem of adversarial learning.
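
The talk's kernel logistic regression formulation is not reproduced here, but a commonly used relative of it is importance weighting: train a probabilistic classifier to separate training inputs from (unlabeled) test inputs and use its odds as an estimate of the density ratio p_test(x)/p_train(x). The sketch below, on hypothetical synthetic data, illustrates that idea.

```python
# Illustrative sketch (not the speaker's exact method): estimate importance
# weights w(x) = p_test(x) / p_train(x) with a classifier that separates test
# inputs from training inputs, then reweight the training loss with them.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical covariate shift: training inputs centered at 0, test inputs at 1.
X_train = rng.normal(0.0, 1.0, size=(500, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
X_test = rng.normal(1.0, 1.0, size=(500, 2))

# Classifier distinguishing "comes from test" (1) vs. "comes from training" (0).
domain_clf = LogisticRegression().fit(
    np.vstack([X_train, X_test]),
    np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))]),
)
p_test = domain_clf.predict_proba(X_train)[:, 1]
weights = p_test / (1.0 - p_test)          # density-ratio estimate, up to a constant

# Train the task classifier with importance weights on the training examples.
task_clf = LogisticRegression().fit(X_train, y_train, sample_weight=weights)
print(task_clf.coef_)
```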

Speaker Biography

Tobias Scheffer is Research Associate Professor and head of the Machine Learning Research Group of the Max Planck Institute for Computer Science. He is an adjunct faculty member of Humboldt-Universitaet zu Berlin. Between 2003 and 2006, he was a Research Assistant Professor at Humboldt-Universitaet zu Berlin. Prior to that, he worked at the University of Magdeburg, at Technische Universitaet Berlin, the University of New South Wales in Sydney and Siemens Corporate Research in Princeton, N.J. He was awarded an Emmy Noether Fellowship of the German Science Foundation DFG in 2003 and an Ernst von Siemens Fellowship by Siemens AG in 1996. He received a Master's Degree in Computer Science (Diplominformatiker) in 1995 and a Ph.D. (Dr. rer nat.) in 1999 from Technische Universitat Berlin. Tobias serves on the Editorial Board of the Data Mining and Knowledge Discovery Journal. He served as Program Chair of the European Conference on Machine Learning, and the European Conference on Principles and Practice of Knowledge Discovery in Databases.

May 10, 2007 | 04:00 pm

“A Multinomial View of Signal Spectra for Latent-Variable Analyses”   Video Available

Bhiksha Raj, MERL Research Lab

[abstract]

Abstract

The magnitude spectrum of any signal may be viewed as a density function or (in the case of discrete frequency spectra) histograms with the frequency axis as the support. In this talk I will describe how this perspective allows us to perform spectral decompositions through a latent-variable model that enables us to extract underlying, or "latent", spectral structures that additively compose the speech spectrum. I show how such decomposition can be used for varied purposes such as bandwidth expansion of narrow-band speech, component separation from mixed monaural signals, and denoising. I then explain how the basic latent-variable model may be extended to derive sparse overcomplete decompositions of speech spectra. I demonstrate through examples that such decompositions can not only be utilized for improved speaker separation from mixed monaural recordings, but also to extract the building blocks of other data such as images and text. Finally, I present shift- and transform-independent extensions of the model, through which it becomes possible to automatically extract repeating themes or objects within data such as audio, images or video.
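
A minimal sketch of the kind of decomposition described, under simplifying assumptions: the magnitude spectrogram is factored into a few non-negative spectral building blocks and their activations, using multiplicative updates for the KL divergence (which coincide, up to normalization, with the EM updates of the latent-variable model). The random "spectrogram" below stands in for the magnitude STFT of a real recording.

```python
# KL-divergence NMF of a non-negative "spectrogram" V (freq x time); the
# columns of W are latent spectral shapes and the rows of H their activations.
import numpy as np

def decompose_spectrogram(V, n_components=8, n_iter=200, eps=1e-9):
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, n_components)) + eps    # latent spectral shapes, roughly P(f|z)
    H = rng.random((n_components, T)) + eps    # their activations over time
    for _ in range(n_iter):
        R = V / (W @ H + eps)                  # ratio of data to current reconstruction
        W *= (R @ H.T) / (H.sum(axis=1) + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
    return W, H

# Illustrative data only; a real application would use |STFT| of speech.
V = np.abs(np.random.default_rng(1).normal(size=(257, 100)))
W, H = decompose_spectrogram(V)
print(W.shape, H.shape)   # (257, 8) (8, 100)
```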

May 8, 2007 | 12:00 pm

“Machine Translation = Automata Theory + Probability + Linguistics”   Video Available

Kevin Knight, USC/Information Sciences Institute

[abstract] [biography]

Abstract

Machine translation (MT) systems have been getting more accurate. One reason is that machines now gather translation knowledge autonomously, combing through large amounts of human-translated material available on the web. Most of these MT systems learn finite-state Markov models -- target strings are substituted for source strings, followed by local word re-ordering. This kind of model can only support very weak linguistic transformations, and the trained models do not yet lead to reliably high-quality MT. Over the past several years, many new probabilistic tree-based models (versus string-based models) have been designed and tested on many natural language applications, including MT. Such models frequently turn out to be instances of tree transducers, a formal automata model first described by W. Rounds and J. Thatcher in the 1960s and 70s. Tree automata open up new opportunities for us to marry deeper representations, mathematical theory, and machine learning. This talk covers novel algorithms and open problems for tree automata, together with experiments in machine translation.

Speaker Biography

Kevin Knight is a Senior Research Scientist and Fellow at USC's Information Sciences Institute, a Research Associate Professor in the Computer Science Department at USC, and co-founder of Language Weaver, Inc. He received his Ph.D. from Carnegie Mellon University in 1991 and his BA from Harvard University in 1986. He is co-author (with Elaine Rich) of the textbook Artificial Intelligence (McGraw-Hill, 1991). His research interests are in statistical natural language processing, machine translation, natural language generation, and decipherment.

May 3, 2007 | 04:00 pm

“Should Airplanes Flap Wings? (Should machine processing of sensory data take inspiration from nature?)”   Video Available

Hynek Hermansky, IDIAP Research Institute

[abstract] [biography]

Abstract

Nature's sensory systems, their corresponding processing modules, and, in some cases (such as speech) also the structure of message-carrying sensory data, have all co-evolved to ensure survival of their respective species, and hence reached a high level of effectiveness. Therefore, we argue that human-like processing often represents the most effective engineering processing for sensory data. However, we also argue that such human-like processing need not (and perhaps should not) be derived by indiscriminate emulation of all mechanisms and properties of biological systems. Rather, we think that our designs should selectively apply key human-like concepts that address the particular weaknesses of artificial algorithms, and that have not yet fully evolved in the course of the historical evolution of speech technology. We also show that these concepts may sometimes directly emerge in the course of optimizing performance of the machine algorithms on target data. The approach will be illustrated with several specific examples of algorithms that are currently being used successfully in mainstream applications.

Speaker Biography

Hynek Hermansky is a Director of Research at the IDIAP Research Institute Martigny and a Professor at the Swiss Federal Institute of Technology at Lausanne, Switzerland (among a number of other mostly unpaid affiliations). He has been working in speech processing for over 30 years, previously as a Research Fellow at the University of Tokyo, a Research Engineer at Panasonic Technologies in Santa Barbara, California, a Senior Member of Research Staff at U S WEST Advanced Technologies, and a Professor and Director of the Center for Information Processing at the OGI School of the Oregon Health and Sciences University, Portland, Oregon.

May 1, 2007 | 12:00 pm

“Factoring Speech into Linguistic Features”   Video Available

Karen Livescu, Massachusetts Institute of Technology

[abstract] [biography]

Abstract

Spoken language technologies, such as automatic speech recognition and synthesis, typically treat speech as a string of "phones". In contrast, humans produce speech through a complex combination of semi-independent articulatory trajectories. Recent theories of phonology acknowledge this, and treat speech as a combination of multiple streams of linguistic "features". In this talk I will present ways in which the factorization of speech into features can be useful in speech recognition, in both audio and visual (lipreading) settings. The main contribution is a feature-based approach to pronunciation modeling, using dynamic Bayesian networks. In this class of models, the great variety of pronunciations seen in conversational speech is explained as the result of asynchrony among feature streams and changes in individual feature values. I will also discuss the use of linguistic features in observation modeling via feature-specific classifiers. I will describe the application of these ideas in experiments with audio and visual speech recognition, and present analyses suggesting additional potential applications in speech science and technology.

Speaker Biography

Karen Livescu is a Luce Post-doctoral Fellow in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and EECS department at MIT. She completed her PhD in EECS, MIT, in 2005, and her BA in Physics at Princeton University in 1996, with a stint in between as a visiting student in EE/CS at the Technion in Israel. In the summer of 2006 she led a team project in JHU's summer workshop series on speech and language engineering. Her main research interests are in speech and language processing, with a slant toward combining statistical modeling techniques with knowledge from linguistics and speech science.

April 17, 2007 | 12:00 pm

“Protein folding and parsing”   Video Available

Julia Hockenmaier, University of Pennsylvania

[abstract] [biography]

Abstract

We know that adult speakers of a language have no problem understanding newspapers in that language, and that proteins fold spontaneously into specific three-dimensional structures. However, a sentence in the Wall Street Journal may have millions of possible grammatical analyses, and a protein may have millions of possible structures. As computer scientists who want to design systems that can either parse natural language or predict the folded structure of proteins, we are faced with two very similar search problems: In both cases, we want to find the optimal structure of an input sequence among an exponential number of possible alternatives. In this talk, I will demonstrate how CKY, a standard dynamic programming algorithm that is normally used in natural language parsing, can be adapted to give us novel insights into the protein folding problem. If we assume that folding is a greedy, hierarchical search for lowest-energy structures, CKY provides an efficient way to find all direct folding routes. I will also show that we can extend CKY to construct a Markov chain model of the entire folding process, and that this Markov chain may explain an apparent contradiction between what experimentalists observe in a test tube and what many theorists predict.
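
For readers unfamiliar with CKY, here is a minimal recognizer for a toy grammar in Chomsky normal form, included only to make the dynamic-programming analogy concrete; the protein-folding adaptation discussed in the talk replaces grammar rules with energies over subsequences, and the grammar below is an illustrative assumption.

```python
# A minimal CKY recognizer for a toy grammar in Chomsky normal form.
from collections import defaultdict

grammar = {            # binary rules A -> B C (toy example, not from the talk)
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
lexicon = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}

def cky(words):
    n = len(words)
    chart = defaultdict(set)                 # chart[(i, j)] = labels spanning words[i:j]
    for i, w in enumerate(words):
        chart[(i, i + 1)] |= lexicon.get(w, set())
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):        # split point
                for B in chart[(i, k)]:
                    for C in chart[(k, j)]:
                        chart[(i, j)] |= grammar.get((B, C), set())
    return "S" in chart[(0, n)]

print(cky("the dog saw the cat".split()))    # True
```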

Speaker Biography

Julia Hockenmaier is a postdoc with Aravind Joshi at the University of Pennsylvania, and also a frequent visitor to Ken Dill's lab at the University of California at San Francisco. Her research areas are natural language processing (computational linguistics) and computational biology, specifically natural language parsing and protein folding.

March 27, 2007 | 12:00 pm

“A synthesis of logical reasoning and word learning abilities in children and adults”   Video Available

Justin Halberda, Johns Hopkins University

[abstract]

Abstract

In this talk I will bring together two literatures: 1) work on word-learning in young children, and 2) work on the developmental origins of logical reasoning.  The predominant view in each of these literatures has been that word learning is supported by probabilistic (non-deductive) inference mechanisms, and that children display no abstract logical competence until after 5 or more years of age (after the onset of robust language ability).  I will make a case that two-year-old children have access to a particular domain general logical reasoning strategy (Disjunctive Syllogism) and that they bring this strategy to bear on the task of learning new words.  This reveals a logical competence that has not been observed before in young children and it begins to reveal the logical computations that support word learning constraints.

March 20, 2007 | 12:00 pm

“Complexity Metrics for Surface Structure Parsing”   Video Available

John Hale, Michigan State University

[abstract] [biography]

Abstract

The relationship between grammar and language behavior is not entirely clear-cut. One classic view (Chomsky 65, Bresnan & Kaplan 82, Stabler 83, Steedman 89) holds that grammars specify a time-independent body of knowledge, one that is deployed on-line by a processing mechanism. Determining the computational properties of this mechanism is thus a central problem in cognitive science. This talk demonstrates an analytical approach to this problem that divides the job up into three parts: parser = control * memory * grammar Time-dependent sentence processing predictions then follow mechanically from the conjunction of assumptions about each of the three parts (cf. Kaplan 72). Certain combinations accord with known phenomena and suggest new experimental directions. But more broadly the approach offers an explicit, positive proposal about how human sentence comprehension works and the role grammar plays in it.

Speaker Biography

John Hale is a cognitive scientist whose research focuses on computational linguistics. His recent projects have addressed human sentence processing, formal language theory and speech disfluency. He received his PhD from Johns Hopkins in 2003 and holds a joint appointment in Linguistics & Languages and Computer Science & Engineering at Michigan State University.

March 6, 2007 | 12:00 pm

“The Real Time News Analysis (RTNA) Project”   Video Available

Joan E. Forester, U.S. Army Research Laboratory

[abstract] [biography]

Abstract

Intelligence analysts have the arduous responsibility of processing large amounts of data to determine trends and relationships. Analysts must be able to gather traditional information (signal, human, and measurement and signature intelligence) and nontraditional data (financial and social context) to form actionable intelligence. One source of nontraditional data is Web-based news. The U.S. Army Research Laboratory (ARL) currently has two projects that will jointly meet part of this requirement. The first is the Real Time News Analysis (RTNA) project. RTNA is being developed to harvest real time streaming data from Web-based news sources and pre-process it by information extraction, categorization, message understanding, concept mining, and fusing. This data will then be fed to ARL's Social Network Analysis (SNA) project. This is challenging research but with a potential high payoff of providing non-traditional information quickly to analysts.

Speaker Biography

Ms. Forester received a Bachelor of Science in Computer and Information Science from Towson University in January 1987, graduating summa cum laude. She did postgraduate training at the G.W.C. Whiting School of Engineering, The Johns Hopkins University, where her major concentration was in artificial intelligence and computer vision. She is currently an operations research analyst (computer scientist) in the Computational & Information Science Directorate, Tactical Collaboration & Data Fusion Branch of the ARL, where she works on projects dealing with real time news analysis and social networking.

February 27, 2007 | 12:00 pm

“Spoken Document Retrieval and Browsing”   Video Available

Ciprian Chelba, Google Inc.

[abstract] [biography]

Abstract

Ever increasing computing power and connectivity bandwidth together with falling storage costs result in an overwhelming amount of data of various types being produced, exchanged, and stored. Consequently, search emerges as a key application as more and more data is being saved. Speech search has not received much attention due to the fact that large collections of untranscribed spoken material have not been available, mostly due to storage constraints. As storage becomes cheaper, the availability and usefulness of large collections of spoken documents is limited strictly by the lack of adequate technology to exploit them. Manually transcribing speech is expensive and sometimes outright impossible due to privacy concerns. This leads us to exploring an automatic approach to searching and navigating spoken document collections. The talk will focus on techniques for the indexing and retrieval of spoken audio files, and results on a corpus (MIT iCampus) containing recorded academic lectures.
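
As a minimal illustration of the indexing-and-retrieval setup (not the system described in the talk), the sketch below builds an inverted index over hypothetical one-best transcripts; practical spoken-document retrieval typically indexes recognition lattices together with timing and confidence information.

```python
# A minimal inverted-index sketch over hypothetical ASR transcripts.
from collections import defaultdict

transcripts = {                       # invented 1-best output per lecture
    "lecture1": "today we discuss hidden markov models for speech",
    "lecture2": "graphical models and markov random fields",
}
index = defaultdict(set)
for doc, text in transcripts.items():
    for word in text.split():
        index[word].add(doc)

def search(query):
    postings = [index[w] for w in query.split() if w in index]
    return set.intersection(*postings) if postings else set()

print(search("markov models"))        # {'lecture1', 'lecture2'}
```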

Speaker Biography

Ciprian Chelba is a Research Scientist with Google. Previously he worked as a Researcher in the Speech Technology Group at Microsoft Research. His core research interests are in statistical modeling of natural language and speech. Recent projects include speech content indexing for search in spoken documents, discriminative language modeling for large vocabulary speech recognition, as well as speech and text classification.

February 20, 2007 | 01:00 pm

“Exploiting Sparsity and Structure in Parametric and Nonparametric Estimation”   Video Available

John Lafferty, Carnegie Mellon University

[abstract] [biography]

Abstract

We present new results on sparse estimation in both the parametric setting for graphical models, and in the nonparametric setting for regression in high dimensions. For graphical models, we use l1 regularization to estimate the structure of the underlying graph in the high dimensional setting. In the case of nonparametric regression, we present a method that regularizes the derivatives of an estimator, resulting in a type of nonparametric lasso technique. In addition, we discuss the problem of semi-supervised learning, where unlabeled data is used in an attempt to improve estimation. We analyze some current regularization methods in terms of minimax theory, and develop new methods that lead to improved rates of convergence. Joint work with Han Liu, Pradeep Ravikumar, Martin Wainwright, and Larry Wasserman.
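
A small sketch of the l1-regularized structure estimation idea using scikit-learn's graphical lasso, under illustrative assumptions (synthetic Gaussian data, one induced dependency, an arbitrary threshold on the precision matrix); the nonparametric and semi-supervised results in the talk are not reproduced.

```python
# Estimate graph structure as the nonzero off-diagonal entries of an
# l1-regularized precision matrix (graphical lasso); data are synthetic.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
X[:, 1] += 0.8 * X[:, 0]          # induce one real dependency: variables 0 and 1

model = GraphicalLasso(alpha=0.1).fit(X)
precision = model.precision_
edges = [(i, j) for i in range(p) for j in range(i + 1, p)
         if abs(precision[i, j]) > 1e-4]
print(edges)                      # should include the (0, 1) edge
```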

Speaker Biography

John Lafferty is a professor in the Computer Science Department and the Machine Learning Department within the School of Computer Science at Carnegie Mellon University. His research interests are in machine learning, statistical learning theory, computational statistics, natural language processing, information theory, and information retrieval. Prof. Lafferty received the Ph.D. in Mathematics from Princeton University, where he was a member of the Program in Applied and Computational Mathematics. Before joining the faculty of CMU, he was a Research Staff Member at the IBM Thomas J. Watson Research Center, working in Frederick Jelinek's group on statistical natural language processing. Prof. Lafferty currently serves as co-Director, with Steve Fienberg, of CMU's Ph.D. Program in Computational and Statistical Learning, and as an associate editor of the Journal of Machine Learning Research. His first glimpse of the power and magic of combining statistics and computation--in the practice of what has come to be called machine learning--was seeing the first decodings emerge from the IBM statistical machine translation system in the late 1980s.

February 13, 2007 | 12:00 pm

“Diffusion Kernels on Graphs”   Video Available

Bruno Jedynak, Johns Hopkins University (CIS)

[abstract]

Abstract

The heat equation is a partial differential equation which describes the variation of temperature in a given region over time subject to boundary conditions. We will define a related equation that we will also call a heat equation in the situation where the space variable belongs to the vertices of a graph. We will review examples of graphs where the heat equation can be solved analytically. We will then discuss applications in language modeling and in image processing where solving the heat equation on a well-chosen graph can lead to interesting smoothing and denoising algorithms.
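
A small illustration of the graph heat equation, under simple assumptions (a path graph and an arbitrary initial "heat" vector): with the combinatorial Laplacian L, the solution is f(t) = exp(-tL) f(0), and applying the heat kernel exp(-tL) to a vertex signal diffuses it over the graph, which is the smoothing operation mentioned above.

```python
# Heat kernel exp(-tL) on a small graph: a spike of heat spreads out as t grows.
import numpy as np
from scipy.linalg import expm

A = np.zeros((6, 6))                      # adjacency matrix of a 6-vertex path graph
for i in range(5):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A            # combinatorial Laplacian

f0 = np.array([0., 0., 5., 0., 0., 0.])   # initial "heat" concentrated at vertex 2
for t in (0.1, 1.0, 10.0):
    K_t = expm(-t * L)                    # heat kernel / diffusion kernel
    print(t, np.round(K_t @ f0, 3))       # diffused signal at time t
```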

February 6, 2007 | 12:00 pm

“Latent factor models for relational data and social networks”   Video Available

Peter Hoff, University of Washington

[abstract] [biography]

Abstract

Relational data consist of information that is specific to pairs (triples, etc) of objects. Examples include friendships among people, trade between countries, word counts in documents and interactions among proteins. A recent approach to modeling such data is via the use of latent factor models, in which the relationship between two objects is modeled as a function of some unobserved characteristics of the objects. Such a modeling approach is related to random effects modeling and to matrix decomposition techniques, such as the eigenvalue and singular value decompositions. In the context of several data analysis examples, I will describe and motivate this modeling approach, and show how latent factor models can be used for estimation, prediction and visualization for relational data.
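
As a simplified stand-in for the latent factor idea (the talk's models add random effects, priors and link functions), the sketch below factors a synthetic relational matrix with a truncated SVD, so that each row and column object gets a short vector of latent characteristics whose inner products reconstruct the relations.

```python
# Low-rank latent-factor sketch for a relational matrix (illustrative data).
import numpy as np

rng = np.random.default_rng(0)
true_u = rng.normal(size=(30, 2))           # hypothetical latent factors
true_v = rng.normal(size=(40, 2))
Y = true_u @ true_v.T + 0.1 * rng.normal(size=(30, 40))   # observed relations

U, s, Vt = np.linalg.svd(Y, full_matrices=False)
k = 2
row_factors = U[:, :k] * s[:k]              # latent characteristics of row objects
col_factors = Vt[:k, :].T                   # latent characteristics of column objects
Y_hat = row_factors @ col_factors.T         # reconstructed / predicted relations
print(np.mean((Y - Y_hat) ** 2))            # small reconstruction error
```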

Speaker Biography

Peter Hoff is an associate professor in the departments of Statistics and Biostatistics, and a member of the Center for Statistics and the Social Sciences at the University of Washington in Seattle.

January 31, 2007 | 12:00 pm

“Learning and exploiting statistical dependencies in networks”   Video Available

David Jensen, University of Massachusetts Amherst

[abstract] [biography]

Abstract

Networks are ubiquitous in computer science and everyday life. We live embedded in social and professional networks, we communicate through telecommunications and computer networks, and we represent information in documents connected by hyperlinks and bibliographic citations. Only recently, however, have researchers developed techniques to analyze and model data about these networks. These techniques build on work in artificial intelligence, statistics, databases, graph theory, and social network analysis, and they are profoundly expanding the phenomena that we can understand and predict. Emerging applications for these new techniques include citation analysis, web mining, bioinformatics, peer-to-peer networking, computer security, epidemiology, and financial fraud detection. This talk will outline the unifying ideas behind three lines of recent work in my research group: 1) methods for learning joint distributions of variables on networks; 2) methods for navigating networks; and 3) methods for indexing network structure. All these methods share a common thread -- representing and exploiting autocorrelation. Autocorrelation (or homophily) is a common feature of many social networks. Two individuals are more likely to share similar occupations, political beliefs, or cultural backgrounds if they are neighbors. In general, a statistical dependence often exists between the values of the same variable on neighboring entities. Much of the work in my group focuses on relational dependency networks and latent group models, two methods for learning statistical dependencies in social networks. The most important discoveries made using these models are often autocorrelation dependencies. We have also developed expected-value navigation, a method that combines information about autocorrelation and degree structure to efficiently discover short paths in networks. Finally, we have developed network structure indices, a method of annotating networks with artificially created autocorrelated variables to index graph structures so that short paths can be discovered quickly. Network structure indices, in turn, provide several ways to improve our probabilistic modeling, completing a surprising cycle of research unified by the concept of autocorrelation.
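
A tiny sketch of the central quantity, autocorrelation of an attribute across links, on made-up data; relational dependency networks, latent group models, and the indexing methods described in the talk go far beyond this single statistic.

```python
# Fraction of edges whose endpoints share an attribute value (toy data).
edges = [("ann", "bob"), ("bob", "carl"), ("carl", "dee"), ("dee", "ann"), ("ann", "carl")]
occupation = {"ann": "prof", "bob": "prof", "carl": "student", "dee": "student"}

same = sum(occupation[u] == occupation[v] for u, v in edges)
print(f"edge autocorrelation: {same / len(edges):.2f}")   # 2/5 = 0.40
```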

Speaker Biography

David Jensen is Associate Professor of Computer Science and Director of the Knowledge Discovery Laboratory at the University of Massachusetts Amherst. From 1991 to 1995, he served as an analyst with the Office of Technology Assessment, an agency of the United States Congress. He received his doctorate from Washington University in 1992. His research focuses on machine learning and knowledge discovery in relational data, with applications to web mining, social network analysis, and fraud detection. He serves on the program committees of the International Conference on Knowledge Discovery and Data Mining and the International Conference on Machine Learning. He is a member of the 2006-2007 Defense Science Study Group.

Back to Top

2006

December 5, 2006 | 12:00 pm

“Improved Statistical Machine Translation Using Paraphrases”

Chris Callison-Burch, University of Edinburgh

[abstract] [biography]

Abstract

In this talk I show how automatically generated paraphrases can be used to improve the quality of statistical machine translation. Specifically, I show how paraphrases can be used to alleviate problems associated with out-of-vocabulary words and phrases. Statistical translation systems currently perform poorly when they encounter a word that was unseen in the training corpus. Since they have not learned a translation of it, they either reproduce the foreign word untranslated, or delete it. I propose replacing the unknown source phrase with a paraphrase which the model has learned the translation of, and then translating the paraphrase. I show experimental results which indicate that coverage can be increased dramatically, with most of the newly covered items translating accurately. Related publications: Chris Callison-Burch, Philipp Koehn and Miles Osborne. "Improved Statistical Machine Translation Using Paraphrases." In Proceedings NAACL-2006.
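
A toy sketch of the substitution step only, with made-up phrase-table and paraphrase entries: when a source phrase is missing from the phrase table, fall back to a known paraphrase and use its translation.

```python
# Illustrative OOV handling via paraphrase substitution (invented entries).
phrase_table = {"guidelines": "directrices", "implemented": "aplicado"}
paraphrases = {"stipulations": "guidelines", "put into practice": "implemented"}

def translate_phrase(src):
    if src in phrase_table:
        return phrase_table[src]
    if src in paraphrases and paraphrases[src] in phrase_table:
        return phrase_table[paraphrases[src]]     # translate the paraphrase instead
    return src                                    # last resort: pass through untranslated

for phrase in ("guidelines", "stipulations", "completely unseen"):
    print(phrase, "->", translate_phrase(phrase))
```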

Speaker Biography

Chris Callison-Burch is a PhD student at the University of Edinburgh. He is currently finishing his thesis entitled "Paraphrasing and Translation." This summer he participated in the CLSP summer workshop on Factored Translation Models. In 2002 he co-founded a machine translation startup company called Linear B (http://linearb.co.uk/).

November 28, 2006 | 12:00 pm

“Data Compression and Secrecy Generation”

Prakash Narayan, University of Maryland

[abstract] [biography]

Abstract

Consider a situation in which multiple terminals observe separate but correlated signals. In a multiterminal data compression problem, a la the classic work of Slepian and Wolf, each terminal seeks to acquire the signals observed by all the other terminals by means of efficiently compressed interterminal communication. This problem does not involve any secrecy constraints. On the other hand, in a secret key generation problem, the same terminals seek to devise a "common secret key" through public communication, which can be observed by an eavesdropper, in such a way that the key is concealed from the eavesdropper. Such a secret key can be used for subsequent encryption. We explain the innate connections that exist between these two problems and explore a constructive approach to secret key generation based on multiterminal data compression techniques. This talk is based on joint work with Imre Csiszar and Chunxuan Ye.

Speaker Biography

Prakash Narayan received the Bachelor of Technology degree in Electrical Engineering from the Indian Institute of Technology, Madras, in 1976, and the M.S. and D.Sc. degrees in Systems Science and Mathematics, and Electrical Engineering, respectively, from Washington University, St. Louis, MO., in 1978 and 1981. He is Professor of Electrical and Computer Engineering at the University of Maryland, College Park, with a joint appointment at the Institute for Systems Research. He is also a founding member of the Center for Satellite and Hybrid Communication Networks, a NASA Commercial Space Center. He has held visiting appointments at ETH, Zurich; the Technion, Haifa; the Renyi Institute of the Hungarian Academy of Sciences, Budapest; the University of Bielefeld; the Institute of Biomedical Engineering (formerly LADSEB), Padova; and the Indian Institute of Science, Bangalore. Dr. Narayan has served as Associate Editor for Shannon Theory for the IEEE Transactions on Information Theory; Co-Organizer of the IEEE Workshop on Multi-User Information Theory and Systems, VA (1983); Technical Program Chair of the IEEE/IMS Workshop on Information Theory and Statistics, VA (1994); General Co-Chair of the IEEE International Symposium on Information Theory, Washington, D.C. (2001); and Technical Program Co-Chair of the IEEE Information Theory Workshop, Bangalore (2002). He is a Fellow of the IEEE.

November 21, 2006 | 12:00 pm

“Scan Statistics on Enron Graphs”

Carey Priebe, Johns Hopkins University

[abstract]

Abstract

Scan statistics are commonly used to investigate an instantiation of a random field for the possible presence of a local signal. Known in the engineering literature as "moving window analysis", the idea is to scan a small window over the data, calculating some local statistic (number of events for a point pattern, perhaps, or average pixel value for an image) for each window. The supremum or maximum of these locality statistics is known as the scan statistic. In this talk, we introduce a theory of scan statistics on graphs, and we apply these ideas to the problem of anomaly detection in a time series of Enron email graphs.
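
A small sketch of a graph scan statistic under simple assumptions: the locality statistic at a vertex is the number of edges in the subgraph induced by its closed neighborhood, and the scan statistic is the maximum over vertices; the time-series normalization and the Enron data themselves are omitted.

```python
# Locality statistics and scan statistic on a small hand-made graph.
from itertools import combinations

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (5, 3)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def locality_statistic(v):
    ball = adj[v] | {v}                   # closed neighborhood of v
    return sum(1 for a, b in combinations(sorted(ball), 2) if b in adj.get(a, set()))

stats = {v: locality_statistic(v) for v in adj}
print(stats, "scan statistic:", max(stats.values()))
```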

November 14, 2006 | 07:00 pm

“Current and Future Trends in Question Answering”

Sanda Harabagiu, University of Texas at Dallas

[abstract]

Abstract

Work on Question Answering (QA) has argued that the relation between a question and its answer(s) can be cast in terms of logical entailment. In this talk a QA framework that takes advantage of question semantics and a computational framework for text entailment is presented. To be able to process complex questions in this framework, question decompositions need to be generated. Complex questions are decomposed by a random walk on a bipartite graph of relations established between concepts related to the question and sub-questions derived from topic-relevant passages. Decomposed questions and textual entailment significantly enhance the accuracy and comprehensiveness of extracted answers.

November 6, 2006 | 06:00 pm

“Search Result Summarization and Disambiguation via Contextual Dimensions”

Raghuram Krishnapuram, IBM, India Research Lab

[abstract] [biography]

Abstract

Dynamically generated topic hierarchies are a popular method of summarizing the results obtained in response to a query in various search applications. However, topic hierarchies generated by statistical techniques tend to be somewhat unintuitive, rigid and inefficient for browsing and disambiguation. In this talk, we propose an alternative approach to query disambiguation and result summarization. The approach uses a fixed set of orthogonal contextual dimensions to summarize and disambiguate search results. A contextual dimension assigns a specific type to a context, which makes it incomparable to contexts of other types. For the generic search scenario, we propose to use three types of contextual dimensions, namely, concepts, features, and specializations. We use NLP techniques to extract the three types of contexts, and a data mining algorithm to select a subset of contexts that are as distinct (i.e., mutually exclusive) as possible. Our experimental results show that we can achieve a considerable reduction in the effort required for retrieving relevant documents via the proposed interface.

Speaker Biography

Raghu Krishnapuram received his Ph.D. degree in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, in 1987. From 1987 to 1997, he was on the faculty of the Department of Computer Engineering and Computer Science at the University of Missouri, Columbia. From 1997 to 2000, Dr. Krishnapuram was a Full Professor at the Department of Mathematical and Computer Sciences, Colorado School of Mines, Golden, Colorado. Since then, he has been at IBM India Research Lab, New Delhi. Dr. Krishnapuram's research encompasses many aspects of Web mining, information retrieval, e-commerce, fuzzy set theory, neural networks, pattern recognition, computer vision, and image processing. He has published over 160 papers in journals and conferences in these areas. Dr. Krishnapuram is an IEEE Fellow, and a co-author (with J. Bezdek, J. Keller and N. Pal) of the book "Fuzzy Models and Algorithms for Pattern Recognition and Image Processing".

October 30, 2006 | 11:00 am

“The architecture of speech perception: Data from language acquisition and speech processing”

Emmanuel Dupoux, Ecole des Hautes Études en Sciences Sociales, School for Advanced Studies in Social Sciences, Paris

[abstract] [biography]

Abstract

The human speech communication system is uniquely complex. It is complex in its intricate use of levels of representations (sounds, words, sentences), and in the range of superficial variability that it displays, both across cultures and across individuals. Despite such complexities, human babies quickly and robustly acquire the language(s) spoken in their community. In this talk, we discuss speech processing architectures that could address such robustness of speech acquisition. In particular, we review recent data suggesting that speech variability is dealt with through three subsystems: (1) an acoustic-universal system, which is not learnt and is shared across species; (2) a phonetic system, learnt early in life in a bottom-up fashion and nonplastic after a critical period; and (3) a phonological system, learnt in a top-down fashion and remaining very plastic throughout life. We discuss data on early language acquisition as well as second language learning.

Speaker Biography

Prof. Dupoux's research focuses on the procedures and representations specific to the human brain that allow the baby to acquire one or several languages, using classical techniques in adult processing, newborn studies and brain imagery. He is the director of the Laboratoire des Sciences Cognitive et Psycholinguistique (Psycholinguistics and Cognitive Science Lab) in Paris. His training is in cognitive psychology, applied mathematics, computer science and engineering (Telecom).

October 24, 2006 | 12:00 pm

“Depth of Feelings: Alternatives for Modeling Affect in User Models and Cognitive Architectures”

Eva Hudlicka, Psychometrix Associates

[abstract] [biography]

Abstract

Neuroscience and psychology research over the past two decades has demonstrated a close connection between cognition and affect. Affective factors (emotions and personality traits) can profoundly influence perception, decision-making and behavior, contributing to a variety of biases and heuristics. These effects may enhance or degrade cognitive processes and performance, depending on the context. The ability to explicitly represent affective factors in user models and cognitive architectures has a number of benefits, including more accurate user models, increased realism and believability of synthetic agents, and improved effectiveness of decision-aiding systems. Consideration of affective factors can also provide disambiguating information for speech recognition and natural language understanding. This talk will first provide an overview of emotion research in psychology relevant for the construction of computational models of emotion. The talk will then discuss the motivation and alternatives for incorporating emotions and personality traits within user models and cognitive architectures. The representational and reasoning requirements for several alternative modeling approaches will be discussed, along with examples from existing cognitive-affective architectures. These include shallow and deep models of emotions, and a generic methodology for modeling multiple, interacting affective factors. The talk will conclude with a discussion of the challenges involved in building and validating computational emotion models.

Speaker Biography

Eva Hudlicka is a Principal Scientist and President of Psychometrix Associates, in Blacksburg, VA. Her primary research focus is the development of computational models of emotion; both the cognitive processes involved in appraisal, and the effects of emotions on cognition. This research is conducted within the context of a computational cognitive-affective architecture, the MAMID architecture, which implements a generic methodology for modeling the interacting effects of multiple affective factors on decision-making. Her prior research includes affect-adaptive user interfaces, visualization and UI design, decision-support system design, and knowledge elicitation. Dr. Hudlicka received her BS in Biochemistry from Virginia Tech, her MS from The Ohio State University in Computer Science, and her PhD in Computer Science from the University of Massachusetts-Amherst. Prior to founding Psychometrix Associates in 1995, she was a Senior Scientist at Bolt Beranek & Newman in Cambridge, MA.

October 16, 2006 | 04:30 pm

“The Multimedia Communications Revolution of the 21st Century”

Larry Rabiner, Rutgers University

[abstract] [biography]

Abstract

We are now in the midst of a Multimedia Communications Revolution in which virtually every aspect of telecom is changing in ways that would have been considered unthinkable just a decade or so ago. Perhaps the greatest challenge in realizing this communications revolution is to figure out how to provide a range of new services that seamlessly integrate text, sound, image, and video information and to do it in a way that preserves the ease-of-use and interactivity of conventional telephony, regardless of the bandwidth or means of access of the connection to the service. In order to achieve this overarching goal, there are a number of technological problems that must be considered, including: compression and coding of multimedia signals, including algorithmic issues, standards issues, and transmission issues; synthesis and recognition of multimedia signals, including speech, images, handwriting, and text; organization, storage, and retrieval of multimedia signals; access methods to the multimedia signal; searching; and browsing. In each of these areas a great deal of progress has been made in the past few years, driven in part by the relentless growth in processing and storage capacity of VLSI chips, and in part by the availability of broadband access to and from the home and to and from wireless connections. It is the purpose of this talk to review the status of the technology in the areas of telecom, multimedia compression, and multimedia understanding, and to illustrate some of the challenges and limitations of current capabilities.

Speaker Biography

Lawrence Rabiner was born in Brooklyn, New York, on September 28, 1943. He received the S.B. and S.M. degrees simultaneously in June 1964, and the Ph.D. degree in Electrical Engineering in June 1967, all from the Massachusetts Institute of Technology, Cambridge, Massachusetts. From 1962 through 1964, Dr. Rabiner participated in the cooperative program in Electrical Engineering at AT&T Bell Laboratories, Whippany and Murray Hill, New Jersey. During this period Dr. Rabiner worked on designing digital circuitry, issues in military communications problems, and problems in binaural hearing. Dr. Rabiner joined AT&T Bell Labs in 1967 as a Member of the Technical Staff. He was promoted to Supervisor in 1972, Department Head in 1985, Director in 1990, and Functional Vice President in 1995. He joined the newly created AT&T Labs in 1996 as Director of the Speech and Image Processing Services Research Lab, and was promoted to Vice President of Research in 1998, where he managed a broad research program in communications, computing, and information sciences technologies. Dr. Rabiner retired from AT&T at the end of March 2002 and is now a Professor of Electrical and Computer Engineering at Rutgers University, and the Associate Director of the Center for Advanced Information Processing (CAIP) at Rutgers. Dr. Rabiner is co-author of the books "Theory and Application of Digital Signal Processing" (Prentice-Hall, 1975), "Digital Processing of Speech Signals" (Prentice-Hall, 1978), "Multirate Digital Signal Processing" (Prentice-Hall, 1983), and "Fundamentals of Speech Recognition" (Prentice-Hall, 1993). Dr. Rabiner is a member of Eta Kappa Nu, Sigma Xi, Tau Beta Pi, the National Academy of Engineering, the National Academy of Sciences, and a Fellow of the Acoustical Society of America, the IEEE, Bell Laboratories, and AT&T. He is a former President of the IEEE Acoustics, Speech, and Signal Processing Society, a former Vice-President of the Acoustical Society of America, a former editor of the ASSP Transactions, and a former member of the IEEE Proceedings Editorial Board.

October 10, 2006 | 10:00 am

“Straw Man as Weapon: Simple Approaches to Processing Human Language”

James Mayfield, JHU Applied Physics Lab

[abstract] [biography]

Abstract

Many evaluations of human language technologies use a "straw man system" as a point of comparison. Yet the straw man doesn't often attract significant attention, either from the audience or from the author. In this talk, I will argue that for many human language technologies, careful selection of a straw man can lead to significant insights into the nature of the problem being studied. I will draw on examples from information retrieval and information extraction to illustrate this point.

Speaker Biography

James Mayfield is a member of the Principal Professional Staff at the Johns Hopkins University Applied Physics Laboratory, as well as Associate Research Professor in the Whiting School of Engineering. He is Principal Investigator for the HAIRCUT information retrieval project, which routinely places among the top systems in the world in international evaluations of cross-language retrieval.

July 12, 2006

“Convex Training Algorithms for Hard Machine Learning Problems”   Video Available

Dale Schuurmans, University of Alberta

[abstract] [biography]

Abstract

In this talk, I will first discuss a new unsupervised training algorithm for hidden Markov models that is discriminative, convex, and avoids the use of EM. The idea is to formulate an unsupervised version of maximum margin Markov networks (M3Ns) that can be trained via semidefinite programming. This extends our recent work on unsupervised support vector machines. The result is a discriminative training criterion for hidden Markov models that remains unsupervised and does not create local minima. Experimental results show that the convex discriminative procedure can produce better conditional models than conventional Baum-Welch (EM) training. I will then discuss how the convex relaxation approach, in general, can be used to derive effective new training algorithms for many hard machine learning problems, including outlier detection and Bayesian network structure learning.

Speaker Biography

Dale Schuurmans is a Professor of Computing Science and Canada Research Chair in Machine Learning at the University of Alberta. He received his PhD in Computer Science from the University of Toronto and MSc and BSc degrees in Computing Science and Mathematics from the University of Alberta. He has previously been an Associate Professor at the University of Waterloo, a Postdoctoral Fellow at the University of Pennsylvania, a Researcher at the NEC Research Institute, and a Research Associate at the National Research Council Canada. Prof. Schuurmans is currently an Action Editor for the Journal of Machine Learning Research and the Machine Learning journal, and served as Program Co-Chair for the International Conference on Machine Learning (in 2004). His research interests include machine learning, optimization and search. He is author of over eighty publications in these areas, and has received outstanding paper awards at the International Joint Conference on Artificial Intelligence (IJCAI) and the National Conference on Artificial Intelligence (AAAI).

May 2, 2006 | 02:00 am

“Sensing and Computation for Autonomous Surveillance Systems”

Andreas Andreou, Johns Hopkins University

[abstract]

Abstract

Sensing and decoding human motor activity into a symbolic representation yields information about the state of mind of individuals and their future actions and interactions within their environment. Automatic speech recognition by machines, the decoding of the signals produced by the human vocal apparatus, is an example of such a task long recognized as a fundamental problem of interest to the department of defense and the commercial sector. In this talk I will discuss research in my lab for sensing and computational systems capable of solving the above problem while operating in a truly autonomous manner on energy aware hardware and aimed at matching or exceeding the cognitive capabilities of humans. I make the discussion concrete by presenting the architecture of JHU-ASU (Johns Hopkins University - Autonomous/Acoustic Surveillance Unit), a prototype hardware platform developed to explore algorithms and architectures as well as test custom hardware systems for autonomous surveillance sensing and computation. The problem of surveillance is hierarchically solved by obtaining answers to the questions: Is there anything interesting in the environment, where is it and what is it? To do that we exploit a multitude of passive and active sensors, including a continuous wave acoustic radar in a distributed network.

April 25, 2006 | 12:00 pm

“Learning to Model Text Structure”   Video Available

Regina Barzilay, MIT

[abstract]

Abstract

Discourse models capture relations across different sentences in a document. These models are crucial in applications where it is important to generate coherent text. Traditionally, rule-based approaches have been predominant in discourse research. However, these models are hard to incorporate as-is in modern systems: they rely on handcrafted rules, valid only for limited domains, with no guarantee of scalability or portability. In this talk, I will present discourse models that can be effectively learned from a collection of unannotated texts. The key premise of our work is that the distribution of entities in coherent texts exhibits certain regularities. The models I will be presenting operate over an automatically-computed representation that reflects distributional, syntactic, and referential information about discourse entities. This representation allows us to induce the properties of coherent texts from a given corpus, without recourse to manual annotation or a predefined knowledge base. To conclude my talk, I will show how these models can be effectively integrated in statistical generation and summarization systems. This is joint work with Mirella Lapata and Lillian Lee.
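
A toy sketch of an entity-grid style representation on invented sentences: rows are sentences, columns are entities, and coherence models are estimated from per-entity transition patterns. The models in the talk use syntactic roles from a parser and richer distributional information; simple presence/absence stands in for that here.

```python
# Build a toy entity grid and count per-entity transition bigrams.
from collections import Counter

sentences = [
    {"Microsoft", "suit"},          # entities mentioned in sentence 1 (toy data)
    {"Microsoft", "government"},
    {"government"},
]
entities = sorted(set().union(*sentences))

grid = [["X" if e in s else "-" for e in entities] for s in sentences]
for row in grid:
    print(" ".join(row))

# Bigram transitions within each entity column, e.g. ("X", "X"), ("X", "-"), ...
transitions = Counter(
    (grid[i][j], grid[i + 1][j])
    for j in range(len(entities))
    for i in range(len(sentences) - 1)
)
print(transitions)
```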

April 18, 2006 | 06:00 pm

“Distance Metric Learning for Large Margin Classification”   Video Available

Lawrence K. Saul, University of Pennsylvania

[abstract] [biography]

Abstract

Many frameworks for statistical pattern recognition are based on computing Mahalanobis distances, which appear as positive semidefinite quadratic forms. I will describe how to learn the parameters of these quadratic forms -- the so-called distance metrics -- for two popular models of multiway classification. First, for k-nearest neighbor (kNN) classification, I will show how to learn metrics with the property that distances between differently labeled examples greatly exceed distances between nearest neighbors belonging to the same class. Second, for Gaussian mixture models (GMMs), I will show how to learn the mean and covariance parameters so that these models perform well as classifiers with large margins of error. Both of these models for multiway classification can be viewed as new types of large margin classifiers, with the traditionally linear (hyperplane) decision boundaries of support vector machines (SVMs) replaced by the nonlinear decision boundaries induced by kNN or GMMs. Like SVMs, the objective functions for "large margin" kNN/GMM classifiers are convex, with no local minima. The advantages of these models over SVMs are that: (i) they are naturally suited for problems in multiway (as opposed to binary) classification, and (ii) the kernel trick is not required for nonlinear decision boundaries. I will describe successful applications of these models to handwritten digit recognition and automatic speech recognition. Joint work with K. Weinberger, F. Sha, and J. Blitzer.
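
In the spirit of (but not identical to) the large-margin kNN objective, the sketch below runs projected sub-gradient descent on a convex hinge loss over triplets (point, same-class neighbor, differently labeled impostor), keeping the Mahalanobis matrix positive semidefinite; the data, margin, and step size are illustrative assumptions.

```python
# Learn a Mahalanobis matrix M so that each point's same-class neighbor is
# closer, by a unit margin, than differently labeled points.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], [1, 3], (30, 2)),     # two classes, separated in dim 0
               rng.normal([2, 0], [1, 3], (30, 2))])
y = np.array([0] * 30 + [1] * 30)

def nearest_same(i):
    same = [j for j in range(len(X)) if y[j] == y[i] and j != i]
    return min(same, key=lambda j: np.sum((X[i] - X[j]) ** 2))

triplets = [(i, nearest_same(i), j)
            for i in range(len(X)) for j in range(len(X)) if y[j] != y[i]]

def sqdist(M, a, b):
    d = a - b
    return d @ M @ d

def violations(M):
    return sum(sqdist(M, X[i], X[t]) + 1.0 > sqdist(M, X[i], X[z]) for i, t, z in triplets)

M = np.eye(2)
print("margin violations before:", violations(M))
for _ in range(200):
    grad = np.zeros_like(M)
    for i, t, z in triplets:
        if sqdist(M, X[i], X[t]) + 1.0 > sqdist(M, X[i], X[z]):   # active hinge term
            grad += (np.outer(X[i] - X[t], X[i] - X[t])
                     - np.outer(X[i] - X[z], X[i] - X[z]))
    M -= 0.01 * grad / len(triplets)                              # averaged sub-gradient step
    w, V = np.linalg.eigh(M)
    M = V @ np.diag(np.clip(w, 0.0, None)) @ V.T                  # project back onto the PSD cone
print("margin violations after: ", violations(M))                 # typically far fewer
print(np.round(M, 2))
```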

Speaker Biography

Lawrence Saul received his A.B. in Physics from Harvard _LP_1990_RP_ and his Ph.D. in Physics from M.I.T. _LP_1994_RP_. He stayed at M.I.T. for two more years as a postdoctoral fellow in the Center for Biological and Computational Learning, then joined the Speech and Image Processing Center of AT&T Labs in Florham Park, NJ. In 1999, the MIT Technology Review recognized him as one of 100 top young innovators. He joined the faculty of the University of Pennsylvania in January 2002, where he is currently an Associate Professor in the Department of Computer and Information Science. He has received an NSF CAREER Award for work in statistical learning, and more recently he served as Program Chair and General Chair for the 2003-2004 conferences on Neural Information Processing Systems. He is currently serving on the Editorial Board for the Journal of Machine Learning Research. In July 2006, he will be joining the faculty of the Department of Computer Science and Engineering at UC San Diego.

April 11, 2006 | 11:00 am

“Assistive Technology for the Deaf: American Sign Language Machine Translation”

Matt Huenerfauth, University of Pennsylvania

[abstract]

Abstract

A majority of deaf high school graduates in the United States have an English reading level comparable to that of a 10-year-old hearing student, and so machine translation (MT) software that translates English text into American Sign Language (ASL) animations can significantly improve these individuals' access to information, communication, and services. This talk will trace the development of an English-to-ASL MT system that has made translating texts important for literacy and user-interface applications a priority. These texts include some difficult-to-translate ASL phenomena called classifier predicates that have been ignored by previous ASL MT projects. During classifier predicates, signers use special hand movements to indicate the location and movement of invisible objects (representing entities under discussion) in space around their bodies. Classifier predicates are frequent in ASL and are necessary for conveying many concepts. This talk will describe several new technologies that facilitate the creation of machine translation software for ASL and are compatible with recent linguistic analyses of the language. These technologies include: a multi-path machine translation architecture, a 3D visualization of the arrangement of objects under discussion, a planning-based animation generator, and a multi-channel representation of the structure of the ASL animation performance. While these design features have been prompted by the unique requirements of generating a sign language, these technologies have applications for the machine translation of written languages, the representation of other multimodal language signals, and the production of meaningful gestures by other animated virtual human characters.

April 4, 2006 | 04:00 pm

“Is Sengwato a Possible Language?”   Video Available

Elizabeth Zsiga, Georgetown University

[abstract]

Abstract

Sengwato, a dialect of Setswana spoken in Botswana, ought to be impossible. In this talk, we'll examine two aspects of the sound structure of Sengwato that ought to be impossible: doubly-articulated fricatives (a sound that shouldn't occur) and post-nasal devoicing (a process that shouldn't occur). Doubly-articulated labio-coronal fricatives, which produce audible turbulence with the tongue blade and lips simultaneously, have been argued to be physically impossible (Ladefoged and Maddieson 1996). Post-nasal devoicing, a preference for voiceless consonants after nasals (whereby [bata], "look for", becomes [mpata], "look for me"), cannot be derived from the interaction of phonetically-grounded constraints. The aerodynamic consequences of velum lowering and raising for nasality have been shown to promote voicing (Hayes 1999), and all other languages examined (Pater 1999) show either no phonological effect of the nasal environment, or a preference for voiced stops after nasals. For example, English allows both [mp] and [mb] (camper vs. amber), and Kikuyu allows only [mb] (/n+koma/ "I sleep" becomes [Ngoma]) (Clements 1985). Setswana is the only clear case where it has been argued that [mb] is prohibited and [mp] is preferred. The case has thus proven problematic for the typological predictions of theories such as OT (Hyman 2001, Odden 2003). This talk will present a reanalysis of the consonant system of Sengwato, based on acoustic data from six speakers of the dialect, recorded in the village of Shoshong, Central Botswana (Tlale 2005; Zsiga, Gouskova, & Tlale in press). Acoustic and video recordings confirm that labio-coronal fricatives are indeed double articulations and not sequences. Realization of the "voiced" stops varied by speaker, by place of articulation, and by position. Post-nasal stops are confirmed to be voiceless. In other positions, however, segments that are described in the literature as voiced stops are generally found to be either not voiced or not stops. We conclude that the Sengwato consonant system can be better analyzed by reference to phonetically-grounded constraints against voiced stops and to an independently active process of post-nasal hardening. In the one case, I will argue that we need to adjust our understanding of universal constraints based on what we know about Sengwato. In the other case, I will argue that we need to adjust our understanding of Sengwato in light of what we know about universal constraints. Lisa Zsiga, Department of Linguistics, Georgetown University, in collaboration with Maria Gouskova (NYU) and One Tlale (University of Botswana)

March 28, 2006 | 12:00 pm

“CUR Matrix Decompositions for Improved Data Analysis”   Video Available

Michael Mahoney, Yahoo Research

[abstract]

Abstract

Much recent work in Theoretical Computer Science, Linear Algebra, and Machine Learning has considered matrix decompositions of the following form: given an m-by-n matrix A, decompose it as a product of three matrices, C, U, and R, where C consists of a few (typically a constant number of) columns of A, R consists of a few (typically a constant number of) rows of A, and U is a small carefully constructed matrix that guarantees that the product CUR is "close" to A. Applications of such decompositions include the computation of compressed "sketches" for large matrices in a pass-efficient manner, matrix reconstruction, speeding up kernel-based statistical learning computations, sparsity-preservation in low-rank approximations, and improved interpretability of data analysis methods. In this talk we shall discuss various choices for the matrices C, U, and R that are appropriate in different application domains. The main result will be an algorithm that computes matrices C, U, and R such that the (Frobenius) norm of the error matrix A - CUR is almost minimal. We will also discuss applications of this algorithm (and related heuristics) in the bioinformatics of human genetics, in recommendation system analysis, and in medical imaging.
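
The shape of the decomposition is easy to sketch. The snippet below is a toy illustration, not the algorithm from the talk: it picks columns and rows of A uniformly at random (the methods discussed here choose them by importance sampling instead) and uses the standard choice U = C^+ A R^+, which minimizes the Frobenius error for the chosen C and R.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 60))         # the data matrix to be approximated

# Pick a few actual columns and rows of A (uniformly here, for illustration only).
col_idx = rng.choice(A.shape[1], size=10, replace=False)
row_idx = rng.choice(A.shape[0], size=10, replace=False)
C = A[:, col_idx]                          # m-by-c, real columns of A
R = A[row_idx, :]                          # r-by-n, real rows of A

# One standard choice of the small middle matrix: U = C^+ A R^+,
# which minimizes ||A - CUR||_F once C and R are fixed.
U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)

err = np.linalg.norm(A - C @ U @ R, "fro") / np.linalg.norm(A, "fro")
print(f"relative Frobenius error of CUR: {err:.3f}")
```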

March 16, 2006 | 04:00 pm

“Adaptive Human Language Processing”

Matt Crocker, Saarland University

[abstract] [biography]

Abstract

Theories of sentence comprehension have traditionally been based on evidence from reading studies, and have emphasized cognitive limitations and invariant parsing strategies. There is increasing evidence, however, that human language comprehension is highly adapted both to long-term experience and to immediate linguistic and non-linguistic context. In this talk, I will present a collection of recent experiments focusing on adaptation to immediate context in situated language comprehension. Using the "visual world" paradigm, which monitors eye movements in scenes during spoken comprehension, we have demonstrated the ability of the human language processor to rapidly adapt to and exploit diverse linguistic cues, prosody, as well as information from the immediate scene. These findings have led to the proposal of the "coordinated interplay account" of scene and utterance processing (Knoeferle and Crocker, Cognitive Science, to appear), which argues for the rapid use, and even priority, of scene information during spoken comprehension. In order to model the rapid use of both experience and immediate context, we have developed a family of connectionist models based on Elman's Simple Recurrent Networks. These networks have been modified to include inputs not only for the utterance, but also for the current (visual) context, and have been trained to model the findings of five experiments. In addition to the general ability of the networks to successfully learn the task, they also model on-line behavior: (a) anticipation of role and filler, (b) frequency biases derived from training, (c) early influence of depicted events, (d) delayed disambiguation without the scene, and (e) relative priority of scene information (Mayberry, Crocker & Knoeferle, IJCNLP, 2005). In conclusion, I argue that the mechanisms that underlie language comprehension are fundamentally adaptive in nature: optimized to recover the most likely interpretation based on relevant linguistic and non-linguistic information sources. I further suggest that theories and models of human language comprehension must be expanded to explain on-line interaction with other cognitive and perceptual processes.
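
For readers unfamiliar with the architecture, the following is a minimal, untrained sketch of an Elman-style simple recurrent network whose input at each time step concatenates the current word with a fixed scene vector, in the spirit of the modified networks described above; all sizes, weights, and the softmax output over role fillers are hypothetical, and training is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: a one-hot word input, a small "scene" vector, a hidden
# layer, and an output layer over candidate role fillers.
n_word, n_scene, n_hidden, n_out = 20, 6, 16, 5
W_ih = rng.standard_normal((n_hidden, n_word + n_scene)) * 0.1
W_hh = rng.standard_normal((n_hidden, n_hidden)) * 0.1
W_ho = rng.standard_normal((n_out, n_hidden)) * 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def srn_forward(word_seq, scene_vec):
    """Run an Elman-style SRN over a word sequence, with the scene vector
    concatenated to the input at every time step (untrained weights)."""
    h = np.zeros(n_hidden)                  # context units start at zero
    outputs = []
    for word_id in word_seq:
        x = np.zeros(n_word)
        x[word_id] = 1.0
        inp = np.concatenate([x, scene_vec])
        h = np.tanh(W_ih @ inp + W_hh @ h)  # new hidden state from input + context
        outputs.append(softmax(W_ho @ h))   # distribution over role fillers
    return outputs

preds = srn_forward([3, 7, 12], rng.random(n_scene))
print(preds[-1])   # the network's (untrained) guess after the last word
```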

Speaker Biography

Prof. Dr. Matthew W. Crocker began his study of Computer Science in Canada at the University of New Brunswick (BSc, 1986) and the University of British Columbia (MSc, 1988), where he specialised in computational linguistics. In 1992, he received a Doctorate in Philosophy from the Department of Artificial Intelligence, at the University of Edinburgh, Scotland (now the School of Informatics). Dr. Crocker then continued in Edinburgh, where he was a lecturer in Artificial Intelligence and Cognitive Science (1992-94), and subsequently an ESRC Research Fellow (1994-1998). In 1999, Dr. Crocker took up a position as Research Scientist at the University of the Saarland, Germany, and in January 2000, he was appointed to the newly established Chair in Psycholinguistics, in the Department of Computational Linguistics at that University.

February 28, 2006 | 12:00 pm

“Modeling Science: Topic models of Scientific Journals and Other Large Text Databases”

David Blei, Princeton University

[abstract] [biography]

Abstract

A surge of recent research in machine learning and statistics has developed new techniques for finding patterns of words in document collections using hierarchical probabilistic models. These models are called "topic models" because the word patterns often reflect the underlying topics that are combined to form the documents; however, topic models also apply naturally to such data as images and biological sequences. After reviewing the basics of topic modeling, I will describe two related lines of research in this field, which extend the current state of the art. First, while previous topic models have assumed that the corpus is static, many document collections actually change over time: scientific articles, emails, and search queries reflect evolving content, and it is important to model the corresponding evolution of the underlying topics. For example, an article about biology in 1885 will exhibit significantly different word frequencies than one in 2005. I will describe probabilistic models designed to capture the dynamics of topics as they evolve over time. Second, previous models have assumed that the occurrences of the different latent topics are independent. In many document collections, the presence of a topic may be correlated with the presence of another. For example, a document about sports is more likely to also be about health than international finance. I will describe a probabilistic topic model which can capture such correlations between the hidden topics. In addition to giving quantitative, predictive models of a corpus, topic models provide a qualitative window into the structure of a large document collection. This allows a user to explore a corpus in a topic-guided fashion. We demonstrate the capabilities of these new models on the archives of the journal Science, founded in 1880 by Thomas Edison. Our models are built on the noisy text from JSTOR, an online scholarly journal archive, resulting from an optical character recognition engine run over the original bound journals. (Joint work with M. Jordan, A. Ng, and J. Lafferty)
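
As a point of reference for the extensions described above, here is a small sketch of the static topic model (LDA) they build on, run on a few toy documents; it assumes a recent scikit-learn and is not the dynamic or correlated topic models of the talk.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# A handful of toy documents standing in for journal abstracts.
docs = [
    "gene expression protein cell biology",
    "protein sequence gene genome cell",
    "stock market finance trade economy",
    "market trade finance bank economy",
    "neuron brain cell signal biology",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)                 # document-word count matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Top words per topic (topic-word weights live in lda.components_).
for k, topic in enumerate(lda.components_):
    top = [vec.get_feature_names_out()[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}:", top)
```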

Speaker Biography

David Blei received his Ph.D. from U.C. Berkeley in 2005. He is an assistant professor of Computer Science at Princeton University.

February 21, 2006 | 09:00 pm

“Multilingual Dependency Parsing with Spanning Tree Algorithms”

Ryan McDonald, University of Pennsylvania

[abstract]

Abstract

A dependency representation of a sentence identifies, for each word, all its modifying arguments. Dependency parsing is an important problem because dependency structures have proven useful in machine translation, information extraction and many other common NLP problems. In this talk I will present some recent work on dependency parsing. The models I describe are based on two key components. The first is an aggressive edge-based factorization that allows for the use of maximum spanning tree algorithms during inference. This is advantageous since it enables parsing of projective as well as non-projective languages (e.g. free-word order languages like Dutch, German and Czech). The second component is a large-margin discriminative training model that can use rich sets of dependent features to overcome these rather weak factorization assumptions. I plan to present a number of experiments, including parsing results for 14 languages (using a single model) as well as experiments describing the use of the parser for sentence compression.
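
A minimal sketch of the edge-factored inference step: given (hypothetical) scores s(head, dependent), the best dependency tree is the maximum spanning arborescence of the directed score graph. The snippet assumes networkx provides maximum_spanning_arborescence (a Chu-Liu-Edmonds implementation); the learned feature weights and the large-margin training are not shown.

```python
import networkx as nx

# Hypothetical edge-factored scores s(head, dependent) for the sentence
# "ROOT John saw Mary"; node 0 is the artificial root.
words = ["ROOT", "John", "saw", "Mary"]
scores = {
    (0, 2): 10.0, (0, 1): 3.0, (0, 3): 3.0,
    (2, 1): 9.0,  (2, 3): 8.0,
    (1, 2): 2.0,  (3, 2): 2.0, (1, 3): 1.0, (3, 1): 1.0,
}

G = nx.DiGraph()
for (h, d), s in scores.items():
    G.add_edge(h, d, weight=s)              # ROOT (node 0) has no incoming edges

# The highest-scoring dependency tree is the maximum spanning arborescence
# of this graph, which handles non-projective structures naturally.
tree = nx.maximum_spanning_arborescence(G, attr="weight")
for h, d in sorted(tree.edges(), key=lambda e: e[1]):
    print(f"{words[d]} <- {words[h]}")
```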

February 7, 2006 | 07:00 am

“Improving Access to Conversational Speech: The MALACH Project”

Doug Oard, University of Maryland

[abstract] [biography]

Abstract

The MALACH Project is one of four major collaborative research projects on which researchers at the Center for Language and Speech Processing at Johns Hopkins and the Computational Linguistics and Information Processing Lab at the University of Maryland have worked closely together. In MALACH, our collaboration has focused on the intersection between speech recognition and information retrieval, with a large corpus of oral history interviews in several languages as the specific application. In this talk, I'll describe our progress to date with speech and search technologies, and with related work on text classification, machine translation, and information extraction. I'll conclude the talk with a few remarks about the future directions that we are exploring together in the new GALE program.

Speaker Biography

Douglas Oard is an Associate Professor at the University of Maryland, College Park, with a joint appointment in the College of Information Studies and the Institute for Advanced Computer Studies. He holds a Ph.D. in Electrical Engineering from the University of Maryland, and his research interests center around the use of emerging technologies to support information seeking by end users. His recent work has focused on interactive techniques for cross-language information retrieval, searching conversational media, and leveraging observable behavior to improve user modeling. Additional information is available at http://www.glue.umd.edu/~oard/.

January 31, 2006 | 12:00 pm

“What does "bwick" have that "bnick" does not?”

Adam Albright

[abstract] [biography]

Abstract

Native speakers of English generally agree that although "blick" is not actually a word, it is quite plausible as one; "bwick", on the other hand, is somewhat odd, and "bnick" is impossible. What computations are used to make these assessments, and what types of knowledge do they rely on? In this talk I contrast two classes of models of gradient wordlikeness: exemplar-based/lazy learning models, in which novel words are compared to the lexicon of existing words, vs. sequence-based/n-gram type models, in which the likelihood of novel words depends on the probability of their subparts. It is often assumed that an exemplar approach is necessary in modeling gradient wordlikeness, since both [bw] and [bn] are non-occurring (zero probability) sequences, yet [bw] is more similar to existing [br], [bl], etc. I present a sequence-based model that overcomes this difficulty by considering natural classes (sets of sounds that share phonological feature values). I show that when we compare the predictions of these models against human judgments, a sequential model that uses natural classes outperforms nearest-neighbor/exemplar-based models in a variety of important respects.
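
To make the natural-class idea concrete, here is a toy scorer (not Albright's actual model) in which an unattested onset earns credit from attested onsets whose second segment shares phonological features with the novel segment; the feature sets and the onset inventory are illustrative only.

```python
# Toy featural bigram scorer: unattested onsets get partial credit from attested
# ones whose second segment shares features with the novel one.
FEATURES = {
    "b": {"cons", "stop", "voiced", "labial"},
    "p": {"cons", "stop", "labial"},
    "l": {"cons", "approximant", "lateral", "coronal"},
    "r": {"cons", "approximant", "coronal"},
    "w": {"cons", "approximant", "labial"},
    "n": {"cons", "nasal", "voiced", "coronal"},
}
ATTESTED_ONSETS = ["bl", "br", "pl", "pr"]

def sim(a, b):
    # Jaccard overlap of feature sets: 1.0 for identical segments.
    fa, fb = FEATURES[a], FEATURES[b]
    return len(fa & fb) / len(fa | fb)

def wordlikeness(onset):
    c1, c2 = onset
    support = [sim(c2, o[1]) for o in ATTESTED_ONSETS if o[0] == c1]
    return sum(support) / len(support) if support else 0.0

for onset in ["bl", "bw", "bn"]:
    print(onset, round(wordlikeness(onset), 2))
# "bl" > "bw" > "bn": [w] shares the approximant class with [l, r]; [n] does not.
```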

Speaker Biography

Adam Albright received his Ph.D. in linguistics from UCLA in 2002. He was a Faculty Fellow at UC Santa Cruz from 2002-2004, and is currently an Assistant Professor at MIT. His research interests include phonology, morphology, and learnability, with an emphasis on using computational modeling and experimental techniques to investigate issues in phonological theory.

Back to Top

2005

December 6, 2005 | 04:00 pm

“Strategies for Coreference from the Perspective of Information Exploitation”   Video Available

Breck Baldwin, Alias-i, Inc.

[abstract]

Abstract

Coreference entices us with the promise of radically improved information exploitation via data mining, search and information extraction. Coreference in its canonical form involves equating text mentions of Abu Musab al-Zarqawi with mentions in Arabic, phone calls which reference him, and images that contain him. Once such a foundation of coreference is established over a body of information, questions like "get me all individuals with some relation to al-Zarqawi" become feasible. It also is a dynamite research problem. Progress has been made in text mediums with apparently excellent results in named entity recognition, pronoun resolution, cross document individual resolution and database linking. This suggests that some sort of Uber-search/indexing engine should fall out the bottom of a series of 90% f-measure results in these key areas. Unfortunately, this is not the case and for good reasons. In this talk I will argue that there are fundamental flaws in how we think about coreference in the context of information access. The argument ranges from basic philosophical issues about what an entity or an ontology is to an analysis of why first-best approaches to entity detection hobble performance in significant ways. As a proposed strategy for approaching the problem I will discuss our own efforts in two directions: 1) targeting known entities using match filtering as well as n-best driven analysis with character language models, and 2) targeting unknown entities with n-best chunking approaches to named entity extraction as opposed to first-best approaches commonly used.

November 29, 2005 | 05:00 pm

“Prosody in Spoken Language Processing”

Izhak Shafran, Johns Hopkins University

[abstract]

Abstract

Automatic speech recognition is now capable of transcribing speech from a variety of sources with high accuracy. This has opened up new opportunities and challenges in translation, summarization and distillation. Currently, most applications only extract the sequence of words from a speaker's voice and ignore other useful information that can be inferred from speech such as prosody. Prosody has been studied extensively by linguists and is often characterized in terms of phrasing (break indices), tones and emphasis (prominence). The availability of a prosodically labeled corpus of conversational speech has spurred renewed interest in exploiting prosody for downstream applications. As a first step, an automatic method is needed to detect prosodic events. For this task, we have investigated the performance of a series of classifiers with increasing complexity, namely, decision tree, bagging-based classifier, random forests and hidden Markov models of different orders. Our experiments show that break indices and prominence can be detected with accuracies above 80%, making them useful for practical applications. Two such examples were explored. In the context of disfluency detection, the interaction between the prosodic interruption point and the syntactic EDITED constituents was modeled with a simple and direct model -- a PCFG with additional tags. The preliminary results are promising and show that the F-score of the EDIT constituent improves significantly without degrading the overall F-measure. The task of building elaborate generative models is difficult, largely due to the lack of an authoritative theory on the syntax-phonology interface. An alternative approach is to incorporate the interaction as features in a discriminative framework for parsing, speech recognition or metadata detection. As an example, we illustrate how this can be done in sentence boundary detection using a re-ranking framework and show improvements on a state-of-the-art system. The work reported in this talk was carried out in the 2005 JHU workshop and previously at the University of Washington in collaboration with several researchers.
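
As a sketch of that first step, prosodic event detection can be framed as supervised classification over per-word features; the snippet below trains a random forest on synthetic stand-in data (real systems use ToBI-labeled conversational speech and much richer feature sets).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: per-word features (pause duration, normalized F0,
# vowel duration) and a binary label for "break index present".
rng = np.random.default_rng(0)
n = 2000
pause = rng.exponential(0.1, n)
f0 = rng.normal(0.0, 1.0, n)
dur = rng.normal(0.0, 1.0, n)
# Make breaks more likely after long pauses and lengthened vowels.
p_break = 1.0 / (1.0 + np.exp(-(8.0 * pause + 0.8 * dur - 1.5)))
y = rng.random(n) < p_break
X = np.column_stack([pause, f0, dur])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("break-index detection accuracy:", round(clf.score(X_te, y_te), 3))
```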

November 15, 2005 | 05:00 pm

“Integrative Models of the Cardiac Ventricular Myocyte”   Video Available

Raimond Winslow, Johns Hopkins University

[abstract]

Abstract

Cardiac electrophysiology is a field with a rich history of integrative modeling. A particularly important milestone was the development of the first biophysically-based cell model describing interactions between voltage-gated membrane currents, pumps and exchangers, and intracellular calcium (Ca2+) cycling processes in the cardiac Purkinje fiber (DiFrancesco & Noble, Phil. Trans. Roy. Soc. Lond. B 307: 353) and the subsequent elaboration of this model to describe the cardiac ventricular myocyte action potential (Noble et al. Ann. N. Y. Acad. Sci. 639: 334; Luo, C-H and Rudy, Y. Circ. Res. 74: 1071). This talk will review the "state-of-the-art" in integrative modeling of the cardiac myocyte, focusing on modeling of the ventricular myocyte because of its significance to arrhythmia and heart disease. Special emphasis will be placed on the importance of modeling mechanisms of Ca2+-Induced Ca2+-Release (CICR). CICR is the process by which influx of trigger calcium (Ca2+) through L-Type Ca2+ channels (LCCs) leads to opening of ryanodine sensitive Ca2+ release channels (RyRs) in the junctional sarcoplasmic reticulum (JSR) membrane and release of Ca2+ from the JSR. It is of fundamental importance in cardiac muscle function, as it not only underlies the process of muscle contraction, but is also involved in regulation of the cardiac action potential. We will demonstrate that every model of CICR in use today has serious shortcomings, and we will offer insights as to how these shortcomings must be addressed in order to develop reconstructive and predictive models that can be used to investigate myocyte function in both health and disease. (Supported by NIH HL60133, the NIH Specialized Center of Research on Sudden Cardiac Death P50 HL52307, the Whitaker Foundation, the Falk Medical Trust, and IBM Corporation)

November 8, 2005 | 04:00 pm

“Syntactic Models of Alignment”   Video Available

Dan Gildea, University of Rochester

[abstract] [biography]

Abstract

I will describe work on tree-based models for aligning parallel text, presenting results for models that make use of syntactic information provided for one or both languages, as well as models that infer structure directly from parallel bilingual text. In the second part of the talk, I will discuss some theoretical aspects of Synchronous Context Free Grammars as a model of translation, describing a method to factor grammars to lower the complexity of synchronous parsing.

Speaker Biography

Dan Gildea received a BA in linguistics and computer science, as well as an MS and PhD in computer science, from the University of California, Berkeley. After two years as a postdoctoral fellow at the University of Pennsylvania, he joined the University of Rochester as an assistant professor of computer science in 2003.

November 1, 2005 | 04:00 pm

“Quest for the Essence of Language”   Video Available

Steven Greenberg, Centre for Applied Hearing Research, Technical Univ of Denmark; Silicon Speech, Santa Venetia, CA, USA

[abstract] [biography]

Abstract

Spoken language is often conceptualized as mere sequences of words and phonemes. From this traditional perspective, the listener's task is to decode the speech signal into constituent elements derived from spectral decomposition of the acoustic signal. This presentation outlines a multi-tier theory of spoken language in which utterances are composed not only of words and phones, but also syllables, articulatory-acoustic features and (most importantly) prosemes, encapsulating the prosodic pattern in terms of prominence and accent. This multi-tier framework portrays pronunciation variation and the phonetic micro-structure of the utterance with far greater precision than the conventional lexico-phonetic approach, thereby providing the prospect of efficiently modeling the information-bearing elements of spoken language for automatic speech recognition and synthesis.

Speaker Biography

In the early part of his career, Steven Greenberg studied Linguistics, first at the University of Pennsylvania (A.B.) and then at the University of California, Los Angeles (Ph.D.). He also studied Neuroscience (UCLA), Psychoacoustics (Northwestern) and Auditory Physiology (Northwestern, University of Wisconsin). He was a principal researcher in the Neurophysiology Department at the University of Wisconsin-Madison for many years, before migrating back to the "Golden West" in 1991 to assume directorship of a speech laboratory at the University of California, Berkeley, where he also held a tenure-level position in the Department of Linguistics. In 1995, Dr. Greenberg migrated a few blocks further west to join the scientific research staff at the International Computer Science Institute (affiliated with, but independent from, UC-Berkeley). During the time he was at ICSI, he published many papers on the phonetic and prosodic properties of spontaneous spoken language, as well as conducted perceptual studies regarding the underlying acoustic (and visual) basis of speech intelligibility. He also developed (with Brian Kingsbury) the Modulation Spectrogram for robust representation of speech for automatic speech recognition as well as syllable-centric classifiers of phonetic features for speech technology applications. Since 2002, he has been President of Silicon Speech, a company based in the San Francisco Bay Area that is dedicated to developing future-generation speech technology based on principles of human brain function and information theory. Beginning in 2004, Dr. Greenberg has also been a Visiting Professor at the Centre for Applied Hearing Research at the Technical University of Denmark, where he performs speech-perception-related research.

October 25, 2005 | 04:00 pm

“On the Parameter Space of Lexicalized Statistical Parsing Models”   Video Available

Dan Bikel, IBM

[abstract] [biography]

Abstract

Over the last several years, lexicalized statistical parsing models have been hitting a "rubber ceiling" when it comes to overall parse accuracy. These models have become increasingly complex, and therefore require thorough scrutiny, both to achieve the scientific aim of understanding what has been built thus far, and to achieve both the scientific and engineering goal of using that understanding for progress. In this talk, I will discuss how I have applied as well as developed techniques and methodologies for the examination of the complex systems that are lexicalized statistical parsing models. The primary idea is that of treating the "model as data", which is not a particular method, but a paradigm and a research methodology. Accordingly, I take a particular, dominant type of parsing model and perform a macro analysis, to reveal its core (and design a software engine that modularizes the periphery), and I also crucially perform a detailed analysis, which provides for the first time a window onto the efficacy of specific parameters. These analyses have not only yielded insight into the core model, but they have also enabled the identification of "inefficiencies" in my baseline model, such that those inefficiencies can be reduced to form a more compact model, or exploited for finding a better-estimated model with higher accuracy, or both.

Speaker Biography

Daniel M. Bikel graduated from Harvard University in 1993 with an A.B. in Classics--Greek & Latin. After spending a post-graduate year at Harvard taking more courses in computer science, engineering and music, Bikel joined Ralph Weischedel's research group at BBN in Cambridge, MA. During his three years there, Bikel developed several NLP technologies, including Nymble (now called IdentiFinder(tm)), a learning named-entity detector. In 2004, Bikel received a Ph.D. from the Computer and Information Science Department at the University of Pennsylvania (advisor: Prof. Mitch Marcus). At Penn, he focused on statistical natural language parsing, culminating in a dissertation entitled identically to this talk. Bikel is currently a Research Staff Member at IBM's T. J. Watson Research Center in Yorktown Heights, NY.

October 20, 2005 | 08:00 pm

“Human-Like Audio Signal Processing”

David V. Anderson, Georgia Institute of Technology

[abstract] [biography]

Abstract

The discipline of signal processing provides formal, mathematical techniques for processing information. The applications of signal processing are countless and make up an increasing amount of all computing performed across all computing platforms. Problems and opportunities are arising, however, that will be met as we learn more about how neurological systems process information. The first problem stems from the evolution of computing devices. For a variety of reasons, high performance computing platforms must employ increasing parallelism to achieve performance improvements. The problem comes from the difficulty in effectively using highly parallel systems. The second problem comes from the need to decrease the power consumption of computing systems. This is true for everything from large systems, where power determines density and cooling costs, to small systems where battery life is the limiting factor. A third problem is inherent in the difficult, ongoing task of making machines intelligent. All of these problems may be addressed by applying techniques learned through the study of neurological systems. These systems are highly parallel and operate very efficiently. Additionally, the intelligence of biological systems is largely due to the ability of these systems to recognize patterns--an ability that greatly exceeds that of synthetic systems in robustness and flexibility. This talk will discuss the problems mentioned above and summarize some recent applications of signal processing that have benefited from the inspiration or modeling of neurological systems.

Speaker Biography

David V. Anderson received the B.S. and M.S. degrees from Brigham Young University, Provo, UT and the Ph.D. degree from Georgia Institute of Technology (Georgia Tech), Atlanta, GA, in 1993, 1994, and 1999, respectively. He is an associate professor in the School of Electrical and Computer Engineering at Georgia Tech and an associate director of the Center for Research in Embedded Systems Technology. His research interests include audio and psycho-acoustics, signal processing in the context of human auditory characteristics, and the real-time application of such techniques using both analog and digital hardware. His research has included the development of a digital hearing aid algorithm that has now been made into a successful commercial product. Dr. Anderson was awarded the National Science Foundation CAREER Award for excellence as a young educator and researcher in 2004 and is a recipient of the 2004 Presidential Early Career Awards for Scientists and Engineers (PECASE). He has over 60 technical publications and 5 patents/patents pending. Dr. Anderson is a member of the IEEE, the Acoustical Society of America, ASEE, and Tau Beta Pi. He has been actively involved in the development and promotion of computer enhanced education and other education programs.

October 11, 2005 | 04:00 pm

“Making Visualization Work”   Video Available

Ben Bederson, University of Maryland

[abstract] [biography]

Abstract

The human visual system is incredibly powerful. Many people have tried to create computer systems that present information visually to take advantage of that power. The potential is great - for tasks ranging from detecting patterns and outliers to quickly browsing and comparing large datasets. And yet, the number of successful visualization programs that we use today is limited. In this talk, I will discuss common problems with visualizations, and how several approaches that we have developed avoid those problems. By applying Zoomable User Interfaces, Fisheye distortion, carefully controlled animation, and working closely with users, we have created a range of applications which we have shown to have significant benefits. I will show demos from application domains including photos, trees, graphs, and even digital libraries. To build these visualizations, we have built Piccolo, a general open source toolkit available in Java and C#. It offers a hierarchical scene graph in the same style that many 3D toolkits offer, but for 2D visualization. By offering support for graphical objects, efficient rendering, animation, event handling, etc., Piccolo can reduce the effort in building complex visual applications with minimal run-time expense. In this talk, I will also discuss Piccolo itself, alternative approaches to building visualizations, and the computational expense of using Piccolo.

Speaker Biography

Benjamin B. Bederson is an Associate Professor of Computer Science and director of the Human-Computer Interaction Lab at the Institute for Advanced Computer Studies at the University of Maryland, College Park. His work is on information visualization, interaction strategies, digital libraries, and accessibility issues such as voting system usability.

October 4, 2005 | 05:00 pm

“Progress in speaker adaptation and acoustic modeling for LVCSR”   Video Available

George Saon, IBM

[abstract] [biography]

Abstract

This talk is organized in two parts. In the first part, we discuss a non-linear feature space transformation for speaker/environment adaptation which forces the individual dimensions of the acoustic data to be Gaussian distributed. The transformation is given by the preimage under the Gaussian cumulative distribution function (CDF) of the empirical CDF for each dimension. In the second part, we review some existing techniques for precision matrix modeling such as EMLLT and SPAM and we describe our recent work on discriminative training of full covariance Gaussians on the 2300-hour EARS dataset.
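
The first-part transformation has a compact recipe: map each feature dimension through the inverse Gaussian CDF of its empirical CDF. Below is a minimal sketch, with toy data standing in for per-speaker cepstral features; it is only an illustration of the Gaussianization idea, not the system described in the talk.

```python
import numpy as np
from scipy.stats import norm

def gaussianize(feats):
    """Map each feature dimension through the inverse Gaussian CDF of its
    empirical CDF, so each transformed dimension is approximately N(0, 1).
    feats: (n_frames, n_dims) array from one speaker/environment."""
    n = feats.shape[0]
    out = np.empty_like(feats, dtype=float)
    for d in range(feats.shape[1]):
        ranks = feats[:, d].argsort().argsort()   # 0 .. n-1
        ecdf = (ranks + 0.5) / n                   # keep values strictly in (0, 1)
        out[:, d] = norm.ppf(ecdf)                 # Phi^{-1}(empirical CDF)
    return out

# Example: skewed toy "cepstral" features become roughly standard normal.
x = np.random.default_rng(0).exponential(1.0, size=(1000, 13))
g = gaussianize(x)
print(g.mean(axis=0)[:3], g.std(axis=0)[:3])
```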

Speaker Biography

http://www.clsp.jhu.edu/seminars/slides/F2005/CLSP Seminar Slides 2005-10-24 - Saon, George - Progress in speaker adaptation and acoustic modeling for LVCSR.pdf

July 21, 2005

“Introduction to Arabic Natural Language Processing - Part 2”   Video Available

Nizar Habash, Columbia University

July 20, 2005

“An Information State Approach to Collaborative Reference”   Video Available

Matthew Stone, Rutgers University

July 14, 2005

“Introduction to Arabic Natural Language Processing - Part 1”   Video Available

Nizar Habash, Columbia University

May 3, 2005 | 03:00 pm

“Conversational Speech and Language Technologies for Structured Document Generation”

Juergen Fritsch, MModal

[abstract] [biography]

Abstract

I will present Multimodal Technologies' AnyModal CDS, a clinical documentation system that is capable of creating structured and encoded medical reports from conversational speech. Set up in the form of a back-end service-oriented architecture, the system is completely transparent to the dictating physician and does not require active enrollment or changes in dictation behavior, while producing complete and accurate documents. In contrast to desktop dictation systems which essentially produce a literal transcript of spoken audio, AnyModal CDS attempts to recognize the meaning and intent behind dictated phrases, producing a richly structured and easily accessible document. In the talk, I will discuss some of the enabling speech and language technologies, focusing on continuous semi-supervised adaptation of speech recognition models based on non-literal transcripts and on combinations of statistical language models and semantically annotated probabilistic grammars for the modeling and identification of structure in spoken audio.

Speaker Biography

Dr. Jürgen Fritsch is co-founder and chief scientist of Multimodal Technologies (M*Modal), where he leads research efforts in the fields of speech recognition and natural language processing. He has an extensive background in speech and language technologies and in advancing the state-of-the-art in these areas. He held research positions at the University of Karlsruhe, Germany, and at Carnegie Mellon University, Pittsburgh, where he participated in the LVCSR/Switchboard speech recognition evaluations. Before co-founding M*Modal, Jürgen was co-founder of Interactive Systems Inc, where he was instrumental in the design and development of an advanced conversational speech recognition platform that continuously evolved into one of the foundations of M*Modal's current line of products. Jürgen received his Ph.D. and M.Sc. degrees in computer science from the University of Karlsruhe, Germany.

April 26, 2005 | 02:35 pm

“Machine Translation Performance Evaluation Based on DOD Standards”   Video Available

Doug Jones, MIT

[biography]

Speaker Biography

Doug Jones is a Linguist on the technical staff in the Information Systems Technology Group at MIT Lincoln Laboratory (MIT Ph.D. on Hindi Syntax, Stanford AB/AM specializing in computational linguistics). He was employed for four years as a Senior Scientific Linguist at NSA, focusing on machine translation of low density languages. His current area of active research focuses on evaluating human language technology applications such as machine translation and speech recognition in terms of enhancing human language processing skills. The Information Systems Technology home page at MIT Lincoln Laboratory has additional information and links, including a list of recent papers: http://www.ll.mit.edu/IST/pubs.html

April 19, 2005 | 07:00 pm

“New Directions in Robust Automatic Speech Recognition”

Richard Stern, Carnegie Mellon University

[abstract]

Abstract

As speech recognition technology is transferred from the laboratory to the marketplace, robustness in recognition is becoming increasingly important. This talk will review and discuss several classical and contemporary approaches to robust speech recognition. The most tractable types of environmental degradation are produced by quasi-stationary additive noise and quasi-stationary linear filtering. These distortions can be largely ameliorated by the "classical" techniques of cepstral high-pass filtering (as exemplified by cepstral mean normalization and RASTA filtering), as well as by techniques that develop statistical models of the distortion (such as codeword-dependent cepstral normalization and vector Taylor series expansion). Nevertheless, these types of approaches fail to provide much useful improvement in accuracy when speech is degraded by transient or non-stationary noise such as background music or speech. We describe and compare the effectiveness of techniques based on missing-feature compensation, multi-band analysis, feature combination, and physiologically-motivated auditory scene analysis toward providing increased recognition accuracy in difficult acoustical environments.

April 12, 2005 | 12:00 pm

“Old and new work in discriminative training of acoustic models”   Video Available

Daniel Povey, IBM TJ Watson Research Center

[abstract]

Abstract

I will give a general review on the subject of discriminative training of acoustic models, with special emphasis on MPE. I will then describe fMPE, which is a more recent discriminative training technique developed at IBM. It is a feature-space transformation that is trained by maximizing the MPE criterion.

April 12, 2005

“Speech, Language, & Machine Learning”   Video Available

Jeff A. Bilmes, University of Washington

April 5, 2005 | 05:00 pm

“Bipartite Graph Factorization in Static Decoding Graphs with Long-Span Acoustic Context: An Interesting Combinatorial Problem in ASR”   Video Available

Geoffrey Zweig, IBM TJ Watson Research

[abstract] [biography]

Abstract

A key problem in large vocabulary speech recognition is how to search for the word sequence with the highest likelihood, given an acoustic model, a language model, and some input audio data. There are two standard approaches to doing this: 1) to construct the search space "on-demand" so as to represent just the portions that are reasonably likely given the data, and 2) to construct ahead-of-time a full representation of the entire search space. This paper identifies and solves a problem that arises in the construction of a full representation of the search space when long span acoustic context is used in the acoustic model, specifically when the expected acoustic realization of a word depends on the identity of the preceding word. In this case, a sub-portion of the search space contains a bipartite graph with O(V) vertices and O(V^2) edges, where V is the vocabulary size. For large vocabulary systems, the number of edges is prohibitive, and we tackle the problem of finding an edge-wise minimal representation of this sub-graph. This is done by identifying complete bipartite sub-graphs within the graph, and replacing the edges of each such sub-graph with an extra vertex and edges that connect the left and right sides of the sub-graph to the new vertex. The problem of finding the smallest such representation is NP-hard, and we present a heuristic for finding a reasonable answer. The talk concludes with some experimental results on a large-vocabulary speech recognition system and a discussion of related problems.
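
The rewrite at the heart of the construction is easy to state in code: a complete bipartite sub-graph between vertex sets L and R, with |L|*|R| edges, is replaced by a new hub vertex carrying |L|+|R| edges. The sketch below applies one such rewrite to a toy edge set; finding which bicliques to factor (the NP-hard part addressed by the heuristic in the talk) is not shown.

```python
# Sketch of the edge-reduction step: the complete bipartite sub-graph between
# "left" and "right" is replaced by a single hub vertex. Vertices are strings;
# the graph is represented as a set of directed edges.
def factor_biclique(edges, left, right, hub):
    new_edges = {e for e in edges if not (e[0] in left and e[1] in right)}
    new_edges |= {(u, hub) for u in left}        # left side -> hub
    new_edges |= {(hub, v) for v in right}       # hub -> right side
    return new_edges

left = {f"w{i}" for i in range(300)}             # predecessor words
right = {f"v{j}" for j in range(300)}            # successor word variants
edges = {(u, v) for u in left for v in right}    # 90,000 edges

reduced = factor_biclique(edges, left, right, hub="HUB")
print(len(edges), "->", len(reduced))            # 90000 -> 600
```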

Speaker Biography

Geoffrey Zweig received his PhD in Computer Science from the University of California at Berkeley in 1998, after which he joined IBM at the T.J. Watson Research Center. At IBM, Geoffrey manages the advanced large vocabulary speech recognition research group. His responsibilities include the development of improved acoustic modeling techniques and state-of-the-art decoding algorithms. Geoffrey was a PI for the DARPA EARS program and organized IBM's participation in the 2003 and 2004 evaluations. His research interests revolve around the application of machine learning techniques such as boosting and Bayesian Network modeling to speech recognition, as well as highly practical applications such as the automated construction of grammars for directory dialer applications. In addition to speech recognition, Geoffrey has worked on a wide variety of topics, including extremely large scale document clustering for the web, and DNA physical mapping. Geoffrey is a member of the IEEE and an associate editor of the IEEE Transactions on Speech and Audio Processing.

March 29, 2005 | 02:35 pm

“Translingual Information Processing”

Salim Roukos, IBM TJ Watson Research

[abstract]

Abstract

Searching unstructured information in the form of (largely) text with increasing image, audio, and video content is fast becoming a daily activity for many people. Increasingly, the content is becoming multilingual (e.g. one such trend is that non-English speakers became the majority of online users in the summer of 2001 and continue to increase their share, reaching two-thirds today). To help assist users with accessing answers to their information needs regardless of the original language of the relevant content, we at IBM Research have a number of projects to handle multilingual content ranging from machine translation, information extraction, to topic detection and tracking. In this talk, we will present an overview of our work on statistical machine translation and demonstrate a cross-lingual search engine to search Arabic content using English queries.

March 8, 2005 | 08:00 am

“Interaction: Conjectures, Results, Myths”   Video Available

Dina Goldin, University of Connecticut

[abstract] [biography]

Abstract

Computer technology has shifted from mainframes to locally networked workstations and now to mobile wireless devices. Software engineering has evolved from procedure-oriented to object-oriented and component-based systems. AI has refocused from logical reasoning and search algorithms to intelligent agents and partially observable environments. These parallel changes exemplify a conceptual paradigm shift from algorithms to interaction. Interactive computational processes allow for input and output to take place during the computation, in contrast to the traditional "algorithmic" computation which transforms predefined input to output. It had been conjectured (Wegner 1997) that "interaction is more powerful than algorithms". We present Persistent Turing Machines (PTMs) that serve as a model for sequential interactive computation. PTMs are multitape Turing Machines (TMs) with a persistent internal tape and dynamic stream-based semantics. We formulate observation-based notions of system equivalence and computational expressiveness. Among other results, we demonstrate that PTMs are more expressive than TMs, thus proving Wegner's conjecture. As an analogue of the Church-Turing Thesis which relates Turing machines to algorithmic computation, it is hypothesized that PTMs capture the intuitive notion of sequential interactive computation. We end by considering the historic reasons for the widespread misinterpretation of the Church-Turing Thesis, that TMs model all computation. The myth that this interpretation is equivalent to the original thesis is fundamental to the mainstream theory of computation. We show how it can be traced to the establishment of computer science as a discipline in the 1960s.

Speaker Biography

Dina Q. Goldin is a faculty member in Computer Science & Engineering at the University of Connecticut and an adjunct faculty member in Computer Science at Brown University. Dr. Goldin obtained her B.S. in Mathematics and Computer Science at Yale University, and her M.S. and Ph.D. in Computer Science at Brown University. Her current topics of research are models of interaction and database queries.

March 1, 2005 | 01:00 pm

“Multi-Rate and Variable-Rate Acoustic Modeling of Speech at Phone and Syllable Time Scales”

Ozgur Cetin, ICSI/Berkeley

[abstract] [biography]

Abstract

In this talk we will describe a multi-rate extension of hidden Markov models (HMMs), multi-rate coupled HMMs, and present their applications to acoustic modeling for speech recognition. Multi-rate HMMs are parsimonious models for stochastic processes that evolve at multiple time scales, using scale-based observation and state spaces. For speech recognition, we use multi-rate HMMs for joint acoustic modeling of speech at multiple time scales, complementing the usual short-term, phone-based representations of speech with wide modeling units and long-term temporal features. We consider two alternatives for the coarse scale in our multi-rate models, representing either phones, or syllable structure and lexical stress. We will also describe a variable-rate sampling extension to the basic multi-rate model, which tailors the analysis towards temporally fast-changing regions and significantly improves over fixed-rate sampling. Experiments on conversational telephone speech will be presented, showing that the proposed multi-rate approaches significantly improve recognition accuracy over HMM- and other coupled HMM-based approaches (e.g. feature concatenation and multi-stream coupled HMMs) for combining short- and long-term acoustic and linguistic information. This is joint work with Mari Ostendorf of the University of Washington.

Speaker Biography

Ozgur Cetin is a post-doctoral researcher at the International Computer Science Institute, Berkeley. He has received PhD and MS degrees from University of Washington, Seattle in 2005 and 2000, respectively, both in electrical engineering, and a BS degree from Bilkent University, Turkey in 1998 in electrical and electronics engineering. His research interests include machine learning, and speech and language processing.

February 22, 2005 | 02:35 pm

“Boundaries to the influence of Animates.”   Video Available

Annie Zaenen, PARC

[abstract]

Abstract

The talk reports on recent work in corpus analysis done at Stanford. The studies aim at determining the weight of various factors that influence the choice of syntactic paraphrases. More specifically I concentrate on the influence of animacy in the dative alternation and in left-dislocation and topicalization and discuss current models of language production in the light of our quantitative results. The talk will end with a short discussion of the relevance of these findings for NL generation.

February 15, 2005 | 03:00 pm

“Progress toward the LIFEmeter: Epidemiology meets Speech Recognition”   Video Available

Thomas Glass, Johns Hopkins School of Public Health

[abstract] [biography]

Abstract

Dr. Glass is Associate Professor of Epidemiology at the Bloomberg School of Public Health. He is broadly trained in social science and holds a Ph.D. in Medical Sociology from Duke University. He completed post-doctoral training in epidemiology at Yale School of Medicine. He has been on the faculty of the Yale School of Medicine, Harvard School of Public Health and the Johns Hopkins Bloomberg School of Public Health. Dr. Glass is primarily interested in understanding the impact of social and behavioral factors on health and functioning in late life. His previous work has explored the role of social support, social networks and social engagement on outcomes ranging from stroke recovery, to alcohol consumption and dementia risk. He teaches, directs graduate students and conducts research in social epidemiology. In addition to observational studies, he has done intervention studies to improve function in older adults. More recently, his work has centered on unraveling the impact of factors in the built and social environment of urban neighborhoods on functioning. He oversees the Baltimore Neighborhood Research Consortium (BNRC) at Johns Hopkins. Among his current projects, Dr. Glass is leading a team to develop integrated sensor technology that will improve the measurement of social, physical and cognitive function for use in population studies.

Speaker Biography

http://www.clsp.jhu.edu/seminars/slides/S2005/Glass.pps

February 8, 2005 | 08:00 am

“From Phrase-Based MT towards Syntax-Based MT”   Video Available

David Chiang, University of Maryland

[abstract] [biography]

Abstract

We explore two ways of extending phrase-based machine translation to incorporate insights from syntax.

Speaker Biography

David Chiang is a postdoctoral researcher at the University of Maryland Institute for Advanced Computer Studies. He received his PhD in Computer and Information Science from the University of Pennsylvania in 2004, under the supervision of Aravind K. Joshi. His research interests are in applying formal grammars to a variety of areas, including statistical machine translation, statistical natural language parsing, and biological sequence analysis.

February 1, 2005 | 04:56 pm

“NLP Research for Commercial Development of Writing Evaluation Capabilities”   Video Available

Jill Burstein, ETS

[abstract] [biography]

Abstract

Automated essay scoring was initially motivated by its potential cost savings for large-scale writing assessments. However, as automated essay scoring became more widely available and accepted, teachers and assessment experts realized that the potential of the technology could go way beyond just essay scoring. Over the past five years or so, there has been rapid development and commercial deployment of automated essay evaluation for both large-scale assessment and classroom instruction. A number of factors contribute to an essay score, including varying sentence structure, grammatical correctness, appropriate word choice, errors in spelling and punctuation, use of transitional words/phrases, and organization and development. Instructional software capabilities exist that provide essay scores and evaluations of student essay writing in all of these domains. The foundation of automated essay evaluation software is rooted in NLP research. This talk will walk through the development of CriterionSM, e-rater, and Critique writing analysis tools, automated essay evaluation software developed at Educational Testing Service.

Speaker Biography

Jill Burstein is a Principal Development Scientist at Educational Testing Service. She received her Ph.D. in Linguistics from the City University of New York, Graduate Center. The focus of her research is on the development of automated writing evaluation technology. She is one of the inventors of e-rater®, an automated essay scoring system developed at Educational Testing Service. She has collaborated on the research and development of capabilities that provide evaluative feedback on student writing for grammar, usage, mechanics, style, and discourse analysis for CriterionSM, a web-based writing instruction application. She is co-editor of the book "Automated Essay Scoring: A Cross-Disciplinary Perspective."

January 25, 2005 | 12:00 pm

“Reducing Confusions and Ambiguities in Speech Translation”   Video Available

Pascale Fung, Hong Kong University of Science & Technology

[abstract] [biography]

Abstract

I will introduce some of our research on improving speech translation, by reducing acoustic-phonetic confusions in accented speech, named entity recognition from speech, and semantics-based translation disambiguation. Accent is a major factor in causing acoustic-phonetic confusions for spontaneous Mandarin speech. Named entity extraction is needed to facilitate the understanding of "What, Who, When, Where, Why" contained in the speech. Translation disambiguation is essential for translation accuracy. More importantly, translation disambiguation with frame semantics helps decode the meaning of a spoken query even if the recognition is not perfect. We believe our approach will bring marked improvement to speech translation performance.

Speaker Biography

http://www.clsp.jhu.edu/seminars/slides/S2005/Fung.pdf

Back to Top

2004

December 7, 2004 | 07:00 am

“Use of a perturbation-correlation method to measure the relative importance of different frequency bands for speech recognition”   Video Available

Christophe Micheyl, MIT

[abstract] [biography]

Abstract

In order to recognize speech, human listeners use cues distributed across different frequencies. Frequency-importance functions, which indicate the relative importance of different frequency bands for speech recognition, are an essential ingredient of predictive models of speech intelligibility, such as the articulation index. They can also be useful for optimizing multi-band speech-processing devices (e.g., current hearing aids). Traditionally, frequency-importance functions have been assessed using low- and high-pass filtered speech. However, this approach has some limitations. An alternative approach, pioneered by Doherty and Turner (J. Acoust. Soc. Am. 100, 1996), uses wide-band speech, to which random perturbations (noise) are added independently in different bands. The importance of each band is then estimated based on the correlation between the signal-to-noise ratios applied successively in that band and the corresponding binary recognition scores, across thousands of trials. In this talk, I will review results obtained with this perturbation-correlation approach. In particular, I will show how the approach may be used to gain insight into the strategies used by listeners to recognize speech in different kinds of acoustic backgrounds (noise versus competing speech). I will also address the question of inter-listener variability and the influence of hearing loss. Finally, I will describe my recent efforts to better understand the theoretical (mathematical) basis of the perturbation-correlation method as applied to speech, in an attempt to improve it. [Work done in collaboration with Gaëtan Gilbert, CNRS UMR 5020, Lyon, France]
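
In outline, the method correlates, band by band, the per-trial signal-to-noise ratio with the binary recognition score. The sketch below runs it on a synthetic listener whose true band weights are known, so the recovered importances can be compared against them; everything here is simulated, not experimental data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_bands = 5000, 4

# Random per-band signal-to-noise ratios (dB) on each trial.
snr = rng.uniform(-10, 10, size=(n_trials, n_bands))

# Synthetic listener: the bands are weighted unequally (band 2 matters most).
true_weights = np.array([0.2, 0.5, 1.0, 0.3])
p_correct = 1 / (1 + np.exp(-(snr @ true_weights) / 10))
correct = rng.random(n_trials) < p_correct           # binary recognition score

# Perturbation-correlation estimate: correlate each band's SNR with the binary
# scores across trials; a larger correlation marks a more important band.
importance = np.array([np.corrcoef(snr[:, b], correct)[0, 1] for b in range(n_bands)])
print(importance / importance.sum())                 # roughly tracks true_weights
```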

Speaker Biography

I obtained a PhD in Experimental and Cognitive Psychology from Lumiere University (Lyon, France) in 1995. From 1996 to 1997 I was a Research Associate in the Department of Experimental Psychology of Cambridge University (Cambridge, UK), and a Visiting Scientist in the Medical Research Council Cognition and Brain Sciences Unit (MRC-CBU). I worked there with Bob Carlyon and Brian Moore for a total of three years. After being offered a tenure position by the French Centre National de la Recherche Scientifique (CNRS), I went back to Lyon for about three years. I came over to the US in 2001. After a short stay in Prof. Rauschecker's lab at the Institute for Cognitive and Computational Sciences, Georgetown University (Washington, DC), I joined Andrew Oxenham's group in the Research Laboratory of Electronics, Massachusetts Institute of Technology (Cambridge, MA), where I am currently a Research Scientist.

November 30, 2004 | 12:00 pm

“Towards a Universal Framework for Tree Transduction”

Stuart Shieber, Harvard

[abstract] [biography]

Abstract

The typical natural-language pipeline can be thought of as proceeding by successive transformation of various data structures, especially strings and trees. For instance, low-level speech processing can be viewed as transduction of strings of speech samples into phoneme strings, then into triphone strings, finally into word strings. Morphological processes can similarly be modeled as character string transductions. For this reason, weighted finite-state transducers (WFST), a general formalism for string-to-string transduction, can serve as a kind of universal formalism for representing low-level natural-language processes. Higher-level natural-language processes can also be thought of as transductions, but on more highly structured representations, in particular, trees. Semantic interpretation can be viewed as a transduction from a syntactic parse tree to a tree of semantic operations whose simplification to logical form can be viewed as a further transduction. Machine translation systems have been viewed as tree transductions of various sorts as well. This raises the question as to whether there is a universal formalism for natural-language tree transduction that can play the same role there that WFST plays for string transduction. In this talk, we explore this question, proposing that the characterization of classical tree transducers in terms of bimorphisms, little known outside the formal language theory community, can be used as a unifying framework for a wide variety of tree transduction formalisms, including, for instance, several previously proposed for statistical machine translation and the back-end formalism for Dragon's speech command and control system. The framework also places so-called synchronous grammar formalisms into the tree transducer family for the first time.

Speaker Biography

Stuart Shieber is Harvard College Professor and James O. Welch, Jr. and Virginia B. Welch Professor of Computer Science in the Division of Engineering and Applied Sciences at Harvard University. Professor Shieber was awarded a Presidential Young Investigator award in 1991, and was named a Presidential Faculty Fellow in 1993, one of only thirty in the country in all areas of science and engineering. At Harvard, he has been awarded two honorary chairs: the John L. Loeb Associate Professorship in Natural Sciences in 1993 and the Harvard College Professorship in 2001. He was elected a Fellow of the American Association for Artificial Intelligence in 2004. He is the author or editor of five books and numerous articles in computer science. Professor Shieber holds eight patents, and is co-founder of Cartesian Products, Inc., a high-technology research and development company based in Cambridge, Massachusetts, providing advanced software technology to improve worldwide communication and information access. He is also the founder of Microtome Publishing, a company dedicated to publishing services in support of open access to the scholarly literature.

November 23, 2004 | 02:35 pm

“Coping with Information Overload”

Dr. Allen Gorin, US Department of Defense

[abstract]

Abstract

Coping with information overload is a major challenge of the 21st century. In previous eras, access to information was difficult and often tightly controlled as a source of power. Today, we are overloaded with so much electronic information that it has become an obstacle to effective decision making. Thus, the challenge facing individuals and institutions is how to embrace this information rather than being paralyzed by it. The intelligence community is overloaded with huge volumes of information, moving at large velocities and comprising great variety. Information includes both content and context, which humans deal with as a gestalt but computer systems tend to treat separately. We discuss two complementary approaches to coping with information overload and the open research questions that arise in this emerging discipline. First is value estimation, where humans examine only the golden nuggets of information judged valuable by some process. The second approach is knowledge distillation, where the information is digested and compressed, producing salient knowledge for human consumption. Finally, there are many open questions regarding the symbiosis between people and machines for knowledge discovery.

November 9, 2004 | 02:35 pm

“Discriminative Learning of Generative Models”   Video Available

Tony Jebara, Columbia

[abstract] [biography]

Abstract

Generative models such as Bayesian networks, distributions, and hidden Markov models are elegant formalisms to set up and specify prior knowledge about a learning problem. However, the standard estimation methods they rely on, including maximum likelihood and Bayesian integration, do not focus modeling resources on a particular input-output task. They only generically describe the data. In applied settings when models are imperfectly matched to real data, more discriminative learning, as in support vector machines, is crucial for improving performance. In this talk, I show how we can learn generative models optimally for a given task such as classification and obtain large margin discrimination boundaries. Through maximum entropy discrimination, all exponential family models can be estimated discriminatively via convex programming. Furthermore, the method handles interesting latent models such as mixtures and hidden Markov models. This is done via a variant of maximum entropy discrimination that uses variational bounding on classification constraints to make computations tractable in the latent case. Interestingly, the method gives rise to Lagrange multipliers that behave like posteriors over hidden variables. Preliminary experiments are shown.

Speaker Biography

Tony Jebara is an Assistant Professor of Computer Science at Columbia University. He is Director of the Columbia Machine Learning Laboratory, whose research focuses upon machine learning, computer vision and related application areas such as human-computer interaction. Jebara is also a Principal Investigator at Columbia's Vision and Graphics Center. He has published over 30 papers in the above areas, including the book Machine Learning: Discriminative and Generative (Kluwer). Jebara is the recipient of the Career award from the National Science Foundation and has also received honors for his papers from the International Conference on Machine Learning and from the Pattern Recognition Society. He has served as chair or program committee member for various conferences including ICDL, ICML, COLT, UAI, IJCAI and on the editorial board of the Machine Learning Journal. Jebara's research has been featured on television (ABC, BBC, New York One, TechTV, etc.) as well as in the popular press (Wired Online, Scientific American, Newsweek, Science Photo Library, etc.). Jebara obtained his Bachelor's degree from McGill University (at the McGill Center for Intelligent Machines) in 1996. He obtained his Master's in 1998 and his PhD in 2002, both from the Massachusetts Institute of Technology (at the MIT Media Laboratory). He is currently a member of the IEEE, ACM and AAAI. Professor Jebara's research and laboratory are supported in part by Microsoft, Alpha Star Corporation and the National Science Foundation.

November 2, 2004 | 02:35 pm

“Unsupervised Learning of Natural Language Structure”   Video Available

Dan Klein, Berkeley

[abstract] [biography]

Abstract

There is precisely one complete language processing system to date: the human brain. Though there is debate on how much built-in bias human learners might have, we definitely acquire language in a primarily unsupervised fashion. On the other hand, computational approaches to language processing are almost exclusively supervised, relying on hand-labeled corpora for training. This reliance is largely due to repeated failures of unsupervised approaches. In particular, the problem of learning syntax (grammar) from completely unannotated text has received a great deal of attention for well over a decade, with little in the way of positive results. We argue that previous methods for this task have generally failed because of the representations they used. Overly complex models are easily distracted by non-syntactic correlations (such as topical associations), while overly simple models aren't rich enough to capture important first-order properties of language (such as directionality, adjacency, and valence). We describe several syntactic representations which are designed to capture the basic character of natural language syntax as directly as possible. With these representations, high-quality parses can be learned from surprisingly little text, with no labeled examples and no language-specific biases. Our results are the first to show above-baseline performance in unsupervised parsing, and far exceed the baseline (in multiple languages). These specific grammar learning methods are useful since parsed corpora exist for only a small number of languages. More generally, most high-level NLP tasks, such as machine translation and question-answering, lack richly annotated corpora, making unsupervised methods extremely appealing, even for common languages like English.

Speaker Biography

Dan Klein is an assistant professor of computer science at UC Berkeley, having recently completed his doctoral work at Stanford University. He holds a BA from Cornell University (summa cum laude in computer science, linguistics, and math) and a master's degree in linguistics from Oxford University. Professor Klein's research focuses on natural language processing, including unsupervised grammar induction, statistical parsing methods, and information extraction. His academic honors include a British Marshall Fellowship, several graduate research fellowships, and best paper awards at the ACL and EMNLP conferences.

October 26, 2004 | 02:35 pm

“Towards Automatic Acquisition of Ontological Knowledge”   Video Available

Patrick Pantel, ISI USC

[abstract] [biography]

Abstract

Recently, many corpus-based and web-based knowledge acquisition systems have been proposed for creating lexical resources. Not many attempts, however, have been made at ontologizing these resources. We present a semi-automatic method for extracting fine-grained semantic relations between verbs. We detect similarity, strength, antonymy, enablement, and temporal happens-before relations between pairs of strongly associated verbs using lexico-syntactic patterns over the Web. We provide the resource, called VerbOcean, for download at http://semantics.isi.edu/ocean/. We will discuss current work on ontologizing lexical resources like VerbOcean. Using an automatic algorithm, we assign a grammatical template to each node of an ontology. The challenge lies in disambiguating these templates. Benefits of this work potentially include the disambiguation of VerbOcean, the disambiguation of new conceptualizations, improved unsupervised word sense disambiguation, and the personalization of ontologies, like WordNet, to a particular domain.
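
To make the pattern-based idea concrete, the sketch below counts matches of a few illustrative lexico-syntactic templates (not VerbOcean's actual pattern inventory) for a candidate verb pair in a text; the original work instead issued such patterns as Web queries and used hit counts.

```python
# Illustrative (hypothetical) templates for two of the relation types mentioned above.
PATTERNS = {
    "happens-before": ["{x} and then {y}", "{x} before {y}"],
    "antonymy":       ["either {x} or {y}", "{x} rather than {y}"],
}

def match_relations(text, verb_pair):
    """Count template matches for a verb pair, as crude evidence for each relation."""
    x, y = verb_pair
    lowered = text.lower()
    return {relation: sum(lowered.count(t.format(x=x, y=y)) for t in templates)
            for relation, templates in PATTERNS.items()}

text = "You have to marinate and then grill the fish; never boil rather than grill it."
print(match_relations(text, ("marinate", "grill")))   # happens-before pattern fires
print(match_relations(text, ("boil", "grill")))       # antonymy-style pattern fires
```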

Speaker Biography

Dr. Patrick Pantel is currently a Research Scientist in the Natural Language Group at the USC Information Sciences Institute where he does research in semi-automatic ontology construction, text mining, knowledge acquisition, and machine learning. In 2003, he received a Ph.D. in Computing Science from the University of Alberta in Edmonton, Canada. He is the recipient of various prestigious awards, including the University of Manitoba gold medal for the Faculty of Science, two national scholarships from the Natural Sciences and Engineering Research Council of Canada and the Izaak Walton Killam Memorial scholarship.

October 21, 2004 | 10:00 am

“Towards Semi-Supervised Algorithms for Semantic Relation Detection in BioScience Text”

Marti Hearst, Berkeley

[abstract] [biography]

Abstract

A crucial step toward the goal of automatic extraction of propositional information from natural language text is the identification of semantic relations between constituents in sentences. In the bioscience text domain, we have developed a simple ontology-based algorithm for determining which semantic relation holds between terms in noun compounds, and a supervised learning algorithm for discovering relations between entities. In this talk, I will first briefly describe these results. A major bottleneck for semantic labeling work is the development of labeled training data. To remedy this, we propose a new approach for creating semantically-labeled data that makes use of what we call *citances*: the text of the sentences surrounding citations to research articles. Citances provide us with differently-worded statements of approximately the same semantic information; by looking at the way that different authors talk about the same facts, we obtain paraphrases nearly for free. We have just begun to assess how well citances work for the creation of labeled training data for the problem of detecting protein-protein interaction relations. We also hypothesize that citances will be useful for synonym creation, document summarization, and database curation. Joint work with Preslav Nakov, Barbara Rosario, Ariel Schwartz, and Janice Hamer. This work is part of the BioText project, supported by NSF DBI-0317510.

Speaker Biography

Dr. Marti Hearst is an associate professor in SIMS, the School of Information Management and Systems at UC Berkeley, with an affiliate appointment in the Computer Science Division. Her primary research interests are user interfaces and visualization for information retrieval, empirical computational linguistics, and text data mining. She received BA, MS, and PhD degrees in Computer Science from the University of California at Berkeley, and she was a Member of the Research Staff at Xerox PARC from 1994 to 1997. Prof. Hearst is on the editorial boards of ACM Transactions on Information Systems and ACM Transactions on Computer-Human Interaction, was formerly on the boards of Computational Linguistics and IEEE Intelligent Systems, and was the program co-chair of HLT-NAACL.

October 19, 2004 | 02:35 pm

“Time Independent ICA through a Fisher Game”   Video Available

Dr. Ravi C. Venkatesan, Systems Research Corp

[abstract]

Abstract

Extreme Physical Information (EPI) [1] is a self-contained theory to elicit physical laws from a system/process (Nature) based on a measurement-response framework. A specific form of the Fisher information measure (FIM) known as the Fisher channel capacity (FCC) is employed as a measure of uncertainty. The FCC is the trace of the FIM. EPI may be construed as being a zero-sum game between a gedanken observer and a system under observation (characterized by a demon, reminiscent of the Maxwell demon, residing in a conjugate space). The payoff of the competitive game results in a variational principle that defines the physical law that generates the observations made by the gedanken observer, as a consequence of the response of the system to the measurements. A principled formulation for reconstructing pdf's from arbitrary discrete time-independent random sequences, based on an invariance-preserving extension of the Extreme Physical Information (EPI) theory, is presented [2, 3]. Invariances are incorporated into the invariant EPI (IEPI) model through a Discrete Variational Complex inspired by the seminal work of T. D. Lee [4]. A quantum mechanical connotation is provided to the Fisher game. This is accomplished through the IEPI Euler-Lagrange equation, which acquires the form of a time-independent Schrödinger-like equation, and the quantum mechanical virial theorem [5]. The concomitant constraints of the IEPI variational principle are consistent with the Heisenberg uncertainty principle. The ansätze describing the state estimators are obtained so as to self-consistently satisfy an analog of the Fisher game corollary [1, 3]. The game corollary permits the demon to make the closing move in the Fisher game, by minimizing the FCC. This corresponds to a state of maximum uncertainty, and is in keeping with the demon's strategy of minimizing the information made available to the observer. A fundamental tenet of the EPI/IEPI model is the collection of statistically independent data by the observer. A principled IEPI Fisher game formulation guaranteeing the statistical independence of the quantum mechanical observables is presented, utilizing statistical analyses commonly employed in Independent Component Analysis (ICA) [6]. Specifically, correlations are first eliminated using a whitening process (facilitated by a linear filter or PCA), in conjunction with a Givens rotation (a unitary transform). Next, the IEPI Fisher game is played between the gedanken observer and the process inhabiting the conjugate system space. Finally, an inverse whitening filter is applied to the observables corresponding to the reconstructed state vectors obtained from the Fisher game. This yields a novel form of ICA based on minimizing the FCC. The prospect of obtaining an optimal whitening filter based on the Fisher game corollary is investigated. Qualitative analogies and distinctions between the Fisher game ICA model and other prominent ICA theories are briefly discussed. Reconstruction of time-independent random sequences generated from Gaussian mixture models demonstrates the efficacy of the Fisher game/ICA formulation. The utility of the Fisher game ICA formulation to achieve quantum clustering of data, where a priori knowledge of the number of clusters is unavailable, is briefly discussed.
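
Of the pipeline described above, the decorrelation step is easy to illustrate. The sketch below shows PCA-style whitening followed by a Givens rotation (a unitary transform that preserves whiteness) on synthetic data; it covers only that preprocessing stage, not the EPI/Fisher-game machinery itself.

```python
import numpy as np

def pca_whiten(x):
    """Center and whiten x (n_samples, n_features) so its covariance is ~identity."""
    xc = x - x.mean(axis=0)
    cov = np.cov(xc, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    w = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T   # symmetric whitening matrix
    return xc @ w, w

def givens_rotation(theta, i, j, n):
    """n x n Givens (plane) rotation by angle theta in the (i, j) plane -- a unitary transform."""
    g = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    g[i, i] = g[j, j] = c
    g[i, j], g[j, i] = -s, s
    return g

rng = np.random.default_rng(1)
x = rng.standard_normal((5000, 2)) @ np.array([[2.0, 1.2], [0.0, 0.7]])  # correlated data
z, _ = pca_whiten(x)
print(np.round(np.cov(z, rowvar=False), 3))      # ~ identity: correlations removed

zr = z @ givens_rotation(np.pi / 6, 0, 1, 2)      # rotation preserves whiteness
print(np.round(np.cov(zr, rowvar=False), 3))      # still ~ identity
```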

October 12, 2004 | 02:35 pm

“Towards a Grand Unified Theory of Underspecification”   Video Available

Alexander Koller, Universität des Saarlandes

[abstract] [biography]

Abstract

Underspecification is an approach to dealing with scope ambiguities, a certain class of semantic ambiguities in natural language. The basic idea is to derive from a syntactic analysis of a sentence not all the (exponentially many) semantic representations, but one single compact description of all semantic representations. Then the actual semantic representations can be computed from the description by need. Underspecification has become the standard approach to dealing with scope in large-scale grammars. In my talk, I present one particular scope underspecification formalism, the language of dominance constraints. Dominance constraints have a particularly canonical definition (as a logic interpreted over trees), and very efficient solvers are available for them. Then I investigate the relationship between dominance constraints and two other popular underspecification formalisms: Hole Semantics and Minimal Recursion Semantics. While the formalisms all look superficially similar, it turns out that there are fundamental differences once we look more closely. However, I show that significant fragments of the three formalisms are indeed equivalent, and present empirical data that suggests that these fragments encompass all descriptions that are used by current grammars. These results bridge the gap between different underspecification formalisms for the first time, which makes resources such as grammars and solvers that were created for one formalism available to the others. On a more general level, they also clarify the expressive power that a formalism actually has to offer in the linguistic application.

Speaker Biography

Alexander Koller is a researcher at Saarland University in Saarbruecken, Germany. He received his MSc degrees in computational linguistics and computer science from Saarland University in 1999, and plans to complete his PhD in computer science by the end of 2004. His research interests include the application of efficient algorithms and logic-based methods to natural language processing, computational semantics, automated text generation, and the language-robotics interface.

October 5, 2004 | 02:35 pm

“Unsupervised learning of natural languages”   Video Available

Shimon Edelman, Cornell

[abstract]

Abstract

We describe an unsupervised algorithm capable of finding hierarchical, context-sensitive structure in corpora of raw symbolic sequential data such as text or transcribed speech. In the domain of language, the algorithm handles both artificial stochastic context-free grammar data and real natural-language corpora, including raw transcribed child-directed speech. It identifies candidate structures iteratively as patterns of partially aligned sequences of symbols, accompanied by equivalence classes of symbols that are in complementary distribution in the context of their patterns. Pattern significance is estimated using a context-sensitive probabilistic criterion defined in terms of local flow quantities in a graph whose vertices are the lexicon entries and where the paths correspond, initially, to corpus sentences. New patterns and equivalence classes can incorporate those added previously, leading to the emergence of recursively structured units that also support highly productive and safe generalization, by opening context-dependent paths that do not exist in the original corpus. This is the first time an unsupervised algorithm is shown capable of learning complex, grammar-like linguistic representations that are demonstrably productive, exhibit a range of structure-dependent syntactic phenomena, and score well in standard language proficiency tests.

September 28, 2004 | 02:35 pm

“Joint discriminative language modeling and utterance classification”   Video Available

Brian Roark, OGI

[abstract]

Abstract

In this talk, I will describe several discriminative language modeling techniques for large vocabulary automatic speech recognition (ASR) tasks. I will initially review recent work on n-gram model estimation using the perceptron algorithm and conditional random fields, with experimental results on Switchboard (joint work with Murat Saraclar, Michael Collins and Mark Johnson). I will then present some new work on a call-classification task, for which training utterance classes are annotated along with the reference transcription. We demonstrate that a joint modeling approach, using utterance-class, n-gram, and class/n-gram features, reduces WER significantly over just using n-gram features, while additionally providing significantly more accurate utterance classification than the baselines. A variety of parameter update approaches will be discussed and evaluated with respect to both WER and classification error rate reduction, including simultaneous and independent optimization. As with the earlier n-gram modeling approaches, the resulting models are encoded as weighted finite-state automata and used by simply intersecting with word-lattices output from the baseline recognizer (joint work with Murat Saraclar).
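
A minimal sketch of the perceptron-style training loop over n-gram features, in the n-best reranking setting, appears below (hypothetical toy data; the actual systems operate on word lattices encoded as weighted finite-state automata).

```python
from collections import Counter

def ngram_feats(words, order=2):
    """Sparse n-gram feature vector (unigrams and bigrams) for one hypothesis."""
    feats = Counter(words)
    feats.update(zip(words[:-1], words[1:]))
    return feats

def perceptron_rerank_train(data, epochs=5):
    """Structured-perceptron training over n-best lists.

    data: list of (nbest, oracle_index) pairs, where nbest is a list of
    (hypothesis_words, baseline_score) and oracle_index points at the
    lowest-error hypothesis.  Returns a Counter of feature weights.
    """
    w = Counter()
    for _ in range(epochs):
        for nbest, oracle in data:
            # Current best hypothesis: baseline score plus the learned correction.
            scores = [base + sum(w[f] * v for f, v in ngram_feats(h).items())
                      for h, base in nbest]
            guess = max(range(len(nbest)), key=scores.__getitem__)
            if guess != oracle:                      # standard perceptron update
                w.update(ngram_feats(nbest[oracle][0]))
                w.subtract(ngram_feats(nbest[guess][0]))
    return w

# Toy example with two utterances and hypothetical 2-best lists.
data = [
    ([("the cat sat".split(), 1.0), ("the cat sad".split(), 1.2)], 0),
    ([("a cat sad".split(), 0.9), ("a cat sat".split(), 0.8)], 1),
]
weights = perceptron_rerank_train(data)
print(weights[("cat", "sat")], weights[("cat", "sad")])   # positive vs. negative weight
```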

September 21, 2004 | 02:35 pm

“Listener-oriented Phonology”   Video Available

Paul Boersma, University of Amsterdam

[abstract]

Abstract

French has two kinds of vowel-initial words (normal ones and so-called "h aspiré" words), which differ with respect to four phonological processes (enchaînement, liaison, elision, and schwa deletion). I will show that a speaker-based view of phonology can handle at best three of these processes, and that a listener-oriented view can handle all of them. And this is just one example among many others, which I will touch upon briefly. A suitable framework that can be turned listener-oriented is Optimality Theory. Its speaker-based version (Prince & Smolensky 1993) originally recognized two kinds of constraints: faithfulness constraints and constraints against marked structures. However, many ad-hoc constraints that do not fall in either group (namely, what I call "exclamation constraints") have been proposed through the years as well. The authors of such constraints often display a degree of dissatisfaction with their own proposals, usually because these constraints have little applicability outside the language under discussion. This is because the usual distal task of these constraints is to express the maintenance of a language-specific contrast. I will argue that these speaker-based exclamation constraints should be replaced with "listener-oriented faithfulness" constraints. Whereas a speaker-based faithfulness constraint reads "an element X that is present in the underlying form should appear as X in the surface form", a listener-oriented faithfulness constraint reads "an element X that is present in the underlying form should be pronounced as something that will be perceived as X by the listener". By replacing exclamation constraints with such faithfulness constraints, their formulations become unsurprising and non-ad-hoc. The empirical gain is that the limited applicability of these constraints (namely, to cases of the maintenance of contrast) is now directly predicted by their inclusion in the faithfulness group.

September 14, 2004 | 02:35 pm

“Algorithms and Rate-Distortion Bounds in Data Compression for Multi-User Communications”   Video Available

Michelle Effros, CalTech

[abstract]

Abstract

A network source code is a data compression algorithm designed specifically for the multi-user communication system in which it will be employed. Network source codes achieve better rate-distortion trade-offs and improved functionality over the (better known) "point-to-point" source coding alternatives. Perhaps the simplest example of a network source code is the multiresolution code -- in which a single transmitter describes the same information to a family of receivers, each of whom receives the data at a different rate; the descriptions are embedded, so that all receivers receive the lowest-rate description, and each higher rate is achieved by adding on to the description at the nearest lower rate. In this talk, I will discuss rate-distortion bounds for lossy network source coding and algorithms for designing codes approaching these bounds. I will focus primarily on a survey of multiresolution source coding results but also include a brief discussion of generalizations to other network source coding environments. Results include rate-distortion bounds, rate-loss bounds, properties of optimal codes, and an approximation algorithm for optimal quantizer design. The quantizer design is based on a new approximation algorithm for $\ell_2^2$ data clustering. Parts of the work described in this talk were done in collaboration with Hanying Feng, Dan Muresan, Qian Zhao, and Leonard Schulman.
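
The embedded-description idea behind multiresolution coding can be illustrated with a toy two-layer quantizer: all receivers get a coarse uniform quantization, and the higher-rate receiver additionally gets one refinement bit per sample. This is only a schematic illustration, not the optimal quantizer design discussed in the talk.

```python
import numpy as np

def embedded_quantize(x, coarse_step):
    """Two-layer embedded (multiresolution) description of a signal x.

    Layer 1: uniform quantization with step `coarse_step` (all receivers get this).
    Layer 2: one refinement bit per sample, halving the quantization cell
             (only the higher-rate receiver gets this, in addition to layer 1).
    """
    idx = np.round(x / coarse_step).astype(int)          # layer-1 indices
    coarse = idx * coarse_step                           # low-rate reconstruction
    refine_bit = (x - coarse >= 0).astype(int)           # layer-2: which half of the cell
    fine = coarse + (refine_bit - 0.5) * (coarse_step / 2)
    return coarse, refine_bit, fine

rng = np.random.default_rng(0)
x = rng.standard_normal(10000)
coarse, bits, fine = embedded_quantize(x, coarse_step=0.5)
print("low-rate MSE :", np.mean((x - coarse) ** 2))
print("high-rate MSE:", np.mean((x - fine) ** 2))        # refinement lowers distortion
```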

July 28, 2004

“Applying Speech/Language Technologies to Communication Disorders: New Challenges for Basic Research”   Video Available

Jan van Santen, Center for Spoken Language Understanding, Oregon Graduate Institute

July 14, 2004

“Mary Harper: CDG-Based Language Models”   Video Available

Mary Harper

July 6, 2004

“Opening Day Presentations”   Video Available

Various

May 11, 2004 | 02:35 pm

“Network Models for Game Theory and Economics”   Video Available

Michael Kearns, University of Pennsylvania

[abstract]

Abstract

Over the last several years, a number of authors have developed graph-theoretic or network models for large-population game theory and economics. In such models, each player or organization is represented by a vertex in a graph, and payoffs and transactions are restricted to obey the topology of the graph. This allows the detailed specification of rich structure (social, technological, organizational, political, regulatory) in strategic and economic systems. In this talk, I will survey these models and the attendant algorithms for certain basic computations, including Nash, correlated, and Arrow-Debreu equilibria. Connections to related topics, such as Bayesian and Markov networks for probabilistic modeling and inference, will be discussed. I will also discuss some recent work marrying this general line of thought with topics in social network theory.

April 27, 2004 | 02:35 pm

“Discriminative Estimation of Mixtures of Exponential Distributions”   Video Available

Vaibhava Goel, IBM

[abstract]

Abstract

An auxiliary function based approach for estimation of exponential model parameters under a maximum conditional likelihood (MCL) objective was recently proposed by Gunawardana and Byrne. While for Gaussian mixture models it leads to parameter updates that were known previously, it is a very useful method in that it is applicable to arbitrarily constrained exponential models and the resulting auxiliary function is similar to the EM auxiliary function, thus eliminating the need for two separate optimization procedures. It is also easily extensible to other utility functions that are similar to MCL, such as sum-of-posteriors and maximum mutual information. One shortcoming of this approach, however, is that the validity of the auxiliary function is not rigorously established. In this talk I will present our work on discriminative estimation using the auxiliary function approach. I'll first discuss our recent proof of validity of the auxiliary function, and then present application of this approach for discriminative estimation of subspace constrained Gaussian mixture models (SCGMMs), where the exponential model weights of all Gaussians are required to belong to a common subspace. SCGMMs have been shown to generalize and yield significant error rate reductions over previously considered model classes such as diagonal models, models with semi-tied covariances, and extended maximum likelihood linear transformation (EMLLT) models. We find that MMI estimation of SCGMMs (tried on a digit task so far) results in more than 20% relative reduction in word error rate over maximum likelihood estimation. Time permitting, I'll also discuss MCL estimation of language models that combine N-grams and stochastic finite state grammars. This work was done in collaboration with Scott Axelrod, Ramesh Gopinath, Peder Olsen, and Karthik Visweswariah.

April 20, 2004 | 02:35 pm

“Wiretapping the Brain”   Video Available

Terry Sejnowski, Salk Institute

[abstract] [biography]

Abstract

Blind source separation -- also called the cocktail party problem -- has recently been solved using Independent Component Analysis. This new signal processing technique has allowed us to eavesdrop on the brain's internal communication systems.
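
A standard illustration of ICA-based source separation on synthetic mixtures (using scikit-learn's FastICA rather than any code from the talk) might look like the following sketch.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 4000)

# Two independent "speakers": a sinusoid and a sawtooth-like signal.
s1 = np.sin(2 * np.pi * 1.3 * t)
s2 = 2 * (t * 2.1 - np.floor(t * 2.1)) - 1
S = np.c_[s1, s2]

# Two "microphones" record different linear mixtures (the cocktail party).
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = S @ A.T

# ICA recovers the sources up to permutation, sign, and scaling.
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)
print(np.round(np.corrcoef(S.T, S_hat.T), 2))   # each source matches one estimate up to sign
```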

Speaker Biography

Terrence Sejnowski is an Investigator with the Howard Hughes Medical Institute and a Professor at The Salk Institute for Biological Studies, where he directs the Computational Neurobiology Laboratory. He is also Professor of Biological Sciences and Adjunct Professor in the Departments of Physics, Neurosciences, Psychology, Cognitive Science, and Computer Science and Engineering at the University of California, San Diego, where he is Director of the Institute for Neural Computation. Dr. Sejnowski received a B.S. in physics from Case Western Reserve University, an M.A. in physics from Princeton University, and a Ph.D. in physics from Princeton University in 1978. From 1978-1979 Dr. Sejnowski was a postdoctoral fellow in the Department of Biology at Princeton University, and from 1979-1982 he was a postdoctoral fellow in the Department of Neurobiology at Harvard Medical School. In 1982 he joined the faculty of the Department of Biophysics at the Johns Hopkins University, where he achieved the rank of Professor before moving to San Diego in 1988. He has had a long-standing affiliation with the California Institute of Technology, as a Wiersma Visiting Professor of Neurobiology in 1987, as a Sherman Fairchild Distinguished Scholar in 1993, and as a part-time Visiting Professor from 1995 to 1998. Dr. Sejnowski received a Presidential Young Investigator Award in 1984. He received the Wright Prize from Harvey Mudd College for excellence in interdisciplinary research in 1996 and the Hebb Prize for his contributions to learning algorithms from the International Neural Network Society in 1999. He became a Fellow of the Institute of Electrical and Electronics Engineers in 2000 and received their Neural Network Pioneer Award in 2002. In 2003 he was elected to the Johns Hopkins Society of Scholars. In 1989, Dr. Sejnowski founded Neural Computation, published by the MIT Press, the leading journal in neural networks and computational neuroscience. He is also the President of the Neural Information Processing Systems Foundation, a non-profit organization that oversees the annual NIPS Conference. This interdisciplinary meeting brings together researchers from many disciplines, including biology, physics, mathematics and engineering. The long-range goal of Dr. Sejnowski's research is to build linking principles from brain to behavior using computational models. This goal is being pursued with a combination of theoretical and experimental approaches at several levels of investigation, ranging from the biophysical level to the systems level. Hippocampal and cortical slice preparations are being used to explore the properties of single neurons and synapses. Biophysical models of electrical and chemical signal processing within neurons are used as an adjunct to physiological experiments. The dynamics of network models are studied to explore how populations of neurons interact during states of alertness and sleep. His laboratory has developed new methods for analyzing the sources of electrical and magnetic signals recorded from the scalp and hemodynamic signals from functional brain imaging.

April 13, 2004 | 02:35 pm

“Discriminative Language Modeling for LVCSR”   Video Available

Murat Saraclar, AT&T Labs - Research

[abstract]

Abstract

This talk describes a discriminative language modeling technique for large vocabulary speech recognition. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs). The models are encoded as deterministic weighted finite-state automata, and are applied by intersecting the automata with word-lattices that are output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. We present results for various perceptron training scenarios for the Switchboard task, including using n-gram features of different orders, and performing n-best extraction versus using full word lattices. Using the feature set selected by the perceptron algorithm, CRF training provides an additional 0.5 percent reduction in word error rate, for a total of 1.8 percent absolute WER reduction from the baseline of 39.2 percent.

April 6, 2004 | 02:35 pm

“Interactivity in Written Language Processing: Evidence from Impairments”   Video Available

Michael McCloskey, JHU Department of Cognitive Science

[abstract]

Abstract

Via two studies of cognitively impaired individuals, I consider reciprocal interactions between levels of representation in written language processing. The first study involves a severely dysgraphic stroke patient, and provides evidence of feedback from grapheme to lexeme levels of representation in written word production. The second study involves a young woman with a remarkable deficit in visual perception, and concerns top-down influences on reading comprehension for words and sentences.

March 30, 2004 | 02:35 pm

“Scaling of Information in Natural Language”   Video Available

Naftali Tishby, The Hebrew University of Jerusalem

[abstract] [biography]

Abstract

The idea that the observed semantic structure of human language is a result of an adaptive competition between accuracy of expression and efficient communication is not new. It has been suggested in various forms by Zipf, Shannon, and Mandelbrot, among many others. In this talk I will discuss a novel technique for studying such a competition between accuracy and efficiency of communication, solely from the statistics of large linguistic corpora. By exploiting the deep and intriguing duality between source and channel coding in Shannon's information theory, we can explore directly the relationship between the semantic accuracy and the complexity of the representation in a large corpus of English documents. We do this by evaluating the accuracy in identifying the topic of a document as a function of the complexity of the semantic representation, as captured by relevant hierarchical clustering of words via the information bottleneck method, which can be viewed as a combination of perfectly matched source and channel. What we obtain is a scaling relation (a power-law) that, unlike the famous Zipf's law, quantifies directly the statistical way words are semantically refined in human language. It may therefore reveal some quantitative properties of human cognition which can now be explored experimentally in other languages or other complex cognitive modalities such as music and mathematics. This work is partly based on joint work with Noam Slonim. See also: http://www.cs.huji.ac.il/labs/learning/Theses/Noam_phd1.ps.gz
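
As a hedged illustration of checking for such a scaling relation, a power law can be fit by ordinary least squares in log-log coordinates; the numbers below are hypothetical stand-ins for (representation complexity, topic-identification accuracy) measurements, not data from the talk.

```python
import numpy as np

# Hypothetical measurements: semantic-representation complexity (e.g., number of
# word clusters) versus topic-identification accuracy, for illustration only.
complexity = np.array([2, 4, 8, 16, 32, 64, 128, 256], dtype=float)
accuracy   = np.array([0.05, 0.07, 0.10, 0.14, 0.19, 0.27, 0.37, 0.52])

# A power law accuracy = c * complexity**alpha is a straight line in log-log space,
# so its exponent can be read off with ordinary least squares.
alpha, log_c = np.polyfit(np.log(complexity), np.log(accuracy), deg=1)
print(f"estimated exponent alpha = {alpha:.2f}, prefactor c = {np.exp(log_c):.2f}")
```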

Speaker Biography

Dr. Naftali Tishby is currently on sabbatical at the CIS department at U Penn. Until last summer he served as the founding chair of the new computer engineering program at the School of Computer Science and Engineering at the Hebrew University. He is a founding member of the Interdisciplinary Center for Neural Computation (ICNC) and one of the key teachers of the well-known computational neuroscience graduate program of the ICNC. He received his PhD in theoretical physics from the Hebrew University in 1985 and has since been a research member of staff at MIT, Bell Labs, AT&T, and NECI. His current research is on the interface between computer science, statistical physics, and computational biology. He introduced various methods from statistical mechanics into computational learning theory and machine learning and is interested in particular in the role of phase transitions in learning and cognitive phenomena. More recently he has been working on the foundations of biological information processing and has developed novel conceptual frameworks for relevant data representation and learning algorithms based on information theory, such as the Information Bottleneck method and Sufficient Dimensionality Reduction.

March 23, 2004 | 02:35 pm

“Structuring Semantic Representations”

Beth Levin, Stanford

[abstract] [biography]

Abstract

Over the years predicate decompositions --- representations of verb meaning that take the form of combinations of primitive predicates, such as the infamous CAUSE TO DIE for _kill_ --- have come in for substantial and sometimes well-merited criticism. Yet, such representations continue to be adopted, suggesting that there is something appealing about them. In this talk, I identify two underappreciated properties of these representations that make them effective semantic representations and present several types of evidence to demonstrate this. First, predicate decompositions can easily capture the "bipartite" nature of a verb's meaning. For instance, a specification of the meaning of _lengthen_ must indicate that it describes a change of state event and that the relevant state involves the length of the changed entity. These two types of meaning components can be represented using a small set of event types defined in terms of combinations of primitive predicates together with "roots" representing a verb's idiosyncratic or core meaning (Grimshaw 1993, Hale & Keyser 2002, Jackendoff 1983, 1990, Mohanan & Mohanan 1999, Pesetsky 1995, Pinker 1989, RH&L 1998). Second, perhaps the most important distinction among event types involves a dichotomy between simple events and complex events --- an event composed of simple events. In fact, the notion "complex event" or a comparable notion --- most often "causative event" --- has been invoked since at least the generative semantics era, though its interpretation and role in linguistic explanation have changed over the years. Predicate decompositions can easily capture this fundamental distinction. In support of the importance of these two properties of semantic representations, I review ways in which they gain explanatory power in my joint work with Malka Rappaport Hovav. First, the distinct argument expression options manifested by two semantic classes of English transitive verbs --- surface contact verbs (e.g., _wipe_, _scrub_, _sweep_) and change of state verbs (e.g., _break_, _dry_, _open_) --- can be tied to differences in the complexity of the events they denote: simple events for surface contact verbs and complex events for change of state verbs; these differences, in turn, reflect differences in the nature of the roots of these two types of verbs. Second, the distribution of fake reflexives in resultative constructions (_Sally sang herself hoarse/*Sally sang hoarse_) is sensitive to event complexity. Third, event complexity illuminates crosslinguistic variation in the transitive verb class and leads to a natural differentiation among transitive verb objects, providing insight into the repeated observations that not all objects are equal, observations that have previously been attributed to slippery notions such as "affectedness." Finally, the interaction of event complexity and the bipartite nature of verb meaning provides the key to understanding the origins and properties of English object alternations (e.g., the locative alternation: _stuff groceries into a bag/stuff a bag with groceries_).

Speaker Biography

Beth Levin is the William H. Bonsall Professor in the Humanities and the Chair of the Department of Linguistics at Stanford University. After receiving her Ph.D. from MIT in 1983, she spent four years at the MIT Center for Cognitive Science, where she had major responsibility for the Lexicon Project. She joined Stanford's Department of Linguistics in 1999, after twelve years at Northwestern University. Her research focuses on the lexicon --- the component of the language system that serves as a repository for information on the words of a language. She has conducted extensive breadth- and depth-first studies of the English verb lexicon, which have provided the foundation for her theoretical research. Her recent work investigates the linguistic representation of events and the ways in which events and their participants are expressed in English and other languages.

March 9, 2004 | 02:35 pm

“Automatic Speech Processing by Inference in Generative Models”   Video Available

Sam Roweis, University of Toronto

[abstract]

Abstract

Say you want to perform some complex speech processing task. How should you develop the algorithm that you eventually use? Traditionally, you combine inspiration, careful examination of previous work, and arduous trial-and-error to invent a sequence of operations to apply to the waveform. But there is another approach: dream up a "generative model" --a probabilistic machine which outputs data in the same form as your data--in which the key quantities that you would eventually like to compute appear as hidden (latent) variables. Now perform inference in this model, estimating the hidden quantities. In doing so, the rules of probability will derive for you, automatically, a signal processing algorithm. While inference is well known to the speech community as a decoding step for HMMs, exactly the same type of calculation can be performed in many other models not related to recognition. In this talk, I will give several examples of this paradigm, showing how inference in very simple models can be used to perform surprisingly complex speech processing tasks including denoising, source separation, pitch tracking, timescale modification and estimation of articulatory movements from audio. In particular, I will introduce the factorial-max vector quantization (MAXVQ) model, motivated by the astonishing max approximation to log spectrograms of mixtures, and show that it can be used with an efficient branch-and-bound technique for exact inference to perform both additive denoising and monaural separation. I will also describe a purely time domain approach to pitch processing which identifies waveform samples at the boundaries between glottal pulse periods (in voiced speech) or at the boundaries between unvoiced segments. An efficient algorithm for inferring these boundaries is derived from a simple probabilistic generative model for segments, which gives excellent results on pitch tracking, voiced/unvoiced detection and timescale modification.
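
The "max approximation" mentioned above is easy to check numerically: the log-magnitude spectrogram of a mixture is close to the element-wise maximum of the sources' log-magnitude spectrograms, because in most time-frequency cells one source dominates. The sketch below verifies this on synthetic signals (an illustration only, not the MAXVQ model itself).

```python
import numpy as np

rng = np.random.default_rng(0)
fs, dur = 8000, 2.0
n = int(fs * dur)

# Two synthetic "sources": a chirp-like tone and white noise.
t = np.arange(n) / fs
src1 = np.sin(2 * np.pi * (200 + 300 * t) * t)
src2 = 0.5 * rng.standard_normal(n)
mix = src1 + src2

def log_spec(x, nfft=256, hop=128):
    """Log-magnitude STFT (Hann window), shape (frames, nfft//2 + 1)."""
    win = np.hanning(nfft)
    frames = [x[i:i + nfft] * win for i in range(0, len(x) - nfft, hop)]
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)

L1, L2, Lmix = log_spec(src1), log_spec(src2), log_spec(mix)
approx = np.maximum(L1, L2)                       # the "max approximation"
print("mean |error| of max approx:", np.mean(np.abs(Lmix - approx)))
print("dynamic range of mixture  :", Lmix.max() - Lmix.min())
```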

March 2, 2004 | 02:35 pm

“Semantic Lexicons and Semantic Tagging: towards content interoperability”   Video Available

Nicoletta Calzolari, Istituto Di Linguistica Computazionale

[abstract] [biography]

Abstract

Large scale language resources are unanimously recognised as the necessary infrastructure underlying language technology. Discussing a few major European initiatives for building harmonised lexicons, we will highlight how computational lexicons and textual corpora should be considered as complementary views on the lexical space. A ‘complete’ computational lexicon should incorporate our ‘knowledge of the world’, and represent it in an explicit and formal way. We claim that it is theoretically not possible to achieve completeness within any ‘static’ lexicon. A sound language infrastructure must encompass both ‘static’ lexicons, such as the traditional ones, and ‘dynamic’ systems able to enrich the lexicon with information acquired on-line from large corpora, thus capturing the ‘actually realised’ potentialities, the large range of variation, and the flexibility inherent in the language as it is used. These are the challenges for semantic tagging. Part of the talk will point to problems that have arisen in different semantic annotation exercises. Broadening our perspective into the future, the need for ever-growing language resources for effective content processing requires a change in the paradigm, and the design of a new generation of language resources, based on open content interoperability standards. The Semantic Web notion is going to crucially determine the shape of the language resources of the future, consistent with the vision of an open space of sharable knowledge available on the Web for processing.

Speaker Biography

Nicoletta Calzolari, who graduated in Philosophy at the University of Bologna, is Director of Research at CNR and now Director of the Istituto di Linguistica Computazionale of the CNR in Pisa, Italy. She has worked in the field of Computational Linguistics since 1972. Her main fields of interest are: computational lexicology and lexicography; text corpora; standardisation and evaluation of language resources; lexical semantics; and knowledge acquisition from multiple (lexical and textual) sources, integration and representation. She has co-ordinated many international/European and national projects. She is a member and general secretary of ICCL, a member of the ELRA Board and of many International Committees and Advisory Boards, and was Conference chair of LREC’04. She has been an invited speaker, program committee member, or organiser for numerous international scientific conferences and workshops.

February 24, 2004 | 12:00 pm

“Norms and Exploitations: Mapping Meaning onto Use”   Video Available

Patrick Hanks, Berlin-Brandenburg Academy of Sciences and Brandeis University

[abstract] [biography]

Abstract

Words in isolation have innumerable potential meanings. When they are used, the lexical entropy is greatly reduced. Corpus Pattern Analysis has shown that, while the number of possible contexts for each word is very great (infinite?), the number of typical contexts is small and manageable. Corpus Pattern Analysis (CPA) aims to account for all uses of each word by grouping its collocations into semantically motivated syntagmatic patterns. The patterns are then linked to meanings or other applications such as synonym sets or foreign translations. Noun patterns arrange statistically significant collocates in sets of prototypical statements (e.g. "A storm may be gathering, brewing, impending, …; storms lash coastlines,…; people and ships get caught in a storm, weather a storm, ride out a storm, …; storms are violent, severe, raging, howling, …;" and so on). Verb patterns are built in the SPOCA framework. Pattern elements consist of lexical sets of nouns and other elements, grouped by their clause roles in relation to the target verb. Subvalency features such as determiners can also be relevant ("took place" vs. "took his place" vs. "took someone else's place" vs. "took third place."). Because the normal meaning of a word can be not only activated but also exploited for rhetorical effect, the empirical linguistic theory arising from this work is known as the Theory of Norms and Exploitations. Typical exploitations include ad-hoc metaphors, ellipsis, and other figures of speech.

Speaker Biography

I am a lexicographer and corpus linguist. As chief editor of English dictionaries at Collins (1970-90) and subsequently chief editor, current English dictionaries, at Oxford University Press (1990-2000), I created some of the world's most successful English dictionaries, including the New Oxford Dictionary of English (NODE) and the highly innovative Cobuild project (based on corpus research at the University of Birmingham), described by the philosopher David Wiggins as "the first significant development in the study of word meaning since the 18th century". In the late 1980s, I was a visiting scientist at AT&T Bell Laboratories in New Jersey, where I co-authored a series of influential and widely cited papers on statistical approaches to lexical analysis. I have also pioneered practical advances in computational onomastics and am the editor in chief of the 3-volume Dictionary of American Family Names (New York: Oxford University Press, 2003). I am a Consultant (Berater) in lexical semantics and corpus linguistics to the Digitales Wörterbuch der deutschen Sprache at the Berlin-Brandenburg Academy of Sciences. I have been an invited keynote speaker at many conferences on lexicography, lexicology, and computational linguistics throughout the world.

February 17, 2004 | 02:35 pm

“A Bayesian view of inductive learning in humans and machines”   Video Available

Josh Tenenbaum, MIT

[abstract]

Abstract

In everyday learning and reasoning, people routinely draw successful generalizations from very limited evidence. Even young children can infer the meanings of words or the existence of hidden biological properties or causal relations from just one or a few relevant observations -- far outstripping the capabilities of conventional learning machines. How do they do it? I will argue that the success of people's everyday inductive leaps can be understood as the product of domain-general rational Bayesian inferences constrained by people's implicit theories of the structure of specific domains. This talk will explore the interactions between people's domain theories and their everyday inductive leaps in several different task domains, such as generalizing biological properties and learning word meanings. I will illustrate how domain theories generate the hypothesis spaces necessary for Bayesian generalization, and how these theories may themselves be acquired as the products of higher-order statistical inferences. I will also show how our approach to modeling human learning motivates new machine learning techniques for semi-supervised learning: generalizing from very few labeled examples with the aid of a large sample of unlabeled data.
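
A toy version of Bayesian concept generalization in this spirit, with a tiny hand-built hypothesis space and a "size principle" likelihood (an illustration only, not the speaker's actual models), is sketched below.

```python
from fractions import Fraction

# Toy hypothesis space over numbers 1..100 (illustrative, not the actual model).
HYPOTHESES = {
    "even numbers":      {n for n in range(1, 101) if n % 2 == 0},
    "multiples of 10":   {n for n in range(1, 101) if n % 10 == 0},
    "powers of two":     {2, 4, 8, 16, 32, 64},
}
PRIOR = {h: Fraction(1, len(HYPOTHESES)) for h in HYPOTHESES}

def posterior(examples):
    """Bayes rule with the 'size principle': P(examples | h) = (1/|h|)^n if consistent, else 0."""
    scores = {}
    for h, ext in HYPOTHESES.items():
        if all(x in ext for x in examples):
            scores[h] = PRIOR[h] * Fraction(1, len(ext)) ** len(examples)
        else:
            scores[h] = Fraction(0)
    z = sum(scores.values())
    return {h: s / z for h, s in scores.items()}

def p_generalize(examples, query):
    """Probability that `query` belongs to the concept, averaging over hypotheses."""
    post = posterior(examples)
    return float(sum(p for h, p in post.items() if query in HYPOTHESES[h]))

print(p_generalize([16], 30))            # ~0.11: "even numbers" still somewhat plausible
print(p_generalize([16, 8, 2], 30))      # ~0.002: the size principle now favors "powers of two"
print(p_generalize([16, 8, 2], 32))      # 1.0: every consistent hypothesis contains 32
```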

February 10, 2004 | 02:35 pm

“Name That Tune: Finding a song from a sung query”   Video Available

Bryan Pardo, University of Michigan

[abstract] [biography]

Abstract

Music Information Retrieval has become an active area of research motivated by the increasing importance of internet-based music distribution. Online catalogs are already approaching one million songs, so it is important to study new techniques for searching these vast stores of audio. One approach to finding music that has received much attention is "Query by Humming" (QBH). This approach enables users to retrieve songs and information about them by singing, humming, or whistling a melodic fragment. In QBH systems, the query is a digital audio recording of a melodic fragment, and the ultimate target is a complete digital audio recording of a piece. We have created a QBH system for music search and retrieval. A user sings a theme from the desired piece of music. The sung theme (query) is converted into a sequence of pitch-intervals and rhythms. This sequence is compared to musical themes (targets) stored in a database. The top pieces are returned to the user in order of similarity to the sung theme. We describe two approaches to measuring similarity between database themes and the sung query. In the first, queries are compared to database themes using probabilistic string-alignment algorithms. Here, similarity between target and query is determined by edit cost. In the second approach, pieces in the database are represented as hidden Markov models (HMMs). In this approach, the query is treated as an observation sequence and a target is judged similar to the query if its HMM has a high likelihood of generating the query. Experiments show that while no approach is clearly superior in retrieval ability, string matching often has a significant speed advantage. Moreover, neither approach surpasses human performance.
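
The first (string-alignment) approach can be sketched as follows: melodies are reduced to pitch-interval sequences, and a standard dynamic-programming edit distance ranks a tiny hypothetical theme database against a query. This illustrates the idea only, not the probabilistic alignment used in the actual system.

```python
def pitch_intervals(midi_notes):
    """Melody as the sequence of semitone intervals between consecutive notes."""
    return [b - a for a, b in zip(midi_notes, midi_notes[1:])]

def edit_distance(q, t, sub_cost=1, indel_cost=1):
    """Classic dynamic-programming string alignment between two interval sequences."""
    m, n = len(q), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i * indel_cost
    for j in range(n + 1):
        d[0][j] = j * indel_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + indel_cost,
                          d[i][j - 1] + indel_cost,
                          d[i - 1][j - 1] + (0 if q[i - 1] == t[j - 1] else sub_cost))
    return d[m][n]

# Hypothetical theme database (MIDI note numbers) and a slightly-off sung query.
themes = {
    "theme A": [60, 62, 64, 65, 67, 65, 64, 62],
    "theme B": [60, 60, 67, 67, 69, 69, 67],
}
query = [62, 64, 66, 67, 69, 67, 66]      # transposed and shortened version of theme A

ranked = sorted(themes, key=lambda name: edit_distance(pitch_intervals(query),
                                                       pitch_intervals(themes[name])))
print(ranked)    # theme A should rank first: intervals are transposition-invariant
```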

Speaker Biography

Bryan Pardo is a doctoral candidate in the Electrical Engineering and Computer Science department of the University of Michigan, with a specialization in Intelligent Systems. He applies machine learning, probabilistic natural language processing, and database search techniques to auditory user interfaces for human computer interaction. Bryan takes a broader view of natural language than is traditional in computational linguistics, including timbre and prosody (timing, pitch contour, loudness), with an emphasis on music. In addition to his research activities, Bryan is also an adjunct professor of Music at Madonna University in Livonia, Michigan, where he teaches a course in music technology and also performs regularly throughout Michigan on saxophone and clarinet with his band, Into the Freylakh.

Back to Top

2003

November 28, 2003 | 4:30PM

“Stuff I've Seen: A system for personal information retrieval”

Susan Dumais, Microsoft Research

[abstract] [biography]

Abstract

Most information retrieval technologies are designed to facilitate information discovery. However, much knowledge work involves finding and re-using previously seen information. Stuff I've Seen (SIS) is a system designed to facilitate information re-use. This is accomplished in two ways. First, the system provides unified access to information that a person has seen, regardless of whether the information was seen as an email, appointment, web page, document, hand-written note, etc. Second, because the information has been seen before, rich contextual cues and visualizations can be used in the search interface. In addition to list views of results, we have explored a timeline visualization which incorporates personal landmarks and a summary overview for identifying patterns. The system has been deployed internally to more than one thousand users. Both qualitative and quantitative aspects of system use and our experiences in deploying it will be reported.

Speaker Biography

Susan Dumais is a Senior Researcher in the Adaptive Systems and Interaction Group at Microsoft Research, where she works on algorithms and interfaces for improved information access and management. Prior to joining Microsoft Research in 1997, she was at Bellcore and Bell Labs for many years. She has published widely in the areas of human-computer interaction and information retrieval. Her current research focuses on personal information retrieval, user modeling, text categorization using inductive learning techniques, and collaborative information retrieval. Previous research included well-known work on Latent Semantic Indexing (a statistical method for concept-based retrieval), combining search and navigation, individual differences, perceptual learning and attention, and organizational impacts of new technology. Susan is Past-Chair of ACM's SIGIR group, and serves on the NRC Committee on Computing and Communications Research to Enable Better Use of Information Technology in Digital Government and the NRC Board on Assessment of NIST Programs. She is on the editorial boards of ACM Transactions on Information Systems, ACM Transactions on Computer-Human Interaction, Human Computer Interaction, Information Processing and Management, Information Retrieval, Hypertext, Encyclopedia of Information Retrieval, and Annual Review of Information Science and Technology, and is actively involved on program committees for several conferences. She is an adjunct professor at the University of Washington, and has been a visiting faculty member at Stevens Institute of Technology, New York University, and the University of Chicago. Additional information is available at: http://research.microsoft.com/~sdumais

November 11, 2003 | 4:30PM

“Could We Double the Functionality--and the Use--of Hearing Aids?”   Video Available

David Myers, Hope College

[abstract] [biography]

Abstract

David Myers will describe progress, in Europe and in west Michigan, toward a world in which hearing aids serve not only as sophisticated microphone amplifiers, but also as customized loudspeakers. He will also share his vision for how "hearing aid compatible assistive listening" could enrich the lives of America's hard of hearing people and lessen the stigma of hearing aids and hearing loss.

Speaker Biography

Hope College social psychologist David Myers (davidmyers.org) is the author of scientific publications in two dozen journals, including Science, Scientific American, and the American Scientist. His science writings for college students and the lay public have also appeared in many magazines and in 15 books, including A Quiet World: Living with Hearing Loss (Yale University Press, 2000). His advocacy for a revolution in American assistive listening is explained at hearingloop.org.    

November 4, 2003 | 4:30PM

“Named Entity Classification in the Biology Domain: Investigating Some Sources of Information”

Vijay Shanker, University of Delaware

[abstract]

Abstract

Information extraction from the Biology literature has gained considerable attention in recent years. Classifying the named entities mentioned in the text can help the information extraction process in many ways. In this talk, I will discuss some of our recent work on named entity classification in this domain. The talk will focus on the kinds of features that we believe are useful for classification purposes. Our investigation looks at both name-internal features (in the form of certain informative words or suffixes found within the names) as well as name-external features in the form of words in context and the use of some simple syntactic constructions. We will conclude with some remarks on how named entity classification helps our project on extraction of phosphorylation relations from the Biology literature.

October 28, 2003 | 4:30PM

“Turning Probabilistic Reasoning into Programming”   Video Available

Avi Pfeffer, Harvard University

[abstract]

Abstract

Uncertainty is ubiquitous in the real world, and probability provides a sound way to reason under uncertainty. This fact has led to a plethora of probabilistic representation languages such as Bayesian networks, hidden Markov models and stochastic context-free grammars. More recently, we have developed new probabilistic languages that reason at the level of objects, such as object-oriented Bayesian networks and probabilistic relational models. The wide variety of languages leads to the question of whether a general purpose probabilistic modeling language can be developed that encompasses all of them. This talk will describe IBAL, an attempt at developing such a language. After presenting the IBAL language, motivating considerations for the inference algorithm will be discussed, and the mechanism for IBAL inference will be described.
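
The abstract describes IBAL only at a high level. As a rough illustration of what a general-purpose probabilistic modeling language provides, the Python sketch below (not IBAL syntax; the model and all names are invented) treats a model as an ordinary program over discrete random choices and performs inference by exhaustive enumeration.

```python
# A minimal sketch, not IBAL: a probabilistic "program" over random choices,
# with inference by enumerating and renormalizing the weighted outcomes.
from itertools import product

def flip(p):
    """A binary random choice, represented as (value, probability) pairs."""
    return [(True, p), (False, 1.0 - p)]

def enumerate_model():
    """Joint distribution of a tiny two-choice generative program."""
    dist = {}
    for (rain, p_rain), (sprinkler, p_spr) in product(flip(0.2), flip(0.1)):
        wet = rain or sprinkler              # deterministic consequence of the choices
        key = (rain, sprinkler, wet)
        dist[key] = dist.get(key, 0.0) + p_rain * p_spr
    return dist

def posterior(dist, query, evidence):
    """P(query | evidence), by summing and renormalizing enumerated outcomes."""
    num = sum(p for x, p in dist.items() if query(x) and evidence(x))
    den = sum(p for x, p in dist.items() if evidence(x))
    return num / den

dist = enumerate_model()
print(posterior(dist, lambda x: x[0], lambda x: x[2]))   # P(rain | wet) ~= 0.714
```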

October 21, 2003 | 4:30PM

“Speech and Language Processing: Where have we been and where are we going”   Video Available

Ken Church, AT&T Research Labs

[abstract] [biography]

Abstract

Can we use the past to predict the future? Moore's Law is a great example: performance doubles and prices halve approximately every 18 months. This trend has held up well to the test of time and is expected to continue for some time. Similar arguments can be found in speech demonstrating consistent progress over decades. Unfortunately, there are also cases where history repeats itself, as well as major dislocations, fundamental changes that invalidate fundamental assumptions. What will happen, for example, when petabytes become a commodity? Can demand keep up with supply? How much text and speech would it take to match this supply? Priorities will change. Search will become more important than coding and dictation.

Speaker Biography

I am the head of a data mining department in AT&T Labs-Research. I received my BS, Masters and PhD from MIT in computer science in 1978, 1980 and 1983, and immediately joined AT&T Bell Labs, where I have been ever since (though the name of the organization has changed). I have worked in many areas of computational linguistics including: acoustics, speech recognition, speech synthesis, OCR, phonetics, phonology, morphology, word-sense disambiguation, spelling correction, terminology, translation, lexicography, information retrieval, compression, language modeling and text analysis. I enjoy working with very large corpora such as the Associated Press newswire (1 million words per week). My datamining department is currently applying similar methods to much larger data sets such as telephone call detail (1-10 billion records per month).

October 14, 2003

“Advances in Statistical Machine Translation: Phrases, Noun Phrases and Beyond”   Video Available

Philipp Koehn, University of Southern California / ISI

[abstract] [biography]

Abstract

I will review the state of the art in statistical machine translation (SMT), present my dissertation work, and sketch out the research challenges of syntactically structured statistical machine translation. The currently best methods in SMT build on the translation of phrases (any sequences of words) instead of single words. Phrase translation pairs are automatically learned from parallel corpora. While SMT systems generate translation output that often conveys a lot of the meaning of the original text, it is frequently ungrammatical and incoherent. The research challenge at this point is to introduce syntactic knowledge to the state of the art in order to improve translation quality. My approach breaks up the translation process along linguistic lines. I will present my thesis work on noun phrase translation and ideas about clause structure.
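
The phrase-based approach described above learns phrase translation pairs from word-aligned parallel text. The sketch below is a simplified illustration of the standard consistency criterion for extracting such pairs (it ignores unaligned-word expansion and uses an invented toy sentence pair); it is not the speaker's actual system.

```python
# A minimal sketch of phrase-pair extraction from one word-aligned sentence pair.
# Illustrative simplification only; real systems also handle unaligned words.

def extract_phrases(src, tgt, alignment, max_len=4):
    """Return (source phrase, target phrase) pairs consistent with the alignment.

    alignment is a set of (src_index, tgt_index) links. A span pair is
    consistent if no alignment link leaves either span."""
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            tgt_points = [t for (s, t) in alignment if i1 <= s <= i2]
            if not tgt_points:
                continue
            j1, j2 = min(tgt_points), max(tgt_points)
            if j2 - j1 >= max_len:
                continue
            # every link touching the target span must stay inside the source span
            if all(i1 <= s <= i2 for (s, t) in alignment if j1 <= t <= j2):
                pairs.append((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

src = "das Haus ist klein".split()
tgt = "the house is small".split()
links = {(0, 0), (1, 1), (2, 2), (3, 3)}
for sp, tp in extract_phrases(src, tgt, links):
    print(sp, "|||", tp)
```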

Speaker Biography

Philipp Koehn is expected to receive his PhD in Computer Science from the University of Southern California in Fall 2003. He is a research assistant at the Information Sciences Institute. He has worked as a visiting researcher at AT&T Labs and Whizbang Labs. He has published a number of papers on machine translation, lexical acquisition, machine learning and related subjects. He also gave tutorials on statistical machine translation at recent HLT/NAACL and MT Summit conferences.

September 16, 2003

“Repair Detection in Transcribed Speech”   Video Available

Mark Johnson

August 26, 2003

“Probabilistic Models of Relational Domains”   Video Available

Daphne Koller

August 21, 2003

“CLSP Summer Workshop: Workshop Review and Summary Presentations - Part II (Tape 2 of 2)”   Video Available


July 31, 2003

“A Case of Signal Reconstruction in Biology”   Video Available

Jacob T Schwartz

July 11, 2003 | 12:00 pm

“Event Extraction: Learning from Corpora”

Ralph Grishman, New York University

[abstract] [biography]

Abstract

Event extraction involves automatically finding, within a text, instances of a specified type of event, and filling a data base with information about the participants and circumstances (date, place) of the event. These data bases can provide an alternative to traditional text search engines for repeated, focused searches on a single topic. Constructing an extraction system for a new event type requires identifying the linguistic patterns and classes of words which express the event. We consider the types of knowledge required and how this knowledge can be learned from text corpora with minimal supervision.

Speaker Biography

Please see this webpage.

July 2, 2003 | 04:30 pm

“Recognition and Organization of Speech and Audio: A program and some projects”

Dan Ellis, Columbia University

[abstract] [biography]

Abstract

The recently-established Laboratory for Recognition and Organization of Speech and Audio (LabROSA) at Columbia has the mission of developing techniques to extract useful information from sound. This covers a range of areas: General-purpose structure discovery and recovery, i.e. the basic segmentation problem over scales from subwords to episodes, and on both time and frequency dimensions; Source/object-based organization: Explaining the observed signal as the mixture of the independent sources that would be perceived by listeners; Special-purpose recognition and characterization for specific domains such as speech (transcription, speaker tracking etc.), music (classification and indexing), and other distinct categories. I will present more details on some current and new projects, including: Tandem acoustic modeling: Noise-robust speech recognition features calculated by a neural net. The Meeting Recorder project: Acoustic information extraction applications in a conventional meeting scenario. Machine Listening: Hearing for autonomous devices in the real world.

Speaker Biography

Biographical info coming soon. For more information, please see this webpage.

May 6, 2003 | 02:00 pm

“Learning Theory: Overview and Applications in Computer Vision and Computer Graphics”

Tomaso Poggio, Center for Biological and Computational Learning, Artificial Intelligence Laboratory and McGovern Institute for Brain Research, Massachusetts Institute of Technology

[abstract] [biography]

Abstract

This seminar will be held in 210 Hodson Hall from 11 am to 12 pm. Refreshments available at 10:45am. The problem of learning is one of the main gateways to making intelligent machines and to understanding how the brain works. In this talk I will give a brief overview of our recent work on learning theory, including new results on predictivity and stability of the solution of the learning problem. I will then describe recent efforts in developing machines that learn in applications such as visual recognition and computer graphics. In particular, I will summarize our work on trainable, hierarchical classifiers for problems in object recognition and especially for face and person detection. I will also describe how we used the same learning techniques to synthesize a photorealistic animation of a talking human face. Finally, I will speculate briefly on the implication of our research on how visual cortex learns to recognize and perceive objects. Relevant papers can be downloaded from http://www.ai.mit.edu/projects/cbcl/publications/all-year.html.

Speaker Biography

Tomaso A. Poggio is the Eugene McDermott Professor at the Department of Brain and Cognitive Sciences at MIT; he is director of the Center for Biological and Computational Learning; member of the Artificial Intelligence Laboratory and of the McGovern Institute for Brain Research. His work is motivated by the belief that the problem of learning is the gateway to making intelligent machines and understanding how the brain works. Research on learning in his group follows three directions: mathematics of learning theory and ill-posed problems, engineering applications (in computer vision, computer graphics, bioinformatics, datamining and artificial markets) and neuroscience of learning, presently focused on how visual cortex learns to recognize and represent objects.

April 29, 2003 | 12:00 pm

“Speech and Dialog Mining on Heterogeneous Data”   Video Available

Allen Gorin, AT&T Research

[abstract] [biography]

Abstract

A critical component of any business is interacting with its customers, either by human agents or via automated systems. Many of these interactions involve spoken or written language, with associated customer profile data. Current methods for analyzing, searching and acting upon these interactions are labor intensive and often based on small samples or shallow views of the huge volumes of actual data. In this talk I will describe research directed at enabling businesses to browse, prioritize, select and extract information from these large volumes of customer interactions. A key technical issue is that the data is heterogeneous, comprising both speech and associated call/caller data. Experimental evaluation of these methods on AT&T's 'How May I Help You?'(sm) spoken dialog system will be presented.

Speaker Biography

Biographical information coming soon.

April 22, 2003 | 12:00 pm

“An Overview of Digital Libraries”   Video Available

Tim DiLauro, Johns Hopkins University

[abstract] [biography]

Abstract

The Digital Knowledge Center (DKC) is the digital library research and development department of the Sheridan Libraries. The DKC's research agenda focuses on the ingestion of and access to materials in digital libraries. Its projects emphasize the development of automated tools, systems, and software to reduce the costs and resources required for converting the vast knowledge within print materials into digital form. Fundamentally, the DKC's R&D efforts emphasize a combination of automated technologies with strategic human intervention. The DKC conducts research and development related to digital libraries in collaboration with faculty, librarians, and archivists both within and beyond Johns Hopkins University; provides expertise to facilitate the creation of digital library materials and services; focuses on assessment and evaluation of digital libraries through usability research and economic analyses; and provides leadership in fostering an environment and culture which is conducive to advancing the library and university in the information age. DKC projects have been funded by the National Science Foundation, the Institute of Museum and Library Services, the Mellon Foundation, a technology entrepreneurial group in Maryland, corporations and individual donors. The Hodson Trust has provided an endowment to support the Director of the DKC, and an Information Technology Assistant. The DKC has published numerous academic papers, and has been featured in articles or news stories by the New York Times, Baltimore Magazine, Tech TV, UPI, and the Congressional Internet Caucus. Tim DiLauro, the Deputy Director of the Digital Knowledge Center, will provide an overview of digital libraries and digital library research issues.

Speaker Biography

Biographical information coming soon.

April 15, 2003 | 12:00 pm

“Resource demands and sentence complexity”   Video Available

Edward Gibson, Department of Brain and Cognitive Sciences/Department of Linguistics and Philosophy, MIT

[abstract] [biography]

Abstract

Why is sentence (1) so much harder to understand than sentence (2)? (1) The student who the professor who the scientist collaborated with had advised copied the article. (2) The scientist collaborated with the professor who had advised the student who copied the article. The two sentences have very similar meanings, yet (1) is far more complicated than (2). In this presentation, I will present evidence from my lab for two independent factors in sentence complexity at play in sentences like (1) and (2) (Gibson, 1998, 2000). The first is integration distance between syntactic dependents: the processing cost of integrating a new word w is shown to be proportional to the distance between w and the syntactic head to which w is being integrated. The syntactic dependents in (1) are generally much further apart than they are in (2), making (1) more complex. The second is syntactic storage in terms of the number of partially processed syntactic dependents: our evidence suggests that complexity increases as the number of predicted syntactic dependents increases. This factor also predicts greater complexity for (1) relative to (2). Evidence for these two factors will be provided in the form of reading times and comprehension questionnaires across a variety of English (Grodner & Gibson, 2002), Japanese (Nakatani & Gibson, 2003) and Chinese (Hsiao & Gibson, 2003) materials. Furthermore, recent evidence will be presented which helps to distinguish how distance is quantified, in terms of discourse structure (Warren & Gibson, 2002) and/or interfering similar syntactic elements (Gordon et al., 2001).

Speaker Biography

Biographical information can be found here.

April 1, 2003 | 12:00 pm

“A supercomputer in your cellphone”

Adam Berger, Eizel Technologies, Inc.

[abstract] [biography]

Abstract

The mobile Internet promises anytime, anywhere, convenient access to email, web, and networked applications. Parts of this promise - high-throughput 2.5G and 3G wireless networks, richly-functional PDAs and phones - are already becoming available. But there remain several core technical problems hindering full-scale adoption of wireless data. One of these problems, for instance, is real-time document adaptation: how should a small-screen rendering algorithm adapt a hypertext document (web page, email message, etc.) which was designed for viewing on a standard PC display? Solving this problem draws on techniques in image processing, pattern recognition, networking, and of course, language processing. This talk introduces a proxy-based architecture designed to handle these kinds of problems. A mobile web proxy is a remote, high-performance agent, deployed on commodity PC or high-end dedicated hardware, which acts on behalf of a population of mobile device users. Demos as time permits.

Speaker Biography

Adam Berger is a founder and CTO of Eizel Technologies Inc. (www.eizel.com), a software firm whose products allow users to do new things with their mobile phones and PDAs. Adam's Ph.D. is from Carnegie Mellon University's School of Computer Science, where his research was at the intersection of machine learning and statistical language processing. Previously, he worked for several years in the statistical machine translation group at IBM's Thomas J. Watson Research Center, and held a research position at Clairvoyance Corporation, a Pittsburgh-based advanced technology firm specializing in information management.

March 18, 2003 | 12:00 pm

“Natural Language Human-Computer Interaction on Relational Domains”

Ciprian Chelba, Microsoft Research

[abstract] [biography]

Abstract

A growing amount of information is stored in relational databases and many scenarios involving human-computer interaction by means of natural language can be distilled to the problem of designing interfaces to relational databases that are driven by natural language. The talk presents approaches to human-computer interaction by means of natural speech or free text. The methods described focus on relational domains --- such as Air Travel Information Systems (ATIS) --- where semantic models are well defined by simple entity-relationship diagrams (schemas). We distinguish between techniques that aim at classifying a speech utterance or typed sentence into some category (call/text routing) and higher resolution forms of information extraction from text or speech that aim at recovering more precise domain-specific semantic entities such as dates, city/airport names, airlines, etc. The first part of the talk will focus on simple speech utterance/text classification techniques such as n-gram, Naive Bayes, and Maximum Entropy. The second part outlines an attempt at using the structured language model (SLM) --- as a syntactic parser enriched with semantic tags --- for extracting fine-grained semantic information from text.
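
As a rough illustration of the utterance/text classification step mentioned above, the sketch below trains a small maximum entropy (multinomial logistic regression) classifier over bag-of-words features with batch gradient ascent. The three training "utterances" and their labels are invented toy data, not the ATIS corpus, and this is not the speaker's implementation.

```python
# A minimal maximum entropy (multinomial logistic regression) text classifier
# over bag-of-words features; toy data invented for illustration.
import numpy as np

docs = ["show flights from boston to denver",
        "what is the cheapest fare to denver",
        "list ground transportation in denver"]
labels = ["flight", "fare", "ground_service"]

vocab = sorted({w for d in docs for w in d.split()})
classes = sorted(set(labels))

def featurize(doc):
    x = np.zeros(len(vocab))
    for w in doc.split():
        if w in vocab:
            x[vocab.index(w)] += 1.0
    return x

X = np.stack([featurize(d) for d in docs])
Y = np.array([classes.index(l) for l in labels])

W = np.zeros((len(classes), len(vocab)))
for _ in range(200):                      # batch gradient ascent on the log-likelihood
    scores = X @ W.T
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    grad = np.zeros_like(W)
    for c in range(len(classes)):
        grad[c] = ((Y == c).astype(float) - probs[:, c]) @ X
    W += 0.1 * grad

test = featurize("cheapest flights to denver")
print(classes[int(np.argmax(W @ test))])
```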

Speaker Biography

Ciprian Chelba graduated from the Center for Language and Speech Processing at the Johns Hopkins University in January 2000. After graduation he joined the Speech Technology Group at Microsoft Research (http://research.microsoft.com/~chelba). His core research interests are in statistical language and speech processing, while the broader ones could be loosely described as statistical modeling. When not producing floating point numbers and trying to make sense out of them, he goes out and enjoys outdoor activities such as hiking, tennis and skiing, as well as a good play or movie.

March 4, 2003 | 02:00 pm

“Tracking Lexical Access in Continuous Speech”   Video Available

Michael Tanenhaus, University of Rochester

[abstract] [biography]

Abstract

All current models of spoken word recognition assume that as speech unfolds multiple lexical candidates become partially activated and compete for recognition. However, models differ on fundamental questions such as the nature of the competitor set, the temporal dynamics of word recognition, how fine-grained acoustic information is used in discriminating among potential candidates, and how acoustic input is combined with information from the context of the utterance. I'll illustrate how each of these issues is informed by monitoring eye movements as participants follow instructions to use a computer mouse to click on and move pictures presented on a monitor. The timing and pattern of fixations allows for strong inferences about the activation of potential lexical competitors in continuous speech, while monitoring lexical access at the finest temporal grain to date, without interrupting the speech or requiring a meta-linguistic judgment. I'll focus on recent work examining the effects on lexical access of fine-grained acoustic variation, such as coarticulatory information in vowels, within category differences in voice-onset time, interactions between acoustic and semantic constraints, and prosodic context.

Speaker Biography

Biographical information coming soon.

February 25, 2003 | 12:00 pm

“Ontological Semantics: An Overview”   Video Available

Sergei Nirenburg, University of Maryland, Baltimore County

[abstract] [biography]

Abstract

The term ontological semantics refers to the apparatus of describing and manipulating meaning in natural language texts. Basic ontological-semantic analyzers take natural language texts as inputs and generate machine-tractable text meaning representations (TMRs) that form the basis of various reasoning processes. Ontological-semantic text generators take TMRs as inputs and produce natural language texts. Ontological-semantic systems centrally rely on extensive static knowledge resources: a language-independent ontology, the model of the world that includes models of intelligent agents; ontology-oriented lexicons (and onomasticons, or lexicons of proper names) for each natural language in the system; and a fact repository consisting of instances of ontological concepts as well as remembered text meaning representations. Applications of ontological semantics include knowledge-based machine translation, information retrieval and extraction, text summarization, ontological support for reasoning systems, including networks of human and software agents, general knowledge management and others. In this talk I will give a broad overview of some of the ontological-semantic processing and static resources.

Speaker Biography

Biographical information coming soon.

February 18, 2003 | 12:00 pm

“MEasuring TExt Reuse”

Yorick Wilks, Department of Computer Science, University of Sheffield

[abstract] [biography]

Abstract

In this paper we present initial results from the METER (MEasuring TExt Reuse) project, whose aim is to explore issues pertaining to text reuse and derivation, especially in the context of newspapers using newswire sources. Although the reuse of text by journalists has been studied in linguistics, we are not aware of any investigation using existing computational methods for this particular task and context. In this paper we concentrate on classifying newspapers according to their dependency upon PA copy, using a 3-class document-level scheme designed by domain experts from journalism and a number of well-known approaches to text analysis. We show that the 3-class document-level scheme is better implemented as 2 binary Naive Bayes classifiers and gives an F-measure score of 0.7309.
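
The abstract reports that the 3-class scheme works better as two binary Naive Bayes classifiers. The sketch below shows one plausible way to wire such a cascade together; the particular decomposition ("derived vs. non-derived" followed by "wholly vs. partially derived") and the toy training tokens are assumptions for illustration, not the METER project's code.

```python
# A minimal sketch: a 3-class decision built from two binary Naive Bayes
# classifiers. Cascade ordering and training data are illustrative assumptions.
import math
from collections import Counter

class BinaryNB:
    def __init__(self):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter()

    def train(self, tokens, label):
        self.counts[label].update(tokens)
        self.docs[label] += 1

    def predict(self, tokens):
        vocab = set(self.counts[0]) | set(self.counts[1])
        best, best_lp = None, -math.inf
        for label in (0, 1):
            total = sum(self.counts[label].values())
            lp = math.log(self.docs[label] / sum(self.docs.values()))
            for w in tokens:                       # add-one smoothed word likelihoods
                lp += math.log((self.counts[label][w] + 1) / (total + len(vocab)))
            if lp > best_lp:
                best, best_lp = label, lp
        return best

derived_vs_not = BinaryNB()        # stage 1
wholly_vs_partially = BinaryNB()   # stage 2, applied only to derived documents

def classify(tokens):
    if derived_vs_not.predict(tokens) == 0:
        return "non-derived"
    return "wholly derived" if wholly_vs_partially.predict(tokens) == 1 else "partially derived"

# invented toy training examples
derived_vs_not.train("pa copy verbatim".split(), 1)
derived_vs_not.train("independent reporting local".split(), 0)
wholly_vs_partially.train("pa copy verbatim".split(), 1)
wholly_vs_partially.train("pa copy rewritten partly".split(), 0)

print(classify("pa copy verbatim".split()))
```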

Speaker Biography

More biographical information can be found here.

February 11, 2003 | 12:00 pm

“Kernel Machines for Pattern Classification and Sequence Decoding”   Video Available

Gert Cauwenberghs, Johns Hopkins University

[abstract] [biography]

Abstract

Recently it has been shown that a simple learning paradigm, the support vector machine (SVM), outperforms elaborately tuned expert systems and neural networks in learning to recognize patterns from sparse training examples. Underlying its success are mathematical foundations of statistical learning theory. I will present a general class of kernel machines that fit the statistical learning paradigm, and that extend to class probability estimation and MAP forward sequence decoding. Sparsity in the kernel expansion (number of support vectors) relates to the shape of the loss function, and (more fundamentally) to the rank of the kernel matrix. Applications will be illustrated with examples in image classification and phoneme sequence recognition. I will also briefly present the Kerneltron, a silicon support vector "machine" for high-performance, real-time, and low-power parallel kernel computation.
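
As a quick illustration of the kernel-expansion view described above (using an off-the-shelf SVM on synthetic data, not the speaker's algorithms or the Kerneltron hardware), the decision function learned below is a sparse expansion f(x) = sum_i alpha_i y_i K(x_i, x) + b over the support vectors.

```python
# A minimal sketch with an off-the-shelf SVM: sparsity of the kernel expansion
# is visible as the number of support vectors. Synthetic data only.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)   # nonlinear (circular) boundary

clf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
print("support vectors used:", len(clf.support_), "of", len(X))
```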

Speaker Biography

Dr. Cauwenberghs' research focuses on algorithms, architectures and VLSI systems for signal processing and adaptive neural computation, including speech and acoustic processors, focal-plane image processors, adaptive classifiers, and low-power coding and instrumentation. He has served as chair of the Analog Signal Processing Technical Committee of the IEEE Circuits and Systems Society, and is associate editor of the IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing and the newly established IEEE Sensors Journal. More biographical information can be found here.

February 4, 2003 | 12:00 pm

“Towards Meaning: The Tectogrammatical Sentence Representation”   Video Available

Jan Hajic, Institute of Formal and Applied Linguistics, Charles University

[abstract] [biography]

Abstract

The so-called "Tectogrammatical" representation of natural language sentence structure will be described as it is being developed for the Prague Dependency Treebank and used for the Czech, English and German languages, with other languages (Arabic, Slovenian) planned or in progress. The Tectogrammatical representation aims at a semi-universal representation of such language phenomena as the predicate-argument structure, lexical semantics, discourse structure displayed at the sentence level, and co-reference both inside and across sentences. Its relation to the classical dependency- and parse-tree representation of (surface) sentence structure will be presented as well. Possible advantages of the tectogrammatical representation will be demonstrated on examples of the Machine Translation and Question Answering tasks.

Speaker Biography

More biographical information can be found here.

January 28, 2003 | 12:00 pm

“Rational Kernels -- General Framework, Algorithms and Applications”   Video Available

Patrick Haffner, AT&T Research

[abstract] [biography]

Abstract

Joint work with Corinna Cortes and Mehryar Mohri. Kernel methods are widely used in statistical learning techniques due to their excellent performance and their computational efficiency in high-dimensional feature space. However, text or speech data cannot always be represented by the fixed-length vectors that the traditional kernels handle. In this talk, we introduce a general framework, Rational Kernels, that extends kernel techniques to deal with variable-length sequences and more generally to deal with large sets of weighted alternative sequences represented by weighted automata. Far from being abstract and computationally complex objects, rational kernels can be readily implemented using general weighted automata algorithms that have been extensively used in text and speech processing and that we will briefly review. Rational kernels provide a general framework for the definition and design of similarity measures between word or phone lattices particularly useful in speech mining applications. Viewed as a similarity measure, they can also be used in Support Vector Machines and significantly improve the spoken-dialog classification performance in difficult tasks such as the AT&T 'How May I Help You' (HMIHY) system. We present several examples of rational kernels to illustrate these applications. We finally show that many string kernels commonly considered in computational biology applications are specific instances of rational kernels.
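
The abstract notes that many familiar string kernels are specific instances of rational kernels. As a small concrete example, the sketch below computes a plain n-gram "spectrum" kernel on strings; the general framework instead works over weighted automata and lattices via transducer composition, which is not shown here.

```python
# A minimal spectrum (n-gram) string kernel, one simple instance of the
# broader family of kernels on sequences.
from collections import Counter

def ngram_counts(s, n=3):
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def spectrum_kernel(s, t, n=3):
    """K(s, t) = sum over n-grams g of count_s(g) * count_t(g)."""
    cs, ct = ngram_counts(s, n), ngram_counts(t, n)
    return sum(c * ct[g] for g, c in cs.items())

print(spectrum_kernel("how may i help you", "how can i help you"))
```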

Speaker Biography

Patrick Haffner graduated from Ecole Polytechnique, Paris, France in 1987 and from Ecole Nationale Superieure des Telecommunications (ENST), Paris, France in 1989. He received his PhD in speech and signal processing from ENST in 1994. His research interests center on statistical learning techniques that can be used to globally optimize real-world processes with speech or image input. With France Telecom Research, he developed multi-state time-delay neural networks (MS-TDNNs) and applied them to recognize telephone speech. In 1995, he joined AT&T Laboratories, where he worked on image classification using convolutional neural networks (with Yann LeCun) and Support Vector Machines (with Vladimir Vapnik). Using information theoretic principles, he also developed and implemented the segmenter used in the DjVu document compression system. Since 2001, he has been working on kernel methods and information theoretic learning for spoken language understanding.


2002

December 3, 2002 | 4:30PM

“Cortical Modeling of Auditory Processing Disorders”   Video Available

Dana Boatman, Departments of Neurology and Otolaryngology, Johns Hopkins School of Medicine

[abstract]

Abstract

Auditory processing disorders (APD) affect a listener's ability to discriminate, recognize, and understand speech under adverse listening conditions, despite otherwise normal hearing abilities. The neurobiological bases of developmental APD are poorly understood. We will discuss three lines of clinical research that provide converging evidence for abnormal intra- and inter-hemispheric organization of auditory functions in individuals diagnosed with APD.    

November 19, 2002 | 4:30PM

“Customizing MT Systems”   Video Available

Remi Zajac, Systran

[abstract]

Abstract

Building a high-quality general purpose Machine Translation system is still out of reach in the present state of knowledge. MT has been used mostly to understand the content of foreign texts. However, when the style and the domain are restricted, MT can provide useful results if the system is tuned to these texts. This talk will present linguistic, technical, as well as methodological issues arising in the construction of customized MT systems, and will address the following topics: notions of lexical and linguistic closure; assessing customization needs; customization of dictionaries and grammars; manual vs. automatic approaches; the iterative manual customization process; MT evaluation issues; and an example of a customization project.

November 12, 2002 | 4:30PM

“Arabic Vowel Restoration”   Video Available

John Henderson, MITRE

[abstract] [biography]

Abstract

Arabic speech recognizers are frequently designed to produce output without any short vowels because readers of Arabic do not require the diacritics that indicate short vowels. This design also allows the speech recognizers to utilize the millions of words of available non-diacritized Arabic text for language model training. Unwritten vowels are also left out of the pronunciation models. This forces the acoustic models to capture not only their intended targets, the non-short-vowel phonemes, but also the systematic interference of the unwritten short vowels. I will detail data-driven approaches to Arabic vowel restoration explored during the 2002 Hopkins summer workshop and the effects they have on speech recognition systems for Arabic. Specifically, I will show that an Arabic ASR system that is trained on the output of an automatic vowel restoration system has lower word error rate than an ASR system trained with implicit disregard for the unwritten portions of the words.  

Speaker Biography

John Henderson received a B.S. in Math/CS from Carnegie Mellon University in 1994, and a PhD from Johns Hopkins University in 2000 where he studied in the Natural Language Processing Laboratory. Since joining MITRE in 1999, he has been working on diverse topics such as designing annotation standards, named entity recognition, combining question-answering system outputs, recognizing variant forms of transliterated names, and out-of-vocabulary word repair for ASR systems. His current research includes machine translation of fixed point concepts such as proper names, times, and uniquely-specified artifacts, evaluation of MT systems, and other topics that lie in the intersections of MT, NLP, and ASR.

November 5, 2002 | 4:30PM

“AI and the Impending Revolution in Brain Sciences”   Video Available

Tom Mitchell, Carnegie Mellon University

[abstract] [biography]

Abstract

The sciences that study the brain are experiencing a significant revolution, caused mainly by the invention of new instruments for observing and manipulating brain function. For example, functional Magnetic Resonance Imaging (fMRI) now provides a safe, non-invasive tool to observe human brain activity, allowing scientists to capture a 3D image of activity across the entire human brain at a spatial resolution of 1mm, once per second. Brain probes now allow direct recording simultaneously from hundreds of individual neurons in laboratory animals as they move about their environment, genetic knock-out experiments allow studying lab mice missing specific neuro-transmitters, and new dyes provide new ways to study neural pathways and neural metabolism. Brain implants now allow tens of thousands of humans to hear for the first time, and the FDA recently approved the first human retinal implants intended to help blind people. The thesis of my talk is that research over the coming decade in the brain sciences will have a significant impact on Artificial Intelligence research, and that AI will have an even more significant impact on studies of the brain. We'll examine two distinct ways in which this synergy between AI and brain sciences is already beginning to take shape. First, AI architectures and algorithms for specific tasks are providing a basis for interpreting new data on brain activity in animals, in several cases leading to the conclusion that animals may use approaches surprisingly similar to these engineered AI solutions. Second, machine learning methods are providing new ways to discover regularities in the huge volume of new data, for example automatically discovering the spatial-temporal patterns of brain activity associated with reading a confusing sentence, or determining the semantic category of a word.

Speaker Biography

Tom M. Mitchell is the Fredkin Professor of Computer Science at Carnegie Mellon University, and Founding Director of CMU's Center for Automated Learning and Discovery, an interdisciplinary research center specializing in statistical machine learning and data mining. He is President of the American Association of Artificial Intelligence (AAAI), author of the textbook "Machine Learning," and a member of the National Research Council, Computer Science and Telecommunications Board. During 1999-2000 he served as Vice President and Chief Scientist at WhizBang! Labs, a company that employs machine learning to extract information from the web. Mitchell's research interest lies in the area of machine learning and data mining. He has developed specific learning algorithms such as inductive inference methods, learning methods that combine data with background knowledge, methods that learn from combinations of labeled and unlabeled training data, and methods for learning probabilistic first-order logic rules from relational data. He has also explored the application of these methods to complex time series data, including studies of pneumonia mortality and C-section risk from time series data in medical records, to studies of brain function from complex functional MRI time series data, to robot learning.  

October 29, 2002 | 4:30PM

“Learning Energy-Based Models of High-Dimensional Data”   Video Available

Geoffrey Hinton, University of Toronto

[abstract]

Abstract

Many researchers have tried to model perception using belief networks based on directed acyclic graphs. The belief network is viewed as a stochastic generative model of the sensory data, and perception consists of inferring plausible hidden causes for the observed sensory input. I shall argue that this approach is probably misguided because of the difficulty of inferring posterior distributions in densely connected belief networks. An alternative approach is to use layers of hidden units whose activities are a deterministic function of the sensory inputs. The activities of the hidden units provide additive contributions to a global energy, E, and the probability of each sensory data vector is defined to be proportional to exp(-E). The problem of perceptual inference vanishes in deterministic networks, so perception is very fast once the network has been learned. The main difficulty of this approach is that maximum likelihood learning is very inefficient. Maximum likelihood adjusts the parameters to maximize the probability of the observed data given the model, but this requires the derivatives of an intractable normalization term. I shall show how this difficulty can be overcome by using a different objective function for learning. The parameters are adjusted to minimize the extent to which the data distribution is distorted when it is moved towards the distribution that the model believes in. This new objective function makes it possible to learn large energy-based models quickly.
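
As a minimal numerical sketch of the formulation described above (hidden activities are a deterministic function of the data vector, each contributes additively to a global energy E, and p(x) is proportional to exp(-E(x))), the snippet below uses an arbitrary toy weight matrix and a data space small enough that the normally intractable normalizer can be computed exactly; the efficient learning objective discussed in the talk is not implemented.

```python
# A minimal energy-based model over 3-bit data vectors: p(x) = exp(-E(x)) / Z.
# Weights are arbitrary illustrative values; no learning is performed.
import itertools
import math
import numpy as np

W = np.array([[ 2.0, -1.0, 0.5],       # weights into 2 deterministic hidden units
              [-0.5,  1.5, 1.0]])

def energy(x):
    h = np.tanh(W @ x)                 # hidden activities: deterministic in x
    return -float(np.sum(h))           # each hidden unit adds to the global energy

space = [np.array(b, dtype=float) for b in itertools.product([0, 1], repeat=3)]
Z = sum(math.exp(-energy(x)) for x in space)    # partition function (exact only because the space is tiny)

for x in space:
    print(x.astype(int), round(math.exp(-energy(x)) / Z, 3))
```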

October 22, 2002 | 4:30PM

“Evolution of Cooperative Problem Solving in an Artificial Economy”   Video Available

Eric B. Baum, NEC Research

[abstract]

Abstract

We address the problem of how one can reinforcement learn in ultra-complex environments, with huge state spaces, where one must learn to exploit compact structure of the problem domain. The approach proposed is to simulate the evolution of an artificial economy of computer programs. We discuss why imposing two simple principles on the economic structure leads to the evolution of a collection of programs that collaborate, thus autonomously dividing the problem and greatly facilitating solution. We have tested this on three game domains and one real world problem, using two different computational models (post production and S-expression) for a total of about 6 tests. We find empirically that we are able in each case to evolve systems from random computer code to solve hard problems. In particular, our economy has learned to solve all Blocks World problems (in a certain infinite class) (whereas competing methods solve such problems only up to goal stacks of at most 8 blocks); to unscramble about half a randomly scrambled Rubik's cube; to solve several among a collection of commercially sold puzzles; and to learn a focused web crawler that outperformed a Bayesian focused crawler in our experiments. The web crawler is supplied a number of sample pages, evolves an economy of agents that recognize sets of keywords in ancestors of these pages, and then uses this knowledge to efficiently crawl to similar pages on the web. Igor Durdanovic, Erik Kruus, and John Hainsworth contributed to this work.  

October 15, 2002 | 4:30PM

“Improving Statistical Parsers Using Cross-Corpus Data”

Xiaoqiang Luo, IBM T.J. Watson Research Center

[abstract] [biography]

Abstract

The performance of a statistical parser often improves if it is trained with more labelled data. But acquiring labelled data is often expensive and labor-intensive. We address this problem by proposing to use data annotated for other purposes. Label information from another domain or corpus provides partial constraints for parsing, so the EM algorithm can be employed naturally to infer the missing information. I will present our results on improving a maximum entropy parser using cross-domain or cross-corpus data.

Speaker Biography

Xiaoqiang Luo received his bachelor's degree from the University of Science and Technology of China in 1990, and his Ph.D. from Johns Hopkins University in 1999, both in electrical engineering. Since 1998, he has been working at the IBM T.J. Watson Research Center as a senior software engineer. He was responsible for developing the semantic parser and interpreter used in the IBM DARPA Communicator. His research interests include statistical modeling in natural language processing (NLP), language modeling, speech recognition and spoken dialog systems.

October 8, 2002 | 4:30PM

“Cellular Cooperation in Cochlear Mechanics”   Video Available

George Zweig, Los Alamos National Laboratory

[abstract]

Abstract

Two contrasting views of cochlear mechanics are compared with each other, and with experiment. The first posits that all qualitative features of the nonlinear cochlear response are those of a simple dynamical system poised at a Hopf bifurcation, the second argues that the cochlear response must be found with 3-D simulations. Hopf bifurcations are explained, and their consequences for cochlear mechanics explored.    

October 1, 2002 | 4:30PM

“Dynamic Programming based Search and Segmentation Algorithms for Statistical Machine Translation”

Christoph Tillmann, IBM T.J. Watson Research Center

[abstract] [biography]

Abstract

This talk is about the use of dynamic programming (DP) techniques for statistical machine translation (SMT). I will present a search procedure for SMT based on dynamic programming. The starting point is a DP solution to the traveling salesman problem. For SMT, the cities correspond to source sentence positions to be translated. Imposing restrictions on the order in which the source positions are translated yields a DP algorithm that carries out the word re-ordering efficiently. A simple data-driven search organization makes it possible to prune unlikely translation hypotheses. Furthermore, I will sketch a DP-based segmentation procedure for SMT. The units of segmentation are blocks - pairs of source and target clumps. Here, the segmentation problem is related to the set cover problem, and an efficient DP segmentation algorithm exists if the blocks are restricted by an underlying word-to-word alignment.
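
As an illustration of the search organization described above, the sketch below runs a Held-Karp style dynamic program over states of the form (set of covered source positions, last covered position), exactly the TSP formulation the abstract starts from; the cost function is an invented stand-in for the real translation and distortion scores, and the re-ordering restrictions and pruning from the talk are omitted.

```python
# A minimal Held-Karp style DP over source-position coverage; the cost model
# is a toy stand-in, not a real translation model.
from functools import lru_cache

positions = range(4)                      # source sentence positions to cover

def cost(prev, nxt):
    return abs(nxt - prev - 1)            # toy distortion penalty: prefer monotone order

@lru_cache(maxsize=None)
def best(covered, last):
    """Cheapest way to cover the remaining positions, having just covered `last`."""
    if covered == (1 << len(positions)) - 1:
        return 0.0
    return min(cost(last, j) + best(covered | (1 << j), j)
               for j in positions if not covered & (1 << j))

# choose the first source position to translate, then recurse
print(min(best(1 << j, j) for j in positions))
```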

Speaker Biography

Christoph Tillmann is a Research Staff Member at the IBM T.J. Watson Research Center. He received his Dipl. degree in computer science in 1996 and his Dr. degree in computer science in 2001, both from Aachen University of Technology (RWTH), Germany. Currently, he is working on statistical machine translation. His research interests include probabilistic language modeling and probabilistic parsing.

September 24, 2002 | 4:30PM

“Natural Language Parsing: Graphs, the A* Algorithm, and Modularity”   Video Available

Christopher Manning, Stanford University

[abstract] [biography]

Abstract

Probabilistic parsing methods have in recent years transformed our ability to robustly find correct parses for open domain sentences. But people normally still think of parsers in terms of logical presentations via the notion of "parsing as deduction". I will instead connect stochastic parsing with finding shortest paths in hypergraphs, and show how this approach naturally provides a chart parser for arbitrary probabilistic context-free grammars (finding shortest paths in a hypergraph is easy; the central problem of parsing is that the hypergraph has to be constructed on the fly). Running such a parser exhaustively, I will briefly consider the properties of the Penn Treebank (the most used hand-parsed corpus): the vast parsing ambiguity that results from these properties and how simple models can accurately predict the amount of work a parser does on this corpus. Using the hypergraphical viewpoint, a natural approach is to use the A* algorithm to cut down the work in finding the best parse. On unlexicalized grammars, this can reduce the parsing work done dramatically, by at least 97%. This approach is competitive with methods standardly used in statistical parsers, while ensuring optimality, unlike most heuristic approaches to best-first parsing. Finally, I will present a novel modular generative model in which semantic (lexical dependency) and syntactic structures are scored separately. This factored model is conceptually simple, linguistically interesting, and provides straightforward opportunities for separately improving the component models. Further, it provides a level of performance close to that of similar, non-factored models. And most importantly, unlike other modern parsing models, the factored model permits the continued use of an extremely effective A* algorithm, which makes efficient, exact inference feasible. This is joint work with Dan Klein.  
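
As a rough sketch of the priority-queue mechanics behind the A* search described above, the code below runs ordinary A* on an invented weighted graph with an admissible heuristic; the actual parser applies the same idea to a hypergraph of chart items, where an edge may combine several child items, which this toy version does not attempt.

```python
# Minimal A*: expand items in order of (cost so far + admissible heuristic).
# Graph and heuristic values are invented for illustration.
import heapq

graph = {"S": [("A", 1.0), ("B", 4.0)],
         "A": [("G", 5.0)],
         "B": [("G", 1.0)],
         "G": []}
heuristic = {"S": 2.0, "A": 3.0, "B": 1.0, "G": 0.0}   # never overestimates the true cost

def astar(start, goal):
    agenda = [(heuristic[start], 0.0, start)]
    settled = {}
    while agenda:
        _, g, node = heapq.heappop(agenda)
        if node in settled:
            continue                       # already popped with a cheaper cost
        settled[node] = g
        if node == goal:
            return g
        for nbr, w in graph[node]:
            if nbr not in settled:
                heapq.heappush(agenda, (g + w + heuristic[nbr], g + w, nbr))
    return None

print(astar("S", "G"))                     # 5.0, via B
```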

Speaker Biography

Christopher Manning is an Assistant Professor of Computer Science and Linguistics at Stanford University. He received his Ph.D. from Stanford University in 1995, and served on the faculty of the Computational Linguistics Program at Carnegie Mellon University (1994-1996) and the University of Sydney Linguistics Department (1996-1999) before returning to Stanford. His research interests include probabilistic models of language, natural language parsing, constraint-based linguistic theories, syntactic typology, information extraction and text mining, and computational lexicography. He is the author of three books, including Foundations of Statistical Natural Language Processing (MIT Press, 1999, with Hinrich Schuetze).

September 12, 2002 | 4:30PM

“Modern Electret Microphones and Their Applications”

James West, The Johns Hopkins University

[abstract]

Abstract

It is well known that condenser microphones are the transducer of choice when accuracy, stability, frequency characteristics, dynamic range, and phase are important. But conventional condenser microphones require critical and costly construction as well as the need for a high DC bias for linearity. These disadvantages ruled out practical microphone designs such as multi-element arrays and the use of linear microphones in telephony. The combination of our discovery of stable charge storage in thin polymers and the need for improved linearity in communications encouraged the development of modern electret microphones. Modern polymer electret transducers can be constructed in various sizes and shapes mainly because they are simple, inexpensive transducers. Applications of electret microphones range from very small hearing aid microphones to very large single element units for underwater and airborne reception of very low frequencies. Because the frequency and phase response of electret microphones are relatively constant from unit to unit, multiple element two-dimensional arrays have been constructed using 400 electret elements that cost about $1.00 each. The Internet Protocol (IP) offers the bandwidth needed to further improve audio quality for telephony, but this will require broadband microphones and loudspeakers to provide customers with voice presence and clarity. Directional microphones for both hand-held and hands free modes are necessary to improve signal-to-noise ratios and to enable automatic speech recognition. Arrays with dynamic beam forming properties are also necessary for large conference rooms. Signal processing has made possible stereo acoustic echo cancellers and many other signal enhancements that improve audio quality. I will discuss some of the current work on broadband communications at Avaya Labs.
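
Since the abstract mentions arrays with dynamic beam-forming properties, the sketch below shows textbook delay-and-sum beamforming on a simulated linear four-microphone array; the geometry, sample rate, and signal are invented, and this is not the specific array processing discussed in the talk.

```python
# A minimal delay-and-sum beamformer on a simulated 4-microphone linear array.
# All parameters are toy values for illustration.
import numpy as np

c = 343.0                                   # speed of sound, m/s
fs = 16000                                  # sample rate, Hz
mic_x = np.array([0.00, 0.05, 0.10, 0.15])  # 4 mics on a line, 5 cm spacing

def delay_and_sum(signals, angle_deg):
    """Steer the array toward `angle_deg` (0 = broadside) by aligning and averaging."""
    delays = mic_x * np.sin(np.radians(angle_deg)) / c          # seconds per mic
    shifts = np.round(delays * fs).astype(int)                  # integer-sample approximation
    aligned = [np.roll(sig, -shift) for sig, shift in zip(signals, shifts)]
    return np.mean(aligned, axis=0)

t = np.arange(0, 0.01, 1 / fs)
source = np.sin(2 * np.pi * 500 * t)                            # 500 Hz tone
# simulate the tone arriving from 30 degrees with the corresponding per-mic delays
true_delays = np.round(mic_x * np.sin(np.radians(30)) / c * fs).astype(int)
signals = [np.roll(source, d) for d in true_delays]
out = delay_and_sum(signals, 30.0)
print("output power when steered to 30 deg:", float(np.mean(out ** 2)))
```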

July 31, 2002

“Facing the Curse of Dimensionality in Statistical Language Modeling using Distributed Representations”   Video Available

Yoshua Bengio


April 30, 2002 | 12:00 pm

“Modeling Shape: Computer Vision Meets The Euler Equation”   Video Available

David Mumford, Brown University

[abstract] [biography]

Abstract

Like pure mathematics, applied mathematics thrives on unexpected links between subfields. The stochastic modeling of perception has created the need for new types of probability models of the patterns of the world. One of these is "shape", the elusive quality that tells, for instance, what is a dog, what is a cat. We will review various models to capture these patterns, especially the wonderful link, based on the work of Arnold and forged by Grenander and Miller, with the Euler equation.

Speaker Biography

Information about David Mumford is available at his website.

April 23, 2002 | 12:00 pm

“The Silicon Cochlea: From Biology to Bionics”

Rahul Sarpeshkar, Massachusetts Institute of Technology

[abstract] [biography]

Abstract

The silicon cochlea implements the biophysics of the human cochlea on an analog electronic chip. I shall demonstrate the operation of a 61dB, 0.5mW analog VLSI silicon cochlea. An engineering analysis of this cochlea suggests why the ear is designed as a distributed traveling wave amplifier rather than as a bank of bandpass filters: such an architecture is a very efficient way of implementing a high resolution, high filter order, wide dynamic range frequency analyzer. I shall outline work on constructing low-power cochlear-implant processors that are based on circuits in the silicon cochlea as well as work on constructing a distributed-gain-control silicon-cochlea-based cochlear-implant processor. These processors have promise for cutting power dissipation by more than an order of magnitude in today's implant processors and for improving patient performance in noise.

Speaker Biography

Biographical info coming soon.

April 16, 2002 | 12:00 pm

“Identifying minimalist languages from dependency structures”   Video Available

Ed Stabler, UCLA Dept. of Linguistics

[abstract] [biography]

Abstract

The human acquisition of human languages is based on the analysis of signal and context. To study how this might work, a simplified robotic setting is described in which the problem is divided into two basic steps: an analysis of the linguistic events in context that yields dependency structures, and the identification of grammars that generate those structures. A learnability result that generalizes (Kanazawa 1994) has been obtained, showing that non-CF, and even non-TAG, languages can be identified in this setting, and more realistic assessments of the learning problem are under study.

Speaker Biography

Stabler is Professor of Linguistics at UCLA. He specializes in theories of human language processing and formal learnability theory, with interests in automated theorem proving, philosophy of logic and language, and artificial life.

April 9, 2002 | 12:00 pm

“Alpino: Wide-coverage Computational Analysis of Dutch”

Gertjan van Noord, University of Groningen

[abstract] [biography]

Abstract

Alpino is a wide-coverage computational analyzer of Dutch which aims at accurate, full, parsing of unrestricted text. Alpino is based on a constructionalist HPSG grammar including a large lexical component. Alpino produces dependency structures, as proposed in the CGN (Corpus of Spoken Dutch). Important aspects of wide-coverage parsing are robustness, efficiency and disambiguation. In the talk we briefly introduce the Alpino system, and then discuss two recent developments. The first development is the integration of a log-linear model for disambiguation. It is shown that this model performs well on the task, despite the small size of the training data that is used to train the model. We also describe how we avoid the inherent efficiency problems of using such a log-linear model in parse selection. The second development concerns the implementation of an unsupervised POS-tagger. It is shown that a simple POS-tagger can be used to filter the results of lexical analysis of a wide-coverage computational grammar. The reduction of the number of lexical categories not only greatly improves parsing efficiency, but in our experiments also gave rise to a mild increase in parsing accuracy; in contrast to results reported in earlier work on supervised tagging. The novel aspect of our approach is that the POS-tagger does not require any human-annotated data - but rather uses the parser output obtained on a large training set.

Speaker Biography

Biographical info coming soon. For more information, please see this webpage.

April 2, 2002 | 12:00 pm

“Emergent Properties of Speech and Language as Social Activities”

Mark Liberman, Linguistic Data Consortium, University of Pennsylvania

[abstract] [biography]

Abstract

When linguists, psychologists or engineers try to understand, explain or imitate human speech and language, they usually do so by modeling individual speakers, hearers or learners. Nevertheless, language is an emergent property of groups (of humans), and elementary arguments suggest that non-trivial characteristics of speech and language emerge from interactions within groups of individuals over time. We should also expect that we need to look at how variable inherited traits affect such socially-emergent properties, in order to understand the evolved genetic influences on speech and language. After an obligatory but brief discussion of insect communication, this talk will explore the application of these ideas to (pathetically simple) models in two areas: morphosyntactic regularization and categorical perception.

Speaker Biography

Biographical information coming soon.

March 12, 2002 | 12:00 pm

“Data-driven discriminative features for HMM-based ASR”

Hynek Hermansky, Oregon Graduate Institute

[abstract] [biography]

Abstract

The talk describes our work towards data-driven features that could be used with the current HMM system and that would represent transformed posterior probabilities of the sub-word classes. To address steady or slowly-varying artifacts, the probabilities are derived from relatively long time spans of the signal (up to 1 sec). This may also alleviate some dependencies on the phonetic context. To address excessive sensitivity of ASR to changes in short-term spectral profiles, we do the probability estimations in two steps. The first step yields frequency-localized class probability estimates. These estimates are used as inputs to another probability estimator that yields the final class probabilities. These final probabilities are appropriately transformed to yield features for the subsequent HMM classifier. The whole feature module is trained on labeled speech data.
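
As a sketch of the final step described above (turning estimated class posteriors into features a conventional HMM system can consume), the snippet below applies the usual log-and-decorrelate transformation in the spirit of the TANDEM recipe; the posteriors are random stand-ins for the outputs of the trained frequency-localized and merging estimators, which are not implemented here.

```python
# A minimal sketch of converting class posteriors into HMM-friendly features.
# The posteriors here are random stand-ins, not outputs of trained estimators.
import numpy as np

rng = np.random.default_rng(0)
frames, classes = 500, 30
raw = rng.random((frames, classes))
posteriors = raw / raw.sum(axis=1, keepdims=True)    # stand-in per-frame class posteriors

log_post = np.log(posteriors + 1e-8)                 # logs make the values more Gaussian-like
centered = log_post - log_post.mean(axis=0)
# decorrelate with PCA so diagonal-covariance Gaussians in the HMM fit better
_, _, components = np.linalg.svd(centered, full_matrices=False)
features = centered @ components.T[:, :13]           # keep the leading 13 dimensions
print(features.shape)                                # (500, 13) feature vectors for the HMM
```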

Speaker Biography

Hynek Hermansky is a Professor of Electrical and Computer Engineering and Director of Center for Information Technology at the OGI School of Oregon Health and Sciences University in Portland, Oregon, and a Senior Research Scientist at the International Computer Science Institute in Berkeley, California. He has been working in speech processing for over 25 years, previously as a research fellow at the University of Tokyo, a Research Engineer at Panasonic Technologies in Santa Barbara, California, and as a Senior Member of Research Staff at U S WEST Advanced Technologies. He is a Fellow of IEEE, Member of the Board of the International Speech Communication Association, Editor of IEEE Transactions on Speech and Audio Processing, and a Member of the Editorial Board of Speech Communication. He holds Dr.Eng. degree from the University of Tokyo. His main research interests are in acoustic processing for speech and speaker recognition.

March 1, 2002 | 12:00 pm

“Event Extraction: Learning from Corpora”

Ralph Grishman, New York University

[abstract] [biography]

Abstract

Event extraction involves automatically finding, within a text, instances of a specified type of event, and filling a data base with information about the participants and circumstances (date, place) of the event. These data bases can provide an alternative to traditional text search engines for repeated, focused searches on a single topic. Constructing an extraction system for a new event type requires identifying the linguistic patterns and classes of words which express the event. We consider the types of knowledge required and how this knowledge can be learned from text corpora with minimal supervision.

Speaker Biography

Biographical info coming soon. For more information, please see this webpage.

February 28, 2002 | 12:00 pm

“Recognition and Organization of Speech and Audio: A program and some projects”

Dan Ellis, Laboratory for Recognition and Organization of Speech and Audio (LabROSA) at Columbia

[abstract] [biography]

Abstract

Biographical info coming soon. For more information, please see this webpage.

Speaker Biography


February 19, 2002 | 12:00 pm

“Modeling Inverse Covariance Matrices by Basis Expansion”

Dr. Peder A. Olsen, IBM T.J. Watson Research Center

[abstract] [biography]

Abstract

This talk introduces a new covariance modeling technique for Gaussian Mixture Models. Specifically the inverse covariance (precision) matrix of each Gaussian is expanded in a rank-1 basis. A generalized EM algorithm is proposed to obtain maximum likelihood parameter estimates for the basis set and the corresponding expansion coefficients for the precision matrices of individual Gaussians. This model, called the Extended Maximum Likelihood Linear Transform (EMLLT) model, is extremely flexible: by varying the number of basis elements from d to d(d+1)/2 one gradually moves from a Maximum Likelihood Linear Transform (MLLT) model (also known as semi-tied covariance) to a full-covariance model. Experimental results on two speech recognition tasks show that the EMLLT model can give relative gains of up to 35% in the word error rate over a standard diagonal covariance model, 30% over a standard MLLT model.
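
As a small numerical sketch of the model form described above, the snippet below builds one Gaussian's precision matrix as a weighted sum of shared rank-1 terms, P = sum_k lambda_k a_k a_k^T, and evaluates a log-density directly from it; the basis, weights, and ridge term are illustrative values, and the generalized EM training is not shown.

```python
# A minimal sketch of a precision matrix expanded in a shared rank-1 basis.
# Values are illustrative; no EM training of the basis or weights is performed.
import numpy as np

d, K = 3, 4
rng = np.random.default_rng(1)
basis = rng.normal(size=(K, d))                 # shared rank-1 directions a_k
lam = np.array([1.0, 0.8, 0.5, 0.3])            # per-Gaussian expansion coefficients

P = sum(l * np.outer(a, a) for l, a in zip(lam, basis))   # precision matrix of one Gaussian
P += 1e-3 * np.eye(d)                                     # tiny ridge to guarantee invertibility
mu = np.zeros(d)

def log_density(x):
    """log N(x; mu, P^-1) evaluated directly from the precision matrix."""
    _, logdet = np.linalg.slogdet(P)
    diff = x - mu
    return 0.5 * (logdet - d * np.log(2 * np.pi) - diff @ P @ diff)

print(log_density(np.array([0.5, -0.2, 0.1])))
```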

Speaker Biography

Biographical information is coming soon. For more information, please see this webpage.

February 12, 2002 | 12:00 pm

“Toward Improved Dialect Modeling”

Malcah Yaeger-Dror, University of Arizona, Cognitive Science

[abstract] [biography]

Abstract

Speech technology (including synthesis, speech recognition and speaker verification) has made significant advances in recent years in laboratories and in field applications, but speech recognition can still degrade when the test data do not match the training data well -- for example, when the test data include dialects that are not included in the original sample, or when the speech collected from certain speakers does not match the way they normally speak. As a result, 'non-mainstream dialects' are under-represented because they are more difficult to collect using 'standard' channels. For those who speak dialects not represented in the training data, this is a serious impediment to the goal of universal access. Such an impediment can have far-reaching consequences since it can affect access to education and even telephone information systems. In fact, appropriate corpora for developing more adequate recognition strategies are so sparse that it is difficult even to assess just how bad the current situation is, or to evaluate new modeling techniques. One impediment to devising a corpus which permits better modeling of dialect is the fact that those who understand dialect and 'style' differences are often not versed in speech technology, and vice versa. This paper will address the issue of how better understanding between these groups can permit researchers to gather appropriate speech so that adequate recognition strategies can be devised for all speakers of English -- both by choosing speakers from a broader range of dialects and by collecting the speech in a setting which is appropriate. After a short discussion of 'dialect' and 'style' (Eckert and Rickford 2001, Yaeger-Dror & Hall-Lew 2000 (A)), the paper will propose how to take better advantage of a corpus which is already available and which meets the criteria which appear to be needed for better dialect modeling. The paper will propose that, if appropriately labeled and coded with respect to dialect and demographic variables, at least one corpus presently available could be quite helpful in improving dialect recognition. A subset of the phone calls from the CallFriend Southern American English corpus appears to meet these criteria, both because the speech style is natural and conversational and because the speakers represent non-mainstream dialects for which recognition is at present very inadequate. We will conclude that better modeling of dialect effects across age groups, dialect groups, and sex should greatly enhance the goal of universal speech access.

References:
Eckert, P. and J. Rickford (eds.). 2001. Style and Sociolinguistic Variation. Cambridge University Press.
Yaeger-Dror, M. 2001. Primitives for the analysis of 'style'. In P. Eckert and J. Rickford (eds.), 170-185.
Yaeger-Dror, M. and L. Hall-Lew. 2000. Prosodic prominence on negation in various registers of US English. Journal of the Acoustical Society of America 108:2468 (A).

Speaker Biography

Biographical information is coming soon. For more information, please see this webpage.

February 5, 2002 | 12:00 pm

“Rules and Analogy in Word Learning and Change”

Charles Yang, Yale University - Dept. of Linguistics

[abstract] [biography]

Abstract

It is often claimed that irregular verbs in English are learned by memorizing associated pairs of stems and past tense forms, and hence that the frequency of an irregular verb largely determines the success of its acquisition (Pinker 1999). Yet a careful examination of the acquisition data (Marcus, Pinker, Ullman, Hollander, & Xu 1992) shows that the frequency-acquisition correlation completely breaks down when the phonological regularities in irregular past tense formation are taken into consideration. The data in fact suggest a view of learning that involves (a) the construction of phonological rules, even among the very unsystematic irregular classes, and (b) probabilistic associations between words and their corresponding rules (e.g., lose -> -t suffixation + vowel shortening). This talk gives acquisition evidence for this approach. Then, based on a model of word learning by Sussman & Yip (1996, 1997), we develop a computational model of sound change, which may explain, inter alia, why irregularity in languages is not an "imperfection" but a necessity.

Speaker Biography

Charles Yang received his Ph.D. in computer science from MIT, and has since been teaching computational linguistics and child language at Yale. He is the author of "Knowledge and Learning in Natural Language" (Oxford University Press, 2002).

January 29, 2002 | 12:00 pm

“The Shoah Foundation's digital archive: A 180 Tera-Byte database for teaching tolerance”   Video Available

Sam Gustman, Survivors of the Shoah Visual History Foundation

[abstract] [biography]

Abstract

In 1994, after filming Schindler's List, Steven Spielberg established Survivors of the Shoah Visual History Foundation with an urgent mission: to videotape and preserve the testimonies of Holocaust survivors and witnesses. Today, the Shoah Foundation has collected more than 50,000 eyewitness testimonies in 57 countries and 32 languages, and is committed to ensuring the broad and effective educational use of its archive worldwide. The Shoah Foundation has built a 180 Tera-Byte database from the digitized testimonies and is in the process of cataloging those testimonies using analysts with historical and political science backgrounds. Technologies for disseminating the archive and automating some of the manual processes involved in cataloging the testimonies are a few of the Shoah Foundation's current efforts and the topic of this talk.

Speaker Biography

Biographical information coming soon.

Back to Top

2001

October 30, 2001 | 4:30PM

“How Important is "Starting Small" in Language Acquisition?”   Video Available

David Plaut, Carnegie Mellon University

[abstract] [biography]

Abstract

Elman (1993, Cognition) reported that recurrent connectionist networks could learn the structure of English-like artificial grammars by performing implicit word prediction, but that learning was successful only when "starting small" (e.g., starting with limited memory that only gradually improves). This finding provided critical computational support for Newport's (1990, Cognitive Science) "less is more" account of critical period effects in language acquisition -- that young children are aided rather than hindered by limited cognitive resources. The current talk presents connectionist simulations that indicate, to the contrary, that language learning by recurrent networks does not depend on starting small; in fact, such restrictions hinder acquisition as the languages are made more natural by introducing graded semantic constraints. Such networks can nonetheless exhibit apparent critical-period effects as a result of the entrenchment of representations learned in the service of performing other tasks, including other languages. Finally, although the word prediction task may appear unrelated to actual language processing, a preliminary large-scale simulation illustrates how performing implicit prediction during sentence comprehension can provide indirect training for sentence production. The results suggest that language learning may succeed in the absence of innate maturational constraints or explicit negative evidence by taking advantage of the indirect negative evidence that is available in performing online implicit prediction.

Speaker Biography

Dr. Plaut is an Associate Professor of Psychology and Computer Science at Carnegie Mellon University, with a joint appointment in the Center for the Neural Basis of Cognition. His research involves using connectionist modeling, complemented by empirical studies, to extend our understanding of normal and impaired cognitive processing in the domains of reading, language, and semantics. For more details, see his vitae at http://www.cnbc.cmu.edu/~plaut/vitae.html.

October 23, 2001

“Knowledge Discovery in Text”   Video Available

Dekang Lin

October 23, 2001

“Knowledge Discovery From Text”   Video Available

Dekang Lin

October 16, 2001 | 4:30PM

“Inductive Databases and Knowledge Scouts”   Video Available

Ryszard S. Michalski

[abstract] [biography]

Abstract

The development of very large databases and the WWW has created extraordinary opportunities for monitoring, analyzing and predicting economic, ecological, demographic, and other processes in the world. Our current technologies for data mining are, however, insufficient for such tasks. This talk will describe ongoing research in the GMU MLI Laboratory on the development of "inductive databases and knowledge scouts," which represent a new approach to the problem of semi-automatically deriving user-oriented knowledge from databases. If time permits, a demo will be presented that illustrates the methodology of natural induction that is at the heart of this research.

Speaker Biography

Ryszard S. Michalski is Planning Research Corporation Chaired Professor of Computational Sciences and Director of the Machine Learning and Inference Laboratory at George Mason University. He is also a Fellow of AAAI and a Foreign Member of the Polish Academy of Sciences. Dr. Michalski is viewed as a co-founder of the field of machine learning and has initiated a number of research directions, such as constructive induction, conceptual clustering, variable-precision logic (with Patrick Winston, MIT), human plausible reasoning (with Alan Collins, BBN), multistrategy learning, the LEM model of non-Darwinian evolutionary computation, and most recently inductive databases and knowledge scouts. He has authored, co-authored and co-edited 14 books/proceedings and over 350 papers in the areas of his interest.

October 9, 2001 | 4:30PM

“Spatial Language and Spatial Cognition: The Case of Motion Events”

Barbara Landau, Johns Hopkins University

[abstract]

Abstract

For some years, our lab has been working on questions about the nature of semantic representations of space, in particular, representations of objects, motions, locations, and paths involved in motion events. One central question we address is the nature of the correspondence between these semantic representations of space and their possible non-linguistic counterparts. To what extent are there direct mappings between our non-linguistic representations of motion events and the language we use to express them? We have recently approached this question by studying the language of motion events (as well as other spatial language) in people with severely impaired spatial cognition. Individuals with Williams syndrome (a rare genetic defect) have an unusual cognitive profile, in which spatial cognition is severely impaired, but language (i.e., morphology and syntax) is spared. The crucial question is whether these individuals will show impaired spatial language, commensurate with their spatial impairment, or spared spatial language, commensurate with the rest of their linguistic system. Evidence suggests remarkable sparing in the language of motion events for this group, with rich structure in the nature of expressions for objects, motions, and paths. These findings suggest that the linguistic encoding of motion events may be insulated from the effects of more general spatial breakdown, and raise questions about the nature of the mapping between spatial language and spatial cognition.

October 9, 2001

“CLSP Fall Seminar Series”   Video Available

Barbara Landau

October 2, 2001 | 4:30PM

“Integrating behavioral, neuroscience, and computational methods in the study of human speech”   Video Available

Lynne Bernstein, Department of Communication Neuroscience, House Ear Institute

[abstract] [biography]

Abstract

Speech perception is a process that transforms speech stimuli into neural representations that are then projected onto word-form representations in the mental lexicon. This process is conventionally thought to involve the encoding of stimulus information as abstract linguistic categories, such as features or phonemes. We have been using a variety of methods to study auditory and visual speech perception and spoken word recognition. Across behavioral, brain imaging, electrophysiology, and computational methods, we are obtaining evidence for modality-specific speech processing. For example: based on computational modeling and behavioral testing, it appears that modality-specific representations contact the mental lexicon during spoken word recognition; based on fMRI results, there is a visual phonetic processing route in human cortex that is distinct from the auditory phonetic processing route; and direct correlations between optical phonetic similarity measures and visual speech perception are high, approximately .80. One implication of these findings is that speech perception operates on modality-specific representations rather than being mediated by abstract, amodal representations; another is that spoken language processing is far more widely distributed in the brain than heretofore thought.

Speaker Biography

Dr. Bernstein received her Ph.D. in Psycholinguistics from the University of Michigan. She holds current academic appointments at UCLA, the California Institute of Technology, and California State University. For more information, please visit her webpage at http://www.hei.org/research/scientists/bernbio.htm.

September 25, 2001 | 4:30PM

“Time-frequency auditory processing in bat sonar”

James Simmons, Brown University

[abstract]

Abstract

I'm interested in understanding how the bat's sonar works and how the bat's brain makes sonar images. They make sounds, listen to echoes, and then see objects. To study echolocation, we go into the field and videotape bats using sonar for different purposes. These observations tell us in what situations bats use their sonar, and what sorts of sounds they use. If we know where the objects are in the videos, we can figure out what sounds get back to the bats. We then use a computer to generate these sounds and play them to the bats while we record responses from their brains. We want to know what the neurons in the bat's auditory system do to process the echoes to allow the brain to see. We also train bats in the lab to respond to computer-generated echoes, so we can tell something about the images the bat perceives. We are developing a computer model of how the bat's brain processes the echoes to see if the model produces the same kind of images the bat perceives. This model is part of a project to design new high-performance sonar for the U.S. Navy.

September 11, 2001 | 12:00 pm

“Predictive clustering: smaller, faster language modeling”

Joshua Goodman, Microsoft Corp.

[abstract] [biography]

Abstract

I'll start with a brief overview of current research directions in several Microsoft Research groups, including the Speech Group (where I used to be), the Machine Learning Group (where I currently work) and the Natural Language Processing Group (with whom I kibitz). Then, I will go on to describe my own recent research in clustering. Clusters have been one of the staples of language modeling research for almost as long as there has been language modeling research. I will present a novel clustering approach that allows us to create smaller models and to train maximum entropy models faster. First, I examine how to use clusters for language model compression, with a surprising result. I achieve my best results by first making the models larger using clustering, and then pruning them. This can result in a factor of three or more reduction in model size at the same perplexity. I then go on to examine a novel way of using clustering to speed up maximum entropy training. Maximum entropy is considered by many people to be one of the more promising avenues of language model research, but it is prohibitively expensive to train large models. I show how to use clustering to speed up training time by up to a factor of 35 over standard techniques, while slightly improving perplexity. The same approach can be used to speed up some other learning algorithms that try to predict a very large number of outputs.
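
One standard way to use clusters predictively is to factor the word probability into a cluster prediction followed by a word prediction within that cluster, which also shrinks the normalization needed in maximum entropy training. The toy sketch below illustrates that factorization on made-up bigram counts and a hypothetical word-to-cluster map; it is an illustration of the general idea, not the speaker's exact models.

    from collections import defaultdict

    # Toy (history, word) bigram data and a hypothetical hard clustering of words.
    pairs = [("the", "cat"), ("the", "dog"), ("a", "dog"), ("the", "mat"), ("a", "cat")]
    cluster = {"cat": "ANIMAL", "dog": "ANIMAL", "mat": "OBJECT"}

    hist_cluster = defaultdict(lambda: defaultdict(int))   # counts of cluster given history
    cluster_word = defaultdict(lambda: defaultdict(int))   # counts of word given (history, cluster)
    for h, w in pairs:
        c = cluster[w]
        hist_cluster[h][c] += 1
        cluster_word[(h, c)][w] += 1

    def p_cluster(c, h):
        counts = hist_cluster[h]
        return counts[c] / sum(counts.values())

    def p_word(w, h):
        # Predictive clustering: P(w | h) = P(cluster(w) | h) * P(w | h, cluster(w)).
        c = cluster[w]
        within = cluster_word[(h, c)]
        return p_cluster(c, h) * within[w] / sum(within.values())

    print(p_word("dog", "the"))   # 2/3 * 1/2 = 1/3

Because each factor is normalized over a much smaller set (the clusters, or the words within one cluster) than the full vocabulary, this kind of decomposition is what makes cluster-based maximum entropy training so much cheaper.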

Speaker Biography

Joshua Goodman worked at Dragon Systems for two years, where he designed the speech recognition engine that was used until their recent demise, which he claims he had nothing to do with. He then went to graduate school at Harvard, where he studied statistical natural language processing, especially statistical parsing. Next, he joined the Microsoft Speech Technology Group, where he worked on language modeling, especially language model combination, and clustering. Recently, he switched to the Machine Learning and Applied Statistics Group, where he plans to do "something with language and probabilities."

June 1, 2001 | 12:00 pm

“Geometric Source Separation: Merging convolutive source separation with geometric beamforming”   Video Available

Lucas Parra, Sarnoff Corporation

[abstract] [biography]

Abstract

Blind source separation of broad band signals in a multi-path environment remains a difficult problem. Robustness has been limited due to frequency permutation ambiguities. In principle, increasing the number of sensors allows improved performance but also introduces additional degrees of freedom in the separating filters that are not fully determined by separation criteria. We propose here to further shape the filters and improve the robustness of blind separation by including geometric information such as sensor positions and localized source assumption. This allows us to combine blind source separation with notions from adaptive and geometric beamforming leading to a number of novel algorithms that could be termed collectively "geometric source separation".
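
In rough terms, the combination can be thought of as a separation criterion augmented with a beamforming-style penalty that ties each output channel to an assumed source location; the following schematic objective is written in my own notation and is not the paper's exact formulation:

    \min_{W(\omega)} \; \sum_{\omega} J_{\mathrm{sep}}\big(W(\omega)\big)
        \;+\; \lambda \sum_{\omega} \sum_{i} \big| w_i(\omega)^{H} d(\omega, \theta_i) - 1 \big|^2,

where W(\omega) collects the separating filters at frequency \omega, J_sep penalizes residual dependence between the separated outputs, and d(\omega, \theta_i) is the array response toward the assumed position of source i. The soft constraint of unit response of output i toward "its" source injects the geometric information and also helps resolve the frequency permutation ambiguity mentioned above.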

Speaker Biography

Lucas C. Parra was born in Tucumán, Argentina. He received his Diploma in Physics in 1992 and his Doctorate in Physics in 1996 from the Ludwig-Maximilians-Universität, Munich, Germany. From 1992 to 1995 he worked at the Neural Networks Group at Siemens Central Research in Munich, Germany, and at the Machine Learning Department at Siemens Corporate Research (SCR) in Princeton, NJ. During 1995-1997 he was a member of the Imaging Department at SCR and worked on medical image processing and novel reconstruction algorithms for nuclear medicine. Since 1997 he has been with Sarnoff Corp. His current research concentrates on probabilistic models in various image and signal processing areas.

April 24, 2001 | 4:30PM

“Effects of sublexical pattern frequency on production accuracy in young children”

Mary Beckman, Department of Linguistics, Ohio State University

[abstract] [biography]

Abstract

A growing body of research on adult speech perception and production suggests that phonological processing is grounded in generalizations about sub-lexical patterns and the relative frequencies with which they occur in the lexicon. Much infant perception work suggests that the acquisition of attentional strategies appropriate for lexical access in the native language is similarly based on generalizations over the lexicon. Given this picture of the adult and the infant, we might expect to see the influence of phoneme frequency and phoneme sequence frequency on young children's production accuracy and fluency as well. We have done several experiments recently which document such influences. For example, we looked at the frequency of /k/ relative to /t/ in Japanese words, and its effect on word-initial lingual stops produced by young children acquiring the language. Both relative accuracy overall and error patterns differed from those observed for English-acquiring children, in ways that reflect the different relative frequencies of coronals and dorsals in the language. Another set of studies focused on the effect of phoneme sequence frequency in English words on the accuracy and fluency of non-word repetition by young English-speaking children. We had 87 children, aged 3-7, imitate nonsense forms containing diphone sequences that were controlled for transitional probability. For example, a non-word containing high-frequency /ft/ was matched to another non-word containing low-frequency /fk/. Children were generally less accurate on the low-frequency sequence, and the size of this effect was correlated with the size of the difference in transitional probability. It was also correlated with the child's vocabulary size. That is, the more words a child knows, the more robust the phonological abstraction. These results have important implications for models of phonological competence and its relationship to the lexicon.
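
As a concrete illustration of the sub-lexical statistic at issue, diphone transitional probabilities can be estimated from a pronunciation lexicon by counting adjacent phone pairs. The tiny lexicon below is hypothetical; a real study would use a large, frequency-weighted dictionary.

    from collections import Counter

    # Hypothetical phone-transcribed lexicon.
    lexicon = [
        ("s", "o", "f", "t"),
        ("l", "i", "f", "t"),
        ("g", "i", "f", "t"),
        ("b", "r", "e", "k", "f", "a", "s", "t"),
    ]

    first = Counter()     # occurrences of each phone in non-final position
    diphone = Counter()   # occurrences of each adjacent phone pair
    for word in lexicon:
        for a, b in zip(word, word[1:]):
            first[a] += 1
            diphone[(a, b)] += 1

    def transitional_probability(a, b):
        # P(b | a), estimated over adjacent phone pairs in the lexicon.
        return diphone[(a, b)] / first[a] if first[a] else 0.0

    print(transitional_probability("f", "t"))   # 0.75: /ft/ is the high-frequency sequence here
    print(transitional_probability("f", "a"))   # 0.25: a lower-frequency sequence in this toy lexicon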

Speaker Biography

Mary E. Beckman is a Professor of Linguistics and Speech & Hearing Science at Ohio State University in Columbus, OH, and also spends part of the year as a Professor at the Macquarie Centre for Cognitive Sciences in Sydney, Australia. Much of her work has focused on intonation and prosodic structure in English and Japanese. For example, she is co-author (with Janet Pierrehumbert) of the 1988 monograph Japanese Tone Structure, and she wrote the associated intonation synthesis program. She also has worked on speech kinematics, on effects of lexical frequency and sub-lexical pattern frequency on phonological processing, and on phonological acquisition.

April 17, 2001 | 4:30PM

“Acoustic-optical Phonetics and Audiovisual Speech Perception”

Lynne Bernstein

[abstract]

Abstract

Several sources of behavioral evidence show that speech perception is audiovisual when acoustic and optical speech signals are afforded the perceiver. The McGurk effect and enhancements to auditory speech intelligibility in noise are two well-known examples. In comparison with acoustic phonetics and auditory speech perception, however, relatively little is known about optical phonetics and visual speech perception. Likewise, how optical and acoustic signals are related and how they are integrated perceptually remains an open question. We have been studying relationships between kinematic and acoustic recordings of speech. The kinematic recordings were made with an optical recording system that tracked movements on talkers' faces and with a magnetometer system that simultaneously tracked tongue and jaw movements. Speech samples included nonsense syllables and sentences from four talkers, prescreened for visual intelligibility. Mappings among the kinematic and acoustic signals show a perhaps surprisingly high degree of correlation. However, demonstrations of correlations in speech signals are not evidence about the perceptual mechanisms responsible for audiovisual integration. Perceptual evidence from McGurk experiments has been used to hypothesize early phonetic integration of visual and auditory speech information, even though some of these experiments have also shown that the effect occurs despite relatively long crossmodal temporal asynchronies. The McGurk effect can be made to occur when acoustic /ba/ is combined in synchrony with a visual /ga/, typically resulting in the perceiver reporting having heard /da/. To investigate the time course and cortical location of audiovisual integration, we obtained event-related potentials (ERPs) from twelve adults, prescreened for McGurk susceptibility. Stimuli were presented in an oddball paradigm to evoke the mismatch negativity (MMN), a neurophysiological discrimination measure, most robustly demonstrated with acoustic contrasts. Conditions were audiovisual McGurk stimuli, visual-only stimuli from the McGurk condition, and auditory stimuli corresponding to the McGurk condition percepts (/ba/-/da/). The magnitude (area) of the MMN for the audiovisual condition was maximal at a latency > 300 ms, much later than the maximal magnitude of the auditory MMN (approximately 260 ms), suggesting that integration occurs later than auditory phonetic processing. Additional latency, amplitude, and dipole source analyses revealed similarities and differences between the auditory, visual, and audiovisual conditions. Results support an audiovisual integration neural network that is at least partly distinct from and operates at a longer latency than unimodal networks. In addition, results showed dynamic differences in processing across correlated and uncorrelated audiovisual combinations. These results point to a biocomplex system: the agents of complexity theory can be taken, in our case (non-inclusively), to be the unimodal sensory/perceptual systems, which have important heterogeneous characteristics. Auditory and visual perception have their own organization and, when combined, apparently participate in another organization. Apparently also, the dynamics of audiovisual organization vary depending on the correlation of the acoustic-optical phonetic signals. This view contrasts with views of audiovisual integration based primarily on consideration of algorithms or formats for information combination.

April 10, 2001 | 12:00 pm

“Deterministic Annealing for Clustering, Compression, Classification, Regression, and Speech Recognition”

Kenneth Rose, Department of Electrical and Computer Engineering, University of California at Santa Barbara

[abstract] [biography]

Abstract

The deterministic annealing approach to clustering and its extensions have demonstrated substantial performance improvement over standard supervised and unsupervised learning methods on a variety of important problems including compression, estimation, pattern recognition and classification, and statistical regression. It has found applications in a broad spectrum of disciplines ranging from various engineering fields to physics, biology, medicine and economics. The method offers three important features: ability to avoid many poor local optima; applicability to many different structures/architectures; and ability to minimize the right cost function even when its gradients vanish almost everywhere as in the case of the empirical classification error. It is derived within a probabilistic framework from basic information theoretic principles. The application-specific cost is minimized subject to a constraint on the randomness (Shannon entropy) of the solution, which is gradually lowered. We emphasize intuition gained from analogy to statistical physics, where this is an annealing process that avoids many shallow local minima of the specified cost and, at the limit of zero "temperature", produces a non-random (hard) solution. Phase transitions in the process correspond to controlled increase in model complexity. Alternatively, the method is derived within rate-distortion theory, where the annealing process is equivalent to computation of Shannon's rate-distortion function, and the annealing temperature is inversely proportional to the slope of the curve. This provides new insights into the method and its performance, as well as new insights into rate-distortion theory itself. The basic algorithm is extended by incorporating structural constraints to allow optimization of numerous popular structures including vector quantizers, decision trees, multilayer perceptrons, radial basis functions, mixtures of experts and hidden Markov models. Experimental results show considerable performance gains over standard structure-specific and application-specific training methods. After covering the basics, the talk will emphasize the speech recognition applications of the approach, including a brief summary of work in progress.
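
For the basic clustering case, the annealing objective and the resulting randomized assignments take the familiar form below (standard deterministic-annealing notation; the framework in the talk generalizes this to many other structures and cost functions):

    F = \sum_{x} \sum_{j} p(j \mid x)\, d(x, y_j) \;-\; T \, H(p),
    \qquad
    p(j \mid x) = \frac{e^{-d(x, y_j)/T}}{\sum_{k} e^{-d(x, y_k)/T}},

where d(x, y_j) is the distortion of assigning data point x to codevector y_j and H(p) is the Shannon entropy of the assignments. Minimizing F at a fixed temperature T yields the Gibbs-distributed (soft) assignments shown on the right; T is then lowered gradually, phase transitions mark increases in effective model complexity, and in the limit T -> 0 the assignments become hard.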

Speaker Biography

Kenneth Rose received the Ph.D. degree in electrical engineering from Caltech in 1991. He then joined the faculty of the Department of Electrical and Computer Engineering, University of California at Santa Barbara. His research interests are in information theory, source and channel coding, speech and general pattern recognition, image coding and processing, and nonconvex optimization in general. He currently serves as the editor for source-channel coding for the IEEE Transactions on Communications. In 1990 he received the William R. Bennett Prize-Paper Award from the IEEE Communications Society for the best paper in the Transactions.

March 27, 2001 | 4:30PM

“Learning Probabilistic and Lexicalized Grammars for Natural Language Processing”   Video Available

Rebecca Hwa, University of Maryland

[abstract] [biography]

Abstract

This talk addresses two questions: what are the properties of a good grammar representation for natural language processing applications, and how can such grammars be constructed automatically and efficiently? I shall begin by describing a formalism called Probabilistic Lexicalized Tree Insertion Grammars (PLTIGs), which has several linguistically motivated properties that are helpful for processing natural languages. Next, I shall present a learning algorithm that automatically induces PLTIGs from human-annotated text corpora. I have conducted empirical studies showing that a trained PLTIG compares favorably with other formalisms on several kinds of tasks. Finally, I shall discuss ways of making grammar induction more efficient. In particular, I want to reduce the dependency of the induction process on human-annotated training data. I will show that by applying a learning technique called sample selection to grammar induction, we can significantly decrease the number of training examples needed, thereby reducing the human effort spent on annotating training data.

Speaker Biography

Rebecca Hwa is currently a postdoctoral research fellow at the University of Maryland, College Park. Her research interests include natural language processing, machine learning, and human-computer interaction.

March 13, 2001 | 4:30PM

“Latent-Variable Representations for Speech Processing and Research”   Video Available

Miguel A. Carreira-Perpinan, Georgetown Institute for Computational and Cognitive Sciences, Georgetown University Medical Center

[abstract] [biography]

Abstract

Continuous latent variable models are probabilistic models that represent a distribution in a high-dimensional Euclidean space using a small number of continuous, latent variables. Examples include factor analysis, the generative topographic mapping (GTM) and independent component analysis (ICA). This type of model is well suited for dimensionality reduction and sequential data reconstruction. In the first part of this talk I will introduce the theory of continuous latent variable models and show an example of their application to the dimensionality reduction of electropalatographic (EPG) data. In the second part I will present a new method for missing data reconstruction of sequential data that includes as a particular case the inversion of many-to-one mappings. The method is based on multiple pointwise reconstruction and constraint optimisation. Multiple pointwise reconstruction uses a Gaussian mixture joint density model for the data, conveniently implemented with a nonlinear continuous latent variable model (GTM). The modes of the conditional distribution of missing values given present values at each point in the sequence represent local candidate reconstructions. A global sequence reconstruction is obtained by efficiently optimising a constraint, such as continuity or smoothness, with dynamic programming. I derive two algorithms for exhaustive mode finding in Gaussian mixtures, based on gradient-quadratic search and fixed-point search, respectively, as well as estimates of error bars for each mode and a measure of distribution sparseness. I will demonstrate the method with synthetic data for a toy example and a robot arm inverse kinematics problem, and describe potential applications in speech, including the acoustic-to-articulatory mapping problem, audiovisual mappings for speech recognition and recognition of occluded speech.
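
The sequence-reconstruction step can be sketched as a shortest-path search over per-frame candidate reconstructions (the modes of the conditional distribution of missing values), with a continuity penalty linking consecutive frames. The code below is a schematic dynamic program over hypothetical candidates, not the implementation described in the talk.

    import numpy as np

    def reconstruct(candidates, local_cost, smooth_weight=1.0):
        """Pick one candidate per frame, minimizing local cost plus a continuity penalty.

        candidates : list over frames; candidates[t] is an array (K_t, d) of candidate
                     reconstructions (e.g., conditional modes of a GTM/Gaussian mixture).
        local_cost : list over frames; local_cost[t][k] is, e.g., the negative log density
                     of candidate k (lower is better).
        """
        T = len(candidates)
        best = [np.asarray(local_cost[0], dtype=float)]
        back = []
        for t in range(1, T):
            # Squared-distance continuity penalty between consecutive candidates.
            dist = ((candidates[t][:, None, :] - candidates[t - 1][None, :, :]) ** 2).sum(-1)
            total = best[t - 1][None, :] + smooth_weight * dist      # shape (K_t, K_{t-1})
            back.append(total.argmin(axis=1))
            best.append(np.asarray(local_cost[t], dtype=float) + total.min(axis=1))
        # Backtrack the optimal candidate sequence.
        path = [int(best[-1].argmin())]
        for t in range(T - 2, -1, -1):
            path.append(int(back[t][path[-1]]))
        path.reverse()
        return np.array([candidates[t][k] for t, k in enumerate(path)])

    # Tiny example with hypothetical 2-D candidates per frame.
    cands = [np.array([[0.0, 0.0], [5.0, 5.0]]),
             np.array([[0.1, 0.2], [5.1, 4.9]]),
             np.array([[0.2, 0.1]])]
    costs = [[0.0, 0.0], [0.0, 0.0], [0.0]]
    print(reconstruct(cands, costs))   # follows the smooth trajectory near the origin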

Speaker Biography

Miguel A. Carreira-Perpinan is a postdoctoral fellow at the Georgetown Institute for Computational and Cognitive Sciences, Georgetown University Medical Center. He has university degrees in computer science and in physics (Technical University of Madrid, Spain, 1991) and a PhD in computer science (University of Sheffield, UK, 2001). In 1993-94 he worked at the European Space Agency in Darmstadt, Germany, on real-time simulation of satellite thermal subsystems. His current research interests are statistical pattern recognition and computational neuroscience.

March 6, 2001 | 4

“Computational Anatomy: Computing Metrics on Anatomical Shapes”   Video Available

Mirza Faisal Beg, Department of Biomedical Engineering, Johns Hopkins University

[abstract] [biography]

Abstract

In this talk, I will present the problem of quantifying anatomical shape as represented in an image within the framework of the deformable template model. Briefly, the deformable template approach involves selecting a representative shape to be the reference or template, representing prior knowledge of the shape of the anatomical sub-structures in the anatomy to be characterized, and comparing anatomical shapes as represented in given images (also called the targets) to the image of the template. Comparison is done by computing extremely detailed, high-dimensional diffeomorphisms (smooth and invertible transformations) as a flow between the images that will deform the template image to match the target image. By minimizing a cost comprised of a term representing the energy of the velocity of the flow field and a term that represents the amount of mismatch between the images being compared, such a diffeomorphic transformation between the images is computed. The construction of diffeomorphisms between the images allows metrics to be calculated in comparing shapes represented in image data. Transformations "far" from identity represent larger deviations in shape from the template than those "close" to the identity transformation. The minimization procedure to compute the diffeomorphic transformations is implemented via a standard steepest-descent technique. I will show some preliminary results on image matching and the metrics computed on mitochondrial and hippocampal shapes by using this approach. Possible clinical applications of this work include the diagnosis of neuropsychiatric disorders such as Alzheimer's disease, schizophrenia, and epilepsy by quantifying shape changes in the hippocampus.
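
In symbols, the matching cost sketched above (flow-field energy plus image mismatch) can be written roughly as follows; this is a schematic rendering of the abstract's description in standard large-deformation notation, not a quotation of the talk:

    E(v) = \int_{0}^{1} \| v_t \|_{V}^{2} \, dt
           \;+\; \frac{1}{\sigma^{2}} \, \big\| I_{\mathrm{template}} \circ \phi_{1}^{-1} - I_{\mathrm{target}} \big\|^{2},

where the diffeomorphism \phi_t is generated by integrating the time-varying velocity field, d\phi_t/dt = v_t(\phi_t) with \phi_0 = \mathrm{id}, and the minimized energy of the first term induces the metric distance between the two shapes mentioned in the abstract.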

Speaker Biography

Mirza Faisal Beg, Ph.D. Candidate, Center for Imaging Science, Department of Biomedical Engineering, Johns Hopkins University

February 27, 2001 | 12:00 pm

“What's up with pronunciation variation? Why it's so hard to model and what to do about it”   Video Available

Dan Jurafsky, University of Colorado, Boulder Department of Linguistics, Department of Computer Science, Institute of Cognitive Science, & Center for Spoken Language Research

[abstract] [biography]

Abstract

Automatic recognition of human-to-machine speech has made fantastic progress in the last decades, and current systems achieve word error rates below 5% on many tasks. But recognition of human-to-human speech is much harder; error rates are often 30% or even higher. Many studies of human-to-human speech have shown that pronunciation variation is a key factor contributing to these high error rates. Previous models of pronunciation variation, however, have not had significant success in reducing error rates. In order to help understand why gains in pronunciation modeling have proven so elusive, we investigated which kinds of pronunciation variation are well captured by current triphone models, and which are not. By examining the change in behavior of a recognizer as it receives further triphone training, we show that many of the kinds of variation which previous pronunciation models attempt to capture, such as phone substitution or phone reduction due to neighboring phonetic contexts, are already well captured by triphones. Our analysis suggests rather that syllable deletion caused by non-phonetic factors is a major cause of difficulty for recognizers. We then investigated a number of such non-phonetic factors in a large database of phonetically hand-transcribed words from the Switchboard corpus. Using linear and logistic regression to control for phonetic context and rate of speech, we did indeed find very significant effects of non-phonetic factors. For example, words have extraordinarily long and full pronunciations when they occur near "disfluencies" (pauses, filled pauses, and repetitions), or initially or finally in turns or utterances, while words which have a high unigram, bigram, or reverse bigram (given the following word) probability have much more reduced pronunciations. These factors must be modeled with lexicons based on dynamic pronunciation probabilities; I describe our work-in-progress on building such a lexicon. This talk describes joint work with Wayne Ward, Alan Bell, Eric Fosler-Lussier, Dan Gildea, Cynthia Girand, Michelle Gregory, Keith Herold, Zhang Jianping, William D. Raymond, Zhang Sen, and Yu Xiuyang.
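
The regression setup can be pictured as predicting whether a word token is reduced from contextual predictors while controlling for factors such as rate of speech; the sketch below uses entirely synthetic, hypothetical features and scikit-learn's logistic regression purely as an illustration of that kind of analysis, not the study's actual data or models.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 500

    # Hypothetical per-token predictors (the real study uses hand-transcribed Switchboard tokens).
    rate = rng.normal(5.0, 1.0, n)            # local speaking rate (syllables per second)
    log_bigram = rng.normal(-6.0, 2.0, n)     # log P(word | previous word)
    near_disfluency = rng.integers(0, 2, n)   # 1 if adjacent to a pause, filled pause, or repetition
    X = np.column_stack([rate, log_bigram, near_disfluency])

    # Synthetic outcome: 1 = reduced pronunciation. The signs mirror the abstract's findings:
    # faster speech and more predictable words are more reduced; words near disfluencies are fuller.
    logit = 0.8 * (rate - 5.0) + 0.5 * (log_bigram + 6.0) - 1.2 * near_disfluency
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

    model = LogisticRegression().fit(X, y)
    print(dict(zip(["rate", "log_bigram", "near_disfluency"], model.coef_[0])))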

Speaker Biography

Dan Jurafsky is an assistant professor in the Linguistics and Computer Science departments, the Institute of Cognitive Science, and the Center for Spoken Language Research at the University of Colorado, Boulder. He was last at Hopkins for the JHU Summer 1997 Workshop managing the dialog act modeling group. Dan is the author with Jim Martin of the recent Prentice Hall textbook "Speech and Language Processing", and is teaching speech synthesis and recognition this semester at Boulder. Dan also plays the drums in mediocre pop bands and the corpse in local opera productions, and is currently working on his recipe for "Three Cups Chicken".

February 6, 2001 | 4:30PM

“Geometric Source Separation: Merging convolutive source separation with geometric beamforming”   Video Available

Lucas Parra, Sarnoff Corporation

[abstract] [biography]

Abstract

Blind source separation of broad band signals in a multi-path environment remains a difficult problem. Robustness has been limited due to frequency permutation ambiguities. In principle, increasing the number of sensors allows improved performance but also introduces additional degrees of freedom in the separating filters that are not fully determined by separation criteria. We propose here to further shape the filters and improve the robustness of blind separation by including geometric information such as sensor positions and localized source assumption. This allows us to combine blind source separation with notions from adaptive and geometric beamforming leading to a number of novel algorithms that could be termed collectively "geometric source separation".

Speaker Biography

Lucas C. Parra was born in Tucumán, Argentina. He received his Diploma in Physics in 1992 and his Doctorate in Physics in 1996 from the Ludwig-Maximilians-Universität, Munich, Germany. From 1992 to 1995 he worked at the Neural Networks Group at Siemens Central Research in Munich, Germany, and at the Machine Learning Department at Siemens Corporate Research (SCR) in Princeton, NJ. During 1995-1997 he was a member of the Imaging Department at SCR and worked on medical image processing and novel reconstruction algorithms for nuclear medicine. Since 1997 he has been with Sarnoff Corp. His current research concentrates on probabilistic models in various image and signal processing areas.

Back to Top

2000

December 5, 2000 | 4:30pm-6:00pm

“Semantic Information Processing of Spoken Language”   Video Available

Allen Gorin, AT&T Shannon Laboratories, Speech Research, Florham Park, New Jersey

[abstract]

Abstract

The next generation of voice-based user interface technology will enable easy-to-use automation of new and existing communication services. A critical issue is to move away from highly-structured menus to a more natural human-machine paradigm. In recent years, we have developed algorithms which learn to extract meaning from fluent speech via automatic acquisition and exploitation of salient words, phrases and grammar fragments from a corpus. These methods have been previously applied to the "How may I help you?" task for automated operator services, in English, Spanish and Japanese. In this paper, we report on a new application of these language acquisition methods to a more complex customer care task. We report on empirical comparisons which quantify the increased linguistic and semantic complexity over the previous domain. Experimental results on call-type classification will be reported for this new corpus of 30K utterances from live customer traffic. This traffic is drawn from both human/human and human/machine interactions.

December 5, 2000

“How May I Help You? -- understanding fluently spoken language”

Allen L. Gorin, AT&T Labs Research

[abstract] [biography]

Abstract

We are interested in providing automated services via natural spoken dialog systems. By natural, we mean that the machine understands and acts upon what people actually say, in contrast to what one would like them to say. There are many issues that arise when such systems are targeted for large populations of non-expert users. In this talk, we focus on the task of automatically routing telephone calls based on a user's fluently spoken response to the open-ended prompt of "How may I help you?" We first describe a database generated from 10,000 spoken transactions between customers and human agents. We then describe methods for automatically acquiring language models for both recognition and understanding from such data. Experimental results evaluating call-classification from speech are reported for that database. These methods have been embedded and further evaluated within a spoken dialog system, with subsequent processing for information retrieval and form-filling.
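
To make the call-routing task concrete, a toy classifier over recognized words might look like the naive Bayes-style sketch below; the call types, phrases, and smoothing are all hypothetical and are not meant to reproduce the AT&T system, which learns salient phrase fragments rather than independent words.

    from collections import defaultdict
    import math

    # Hypothetical training transcriptions per call type (the real corpus has ~10,000 transactions).
    training = {
        "BILLING": [["charge", "on", "my", "bill"], ["wrong", "charge"]],
        "COLLECT": [["make", "a", "collect", "call"], ["collect", "call", "please"]],
    }

    prior = {k: len(v) for k, v in training.items()}
    counts = {k: defaultdict(int) for k in training}
    for call_type, calls in training.items():
        for words in calls:
            for w in words:
                counts[call_type][w] += 1

    def classify(words):
        vocab = len({w for c in counts.values() for w in c})
        scores = {}
        for call_type in training:
            total = sum(counts[call_type].values())
            score = math.log(prior[call_type])
            for w in words:
                # Add-one smoothed unigram likelihood of each recognized word.
                score += math.log((counts[call_type].get(w, 0) + 1) / (total + vocab))
            scores[call_type] = score
        return max(scores, key=scores.get)

    print(classify(["i", "have", "a", "collect", "call"]))   # -> COLLECT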

Speaker Biography

Allen Gorin received the B.S. and M.A. degrees in Mathematics from SUNY at Stony Brook in 1975 and 1976 respectively, and then the Ph.D. in Mathematics from the CUNY Graduate Center in 1980. From 1980-83 he worked at Lockheed investigating algorithms for target recognition from time-varying imagery. In 1983 he joined AT&T Bell Labs in Whippany where he was the Principal Investigator for AT&T's ASPEN project within the DARPA Strategic Computing Program, investigating parallel architectures and algorithms for pattern recognition. In 1987, he was appointed a Distinguished Member of the Technical Staff. In 1988, he joined the Speech Research Department at Bell Labs in Murray Hill, and is now at AT&T Labs Research in Florham Park. His long-term research interest focuses on machine learning methods for spoken language understanding. He has served as a guest editor for the IEEE Transactions on Speech and Audio, and was a visiting researcher at the ATR Interpreting Telecommunications Research Laboratory in Japan during 1994. He is a member of the Acoustical Society of America and a Senior Member of the IEEE.

November 28, 2000 | 4:30pm-6:00pm

“What makes information accessible during or after narrative comprehension?”

Tom Trabasso, Department of Psychology, University of Chicago

[abstract]

Abstract

What information has been activated or is accessible to the reader or listener during comprehension has been a central question for psychologists who study discourse using experimental methods. However, experimenters frequently rely upon informal or intuitive analyses of their texts to make claims about findings. This talk shows that such claims are often misleading or wrong in that they admit of alternative explanations. To do this, the question of what makes information accessible during or after reading narratives is addressed by examining three cases: (1) decision making by Kahneman and Tversky (1982), (2) pronominal reference by Greene, Gerrig, McKoon, and Ratcliff (1994), and (3) map representations of space (Rinck & Bower, 1995). Using a three-pronged approach of (1) an explicit discourse analysis and (2) a connectionist model to predict (3) behavioral measures in these studies, explanatory (causal) reasoning is shown to provide a unifying account of how alternatives, references, or spatial locations in narrative texts are represented and accessed during or after comprehension. The importance of explicit discourse analysis is supported by the success of the three-pronged approach.

November 14, 2000 | 4:30pm-6:00pm

“The Open Mind Initiative: An internet based distributed framework for developing”   Video Available

David G. Stork, Ricoh Silicon Valley

[abstract] [biography]

Abstract

The Open Mind Initiative provides a framework for large-scale collaborative efforts in building components of "intelligent" systems that address common-sense reasoning, document and language understanding, speech and character recognition, and so on. These areas have highly developed and adequate theory; progress is held back by the lack of sufficiently large datasets of 'informal' knowledge, which can be provided by non-expert netizens. Based on the Open Source methodology, the Open Mind Initiative allows domain specialists to contribute algorithms, tool developers to provide software infrastructure and tools, and most importantly non-specialist netizens to contribute data to large knowledge bases via the internet. An important challenge is to make it easy and rewarding -- for instance by novel game interfaces, financial incentives, and educational interest -- for netizens to provide data. We review free software and open source approaches, including their business and economic models, and past software projects of particular relevance to Open Mind. The two approaches can be contrasted as follows.

Traditional Open Source: expert knowledge contributed by expert programmers; machine learning irrelevant; most work is directly on the released software (e.g., Linux device drivers); hacker and programmer culture (e.g., ~100,000 contributors to Linux); result: software.

Open Mind Initiative: informal knowledge contributed by non-expert netizens; machine learning essential; most work is on data acquisition, machine learning, and infrastructure (e.g., collecting labelled patterns); netizen and business culture (e.g., ~100,000,000 netizens on the web); result: software and databases.

We describe three Open Mind projects: handwriting recognition, speech recognition and commonsense reasoning, as well as some of the challenges and opportunities for computer-human interaction, particularly novel game interfaces, ensuring data integrity, and learning from heterogeneous contributors.

[1] "The Open Mind Initiative" by David G. Stork, IEEE Expert Systems and Their Applications, pp. 19-20, May/June 1999.
[2] "Character and Document Research in the Open Mind Initiative" by David G. Stork, International Conference on Document Analysis and Recognition (ICDAR99), 1999.
[3] "Open Mind Speech Recognition" by Jean-Marc Valin and David G. Stork, Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU), Keystone, CO, Dec. 1999.

Speaker Biography

Dr. David G. Stork is Chief Scientist of Ricoh Silicon Valley as well as Consulting Associate Professor of Electrical Engineering at Stanford University. A graduate of MIT and the University of Maryland, he has been on the faculties of Wellesley College, Swarthmore College, Clark University, Boston University and Stanford University. Dr. Stork sits on the editorial boards of five journals, has published numerous peer-reviewed papers and book chapters, and holds over a dozen patents. A distinguished lecturer of the Association for Computing Machinery, he has published five books, including most recently Pattern Classification (2nd ed.), co-authored with R. O. Duda and P. E. Hart, and HAL's Legacy: 2001's Computer as Dream and Reality for popular audiences, the basis of his forthcoming PBS television documentary 2001: HAL's Legacy. His deepest interests are in pattern recognition by machines and humans.

November 7, 2000 | 4:30pm-6:00pm

“Information Extraction: What has Worked, What hasn't, and What has Promise for the Future”   Video Available

Ralph Weischedel, BBN Technologies

[abstract] [biography]

Abstract

During the past 10 years, one of the dominant application areas for natural language processing research has been automatic extraction of information from text or speech to automatically update databases of names, descriptions, relations, and/or events. During those same 10 years, natural language processing technology has experienced a paradigm shift -- formerly dominated by handwritten rules but now influenced by learning approaches. This talk will review the underlying challenges, state benchmark results on standard test sets, point towards new directions, and suggest a vision for how technologies may be combined into useful systems. Our primary focus among the competing approaches surveyed will be a recent approach to automatic information extraction based on statistical algorithms that learn to extract information from text or speech. The goal is to replace the requirement for writing patterns manually by annotating examples of the information to be extracted. We have evaluated this approach on news data.

Speaker Biography

Ralph Weischedel is a Principal Scientist at BBN with twenty-five years of experience in written language processing, artificial intelligence, and knowledge representation. He leads a group of 10 full-time equivalents engaged in research, development, and application of natural language processing technology, including information extraction from text and probabilistic language understanding. He is a past president of the Association for Computational Linguistics. He joined BBN in 1984, leaving the University of Delaware as an Associate Professor. He received his Ph.D. in Computer & Information Sciences from the University of Pennsylvania in 1975.

November 6, 2000 | 4:30pm-6:00pm

“Non-Stationary Multi-Stream Processing Towards Robust and Adaptive Speech Recognition”   Video Available

Hervé Bourlard, Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP) and Swiss Federal Institute of Technology at Lausanne (EPFL), Switzerland

[abstract] [biography]

Abstract

Multi-stream automatic speech recognition (ASR) extends the standard hidden Markov model (HMM) based approach by assuming that the speech signal is processed by different (independent) "experts", each expert focusing on a different characteristic of the signal, and that the different stream likelihoods (or posteriors) are combined at some (temporal) stage to yield a global recognition output. The most successful approach developed so far consists in combining the stream likelihoods through integration over all possible stream combinations (i.e., over all possible values of a hidden variable representing the position of the most reliable streams). As a particular case of this approach, subband-based speech recognition will also be discussed. In this framework, we will introduce different mathematical models and discuss some interesting relationships with psycho-acoustic evidence. As a further extension to multi-stream ASR, we will also introduce a new approach, referred to as HMM2, where the HMM emission probabilities are estimated via state specific feature based HMMs responsible for merging the stream information and modeling their possible correlation. For each case, recognition results achieved on non-stationary noise will be presented, and possibilities of fast adaptation (of a limited number of parameters) will be illustrated through specific examples.
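
The "integration over all possible stream combinations" can be written, at the frame level and in my own simplified notation, as a mixture over subsets of streams:

    p(x_t \mid q) = \sum_{S \subseteq \{1, \dots, K\}} P(S) \; p\big(x_t^{S} \mid q, S\big),

where K is the number of streams (e.g., subbands), S ranges over the possible combinations of reliable streams (the hidden variable mentioned above), x_t^{S} is the part of the observation carried by those streams, and P(S) weights each combination. Placing all of the weight on the full set of streams recovers the standard single-stream HMM as a special case.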

Speaker Biography

Hervé Bourlard is Professor at the Swiss Federal Institute of Technology at Lausanne (EPFL, Switzerland) and Director of the Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP, Martigny, Switzerland, http://www.idiap.ch/), a semi-private research institute affiliated with EPFL and performing research in speech processing, vision, and machine learning. He is also External Fellow of the International Computer Science Institute (ICSI), Berkeley, CA. With nearly 20 years of experience in speech processing, statistical pattern recognition, applied mathematics, and artificial neural networks, Hervé Bourlard is the author/coauthor of over 140 reviewed papers (and book chapters) and two books. In 1996, he received the IEEE Signal Processing Society Award for the paper (co-authored with N. Morgan, from ICSI, Berkeley) entitled "Continuous Speech Recognition -- An Introduction to the Hybrid HMM/Connectionist Approach," published in the IEEE Signal Processing Magazine in May 1995. Hervé Bourlard is an IEEE Fellow "for contributions in the field of statistical speech recognition and neural networks". Hervé Bourlard is co-Editor-in-Chief of Speech Communication, member of the IEEE Technical Committee for Neural Network Signal Processing, member of the Administration Committee of EURASIP (European Association for Signal Processing), member of the Advisory Committee of ISCA (International Speech Communication Association), appointed expert for the European Community, and member of the Foundation Council of the Swiss Network for Innovation.

October 31, 2000 | 4:30pm-6:00pm

“Artificial Language Learning and Language Acquisition”   Video Available

Rebecca Gomez, Department of Psychology, JHU

[abstract]

Abstract

The rapidity with which children acquire language is one of the mysteries of human cognition. A widely held view is that children master language by means of a language-specific learning device. An earlier proposal, generating renewed interest, is that children make use of domain-general, associative learning mechanisms in acquiring language. However, we know little about the actual learning mechanisms involved, making it difficult to determine the relative contributions of innate and acquired knowledge. A recent approach to studying this problem exposes infants to artificial languages and assesses the resulting learning. Studies using this paradigm have led to a number of exciting discoveries regarding the learning mechanisms available during infancy. This approach is useful for achieving finer-grained characterizations of infant learning mechanisms than previously imagined. Such characterizations should result in a better understanding of the relative contributions of innate and learned factors to language acquisition, as well as the dynamic between the two. I will present several studies using this methodology and will discuss the implications for the study of language acquisition.

October 24, 2000 | 4:30pm-6:00pm

“A Phonologist's View of the Past Tense Controversy”