
1st Edition

Machine Translation

By Pushpak Bhattacharyya Copyright 2015

264 Pages 46 B/W Illustrations

by Chapman & Hall

260 Pages

Also available as an eBook on Taylor & Francis eBooks (Institutional Purchase)

Description

Three paradigms have dominated machine translation (MT): rule-based machine translation (RBMT), statistical machine translation (SMT), and example-based machine translation (EBMT). These paradigms differ in the way they handle the three fundamental processes in MT: analysis, transfer, and generation (ATG). In its pure form, RBMT uses rules, while SMT uses data. EBMT tries a combination: data...

Table of Contents

List of Figures

List of Tables

Preface

Acknowledgments

About the Author

Introduction

A Feel for a Modern Approach to Machine Translation: Data-Driven MT

MT Approaches: Vauquois Triangle

Understanding Transfer over the Vauquois Triangle

Understanding Ascending and Descending Transfer

Language Divergence with Illustration between Hindi and English

Syntactic Divergence

Lexical-Semantic Divergence

Three Major Paradigms of Machine Translation

MT Evaluation

Adequacy and Fluency

Automatic Evaluation of MT Output

Summary

Further Reading

Learning Bilingual Word Mappings

A Combinatorial Argument

Necessary and Sufficient Conditions for Deterministic Alignment in Case of One-to-One Word Mapping

A Naïve Estimate for Corpora Requirement

Deeper Look at One-to-One Alignment

Drawing Parallels with Part of Speech Tagging

Heuristics-Based Computation of the V_E × V_F Table

Iterative (EM-Based) Computation of the V_E × V_F Table

Initialization and Iteration 1 of EM

Iteration 2

Iteration 3

Mathematics of Alignment

A Few Illustrative Problems to Clarify Application of EM

Derivation of Alignment Probabilities

Expressing the E- and M-Steps in Count Form

Complexity Considerations

Storage

Time

EM: Study of Progress in Parameter Values

Necessity of at Least Two Sentences

One-Same-Rest-Changed Situation

One-Changed-Rest-Same Situation

Summary

Further Reading

IBM Model of Alignment

Factors Influencing P(f|e)

Alignment Factor a

Length Factor m

IBM Model 1

The Problem of Summation over Product in IBM Model 1

EM for Computing P(f|e)

Alignment in a New Input Sentence Pair

Translating a New Sentence in IBM Model 1: Decoding

IBM Model 2

EM for Computing P(f|e) in IBM Model 2

Justification for and Linguistic Viability of P(i|j,l,m)

IBM Model 3

Summary

Further Reading

Phrase-Based Machine Translation

Need for Phrase Alignment

Case of Promotional/Demotional Divergence

Case of Multiword (Includes Idioms)

Phrases Are Not Necessarily Linguistic Phrases

An Example to Illustrate Phrase Alignment Technique

Two-Way Alignments

Symmetrization

Expansion of Aligned Words to Phrases

Phrase Table

Mathematics of Phrase-Based SMT

Understanding Phrase-Based Translation through an Example

Deriving Translation Model and Calculating Translation and Distortion Probabilities

Giving Different Weights to Model Parameters

Fixing λ Values: Tuning

Decoding

Example to Illustrate Decoding

Moses

Installing Moses

Workflow for Building a Phrase-Based SMT System

Preprocessing for Moses

Training Language Model

Training Phrase Model

Tuning

Decoding Test Data

Evaluation Metric

More on Moses

Summary

Further Reading

Rule-Based Machine Translation (RBMT)

Two Kinds of RBMT: Interlingua and Transfer

What Exactly Is Interlingua?

Illustration of Different Levels of Transfer

Universal Networking Language (UNL)

Illustration of UNL

UNL Expressions as Binary Predicates

Why UNL?

Interlingua and Word Knowledge

How Universal Are UWs?

UWs and Multilinguality

UWs and Multiwords

UW Dictionary and Wordnet

Comparing and Contrasting UW Dictionary and Wordnet

Translation Using Interlingua

Illustration of Analysis and Generation

Details of English-to-UNL Conversion: With Illustration

Illustrated UNL Generation

UNL-to-Hindi Conversion: With Illustration

Function Word Insertion

Case Identification and Morphology Generation

Representative Rules for Function Words Insertion

Syntax Planning

Transfer-Based MT

What Exactly Are Transfer Rules?

Case Study of Marathi-Hindi Transfer-Based MT

Krudant: The Crux of the Matter in M-H MT

M-H MT System

Summary

Further Reading

Example-Based Machine Translation

Illustration of Essential Steps of EBMT

Deeper Look at EBMT’s Working

Word Matching

Matching of Have

EBMT and Case-Based Reasoning

Text Similarity Computation

Word Based Similarity

Tree and Graph Based Similarity

CBR’s Similarity Computation Adapted to EBMT

Recombination: Adaptation on Retrieved Examples

Based on Sentence Parts

Based on Properties of Sentence Parts

Recombination Using Parts of Semantic Graph

EBMT and Translation Memory

EBMT and SMT

Summary

Further Reading

Index

Author(s)

Biography

Pushpak Bhattacharyya is the Vijay and Sita Vashee Chair Professor of computer science and engineering at the Indian Institute of Technology (IIT) Bombay, where he has been teaching and researching for the last 25 years. He was educated at IIT Kharagpur (B.Tech.), IIT Kanpur (M.Tech.), and IIT Bombay (Ph.D.). While earning his Ph.D., he was a visiting scholar at the Massachusetts Institute of Technology. Subsequently, he has been a visiting professor at Stanford University and the University of Grenoble, and a distinguished lecturer at the University of Houston. Dr. Bhattacharyya’s research interests include natural language processing, machine learning, machine translation, information extraction, sentiment analysis, and cross-lingual search, areas in which he has published extensively. Currently, he is an associate editor of ACM Transactions on Asian Language Information Processing and vice president-elect of the Association for Computational Linguistics (ACL).

Critics' Reviews

"…a clear, well-written introduction to a key area in computer science."
—Ernest Davis, in Computing Reviews
