
Machine Translation

By Pushpak Bhattacharyya

© 2015 – Chapman and Hall/CRC

260 pages | 46 B/W Illus.

Purchasing Options ($ = USD):
Pack - Book and Ebook: 9781439897188 | pub: 2015-01-13 | $96.00
eBook (VitalSource): 9781439897201 | pub: 2015-01-02 | from $40.00



Description

Three paradigms have dominated machine translation (MT): rule-based machine translation (RBMT), statistical machine translation (SMT), and example-based machine translation (EBMT). These paradigms differ in how they handle the three fundamental processes in MT: analysis, transfer, and generation (ATG). In its pure form, RBMT uses rules, while SMT uses data. EBMT combines the two: data supplies translation parts, which rules then recombine to produce a translation.

Machine Translation compares and contrasts the salient principles and practices of RBMT, SMT, and EBMT. Offering an exposition of language phenomena followed by modeling and experimentation, the text:

  • Introduces MT against the backdrop of language divergence and the Vauquois triangle
  • Presents expectation maximization (EM)-based word alignment as a turning point in the history of MT
  • Discusses the most important element of SMT—bilingual word alignment from pairs of parallel translations
  • Explores the IBM models of MT, explaining how to find the best alignment given a translation pair and how to find the best translation given a new input sentence
  • Covers the mathematics of phrase-based SMT, phrase-based decoding, and the Moses SMT environment
  • Provides complete walk-throughs of the working of interlingua-based and transfer-based RBMT
  • Analyzes EBMT, showing how translation parts can be extracted and recombined to translate a new input, all automatically
  • Includes numerous examples that illustrate universal translation phenomena using specific languages

Machine Translation is designed for advanced undergraduate-level and graduate-level courses in machine translation and natural language processing. The book also makes a handy professional reference for computer engineers.

Reviews

"…a clear, well-written introduction to a key area in computer science."

—Ernest Davis, in Computing Reviews

Table of Contents

List of Figures

List of Tables

Preface

Acknowledgments

About the Author

Introduction

A Feel for a Modern Approach to Machine Translation: Data-Driven MT

MT Approaches: Vauquois Triangle

Understanding Transfer over the Vauquois Triangle

Understanding Ascending and Descending Transfer

Language Divergence with Illustration between Hindi and English

Syntactic Divergence

Lexical-Semantic Divergence

Three Major Paradigms of Machine Translation

MT Evaluation

Adequacy and Fluency

Automatic Evaluation of MT Output

Summary

Further Reading

Learning Bilingual Word Mappings

A Combinatorial Argument

Necessary and Sufficient Conditions for Deterministic Alignment in Case of One-to-One Word Mapping

A Naïve Estimate for Corpora Requirement

Deeper Look at One-to-One Alignment

Drawing Parallels with Part of Speech Tagging

Heuristics-Based Computation of the VE × VF Table

Iterative (EM-Based) Computation of the VE × VF Table

Initialization and Iteration 1 of EM

Iteration 2

Iteration 3

Mathematics of Alignment

A Few Illustrative Problems to Clarify Application of EM

Derivation of Alignment Probabilities

Expressing the E- and M-Steps in Count Form

Complexity Considerations

Storage

Time

EM: Study of Progress in Parameter Values

Necessity of at Least Two Sentences

One-Same-Rest-Changed Situation

One-Changed-Rest-Same Situation

Summary

Further Reading

IBM Model of Alignment

Factors Influencing P(f|e)

Alignment Factor a

Length Factor m

IBM Model 1

The Problem of Summation over Product in IBM Model 1

EM for Computing P(f|e)

Alignment in a New Input Sentence Pair

Translating a New Sentence in IBM Model 1: Decoding

IBM Model 2

EM for Computing P(f|e) in IBM Model 2

Justification for and Linguistic Viability of P(i|j,l,m)

IBM Model 3

Summary

Further Reading

Phrase-Based Machine Translation

Need for Phrase Alignment

Case of Promotional/Demotional Divergence

Case of Multiword (Includes Idioms)

Phrases Are Not Necessarily Linguistic Phrases

An Example to Illustrate Phrase Alignment Technique

Two-Way Alignments

Symmetrization

Expansion of Aligned Words to Phrases

Phrase Table

Mathematics of Phrase-Based SMT

Understanding Phrase-Based Translation through an Example

Deriving Translation Model and Calculating Translation and Distortion Probabilities

Giving Different Weights to Model Parameters

Fixing λ Values: Tuning

Decoding

Example to Illustrate Decoding

Moses

Installing Moses

Workflow for Building a Phrase-Based SMT System

Preprocessing for Moses

Training Language Model

Training Phrase Model

Tuning

Decoding Test Data

Evaluation Metric

More on Moses

Summary

Further Reading

Rule-Based Machine Translation (RBMT)

Two Kinds of RBMT: Interlingua and Transfer

What Exactly Is Interlingua?

Illustration of Different Levels of Transfer

Universal Networking Language (UNL)

Illustration of UNL

UNL Expressions as Binary Predicates

Why UNL?

Interlingua and Word Knowledge

How Universal Are UWs?

UWs and Multilinguality

UWs and Multiwords

UW Dictionary and Wordnet

Comparing and Contrasting UW Dictionary and Wordnet

Translation Using Interlingua

Illustration of Analysis and Generation

Details of English-to-UNL Conversion: With Illustration

Illustrated UNL Generation

UNL-to-Hindi Conversion: With Illustration

Function Word Insertion

Case Identification and Morphology Generation

Representative Rules for Function Words Insertion

Syntax Planning

Transfer-Based MT

What Exactly Are Transfer Rules?

Case Study of Marathi-Hindi Transfer-Based MT

Krudant: The Crux of the Matter in M-H MT

M-H MT System

Summary

Further Reading

Example-Based Machine Translation

Illustration of Essential Steps of EBMT

Deeper Look at EBMT’s Working

Word Matching

Matching of Have

EBMT and Case-Based Reasoning

Text Similarity Computation

Word Based Similarity

Tree and Graph Based Similarity

CBR’s Similarity Computation Adapted to EBMT

Recombination: Adaptation on Retrieved Examples

Based on Sentence Parts

Based on Properties of Sentence Parts

Recombination Using Parts of Semantic Graph

EBMT and Translation Memory

EBMT and SMT

Summary

Further Reading

Index

About the Author

Pushpak Bhattacharyya is the Vijay and Sita Vashee chair professor of computer science and engineering at the Indian Institute of Technology (IIT) Bombay, where he has been teaching and researching for the last 25 years. He was educated at IIT Kharagpur (B.Tech), IIT Kanpur (M.Tech), and IIT Bombay (Ph.D.). While earning his Ph.D., he was a visiting scholar at the Massachusetts Institute of Technology. Subsequently, he has been a visiting professor at Stanford University and the University of Grenoble, and a distinguished lecturer at the University of Houston. Dr. Bhattacharyya’s research interests include natural language processing, machine learning, machine translation, information extraction, sentiment analysis, and cross-lingual search, areas in which he has published extensively. Currently, he is an associate editor of ACM Transactions on Asian Language Information Processing and vice president-elect of the Association for Computational Linguistics (ACL).

Subject Categories

BISAC Subject Codes/Headings:
BUS061000 | BUSINESS & ECONOMICS / Statistics
COM037000 | COMPUTERS / Machine Theory
MAT000000 | MATHEMATICS / General
MAT004000 | MATHEMATICS / Arithmetic
TEC037000 | TECHNOLOGY & ENGINEERING / Robotics