Text Processing in Java
$363.82

The Story

This book teaches you how to master the subtle art of multilingual text processing and prevent text data corruption. It provides an introduction to natural language processing using Lucene and Solr. It gives you tools and techniques to manage large collections of text data, whether they come from news feeds, databases, or legacy documents. Each chapter contains executable programs that can also be used for text data forensics.

Topics covered:
• Unicode code points
• Character encodings from ASCII and Big5 to UTF-8 and UTF-32LE
• Character normalization using International Components for Unicode (ICU)
• Java I/O, including working directly with zip, gzip, and tar files
• Regular expressions in Java
• Transporting text data via HTTP
• Parsing and generating XML, HTML, and JSON
• Using Lucene 4 for natural language search and text classification
• Search, spelling correction, and clustering with Solr 4

Other books on text processing presuppose much of the material covered in this book. They gloss over the details of transforming text from one format to another and assume perfect input data. The messy reality of raw text will have you reaching for this book again and again.
ASIN: 0988208725
VSKU: BVV.0988208725.G
Condition: Good
Author/Artist: Morris, Mitzi
Binding: Paperback
Note: Any images shown are stock photographs and product may differ from what is shown.
Condition Notes: The item shows wear from consistent use, but it remains in good condition and works perfectly. All pages and the cover are intact, including the dust cover, if applicable. Spine may show signs of wear. Pages may include limited notes and highlighting. May NOT include discs, access codes, or other supplemental materials.
