
PyText builds on PyTorch for language recognition

A Facebook project for natural language processing is now open source, and it promises better ways to mine texts for meaning

Facebook has open-sourced its PyText project, a machine learning library for natural language processing (NLP) intended to make it easier to put together both experimental projects and production systems.

PyText, built with Facebook’s existing PyTorch library for machine learning and used internally by the company, was created to address a tension in machine learning with neural networks, such as for NLP: existing libraries typically imposed “a trade-off between frameworks optimized for experimentation and those optimized for production,” Facebook’s engineers said in a blog post.

Frameworks built for experimentation allowed fast prototyping, but suffered from “increased latency and memory use in production,” Facebook’s engineers wrote. On the other hand, frameworks built for production worked better under load, but were tougher to develop quickly with.

PyText’s main touted difference is its workflow, which Facebook claims can be optimized for either experiments or production use. The framework’s components can be stitched together to create an entire NLP pipeline, or individual pieces can be broken out and reused in other contexts.
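PyText’s own component classes aren’t shown here, but the idea of stitching a pipeline from interchangeable pieces can be sketched in plain PyTorch. The class and layer names below are illustrative stand-ins, not PyText’s actual API.

```python
import torch
import torch.nn as nn

# Illustrative only: the kind of modular pieces (embedding, sequence encoder,
# decoder) an NLP pipeline stitches together. Each piece can be swapped or
# reused on its own, which is the property PyText emphasizes.
class TinyTextClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)              # token id -> vector
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)   # sequence representation
        self.decoder = nn.Linear(hidden_dim, num_classes)                 # representation -> label scores

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)
        _, (hidden, _) = self.encoder(embedded)
        return self.decoder(hidden[-1])

model = TinyTextClassifier()
dummy_batch = torch.randint(0, 10000, (4, 12))   # 4 sentences of 12 token ids each
print(model(dummy_batch).shape)                  # torch.Size([4, 2])
```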

Training new models can be distributed across multiple nodes, and multiple models can be trained at the same time. PyText can also use many existing models for text classification, skipping the need for training entirely in those cases.
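PyText delegates the distribution itself to PyTorch. A minimal sketch of that underlying mechanism, not PyText’s own trainer API, might look like the following; it assumes the script is launched once per process (for example via torch.distributed.launch), which sets the rank and address environment variables.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

# Sketch of the PyTorch distributed mechanism PyText builds on, not PyText code.
# Assumes RANK, WORLD_SIZE and MASTER_ADDR are set by the launcher.
dist.init_process_group(backend="gloo", init_method="env://")

model = torch.nn.Linear(64, 2)            # stand-in for a real NLP model
model = DistributedDataParallel(model)    # gradients synchronized across processes

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
inputs, labels = torch.randn(8, 64), torch.randint(0, 2, (8,))

loss = torch.nn.functional.cross_entropy(model(inputs), labels)
loss.backward()                           # gradient all-reduce happens here
optimizer.step()
```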

PyText also improves comprehension via contextual models, a way to enrich the model’s understanding of a text from previous inputs. A chatbot, for example, could reuse information from earlier messages in a discussion to shape its answers.
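As a rough, hypothetical sketch of the idea (not PyText’s contextual-model API), one simple way to carry conversational context forward is to feed earlier turns to the model alongside the current message:

```python
# Illustrative only: earlier turns are passed along with the new message so the
# model can resolve references against them. The model/tokenizer are hypothetical.
conversation = [
    "what time do you open tomorrow?",
    "we open at 9am",
]
new_message = "and on sunday?"

model_input = " <turn> ".join(conversation + [new_message])
# prediction = model(tokenize(model_input))   # "sunday" now resolves against opening hours
```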

One feature in PyText shows how machine learning systems driven by Python find ways to avoid the performance issues that can crop up with the language. PyText models can be exported to the ONNX interchange format for fast inference with Caffe2. This way, the inference path isn’t limited by Python’s runtime, but Python is still used to assemble the pipeline and orchestrate model training.
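The export step itself relies on PyTorch’s standard ONNX support. A hedged sketch, using a trivial stand-in model rather than PyText’s own export tooling:

```python
import torch

# A trivial stand-in model: e.g. a 300-dim sentence embedding -> two classes.
model = torch.nn.Linear(300, 2)
model.eval()

# An example input fixes the shape of the exported graph.
dummy_input = torch.randn(1, 300)
torch.onnx.export(model, dummy_input, "classifier.onnx")

# classifier.onnx can then be served by a non-Python runtime (Caffe2, in PyText's
# case), so inference is no longer bound to the Python interpreter.
```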

PyTorch itself was recently given a formal Version 1.0 release, with its own share of features intended to speed training and inference without being limited by Python. One of them, Torch Script, just-in-time-compiles Python code to speed its execution, but it can work only with a subset of the language.
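For illustration, a small function compiled with Torch Script’s scripting path might look like this; it sticks to tensor operations and simple control flow because only a subset of Python is supported.

```python
import torch

# The decorator compiles this function to TorchScript ahead of execution,
# so calls run through the compiled graph rather than the Python interpreter.
@torch.jit.script
def double_until(x: torch.Tensor, limit: float) -> torch.Tensor:
    while float(x.sum()) < limit:
        x = x * 2.0
    return x

print(double_until(torch.ones(3), 10.0))
```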

Near-term plans for PyText include “supporting multilingual modeling and other modeling capabilities, making models easier to debug, and adding further optimizations for distributed training,” Facebook’s engineers say.