Google's Universal Pretraining Framework Unifies Language Learning Paradigms

Synced
AI Technology & Industry Review
Generalization is one of the primary goals in contemporary machine learning research and is regarded as a pathway to artificial general intelligence. Although today’s pretrained large language models (LMs) continue to push the state-of-the-art in natural language processing (NLP), most such models target specific problem classes and suffer significant performance drops when applied to new tasks. Is it possible to pretrain language models that will work well across many diverse tasks?
A Google Research/Brain team addresses this question in the new paper Unifying Language Learning Paradigms, proposing UL2, a framework for pretraining universal language models that are effective across many different tasks. Their 20B parameter model surpasses the state-of-the-art 175B GPT-3 on the zero-shot SuperGLUE benchmark and triples the performance of T5-XXL on one-shot summarization tasks.
The UL2 framework aims to build a universally applicable language model that remains consistently effective across diverse datasets, tasks, and setups. UL2 is driven by Mixture-of-Denoisers (MoD), a novel pretraining objective that integrates diverse pretraining paradigms so that a single model can maintain strong performance across different tasks.
MoD employs three main paradigms during pretraining: R-Denoiser, a standard denoiser geared toward acquiring knowledge rather than learning to generate fluent text; S-Denoiser, designed for denoising cases where a strict sequential order can be observed when framing input-to-target tasks; and X-Denoiser, which is used when the model must recover a large part of the input from only a small-to-moderate part of it. A novel mode-switching feature enables dynamic switching via discrete prompting, so that the model can shift between the R, S and X denoisers on demand when learning downstream tasks.
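The mixture can be sketched as follows. This is an illustrative simplification, not the paper's implementation: the `[R]`/`[S]`/`[X]` mode tokens, the T5-style `<extra_id_N>` sentinels, and the span-length/corruption-rate values are assumptions chosen here for readability, not UL2's exact hyperparameters.

```python
import random

SENTINEL = "<extra_id_{}>"  # T5-style sentinel token template (assumed format)

def corrupt_spans(tokens, span_len, corrupt_rate, rng):
    """Mask contiguous spans; return (inputs, targets) in seq-to-seq denoising style."""
    n = len(tokens)
    n_corrupt = max(1, int(n * corrupt_rate))
    n_spans = max(1, n_corrupt // span_len)
    starts = sorted(rng.sample(range(n - span_len), n_spans))
    inputs, targets = [], []
    prev_end, sid = 0, 0
    for s in starts:
        if s < prev_end:          # skip spans that would overlap the previous one
            continue
        inputs += tokens[prev_end:s] + [SENTINEL.format(sid)]
        targets += [SENTINEL.format(sid)] + tokens[s:s + span_len]
        prev_end = s + span_len
        sid += 1
    inputs += tokens[prev_end:]
    return inputs, targets

def s_denoise(tokens, rng):
    """Sequential (prefix-LM style) denoising: condition on a prefix, predict the suffix."""
    split = rng.randint(1, len(tokens) - 1)
    return tokens[:split], tokens[split:]

def mod_example(tokens, rng):
    """Sample one denoiser and prepend its mode token, mimicking mode switching."""
    mode = rng.choice(["[R]", "[S]", "[X]"])
    if mode == "[R]":    # regular: short spans, low corruption rate
        inp, tgt = corrupt_spans(tokens, span_len=3, corrupt_rate=0.15, rng=rng)
    elif mode == "[X]":  # extreme: long spans / aggressive corruption
        inp, tgt = corrupt_spans(tokens, span_len=12, corrupt_rate=0.5, rng=rng)
    else:                # sequential: strict left-to-right prefix-LM framing
        inp, tgt = s_denoise(tokens, rng)
    return [mode] + inp, tgt
```

At inference on a downstream task, the same discrete mode token would be prepended to the input to invoke whichever behavior suits the task.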
In their empirical study, the team conducted extensive experiments on diverse tasks, ranging from supervised learning to prompt-based in-context few-shot learning. In the evaluations, the proposed UL2 outperformed a T5 baseline by 43.6 percent and a GPT-like baseline by 76.1 percent. The team also scaled UL2 to 20B parameters and ran the model on 50+ NLP tasks, where it achieved state-of-the-art performance on the vast majority of tasks and setups. In zero/few-shot experiments, UL2 surpassed the 175B-parameter GPT-3 on the zero-shot SuperGLUE benchmark.
Flax-based T5X model checkpoints for the 20B UL2 are available on the project’s GitHub. The paper Unifying Language Learning Paradigms is on arXiv.
Author: Hecate He | Editor: Michael Sarazen
We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.
Machine Intelligence | Technology & Industry | Information & Analysis