At its annual I/O conference, Google unveiled PaLM 2, the successor to its PaLM large language model for understanding and generating multilingual text. Google claims that it’s a significant improvement over its predecessor and that it even bests OpenAI’s GPT-4, depending on the task at hand.
But it’s far from a panacea.
Absent some hands-on time with PaLM 2, we only have the accompanying Google-authored research paper to go by. But despite some opaqueness about PaLM 2’s technical specs, the paper is forthcoming about many of the model’s major limitations.
On the subject of opaqueness, the 91-page paper, published today, doesn’t reveal exactly which data was used to train PaLM 2, save that it was a collection of web documents, books, code, mathematics, and conversational data “significantly larger” than that used to train PaLM v1. The coauthors of the paper