woogle

OpenGov Lab data as a source for training GPT-NL

The OpenGov Lab's WooZM (formerly WooGLe) dataset was used as a source for training GPT-NL, a large language model focused on the Dutch language. The dataset, which contains millions of documents and metadata from the Dutch government, provided a rich and diverse source of text for training.

Figure: Chart showing the contribution of Woogle data to GPT-NL training. There we are, in the bottom right corner!

WooGLe keeps growing: 8 million documents!

Every year around Christmas, we take stock of WooGLe's growth. Once again, the most recent numbers are remarkable: WooGLe has doubled in size and now indexes over 8 million documents across more than 90,000 dossiers from nearly 800 government bodies! Maarten Marx has written an extensive blog post about this on Wooverheid (in Dutch). Read the summary below!

Figure: WooGLe growth chart showing an exponential increase from 0.1M to 8M documents. WooGLe's document count has doubled each year since launch.