ACM

Zyphra debuts Zyda, a 1.3T-token language modeling dataset it claims outperforms the Pile, C4, and arXiv

Zyphra’s Zyda is a 1.3-trillion-token open dataset that combines RefinedWeb, StarCoder, C4, the Pile, SlimPajama, peS2o, and arXiv to help train large language models.
