Шрифт:
Интервал:
Закладка:
2625
Zhang S., Roller S., Goyal N., Artetxe M., Chen M., Chen S., Dewan C., Diab M., Li X., Lin X. V., Mihaylov T., Ott M., Shleifer S., Shuster K., Simig D., Koura P. S., Sridhar A., Wang T., Zettlemoyer L. (2022). OPT: Open Pre-trained Transformer Language Models // https://arxiv.org/abs/2205.01068
2626
Zhang S., Diab M., Zettlemoyer L. (2022). Democratizing access to large-scale language models with OPT-175B / Meta AI, May 3, 2022 // https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/
2627
Taylor R., Kardas M., Cucurull G., Scialom T., Hartshorn A., Saravia E., Poulton A., Kerkez V., Stojnic R. (2022). Galactica: A Large Language Model for Science // https://arxiv.org/abs/2211.09085
2628
AI21 Labs Makes Language AI Applications Accessible to Broader Audience (2021) / businesswire: a Berkshire Hathaway Company, August 11, 2021 // https://www.businesswire.com/news/home/20210811005033/en/AI21-Labs-Makes-Language-AI-Applications-Accessible-to-Broader-Audience
2629
Rae J., Irving G., Weidinger L. (2021). Language modelling at scale: Gopher, ethical considerations, and retrieval / DeepMind blog, 08 Dec 2021 // https://deepmind.com/blog/article/language-modelling-at-scale
2630
Rae J. W., Borgeaud S., Cai T., Millican K., Hoffmann J., Song F., Aslanides J., Henderson S., Ring R., Young S., Rutherford E., Hennigan T., Menick J., Cassirer A., Powell R., Driessche G. v. d., Hendricks L. A., Rauh M., Huang P., Glaese A., Welbl J., Dathathri S., Huang S., Uesato J., Mellor J., Higgins I., Creswell A., McAleese N., Wu A., Elsen E., Jayakumar S., Buchatskaya E., Budden D., Sutherland E., Simonyan K., Paganini M., Sifre L., Martens L., Li X. L., Kuncoro A., Nematzadeh A., Gribovskaya E., Donato D., Lazaridou A., Mensch A., Lespiau J., Tsimpoukelli M., Grigorev N., Fritz D., Sottiaux T., Pajarskas M., Pohlen T., Gong Z., Toyama D., d'Autume C. d. M., Li Y., Terzi T., Mikulik V., Babuschkin I., Clark A., Casas D. d. L., Guy A., Jones C., Bradbury J., Johnson M., Hechtman B., Weidinger L., Gabriel I., Isaac W., Lockhart E., Osindero S., Rimell L., Dyer C., Vinyals O., Ayoub K., Stanway J., Bennett L., Hassabis D., Kavukcuoglu K., Irving G. (2021). Scaling Language Models: Methods, Analysis & Insights from Training Gopher // https://arxiv.org/abs/2112.11446
2631
Kaplan J., McCandlish S., Henighan T., Brown T. B., Chess B., Child R., Gray S., Radford A., Wu J., Amodei D. (2020). Scaling Laws for Neural Language Models // https://arxiv.org/abs/2001.08361
2632
Hoffmann J., Borgeaud S., Mensch A., Sifre L. (2022). An empirical analysis of compute-optimal large language model training / DeepMind blog, April 12, 2022 // https://www.deepmind.com/publications/an-empirical-analysis-of-compute-optimal-large-language-model-training
2633
Hoffmann J., Borgeaud S., Mensch A., Buchatskaya E., Cai T., Rutherford E., de Las Casas D., Hendricks L. A., Welbl J., Clark A., Hennigan T., Noland E., Millican K., van den Driessche G., Damoc B., Guy A., Osindero S., Simonyan K., Elsen E., Rae J. W., Vinyals O., Sifre L. (2022). Training Compute-Optimal Large Language Models // https://arxiv.org/abs/2203.15556
2634
Pichai S. (2023). Google DeepMind: Bringing together two world-class AI teams. / Google Blog, Apr 20, 2023 // https://blog.google/technology/ai/april-ai-update/
2635
Chowdhery A., Narang S., Devlin J., Bosma M., Mishra G., Roberts A., Barham P., Chung H. W., Sutton C., Gehrmann S., Schuh P., Shi K., Tsvyashchenko S., Maynez J., Rao A., Barnes P., Tay Y., Shazeer N., Prabhakaran V., Reif E., Du