Шрифт:

Интервал:

Закладка:

Сделать

1 ... 438 439 440 441 442 443 444 445 446 ... 482

Перейти на страницу:

Chemistry, 2020-01-10. // https://doi.org/10.3389/fchem.2019.00895

2152

Le Q. V., Mikolov T. (2014). Distributed Representations of Sentences and Documents // https://arxiv.org/abs/1405.4053

2153

Kalchbrenner N., Blunsom P. (2014). Recurrent Continuous Translation Models / Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1700—1709 // https://www.aclweb.org/anthology/D13-1176/

2154

Sutskever I., Vinyals O., Le Q. V. (2014). Sequence to Sequence Learning with Neural Networks / Proceedings of the 27th International Conference on Neural Information Processing Systems, Vol. 2, pp. 3104–3112 // https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf

2155

Bahdanau D., Cho K., Bengio Y. (2015). Neural Machine Translation by Jointly Learning to Align and Translate / International Conference on Learning Representations (ICLR-2015) // https://arxiv.org/abs/1409.0473

2156

«В Минске пытался прибиться хоть куда-нибудь». Дима Богданов изобрёл механизм attention и работает с лауреатом премии Тьюринга. Говорим про ML и Монреаль (2019). / Dev.BY, 3 апреля 2019 // https://devby.io/news/dmitry-bogdanov

2157

Mnih V., Heess N., Graves A., Kavukcuoglu K. (2014). Recurrent Models of Visual Attention / Proceedings of the 27th International Conference on Neural Information Processing Systems, Vol. 2, pp. 2204–2212 // https://papers.nips.cc/paper/5542-recurrent-models-of-visual-attention.pdf

2158

Ba J. L., Mnih V., Kavukcuoglu K. (2015). Multiple object recognition with visual attention / International Conference on Learning Representations (ICLR-2015) // https://arxiv.org/abs/1412.7755

2159

Vinyals V., Toshev A., Bengio S., Erhan D. (2015). Show and Tell: A Neural Image Caption Generator / 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) // https://doi.org/10.1109/CVPR.2015.7298935

2160

Xu K., Ba J. L., Kiros R., Cho K., Courville A., Salakhutdinov R., Zemel R. S., Bengio Y. (2015). Show, Attend and Tell: Neural Image Caption Generation with Visual Attention / Proceedings of the 32nd International Conference on International Conference on Machine Learning, Vol. 37, pp. 2048—2057 // http://proceedings.mlr.press/v37/xuc15.pdf

2161

Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A. N., Kaiser L., Polosukhin I. (2017). Attention Is All You Need / Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017) // https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

2162

Schmidhuber J. (1991). Learning to control fast-weight memories: An alternative to recurrent nets. Technical Report FKI147-91, Institut für Informatik, Technische Universität München, March 1991 // https://people.idsia.ch/~juergen/FKI-147-91ocr.pdf

2163

Schmidhuber J. (1992). Learning to control fast-weight memories: An alternative to dynamic recurrent networks / Neural Computation, Vol. 4, Iss. 1, pp. 131–139 // https://doi.org/10.1162/neco.1992.4.1.131

2164

Schmidhuber J. (1993). Reducing the ratio between learning complexity and number of time varying variables in fully recurrent nets. / International Conference on Artificial Neural Networks (ICANN), pp. 460–463 // https://doi.org/10.1007/978-1-4471-2063-6_110

2165

Schlag I., Irie K., Schmidhuber J. (2021). Linear Transformers Are Secretly Fast Weight Programmers // https://arxiv.org/abs/2102.11174

2166

Devlin J., Chang M.-W., Lee K., Toutanova K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding // https://arxiv.org/abs/1810.04805

2167

Shaw P., Uszkoreit J., Vaswani A. (2018). Self-Attention with Relative Position Representations // https://arxiv.org/abs/1803.02155

2168

Huang C.-Z. A., Vaswani A., Uszkoreit J., Shazeer N., Simon I., Hawthorne C., Dai A. M., Hoffman M. D., Dinculescu M., Eck D. (2018). Music Transformer // https://arxiv.org/abs/1809.04281

2169

Su J., Lu Y., Pan S., Murtadha A., Wen B., Liu Y. (2021). RoFormer: Enhanced Transformer with Rotary Position Embedding // https://arxiv.org/abs/2104.09864

2170

Sun Y., Dong L., Patra B., Ma S., Huang S., Benhaim A., Chaudhary V., Song X., Wei F. (2022). A Length-Extrapolatable Transformer // https://arxiv.org/abs/2212.10554

2171

Press O., Smith N. A., Lewis M. (2021). Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation // https://arxiv.org/abs/2108.12409

2172

Kazemnejad A., Padhi I., Ramamurthy K. N., Das P., Reddy S. (2023). The Impact of Positional Encoding on Length Generalization in Transformers // https://arxiv.org/abs/2305.19466

2173

Lan Z., Chen M., Goodman S., Gimpel K., Sharma P., Soricut R. (2019). ALBERT: A Lite BERT for Self-supervised Learning of Language Representations // https://arxiv.org/abs/1909.11942

2174

Liu Y., Ott M., Goyal N., Du J., Joshi M., Chen D., Levy O., Lewis M., Zettlemoyer L., Stoyanov V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach // https://arxiv.org/abs/1907.11692

2175

McCann B., Bradbury J., Xiong C., Socher R. (2017). Learned in Translation: Contextualized Word Vectors // https://arxiv.org/abs/1708.00107

2176

Peters M. E., Neumann M., Iyyer M., Gardner M., Clark C., Lee K., Zettlemoyer L. (2018). Deep contextualized word representations // https://arxiv.org/abs/1802.05365

2177

Howard J., Ruder S. (2018). Universal Language Model Fine-tuning for Text Classification // https://arxiv.org/abs/1801.06146

2178

Radford A., Narasimhan K., Salimans T., Sutskever I. (2018). Improving Language Understanding by Generative Pre-Training // https://paperswithcode.com/paper/improving-language-understanding-by

2179

Radford A., Wu J., Child R., Luan D., Amodei D., Sutskever I. (2019). Language Models are Unsupervised Multitask Learners // https://paperswithcode.com/paper/language-models-are-unsupervised-multitask

2180

Brown T. B., Mann B., Ryder N., Subbiah M., Kaplan J., Dhariwal P., Neelakantan A., Shyam P., Sastry G., Askell A., Agarwal S., Herbert-Voss A., Krueger G., Henighan T., Child R., Ramesh A., Ziegler D. M., Wu J., Winter C., Hesse C., Chen M., Sigler E., Litwin M., Gray S., Chess B., Clark J., Berner C., McCandlish S., Radford A., Sutskever I., Amodei D. (2020). Language Models are Few-Shot Learners // https://arxiv.org/abs/2005.14165

2181

Raffel C., Shazeer N., Roberts A., Lee K., Narang S., Matena M., Zhou Y., Li

1 ... 438 439 440 441 442 443 444 445 446 ... 482

Перейти на страницу:

Тайна древнего бальзама мумие-асиль - Адыль Шарипович Шакиров

2021
Медицина

ООО "Кремль". Трест, который лопнет - Андрей Колесников

2021
Политика

Психология согласия. Революционная методика пре-убеждения - Роберт Бено Чалдини

2021
Разная литература / Бизнес / Психология

Фантастика 2024-84 - Константин Давидович Мзареулов

2021
Научная фантастика / Разная литература

Биология для тех, кто хочет понять и простить самку богомола - Андрей Шляхов

2021
Домашняя

Комментарии

Минимальная длина комментария - 20 знаков. Уважайте себя и других!

Комментариев еще нет. Хотите быть первым?

Смотрите также:

Тайна древнего бальзама мумие-асиль - Адыль Шарипович Шакиров

ООО &quot;Кремль&quot;. Трест, который лопнет - Андрей Колесников

Психология согласия. Революционная методика пре-убеждения - Роберт Бено Чалдини

Фантастика 2024-84 - Константин Давидович Мзареулов

Биология для тех, кто хочет понять и простить самку богомола - Андрей Шляхов

ООО "Кремль". Трест, который лопнет - Андрей Колесников