2699
Wei J., Wang X., Schuurmans D., Bosma M., Ichter B., Xia F., Chi E., Le Q., Zhou D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models // https://arxiv.org/abs/2201.11903
2700
Yao S., Yu D., Zhao J., Shafran I., Griffiths T. L., Cao Y., Narasimhan K. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models // https://arxiv.org/abs/2305.10601
2701
Besta M., Blach N., Kubicek A., Gerstenberger R., Gianinazzi L., Gajda J., Lehmann T., Podstawski M., Niewiadomski H., Nyczyk P., Hoefler T. (2023). Graph of Thoughts: Solving Elaborate Problems with Large Language Models // https://arxiv.org/abs/2308.09687
2702
Dehghani M., Gouws S., Vinyals O., Uszkoreit J., Kaiser Ł. (2018). Universal Transformers // https://arxiv.org/abs/1807.03819
2703
Wang Z., Ma Y., Liu Z., Tang J. (2019). R-Transformer: Recurrent Neural Network Enhanced Transformer // https://arxiv.org/abs/1907.05572
2704
Dai Z., Yang Z., Yang Y., Carbonell J., Le Q. V., Salakhutdinov R. (2019). Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context // https://arxiv.org/abs/1901.02860
2705
Giannou A., Rajput S., Sohn J.-Y., Lee K., Lee J. D., Papailiopoulos D. (2023). Looped Transformers as Programmable Computers // https://arxiv.org/abs/2301.13196
2706
Graves A. (2016). Adaptive Computation Time for Recurrent Neural Networks // https://arxiv.org/abs/1603.08983
2707
Fojo D., Campos V., Giro-i-Nieto X. (2018). Comparing Fixed and Adaptive Computation Time for Recurrent Neural Networks // https://arxiv.org/abs/1803.08165
2708
Sapunov G. (2019). Adaptive Computation Time (ACT) in Neural Networks // https://moocaholic.medium.com/adaptive-computation-time-act-in-neural-networks-part-1-2a28484b53df
2709
Orvieto A., Smith S. L., Gu A., Fernando A., Gulcehre C., Pascanu R., De S. (2023). Resurrecting Recurrent Neural Networks for Long Sequences // https://arxiv.org/abs/2303.06349
2710
Peng B., Alcaide E., Anthony Q., Albalak A., Arcadinho S., Cao H., Cheng X., Chung M., Grella M., GV K. K., He X., Hou H., Kazienko P., Kocon J., Kong J., Koptyra B., Lau H., Mantri K. S. I., Mom F., Saito A., Tang X., Wang B., Wind J. S., Wozniak S., Zhang R., Zhang Z., Zhao Q., Zhou P., Zhu J., Zhu R. (2023). RWKV: Reinventing RNNs for the Transformer Era // https://arxiv.org/abs/2305.13048
2711
Fu D. Y., Dao T., Saab K. K., Thomas A. W., Rudra A., Ré C. (2022). Hungry Hungry Hippos: Towards Language Modeling with State Space Models // https://arxiv.org/abs/2212.14052
2712
Gu A., Goel K., Ré C. (2021). Efficiently Modeling Long Sequences with Structured State Spaces // https://arxiv.org/abs/2111.00396
2713
Gu A., Johnson I., Timalsina A., Rudra A., Ré C. (2022). How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections // https://arxiv.org/abs/2206.12037
2714
Hasani R., Lechner M., Wang T.-H., Chahine M., Amini A., Rus D. (2022). Liquid Structural State-Space Models // https://arxiv.org/abs/2209.12951
2715
Gu A., Gupta A., Goel K., Ré C. (2022). On the Parameterization and Initialization of Diagonal State Space Models // https://arxiv.org/abs/2206.11893
2716
Smith J. T. H., Warrington A., Linderman S. W. (2022). Simplified State Space Layers for Sequence Modeling // https://arxiv.org/abs/2208.04933
2717
Sun Y., Dong L., Huang S., Ma S., Xia Y., Xue J., Wang J., Wei F. (2023). Retentive Network: A Successor to Transformer for Large Language Models // https://arxiv.org/abs/2307.08621
2718
Thoppilan R., Freitas D. D., Hall J., Shazeer N., Kulshreshtha A., Cheng H., Jin A., Bos T., Baker L., Du Y., Li Y., Lee H., Zheng H. S., Ghafouri A., Menegali M., Huang Y., Krikun M., Lepikhin D., Qin J., Chen D., Xu Y., Chen Z., Roberts A., Bosma M., Zhao V., Zhou Y., Chang C., Krivokon I., Rusch W., Pickett M., Srinivasan P., Man L., Meier-Hellstern K., Morris M. R., Doshi T., Santos R. D., Duke T., Soraker J., Zevenbergen B., Prabhakaran V., Diaz M., Hutchinson B., Olson K., Molina A., Hoffman-John E., Lee J., Aroyo L., Rajakumar R., Butryna A., Lamm M., Kuzmina V., Fenton J., Cohen A., Bernstein R., Kurzweil R., Aguera-Arcas B., Cui C., Croak M., Chi E., Le Q. (2022). LaMDA: Language Models for Dialog Applications // https://arxiv.org/abs/2201.08239
2719
Schick T., Dwivedi-Yu J., Dessì R., Raileanu R., Lomeli M., Zettlemoyer L., Cancedda N., Scialom T. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools // https://arxiv.org/abs/2302.04761
2720
Hao S., Liu T., Wang Z., Hu Z. (2023). ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings // https://arxiv.org/abs/2305.11554
2721
Shen Y., Song K., Tan X., Li D., Lu W., Zhuang Y. (2023). HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face // https://arxiv.org/abs/2303.17580
2722
Patil S. G., Zhang T., Wang X., Gonzalez J. E. (2023). Gorilla: Large Language Model Connected with Massive APIs // https://arxiv.org/abs/2305.15334
2723
OpenAI (2023). ChatGPT plugins // https://openai.com/blog/chatgpt-plugins
2724
* Today, this kind of synthesis is commonly referred to as retrieval-augmented generation (RAG).
2725
Schlag I., Sukhbaatar S., Celikyilmaz A., Yih W.-t., Weston J., Schmidhuber J., Li X. (2023). Large Language Model Programs // https://arxiv.org/abs/2305.05364
2726
Heafield K. (2011). KenLM: Faster and Smaller Language Model Queries // https://kheafield.com/papers/avenue/kenlm.pdf
2727
Borgeaud S., Mensch A., Hoffmann J., Cai T., Rutherford E., Millican K., van den Driessche G., Lespiau J.-B., Damoc B., Clark A., de Las Casas D., Guy A., Menick J., Ring R., Hennigan T., Huang S., Maggiore L., Jones C., Cassirer A., Brock A., Paganini