Date received: 
29.04.2021
Year: 
2021
Journal issue (Volume): 
UDC: 
004.032.26
DOI: 
10.26731/2658-3704.2021.2(10).52-59
Article file: 
Pages: 
52–59
Abstract: 

The article describes the vectorization of normative and reference information using Bidirectional Encoder Representations from Transformers (BERT), a neural network architecture for natural language processing. The transformer neural network architecture and its principle of operation are considered. The BERT architecture is described, along with its use through the Transformers library, and an example of program code applying the model in practice is given. Several models based on this architecture that support the Russian language are evaluated by measuring word similarity. The compilation of a dataset for this evaluation is described, and the results obtained by the different models are compared.
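The abstract refers to a program-code example; the following is a minimal sketch of the vectorization and word-similarity workflow it describes, using the Transformers library. The model checkpoint (DeepPavlov/rubert-base-cased) and the sample strings are assumptions for illustration only; the models and dataset actually evaluated in the article are not named in the abstract.

import torch
from transformers import AutoModel, AutoTokenizer

# Assumed Russian-language BERT checkpoint; any comparable model from the
# Hugging Face hub could be substituted here.
MODEL_NAME = "DeepPavlov/rubert-base-cased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def vectorize(text: str) -> torch.Tensor:
    """Return a single embedding vector for `text` (mean over token vectors)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # last_hidden_state has shape (1, seq_len, hidden_size); average the tokens.
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

# Hypothetical items of normative and reference information to compare.
a = vectorize("болт стальной оцинкованный")
b = vectorize("болт из оцинкованной стали")
similarity = torch.nn.functional.cosine_similarity(a, b, dim=0)
print(f"cosine similarity: {similarity.item():.3f}")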

References: 

1. Devlin J. et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding // arXiv preprint arXiv:1810.04805v2, 2019.

2. Vaswani A. et al. Attention Is All You Need // arXiv preprint arXiv:1706.03762v5, 2017.

3. Wolf T. et al. Transformers: State-of-the-Art Natural Language Processing // arXiv preprint arXiv:1910.03771v5, 2019.

4. The Hugging Face Team. Transformers [Electronic resource]. Available at: https://huggingface.co/transformers/index.html, free access (accessed: 08.02.2021).

5. Kalyan K.S., Sangeetha S. SECNLP: A Survey of Embeddings in Clinical Natural Language Processing // arXiv preprint arXiv:1903.01039v4, 2020.

6. Reimers N., Gurevych I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks // arXiv preprint arXiv:1908.10084v1, 2019.