VECTORIZATION OF REGULATORY-REFERENCE INFORMATION USING THE BERT NEURAL NETWORK | Information Technologies and Mathematical Modeling in the Management of Complex Systems

Authors:

Saygin Andrey Aleksandrovich

Plotnikova Natalya Pavlovna

Receipt date:

29.04.2021

Section:

2. Information technology in the management of technical and socio-economic objects

Year:

2021

Journal number:

2(10) 2021

UDC:

004.032.26

DOI:

10.26731/2658-3704.2021.2(10).52-59

Article File:

vectorization_of_regulatory-reference_information.pdf

Pages:

Abstract:

The article describes the vectorization of regulatory-reference information using Bidirectional Encoder Representations from Transformers (BERT), a neural network for natural language processing. It reviews the architecture of the transformer neural network and its operating principle, then describes the BERT architecture and its use through the Transformers library, with a code example showing the model applied in practice. Several BERT-based models that support Russian are evaluated using the word-similarity method. The article describes how the evaluation dataset was compiled and compares the results obtained for the different models.
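The word-similarity evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the toy vectors, the word pairs, the gold scores, and the choice of cosine similarity with Spearman rank correlation are all assumptions made for illustration, and the names `TOY_VECTORS`, `GOLD_PAIRS`, and `evaluate` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Assumption: toy 3-dimensional vectors standing in for model embeddings.
TOY_VECTORS = {
    "bolt": [0.9, 0.1, 0.0],
    "screw": [0.8, 0.2, 0.1],
    "pipe": [0.1, 0.9, 0.2],
    "tube": [0.2, 0.8, 0.1],
    "paint": [0.0, 0.1, 0.9],
}

# Assumption: invented human similarity judgments in the range [0, 1].
GOLD_PAIRS = [
    ("bolt", "screw", 0.90),
    ("pipe", "tube", 0.95),
    ("bolt", "pipe", 0.30),
    ("screw", "paint", 0.10),
]


def evaluate(vectors, gold_pairs):
    """Correlate model similarities with gold similarity scores."""
    predicted = [cosine_similarity(vectors[a], vectors[b]) for a, b, _ in gold_pairs]
    gold = [score for _, _, score in gold_pairs]
    return spearman(predicted, gold)


score = evaluate(TOY_VECTORS, GOLD_PAIRS)
print(f"Spearman correlation: {score:.3f}")
```

In practice the vectors would come from a BERT model, for instance one loaded through the Transformers library with `AutoTokenizer` and `AutoModel`, with the token outputs pooled into a single vector per word. The pooling strategy and the exact correlation measure are not stated in the abstract, so both are left open here.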

Keywords:

BERT

transformers

natural language processing

vectorization

method of determining similarity

correlation
