Proceedings of International Conference on Applied Innovation in IT  ·  2026/04/22  ·  Vol. 14  ·  Issue 2  ·  pp. 79–86
A Method Based on NER and CRF for Extracting Named Entities from Text and Textual Representations of Chemical Reactions
Anna Vasileva and Natalia Evstifeeva
The rapid growth of scientific and technological publications has increased the demand for automated methods capable of extracting structured knowledge from unstructured textual data. This problem is particularly relevant for chemical and technological texts, where essential information is often represented through chemical reactions, physical formulas, and domain-specific terminology that standard natural language processing techniques handle poorly. This paper proposes a hybrid information extraction method that combines Named Entity Recognition (NER), a rule-based MiningNOUN algorithm, and Conditional Random Fields (CRF) to improve the identification of entities and relationships in domain-specific scientific texts. The proposed approach integrates statistical and ontological principles, enabling the recognition of substances, processes, physical quantities, and formally structured expressions that are typically missed by baseline NER models. The method was evaluated on a corpus of chemical and technological texts describing experimental procedures and reaction processes. The results show that the combined NER + MiningNOUN + CRF configuration significantly increases the coverage of extracted entities compared to a standard NER pipeline, allowing the system to capture information expressed in both natural language and formal notation. The extracted entities and relations are integrated into an ontological knowledge graph compliant with RDF/OWL standards and further applied within a Retrieval-Augmented Generation (RAG) architecture. The proposed method supports the development of more reliable knowledge graphs for intelligent scientific data processing and can be adapted to other technical domains with complex symbolic representations.
Named Entity Recognition, Conditional Random Fields, Ontology-Based Information Extraction, Knowledge Graph Construction, Scientific Text Processing.
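The abstract describes combining a baseline NER pass with a rule-based pass so that formally notated items (for example chemical formulas) missed by the baseline are still captured. The following is a minimal sketch of that general idea only, not the authors' MiningNOUN algorithm or their CRF model, neither of which is specified here. The `Entity` type, the `FORMULA` pattern, the `FORMULA` and `SUBSTANCE` labels, and the `rule_entities` and `merge` helpers are all hypothetical names introduced for illustration, and the baseline NER output is assumed to be supplied by some external model.

```python
import re
from typing import List, NamedTuple


class Entity(NamedTuple):
    """A labelled character span in the source text (illustrative type)."""
    start: int
    end: int
    label: str
    text: str


# Assumed rule, not taken from the paper: two or more element-like tokens
# (capital letter, optional lowercase letter, optional count) in a row,
# e.g. "H2SO4" or "NaOH". It will also match all-caps acronyms, so a real
# rule set would need additional filtering.
FORMULA = re.compile(r"\b(?:[A-Z][a-z]?\d*){2,}\b")


def rule_entities(text: str) -> List[Entity]:
    """Return every formula-like token found by the pattern above."""
    return [
        Entity(m.start(), m.end(), "FORMULA", m.group())
        for m in FORMULA.finditer(text)
    ]


def merge(baseline: List[Entity], rules: List[Entity]) -> List[Entity]:
    """Keep all baseline entities; add rule entities that overlap none of them."""
    merged = list(baseline)
    for r in rules:
        if all(r.end <= b.start or r.start >= b.end for b in baseline):
            merged.append(r)
    # NamedTuples sort field by field, so this orders spans by start offset.
    return sorted(merged)
```

Calling `merge(baseline, rule_entities(text))` with the baseline model's spans returns one list ordered by position. Baseline spans take precedence on overlap, which is one possible design choice; a statistical sequence model such as a CRF could instead arbitrate between the two sources.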
© 2026 ICAIIT · Anhalt University of Applied Sciences ISSN 2198-8005 (online)

Proceedings of the International Conference on Applied Innovations in IT by Anhalt University of Applied Sciences is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).