 

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.

Abstract

Purpose

This paper presents preliminary work on extracting band gap information for materials from academic papers. With the increasing demand for renewable energy, such information will help materials scientists design and implement novel photovoltaic (PV) cells.

Design/methodology/approach

The authors collected 1.44 million titles and abstracts of scholarly articles related to materials science, and then filtered the collection to 11,939 articles that potentially contain relevant information about materials and their band gap values. ChemDataExtractor was extended to extract information about PV materials and their band gap information. Evaluation was performed on randomly sampled information records of 415 papers.
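As a rough illustration of the filtering step described above, the sketch below keeps only abstracts that mention a band gap together with a numeric value in eV. It is a simplified stand-in written with a plain regular expression; it is not ChemDataExtractor and not the authors' extension of it. The names `BAND_GAP_RE`, `find_band_gaps` and `filter_abstracts`, as well as the 80-character window, are assumptions made for this example.

```python
import re

# Hypothetical pattern: "band gap", "bandgap" or "band-gap", followed within
# the same sentence (at most 80 characters) by a number and the unit "eV".
BAND_GAP_RE = re.compile(
    r"band\s*-?\s*gap[^.]{0,80}?(\d+(?:\.\d+)?)\s*eV",
    re.IGNORECASE,
)


def find_band_gaps(text: str) -> list[float]:
    """Return the band gap values (in eV) that the pattern finds in text."""
    return [float(m.group(1)) for m in BAND_GAP_RE.finditer(text)]


def filter_abstracts(abstracts: list[str]) -> list[str]:
    """Keep only abstracts that contain at least one band gap value."""
    return [a for a in abstracts if find_band_gaps(a)]
```

A pattern like this only approximates candidate selection. Linking each value to the correct chemical entity is the harder part of the task and is not attempted here.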

Findings

The current system correctly extracts information for 51.32% of articles, with partially correct extraction for 36.62% of articles and incorrect extraction for 12.04%. The authors also identified three main categories of errors, pertaining to chemical entity identification, band gap information and interdependency resolution. Future work will focus on addressing these errors to improve the performance of the system.

Originality/value

The authors did not find any literature to date on band gap information extraction from academic text using automated methods. This work is unique and original. Band gap information is of importance to materials scientists in applications such as solar cells, light emitting diodes and laser diodes.

Department

Computer Science

Publication Date

2022

Journal Title

Aslib Journal of Information Management

Publisher

Emerald Publishing Limited

Digital Object Identifier (DOI)

https://dx.doi.org/10.1108/AJIM-03-2022-0141

Document Type

Article

Rights

Copyright © 2022, Emerald Publishing Limited

Comments

This is an accepted manuscript published by Emerald Publishing in 2022 in Aslib Journal of Information Management, available online: https://doi.org/10.1108/AJIM-03-2022-0141
