Domain-specific Word2vec-based trained models - Chem300, Phys300, MatSci200, MatSci300, and Mixed300
Abstract
The repository contains unsupervised trained models specialized for materials science, chemistry, and physics. The word embeddings are generated with Word2vec, a natural language processing technique comprising language model architectures for fast and efficient learning of distributed word representations. The models are trained with the continuous Skip-gram architecture and a negative sampling strategy, as implemented in the Gensim library. Embeddings with 200 and 300 vector components are provided for materials science, and embeddings with 300 vector components are provided for the chemistry, physics, and mixed domains.
Keywords:
word2vec model / In-silico materials design / Word embeddings as autonomous predictors / Static word embeddings / Word embeddings variability / Materials stability / Materials informatics / Digital design / Cheminformatics / Natural language processing
Source:
figshare, 2025
Note:
- This digital object is hosted on the Figshare server due to its size and is available under the Creative Commons Attribution 4.0 International License.
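The abstract names the continuous Skip-gram architecture with negative sampling as the training setup. The following is a minimal, standard-library sketch of that idea, not the authors' training code and not Gensim's optimized implementation. The example tokens, the 4-dimensional vectors, and the helper names `skipgram_pairs` and `sgns_loss` are hypothetical, and the negatives are simply the remaining vocabulary words rather than draws from a smoothed unigram distribution.

```python
import math
import random


def skipgram_pairs(tokens, window):
    """Yield (center, context) pairs within `window` positions of each token."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sgns_loss(center_vec, context_vec, negative_vecs):
    """Negative-sampling loss for one (center, context) pair:
    -log sigma(u_o . v_c) - sum_k log sigma(-u_k . v_c)
    """
    loss = -math.log(sigmoid(dot(context_vec, center_vec)))
    for neg in negative_vecs:
        loss -= math.log(sigmoid(-dot(neg, center_vec)))
    return loss


# Hypothetical toy sentence; the released models use 200 or 300 components.
tokens = ["lithium", "ion", "battery", "cathode"]
pairs = list(skipgram_pairs(tokens, window=1))

rng = random.Random(0)
dim = 4  # illustrative only
in_vecs = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in tokens}
out_vecs = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in tokens}

# Simplification: treat every word other than the pair itself as a negative.
center, context = pairs[0]
negatives = [out_vecs[w] for w in tokens if w not in (center, context)]
loss = sgns_loss(in_vecs[center], out_vecs[context], negatives)
print(f"pairs: {len(pairs)}, loss for {center!r} -> {context!r}: {loss:.4f}")
```

Training lowers this loss by gradient steps on the input and output vectors; the input vectors are what is usually kept as the word embeddings.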
Collections
Institution/group
Vinča
TY  - GEN
AU  - Radaković, Jana
AU  - Batalović, Katarina
PY  - 2025
UR  - https://vinar.vin.bg.ac.rs/handle/123456789/15178
T2  - figshare
T1  - Domain-specific Word2vec-based trained models - Chem300, Phys300, MatSci200, MatSci300, and Mixed300
DO  - 10.6084/m9.figshare.28740122.v1
ER  -
@misc{
author = "Radaković, Jana and Batalović, Katarina",
year = "2025",
journal = "figshare",
title = "Domain-specific Word2vec-based trained models - Chem300, Phys300, MatSci200, MatSci300, and Mixed300",
doi = "10.6084/m9.figshare.28740122.v1"
}
Radaković, J., & Batalović, K. (2025). Domain-specific Word2vec-based trained models - Chem300, Phys300, MatSci200, MatSci300, and Mixed300. in figshare. https://doi.org/10.6084/m9.figshare.28740122.v1
Radaković J, Batalović K. Domain-specific Word2vec-based trained models - Chem300, Phys300, MatSci200, MatSci300, and Mixed300. in figshare. 2025. doi:10.6084/m9.figshare.28740122.v1
Radaković, Jana, Batalović, Katarina, "Domain-specific Word2vec-based trained models - Chem300, Phys300, MatSci200, MatSci300, and Mixed300" in figshare (2025), https://doi.org/10.6084/m9.figshare.28740122.v1


