Topic modelling and text classification models for applications within EFSA

This report presents an overview of topic modelling and classification models in relation to four case studies in the EFSA project OC/EFSA/AMU/2020/02. Because adequate document embeddings improve the effectiveness of both topic modelling and text classification, a wide range of word and document embeddings is discussed. Many increasingly complex embeddings are readily available off the shelf, but because they are trained on large, mostly general text corpora, their utility for domain-specific text varies. Fine-tuning embeddings or creating them from scratch is feasible only when enough data is available, and it carries a computational cost. For some domains (such as scientific articles), pretrained embeddings are available. For topic modelling, we discuss standard techniques such as non-negative matrix factorization and latent Dirichlet allocation, as well as more recent methods based on clustering of document embeddings, such as Top2Vec and BERTopic. For text classification, we consider hierarchical text classification approaches combined with established techniques for text classification via document embeddings. For each case study, we propose a selection of techniques, justify the choice, and present a plan for evaluation. Finally, we discuss our findings after having implemented and validated the selected techniques.
