AUTOMATISERT KLASSIFIKASJON AV NORSKE MÅLFORMER VHA. DATAUTVINNING AV UANNOTERT TEKST

ØVERLAND, Fartein Thorsen

dc.coverage	STUDIA UBB PHILOLOGIA, Volume 65 (LXV), No. 3, September 2020, pp. 107-124 DOI: 10.24193/subbphilo.2020.3.08	en-US
dc.creator	ØVERLAND, Fartein Thorsen
dc.date	2020-09-30
dc.date.accessioned	2026-02-21T08:39:01Z
dc.description	Automated Classification of Variants of Norwegian by Means of Text Mining of Unannotated Text. This article presents a model for automatically classifying variants of modern Norwegian (bokmål and nynorsk, ranging from 1930 to 2011) by means of data mining of unannotated text. The model is built in the Orange visual programming interface and is based on a modification of an example model, presented by the project, whose original purpose was the semantic classification of fairy tale types in the Aarne-Thompson-Uther Index. The core modules of the model are Bag-of-Words and Logistic Regression. The model is trained with four different translations of the Gospel of John and cross-validated with various random texts. The model proves to be very sound for classifying Norwegian language variation and yields correct classification in 100% of the realistic tests. REZUMAT (translated from Romanian). Automated classification of different variants of Norwegian using digital mining of unannotated texts. This article presents a model for the automatic classification of different variants of modern Norwegian (bokmål and nynorsk, between 1930 and 2011) by means of automated mining of unannotated text. The model is built in the Orange visual programming interface and is based on a modification of an example model presented by the project, whose initial purpose was the semantic classification of tale types in the Aarne-Thompson-Uther Index. The core modules of the model are Bag-of-Words and Logistic Regression. The model is centered on four different translations of the Gospel of John and is validated by random selection of fragments. The model proved to be very sound for classifying variation in Norwegian and achieves correct classification in 100% of the tests. 
Keywords: language variation, text mining, Orange programming interface, text classification, Bag-of-Words, logistic regression, predictive model, Norwegian language, nynorsk, bokmål	en-US
dc.format	application/pdf
dc.identifier	https://studia.reviste.ubbcluj.ro/index.php/subbphilologia/article/view/1901
dc.identifier	10.24193/subbphilo.2020.3.08
dc.identifier.uri	https://hdl.handle.net/20.500.14637/381
dc.language	eng
dc.publisher	Babeș-Bolyai University / Cluj University Press	en-US
dc.relation	https://studia.reviste.ubbcluj.ro/index.php/subbphilologia/article/view/1901/1827
dc.rights	Copyright (c) 2020 Studia Universitatis Babeș-Bolyai Philologia	en-US
dc.rights	https://creativecommons.org/licenses/by-nc-nd/4.0	en-US
dc.source	Studia Universitatis Babeș-Bolyai Philologia; Volume 65, No. 3, 2020; 107-124	en-US
dc.source	2065-9652
dc.source	1220-0484
dc.source	10.24193/subbphilo.2020.3
dc.subject	Language Variation, Text mining, Orange Data Mining, Text Clustering, Text Classification, Bag-of-Words, Logistic Regression, Predictive Model, Norwegian Language, Nynorsk, Bokmål.	en-US
dc.title	AUTOMATISERT KLASSIFIKASJON AV NORSKE MÅLFORMER VHA. DATAUTVINNING AV UANNOTERT TEKST	en-US
dc.type	info:eu-repo/semantics/article
dc.type	info:eu-repo/semantics/publishedVersion
dc.type	text	en-US
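
The description above names Bag-of-Words and Logistic Regression as the core modules of the model. The article builds its model in the Orange visual interface, so the following is only a minimal pure-Python sketch of that general technique, not the author's workflow. The toy sentences, the two-class setup (bokmål vs. nynorsk), and all function names and hyperparameters are illustrative assumptions; the article itself trains on four translations of the Gospel of John.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of letters, including Norwegian æ, ø, å.
    return re.findall(r"[a-zæøå]+", text.lower())

def build_vocab(docs):
    # Map every distinct token in the training texts to a column index.
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}

def vectorize(text, vocab):
    # Bag-of-words: one count per vocabulary word, word order ignored.
    vec = [0.0] * len(vocab)
    for tok, n in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(n)
    return vec

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(X, y, epochs=200, lr=0.1):
    # Binary logistic regression fitted by stochastic gradient descent.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(text, vocab, w, b):
    x = vectorize(text, vocab)
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return "nynorsk" if p >= 0.5 else "bokmål"

# Invented toy training data (assumption): label 0 = bokmål, 1 = nynorsk.
train_texts = [
    ("jeg vet ikke hva hun heter", 0),
    ("eg veit ikkje kva ho heiter", 1),
    ("jeg kommer fra en liten by", 0),
    ("eg kjem frå ein liten by", 1),
]
vocab = build_vocab([t for t, _ in train_texts])
X = [vectorize(t, vocab) for t, _ in train_texts]
y = [label for _, label in train_texts]
w, b = train(X, y)

print(predict("eg veit ikkje", vocab, w, b))
print(predict("jeg vet ikke", vocab, w, b))
```

Words that occur in only one variant (for example "eg" versus "jeg") receive weights of opposite sign, while shared words such as "liten" stay close to zero, which is why a plain word-count representation can separate the two written standards without any annotation.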

Files

Original bundle

Name: 1827.pdf
Size: 1.07 MB
Format: Adobe Portable Document Format
Description: PDF imported from OJS (https://studia.reviste.ubbcluj.ro/index.php/subbphilologia/article/download/1901/1827)

Collections

Studia Universitatis Babeș-Bolyai Philologia