Language Technology teaching

Language Technology study module (25 credits)

Organized together with the Department of Computing and the School of Languages and Translation Studies. No prior knowledge of language technology is needed. Students coming from outside the Department of Computing will learn the basics of programming and automatic text processing in the first- and second-period courses, so that they are able to continue to more advanced courses. All advanced courses are organized so that motivated students from outside the Department of Computing are also able to complete the study module.

Courses

KKLT0030 Automaattinen tekstiprosessointi / Automatic Text Processing (5 ECTS)

Teacher: Veronika Laippala, School of Languages and Translation Studies

Language: Finnish

Time: Every year, first period

Update: The course starts on Monday 10.9. Classes are held on Mondays at 10.15 and Thursdays at 12.15 in A252 (IT classroom), Arcanum.

Level: Intermediate

After the course, the student knows how to manipulate and analyze large corpora from the command line. The student is familiar with various simple Unix tools and techniques, such as sorting, counting frequencies, using regular expressions, running loops and using pipes. Further, the student knows how to search for instructions in online manuals. The practical assignments prepare students to apply the learned skills, for instance in their theses, and these skills are developed further in the more advanced courses of the study module.

BIOI2250 Introduction to Programming (5–6 ECTS)

Teacher: Department of Computing

Language: English

Time: Every year, first and second period

Level: Intermediate

The course targets students with no prior programming experience. The students will acquire basic skills in algorithm design and programming, learning to write simple, practical programs in the Python programming language.

Students from the Department of Computing (TKT or DI) cannot take this course; they must take the programming courses intended for TKT or DI degree students.

TKO_7095 Introduction to Human Language Technology (5 ECTS)

Teachers: Jenna Kanerva, Filip Ginter, Sampo Pyysalo, Department of Computing

Language: English

Time: Every year, fourth period (late spring)

Level: Advanced

The course introduces the use of human language as data in data analysis or machine learning, starting from the concepts of textual corpora and corpus annotation, and continuing to building simple language technology applications. The basic feature representation methods of textual data are explained and their practical implementations are shown while building machine learning pipelines for selected language technology applications. The course also introduces the notion of semantic textual similarity and semantic vector spaces of languages. Language modeling is introduced as a task to learn such representations. The course will also introduce several language technology applications, starting from elementary text processing such as segmentation, and continuing to selected end-user applications such as text classification and sentiment mining.

KKLT0031 Korpuksia ja kieliteknologiaa / Corpora and Language Technology (5 ECTS)

Teacher: Veronika Laippala, School of Languages and Translation Studies

Language: Finnish

Time: Every year, spring

Level: Advanced

After the course, the student is familiar with ready-made corpora from different fields, understands the importance of corpora in linguistics and knows how to avoid the most common problems in corpus compilation. Further, the student knows how to use corpus tools such as AntConc and WordSmith, is familiar with basic natural language processing tools and how they work, and understands the potential of machine learning for language studies. In addition, the student learns methods for analyzing large digital corpora. These include both traditional corpus linguistics methods and new possibilities offered by natural language processing, such as automatic syntactic analysis, distributional semantics, text classification and sentiment analysis. The corpora studied represent various languages and genres, such as social media, learner language and texts from different time periods.

TKO_8964 Textual Data Analysis (5 ECTS)

Teachers: Filip Ginter, Sampo Pyysalo, Department of Computing

Language: English

Time: Every other year (odd years), third period (start of year)

Level: Advanced

The course focuses on practical applications of the methods introduced especially in the “Deep Learning in Human Language Technology” course to various text mining tasks, as typically encountered in research and in the data science industry. Rather than covering the inner workings of these methods, the course concentrates on their practical applicability to real-world tasks:

  • Web crawls and other large collections of textual data, their usage and processing
  • Data sourcing, annotation, and quality control
  • Information retrieval and document search engines, including modern dense vector representations based on deep learning
  • Surface and semantic similarity of texts and their application in clustering
  • Document classification and text labeling in practical text mining tasks
  • Extraction of relations between entities in text

TKO_8965 Deep Learning in Human Language Technology (5 ECTS)

Teachers: Filip Ginter, Sampo Pyysalo, Department of Computing

Language: English

Time: Every year starting from 2023, first period (autumn)

Level: Advanced

The course introduces the application of basic neural network architectures to textual data and focuses in detail on applying state-of-the-art architectures and models to a range of natural language processing tasks. The exercises and the course project emphasize practical skills in training neural networks for a range of tasks and in using deep learning-based models as components of practical systems, and they give students the skills to train models in GPU-accelerated environments.