Welcome to the GriCo homepage. GriCo is an annotated corpus of the materials published on the Movimento 5 Stelle blog, created and maintained by a multidisciplinary team. It is available in several formats and with several types of annotation.

Overview

XML-ready

Available as annotated XML, including each text's details as they appear on the website.

Diachronic

Contains all materials from 2005 to 2018.

Structured

The metadata structure preserves the original materials’ hierarchy.

POS tagged

Choose between the version tagged with TreeTagger and the version tagged with spaCy (spacy.io).

≈ 447 million tokens

≈ 7 million tokens for posts, ≈ 440 million tokens for comments.

Plain .txt

Also available in plain .txt format.

Team

Researchers

Dario Del Fante

Researcher, PhD Candidate

Corpus Linguistics, CADS (Corpus-Assisted Discourse Studies), Language and Media Discourse, Populism and Cyber-Populism, Linguistics and Conceptual Metaphor Theory, Language and Ideology, Migration Studies

Federica Formato

Researcher, Lecturer

Gendered Language, Media Studies

Virginia Zorzi

Researcher

Applied Linguistics, Public Communication, Science and Technology

Angela Zottola

Researcher

Language, Gender and Sexuality Studies, Queer Linguistics, Media Discourse, Ecolinguistics, Corpus Linguistics, CADS (Corpus-Assisted Discourse Studies)

Roadmap

GriCo v1 online

First version of GriCo pre-loaded online

XML corpus

Creation of the XML corpus

Sampler corpus

Creation of the sampler corpus in plain-text format

Data collection

Collecting the raw data (HTML) from ilblogdellestelle.it

Recent Posts

Short intro to the corpus and the website.

Recent & Upcoming Talks

The GriCo Project - A Web Corpus of the Italian Populist Movimento 5 Stelle

Contact

Questions? Send us an email!
