Module: wordcloud¶

NOTE: Brand new module! Many of its parameters are fairly arbitrary values that just seem to make a good picture.

The hashtag tends to dominate the graph. I like that because it serves as a title or anchoring word. But some folks want to see the cloud without the hashtag itself dominating. So there's a config option, hashtag_fix, that takes one of 3 values: as-is, remove, or reduce. (The default, if omitted, is as-is.) In this section, I show the same data set from Kung-Fu Saturday, 7 December 2024, visualized 3 different ways.

Alt Text Generation¶

As of version 1.2.0, the wordcloud module automatically generates descriptive alt text for each wordcloud image. This alt text includes:

The hashtag and date of the analysis
Total number of unique words in the wordcloud
Top 10 most frequent words with their counts
Information about hashtag treatment method (as-is, remove, reduce)
List of any custom stop words that were used

The alt text is saved to a text file with the same name as the wordcloud image but with a .txt extension. For example, if the wordcloud is saved as wordcloud-monsterdon-20250409-as-is.png, the alt text will be saved as wordcloud-monsterdon-20250409-as-is.txt.

This feature makes the wordclouds more accessible and provides a quick summary of the key words from the visualization.
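As a rough illustration of the alt text described above, here is a minimal sketch that assembles the listed pieces from a list of words. The function name `build_alt_text`, its arguments, and the exact wording of the output are all assumptions made for this example; the module's real text may be phrased differently.

```python
from collections import Counter


def build_alt_text(words, hashtag, date_str, hashtag_fix="as-is", stop_words=()):
    """Assemble a plain-text wordcloud description (hypothetical helper)."""
    counts = Counter(words)
    # Top 10 most frequent words, each followed by its count
    top = ", ".join(f"{word} ({n})" for word, n in counts.most_common(10))
    parts = [
        f"Wordcloud for #{hashtag} on {date_str}.",
        f"{len(counts)} unique words.",
        f"Top words: {top}.",
        f"Hashtag treatment: {hashtag_fix}.",
    ]
    # Only mention custom stop words when some were configured
    if stop_words:
        parts.append("Custom stop words: " + ", ".join(stop_words) + ".")
    return " ".join(parts)


text = build_alt_text(
    ["kaiju", "kaiju", "boat"],
    hashtag="monsterdon",
    date_str="2025-04-09",
    hashtag_fix="as-is",
    stop_words=["movie"],
)
print(text)
```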

Custom Stop Words¶

You can exclude specific words from appearing in your wordcloud by adding a stop_words parameter to the [wordcloud] section of your INI file. This is particularly useful for filtering out common words that aren't meaningful to your analysis.

To use this feature:

Add a stop_words parameter to the [wordcloud] section of your INI file
Provide a comma-separated list of words to exclude

For example:

[wordcloud]
graph_title  = Wordcloud
font         = /path/to/font.otf
size_x       = 1280
size_y       = 960
hashtag_fix  = remove
stop_words   = movie, film, watching, watch, tonight, scene, scenes, actor, actors

These words will be excluded from the wordcloud in addition to the default stop words and any other configured exclusions. This is especially useful for event-specific hashtags where certain common words might dominate the visualization without adding meaningful information.
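To make the comma-separated format concrete, here is a small sketch of how such a value could be parsed and applied. The helper `parse_stop_words` is hypothetical, and lowercasing the words before comparing is an assumption of this sketch, not something the module is known to do.

```python
from configparser import ConfigParser


def parse_stop_words(config):
    """Return the custom stop words as a set (hypothetical helper)."""
    raw = config.get("wordcloud", "stop_words", fallback="")
    # Split on commas, trim whitespace, and skip empty entries
    return {w.strip().lower() for w in raw.split(",") if w.strip()}


config = ConfigParser()
config.read_string("""
[wordcloud]
stop_words = movie, film, Watching
""")

custom = parse_stop_words(config)
words = ["movie", "godzilla", "watching", "boat"]
# Keep only the words that are not in the custom stop list
kept = [w for w in words if w.lower() not in custom]
print(kept)
```

In practice the custom set would be merged with whatever default stop words are in use before filtering.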

`as-is`¶

Leave the hashtag alone.

kungfu saturday as-is

`remove`¶

Remove all instances of the hashtag.

kungfu saturday remove

`reduce`¶

Remove most (currently hard-coded at 90%) occurrences of the hashtag. It will still be popular enough to be quite large, but it won't dominate. In this example, "KungFuSat" is near the top right, in a dark purple.

kungfu saturday reduce
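The three treatments can be sketched on a plain list of words as follows. This is only an illustration under assumptions: the function `apply_hashtag_fix` and its `drop_fraction` argument are invented for this example, and the real module may pick which occurrences to drop differently.

```python
def apply_hashtag_fix(words, hashtag, mode="as-is", drop_fraction=0.9):
    """Apply an as-is / remove / reduce treatment to a word list (hypothetical)."""
    tag = hashtag.lower()
    if mode == "as-is":
        return list(words)
    if mode == "remove":
        return [w for w in words if w.lower() != tag]
    if mode == "reduce":
        total = sum(1 for w in words if w.lower() == tag)
        # Number of hashtag occurrences that survive the reduction
        keep = total - round(total * drop_fraction)
        result = []
        for w in words:
            if w.lower() == tag:
                if keep <= 0:
                    continue  # quota used up, drop this occurrence
                keep -= 1
            result.append(w)
        return result
    raise ValueError(f"unknown hashtag_fix value: {mode!r}")


words = ["KungFuSat"] * 10 + ["sword"] * 3
reduced = apply_hashtag_fix(words, "kungfusat", mode="reduce")
print(reduced.count("KungFuSat"), reduced.count("sword"))
```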

Synopsis¶

mastoscore --debug=info ini/monsterdon-20241201.ini wordcloud

Creates a file named {journaldir}/wordcloud-{journalfile}.png.

A Word about Emoji¶

While it is possible to make a word cloud that includes emoji, it's a bit complicated. See, it really boils down to the font and matplotlib's support for fonts. I think a lot of fancy word processing systems use multiple fonts (one for text, one for rendering symbols like emoji). But matplotlib needs a single font that has everything you want in it. The only one I have found like that is Symbola, which is OK, but the words themselves look pretty terrible. I think the right answer is probably to build emoji support into word_cloud itself to give it some emoji awareness and then use a different font for emojis. For now, I'm just dropping all emojis and punctuation.
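For a sense of what dropping all emojis and punctuation can look like, here is a minimal sketch based on Unicode character categories. It is not the module's actual code; the function name is made up, and keeping only letters, digits, and whitespace is an assumption of this sketch.

```python
import unicodedata


def strip_emoji_and_punctuation(text):
    """Keep letters, digits, and whitespace; blank out everything else."""
    # Compose accented characters first so they count as single letters
    text = unicodedata.normalize("NFC", text)
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        if category[0] in ("L", "N") or ch.isspace():
            kept.append(ch)
        else:
            # Emoji, punctuation, and other symbols become a space
            kept.append(" ")
    # Collapse the runs of spaces left behind
    return " ".join("".join(kept).split())


cleaned = strip_emoji_and_punctuation("Godzilla!!! 🦖🔥 rocks, again.")
print(cleaned)
```

One trade-off of this approach is that apostrophes are treated as punctuation too, so a word like "it's" is split into two pieces.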

Examples¶

Example Monsterdon

Code Reference¶

Module to take the data in from analysis and produce a wordcloud graphic.

`get_random_font(config)` ¶

Get a random font from the fonts list in config.

Source code in mastoscore/wordcloud.py

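from configparser import ConfigParser
from random import choice

def get_random_font(config: ConfigParser) -> str:
    """Get a random font from the fonts list in config."""
    # "fonts" is a comma-separated list in the [wordcloud] section
    fonts_str = config.get("wordcloud", "fonts")
    fonts = [f.strip() for f in fonts_str.split(',')]
    return choice(fonts)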
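A short usage sketch for the function above. It repeats the definition so the snippet runs on its own; the font paths are placeholders, and the only thing taken from the source is that the function reads a comma-separated `fonts` option from the `[wordcloud]` section.

```python
from configparser import ConfigParser
from random import choice


def get_random_font(config: ConfigParser) -> str:
    """Get a random font from the fonts list in config."""
    fonts_str = config.get("wordcloud", "fonts")
    fonts = [f.strip() for f in fonts_str.split(',')]
    return choice(fonts)


config = ConfigParser()
# Placeholder paths; whitespace around the commas is stripped
config.read_string("""
[wordcloud]
fonts = /path/to/one.otf, /path/to/two.ttf , /path/to/three.otf
""")

font = get_random_font(config)
print(font)
```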

`write_wordcloud(config)` ¶

This is the main entry point of the module. It invokes get_toots_df() to get the DataFrame. Then it discards basically everything other than the content column. I post-process to remove some weird things (there's lots of emoji-like things). The hashtag itself is then handled according to hashtag_fix, because left alone it's obviously gonna have the highest frequency.

Parameters:

Name	Type	Description	Default
`config`	`ConfigParser`	A ConfigParser object from the config module	required

Config Parameters Used¶

Option	Description
`graph:journalfile`	Filename that forms the base of the graph's filename.
`graph:journaldir`	Directory where we will write the graph file
`fetch:hashtag`	Hashtag to search for
`wordcloud:font_path`	Path to fonts like Symbola
`wordcloud:hashtag_fix`	What to do with the main hashtag? 'reduce', 'remove', or 'as-is'
`wordcloud:size_x`	Width of the image in pixels. Default 1280
`wordcloud:size_y`	Height of the image in pixels. Default 960
`wordcloud:stop_words`	Comma-separated list of words to exclude
`mastoscore:event_year`	Year of the event (YYYY)
`mastoscore:event_month`	Month of the event (MM)
`mastoscore:event_day`	Day of the event (DD)

Returns:

Type	Description
`None`	None

Writes the graph to a file named wordcloud/wordcloud-hashtag-YYYYMMDD-hashtag_fix.png.

Writes the alt text description to wordcloud/wordcloud-hashtag-YYYYMMDD-hashtag_fix.txt.
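The naming pattern above can be sketched with pathlib. The helper `output_paths` and its arguments are hypothetical; it only illustrates how the hashtag, the event date, and the hashtag_fix value combine into the two filenames.

```python
from pathlib import Path


def output_paths(hashtag, year, month, day, hashtag_fix, base_dir="wordcloud"):
    """Return (image_path, alt_text_path) for one wordcloud (hypothetical)."""
    stem = f"wordcloud-{hashtag}-{year}{month}{day}-{hashtag_fix}"
    image = Path(base_dir) / f"{stem}.png"
    # The alt text shares the image's name, with a .txt extension
    return image, image.with_suffix(".txt")


image, alt = output_paths("monsterdon", "2025", "04", "09", "as-is")
print(image.name)
print(alt.name)
```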