WordCloud

小贾嗯嗯2021-02-252022-09-27

WordCloud

class  wordcloud.WordCloud(

font_path=None, #(string)字体OTF or TTF路径，需要展现什么字体就把该字体路径+后缀名写上，如：font_path = '黑体.ttf'

width=400,#(int)输出画布的宽度，默认400像素

height=200,(int) #输出画布的宽度，默认200像素

margin=2, #(int)画布边缘留白的空隙，默认留白空间是2像素

ranks_only=None, 

prefer_horizontal=0.9, #(float)词语水平方向排版出现的频率，默认 0.9 ,所以词语垂直方向排版出现频率为 0.1 

mask=None, #(nd-array or None) 是否使用mask（蒙板），默认不使用。若使用mask，则需提供一个二值化的mask（即只有0和1的黑白色mask），此时参数width和height会被忽略，单词会出现在mask非白色（#FFFFFF）的位置上。

scale=1, #(float)在字段width和height乘以的倍数，最终呈现的画布尺寸以这个结果。默认是1，此方法适合需要呈现大尺寸的画布

color_func=None,#(callable)生成新颜色的函数，默认为空。如果为空，则使用 self.color_func

max_words=200, #(int)单词最多显示数量，默认200个

min_font_size=4,  #(int)单词最小尺寸，默认4像素

stopwords=None,#(set of strings or None)设置需要屏蔽展示的词，如果为空，则使用内置的STOPWORDS。若使用generate_from_frequencies生成方式，则会忽略此参数

random_state=None, #(int or None)为每个单词返回一个PIL颜色

background_color='black', #(string)输出画布背景颜色，默认黑色

max_font_size=None, #(int)单词最大尺寸，默认不限制

font_step=1,#(int)字体步长,默认1。如果步长大于1，会加快运算但是可能导致结果出现较大的误差（这块确实不知道啥意思）

mode='RGB', #(string) 颜色显示模式，默认”RGB”。当参数为“RGBA”并且background_color是None时，背景色为透明

relative_scaling='auto', #(float)词频和字体大小的关联性（倍数）。默认是auto，即为0.5。若为0，只考虑单词的排列顺序；若为1，则单词展现的大小和出现的频率一致；若两者都考虑则可以设置为auto。若参数repeat=True，则此项为0

regexp=None, #(string or None (optional))把文本切片的通用方法。若为空，则使用正则匹配r"\w[\w']；若使用generate_from_frequencies生成方式，则忽略此参数

collocations=True,#(bool) 是否包含两个单词的搭配性，默认包含。若使用generate_from_frequencies生成方式，则忽略此参数

colormap=None, #(string or matplotlib colormap)给每个单词随机分配颜色或者使用Matplotlib调色板，默认颜色是”viridis”即翠绿色。若使用了参数color_func，则忽略此项

normalize_plurals=True, #(bool)是否去掉单词末尾的‘s’，默认去掉。若为真，并且单词以‘s’结尾（若以‘ss’结尾则不符合此规则），‘s’会被去除并且去除后的单词出现的频率会被统计。若使用generate_from_frequencies生成方式，则忽略此参数

contour_width=0, #(float)mask轮廓线宽。若mask不为空且此项值大于0，就绘制出mask轮廓 (default=0)

contour_color='black', #(color value) Mask轮廓颜色，默认黑色

repeat=False #（bool）单词是否重复展示，默认不重复

)

方法名	参数	返回值	备注
`fit_words`(frequencies)	frequencies:dict from string to float	self	根据单词及其频率生成词云
`generate_from_frequencies`(frequencies, max_font_size=None)	frequencies:dict from string to floatmax_font_size:int	self
`generate`(text)	text:string	self	根据文本生成词云，是方法generate_from_text的别称。输入的文本应该是一个自然文本。若输入的是已排列好的单词，那么单词会出现两次，可以设置参数collocations=False去除此单词重复。调用process_text和genereate_from_frequences
`generate_from_text`(text)	text:string	self
`process_text`(text)	text:string	words:dict (string, int)	将一长段文本切片成单词，并去除stopwords。返回单词（words）和其出现次数的字典格式
`recolor(random_state=None, color_func=None, colormap=None)`	random_state:RandomState, int, or None, default=Nonecolor_func:function or None, default=Nonecolormap:string or matplotlib colormap, default=None	self
`to_array`()		image:nd-array size (width, height, 3)	转换成numpy array
`to_file(filename)`	filename:string	self	保存图片文件

# -*- coding: utf-8 -*-
from wordcloud import WordCloud
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import jieba

# 打开文本
text = open('xyj.txt').read()

# 中文分词
text = ' '.join(jieba.cut(text))
print(text[:100])

# 生成对象
mask = np.array(Image.open("black_mask.png"))
wc = WordCloud(mask=mask, font_path='Hiragino.ttf', mode='RGBA', background_color=None).generate(text)

# 显示词云
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
plt.show()

# 保存到文件
wc.to_file('wordcloud4.png')