
π· The Fill-Mask Association Test (ζ©η ε‘«η©Ίθη³»ζ΅ιͺ).
The Fill-Mask Association Test (FMAT) is an integrative and probability-based method using BERT Models to measure conceptual associations (e.g., attitudes, biases, stereotypes, social norms, cultural values) as propositions in natural language (Bao, 2024, JPSP).
β οΈ Please update this package to version β₯ 2025.4 for faster and more robust functionality.


Bruce H. W. S. Bao ε ε―ε΄ι
π¬ baohws@foxmail.com
π psychbruce.github.io
The R package FMAT and three Python packages
(transformers, torch,
huggingface-hub) all need to be installed.
## Method 1: Install from CRAN
install.packages("FMAT")
## Method 2: Install from GitHub
install.packages("devtools")
devtools::install_github("psychbruce/FMAT", force=TRUE)Install Anaconda (a recommended package manager that automatically installs Python, its IDEs like Spyder, and a large list of common Python packages).
Specify the Anacondaβs Python interpreter in RStudio.
RStudio β Tools β Global/Project Options
β Python β Select β Conda Environments
β Choose ββ¦/Anaconda3/python.exeβ
Install specific versions of Python packages βtransformersβ,
βtorchβ, and βhuggingface-hubβ.
(RStudio Terminal / Anaconda Prompt / Windows Command)
For CPU users:
pip install transformers==4.40.2 torch==2.2.1 huggingface-hub==0.20.3For GPU (CUDA) users:
pip install transformers==4.40.2 huggingface-hub==0.20.3
pip install torch==2.2.1 --index-url https://download.pytorch.org/whl/cu121To use some models (e.g., microsoft/deberta-v3-base),
βYou need to have sentencepiece installed to convert a slow tokenizer to
a fast oneβ:
pip install sentencepieceHTTPSConnectionPool(host='huggingface.co', port=443),
please try to (1) reinstall Anaconda so that
some unknown issues may be fixed, or (2) downgrade the βurllib3β package to version
β€ 1.25.11 (pip install urllib3==1.25.11) so that it will
use HTTP proxies (rather than HTTPS proxies as in later versions) to
connect to Hugging Face.Use BERT_download() to download BERT models. Model files are saved in your local
cache folder β%USERPROFILE%/.cache/huggingfaceβ. A full list of BERT
models are available at Hugging
Face.
Use BERT_info() and BERT_vocab() to obtain
detailed information of BERT models.
Design queries that conceptually represent the constructs you would measure (see Bao, 2024, JPSP for how to design queries).
Use FMAT_query() and/or FMAT_query_bind()
to prepare a data.table of queries.
Use FMAT_run() to get raw data (probability estimates)
for further analysis.
Several steps of preprocessing have been included in the function for
easier use (see FMAT_run() for details).
<mask> rather than
[MASK] as the mask token, the input query will be
automatically modified so that users can always use
[MASK] in query design.\u0120 and \u2581 will be
automatically added to match the whole words (rather than
subwords) for [MASK].By default, the FMAT package uses CPU to enable the
functionality for all users. But for advanced users who want to
accelerate the pipeline with GPU, the FMAT_run() function
now supports using a GPU device, about 3x faster than
CPU.
Test results (on the developerβs computer, depending on BERT model size):
Checklist:
torch package) with CUDA
support.
torch without CUDA
support, please first uninstall it (command:
pip uninstall torch) and then install the suggested
one.torch version supporting CUDA 12.1, the same
version of CUDA
Toolkit 12.1 may also be installed).Example code for installing PyTorch with CUDA support:
(RStudio Terminal / Anaconda Prompt / Windows Command)
pip install torch==2.2.1 --index-url https://download.pytorch.org/whl/cu121The reliability and validity of the following 12 BERT models in the FMAT have been established in our research, but future work is needed to examine the performance of other models.
(model name on Hugging Face - model file size)
For details about BERT, see:
library(FMAT)
models = c(
  "bert-base-uncased",
  "bert-base-cased",
  "bert-large-uncased",
  "bert-large-cased",
  "distilbert-base-uncased",
  "distilbert-base-cased",
  "albert-base-v1",
  "albert-base-v2",
  "roberta-base",
  "distilroberta-base",
  "vinai/bertweet-base",
  "vinai/bertweet-large"
)
BERT_download(models)βΉ Device Info:
R Packages:
FMAT          2024.5
reticulate    1.36.1
Python Packages:
transformers  4.40.2
torch         2.2.1+cu121
NVIDIA GPU CUDA Support:
CUDA Enabled: TRUE
CUDA Version: 12.1
GPU (Device): NVIDIA GeForce RTX 2050
ββ Downloading model "bert-base-uncased" ββββββββββββββββββββββββββββββββββββββββββ
β (1) Downloading configuration...
config.json: 100%|ββββββββββ| 570/570 [00:00<00:00, 114kB/s]
β (2) Downloading tokenizer...
tokenizer_config.json: 100%|ββββββββββ| 48.0/48.0 [00:00<00:00, 23.9kB/s]
vocab.txt: 100%|ββββββββββ| 232k/232k [00:00<00:00, 1.50MB/s]
tokenizer.json: 100%|ββββββββββ| 466k/466k [00:00<00:00, 1.98MB/s]
β (3) Downloading model...
model.safetensors: 100%|ββββββββββ| 440M/440M [00:36<00:00, 12.1MB/s] 
β Successfully downloaded model "bert-base-uncased"
ββ Downloading model "bert-base-cased" ββββββββββββββββββββββββββββββββββββββββββββ
β (1) Downloading configuration...
config.json: 100%|ββββββββββ| 570/570 [00:00<00:00, 63.3kB/s]
β (2) Downloading tokenizer...
tokenizer_config.json: 100%|ββββββββββ| 49.0/49.0 [00:00<00:00, 8.66kB/s]
vocab.txt: 100%|ββββββββββ| 213k/213k [00:00<00:00, 1.39MB/s]
tokenizer.json: 100%|ββββββββββ| 436k/436k [00:00<00:00, 10.1MB/s]
β (3) Downloading model...
model.safetensors: 100%|ββββββββββ| 436M/436M [00:37<00:00, 11.6MB/s] 
β Successfully downloaded model "bert-base-cased"
ββ Downloading model "bert-large-uncased" βββββββββββββββββββββββββββββββββββββββββ
β (1) Downloading configuration...
config.json: 100%|ββββββββββ| 571/571 [00:00<00:00, 268kB/s]
β (2) Downloading tokenizer...
tokenizer_config.json: 100%|ββββββββββ| 48.0/48.0 [00:00<00:00, 12.0kB/s]
vocab.txt: 100%|ββββββββββ| 232k/232k [00:00<00:00, 1.50MB/s]
tokenizer.json: 100%|ββββββββββ| 466k/466k [00:00<00:00, 1.99MB/s]
β (3) Downloading model...
model.safetensors: 100%|ββββββββββ| 1.34G/1.34G [01:36<00:00, 14.0MB/s]
β Successfully downloaded model "bert-large-uncased"
ββ Downloading model "bert-large-cased" βββββββββββββββββββββββββββββββββββββββββββ
β (1) Downloading configuration...
config.json: 100%|ββββββββββ| 762/762 [00:00<00:00, 125kB/s]
β (2) Downloading tokenizer...
tokenizer_config.json: 100%|ββββββββββ| 49.0/49.0 [00:00<00:00, 12.3kB/s]
vocab.txt: 100%|ββββββββββ| 213k/213k [00:00<00:00, 1.41MB/s]
tokenizer.json: 100%|ββββββββββ| 436k/436k [00:00<00:00, 5.39MB/s]
β (3) Downloading model...
model.safetensors: 100%|ββββββββββ| 1.34G/1.34G [01:35<00:00, 14.0MB/s]
β Successfully downloaded model "bert-large-cased"
ββ Downloading model "distilbert-base-uncased" ββββββββββββββββββββββββββββββββββββ
β (1) Downloading configuration...
config.json: 100%|ββββββββββ| 483/483 [00:00<00:00, 161kB/s]
β (2) Downloading tokenizer...
tokenizer_config.json: 100%|ββββββββββ| 48.0/48.0 [00:00<00:00, 9.46kB/s]
vocab.txt: 100%|ββββββββββ| 232k/232k [00:00<00:00, 16.5MB/s]
tokenizer.json: 100%|ββββββββββ| 466k/466k [00:00<00:00, 14.8MB/s]
β (3) Downloading model...
model.safetensors: 100%|ββββββββββ| 268M/268M [00:19<00:00, 13.5MB/s] 
β Successfully downloaded model "distilbert-base-uncased"
ββ Downloading model "distilbert-base-cased" ββββββββββββββββββββββββββββββββββββββ
β (1) Downloading configuration...
config.json: 100%|ββββββββββ| 465/465 [00:00<00:00, 233kB/s]
β (2) Downloading tokenizer...
tokenizer_config.json: 100%|ββββββββββ| 49.0/49.0 [00:00<00:00, 9.80kB/s]
vocab.txt: 100%|ββββββββββ| 213k/213k [00:00<00:00, 1.39MB/s]
tokenizer.json: 100%|ββββββββββ| 436k/436k [00:00<00:00, 8.70MB/s]
β (3) Downloading model...
model.safetensors: 100%|ββββββββββ| 263M/263M [00:24<00:00, 10.9MB/s] 
β Successfully downloaded model "distilbert-base-cased"
ββ Downloading model "albert-base-v1" βββββββββββββββββββββββββββββββββββββββββββββ
β (1) Downloading configuration...
config.json: 100%|ββββββββββ| 684/684 [00:00<00:00, 137kB/s]
β (2) Downloading tokenizer...
tokenizer_config.json: 100%|ββββββββββ| 25.0/25.0 [00:00<00:00, 3.57kB/s]
spiece.model: 100%|ββββββββββ| 760k/760k [00:00<00:00, 4.93MB/s]
tokenizer.json: 100%|ββββββββββ| 1.31M/1.31M [00:00<00:00, 13.4MB/s]
β (3) Downloading model...
model.safetensors: 100%|ββββββββββ| 47.4M/47.4M [00:03<00:00, 13.4MB/s]
β Successfully downloaded model "albert-base-v1"
ββ Downloading model "albert-base-v2" βββββββββββββββββββββββββββββββββββββββββββββ
β (1) Downloading configuration...
config.json: 100%|ββββββββββ| 684/684 [00:00<00:00, 137kB/s]
β (2) Downloading tokenizer...
tokenizer_config.json: 100%|ββββββββββ| 25.0/25.0 [00:00<00:00, 4.17kB/s]
spiece.model: 100%|ββββββββββ| 760k/760k [00:00<00:00, 5.10MB/s]
tokenizer.json: 100%|ββββββββββ| 1.31M/1.31M [00:00<00:00, 6.93MB/s]
β (3) Downloading model...
model.safetensors: 100%|ββββββββββ| 47.4M/47.4M [00:03<00:00, 13.8MB/s]
β Successfully downloaded model "albert-base-v2"
ββ Downloading model "roberta-base" βββββββββββββββββββββββββββββββββββββββββββββββ
β (1) Downloading configuration...
config.json: 100%|ββββββββββ| 481/481 [00:00<00:00, 80.3kB/s]
β (2) Downloading tokenizer...
tokenizer_config.json: 100%|ββββββββββ| 25.0/25.0 [00:00<00:00, 6.25kB/s]
vocab.json: 100%|ββββββββββ| 899k/899k [00:00<00:00, 2.72MB/s]
merges.txt: 100%|ββββββββββ| 456k/456k [00:00<00:00, 8.22MB/s]
tokenizer.json: 100%|ββββββββββ| 1.36M/1.36M [00:00<00:00, 8.56MB/s]
β (3) Downloading model...
model.safetensors: 100%|ββββββββββ| 499M/499M [00:38<00:00, 12.9MB/s] 
β Successfully downloaded model "roberta-base"
ββ Downloading model "distilroberta-base" βββββββββββββββββββββββββββββββββββββββββ
β (1) Downloading configuration...
config.json: 100%|ββββββββββ| 480/480 [00:00<00:00, 96.4kB/s]
β (2) Downloading tokenizer...
tokenizer_config.json: 100%|ββββββββββ| 25.0/25.0 [00:00<00:00, 12.0kB/s]
vocab.json: 100%|ββββββββββ| 899k/899k [00:00<00:00, 6.59MB/s]
merges.txt: 100%|ββββββββββ| 456k/456k [00:00<00:00, 9.46MB/s]
tokenizer.json: 100%|ββββββββββ| 1.36M/1.36M [00:00<00:00, 11.5MB/s]
β (3) Downloading model...
model.safetensors: 100%|ββββββββββ| 331M/331M [00:25<00:00, 13.0MB/s] 
β Successfully downloaded model "distilroberta-base"
ββ Downloading model "vinai/bertweet-base" ββββββββββββββββββββββββββββββββββββββββ
β (1) Downloading configuration...
config.json: 100%|ββββββββββ| 558/558 [00:00<00:00, 187kB/s]
β (2) Downloading tokenizer...
vocab.txt: 100%|ββββββββββ| 843k/843k [00:00<00:00, 7.44MB/s]
bpe.codes: 100%|ββββββββββ| 1.08M/1.08M [00:00<00:00, 7.01MB/s]
tokenizer.json: 100%|ββββββββββ| 2.91M/2.91M [00:00<00:00, 9.10MB/s]
β (3) Downloading model...
pytorch_model.bin: 100%|ββββββββββ| 543M/543M [00:48<00:00, 11.1MB/s] 
β Successfully downloaded model "vinai/bertweet-base"
ββ Downloading model "vinai/bertweet-large" βββββββββββββββββββββββββββββββββββββββ
β (1) Downloading configuration...
config.json: 100%|ββββββββββ| 614/614 [00:00<00:00, 120kB/s]
β (2) Downloading tokenizer...
vocab.json: 100%|ββββββββββ| 899k/899k [00:00<00:00, 5.90MB/s]
merges.txt: 100%|ββββββββββ| 456k/456k [00:00<00:00, 7.30MB/s]
tokenizer.json: 100%|ββββββββββ| 1.36M/1.36M [00:00<00:00, 8.31MB/s]
β (3) Downloading model...
pytorch_model.bin: 100%|ββββββββββ| 1.42G/1.42G [02:29<00:00, 9.53MB/s]
β Successfully downloaded model "vinai/bertweet-large"
ββ Downloaded models: ββ
                           size
albert-base-v1            45 MB
albert-base-v2            45 MB
bert-base-cased          416 MB
bert-base-uncased        420 MB
bert-large-cased        1277 MB
bert-large-uncased      1283 MB
distilbert-base-cased    251 MB
distilbert-base-uncased  256 MB
distilroberta-base       316 MB
roberta-base             476 MB
vinai/bertweet-base      517 MB
vinai/bertweet-large    1356 MB
β Downloaded models saved at C:/Users/Bruce/.cache/huggingface/hub (6.52 GB)BERT_info(models)                      model   size vocab  dims   mask
                     <fctr> <char> <int> <int> <char>
 1:       bert-base-uncased  420MB 30522   768 [MASK]
 2:         bert-base-cased  416MB 28996   768 [MASK]
 3:      bert-large-uncased 1283MB 30522  1024 [MASK]
 4:        bert-large-cased 1277MB 28996  1024 [MASK]
 5: distilbert-base-uncased  256MB 30522   768 [MASK]
 6:   distilbert-base-cased  251MB 28996   768 [MASK]
 7:          albert-base-v1   45MB 30000   128 [MASK]
 8:          albert-base-v2   45MB 30000   128 [MASK]
 9:            roberta-base  476MB 50265   768 <mask>
10:      distilroberta-base  316MB 50265   768 <mask>
11:     vinai/bertweet-base  517MB 64001   768 <mask>
12:    vinai/bertweet-large 1356MB 50265  1024 <mask>(Tested 2024-05-16 on the developerβs computer: HP Probook 450 G10 Notebook PC)
While the FMAT is an innovative method for the computational intelligent analysis of psychology and society, you may also seek for an integrative toolbox for other text-analytic methods. Another R package I developedβPsychWordVecβis useful and user-friendly for word embedding analysis (e.g., the Word Embedding Association Test, WEAT). Please refer to its documentation and feel free to use it.