
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.



Understanding how 2022 CPU works as noob

11 minute read


CPU design is pretty complex in the modern world (M1, Intel big-little, AMD chiplets) and looks nothing like what I learned in computer architecture back when I was young. Over the past few months I read some good articles and wanted to write down some of my understanding.


1 minute read


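Before I knew it, I had spent a full year at a small company. Compared with my classmates who went to listed companies or foreign firms, I am an outlier among outliers. A job is just a job and there is not much worth bragging about, but while everyone keeps saying how great it is to jump to a foreign company or an IC design house, I wanted to come out and stir things up a little.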

Limits of files you can store in a directory

3 minute read


04/03/2022 Update: Instead of using ext4, you should simply consider other file systems such as XFS, ZFS, or Btrfs for storing a high number of files.

Designing web search for my personal need

6 minute read


In 2021, I kept coming back to the idea of building a search engine (or tools to organize the information within my bubble). This tool should support semantic retrieval over text.

Spp in gans

1 minute read


Recently a paper titled "Characterizing signal propagation to close the performance gap in unnormalized ResNets" caught my eye, because Brock et al. suggest that by plotting signal propagation plots (SPPs) you can debug possible bugs in your networks. Since GAN research has largely focused on stabilizing the training process (although in recent years there has been a shift to more stable methods such as denoising diffusion and VAE methods), the possibility of using SPPs as a probing tool to debug GANs is still quite attractive.

Debugging deny hosts whitelist

1 minute read


For several months I had issues connecting from my lab IP to a remote server. For months we thought the issue was caused by the crappy network backbone where we placed the server.

Slurm TLDR;

1 minute read


As hinted in the title, this is a short intro to Slurm commands.


1 minute read


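Honestly, because I did not learn English well enough as a kid, I keep running into technical terms that I have to look up every time I read natural language processing papers (sometimes I even get the urge to go take linguistics in the foreign languages department...). Hence this glossary of technical terms.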

ICLR 2020 papers in two sentences

3 minute read


This is a list of papers I went through during the International Conference on Learning Representations (ICLR) 2020. I want to thank the ICLR organizers for selecting me as a volunteer.

Telegram + Moodle assignment reminders

less than 1 minute read


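Because iCloud often swallows important notifications or reports a full mailbox, and because I am pretty careless at reading, I often miss important notifications or deadlines.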

How not to do Google Stadia

3 minute read


Google’s Stadia, a cloud gaming service, looked quickly destined to fail when customers of the premium package received their package but could not use the service because they had not received the activation code. This comes on top of Google’s previously bad refund reputation [1, 2, 3]. Aside from the activation code arrival issue (likely because Google wants to control compute resource usage), there are still some lessons we should learn from their mistakes.

Integrating PostgreSQL database backups

less than 1 minute read



(Note) Simple Evolutionary Optimization Can Rival Stochastic Gradient Descent in Neural Networks (LEEA)

1 minute read


This paper offers some insight into why a gradient-based optimization algorithm such as SGD works so well and how an evolutionary algorithm (EA) can perform as well as gradient-based algorithms.

The EA learning method is as follows:

  1. A population of individual models is generated, and a fitness score is evaluated for every individual.

  2. Only the top N individuals are selected, and sexual and asexual reproduction are used to generate a new generation.

  3. Repeat steps 1 and 2 until convergence is reached.
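
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the algorithm from the paper: the function name `evolve`, the uniform crossover, the Gaussian mutation, the 50/50 split between sexual and asexual reproduction, and the fixed generation count (used here in place of a convergence check) are all my own assumptions.

```python
import random


def evolve(fitness, dim, pop_size=30, top_n=5, generations=50, sigma=0.1, seed=0):
    """Toy evolutionary loop (hypothetical helper, not the paper's implementation)."""
    rng = random.Random(seed)
    # Initial population: each individual is a flat list of weights.
    population = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    # Step 3: repeat steps 1 and 2 (assumption: fixed budget instead of a convergence test).
    for _ in range(generations):
        # Step 1: evaluate the fitness of every individual (higher is better).
        ranked = sorted(population, key=fitness, reverse=True)
        # Step 2: keep only the top N individuals as parents.
        parents = ranked[:top_n]
        children = []
        while len(children) < pop_size - top_n:
            if rng.random() < 0.5:
                # Sexual reproduction: uniform crossover between two distinct parents.
                a, b = rng.sample(parents, 2)
                child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            else:
                # Asexual reproduction: a mutated copy of one parent.
                child = [x + rng.gauss(0, sigma) for x in rng.choice(parents)]
            children.append(child)
        # Parents survive unchanged, so the best fitness never gets worse.
        population = parents + children
    return max(population, key=fitness)


def fitness(weights):
    # Toy objective: the closer every weight is to 0.5, the higher the score.
    return -sum((w - 0.5) ** 2 for w in weights)


best = evolve(fitness, dim=3)
```

Keeping the parents in the next generation is a design choice of this sketch; it makes the best score monotone, which is convenient for a toy example.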

SGD is often said to find only a locally optimal solution. However, an ANN model has so many possible weight configurations that merely finding a local optimum should not produce the state-of-the-art results seen today. The argument given is that there are many paths for escaping a local optimum, so the final result is not a local optimum. The authors suggest instead that the real culprit for SGD is the saddle point problem, which other gradient-based algorithms such as RMSProp and Adagrad aim to solve.

Beware of David Publishing academic fraud

less than 1 minute read


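While checking my messages today, I received an email from a company calling itself David Publishing, hoping to publish one of my previously published papers in their journal. It looked interesting at first glance, but on closer inspection the wording of the email was very odd. For example, the sentence “we wish to become your friends if we may.” is something only someone from a Chinese-speaking culture would write (formal correspondence generally does not use "friends").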

3 English podcast channels I recommend

less than 1 minute read



  • Wikipedia podcast

Byte Pair Encoding: balancing corpus vocabulary size and encoded information

1 minute read


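In natural language text processing, the first step is to convert text into some numerical representation. Take word embeddings as an example: we use a dictionary to map each word to a word vector. In the real world, however, new words that are not in your dictionary always show up, and the only option is to replace them with a symbol that stands for "unknown". In addition, a language has too many words. If every word is mapped to a word vector, the model becomes too large to run on an ordinary computer (FastText with 3 million words needs 6 GB of memory).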

Summarizing MT-DNN, the state-of-the-art model in language tasks before XLNet

1 minute read


The Multi-Task Deep Neural Network (MT-DNN) for deep language understanding, proposed by Microsoft, achieved SOTA results in April on 10 language tasks (GLUE, SNLI, …). The authors propose that multi-task learning, which was previously used in the computer vision domain to achieve state-of-the-art results, can also be used to improve language understanding scores.

Natural language from past to present, Part 2

1 minute read



Neural autoregressive density estimator

less than 1 minute read


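In machine learning, to find the probability density function (PDF) of data, people generally use autoencoders, restricted Boltzmann machines (RBMs), or GMMs. A lesser-known method, however, is the neural autoregressive density estimator (NADE), which uses autoregression to find the PDF of the data. In experiments, NADE has been shown to outperform RBMs and autoencoders, and in particular it surpasses the RBM, which has long been very effective on Bernoulli distributions.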

Conclusions from doing a final project with Alpha Zero

less than 1 minute read


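The story goes roughly like this: the final for the Introduction to Artificial Intelligence course at National Chiao Tung University is an AI game competition, played on a game nobody has played before. To win, my teammates and I trained an Alpha Zero neural network directly. Midway through, however, new rules appeared that restricted hardware specs and banned packages (except numpy), so we gave up on that approach and submitted a heavily modified Monte Carlo tree search (MCTS) as our final project instead.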

Targeted Dropout: a more effective dropout mechanism

less than 1 minute read


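Dropout is a widely used mechanism for preventing neural networks from overfitting, and it shows up in many large neural models. During training, standard dropout randomly masks part of the network, forcing the model to learn the target task with fewer parameters.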
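
As a rough illustration of standard dropout (not Targeted Dropout), here is a minimal sketch in plain Python. The function name `dropout` is hypothetical, and rescaling the kept units by 1 / (1 - p), the common "inverted dropout" convention, is my own assumption that the text above does not mention.

```python
import random


def dropout(activations, p=0.5, training=True, rng=None):
    """Randomly zero each activation with probability p during training.

    Assumption: kept units are rescaled by 1 / (1 - p) so the expected
    value of each activation is unchanged. Requires 0 <= p < 1.
    """
    if not training or p == 0.0:
        # At inference time nothing is masked.
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


hidden = [1.0, 2.0, 3.0, 4.0]
masked = dropout(hidden, p=0.5, rng=random.Random(0))
unchanged = dropout(hidden, p=0.5, training=False)
```

With p = 0.5, every entry of `masked` is either 0.0 or twice the original value, while `unchanged` is a plain copy of the input.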

FractalNet: ultra-deep neural networks without residual connections

less than 1 minute read


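These days a new skip connection (residual connection) pops up every now and then, and researchers who felt this trend should not continue came up with FractalNet. Building on the idea that skip connections solve the vanishing gradient problem, FractalNet uses a ladder-like design to add pipelines through which information can flow. Because each pipeline passes through fewer weights, gradients can update the lower-level weights through the pipelines that offer less resistance.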


1 minute read


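Over my university years I have accumulated a number of web side projects. Considering development, maintenance, and compute costs, however, the all-purpose Django may not be the best choice. Django's architecture is too complex and its maintenance cost is correspondingly high, so it is not a great fit for simple, low-stakes side projects (aren't 90% of projects just CRUD anyway?).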

Firebase Android Build Exception

less than 1 minute read


Recently I was enthusiastically adding Firebase to a project to integrate NoSQL data. After setting up Gradle, clicking Sync Gradle produced an Exception error... @@

Ansible Introduction

1 minute read


Recently I got my hands on a brand new Orange Pi. However, I found myself stuck typing the same boring commands across all sessions. Hence, I believe this serves as a good opportunity to use Ansible for better automation across all servers.

Passing Object Between Activity Using Gson

1 minute read


So you wanna pass data between different activities, but you are passing a custom object list, which cannot be passed using the putExtra function.

Make Django Great Again Part 3 Sass Compiler

less than 1 minute read


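Sass is an advanced way of writing CSS, with syntax for nesting and loops, variables, arithmetic, functions, and inheritance (mixins). However, browsers cannot interpret it directly, so a "compiler" is needed to convert it to CSS before it can be used.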

Make Django Development Great Again Part 2 Live Reload

less than 1 minute read


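Previously, when developing Django in PyCharm, there was a plugin that automatically refreshed the browser after any file change. However, this feature was limited to Chrome (because it only supported a Chrome extension...). For someone like me who is used to Safari's Responsive Mode (cmd + option + R), that was not very convenient...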

Make Django Development Great Again Part 1

1 minute read


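In 2017, as a Django fan, I had mixed feelings. Although I had touched various front-end frameworks such as React and Angular, I had only written code by following tutorials on NodeJS. Deep down I could not help feeling a little guilty for betraying Django.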


Character-preserving coherent story visualization

Published in ECCV, 2020

Story visualization aims at generating a sequence of images to narrate each sentence in a multi-sentence story.

Recommended citation: Song, Y.Z., Tam, Z.R., Chen, H.J., Lu, H.H., & Shuai, H.H. (2020). "Character-preserving coherent story visualization." European Conference on Computer Vision, 18-33.
Download Paper

Gradient normalization for generative adversarial networks

Published in ICCV, 2021

In this paper, we propose a novel normalization method called gradient normalization (GN) to tackle the training instability of Generative Adversarial Networks (GANs) caused by the sharp gradient space.

Recommended citation: Wu, Y.L., Shuai, H.H., Tam, Z.R., & Chiu, H.Y. (2021). "Gradient normalization for generative adversarial networks." ICCV, 6373-6382.
Download Paper

Improving entity disambiguation using knowledge graph regularization

Published in PAKDD, 2022

Entity disambiguation plays the role of bridging between words of interest from an input text document and unique entities in a target Knowledge Base (KB).

Recommended citation: Tam, Z.R., Wu, Y.L., & Shuai, H.H. (2022). "Improving entity disambiguation using knowledge graph regularization." Pacific-Asia Conference on Knowledge Discovery and Data Mining, 341-353.
Download Paper

Openassistant conversations-democratizing large language model alignment

Published in NeurIPS, 2024

Aligning large language models (LLMs) with human preferences has proven to drastically improve usability and has driven rapid adoption as demonstrated by ChatGPT.

Recommended citation: Köpf, A., Kilcher, Y., von Rütte, D., Anagnostidis, S., Tam, Z.R., et al. (2024). "Openassistant conversations-democratizing large language model alignment." NeurIPS, 36.
Download Paper

An improved traditional chinese evaluation suite for foundation model

Published in arXiv, 2024

We present TMMLU+, a new benchmark designed for Traditional Chinese language understanding. TMMLU+ is a multi-choice question-answering dataset with 66 subjects from elementary to professional level. It is six times larger and boasts a more balanced subject distribution than its predecessor, Taiwan Massive Multitask Language Understanding (TMMLU).

Recommended citation: Tam, Z.R., Pai, Y.T., Lee, Y.W., Chen, J.D., Chu, W.M., Cheng, S., & Shuai, H.H. (2024). "An improved traditional chinese evaluation suite for foundation model." arXiv preprint arXiv:2403.01858.
Download Paper

Personalized EDM Subject Generation via Co-factored User-Subject Embedding

Published in PAKDD, 2024

This paper introduces the Co-Factored User-Subject Embedding based Personalized EDM Subject Generation Framework (COUPES), a model for creating personalized Electronic Direct Mail (EDM) subjects.

Recommended citation: Chen, Y.H., Tam, Z.R., & Shuai, H.H. (2024). "Personalized EDM Subject Generation via Co-factored User-Subject Embedding." Pacific-Asia Conference on Knowledge Discovery and Data Mining, 55-67.
Download Paper

I Need Help! Evaluating LLM’s Ability to Ask for Users’ Support: A Case Study on Text-to-SQL Generation

Published in EMNLP 2024 Main Track, 2024

This study explores the proactive ability of LLMs to seek user support. We propose metrics to evaluate the trade-off between performance improvements and user burden, and investigate whether LLMs can determine when to request help under varying information availability.

Recommended citation: Wu, C.K., Tam, Z.R., Wu, C.C., Lin, C.Y., Lee, H., & Chen, Y.N. (2024). "I Need Help! Evaluating LLM's Ability to Ask for Users' Support: A Case Study on Text-to-SQL Generation." arXiv preprint arXiv:2407.14767.
Download Paper

Let Me Speak Freely? A Study On The Impact Of Format Restrictions On Large Language Model Performance

Published in EMNLP 2024 Industry Track, 2024

Structured generation, the process of producing content in standardized formats like JSON and XML, is widely utilized in real-world applications to extract key output information from large language models (LLMs).

Recommended citation: Tam, Z.R., Wu, C.K., Tsai, Y.L., Lin, C.Y., Lee, H., & Chen, Y.N. (2024). "Let Me Speak Freely? A Study On The Impact Of Format Restrictions On Large Language Model Performance." EMNLP Industry Track, 1218-1236.
Download Paper

Clear Minds Think Alike: What Makes LLM Fine-tuning Robust? A Study of Token Perplexity

Published in arXiv, 2025

A systematic analysis revealing that fine-tuning with LLM-generated data not only improves target task performance but also reduces out-of-domain degradation compared to fine-tuning with ground truth data, along with ways to mitigate that degradation.

Recommended citation: Wu, C.C., Tam, Z.R., Lin, C.Y., Lee, H.Y., & Chen, Y.N. (2025). "Clear Minds Think Alike: What Makes LLM Fine-tuning Robust? A Study of Token Perplexity." arXiv preprint arXiv:2501.14315.
Download Paper

Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models

Published in arXiv, 2025

This study formalizes the task of risk-aware decision making in LLMs, explores how models adapt their decisions to different risk levels, and proposes skill decomposition solutions to improve performance. The findings show that even advanced LMs require explicit prompt chaining to handle risk-aware decision making effectively.

Recommended citation: Wu, C.K., Tam, Z.R., Lin, C.Y., Chen, Y.N., & Lee, H. (2025). "Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models." arXiv preprint arXiv:2503.01332.
Download Paper

None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering

Published in arXiv preprint, 2025

This study examines how “None of the Above” (NA) options affect LLM performance on multiple-choice questions. Results reveal a consistent 30-50% performance drop when NA is the correct answer, with domain dependency showing minimal impact on math reasoning but severe effects on uncertainty handling tasks like business ethics.

Recommended citation: Tam, Z.R., Wu, C.K., & Chen, Y.N. (2025). "None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering." arXiv preprint arXiv:2503.01550.
Download Paper
