Short writeup of "Unpopular opinions on AI"
“Unpopular opinions on AI” is a podcast episode in which the ZhenFund team shares its thoughts on LLMs at a sharing session in China, in Simplified Chinese, with Peak Ji.
Short writeup of recent challenges found in LLMs (May 2023)
TLDR bullet points by GPT-4, so you don’t need to wait for the plugins to generate them for you.
List of papers about human-aligned conversational bots
Understanding how 2022 CPUs work, as a noob
CPU design is pretty complex in the modern world (M1, Intel's big/little hybrid cores, AMD chiplets) and looks nothing like what I learned in computer architecture back when I was young. In the past few months I read some good articles and wanted to write down some of my understanding.
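Reflections on one year of working in software after studying electrical engineering
Before I knew it, I had spent a full year at a small company. Compared with classmates from my year who went to listed companies or foreign firms, I am the odd one out among the odd ones. Work is just work and there isn't much worth bragging about, but while everyone keeps saying how great it is to jump to a foreign company or a 豬屎屋 (IC design house), I wanted to come out and stir things up a bit.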
Bag of tricks found in "Building MT system for next 1000 languages"
Cover photo generated by dalle-mini with the prompt: “machine translating 1000 different languages”
Graph theory terminology glossary (English to Chinese)
Homogeneity : 同質性
Limits on the number of files you can store in a directory
04/03/2022 Update: Instead of ext4, you should simply consider other file systems such as XFS, ZFS, or Btrfs for storage with very high file counts.
List of machine learning slides I made
This is a collection of slides I made during the past few years for group study or paper presentation sessions.
A survey report on scientific document processing
Cover photo generated by dalle-mini with the prompt: “a robot reading different scientific documents”
Types of vaccines
Vaccines currently on the market:
Designing web search for my personal need
In 2021, I kept coming back to the idea of building a search engine (or tools to organize the information within my bubble). This tool should support semantic retrieval over text.
SPP in GANs
Recently a paper named Characterizing signal propagation to close the performance gap in unnormalized ResNets caught my eye because Brock et al. suggest that by plotting signal propagation plots (SPPs) you can find possible bugs in your network. Since GAN research has largely focused on stabilizing the training process (although in recent years there has been a shift toward more stable methods such as denoising diffusion and VAEs), the possibility of using SPPs as a probing tool to debug GANs is still quite attractive.
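The two core SPP statistics (average channel squared mean and average channel variance of activations, measured block by block at initialization) can be sketched in a few lines of NumPy. This is a toy under stated assumptions, not the paper's code: a hypothetical fully connected residual MLP (ReLU followed by a He-initialized linear layer, no normalization) stands in for a convolutional ResNet, and features stand in for channels.

```python
import numpy as np

def spp_stats(x):
    """Average channel squared mean and average channel variance of activations x.

    x has shape (batch, channels). For conv feature maps you would first reshape
    (N, H, W, C) to (N*H*W, C) so the statistics are taken over N, H and W.
    """
    channel_mean = x.mean(axis=0)
    channel_var = x.var(axis=0)
    return float(np.mean(channel_mean ** 2)), float(np.mean(channel_var))

def toy_resnet_spp(depth=8, width=64, batch=512, seed=0):
    """Collect SPP statistics at the output of each block of a toy unnormalized residual MLP."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((batch, width))
    rows = []
    for _ in range(depth):
        # Hypothetical toy block: ReLU -> He-initialized linear layer
        W = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
        branch = np.maximum(x, 0.0) @ W
        x = x + branch  # identity skip connection, no normalization layer
        sq_mean, var = spp_stats(x)
        rows.append((sq_mean, var, float(branch.var(axis=0).mean())))
    return rows

for block, (sq_mean, var, branch_var) in enumerate(toy_resnet_spp(), start=1):
    print(f"block {block}: sq_mean={sq_mean:.3f} var={var:.3f} branch_var={branch_var:.3f}")
```

Plotting these columns against depth is the "probe": a curve that explodes or collapses where you did not expect it points at something like a wrong initialization scale or a misplaced activation.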
Debugging the DenyHosts whitelist
For several months I had issues connecting from my lab IP to a remote server. For months we thought the issue was caused by the crappy network backbone where the server is hosted.
Slurm TLDR;
As hinted in the title, this is a short intro to Slurm commands.
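Natural language processing terminology glossary (English to Chinese)
Honestly, because I didn't learn English well enough as a kid, I keep running into terms I have to look up every single time when reading NLP papers (sometimes I feel the urge to take linguistics courses in the foreign languages department...). Hence this glossary.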
Not all Chinese are the same
To many non-Chinese speakers, these sentences may look the same.
ICLR 2020 papers in two sentences
This is a list of papers I went through during the International Conference on Learning Representations (ICLR) 2020. I want to thank the ICLR organizers for selecting me as a volunteer.
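Telegram + Moodle assignment reminders
Because iCloud often swallows important notifications or tells me my mailbox is full, and I am also bad at spotting things, I often miss important notifications or deadlines.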
Listing disk usage on a UNIX system
TL;DR: **du -xa / | sort -n -r | head -n 30** lists the top 30 disk hogs (files and directories) on your root filesystem without crossing into other mounted filesystems.
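For machines where you would rather script this, here is a rough Python counterpart of the pipeline above. It is an approximation, not a drop-in replacement: `du` reports disk blocks and also lists directory totals, while this sketch ranks only regular files by apparent size (`st_size`). Like `-x`, it refuses to descend into directories that live on a different filesystem.

```python
import heapq
import os
import stat

def largest_files(root, n=30):
    """Return up to n (size_in_bytes, path) pairs for the largest regular files under root."""
    root_dev = os.lstat(root).st_dev
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Like `du -x`: prune subdirectories that live on a different filesystem
        kept = []
        for d in dirnames:
            try:
                if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev:
                    kept.append(d)
            except OSError:
                pass  # unreadable or vanished directory
        dirnames[:] = kept
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # permission denied or file removed mid-walk
            if stat.S_ISREG(st.st_mode):
                found.append((st.st_size, path))
    return heapq.nlargest(n, found)

for size, path in largest_files("/", n=30):
    print(size, path)
```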
Kill zombie process using GPU memory
I was repeating my experiments and noticed there's a dead process hoarding GPU memory.
Extract high quality corpus from common crawl efficiently using CCNet
How not to do Google Stadia
Google’s Stadia, a cloud gaming service, quickly looked destined to fail when customers who bought the premium launch package received it but could not use the service because they never got an activation code. On top of that, Google already had a bad reputation for refunds [1, 2, 3]. Aside from the activation code issue (likely because Google wants to control compute resource usage), there are still some lessons we should learn from their mistakes.
What is github.com.cnpmjs.org
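Combining PostgreSQL database backups
Recently my database filled up my cloud storage, which made me anxious. Some of the data is basically never read by users, but I can't just delete it because I need it to keep future analyses complete. So I bought a hard drive and plan to move all the old data on the remote servers to my own local machine.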
(Note) Simple Evolutionary Optimization Can Rival Stochastic Gradient Descent in Neural Networks (LEEA)
This paper offers some insight into why a gradient-based optimization algorithm such as SGD works so well, and how an evolutionary algorithm (EA) can perform as well as gradient-based algorithms.
The EA learning method is as follows:
- A population of individuals (models) is generated, and a fitness score is evaluated for every individual.
- Only the top N individuals are selected, and sexual and asexual reproduction are used to generate a new generation.
- Repeat steps 1 and 2 until convergence is reached.
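The loop above can be sketched as a generic EA in NumPy. This is not the paper's exact LEEA; it is a toy under assumptions of mine: each individual is the weight vector of a hypothetical linear model, fitness is negative mean squared error, sexual reproduction is uniform crossover, asexual reproduction is Gaussian mutation, parents survive into the next generation (elitism), and a fixed number of generations replaces the convergence check.

```python
import numpy as np

def fitness(w, X, y):
    """Higher is better: negative mean squared error of a linear model y ≈ X @ w."""
    return -float(np.mean((X @ w - y) ** 2))

def evolve(X, y, pop_size=50, top_n=10, generations=100, sigma=0.1, p_sexual=0.5, seed=0):
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    population = rng.standard_normal((pop_size, dim))  # initial individuals
    best_per_generation = []
    for _ in range(generations):  # fixed budget instead of a convergence test
        # Step 1: evaluate the fitness of every individual
        scores = np.array([fitness(w, X, y) for w in population])
        best_per_generation.append(scores.max())
        # Step 2: keep only the top-N individuals as parents
        parents = population[np.argsort(scores)[-top_n:]]
        children = []
        while len(children) < pop_size - top_n:
            if rng.random() < p_sexual:
                # Sexual reproduction: uniform crossover between two distinct parents
                a, b = parents[rng.choice(top_n, size=2, replace=False)]
                children.append(np.where(rng.random(dim) < 0.5, a, b))
            else:
                # Asexual reproduction: copy one parent and add Gaussian mutation
                parent = parents[rng.integers(top_n)]
                children.append(parent + sigma * rng.standard_normal(dim))
        # Parents survive (elitism), so the best fitness never decreases
        population = np.vstack([parents, np.array(children)])
    return population, best_per_generation

# Hypothetical toy problem: recover w_true = [1, -2, 0.5] from noiseless data
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
population, history = evolve(X, y)
print(history[0], history[-1])
```

No gradient is computed anywhere; selection plus variation alone pushes the population toward better weights.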
SGD has long been said to get stuck in local optima. However, an ANN has so many weight configurations that, if SGD merely settled into a local optimum, it should not reach the state-of-the-art results we see today. The argument given is that there are many paths to escape from an optimum, so the final result is not a (poor) local optimum. Instead, they suggest that the real obstacle for SGD is the saddle point problem, which other gradient-based algorithms such as RMSProp and Adagrad aim to solve.
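Beware of David Publishing academic fraud
Today, while checking my messages, I received an email from a company calling itself David Publishing, hoping to publish one of my previously published papers in their journal. It seemed interesting at first glance, but on closer reading the wording of the email was very strange. For example, a sentence like “we wish to become your friends if we may.” is something only Chinese-speaking culture would say (formal letters generally don't use "friends").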
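3 English podcasts I recommend
A podcast (in Taiwan and Hong Kong[1] simply called “Podcasting”) is a type of digital media: a series of audio, video, electronic radio, or text files published as a list over the Internet, which listeners subscribe to through electronic devices in order to download or stream the files.
- Wikipedia, “播客” (Podcast)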
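Byte Pair Encoding: balancing vocabulary size and encoded information
In natural language processing, the first step is to convert text into some numerical representation. Taking word embeddings as an example, we use a dictionary to map each word to a word vector. However, in the real world, new words that are not in your dictionary always show up, and the only option is to replace them with a symbol that stands for "unknown". In addition, languages have so many words that mapping every word to a vector makes the model too large to run on an ordinary computer (FastText with 3 million words needs 6 GB of memory).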
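BPE tackles both problems by learning a fixed number of subword merges, so rare or unseen words fall back to smaller known pieces instead of a single unknown symbol. A minimal sketch of the classic merge-learning loop follows; the toy corpus is hypothetical, and real implementations add details (such as applying the learned merges to new text) that are omitted here.

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[a, b] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every standalone occurrence of `a b` with the merged symbol `ab`."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(lambda _: merged, word): freq for word, freq in vocab.items()}

def learn_bpe(word_freqs, num_merges):
    """Learn up to num_merges merge rules from a {word: count} dictionary."""
    # Start from single characters, with </w> marking the end of each word
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair; first seen wins ties
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab

# Hypothetical toy corpus
merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=10)
print(merges)
print(vocab)
```

The number of merges is the knob that trades vocabulary size against sequence length: more merges give longer, more word-like units.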
Summarizing MT-DNN, the state-of-the-art model on language tasks before XLNet
The Multi-Task Deep Neural Network (MT-DNN) for natural language understanding, proposed by Microsoft, achieved SOTA results in April on 10 language tasks (GLUE, SNLI, …). The authors propose that multi-task learning, previously used in computer vision to achieve state-of-the-art results, can also improve language understanding scores.
Revisiting Brown clustering, a class-based n-gram model
Projecting words into a high-dimensional space (word2vec) has become standard practice in NLP. The most common algorithms are continuous bag-of-words (CBOW) and skip-gram.
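The difference between the two word2vec variants is easiest to see in the training examples they build from a sentence. This sketch covers only that example construction with a symmetric window (the window size is an assumption), not the embedding training itself.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: skip-gram predicts each context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """(context words, center) examples: CBOW predicts the center word from its context."""
    return [(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window], center)
            for i, center in enumerate(tokens)]

sentence = "the cat sat on the mat".split()
print(skipgram_pairs(sentence, window=1))
print(cbow_examples(sentence, window=1))
```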
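Natural language processing, past to present, Part 2
The second talk was more of a sharing session, so the content was somewhat scattered and I could only do my best to organize it. I think the point was to introduce some newer concepts as an entry point to future research directions and let everyone explore on their own. After all, covering every concept would take quite a long time.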
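Natural language processing, past to present, Part 1
Seung won Hwang, a Korean professor who has worked in natural language processing (NLP) for more than ten years, was invited to NCTU (National Chiao Tung University) to give two talks on NLP. The talks roughly covered recent progress in NLP and the common types of NLP tasks.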
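XLNet: the natural language model that surpasses GPT and BERT
The new 2019 paper XLNet: Generalized Autoregressive Pretraining for Language Understanding shows in its results that it surpasses the best current NLP models such as GPT and BERT. Here I briefly analyze why the XLNet architecture can outperform GPT and BERT.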
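Neural autoregressive density estimator (NADE)
In machine learning, if you want to find the probability density function (PDF) of data, the usual tools are autoencoders, restricted Boltzmann machines (RBMs), and GMMs. A less well-known method is the neural autoregressive density estimator (NADE), which uses autoregression to find the PDF of the data. In experiments NADE proved better than RBMs and autoencoders, notably beating RBMs, which had long been very effective on Bernoulli distributions.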
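A minimal sketch of NADE's forward pass for binary vectors, assuming the standard parameterization h_d = sigmoid(c + W[:, :d] x_<d) and p(x_d = 1 | x_<d) = sigmoid(b_d + V_d · h_d). The names W, V, b, c are my labels, and the weights are random rather than trained. Because each conditional is a proper Bernoulli, the probabilities of all 2^D binary vectors must sum to 1, which makes a handy sanity check.

```python
import itertools
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nade_log_prob(x, W, V, b, c):
    """log p(x) for a binary vector x of length D under NADE.

    W: (H, D) input-to-hidden weights shared by all conditionals
    V: (D, H) hidden-to-output weights, b: (D,) output biases, c: (H,) hidden bias
    """
    a = np.array(c, dtype=float)      # pre-activation for h_1 (no inputs seen yet)
    log_p = 0.0
    for d in range(x.shape[0]):
        h = sigmoid(a)                 # h_d depends only on x_<d
        p = sigmoid(b[d] + V[d] @ h)   # p(x_d = 1 | x_<d)
        log_p += x[d] * np.log(p) + (1 - x[d]) * np.log(1 - p)
        a += W[:, d] * x[d]            # O(H) update that reuses the shared weights
    return float(log_p)

rng = np.random.default_rng(0)
D, H = 4, 3
params = (rng.normal(size=(H, D)), rng.normal(size=(D, H)), rng.normal(size=D), rng.normal(size=H))
# Sanity check: probabilities of all 2^D binary vectors should sum to 1
total = sum(np.exp(nade_log_prob(np.array(bits, dtype=float), *params))
            for bits in itertools.product([0, 1], repeat=D))
print(total)
```

The running pre-activation `a` is the design choice that makes NADE cheap: all D conditionals cost O(DH) in total instead of O(D²H).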
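Conclusions from doing a final project with AlphaZero
The story goes roughly like this: the final for NCTU's Introduction to Artificial Intelligence course was an AI game competition on a game nobody had ever played. To win, my teammate and I went straight for training an AlphaZero neural network. However, new rules introduced midway restricted hardware specs and banned packages (except numpy), so we gave up partway and submitted a heavily modified Monte Carlo tree search (MCTS) instead.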
Notes: R(2+1)D and Mixed-Convolutions for Action Recognition
Using convolutions to efficiently learn image + temporal information: R(2+1)D and Mixed-Convolutions for Action Recognition
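Targeted Dropout: a more effective dropout mechanism
Dropout is a widely used mechanism for preventing neural networks from overfitting and appears in many large neural models. During training, standard dropout randomly masks out part of the network, forcing the model, now with fewer effective parameters, to still learn the target task.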
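Here is a sketch of standard (inverted) dropout next to a targeted variant on weights. The targeted part is my reading of the idea, stated as an assumption: within each output column, the gamma fraction of smallest-magnitude weights become drop candidates, and each candidate is zeroed with probability alpha. The paper's exact granularity and scaling may differ.

```python
import numpy as np

def dropout(x, p, rng, training=True):
    """Standard (inverted) dropout: zero each unit with probability p during training
    and rescale survivors by 1/(1-p) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return x
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)

def targeted_weight_dropout(W, gamma, alpha, rng):
    """Targeted dropout sketch on a weight matrix W of shape (n_in, n_out).

    Per output column, the gamma fraction of weights with the smallest |w| are
    candidates; each candidate is zeroed with probability alpha.
    """
    k = int(gamma * W.shape[0])
    if k == 0:
        return W
    candidates = np.argsort(np.abs(W), axis=0)[:k]   # (k, n_out) row indices
    drop = rng.random(candidates.shape) < alpha
    mask = np.ones(W.shape, dtype=bool)
    mask[candidates, np.arange(W.shape[1])] = ~drop
    return W * mask

rng = np.random.default_rng(0)
h = dropout(np.ones(8), p=0.5, rng=rng)
W_dropped = targeted_weight_dropout(rng.standard_normal((6, 3)), gamma=0.5, alpha=0.5, rng=rng)
print(h)
print(W_dropped)
```

Aiming the drops at low-magnitude weights trains the network to not depend on them, which is what makes later pruning of those weights less damaging.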
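FractalNet: ultra-deep neural networks without residual connections
These days skip connections (residual connections) show up every other week, and researchers who felt this trend shouldn't be encouraged came up with FractalNet. Building on the idea behind skip connections solving the vanishing gradient problem, FractalNet uses a ladder-like design that increases the number of pipelines information can flow through. Because each pipeline passes through relatively few weights, gradients can reach and update the lower-layer weights through pipelines with little resistance.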
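The ladder comes from a simple recursive expansion rule: f_1(x) = conv(x) and f_{C+1}(x) = join(f_C(f_C(x)), conv(x)), where join is an element-wise mean. Below is a minimal NumPy sketch under my own assumptions: "conv" is any callable (here a hypothetical random linear map plus ReLU), adjacent joins are merged into one join over all C columns, and drop-path regularization is left out.

```python
import numpy as np

def fractal_columns(x, C, conv, join):
    """Outputs of the C columns that meet at the last join of fractal block f_C(x)."""
    if C == 1:
        return [conv(x)]
    # First copy of f_{C-1}; its columns are joined before feeding the second copy
    mid = join(fractal_columns(x, C - 1, conv, join))
    # Second copy of f_{C-1}, plus the new shallow column (one conv on the block input)
    return fractal_columns(mid, C - 1, conv, join) + [conv(x)]

def mean_join(cols):
    """Join: element-wise mean of the incoming columns."""
    return np.mean(np.stack(cols), axis=0)

def fractal_block(x, C, conv):
    return mean_join(fractal_columns(x, C, conv, mean_join))

# Hypothetical toy "conv": a random linear map followed by ReLU
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)) / np.sqrt(8)
out = fractal_block(rng.standard_normal((4, 8)), C=3, conv=lambda t: np.maximum(t @ W, 0.0))
print(out.shape)

# Counting convs instead of computing values: with conv = depth + 1 and join = max,
# each entry is the longest conv path ending in that column
depths = fractal_columns(0, 4, conv=lambda d: d + 1, join=max)
print(depths)
```

The depth count shows the point of the design: the deepest path has 2^(C-1) convs while the shallowest has just one, so gradients always have a short route back to the block input without any identity shortcut.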
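Hands-on comparison of micro-frameworks
Over my college years I have built up a number of web side projects. But considering development, maintenance, and compute costs, the do-everything Django may not be the best choice. Django's architecture is quite complex and its maintenance cost is relatively high, which makes it a poor fit for simple, low-importance side projects (90% of projects are just CRUD anyway).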
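Marie Kondo: the life-changing magic of tidying up code
I wonder if anyone has been watching Tidying Up with Marie Kondo on Netflix. I find Marie Kondo's tone of voice super soothing.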
Firebase Android Build Exception
Recently, while eagerly adding Firebase to my project to integrate NoSQL data, I finished setting up Gradle, and clicking Sync Gradle threw an Exception Error....@@
Ansible Introduction
Recently I got my hands on a brand new Orange Pi. However, I found myself stuck typing the same boring commands in every session. Hence, I believe this is a good opportunity to use Ansible for better automation across all my servers.
SugarORM Traps To Avoid
Skip Java Support In ARMv6
Recently I spent a few days setting up Hadoop (what is Hadoop?) on my three computers; one of the datanodes is a first-generation Raspberry Pi.
Passing Object Between Activity Using Gson
So you want to pass data between activities, but what you are passing is a list of custom objects that can't be passed directly with the putExtra function.
Make Django Great Again Part 4 Restful Framework
Make Django Great Again Part 3 Sass Compiler
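Sass is a more advanced way of writing CSS, with syntax for nesting, loops, variables, operations, functions, and inheritable mixins. However, browsers can't interpret it directly, so a "compiler" is needed to convert it into CSS before it can be used.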
Make Django Development Great Again Part 2 Live Reload
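When I developed Django in PyCharm, I used a plugin that automatically refreshes the browser after any file changes. However, it only works with Chrome (because it relies on a Chrome extension....). That's not very convenient for me, since I'm used to Safari's Responsive Design Mode (cmd + option + R).....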
Make Django Development Great Again Part 1
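In 2017, as a Django fan, I have mixed feelings. I've tried various front-end frameworks like React and Angular, but only by following tutorials on NodeJS. Deep down I can't help feeling a little guilty about betraying Django.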