Fun facts
-
Leaving academia to apply my knowledge to real-world applications
- Focused on NLP, data mining and graphs
-
Current side-projects
-
Unisearch: a vector-based search engine demo. The underlying neural network is based on CLIP but trained on title-content and text-image pair datasets. The text encoder supports 9 languages (English, Chinese, Spanish, Italian, Japanese, Korean, Vietnamese, German, French). Image search works well, but text search still needs some fine-tuning.
-
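To show the basic idea behind vector-based search, here is a minimal sketch of brute-force cosine-similarity retrieval. It assumes every document and query has already been embedded by CLIP-style encoders; the toy 3-dimensional vectors stand in for real embeddings, and `VectorIndex` and `normalize` are hypothetical names, not Unisearch's actual code.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class VectorIndex:
    """Hypothetical brute-force index: stores unit vectors, ranks by cosine similarity."""

    def __init__(self) -> None:
        self.ids: List[str] = []
        self.vectors: List[List[float]] = []

    def add(self, doc_id: str, embedding: Sequence[float]) -> None:
        # Normalize once at insert time so each search is just a dot product.
        self.ids.append(doc_id)
        self.vectors.append(normalize(embedding))

    def search(self, query: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        q = normalize(query)
        scores = [
            (doc_id, sum(a * b for a, b in zip(q, vec)))
            for doc_id, vec in zip(self.ids, self.vectors)
        ]
        # Highest similarity first, keep the top k results.
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:k]


# Toy usage: placeholder vectors standing in for image/text embeddings.
index = VectorIndex()
index.add("cat_photo", [0.9, 0.1, 0.0])
index.add("dog_article", [0.1, 0.9, 0.0])
index.add("car_review", [0.0, 0.1, 0.9])
results = index.search([1.0, 0.0, 0.0], k=2)
```

Because text and images share one embedding space in a CLIP-style model, the same `search` call can serve both text-to-image and text-to-text queries. A production engine would likely swap the linear scan for an approximate nearest-neighbour index.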
Today Headlines: a news aggregation website that uses experimental graph clustering, covering 5 regions (US, Singapore, Taiwan, Malaysia). The aim is to optimize for speed, so no statistical or neural network models are required.
-
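The site's experimental clustering algorithm isn't described here, so the following is only an assumed, simplified sketch of one model-free way to group headlines with a graph: each headline is a node, an edge joins two headlines whose word-set Jaccard similarity meets a threshold, and the clusters are the connected components, found with union-find. `cluster_headlines`, `UnionFind` and the `0.3` threshold are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def tokens(headline: str) -> Set[str]:
    """Lowercased word set with simple punctuation stripped."""
    return {w.strip(".,:;!?\"'").lower() for w in headline.split()} - {""}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two word sets, in [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class UnionFind:
    """Disjoint-set structure with path halving."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_headlines(headlines: List[str], threshold: float = 0.3) -> List[List[str]]:
    """Hypothetical: connected components of a Jaccard-similarity graph."""
    token_sets = [tokens(h) for h in headlines]
    uf = UnionFind(len(headlines))
    # Add an edge (union) for every sufficiently similar pair.
    for i, j in combinations(range(len(headlines)), 2):
        if jaccard(token_sets[i], token_sets[j]) >= threshold:
            uf.union(i, j)
    # Group headlines by the root of their component.
    groups: Dict[int, List[str]] = {}
    for idx, headline in enumerate(headlines):
        groups.setdefault(uf.find(idx), []).append(headline)
    return list(groups.values())


clusters = cluster_headlines([
    "Singapore announces new housing grants",
    "New housing grants announced in Singapore",
    "Taiwan typhoon forces flight cancellations",
])
```

The all-pairs loop is O(n²), which is fine for a demo. A speed-focused aggregator would probably only compare headlines that share at least one word (for example via an inverted index) before computing similarities.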
-
Active maintainer of these PyPI packages:
-
fastlangid: the only language detection library that supports Simplified Chinese, Traditional Chinese and Cantonese
-
h5record: an easy-to-use, large-scale dataset format for PyTorch
-
-
Multilingual: I speak 4 languages (English, Chinese, Cantonese, Malay)
-
Published work:
-
Gradient Normalization for Generative Adversarial Networks: accepted at ICCV 2021
-
Character-Preserving Coherent Story Visualization: generates clear characters that stay coherent across the images of the same story. Accepted at ECCV 2020
-