About

Leaving academia to apply my knowledge to real-world applications
- Focus on NLP, data mining, and graphs
Current side projects:
- Unisearch: A vector-based search engine demo. The underlying neural network is based on CLIP but trained on title-content and text-image pair datasets. The text encoder also supports 9 languages (English, Chinese, Spanish, Italian, Japanese, Korean, Vietnamese, German, French). Image search works pretty well, but text search still needs some fine-tuning.
- Today Headlines: A news aggregation website that uses experimental graph clustering and covers 5 regions (including the US, Singapore, Taiwan, and Malaysia). The aim is to optimize for speed, so no statistics-based or neural network models are required.
Active maintainer of these PyPI packages:
- fastlangid: The only language detection library that supports Simplified Chinese, Traditional Chinese, and Cantonese
- h5record: An easy-to-use, large-scale dataset format for PyTorch
Multilingual: I speak 4 languages (English, Chinese, Cantonese, Malay)
Published work:
- Gradient Normalization for Generative Adversarial Networks: Accepted at ICCV 2021
- Character-Preserving Coherent Story Visualization: Generates clear characters that stay coherent across the images of the same story. Accepted at ECCV 2020