data_science
Seeing is believing. A witty saying proves nothing.
"When solving a problem of interest, do not solve a more general problem as an intermediate step." (Vladimir Vapnik)
Must read
- foundation of dl: https://www.youtube.com/watch?time_continue=157&v=zl99IZvW7rE
- (Bradley Efron) Bayesian, Frequentist and Scientist: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.179.1454&rep=rep1&type=pdf
- (Breiman) 2 cultures: http://www2.math.uu.se/~thulin/mm/breiman.pdf
- https://gluebenchmark.com/leaderboard
My implementations
Chatbot
- https://github.com/chiphuyen/stanford-tensorflow-tutorials/tree/master/assignments/chatbot
- https://botlist.co/
- https://github.com/JStumpp/awesome-chatbots
- https://github.com/fendouai/Awesome-Chatbot
- https://github.com/dennybritz/chatbot-retrieval/
- https://realpython.com/python-keras-text-classification/
- https://github.com/ekapolc/nlp_course/blob/master/slides/L10.2-chatbotsOverview.pdf
RecSys
- https://github.com/maciejkula/spotlight
- session based: https://arxiv.org/pdf/1511.06939.pdf
- pool next item: https://www.semanticscholar.org/paper/Deep-Neural-Networks-for-YouTube-Recommendations-Covington-Adams
- tune nlp: http://ruder.io/deep-learning-nlp-best-practices/index.html#classification
Winning solutions
- http://ndres.me/kaggle-past-solutions/
- Rossmann Sales Forecasting, 1st solution: https://kaggle2.blob.core.windows.net/forum-message-attachments/102102/3454/Rossmann_nr1_doc.pdf
Stats
- Good, Hardin. Common Errors in Statistics (and How to Avoid Them) (2003)
- Kanji. 100 statistical tests (2006)
- Doing Data Science: Straight Talk from the Frontline
Game Industry:
- https://project.dke.maastrichtuniversity.nl/cig2018/proceedings/
- https://www.slideshare.net/africaperianez/game-data-science-the-state-of-the-art
Case studies:
- auc, https://www.kaggle.com/c/acquire-valued-shoppers-challenge#evaluation
- auc, https://www.kaggle.com/c/kdd-cup-2014-predicting-excitement-at-donors-choose#description
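Both case studies above are scored by AUC. As a reminder of what that number measures, here is a minimal sketch in plain Python, assuming binary labels (1 = positive) and real-valued scores. The function name `roc_auc` and the toy data are my own illustration, not taken from either competition.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive is scored above a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 3 of the 4 positive/negative pairs are ranked correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic, so this is only for intuition; for real data a library routine such as scikit-learn's `roc_auc_score` is the practical choice.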
DS Coursera
- http://www.chioka.in/how-to-select-your-final-models-in-a-kaggle-competitio/
- http://scikit-learn.org/stable/modules/cross_validation.html
Heroes of DL
- Geoffrey Hinton: https://www.youtube.com/watch?v=-eyhCTvrEtE
- Andrej Karpathy: https://www.youtube.com/watch?v=_au3yw46lcg
Top conferences:
- KDD 2018 London, UK: http://www3.imperial.ac.uk/newsandeventspggrp/imperialcollege/engineering/datascienceinstitute/newssummary/news_22-8-2017-11-17-28
- WSDM 2018, US: http://www.wsdm-conference.org/2018/
- NIPS 2017, Long Beach, US: https://nips.cc/
- DepLing 2017: http://www.depling.org/depling2017/program.html
- CIKM 2017: http://cikm2017.org/
- https://webdocs.cs.ualberta.ca/~zaiane/htmldocs/ConfRanking.html
- http://www.guide2research.com/topconf/machine-learning
- http://portal.core.edu.au/conf-ranks/?search=&by=all&source=CORE2017&sort=atitle&page=1
Deep Learning
Events: I will add a word cloud for these.
EMNLP 2017: http://noisy-text.github.io/2017/
NLPStan reading
LXMLS16:
- http://lxmls.it.pt/2016/Deep-Neural-Networks-Are-Our-Friends.pdf
- http://lxmls.it.pt/2016/lxmls-dl2.pdf
ACL2017
- keynote: linguistic is back, reduce search space: https://drive.google.com/file/d/0B2cCJQ2_aOwjMlg5MnFjTEpBNG8/view
VietAI
- Quoc Le (Google Brain): http://cs.stanford.edu/~quocle/
- Thang Luong (Google Brain): http://t.co/3zNHouUn
- Dustin (Columbia) http://dustintran.com/
- Thien (NYU) http://www.cs.nyu.edu/~thien/
- Hieu Pham (CMU) https://www.quora.com/profile/Hieu-Pham-20
- Ken Tran (Microsoft) http://www.kentran.net/
- Laurent Dinh (MILA): https://laurent-dinh.github.io/about/
- Luong Hoang, Harvard: https://github.com/lhoang29/recurrent-entity-networks
- Vu Pham
My SOTA
- My ATIS: sequence tagging, nb of params: 324335, bi-LSTM
- Quora question duplicate detection: Accuracy 85% on Wang's test
- best F1 score: 94.92/94.64
- train scores: 97.5446666667/96.17
- val scores: 93.664/92.94
Game industry
- TCCP PU learning https://arxiv.org/pdf/1802.09788.pdf
- By last time login: https://mpra.ub.uni-muenchen.de/82871/1/paper8.pdf
- https://www.slideshare.net/aistconf/webgames-61437118
Yandex
- https://github.com/ddtm/dl-course
- https://github.com/vkantor/MIPT_Data_Mining_In_Action_2016/tree/master/trends
- https://github.com/yandexdataschool/Practical_RL
- https://github.com/yandexdataschool/HSE_deeplearning
ICLR 2017 Review
- if you want to tune an LSTM, this is worth reading (from Socher): https://arxiv.org/pdf/1611.05104v2.pdf
LearningNewThingIn2017
- Torch/Lua (Facebook/HarvardNLP): http://nlp.seas.harvard.edu/code/, http://cs287.fas.harvard.edu/
- TF/Python (Google/Stanford): https://github.com/BinRoot/TensorFlow-Book
- cs287: https://github.com/CS287/Lectures
Conf events
- Coling 2016, Osaka Japan: http://coling2016.anlp.jp/
- ICLR 2017, Apr in France: http://www.iclr.cc/doku.php?id=ICLR2017:main&redirect=1
- open review: http://openreview.net/group?id=ICLR.cc/2017/conference
NIPS 2016 slides
- https://github.com/hindupuravinash/nips2016
- Ian GAN tut: http://www.iangoodfellow.com/slides/2016-12-9-gans.pdf
- Ng nuts and bolts: https://www.dropbox.com/s/dyjdq1prjbs8pmc/NIPS2016%20-%20Pages%202-6%20(1).pdf
- variational inference: http://www.cs.columbia.edu/~blei/talks/2016_NIPS_VI_tutorial.pdf
Theano based DL applications
learn to learn: algos optimization
- sgd and friends: http://cs231n.github.io/neural-networks-3/#update
- overview of gd: http://sebastianruder.com/optimizing-gradient-descent/
- keras-team/keras#898
- I used to choose Adam or RMSprop and tune the learning rate and batch size.
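For reference, a minimal sketch of a single Adam update for one scalar parameter, in plain Python. The default hyperparameters (lr 0.001, beta1 0.9, beta2 0.999, eps 1e-8) are the ones commonly quoted for Adam. The function name `adam_step` and the toy quadratic objective are assumptions made only for illustration.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)               # bias correction for the zero init
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy objective (hypothetical): f(x) = (x - 3)^2, gradient 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2.0 * (x - 3.0), m, v, t, lr=0.05)
```

Note that the very first step has size close to `lr` whatever the gradient magnitude, which is one reason the learning rate is the main knob to tune.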
People
- http://people.stat.sc.edu/haigang/techBlog.html
- http://aejjrsite.free.fr/goodmorning/gm122/gm122_ThayToiMauriceAllais.pdf
- http://www.thesaigontimes.vn/271832/cau-chuyen-tri-tue-nhan-tao.html
Pin:
- semantic scholar: https://www.semanticscholar.org/
- grow a mind: http://web.mit.edu/cocosci/Papers/tkgg-science11-reprint.pdf
- trendingarxiv: http://trendingarxiv.smerity.com/
- https://github.com/andrewt3000/DL4NLP
- Natural language inference NLI: https://github.com/Smerity/keras_snli
- ACL: http://www.aclweb.org/anthology/P/P16/
Data type: NOQ
- Nominal (N): cat, dog --> x,o | vis: shape, color
- Ordinal (O): Jan - Feb - Mar - Apr | vis: area, density
- Quantitative (Q): numerical 0.42, 0.58 | vis: length, position
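The NOQ table above can be written down as a small lookup. This is only a sketch: the channel lists are copied from the notes, while the helper names `data_type` and `suggest_channels` and the rule for guessing the type are hypothetical.

```python
# Visual channels per data type, copied from the NOQ notes.
CHANNELS = {
    "nominal": ["shape", "color"],
    "ordinal": ["area", "density"],
    "quantitative": ["length", "position"],
}


def data_type(values, ordered_levels=None):
    """Guess N/O/Q: numbers are quantitative, values with a known order are ordinal."""
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative"
    if ordered_levels is not None and all(v in ordered_levels for v in values):
        return "ordinal"
    return "nominal"


def suggest_channels(values, ordered_levels=None):
    """Return the visual channels the notes associate with the guessed type."""
    return CHANNELS[data_type(values, ordered_levels)]


months = ["Jan", "Feb", "Mar", "Apr"]
pets_channels = suggest_channels(["cat", "dog"])
month_channels = suggest_channels(["Jan", "Mar"], ordered_levels=months)
number_channels = suggest_channels([0.42, 0.58])
```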
People:
Fin data:
- Reuters 8M (2007-2016): https://github.com/philipperemy/Reuters-full-data-set.git
- Bloomberg https://github.com/philipperemy/financial-news-dataset
- stocktwits: https://github.com/goodwillyoga/E107project/tree/master/pooja/data
Projects:
Wikidata:
- https://github.com/VladimirAlexiev/VladimirAlexiev.github.io/blob/master/CH-names/README.org
- https://github.com/VladimirAlexiev/VladimirAlexiev.github.io/tree/master/CH-names
Cartoons & Quotes:
- "cause you know sometimes words have two meanings" (Led Zeppelin)
- http://stats.stackexchange.com/questions/423/what-is-your-favorite-data-analysis-cartoon?newsletter=1&nlcode=231076%7C1179
Books:
- http://neuralnetworksanddeeplearning.com/index.html
- http://u.cs.biu.ac.il/~yogo/nnlp.pdf
Done:
- EMNLP 2016, Austin, 2-4 Nov: http://www.emnlp2016.net/tutorials.html#practical
- Dynet (CMU): https://t.co/nSCkBt0i0F
- lifelong ML (Google): http://www.emnlp2016.net/tutorials/chen-liu-t3.pdf
- Markov logic for scalable joint inference: http://www.emnlp2016.net/tutorials/venugopal-gogate-ng-t2.pdf
- good summary of sentiment analysis with NN (Singapore): http://www.emnlp2016.net/tutorials/zhang-vo-t4.pdf
- structure prediction (POS, NER) (Singapore): http://www.emnlp2016.net/tutorials/sun-feng-t6.pdf
- BADLS: 2-day conference at Stanford University
day 1:
- Hugo(Twitter): Feed forward NN
- Karpathy(OpenAI): Convnet
- Socher(MetaMind): NLP = word2vec/glove + GRU + MemNet
- Tensorflow tut: from 5:55:49
- Ruslan: Deep Unsup Learning: from 7:10:39
- Andrew Ng: Nuts and bolts in applied DL from 9:09:46
day 2:
- Schulman: RL from 06:40
- Pascal(MILA): theano, from 1:52:03
- ASR from 4:01:11
- NN with Torch from 5:49:32, https://github.com/alexbw/bayarea-dl-summerschool
- seq2seq learning, Quoc Le: from 7:03:44
- Bengio: Foundations and challenges in DL, from 9:01:14
- data fest: https://alexanderdyakonov.wordpress.com/
- 8,9,12,13 Sept: data science week: http://dsw2016.datascienceweek.com/
- KDD 2016: http://www.kdd.org/kdd2016/
- ACL 2016, Berlin, 7-12 Aug: http://acl2016.org/index.php?article_id=60
AI mistakes:
- napalm girl: https://techcrunch.com/2016/09/12/facebook-employees-say-deleting-napalm-girl-photo-was-a-mistake/
- fine for his car shadow: http://www.independent.co.uk/news/world/europe/russian-driver-fined-car-shadow-moscow-a7225146.html
- human on motorcycle: http://cs.stanford.edu/people/karpathy/deepimagesent/generationdemo/
Keras:
- image classification with vgg16: http://www.pyimagesearch.com/2016/08/10/imagenet-classification-with-python-and-keras/
- hualos, keras viz: https://github.com/fchollet/hualos
- https://github.com/dylandrover/keras_tutorial/blob/master/keras_tutorial/keras_deck.pdf
- https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/wide_n_deep_tutorial.py
- model zoo: https://github.com/tensorflow/models
- music auto tag: https://github.com/keunwoochoi/music-auto_tagging-keras
- expose API: https://github.com/samjabrahams/inception-resnet-flask-demo
NLP:
- https://github.com/attardi/deepnl
- https://github.com/biplab-iitb/practNLPTools
- http://ml.nec-labs.com/senna/
- LSTM + CNN char on NER: https://transacl.org/ojs/index.php/tacl/article/viewFile/792/202
- https://metamind.io/research/the-wikitext-long-term-dependency-language-modeling-dataset/
Apps:
- https://github.com/fginter/w2v_demo
- http://bionlp-www.utu.fi/wv_demo/
- 3top: https://github.com/3Top/word2vec-api
- next wave of nn: http://www.nextplatform.com/2016/09/14/next-wave-deep-learning-applications/
- labeling tools: http://cs.stanford.edu/people/karpathy/ilsvrc/
- deep art: https://deepart.io/hire/kzXhuUPf/
- text sum: http://esapi.intellexer.com/Summarizer
- http://www.deeplearningpatterns.com/doku.php/applications
- mt: http://104.131.78.120/
- rnn: http://www.cs.toronto.edu/~ilya/fourth.cgi?prefix=I+have+a+dream.+&numChars=150
- chatbot: http://sumve.com/firesidechat/
- text vis: http://slanglab.cs.umass.edu/topic-animator/
- music auto tag: https://github.com/keunwoochoi/music-auto_tagging-keras
- deep image sent: http://cs.stanford.edu/people/karpathy/deepimagesent/rankingdemo/
German word embedding:
- pretrained: http://devmount.github.io/GermanWordEmbeddings/
- vis: pca, tsne: https://github.com/devmount/GermanWordEmbeddings/blob/master/code/pca.ipynb
PyGotham:
- textacy: http://michelleful.github.io/code-blog/2016/07/23/nlp-at-pygotham-2016/
- nlp with keras, rnn, cnn
- https://github.com/drincruz/PyGotham-2016
- skipthought: https://libraries.io/github/LeavesBreathe/Sequence-To-Sequence-Generation-Skip-Thoughts-
- https://github.com/ryankiros/skip-thoughts
- doc sum: http://mike.place/talks/pygotham/#p1
Journalist LDA and ML:
- http://knightlab.northwestern.edu/2015/03/10/nicar-2015-machine-learning-lessons-for-journalists/
- summary on Hanna Wallach: https://docs.google.com/document/d/1kIIzBAF9T9Zu99i0DU9akIajvYZ-CfHeBFVBhIJyEY8/edit?pref=2&pli=1
- http://www.cs.ubc.ca/~murphyk/MLbook/pml-toc-22may12.pdf
- http://slides.com/stevenrich/machine-learning#/18
- https://github.com/cjdd3b/nicar2015/tree/master/machine-learning
- https://github.com/cjdd3b/fec-standardizer
Europython:
- http://kjamistan.com/i-hate-you-nlp/
- https://github.com/adewes/machine-learning-chinese
- https://github.com/GaelVaroquaux/my_topics
- https://github.com/arnicas/nlp_elasticsearch_reviews
Scipy 2016:
Performance Evaluation(PE):
- book ELA: http://www.cambridge.org/us/academic/subjects/computer-science/pattern-recognition-and-machine-learning/evaluating-learning-algorithms-classification-perspective
- slides: http://www.icmla-conference.org/icmla11/PE_Tutorial.pdf
- bayesian hypothesis testing: http://ipg.idsia.ch/preprints/corani2015c.pdf
Hypothesis testing
- http://bebi103.caltech.edu/2015/tutorials/t6b_frequentist_hypothesis_testing.html
- central limit theorem: http://nbviewer.jupyter.org/github/mbakker7/exploratory_computing_with_python/blob/master/notebook_s3/py_exp_comp_s3_sol.ipynb
- hypothesis testing and p value: http://vietsciences.free.fr/khaocuu/nguyenvantuan/bieudoR/ch7-kiemdinhgiathiet.htm
Metrics:
Rock, Metal and NLP:
- http://www.deepmetal.io/
- https://github.com/ijmbarr/metal_models
- http://www.degeneratestate.org/posts/2016/Sep/12/heavy-metal-and-natural-language-processing-part-2/
- http://www.degeneratestate.org/posts/2016/Apr/20/heavy-metal-and-natural-language-processing-part-1/
Financial:
Twitter:
- http://nlp.stanford.edu/projects/glove/preprocess-twitter.rb
- GATE NER dataset: https://gate.ac.uk/wiki/broad-twitter-corpus.html
Deep Learning Frameworks/Toolkits:
- Tensorflow
- Torch
- Theano
- Keras
- Dynet
- CNTK
ElasticSearch + Kibana:
- install ES 2.4 + Kibana: Sense console on the default port 5601
- http://ghostweather.slides.com/lynncherny/deck
Attention based:
- code RWA in TF: https://github.com/jostmey/rwa
- decomposable attention: https://github.com/explosion/spaCy/tree/master/examples/keras_parikh_entailment
- customized lstm with attention: http://benjaminbolte.com/blog/2016/keras-language-modeling.html
- vis + cnn + lstm: https://blog.heuritech.com/2016/01/20/attention-mechanism/
ResNet: Residual Networks
- http://yanran.li/peppypapers/2016/01/10/highway-networks-and-deep-residual-networks.html
- how deep: VGG 16/19 vs. ResNet with 152 and 200 layers: https://www.reddit.com/r/MachineLearning/comments/4cmcfs/how_can_resnet_cnn_go_deep_to_152_layers_and_200/
- http://www.slideshare.net/Textkernel/practical-deep-learning-for-nlp
Sentiment
- dataset: 1.6M: https://docs.google.com/uc?id=0B04GJPshIjmPRnZManQwWEdTZjg&export=download
- quandl: https://github.com/kszela24/options-daily
- stocktwit: http://stocktwits.com/symbol/FINL
- https://github.com/jssandh2/Stock_Search_Engine
- https://www.quantopian.com/posts/crowd-sourced-stock-sentiment-using-stocktwits
- https://www.crowdflower.com/data-for-everyone/
NER
- https://github.com/aleju/ner-crf
- 2017 conference: http://noisy-text.github.io/2017/
- demo: http://nlp.stanford.edu:8080/ner/process
- ritter: https://www.cise.ufl.edu/class/cis6930fa11lad/cis6930fa11_NEROverTweets.pdf
- cmu tweetnlp: http://www.cs.cmu.edu/~ark/TweetNLP/
- opencalais: http://www.opencalais.com/opencalais-demo/
- https://www.quora.com/How-can-I-find-city-country-company-name-from-a-tweet-text-using-Java
- no broad domain, average accuracy 80-85% is quite good: https://www.quora.com/How-accurate-are-entity-extraction-tools
- http://blog.districtdatalabs.com/named-entity-recognition-and-classification-for-entity-extraction
- http://noisy-text.github.io/2016/ner-shared-task.html
- https://noisy-text.github.io/2016/pdf/WNUT26.pdf
- dataset: https://www.dropbox.com/s/yaoy7zi9vz71nki/wnut_ner_evaluation.tgz?dl=0
- wnut solution: https://github.com/napsternxg/TwitterNER
- dataset wnut16: https://github.com/aritter/twitter_nlp/tree/master/data/annotated/wnut16/data
ML Stacking
Tensorflow tutorials
Covariate shift
- https://www.quora.com/What-is-Covariate-shift
- https://blog.bigml.com/2013/11/01/machine-learning-next/
- https://blog.bigml.com/2013/03/12/machine-learning-from-streaming-data-two-problems-two-solutions-two-concerns-and-two-lessons/
#PydataLondon2017
- https://pydata.org/london2017/schedule/presentation/12/
- https://pydata.org/london2017/schedule/presentation/20/
- https://pydata.org/london2017/schedule/presentation/34/
- https://pydata.org/london2017/schedule/presentation/17/
- https://pydata.org/london2017/schedule/presentation/47/
- https://pydata.org/london2017/schedule/presentation/16/
- https://pydata.org/london2017/schedule/presentation/52/
- https://pydata.org/london2017/schedule/presentation/22/
- https://pydata.org/london2017/schedule/presentation/30/
- https://pydata.org/london2017/schedule/presentation/23/
- https://pydata.org/london2017/schedule/presentation/69/
NLP course
Dataset
Tricks of DL
- https://engineering.purdue.edu/~qobi/papers/ad2016d.pdf
- practical DL: http://www.deeplearningbook.org/slides/11_practical.pdf
- tuning cnn: http://lamda.nju.edu.cn/weixs/project/CNNTricks/CNNTricks.html
- https://github.com/Conchylicultor/Deep-Learning-Tricks
- https://cs224d.stanford.edu/lectures/CS224d-Lecture6.pdf
- http://karpathy.github.io/neuralnets/
Pointer network
- http://fastml.com/introduction-to-pointer-networks/
- keras: https://github.com/zygmuntz/pointer-networks-experiments
- https://arxiv.org/pdf/1511.06391v4.pdf
- https://www.slideshare.net/KeonKim/attention-mechanisms-with-tensorflow
Attention
Log likelihood test
- tool http://ucrel.lancs.ac.uk/llwizard.html
- significance testing of word frequency in corpora: https://users.ics.aalto.fi/lijffijt/articles/lijffijt2015a.pdf
- TA and TM for social: https://de.dariah.eu/tatom/
- http://sappingattention.blogspot.com/2011/10/comparing-corpuses-by-word-use.html#comments
- http://sappingattention.blogspot.com/2011/11/dunning-amok.html
- https://tedunderwood.com/2011/11/09/identifying-the-terms-that-characterize-an-author-or-genre-why-dunnings-may-not-be-the-best-method/
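The log-likelihood (G2) statistic for comparing one word's frequency across two corpora can be sketched as follows. This follows the two-corpus formula as I understand it from the UCREL wizard page: expected counts come from pooling both corpora, and a zero observed count contributes nothing. The function name `log_likelihood` and the example counts are hypothetical.

```python
import math


def log_likelihood(freq1, size1, freq2, size2):
    """G2 for a word seen freq1 times in corpus 1 and freq2 times in corpus 2.

    size1 and size2 are the total token counts of the two corpora.
    """
    total = freq1 + freq2
    expected1 = size1 * total / (size1 + size2)
    expected2 = size2 * total / (size1 + size2)
    g2 = 0.0
    for observed, expected in ((freq1, expected1), (freq2, expected2)):
        if observed > 0:  # a zero count contributes 0 (limit of x * ln x)
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Hypothetical counts: 30 vs 10 occurrences in two corpora of 1000 tokens each.
g2_different = log_likelihood(30, 1000, 10, 1000)
# Same relative frequency in both corpora gives no evidence of a difference.
g2_same = log_likelihood(10, 1000, 20, 2000)
```

A value above 3.84 corresponds to p < 0.05 for a chi-squared distribution with 1 degree of freedom, though the posts linked above argue that this test can overstate significance for word frequencies.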
MLtrainings.ru
- quora presentation: https://gh.mltrainings.ru/presentations/Skornyakov_KaggleQuora_2017.pdf
- hearthstone: https://gh.mltrainings.ru/presentations/Patekha_Hearthstone_2017.pdf
GCloud
- http://www.albertauyeung.com/post/setup-jupyter-nginx-supervisor/
- https://medium.com/google-cloud/running-jupyter-notebooks-on-gpu-on-google-cloud-d44f57d22dbd
Current conference
https://github.com/aymericdamien/TensorFlow-Examples
Timeline
- kaggle in Russian: https://boosters.pro/champs
- https://github.com/mariazm/Spring2017_ProfFosterProvost/tree/master/Module8_Unsupervised_MLreview
- https://github.com/johnpateha/ml_hacks/blob/master/dj_explore_algoparameters.ipynb
WSDM 2019
- https://sites.google.com/view/wsdm19-fairness-tutorial
- https://causalinference.gitlab.io/wsdm-tutorial/
- https://sites.google.com/view/wsdm19-privacy-tutorial
- https://www.slideshare.net/TetsuyaSakai/wsdm2019tutorial
- https://arxiv.org/abs/1808.05163
- https://www.google.com/maps/@-37.8067424,144.9921405,13z/data=!3m1!4b1!4m3!11m2!2sn0Jgpeo5HjLhS61R5hCfiUgIaOhuHQ!3e3
Computer Vision
- http://slazebni.cs.illinois.edu/spring17/
- https://skymind.ai/wiki/convolutional-network
- https://medium.com/@jonathan_hui/what-do-we-learn-from-single-shot-object-detectors-ssd-yolo-fpn-focal-loss-3888677c5f4d
- https://towardsdatascience.com/faster-r-cnn-object-detection-implemented-by-keras-for-custom-data-from-googles-open-images-125f62b9141a
- https://medium.com/@jonathan_hui/design-choices-lessons-learned-and-trends-for-object-detections-4f48b59ec5ff
- https://github.com/akTwelve/tutorials/blob/master/mask_rcnn/MaskRCNN_TrainAndInference.ipynb
- https://github.com/RockyXu66/Faster_RCNN_for_Open_Images_Dataset_Keras/blob/master/frcnn_train_vgg.ipynb
- https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
- https://www.superdatascience.com/blogs/the-ultimate-guide-to-convolutional-neural-networks-cnn
- https://yerevann.com/a-guide-to-deep-learning/
- https://towardsdatascience.com/facial-keypoint-detection-detect-relevant-features-of-face-in-a-go-using-cnn-your-own-dataset-e09cf359c2bc
ICCV 2019
07.10
- https://stackoverflow.com/questions/42307949/color-theme-for-vs-code-integrated-terminal/46166487
- https://github.com/zhulingchen/tfp-tutorial
- tf2 keras for researcher: https://colab.research.google.com/drive/1UCJt8EYjlzCs1H1d1X0iDGYJsHKwu-NO
- visualizing outliers in big data: https://www.cs.uic.edu/~wilkinson/Publications/outliers.pdf
13.06
- https://github.com/tmbdev/ocropy
- https://github.com/keras-team/keras/blob/master/examples/image_ocr.py
04.06
18.05
17.05
- https://handong1587.github.io/deep_learning/2015/10/09/object-detection.html
- https://github.com/experiencor/keras-yolo3
- https://github.com/Adamdad/keras-YOLOv3-mobilenet
- https://arxiv.org/pdf/1805.02283.pdf
- https://github.com/seasonSH/DocFace/tree/master/src
- https://habr.com/ru/company/avito/blog/452142/
- https://towardsdatascience.com/one-shot-learning-with-siamese-networks-using-keras-17f34e75bb3d
14.05
- https://medium.com/@zhanwenchen/install-cuda-10-1-and-cudnn-7-5-0-for-pytorch-on-ubuntu-18-04-lts-9b6124c44cc
- https://stackoverflow.com/questions/43214346/split-queue-into-train-test-set
13.05
- https://www.sites.google.com/site/yorkyuhuang/home/tutorial/deep-learning-1/objectdetectiontrackingrecognition-with-deep-learning
- deepsystems ctc loss: https://www.youtube.com/watch?v=eYIL4TMAeRI
- https://github.com/linkedin/TonY/blob/master/tony-examples/tony-in-gcp/scripts/install_gpu_cu10.sh
08.05
07.05
- https://towardsdatascience.com/build-a-handwritten-text-recognition-system-using-tensorflow-2326a3487cd5
- https://github.com/Hyperparticle/one-pixel-attack-keras
- https://github.com/sozykin/dlpython_course/blob/master/computer_vision/foto_comparison/foto_verification.ipynb
- http://mostafadehghani.com/2019/05/05/universal-transformers/
- https://blogs.dropbox.com/tech/2017/04/creating-a-modern-ocr-pipeline-using-computer-vision-and-deep-learning/
03.05
- tf.dataset https://www.youtube.com/watch?v=kVEOCfBy9uY&feature=youtu.be
- https://github.com/SkalskiP/ILearnDeepLearning.py/blob/master/01_mysteries_of_neural_networks/06_numpy_convolutional_neural_net_IN_PROGRESS/Building%20convolutional%20neural%20network%20in%20Numpy.ipynb
28.04
24.04
- https://medium.com/@philosophygeek/selling-data-products-is-the-wrong-business-model-for-ai-startups-300835c4eb92
- https://simplystatistics.org/2019/04/17/tukey-design-thinking-and-better-questions/
- https://www.youtube.com/watch?v=qFtJaq4TlqE&feature=youtu.be
- https://medium.com/nybles/create-your-first-image-recognition-classifier-using-cnn-keras-and-tensorflow-backend-6eaab98d14dd
19.04
- https://github.com/jingw222/tf2-serving-w-docker/blob/master/serving_w_docker.ipynb
- https://medium.com/@jingw222/tensorflow-serving-with-docker-an-end-to-end-example-24b412e31ae1
- https://bitbucket.org/pbcquoc/ocr/src/64e6eb1d0e63?at=master
- https://speakerdeck.com/alexkimxyz/monitoring-ml-applications-in-production
10.04
- http://d2l.ai/chapter_introduction/intro.html#A-Motivating-Example
- pytorch tuts 10K+: https://github.com/yunjey/pytorch-tutorial
- tf v2 tuts: https://github.com/aymericdamien/TensorFlow-Examples/tree/master/tensorflow_v2
- catboost lecture: https://compscicenter.ru/media/courses/2018-spring/spb-machine-learning-2/slides/machine_learning_2_lecture_260218.pdf
- https://habr.com/ru/post/447376/
09.04
- https://towardsdatascience.com/which-deep-learning-framework-is-growing-fastest-3f77f14aa318
- https://threader.app/thread/1105139360226140160
08.04
- https://hurenjun.github.io/
- beam search: https://www.coursera.org/lecture/nlp-sequence-models/beam-search-4EtHZ
- joint embedding for transportation: https://hurenjun.github.io/pubs/aaai2019-slides.pdf
- embedding for anomaly detection: https://hurenjun.github.io/pubs/icde2016-slides.pdf
05.04
- https://towardsdatascience.com/3-ways-to-load-csv-files-into-colab-7c14fcbdcb92
- https://github.com/kaiwaehner/python-jupyter-apache-kafka-ksql-tensorflow-keras
- https://www.kaggle.com/mlg-ulb/creditcardfraud/kernels
- https://jakevdp.github.io/blog/2015/07/23/learning-seattles-work-habits-from-bicycle-counts/
03.04
- https://slides.com/vladimiriglovikov/title-texttitle-text-17#/0/25
- http://gameaibook.org/book.pdf
- https://services.google.com/fh/files/blogs/insights_for_evaluating_lifetime_value_for_game_developers.pdf
01.04
- https://berkeley-deep-learning.github.io/cs294-131-s19/
- https://www.technologyreview.com/s/613170/emtech-digital-dawn-song-adversarial-machine-learning/
31.03
- https://blog.ml.cmu.edu/2019/03/29/building-machine-learning-models-via-comparisons/
- https://pmbaumgartner.github.io/notebooks/colored-roc-curves/
- http://dsd.future-lab.cn/members/2015nlp/readings/rW_IS_AIsHallofFame.pdf
- http://dsd.future-lab.cn/members/2015nlp/nature482.pdf
30.03
- https://www.usenix.org/conference/enigma2017/conference-program/presentation/evans
- http://web.stanford.edu/class/cs224n/index.html#coursework
- https://towardsdatascience.com/time-series-nested-cross-validation-76adba623eb9
- https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes
29.03
28.03
- https://www.analyticsvidhya.com/blog/2018/07/introductory-guide-maximum-likelihood-estimation-case-study-r/
- https://scholarspace.manoa.hawaii.edu/bitstream/10125/50002/1/paper0115.pdf
- https://medium.freecodecamp.org/keras-vs-pytorch-avp-transfer-learning-c8b852c31f02
21.03
20.03
- https://drive.google.com/file/d/1idTS63oXT1jBUNm_qH9fke0VDGihM7ir/view
- gan https://github.com/Dyakonov/DL/blob/master/AMD_DL09gan_17.pdf
14.03
- https://www.math3ma.com/blog/matrices-probability-graphs
- https://explained.ai/rf-importance/index.html
- http://julian.togelius.com/Drachen2013Game.pdf
11.03
07.03
- https://medium.com/tensorflow/recap-of-the-2019-tensorflow-dev-summit-1b5ede42da8d
- http://gltr.io/dist/index.html
- https://www.amazon.com/Analytics-Descriptive-Predictive-Network-Techniques/dp/1119133122
06.03
- https://www.youtube.com/channel/UCZ_qlZbg9EzwRnLq_hFQumQ/featured?app=desktop
- https://www.slideshare.net/albedan/kaggle-days-paris-alberto-danese-ml-interpretability
- xgboost from 0: https://www.youtube.com/watch?v=0hxX4XAf2DA
- KDD 2016 RecSys CTR, field-aware: https://www.youtube.com/watch?v=1cRGpDXTJC8
01.03
- https://github.com/lexfridman/mit-deep-learning
- https://en.wikipedia.org/wiki/No_free_lunch_theorem
- https://en.wikipedia.org/wiki/Sunk_cost
- https://en.wikipedia.org/wiki/Reinforcement_learning
- https://dyakonov.org/2019/02/21/%D0%BD%D0%B5%D0%BC%D0%B0%D1%82%D0%B5%D0%BC%D0%B0%D1%82%D0%B8%D0%BA%D0%B0-%D0%B2-%D0%B0%D0%BD%D0%B0%D0%BB%D0%B8%D0%B7%D0%B5-%D0%B4%D0%B0%D0%BD%D0%BD%D1%8B%D1%85/
21.02
- https://boosters.pro/championships
- machine learns physical laws: https://arxiv.org/abs/1807.10300
- https://istina.msu.ru/media/publications/article/972/9eb/7537819/sw-factors-dyakonov.pdf
20.02
19.02
- http://deliprao.com/archives/314
- https://console.cloud.google.com/storage/browser/commonsense-reasoning/reproduce/stories_corpus?pli=1
13.02
- https://github.com/omarsar/nlp_highlights/blob/master/NLP_2018_Highlights.pdf
- https://hbr.org/2019/02/companies-are-failing-in-their-efforts-to-become-data-driven
- https://www.nytimes.com/2019/02/05/business/media/artificial-intelligence-journalism-robots.html
12.02
- https://github.com/google/sentencepiece
- https://www.youtube.com/watch?v=0EtD5ybnh_s
- https://aws.agorize.com/en/challenges/vietnam-2019
- https://nlp.stanford.edu/seminar/details/jdevlin.pdf
- https://www.lyrn.ai/2019/02/11/xlm-cross-lingual-language-model/
11.02
09.02
- https://github.com/DiligentPanda/Tencent_Ads_Algo_2018
- https://github.com/Dyakonov/ml_hacks/blob/master/dj_MLDM_kernels.ipynb
03.02
- https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models
- http://iranarze.ir/wp-content/uploads/2016/10/E2281.pdf
24.01
21.01
- http://www.econ.upf.edu/~michael/stanford/maeb6.pdf
- http://www.econ.upf.edu/~michael/stanford/maeb4.pdf
- http://www.econ.upf.edu/~michael/stanford/maeb5.pdf
18.01
- kids learn and acquire language using statistical learning. Chomsky school. https://www.youtube.com/watch?v=uSFPgDuyv6E
- bootstrap with pitfalls: https://arxiv.org/pdf/1411.5279.pdf
- categorical data analysis: https://www.youtube.com/watch?v=FCrYGuO8CmU
- humbio: https://www.ted.com/talks/robert_sapolsky_the_biology_of_our_best_and_worst_selves?language=en
16.01
- https://machinelearningforkids.co.uk/
- https://www.quantamagazine.org/been-kim-is-building-a-translator-for-artificial-intelligence-20190110
- https://ai.googleblog.com/2019/01/looking-back-at-googles-research.html
- https://hbr.org/2019/01/data-science-and-the-art-of-persuasion
14.01
- https://www.datasciencecentral.com/m/blogpost?id=6448529:BlogPost:791619
- https://www.datasciencecentral.com/profiles/blogs/how-to-choose-fraud-detection-software-features-characteristics
- https://learnk8s.io/blog/scaling-machine-learning-with-kubeflow-tensorflow
- https://yanirseroussi.com/2019/01/08/hackers-beware-bootstrap-sampling-may-be-harmful/
- https://medium.com/acing-ai/capital-one-data-science-interview-questions-b6263d8a3af6
- https://github.com/FunctorML/BellkorAlgorithm
- https://blogs.mathworks.com/loren/2015/04/22/the-netflix-prize-and-production-machine-learning-systems-an-insider-look/
03.01
- https://github.com/Erlemar/digit-draw-recognize
- https://medium.com/analytics-and-data/on-customer-lifetime-value-in-ecommerce-d3c151c6fdc0
- http://blog.kaggle.com/2017/01/23/a-kaggle-master-explains-gradient-boosting/
- http://mattturck.com/bigdata2018
02.01
- startup genome: https://s3.amazonaws.com/startupcompass-public/StartupGenomeReport2_Why_Startups_Fail_v2.pdf
- https://www.amazon.com/gp/product/0470650931
- https://peadarcoyle.com/2019/01/01/think-you-need-to-learn-bayesian-analysis-read-this-first/
- https://inst.eecs.berkeley.edu/~cs188/fa18/
- https://www.jmp.com/en_us/academic/data-mining-techniques.html
- birthday effect: https://dyakonov.org/2016/11/28/%D0%B4%D0%B5%D0%BD%D1%8C-%D0%BD%D0%B0%D1%88%D0%B5%D0%B9-%D1%81%D0%BC%D0%B5%D1%80%D1%82%D0%B8/
===== GOODBYE 2018
29.12
- https://preferred.ai/category/education/
- http://www.ousia.jp/en/page/en/2017/02/20/wsdm-cup/
- https://medium.com/@bryan.gregory1/predicting-customer-churn-extreme-gradient-boosting-with-temporal-data-332c0d9f32bf
25.12
22.12
- https://qiita.com/namakemono/items/f9574fe0a6b7ebb91e73
- https://github.com/ShuaiW/kaggle-classification/
- http://www.chioka.in/kaggle-competition-solutions/
- https://github.com/Far0n/kaggletils
- https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/
- https://github.com/rushter/heamy
- https://scikit-learn.org/stable/auto_examples/ensemble/plot_feature_transformation.html
- https://research.fb.com/publications/practical-lessons-from-predicting-clicks-on-ads-at-facebook/
- https://github.com/iamtodor/data-science-interview-questions-and-answers
20.12
- https://www.dgsiegel.net/talks/the-bullet-hole-misconception
- https://www.analyticsvidhya.com/blog/2016/09/40-interview-questions-asked-at-startups-in-machine-learning-data-science/
- https://github.com/jessevig/bertviz
19.12
- https://static.googleusercontent.com/media/research.google.com/en//bigpicture/ML_Visualization_NeurIPS_Tutorial.pdf
- https://www.facebook.com/nipsfoundation/videos/203530960558001/
- https://www.slideshare.net/BryanGregory2/kaggle-wsdm-2018-winning-solution-predicting-customer-churn-xgboost-with-temporal-data-87662268
18.12
- http://www.ruiyan.me/pubs/tutorial-emnlp18.pdf
- http://newsletter.ruder.io/issues/neurips-2018-the-nature-of-research-advances-in-image-generation-protein-folding-and-rl-144756
17.12
12.12
- https://towardsdatascience.com/in-browser-object-detection-using-yolo-and-tensorflow-js-d2a2b7429f7c
- https://github.com/SkalskiP/ILearnMachineLearning.py
10.12
- https://github.com/tensorflow/models/tree/master/research/cvt_text
- https://lilianweng.github.io/lil-log/
- https://github.com/zalandoresearch/flair
09.12
- https://medium.com/@kcimc/how-to-recognize-fake-ai-generated-images-4d1f6f9a2842
- https://machinethoughts.wordpress.com/2017/09/01/deep-meaning-beyond-thought-vectors/
- https://arxiv.org/pdf/1809.04559.pdf
07.12
- https://www.stateoftheart.ai/?area=Computer%20Vision
- https://adversarial-ml-tutorial.org/
- https://www.kaggle.com/kernels.json?sortBy=hotness&group=everyone&pageSize=200
06.12
04.12
- https://machinelearning.apple.com/2018/12/03/optimizing-siri-on-homepod-in-far-field-settings.html
- https://seeing-theory.brown.edu/basic-probability/index.html
- https://jalammar.github.io/illustrated-bert/
02.12
- https://medium.com/@kt.era.ee/the-data-science-workflow-43859db0415
- https://colab.research.google.com/drive/1lEu7qNBMSIm2g7YfBhgug7IAB6Rw4b5E
01.12
29.11
26.11
- transformation networks for target-oriented sentiment classification: https://ai.tencent.com/ailab/media/publications/acl/Transformation_Networks_for_Target-Oriented_Sentiment_Classification.pdf
- https://lixin4ever.github.io/paper/ACL2018/slides/acl18_lixin_slides.pdf
BERT with <3
- https://github.com/facebookresearch/XNLI
- https://hanxiao.github.io/2018/06/24/4-Encoding-Blocks-You-Need-to-Know-Besides-LSTM-RNN-in-Tensorflow/
- https://github.com/google-research/bert#pre-trained-models
- https://github.com/hanxiao/bert-as-service#q-what-is-the-parallel-processing-model-behind-the-scene
20.11
- https://ai.tencent.com/ailab/Transformation_Networks_for_Target-Oriented_Sentiment_Classification.html
- https://github.com/alicezheng/feature-engineering-book
15.11
- https://medium.com/analytics-vidhya/python-libraries-for-data-science-other-than-pandas-and-numpy-95da30568fad
- https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/CIKM14_tutorial_HeGaoDeng.pdf
14.11
- Vietnamese NER: https://github.com/duongna21/VNsequencelabeling
- pzad data preprocessing: https://github.com/Dyakonov/PZAD/blob/master/PZAD2018_09_datapreprocessing_15.pdf
- https://medium.com/acing-ai/what-is-hidden-in-the-hidden-markov-models-eee7bab45ac3
13.11
- http://ruder.io/optimizing-gradient-descent/
- don't decay lr, increase your batch size: https://arxiv.org/abs/1711.00489
12.11
- https://www.youtube.com/watch?v=uvH1zB7qahI
- https://www.youtube.com/watch?v=6n-kCYn0zxU
- https://github.com/Featuretools/predicting-customer-churn/blob/master/churn/4.%20Feature%20Engineering%20on%20Spark.ipynb
- https://demo.ipavlov.ai/
- https://towardsdatascience.com/how-to-create-value-with-machine-learning-eb09585b332e
- https://github.com/Featuretools/predicting-customer-churn
- https://arxiv.org/pdf/1810.09591.pdf
10.11
- deep learning in airbnb search: https://arxiv.org/pdf/1810.09591.pdf
- https://www.youtube.com/watch?v=FmKU-1LZGoE
08.11
- https://github.com/hse-aml
- http://web.stanford.edu/class/cs20si/lectures/slides_13.pdf
- https://www.kaggle.com/sudalairajkumar/a-look-at-different-embeddings
- https://github.com/pjankiewicz/mercari-solution/blob/master/mercari/transformers.py
07.11
- https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
- https://towardsdatascience.com/my-weaknesses-as-a-data-scientist-1310dab9f566
- https://jhu-advdatasci.github.io/2018/lectures/12-being-skeptical.html
- https://chauff.github.io/2018-11-04-emnlp/
06.11
04.11
01.11
- http://yowconference.com.au/slides/yowdata2017/Hougland-SparkMLWorkflows.pdf
- https://github.com/rjurney/Agile_Data_Code_2
29.10
25.10
23.10
- https://www.kdnuggets.com/2018/05/deep-learning-apache-spark-part-2.html/2
- https://www.kaggle.com/c/pzadbabki/discussion
- https://www.kaggle.com/c/pubg-finish-placement-prediction
18.10
- https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf
- https://github.com/hse-aml/natural-language-processing
16.10
- https://gh.mltrainings.ru/presentations/Kulagin_Dbrain_2018.pdf
- https://gh.mltrainings.ru/presentations/Kuzin_DLCompetitionsStory_2018.pdf
- https://gh.mltrainings.ru/presentations/Kayumov_DSCompetitions_2018.pdf
10.10
- elmo at apple: https://machinelearning.apple.com/2018/09/27/can-global-semantic-context-improve-neural-language-models.html
- https://github.com/MorvanZhou/Tensorflow-Tutorial
- expose blackbox: https://github.com/tsterbak/pydata2018-amsterdam/blob/master/presentation.ipynb
- elmo with keras: https://github.com/UKPLab/elmo-bilstm-cnn-crf
09.10
08.10
- https://github.com/YuyangZhangFTD/awesome-RecSys-papers
- https://github.com/MorvanZhou
- https://github.com/SkalskiP/ILearnDeepLearning.py
- https://ml.informatik.uni-freiburg.de/papers/18-AUTOML-AutoChallenge.pdf
03.10
- https://www.onlinemathtraining.com/wp-content/uploads/2016/04/Math-for-Machine-Learning-Book-Preview.pdf
- http://leananalyticsbook.com/wp-content/uploads/2013/01/Analytics-Lessons-Learned.pdf
- https://blogs.rstudio.com/tensorflow/posts/2018-09-26-embeddings-recommender/
02.10
- Cross-View Training better than ELMo? https://arxiv.org/abs/1809.08370
- https://machinelearning.apple.com/2018/09/27/can-global-semantic-context-improve-neural-language-models.html
29.09
27.09
- https://databricks.com/blog/2015/06/02/statistical-and-mathematical-functions-with-dataframes-in-spark.html
- https://blogs.rstudio.com/tensorflow/posts/2018-09-26-embeddings-recommender/
- https://medium.com/feature-labs-engineering/featuretools-on-spark-e5aa67eaf807
26.09
- https://christophm.github.io/interpretable-ml-book/index.html
- https://github.com/roamanalytics/roamresearch/blob/master/BlogPosts/Categorical_variables_in_tree_models/categorical_variables_post.ipynb
- https://goku.me/blog/EHR?utm_campaign=Data_Elixir
25.09
- http://datajournalismhandbook.org/1.0/en/index.html
- https://opinionator.blogs.nytimes.com/2010/04/25/chances-are/
- https://www2.cs.duke.edu/courses/spring15/compsci216/lectures/04-stats.pdf
24.09
- ranksums check correlated features: https://www.kaggle.com/aantonova/797-lgbm-and-bayesian-optimization
- https://www.inovex.de/fileadmin/files/Vortraege/2018/bridging-the-gap-from-data-science-to-production-europython2018-wilhelm.pdf
- https://github.com/Santosh-Gupta/Research2Vec
- https://towardsdatascience.com/elmo-embeddings-in-keras-with-tensorflow-hub-7eb6f0145440
- https://sites.google.com/a/ucsc.edu/krumholz/teaching-and-courses/ast119_w15/class-10
- http://hanj.cs.illinois.edu/cs412/bk3/08.pdf
- https://roamanalytics.com/2016/10/28/are-categorical-variables-getting-lost-in-your-random-forests/
21.09
- https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45189.pdf
- https://towardsdatascience.com/elmo-embeddings-in-keras-with-tensorflow-hub-7eb6f0145440
- https://github.com/deepmipt/DeepPavlov
20.09
- http://hanj.cs.illinois.edu/cs412/bk3/08.pdf
- https://algorithms-tour.stitchfix.com/#data-platform
- https://githubengineering.com/towards-natural-language-semantic-code-search/
- https://www.svds.com/pivoting-data-in-sparksql/
- https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
- https://kagglerank.azurewebsites.net/
- http://www.pkbigdata.com/common/cmpt/2018%E7%A7%91%E5%A4%A7%E8%AE%AF%E9%A3%9EAI%E8%90%A5%E9%94%80%E7%AE%97%E6%B3%95%E5%A4%A7%E8%B5%9B_%E8%B5%9B%E4%BD%93%E4%B8%8E%E6%95%B0%E6%8D%AE.html
19.09
18.09
- https://www.kaggle.com/ogrellier/feature-selection-with-null-importances
- https://www.kaggle.com/aantonova/797-lgbm-and-bayesian-optimization
16.09
- https://onnx.ai/
- https://medium.com/@srnghn/the-mathematics-of-decision-trees-random-forest-and-feature-importance-in-scikit-learn-and-spark-f2861df67e3
- https://roamanalytics.com/2016/10/28/are-categorical-variables-getting-lost-in-your-random-forests/
- https://gist.github.com/rnowling/fa6f1007e3547c75f8b2
13.09
- https://towardsdatascience.com/simtext-2nd-solution-for-cikm-analyticup-2018-b3347e026e67
- http://blog.madhukaraphatak.com/spark-vector-to-numpy/
- https://github.com/zziz/pwc
11.09
- http://www.ams.org/notices/199502/golubitsky.pdf
- https://nbviewer.jupyter.org/github/lazarusA/CodeSnippets/blob/master/CodeSnippetsPython/SymmetricChaos.ipynb
08.09
- hyperbolic RS: https://arxiv.org/pdf/1809.01703.pdf
07.09
04.09
- life metaphor: noise vs signal: https://www.johndcook.com/blog/2013/10/28/remove-noise-remove-signal/
- erf and cdf (normal) https://www.johndcook.com/erf_and_normal_cdf.pdf
- https://medium.com/data-science-school/practical-apache-spark-in-10-minutes-part-6-graphx-9cc953afa487
- best paper kdd: https://medium.com/syncedreview/kdd-2018-announces-best-paper-other-awards-4835ab8475a4
- https://habr.com/company/eastbanctech/blog/422093/
- https://github.com/GINK03/kaggle-dae
- https://www.business-school.ed.ac.uk/crc/wp-content/uploads/sites/55/2017/02/Credit-Scoring-and-the-Optimization-Concerning-Area-Under-the-Curve-Anne-Kraus-and-Helmut-K%C3%BCchenhoff.pdf
28.08
- kdd wrap up: https://habr.com/company/mailru/blog/421041/
- bayesian reasoning: https://github.com/bayesgroup/deepbayes-2018/blob/master/day1_bayesian-reasoning/presentation.pdf
27.08
- normality test with kurtosis: http://www.columbia.edu/~ld208/psymeth97.pdf
- botanical prime: https://www.c82.net/work/?id=352
- https://glowingpython.blogspot.com/2017/04/solving-two-spirals-problem-with-keras.html
- http://bit.ly/beautifulObjectJupyterCon
- https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-dkAIsUXP-AL4ffI/edit#slide=id.g362da58057_0_1
23.08
- https://sites.google.com/view/kdd2018-tutorial/home/slides
- knowledge distillation: https://www.youtube.com/watch?v=lSjBc1wSJMI
- https://docs.google.com/presentation/d/17h_ylV84m_AY6-0uhYxFcI59nsiYoev4nTESUvoFlrA/edit#slide=id.g389fd03f42_0_112
- https://hackernoon.com/towards-ai-how-long-does-it-take-you-to-go-from-idea-to-working-prototype-a-day-a-month-8a03ffecca0a
- http://rsos.royalsocietypublishing.org/content/5/5/171274
22.08
- stats and sport https://statsbylopez.com/276labs/
- cs229 https://stanford.edu/~shervine/teaching/cs-229.html
21.08
- ncsoft blade & soul churn prediction https://arxiv.org/pdf/1802.02301.pdf
- bayesian intro: https://www.datascience.com/blog/introduction-to-bayesian-inference-learn-data-science-tutorials
20.08
- churn data science game https://arxiv.org/pdf/1802.02301.pdf
- https://speakerdeck.com/teoliphant/ml-in-python?slide=46
- Murphy's law: anything that can go wrong will go wrong https://en.wikipedia.org/wiki/Murphy%27s_law
- https://alexanderdyakonov.wordpress.com/2018/07/30/%D0%B1%D0%B0%D0%B9%D0%B5%D1%81%D0%BE%D0%B2%D1%81%D0%BA%D0%B8%D0%B9-%D0%BF%D0%BE%D0%B4%D1%85%D0%BE%D0%B4/
- https://github.com/springcoil/PyDataLondonTutorial/blob/master/notebooks/LogisticRegScikitlearn.ipynb
18.08
- http://brohrer.github.io/how_bayesian_inference_works.html
- https://docs.google.com/presentation/d/1325yenZP_VdHoVj-tU0AnbQUxFwb8Fl1VdyAAUxEzfg/edit#slide=id.p
17.08
16.08
- 3 schools of data http://slides.com/springcoil/
- https://github.com/springcoil/PyDataLondonTutorial/blob/master/notebooks/Statistics.ipynb
- https://github.com/godatadriven/os-training-materials
15.08
- TrueSkill 2: https://www.microsoft.com/en-us/research/uploads/prod/2018/03/trueskill2.pdf
- https://blog.ycombinator.com/learning-math-for-machine-learning/
- https://github.com/tensorflow/model-analysis
- https://anvaka.github.io/greview/hands-on-ml/1/
14.08
- large to small better than small to large: http://koaning.io/variable-selection-in-machine-learning.html
- bayesian is good https://blog.datank.ai/how-i-learned-to-stop-worrying-and-love-uncertainty-fd13c23442b6
- think bayesian: http://www.greenteapress.com/thinkbayes/thinkbayes.pdf
- bayesian for hackers: https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
13.08
- http://cbl.eng.cam.ac.uk/pub/Intranet/MLG/ReadingGroup/Bronskill_infer_NET.pdf
- Tim is good: http://timvieira.github.io/
- https://imaddabbura.github.io/blog/machine%20learning/data%20science/2018/03/15/predicting-loan-repayment.html
- gumbel max trick: https://arxiv.org/abs/1611.01144
- love uncertainty: https://github.com/arinarmo/love_uncertainty/blob/master/slides.pdf
- Vincent talk: https://www.youtube.com/watch?v=dE5j6NW-Kzg
10.08
- http://datagenetics.com/blog/february52018/index.html
- https://eng.uber.com/cota/
- https://www.youtube.com/watch?v=Q2HLPCBStLQ
08.08
- https://drive.google.com/file/d/14zSllcWPgsARqpF7D6haSF7_M2PFsZ_Y/view
- https://drive.google.com/file/d/13e2bBwpncMshaMy_UKkUno0_T-K-Ott_/view
- https://pycon.sg/news/slides/
07.08
- https://towardsdatascience.com/7-recommendations-for-data-science-leaders-in-the-game-industry-3d82d45746d2
- http://koaning.io/theme/notebooks/deep-ai-stupid.pdf
- https://github.com/louridas/rwa/blob/master/content/notebooks/chapter_01.ipynb
06.08
- http://cs230.stanford.edu/syllabus.html#midterm
- https://www.youtube.com/watch?v=7CcSm0PAr-Y
- http://cs230.stanford.edu/files/Deep%20Learning%20in%20Healthcare.pdf
03.08
- https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd
- https://blog.ycombinator.com/learning-math-for-machine-learning/
- https://chrisyeh96.github.io/2017/08/08/definitive-guide-python-imports.html
- https://www.analyticsvidhya.com/blog/2018/07/infographic-common-mistakes-amateur-data-scientists-make-how-avoid-them
01.08
- https://cdn-images-1.medium.com/max/1400/1*xbNM_CnEIWQtGbsLmZtE-A.gif
- https://github.com/jhfjhfj1/autokeras
30.07
- https://medium.com/activewizards-machine-learning-company/comparison-of-top-6-python-nlp-libraries-c4ce160237eb
- https://github.com/natasha/ipyannotate
27.07
- https://github.com/lancifollia/tinygbt/blob/master/tinygbt.py
- http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture15.pdf
- https://mapr.com/blog/churn-prediction-pyspark-using-mllib-and-ml-packages/
- https://github.com/ChuckWoodraska/EurekaTrees
- https://github.com/Microsoft/LightGBM/blob/master/examples/python-guide/plot_example.py
26.07
- http://people.stat.sc.edu/haigang/improvement.html
- https://docs.databricks.com/spark/latest/mllib/binary-classification-mllib-pipelines.html
- https://www.coursera.org/lecture/machine-learning-applications-big-data/spark-ml-cross-validation-O0uKs
24.07
- https://www.linkedin.com/pulse/beginners-ask-how-many-hidden-layersneurons-use-artificial-ahmed-gad/
- https://towardsdatascience.com/july-edition-text-understanding-adaaff0bbd63
- https://david-abel.github.io/blog/posts/misc/icml_2018.pdf
20.07
- https://towardsdatascience.com/how-to-build-a-data-science-portfolio-5f566517c79c
- https://david-abel.github.io/blog/posts/misc/icml_2018.pdf
- https://drive.google.com/file/d/1Mw6JZ9k0e8ajfiQ8uI-VP2my96DJINr4/view
- http://ecsocman.hse.ru/data/2012/06/06/1271384006/5.pdf
17.07
- https://eng.lyft.com/from-shallow-to-deep-learning-in-fraud-9dafcbcef743
- http://ecsocman.hse.ru/data/2012/06/06/1271384006/5.pdf
15.07
- https://esc.fnwi.uva.nl/thesis/centraal/files/f244841390.pdf
- http://deliprao.com/archives/294
- https://pdfs.semanticscholar.org/584d/1e09f8e3fa359fcd2b9931bfc71d4672de3a.pdf
- https://www.ijcai.org/proceedings/2017/0504.pdf
- https://github.com/jeremyjordan/imbalanced-data/blob/master/Learning%20from%20imbalanced%20data.ipynb
- https://www.jeremyjordan.me/imbalanced-data/
- https://www.jeremyjordan.me/nn-learning-rate/
14.07
- https://github.com/mathurinm/celer
- https://www.slideshare.net/agramfort/icml-2018-reproducible-machine-learning-a-gramfort
11.07
- https://rise.cs.berkeley.edu/blog/pandas-on-ray-early-lessons/
- https://alexanderdyakonov.wordpress.com/2018/06/28/%D0%BF%D1%80%D0%BE%D1%81%D1%82%D1%8B%D0%B5-%D0%BC%D0%B5%D1%82%D0%BE%D0%B4%D1%8B-%D0%B0%D0%BD%D0%B0%D0%BB%D0%B8%D0%B7%D0%B0-%D0%B4%D0%B0%D0%BD%D0%BD%D1%8B%D1%85/
10.07
- https://arxiv.org/pdf/1802.05365.pdf
- https://thegradient.pub/nlp-imagenet/
- http://www.fast.ai/2018/07/02/adam-weight-decay/
- https://github.com/deepmipt/DeepPavlov
05.07
04.07
- https://frnsys.com/ai_notes/
- https://hackernoon.com/why-businesses-fail-at-machine-learning-fbff41c4d5db
29.06
- https://deepsense.ai/keras-or-pytorch/
- https://www.ijcai.org/proceedings/2017/0504.pdf
- https://www.jeremyjordan.me/nn-learning-rate/
28.06
- https://getstream.io/blog/factorization-recommendation-systems/
- https://www.slideshare.net/stairlab/higherorder-factorization-machines5
- https://www.cs.waikato.ac.nz/~fbravoma/deep_nlp_tut.pdf
26.06
- https://medium.com/datreeio/training-with-keras-mxnet-on-amazon-sagemaker-43a34bd668ca
- https://medium.com/@richardchen_81235/custom-keras-model-in-sagemaker-277a2831ac67
- https://github.com/awslabs/amazon-sagemaker-examples
25.06
- https://www.predictiveanalyticsworld.com/patimes/wise-practitioner-predictive-analytics-interview-series-tauseef-rahman-at-mercer/9538/
- https://arxiv.org/pdf/1704.04565.pdf
- http://ruder.io/tracking-progress-nlp/
- https://www.kdnuggets.com/2015/03/interview-josh-hemann-activision-big-data.html
- https://www.kdnuggets.com/2015/03/interview-josh-hemann-activision-data-science.html
22.06
- https://nlp.stanford.edu/pubs/hancock2018babble.pdf
- https://tomaugspurger.github.io/modern-1-intro.html
- https://einstein.ai/static/images/pages/research/decaNLP/decaNLP.pdf
- https://www.ibm.com/developerworks/community/blogs/jfp/entry/Implementing_Libfm_in_Keras?lang=en
21.06
20.06
- horovod is coool: https://medium.com/searchink-eng/keras-horovod-distributed-deep-learning-on-steroids-94666e16673d
- https://medium.com/product-at-catalant-technologies/using-lightfm-to-recommend-projects-to-consultants-44084df7321c
- https://databricks.com/blog/2016/05/19/approximate-algorithms-in-apache-spark-hyperloglog-and-quantiles.html
19.06
- multi gpus: https://datascience.stackexchange.com/questions/23895/multi-gpu-in-keras
- https://keras.io/getting-started/faq/#how-can-i-run-a-keras-model-on-multiple-gpus
- https://stackoverflow.com/questions/50096/how-to-pass-password-to-scp
- https://stackoverflow.com/questions/31326015/how-to-verify-cudnn-installation
- https://keras.io/utils/#multi_gpu_model
- https://arxiv.org/pdf/1710.02262.pdf
- http://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#sphx-glr-auto-examples-preprocessing-plot-all-scaling-py
- https://www.pyimagesearch.com/2017/10/30/how-to-multi-gpu-training-with-keras-python-and-deep-learning/
18.06
- https://github.com/mephistopheies/mlworkshop39_042017/blob/master/3_masterclass/ipy/feature_extraction.ipynb
- https://pdfs.semanticscholar.org/fc72/59942d3d9d9f0d45565853755e74a983e028.pdf
15.06
- toxic in Russian https://www.youtube.com/watch?v=aMlpeDOjib8
- multitask learning https://arxiv.org/pdf/1806.03713.pdf
14.06
- http://www.gamedonia.com/blog/5-ways-to-calculate-lifetime-value-for-free-to-play-games
- https://towardsdatascience.com/how-to-build-a-dynamic-garden-using-machine-learning-d589468f7c04
- https://scholarspace.manoa.hawaii.edu/bitstream/10125/50002/1/paper0115.pdf
12.06
- trieutrinh, google brain: https://github.com/tensorflow/models/tree/master/research/lm_commonsense
- finetune transformer: https://github.com/openai/finetune-transformer-lm
- https://blog.openai.com/language-unsupervised/
11.06
- https://www.poly-ai.com/docs/naacl18.pdf
- https://petewarden.com/2018/06/11/why-the-future-of-machine-learning-is-tiny/
- https://threadreaderapp.com/
- don't forget to check: https://gist.github.com/ttscoff/cded212ec4dd457186ca
09.06
08.06
- job taxonomy: https://www.youtube.com/watch?v=SWjIoRNTCdU
- https://www.blog.google/topics/ai/ai-principles/
- https://github.com/cvanweelden/sequence_labeling_example/blob/master/sequence_labeling_example.ipynb
- textkernel: https://www.youtube.com/watch?v=xUxjW308CcI
- https://github.com/mattilyra/LSH/blob/master/examples/Introduction.ipynb
- https://www.youtube.com/watch?v=n3dCcwWV4_k&index=40&list=PLGVZCDnMOq0ovNxfxOqYcBcQOIny9Zvb-
07.06
- http://forums.fast.ai/t/30-best-practices/12344/12
- https://bgweber.github.io/
- https://github.com/jacobeisenstein/gt-nlp-class/
- https://towardsdatascience.com/statistics-for-people-in-a-hurry-a9613c0ed0b
06.06
- https://github.com/datascienceinc/oreilly-intro-to-predictive-clv/blob/master/oreilly-an-intro-to-predictive-clv-tutorial.ipynb
- http://brucehardie.com/notes/004/bgnbd_spreadsheet_note.pdf
- https://mattilyra.github.io/2017/05/23/document-deduplication-with-lsh.html
- http://nbviewer.jupyter.org/github/mattilyra/LSH/blob/master/examples/Introduction.ipynb
- wsdm 2018 papers: http://www.wsdm-conference.org/2018/accepted-papers.html
- http://brucehardie.com/notes/
- https://community.firstmarkcap.com/content/clv-in-e-commerce-2013-10-23
05.06
- https://www.tensorflow.org/hub/modules/google/universal-sentence-encoder/1
- okcupid, basic stats: https://ww2.amstat.org/publications/jse/v23n2/kim.pdf
04.06
- https://www.oreilly.com/learning/introduction-to-okrs
- http://adigaskell.org/2015/06/15/reputation-the-sharing-economy-and-the-market-for-lemons/
02.06
- RL: https://towardsdatascience.com/introduction-to-various-reinforcement-learning-algorithms-i-q-learning-sarsa-dqn-ddpg-72a5e0cb6287
- https://towardsdatascience.com/introduction-to-various-reinforcement-learning-algorithms-part-ii-trpo-ppo-87f2c5919bb9
- https://github.com/JannesKlaas/sometimes_deep_sometimes_learning/blob/master/reinforcement.ipynb
- why no mosaic plot in seaborn: https://www.perceptualedge.com/articles/visual_business_intelligence/are_mosaic_plots_worthwhile.pdf
- https://alexanderdyakonov.wordpress.com/2017/10/30/%D0%B2%D0%B8%D0%B7%D1%83%D0%B0%D0%BB%D0%B8%D0%B7%D0%B0%D1%86%D0%B8%D1%8F-%D1%87%D0%B0%D1%81%D1%82%D1%8C-1/
- http://karpathy.github.io/2016/05/31/rl/
01.06
- https://arxiv.org/pdf/1803.11175.pdf
- https://databricks.com/blog/2017/10/19/introducing-natural-language-processing-library-apache-spark.html
- https://petewarden.com/2018/05/28/why-you-need-to-improve-your-training-data-and-how-to-do-it/
29.05
28.05
- https://github.com/dennybritz/nn-from-scratch/blob/master/nn-from-scratch.ipynb
- https://www.kdnuggets.com/2016/08/include-high-cardinality-attributes-predictive-model.html
- https://www.forbes.com/sites/naomirobbins/2012/01/19/when-should-i-use-logarithmic-scales-in-my-charts-and-graphs/3/
- https://github.com/ianozsvald/data_science_delivered/blob/master/ml_creating_correct_capable_classifiers.ipynb
- https://github.com/RobRomijnders/weight_uncertainty
26.05
- https://chrisalbon.com/#deep_learning
- https://github.com/IBMDecisionOptimization/tutorials/blob/master/jupyter/MachineLearning_and_CPLEX.ipynb
25.05
- https://www.slideshare.net/SessionsEvents/misha-bilenko-principal-researcher-microsoft
- https://medium.com/moonshot/how-to-install-faiss-c986fe474a8f
- https://medium.com/ibm-data-science-experience/optimizing-a-marketing-campaign-moving-from-predictions-to-actions-e39b8ab1f865
- https://www.lunametrics.com/blog/2016/06/30/marketing-channel-attribution-markov-models-r/
- https://github.com/IBMDecisionOptimization/tutorials/blob/master/jupyter/MachineLearning_and_CPLEX.ipynb
- https://pocketphilosopher.net/2016/01/27/using-machine-learning-to-tune-a-game/
- http://davidmlane.com/hyperstat/chi_square.html
- https://medium.com/@erushton214/a-simple-spell-checker-built-from-word-vectors-9f28452b6f26
24.05
- http://davidmlane.com/hyperstat/viswanathan/appreciation.html
- https://storage.googleapis.com/pub-tools-public-publication-data/pdf/aad9f93b86b7addfea4c419b9100c6cdd26cacea.pdf
- http://davidmlane.com/hyperstat/viswanathan/chi_square_marketing.html
- https://www.statisticssolutions.com/chi-square-2/
- Marketing: Are women more likely than men to buy a product online?
- https://medium.com/@inlinecoder/disrupting-the-entrance-point-to-a-predictive-data-analytics-12676aa91a8d
- https://github.com/alessiamarcolini/deep-learning_best-practices
- https://github.com/reshamas/fastai_deeplearn_part1
23.05
- https://support.appsflyer.com/hc/en-us/articles/115002667326-Best-Practices-for-Detection-of-Mobile-Fraud
- https://github.com/SeitaroShinagawa/FavoritePapers/blob/master/nlp.md
22.05
- https://developers.google.com/machine-learning/rules-of-ml/
- https://www.datasciencecentral.com/profiles/blogs/why-logistic-regression-should-be-the-last-thing-you-learn-when-b
- http://vita.had.co.nz/papers/engineering-da.pdf
- http://vita.had.co.nz/presentations.html
21.05
18.05
- https://peerj.com/collections/50-practicaldatascistats/
- https://medium.com/indeed-data-science/theres-no-such-thing-as-a-data-scientist-8dae923c14e3
- https://medium.com/indeed-data-science/marketing-for-data-science-a-7-step-go-to-market-plan-for-your-next-data-product-60c034c34d55
- https://blog.ouseful.info/2016/09/13/making-music-and-embedding-sounds-in-jupyter-notebooks/
- https://xcitech.github.io/tutorials/travelers/
- https://github.com/jfpuget/LibFM_in_Keras/blob/master/keras_blog.ipynb
17.05
- sentence piece, sub word: https://github.com/google/sentencepiece
- fastai nlp with transfer learning: http://forums.fast.ai/t/part-2-lesson-10-wiki/14364
- https://xcitech.github.io/tutorials/heroku_tutorial/
- lime: https://homes.cs.washington.edu/~marcotcr/blog/lime/
- http://nlp.fast.ai/
- https://medium.com/activewizards-machine-learning-company/top-7-data-science-use-cases-in-finance-303c05a3cb58
- https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime
15.05
14.05
- https://www.slideshare.net/Nordeus/early-churn-prediction-and-personalised-interventions-in-top-eleven-game
- https://medium.com/googleplaydev/five-tips-to-improve-your-games-as-a-service-monetization-1a99cccdf21
- http://www.cs.cmu.edu/~./dpelleg/download/yachurn.pdf
13.05
- https://www.slideshare.net/TakanoriHayashi3/talkingdata-adtracking-fraud-detection-challenge-1st-place-solution
- https://github.com/jfpuget/LibFM_in_Keras/blob/master/keras_blog.ipynb
- https://github.com/vdutor/tf-rex
10.05
- https://events.prace-ri.eu/event/686/material/slides/0.pdf
- https://medium.com/@mrpowers/working-with-dates-and-times-in-spark-491a9747a1d2
- https://github.com/MSusik/newgradientboosting/blob/master/pydata.pdf
09.05
- https://cilab.sejong.ac.kr/gdmc2017/index.php/tutorial/
- ds politics: https://www.rdisorder.eu/2017/09/13/most-difficult-thing-data-science-politics/
- https://towardsdatascience.com/how-to-survive-corporate-politics-as-a-data-scientist-ba914fac2471
- http://businessforecastblog.com/whats-the-lift-of-your-churn-model-predictive-analytics-and-big-data/
- http://blog.datalifebalance.com/lift-charts-a-data-scientists-secret-weapon/
08.05
- employee attrition https://www.youtube.com/watch?v=pviTahK6KuQ
- https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PySpark_SQL_Cheat_Sheet_Python.pdf
07.05
- https://statsbot.co/blog/calculating-customer-lifetime-value-sql-example/
- https://databricks.com/blog/2015/06/02/statistical-and-mathematical-functions-with-dataframes-in-spark.html
- https://github.com/datatalesblog/Feature-Engineering-in-PySpark/blob/master/Value%20Investing%20PySpark%20Code.py
- https://gist.github.com/anish749/6a815ed281f538068a0d3a20ca9044fa
02.05
- make nnet uncool again: http://www.fast.ai/2018/04/29/categorical-embeddings/
- https://pdfs.semanticscholar.org/8004/cd728305c9abb203cc09885c64fcc5e45f43.pdf
01.05
- http://ianozsvald.com/
- https://github.com/marcotcr/lime/blob/master/doc/notebooks/Tutorial%20-%20continuous%20and%20categorical%20features.ipynb
- https://github.com/ianozsvald/data_science_delivered/blob/master/ml_explain_regression_prediction.ipynb
30.04
- plot decision plane: https://github.com/arogozhnikov/MLatImperial2017/blob/master/utils.py
- http://contest.ai-academy.ru/hackathon
- https://github.com/alxmamaev/Dota2Competition/blob/master/Solution.ipynb
- https://github.com/ikatsov/algorithmic-examples/blob/master/promotions/MarkovLTV.ipynb
29.04
- https://www.slideshare.net/PyData/random-forests-best-practices-for-the-business-world
- https://speakerd.s3.amazonaws.com/presentations/45e7e9769a17481c9957300105c45041/PyData_London_2018_Full_Fact.pdf
28.04
- http://gael-varoquaux.info/interpreting_ml_tuto/content/interpreting_random_forests.html#meaning-and-caveats
- https://thuijskens.github.io/2017/10/07/feature-selection/
- https://hackernoon.com/a-guide-to-scaling-machine-learning-models-in-production-aa8831163846
- https://github.com/arogozhnikov/arogozhnikov.github.io/blob/master/notebooks/2015-09-29-NumpyTipsAndTricks1.ipynb
24.04
23.04
- https://towardsdatascience.com/the-fall-of-rnn-lstm-2d1594c74ce0
- https://compscicenter.ru/media/slides/machine_learning_2_2018_spring/2018_02_26_machine_learning_2_2018_spring.pdf
- https://towardsdatascience.com/exploring-the-census-income-dataset-using-bubble-plot-cfa1b366313b
- https://medium.com/scribd-data-science-engineering/multi-armed-bandits-for-the-win-240b71bc3464
- https://github.com/ChenglongChen/tensorflow-XNN/blob/master/doc/Mercari_Price_Suggesion_Competition_ChenglongChen_4th_Place.pdf
20.04
- https://medium.com/open-machine-learning-course/open-machine-learning-course-topic-2-visual-data-analysis-in-python-846b989675cd
- http://www.statisticshowto.com/probability-and-statistics/skewed-distribution/
- https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-0002-introduction-to-computational-thinking-and-data-science-fall-2016/lecture-slides-and-files/index.htm
- https://reshamas.github.io/to-kaggle-or-not/
- http://learningsys.org/nips17/assets/papers/paper_11.pdf
- chisquare: http://uregina.ca/~gingrich/ch10.pdf
19.04
18.04
- http://marcotcr.github.io/lime/tutorials/Tutorial%20-%20continuous%20and%20categorical%20features.html
- mean roc auc: http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc_crossval.html
- https://www.kdnuggets.com/2018/04/7-books-mathematical-foundations-data-science.html
- https://www.appsflyer.com/blog/4-new-ways-use-boost-performance-audiences/
15.04
- crosswire device matching/grouping https://www.youtube.com/watch?v=nfEDGY2siU8
- https://www.apptamin.com/blog/lifetime-value-mobile-customer/
- https://lloydmelnick.com/2013/01/08/ltv-the-lifeblood-of-your-business/
- https://data36.com/wp-content/uploads/2016/08/practical_data_dictionary_final_data36_tomimester_published.pdf
10.04
- https://stats.stackexchange.com/questions/18844/when-and-why-should-you-take-the-log-of-a-distribution-of-numbers
- https://www.coursera.org/learn/vvedenie-mashinnoe-obuchenie
- https://github.com/mortido/mlbootcamp_online_game/blob/master/itog.py
- http://www.machinelearning.ru/wiki/images/4/4f/Voron-ML-Modeling-slides.pdf
- http://www.machinelearning.ru/wiki/images/archive/9/97/20140227072517!Voron-ML-Logic-slides.pdf
- https://habrahabr.ru/post/324590/
09.04
- https://www.digitaldoughnut.com/articles/2017/january/how-to-use-customer-lifetime-value-in-your-plan
- https://github.com/mstephenmsmith/predictive_LTV_analysis
- https://github.com/fastai/fastai/blob/master/courses/ml1/lesson3-rf_foundations.ipynb
- https://alexanderdyakonov.wordpress.com/2017/10/30/%D0%B2%D0%B8%D0%B7%D1%83%D0%B0%D0%BB%D0%B8%D0%B7%D0%B0%D1%86%D0%B8%D1%8F-%D1%87%D0%B0%D1%81%D1%82%D1%8C-1/
06.04
- https://github.com/harvardnlp/annotated-transformer/blob/master/The%20Annotated%20Transformer.ipynb
- https://github.com/YixuanLi/LEMON
05.04
- https://people.cs.umass.edu/~jpjiang/cs646/03_eval_basics.pdf
- https://towardsdatascience.com/facebook-research-just-published-an-awesome-paper-on-learning-hierarchical-representations-34e3d829ede7
- https://www.saama.com/blog/poincare-embeddings-for-representing-hierarchical-data/
- https://rare-technologies.com/implementing-poincare-embeddings/
- https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/Poincare%20Tutorial.ipynb
04.04
02.04
- https://mlbootcamp.ru/news_list/
- https://github.com/catboost/catboost/blob/master/catboost/tutorials/advanced_tutorials/catboost_coreml_export_tutorial.ipynb
- https://developer.apple.com/documentation/coreml/integrating_a_core_ml_model_into_your_app
01.04
churn:
- https://www.mapd.com/blog/VW-Predicts-Churn-with-GPU-Accelerated-Machine-Learning-and-visual-analytics
- https://activewizards.com/blog/top-9-data-science-use-cases-in-banking/
- https://indico.cern.ch/event/617754/contributions/2590694/attachments/1459648/2254154/catboost_for_CMS.pdf
repeat purchase:
- https://www.youtube.com/watch?v=kOqLbibOGus
- https://github.com/PengInGitHub/Repeat-Buyer-Prediction-for-E-Commerce/blob/master/solution.pdf
- https://www.slideshare.net/moa108/repeat-buyer-prediction-for-e-commerce-kdd2016
31.03
- https://hackernoon.com/what-leading-artificial-intelligence-course-should-you-take-and-what-should-you-do-after-261a933bb3da
- https://www.tensorflow.org/dev-summit/
30.03
- https://zenodo.org/record/166035#.Wr4nGtNubOQ
- http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0180735
- http://textvis.lnu.se/
- https://github.com/xvoland/Extract/blob/master/extract.sh
- https://github.com/DmitryUlyanov/Multicore-TSNE/blob/master/MulticoreTSNE/examples/test.py
28.03
- https://ahmedbesbes.com/how-to-mine-newsfeed-data-and-extract-interactive-insights-in-python.html
- https://data36.com/reporting-optimizing-predicting-data/
- https://data36.com/ab-testing-5-rules/
- https://github.com/stared/livelossplot/blob/master/keras_example.ipynb
27.03
- https://gist.github.com/jiffyclub/905bf5e8bf17ec59ab8f#file-hdf_to_parquet-py
- https://nbviewer.jupyter.org/github/JasonKessler/Scattertext-PyData/blob/master/PyData-Scattertext-Part-1.ipynb
26.03
- http://web.stanford.edu/class/ee380/Abstracts/141112-slides.pdf
- https://data36.com/predictive-analytics-101-part-1/
- https://data36.com/ab-testing-5-rules/
- https://data36.com/fake-door-testing/
24.03
- https://www.stat.berkeley.edu/~stark/Java/Html/index.htm
- https://data36.com/statistical-bias-types-explained/
23.03
- https://hackernoon.com/aspiring-data-scientists-start-to-learn-statistics-with-these-6-books-a33bbb55b8e9
- https://towardsdatascience.com/catboost-vs-light-gbm-vs-xgboost-5f93620723db
- https://github.com/Featuretools/predict_next_purchase/blob/master/Tutorial.ipynb
22.03
- https://tonysyu.github.io/raw_content/matplotlib-style-gallery/gallery.html
- https://github.com/Featuretools/predict_next_purchase/
- https://github.com/josolnik/behavioral-learnings-projects/
21.03
- https://biendata.com/competition/kdd_2018/
- https://soundcloud.com/piskvorky/rrp-4-leo-boytsov-on-approximate-search-and-information-retrieval
- https://mailchi.mp/radimrehurek/radims-machine-learning-newsletter-3263153
- https://www.featuretools.com/
20.03
- https://www.slideshare.net/ShangxuanZhang/kaggle-winning-solution-xgboost-algorithm-let-us-learn-from-its-author
- https://docs.google.com/spreadsheets/d/1gnx_AnJPp_Ez6hTukVPMNHDAjb0yCTEgzyo9NUiQjfc/edit#gid=1036524761
19.03
- https://github.com/mm-mansour/Fast-Pandas
- https://pdfs.semanticscholar.org/0203/6a9565159f19633c5de023321cdf422f43d3.pdf
- https://medium.com/@joshelman/the-only-metric-that-matters-ab24a585b5ea
18.03
- https://www.urbanairship.com/blog/churn-prediction-our-machine-learning-model
- mobile app events with ML: https://pdfs.semanticscholar.org/0203/6a9565159f19633c5de023321cdf422f43d3.pdf
- http://josolnik.com/simulating_product_usage_data.html
- http://www.graphviz.org/pdf/dotguide.pdf
16.03
- https://medium.com/@pushkarmandot/https-medium-com-pushkarmandot-what-is-lightgbm-how-to-implement-it-how-to-fine-tune-the-parameters-60347819b7fc
- https://docs.google.com/spreadsheets/d/1gnx_AnJPp_Ez6hTukVPMNHDAjb0yCTEgzyo9NUiQjfc/edit#gid=1036524761
- http://www.graphviz.org/pdf/dotguide.pdf
12.03
- https://s3.amazonaws.com/assets.datacamp.com/production/course_3374/slides/ch3_slides.pdf
- https://www2.unil.ch/biomapper/Download/Lobo-GloEcoBioGeo-2007.pdf
- https://github.com/Volodymyrk/stats-testing-in-python/blob/master/04%20-%20AB%20testing%20revenues.ipynb
- https://github.com/anvaka/word2vec-graph
- http://blog.minitab.com/blog/adventures-in-statistics-2/understanding-t-tests%3A-1-sample%2C-2-sample%2C-and-paired-t-tests
08.03
- https://distill.pub/2018/building-blocks/
- https://www.kaggle.com/anokas/talkingdata-adtracking-eda
- https://github.com/PavelOstyakov/toxic/blob/master/fit_predict.py
- https://github.com/MLWave/Kaggle-Ensemble-Guide/blob/master/src/kaggle_rankavg.py
- https://mlwave.com/kaggle-ensembling-guide/
- https://www2.unil.ch/biomapper/Download/Lobo-GloEcoBioGeo-2007.pdf
07.03
- https://www.kdnuggets.com/2017/06/kmeans-clustering-tableau-call-detail-records.html
- https://github.com/neptune-ml/kaggle-toxic-starter
05.03
- http://konukoii.com/blog/2018/02/19/twitter-sentiment-analysis-using-combined-lstm-cnn-models/
- https://www.kaggle.com/ogrellier/lgbm-with-words-and-chars-n-gram/code
04.03
- https://github.com/mxbi/mlcrate/blob/master/mlcrate/ensemble.py
- datacamp.com/community
- https://nbviewer.jupyter.org/github/repmax/topic-model/blob/master/topic-modelling.ipynb
01.03
- https://nlp.stanford.edu/pubs/sidaw12_simple_sentiment.pdf
- http://www.abigailsee.com/2018/02/21/deep-learning-structure-and-innate-priors.html
- http://cikm2017.org/download/analytiCup/session3/CIKMAnalytiCup2017_LazadaProductTitleQuality_T2.pdf
- http://www.businessinsider.com/app-users-are-quick-to-uninstall-2016-11
- https://jjallaire.shinyapps.io/keras-customer-churn/#section-customer-scorecard
- https://github.com/rstudio/keras-customer-churn
- https://github.com/nzw0301/spooky/blob/master/features/bigram_supervised_fasttext.ipynb
- https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-IMDB.ipynb
28.02
27.02
- text classification with fastai https://www.youtube.com/watch?v=37sFIak42Sc&feature=youtu.be&t=3745
- https://nlp.stanford.edu/pubs/sidaw12_simple_sentiment.pdf
- https://www.kaggle.com/jhoward/nb-svm-strong-linear-baseline
- https://github.com/rstudio/keras-customer-churn
26.02
- https://github.com/deepmipt/DeepPavlov
- orange cars are not lemon, really? http://cdn2.hubspot.net/hubfs/2176909/Resources/Whitepaper_Are_Orange_Cars_Really_not_Lemons.pdf?submissionGuid=347eadf2-9cdb-48a1-8edc-7de4698c3d28
- http://python-3-patterns-idioms-test.readthedocs.io/en/latest/PythonForProgrammers.html
21.02
- http://www.cmap.polytechnique.fr/~lepennec/enseignement/DSSP_Orange/
- Doing Data Science: Straight Talk from the Frontline
- https://github.com/cstorm125/thai2vec/blob/master/notebooks/text_classification.ipynb
- http://hamelg.blogspot.com/2015/11/python-for-data-analysis-part-24.html
20.02
- https://medium.mybridge.co/machine-learning-top-10-open-source-projects-v-feb-2018-d1d39062bd20
- https://tableplus.io/
- https://mailchi.mp/radimrehurek/radims-machine-learning-newsletter-1544193
13.02
09.02
- https://github.com/maciejkula/mixture
- https://towardsdatascience.com/neural-network-architectures-156e5bad51ba
- https://medium.com/@aldamiz/how-we-grew-from-0-to-4-million-women-on-our-fashion-app-with-a-vertical-machine-learning-approach-f8b7fc0a89d7
- https://cs224d.stanford.edu/reports/pascal.pdf
- https://pinboard.in/u:aldamiz/t:re-read/
- https://medium.com/human-in-a-machine-world/mae-and-rmse-which-metric-is-better-e60ac3bde13d
- employee quit prediction https://dzone.com/articles/employee-turnover-prediction-with-deep-learning
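
For the MAE vs RMSE link above, a minimal sketch of how the two metrics differ. The error vectors are made-up illustration data, not taken from the article. Both cases share the same MAE, but RMSE grows when the error is concentrated in one large miss.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: squaring penalizes large errors more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y_true = [0.0, 0.0, 0.0, 0.0]
even_errors = [1.0, 1.0, 1.0, 1.0]   # four small misses
one_outlier = [0.0, 0.0, 0.0, 4.0]   # a single large miss

print(mae(y_true, even_errors), rmse(y_true, even_errors))
print(mae(y_true, one_outlier), rmse(y_true, one_outlier))
```

RMSE is never smaller than MAE on the same errors, so the gap between them is a rough hint at how unevenly the errors are spread.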
07.02
- https://github.com/the-deep-learners/TensorFlow-LiveLessons/
- https://github.com/jfloff/pywFM
- Uber experiment design: https://www.youtube.com/watch?v=9bl7SPSqbX0
06.02
- https://www.getrevue.co/profile/wildml/issues/the-wild-week-in-ai-andrew-ng-s-new-ai-fund-mini-alphago-implementation-bias-variance-in-rl-and-more-94390
- https://www.technologyreview.com/s/610095/more-efficient-machine-learning-could-upend-the-ai-paradigm/?utm_source=twitter.com&utm_medium=social&utm_content=2018-02-05&utm_campaign=Technology+Review
- http://christophjanz.blogspot.com/2012/05/know-your-user-cohorts.html
05.02
- https://wsdm-cup-2018.kkbox.events/
- https://medium.com/@nokkk/jupyter-notebook-tricks-for-data-science-that-enhance-your-efficiency-95f98d3adee4
- https://wsdm-cup-2018.kkbox.events/pdf/5_BGregory_WSDM2018_PredictingCustomerChurn.pdf
- https://brage.bibsys.no/xmlui/bitstream/handle/11250/2433761/16128_FULLTEXT.pdf
02.02
- https://github.com/Kaggle/kaggle-api
- https://github.com/maciejkula/triplet_recommendations_keras
- https://www.coursera.org/learn/nlp-sequence-models
- https://arxiv.org/pdf/1703.01365.pdf
01.02
- https://www.economist.com/news/leaders/21728617-life-age-facial-recognition-what-machines-can-tell-your-face
- https://blog.openai.com/requests-for-research-2/
31.01
- https://github.com/hiranumn/IntegratedGradients
- https://github.com/valentina-s/Novice2DataNinja/blob/master/Videos.ipynb
- https://www.pyimagesearch.com/2018/01/29/scalable-keras-deep-learning-rest-api/
- https://github.com/maciejkula/triplet_recommendations_keras
30.01
- https://medium.com/@ialuronico/my-take-on-pydata-seattle-2017-e8c7b0fa6bf5
- https://github.com/JonathanRaiman/wikipedia_ner
- https://github.com/nadiinchi/hse_cs_ml_course_2017_FTAD/blob/master/materials/presentation_vis_features.pdf
- https://github.com/summer1227/appsflyer/blob/master/click_install.py
- http://www.kdd.org/kdd2016/papers/files/adf0755-vanderveldAbr.pdf
- https://www.youtube.com/watch?v=UmP3UePGO7E
29.01
- https://www.kaggle.com/sudalairajkumar/simple-exploration-notebook-zillow-prize
- https://www.kaggle.com/nikunjm88/creating-additional-features
- https://github.com/nadiinchi/hse_cs_ml_course_2017_FTAD/blob/master/materials/presentation_vis_features.pdf
- https://blog.metaflow.fr/tensorflow-how-to-optimise-your-input-pipeline-with-queues-and-multi-threading-e7c3874157e0
26.01
- https://github.com/Microsoft/AutonomousDrivingCookbook/tree/master/AirSimE2EDeepLearning
- https://blog.insightdatascience.com/how-to-solve-90-of-nlp-problems-a-step-by-step-guide-fda605278e4e
25.01
- https://syncedreview.com/2017/06/06/re%C2%B7work-deep-learning-in-retail-summit-london-uk/
- https://www.re-work.co/events/deep-learning-in-retail-summit-london-2017/schedule#day_2
- https://medium.com/@Synced/customer-lifetime-value-prediction-using-embeddings-53f54e2ac59d
- https://github.com/cseward/ngram_language_model
- https://arxiv.org/abs/1704.04110
- https://arxiv.org/pdf/1702.02098.pdf
- https://rare-technologies.com/implementing-poincare-embeddings/
- http://blog.fastforwardlabs.com/2018/01/22/exploring-recommendation-systems.html?utm_campaign=Data%2BElixir&utm_medium=email&utm_source=Data_Elixir_166
- https://booking.ai/how-booking-com-increases-the-power-of-online-experiments-with-cuped-995d186fff1d
22.01
- https://thenextweb.com/artificial-intelligence/2018/01/10/you-think-it-and-a-robot-sees-it-the-future-is-here-with-mind-reading-ai/
- hyperQA https://github.com/vanzytay/HyperQA
20.01
- https://www.kaggle.com/steubk/fixing-typos
- https://github.com/ChenglongChen/Kaggle_HomeDepot/blob/master/Code/Chenglong/google_spelling_checker_dict.py
- https://github.com/ChenglongChen/Kaggle_HomeDepot/tree/master/Code/Chenglong
- https://github.com/lystdo/Codes-for-WSDM-CUP-Music-Rec-1st-place-solution/blob/master/nn_structure.pdf
19.01
- fashion relevant is not enough: https://arxiv.org/pdf/1406.3561.pdf
- Yahoo portrait user: https://arxiv.org/pdf/1512.04912.pdf
- predict buying intention: https://arxiv.org/pdf/1511.06247.pdf
- realtime community detection: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188702
18.01
- http://www.bayareabikeshare.com/assets/pdf/Bjorn.pdf
- https://github.com/baumanab/BayAreaBikeShare
- https://blog.modeanalytics.com/python-data-visualization-libraries/
- http://thfield.github.io/babs/
17.01
- https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/what-ai-can-and-cant-do-yet-for-your-business
- https://github.com/mryab/webgames-ltv-prediction/blob/master/Webgames%20LTV%20Prediction-Android.ipynb
15.01
- https://github.com/diefimov/MTH594_MachineLearning/tree/master/ipython
- https://www.slideshare.net/RubensZimbres/portfolio-82-2017
- https://github.com/deanwampler/JustEnoughScalaForSpark
- https://www.datasciencecentral.com/profiles/blogs/business-intelligence-and-data-science-fuzzy-borders
- https://ourworldindata.org/
- https://github.com/primetang/pyflann
- https://github.com/jfloff/pywFM
- https://lab.getbase.com/pandarize-spark-dataframes/
- https://towardsdatascience.com/understanding-feature-engineering-part-1-continuous-numeric-data-da4e47099a7b
12.01
- https://blog.goodaudience.com/ai-in-2018-for-researchers-8955df0caaf9
- https://github.com/pandas-profiling/pandas-profiling
- http://support.minitab.com/en-us/minitab-express/1/help-and-how-to/modeling-statistics/regression/supporting-topics/basics/a-comparison-of-the-pearson-and-spearman-correlation-methods/
- https://github.com/dipanjanS/practical-machine-learning-with-python
11.01
10.01
- https://medium.com/wish-engineering/scaling-analytics-at-wish-619eacb97d16
- http://www.awesomestats.in/
- https://dataelixir.com/issues/164#start
08.01
- http://www.predictiveanalyticsworld.com/patimes/uplift-modeling-making-predictive-models-actionable/8578/
- http://www.cs.columbia.edu/~evs/papers/
- http://www.nit.eu/czasopisma/JTIT/2012/2/43.pdf
- https://github.com/PGuti/Uplift/blob/master/Uplift%20Evaluation.ipynb
- http://www.predictiveanalyticsworld.com/book/press.php#articlesbytheauthor
04.01
- https://spark-in.me/post/learn-data-science
- https://habrahabr.ru/company/ods/blog/322626/
- https://docs.google.com/spreadsheets/d/1dXghGL0hH6gs3H9Km7zhOpk9MWufRJ_bSrFw0NLaRuo/edit#gid=791694085
- https://github.com/mmcs-sfedu/ds_workshop/blob/master/refs.md
- http://www.inference.vc/
- http://fastml.com/two-faces-of-overfitting-subscribers-only/
- http://quantresearchgroup.ru/
- https://livebook.datascienceheroes.com/index.html
- https://github.com/esokolov/ml-course-msu
- https://medium.freecodecamp.org/every-single-machine-learning-course-on-the-internet-ranked-by-your-reviews-3c4a7b8026c0
- http://www.offconvex.org/
- https://github.com/ogrisel/parallel_ml_tutorial/blob/master/notebooks/08%20-%20Large%20Scale%20Text%20Classification%20for%20Sentiment%20Analysis.ipynb
- https://github.com/shervinea
03.01
- http://rahnamayan.ca/assets/documents/Customer%20Shopping%20Pattern%20Prediction-%20A%20Recurrent%20Neural%20Network%20Approach.pdf
- http://eprints.bournemouth.ac.uk/10107/1/Consumer_Behaviour_Theory_-_Approaches_%26_Models.pdf
- https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks?utm_campaign=Data%2BElixir&utm_medium=email&utm_source=Data_Elixir_163
- https://blog.keen.io/architecture-of-giants-data-stacks-at-facebook-netflix-airbnb-and-pinterest-9b7cd881af54
- https://doogkong.github.io/2017/papers/paper2.pdf
- https://pdfs.semanticscholar.org/a8cd/90fd6fce09f38a391579057d3207235a431b.pdf
- http://www.marekrei.com/blog/ml-nlp-publications-in-2017/
- http://www.aihelsinki.com/a-collection-of-tensorflow-resources-for-self-study/
02.01
- https://github.com/blast-analytics-marketing/RFM-analysis
- https://cran.r-project.org/web/packages/BTYD/vignettes/BTYD-walkthrough.pdf
- http://www.blastam.com/blog/rfm-analysis-boosts-sales
- http://cdn.intechopen.com/pdfs/13162.pdf
22.12
21.12
- https://github.com/Arturus/kaggle-web-traffic
- https://oneau.wordpress.com/2011/02/28/simple-statistics-with-scipy/
20.12
- https://datahack.analyticsvidhya.com/contest/all/
- https://github.com/zixia/chinese-whispers
- https://github.com/tudarmstadt-lt/sensegram/tree/master/chinese-whispers
- https://github.com/zhly0/facenet-face-cluster-chinese-whispers-
18.12
- nips review https://docs.google.com/spreadsheets/d/1ZQMXFAVapEOm1y53ijEJ1Ds6Mls-z6ZtoJKpJmHogzo/edit#gid=0
- https://towardsdatascience.com/how-docker-can-help-you-become-a-more-effective-data-scientist-7fc048ef91d5
- http://www.akbc.ws/2017/
17.12
- http://proceedings.mlr.press/v7/guyon09/guyon09.pdf
- http://proceedings.mlr.press/v7/miller09/miller09.pdf
- https://ragulpr.github.io/assets/draft_master_thesis_martinsson_egil_wtte_rnn_2016.pdf
- https://github.com/catboost/benchmarks/tree/master/quality_benchmarks
16.12
15.12
- http://lifelines.readthedocs.io/en/latest/Survival%20analysis%20with%20lifelines.html#estimating-the-survival-function-using-kaplan-meier
- http://daynebatten.com/2015/02/customer-churn-survival-analysis/
- survival model, princeton notes: http://data.princeton.edu/wws509/notes/
- http://daynebatten.com/2017/02/recurrent-neural-networks-churn/
- http://www.machinelearning.ru/wiki/index.php
- http://www.machinelearning.ru/wiki/images/0/06/PZAD2016_03_visualize.pdf
- http://www.machinelearning.ru/wiki/images/c/cc/PZAD2016_09_rf.pdf
- http://www.machinelearning.ru/wiki/images/8/8e/PZAD2016_10_tfeatures.pdf
- http://www.machinelearning.ru/wiki/images/e/e7/PZAD2016_14_social.pdf
14.12
- https://github.com/daynebatten/keras-wtte-rnn
- https://ragulpr.github.io/2016/12/22/WTTE-RNN-Hackless-churn-modeling/
- https://github.com/erikbern
- conversion rate, survival analysis: https://erikbern.com/2017/05/23/conversion-rates-you-are-most-likely-computing-them-wrong.html
- https://data-literacy.geckoboard.com
- https://erikbern.com/2017/12/12/learning-from-users-faster-using-machine-learning.html
- https://github.com/UrbanInstitute/pyspark-tutorials
- GP: http://bridg.land/posts/gaussian-processes-1
- mining non redundant sequence https://arxiv.org/pdf/1712.04159.pdf
13.12
- https://www.davidculley.com/installing-python-on-a-mac/
- http://bomilanovich.com/blog/howto-install-pyqt-on-mac-with-python-3/
- https://sascompetitions.ru/
12.12
- https://rare-technologies.com/mummy-effect-bridging-gap-between-academia-industry/
- http://ruder.io/deep-learning-optimization-2017/
- don't decay the learning rate, increase the batch size: https://pdfs.semanticscholar.org/3299/aee7a354877e43339d06abb967af2be8b872.pdf
- https://medium.com/@Synced/nips-2017-day-1-2-highlights-67ab464086c
11.12
- http://learningsys.org/nips17/assets/slides/dean-nips17.pdf
- https://www.datascience.com/resources/notebooks/overview-churn-modeling-techniques
- https://swarbrickjones.wordpress.com/2017/03/28/cross-entropy-and-training-test-class-imbalance/
10.12
07.12
- bayesian variable explanation: https://www.kdnuggets.com/2017/11/bayesian-networks-understanding-effects-variables.html
- end2end ML/DL https://aws.amazon.com/sagemaker/ (colab?)
- test of time https://www.youtube.com/watch?time_continue=2&v=Qi1Yry33TQE
06.12
- http://proceedings.mlr.press/v7/niculescu09/niculescu09.pdf
- https://www.nature.com/articles/d41586-017-07522-z
- http://www.wiseathena.com/pdf/wa_dl.pdf
05.12
- https://www.dataiku.com/learn/guide/tutorials/churn-prediction.html
- https://www.dataiku.com/solutions/use-cases/lifetime-value-optimisation/
04.12
- scikit optimize https://www.youtube.com/watch?v=DGJTEBt0d-s
- https://github.com/fmfn/BayesianOptimization/blob/master/examples/xgboost_example.py
- https://www.dataapplab.com/wp-content/uploads/2017/05/DAL-Kaggle-cometition.pdf
02.12
online marketing applications
- https://pydata.org/carolinas2016/schedule/presentation/23/
- https://github.com/maoting1223/pycon_sg_2016
- https://www.youtube.com/watch?v=gx6oHqpRgpY
01.12
30.11
- https://hbr.org/2017/06/a-refresher-on-ab-testing
- Reuters Tracer: https://arxiv.org/pdf/1711.04068.pdf
- https://github.com/DmitryUlyanov/deep-image-prior
- https://research.googleblog.com/2017/11/interpreting-deep-neural-networks-with.html
29.11
- https://www.nytimes.com/2017/11/28/technology/artificial-intelligence-research-toronto.html
- https://rare-technologies.com/machine-learning-hardware-benchmarks/
- how xgboost handles NaNs: dmlc/xgboost#21
- https://github.com/ledmaster?tab=repositories
- automata extraction from RNN https://arxiv.org/pdf/1711.09576.pdf
- shap vis: https://github.com/slundberg/shap
- http://www.cs.jhu.edu/~ayuille/courses/Stat161-261-Spring14/Big%20data_%20are%20we%20making%20a%20big%20mistake_%20-%20FT.pdf
- http://cdn2.hubspot.net/hub/215445/file-1390429685-pdf/DI_ebook_-_How_to_Build_and_Lead_a_Winning_Data_Team-1.pdf?t=1435065619454
28.11
- https://www.aarki.com/blog/using-machine-learning-to-predict-campaign-performance
- https://www.forbes.com/sites/forbesagencycouncil/2017/11/15/how-machine-learning-can-maximize-the-success-of-marketing-campaigns/3/#4bbeb8df7846
27.11
- https://www.slideshare.net/DataRobot/featurizing-log-data-before-xgboost
- https://www.slideshare.net/DataRobot/make-sense-out-of-data-with-feature-engineering
- https://www.slideshare.net/KaiX/xavier-conort-datascience-sg-meetup-challenges-in-insurance-pricing
- https://www.slideshare.net/KaiX/forecasting-techniques-data-science-sg
- https://github.com/thiakx?tab=repositories
- https://www.kdnuggets.com/2017/11/ng-deep-learning-specialization-21-lessons.html?utm_content=buffera7008&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
- https://github.com/gigamailer/simplenin/blob/master/Mastering%20Feature%20Engineering%20%2528Early%20Release%2529-O%2527Reilly%25282016%2529.pdf
- https://github.com/svegapons/kaggle_airbnb/blob/master/code_keras.py
24.11
- https://www.bloomberg.com/company/announcements/bloomberg-magic-machine-learning/
- https://www.investopedia.com/terms/a/alpha.asp
- http://web.nchu.edu.tw/~jodytsao/MarkegingG/IIR10-Sentiment%20Analysis.pdf
- https://flyyufelix.github.io/2017/11/17/direct-future-prediction.html
- https://medium.com/@jeffykao/more-than-a-million-pro-repeal-net-neutrality-comments-were-likely-faked-e9f0e3ed36a6
23.11
- http://blog.paralleldots.com/data-science/breakthrough-research-papers-and-models-for-sentiment-analysis/?lipi=urn%3Ali%3Apage%3Ad_flagship3_pulse_read%3BiC%2Fq1jhKSuCkAgj9YxVOuQ%3D%3D
- https://github.com/Far0n/xgbfi
22.11
- https://github.com/CleverTap/Analytics_ds_articles/tree/master/Data-Informed/Feature_Engineering
- https://towardsdatascience.com/diary-of-a-data-scientist-at-booking-com-924734c71417
- A/B testing at Booking https://arxiv.org/pdf/1710.08217.pdf
- https://booking.ai/named-entity-classification-d14d857cb0d5
- https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.mstats.winsorize.html
- http://data-informed.com/how-to-improve-machine-learning-tricks-and-tips-for-feature-engineering/
21.11
- https://www.analyticsvidhya.com/blog/2017/06/which-algorithm-takes-the-crown-light-gbm-vs-xgboost/
- tune lightgbm: https://github.com/Microsoft/LightGBM/blob/master/docs/Parameters-Tuning.rst
- https://github.com/mdda/compressing-word-embeddings/tree/master/notebooks
- https://dashee87.github.io/deep%20learning/python/predicting-cryptocurrency-prices-with-deep-learning/
- https://github.com/dashee87/blogScripts/blob/master/Jupyter/2017-11-20-predicting-cryptocurrency-prices-with-deep-learning.ipynb
17.11
- https://medium.com/searchink-eng/keras-horovod-distributed-deep-learning-on-steroids-94666e16673d
- https://us3.campaign-archive.com/?u=6a29d4cc0471455d38260b3cc&id=ddf2eee959
- http://wangzhinan.com/2017/02/20/wsdm17-summary/#more
16.11
- deep ensembling: https://cambridgespark.com/content/tutorials/neural-networks-tuning-techniques/index.html
- https://www.technologyreview.com/s/609495/ai-can-be-made-legally-accountable-for-its-decisions/?utm_source=twitter.com&utm_medium=social&utm_content=2017-11-15&utm_campaign=Technology+Review
- https://beamandrew.github.io/deeplearning/2017/06/04/deep_learning_works.html
- https://github.com/taolei87/rcnn
- https://research.googleblog.com/2017/11/sling-natural-language-frame-semantic.html
- model interpretation: https://blog.kjamistan.com/towards-interpretable-reliable-models/
- https://github.com/cgnorthcutt/rankpruning
- https://github.com/PAIR-code/facets/blob/master/facets_overview/Overview_demo.ipynb
- https://github.com/google/sling
15.11
- https://github.com/catboost/catboost/blob/master/catboost/tutorials/quora_catboost_w2v.ipynb
- https://spacy.io/usage/v2
14.11
13.11
- https://github.com/kaz-Anova/StackNet
- https://tech.yandex.com/catboost/doc/dg/concepts/python-reference_catboostclassifier_fit-docpage/
- https://machinelearning.apple.com/2017/10/01/hey-siri.html
- MLConf SF 2017: https://www.slideshare.net/JuneAndrews/counter-intuitive-machine-learning-for-the-industrial-internet-of-things-81862870/1
- https://www.slideshare.net/SessionsEvents
- https://towardsdatascience.com/7-takeaways-from-mlconf-sf-1b2703db5ecb
10.11
- what's wrong with CNNs: https://www.youtube.com/watch?v=rTawFwUvnLE
- https://medium.com/@culurciello/deep-neural-network-capsules-137be2877d44
09.11
- vizuka: https://github.com/0011001011/Vizuka
- https://www.youtube.com/watch?feature=youtu.be&v=klYBPl1ljTQ&list=PLGVZCDnMOq0rjkF7p_F4qtaVJQnjK1oKT&app=desktop
08.11
- pearson correlation: https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
- jensen inequality: https://en.wikipedia.org/wiki/Jensen%27s_inequality
- ui2code: https://uizard.io/
- https://pypi.python.org/pypi/textstat/
- mse vs pearson correlation: http://www.bwgriffin.com/gsu/courses/edur8132/notes/Notes8c2_RegressionModelFit.pdf
03.11
02.11
- https://github.com/XifengGuo/CapsNet-Keras/blob/master/CapsNet.py
- outlier detection: http://bugra.github.io/work/notes/2014-03-31/outlier-detection-in-time-series-signals-fft-median-filtering/
- actionable classification: https://arxiv.org/abs/1607.02501
- https://www.youtube.com/watch?v=NOUMgThZ5UE
- http://www.swisstext.org/#daeniken
- http://people.inf.ethz.ch/ganeao/emnlp17_deep_ed.pdf
- http://www.swisstext.org/docs/2017/Presentation/daeniken/swisstext_pius_von_daeniken.pdf
- http://www.swisstext.org/docs/2017/Presentation/pappas/swisstext17.pdf
01.11
- two sample test, mean: https://www.isixsigma.com/tools-templates/hypothesis-testing/making-sense-two-sample-t-test/
- two sample test, ratio: https://github.com/maoting1223/pycon_sg_2016
- Welch's t-test vs Student's t-test: http://daniellakens.blogspot.com/2015/01/always-use-welchs-t-test-instead-of.html
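
For the Welch vs Student comparison above, a rough sketch of the two test statistics using only the standard library. The helper names `welch_t` and `student_t` and the sample data are hypothetical, for illustration only. It computes the t statistic and degrees of freedom and leaves out the p-value, which needs a t-distribution CDF.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # squared standard errors
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

def student_t(a, b):
    """Return (t, df) for Student's pooled-variance t-test."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5
    return t, na + nb - 2

a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]  # same size as a, but four times the variance
print(welch_t(a, b))
print(student_t(a, b))
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ; with unequal sizes and unequal variances the statistics themselves diverge.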
31.10
- structured data: https://github.com/random-forests/tensorflow-workshop/blob/master/examples/07_structured_data.ipynb
- https://www.pyimagesearch.com/2017/10/30/how-to-multi-gpu-training-with-keras-python-and-deep-learning/
- kaggle survey: LR first, tree second: https://www.kaggle.com/surveys/2017
- fe best practice: https://www.quora.com/What-are-some-best-practices-in-Feature-Engineering
- ppmi vs svd: https://github.com/piskvorky/word_embeddings/blob/master/run_embed.py
- class imbalance in cnn: https://arxiv.org/pdf/1710.05381.pdf
- rnnvis: https://arxiv.org/pdf/1710.10777.pdf
- task detection from email: https://medium.com/@rodrigo_23805/extracting-tasks-from-emails-first-challenges-86e7fbbf4672
- interactive cm: https://rare-technologies.com/interactive-confusion-matrix-python/
30.10
- radim newsletter http://us3.campaign-archive.com/?u=6a29d4cc0471455d38260b3cc&id=9f47229ab0
- prodLDA in keras: https://github.com/nzw0301/keras-examples/blob/master/prodLDA.ipynb
- prodLDA: https://openreview.net/pdf?id=BybtVK9lg
- bounter: https://github.com/RaRe-Technologies/bounter
- http://cikm2017.org/mainconschedule.html
- http://gael-varoquaux.info/stats_in_python_tutorial/
- http://matthewrocklin.com/blog/work/2017/10/16/streaming-dataframes-1?utm_campaign=Data%2BElixir&utm_medium=email&utm_source=Data_Elixir_154
- GA: http://blog.otoro.net/2017/10/29/visual-evolution-strategies/
- nlp talk: https://www.cs.umb.edu/~twang/file/cs188_TongWang.pdf
- http://yutori-datascience.hatenablog.com/entry/2017/10/29/205433
29.10
- linguistic structure is back, acl 2017: http://www.abigailsee.com/2017/08/30/four-deep-learning-trends-from-acl-2017-part-1.html
28.10
27.10
26.10
- Coursera kaggle: https://www.coursera.org/learn/competitive-data-science
25.10
- how to start ML/DL/NLP https://drive.google.com/file/d/0B2cCJQ2_aOwjUmFnRko2QjRGelE/view
- https://www.slideshare.net/lopusz/debugging-machinelearning
- https://github.com/meereeum/lda2vec-tf
- https://medium.com/@rchang/advice-for-new-and-junior-data-scientists-2ab02396cf5b
- https://github.com/YuriyGuts/kaggle-quora-question-pairs/blob/master/notebooks/classify-lightgbm-cv-pred.ipynb
24.10
- https://github.com/plaidml/plaidml
- https://www.youtube.com/watch?v=G4uDBe28ryQ
- https://github.com/ilkarman/DeepLearningFrameworks
23.10
20.10
- https://docs.google.com/presentation/d/1vFlR9QJ4v1XnRg0-sNhe0_1gZUjj1utDdAUHScjzOtI/edit#slide=id.g271203ffb6_2_8
- http://matrixmultiplication.xyz
- http://blog.yhat.com/posts/logistic-regression-python-rodeo.html
19.10
- https://jeremykun.com/2016/04/18/singular-value-decomposition-part-1-perspectives-on-linear-algebra/
- http://multithreaded.stitchfix.com/blog/2017/10/18/stop-using-word2vec/
- https://github.com/uber/horovod
18.10
- swish = x * sigmoid(x) https://arxiv.org/pdf/1710.05941.pdf
- DrQA: document retriever, document reader: https://github.com/facebookresearch/DrQA
- https://gist.github.com/GaelVaroquaux/ead9898bd3c973c40429
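
For the swish entry above, a minimal scalar sketch of the activation. The `beta` scaling argument follows the parameterized form x * sigmoid(beta * x); its default of 1.0 reproduces the plain x * sigmoid(x) noted in the list. The function names are illustrative, not from any library.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def swish(x: float, beta: float = 1.0) -> float:
    """Swish activation: x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)

for x in (-2.0, 0.0, 2.0):
    print(x, swish(x))
```

Unlike ReLU, the function is smooth and dips slightly below zero for negative inputs, while approaching the identity for large positive inputs.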
17.10
- outlier detection: https://storage.googleapis.com/supplemental_media/udacityu/3104648634/Hodge+Austin_OutlierDetection_AIRE381.pdf
- https://lilianweng.github.io/lil-log/2017/09/28/anatomize-deep-learning-with-information-theory.html
- opening the black box of DNN: https://arxiv.org/pdf/1703.00810.pdf
- information plane for DL: https://www.youtube.com/watch?v=bLqJHjXihK8
- information theory with C.Olah: http://colah.github.io/posts/2015-09-Visual-Information/
16.10
15.10
- nlp curator: https://github.com/Kyubyong/nlp_tasks
- https://github.com/Kulbear/deep-learning-coursera
13.10
- Information theory of DL https://www.youtube.com/watch?v=RKvS958AqGY
- https://arxiv.org/pdf/1709.03856.pdf
12.10
- https://github.com/facebookresearch/StarSpace
- https://www.youtube.com/watch?v=aircAruvnKk&feature=youtu.be
- https://research.googleblog.com/2017/10/tensorflow-lattice-flexibility.html
11.10
- Efron & Hastie, Computer Age Statistical Inference (book): https://web.stanford.edu/~hastie/CASI_files/PDF/casi.pdf?utm_content=bufferaea53&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer
- http://www.recognition.mccme.ru/pub/RecognitionLab.html/slbook.pdf
- https://www.ted.com/talks/jeremy_howard_the_wonderful_and_terrifying_implications_of_computers_that_can_learn
- tsne map: https://artsexperiments.withgoogle.com/tsnemap/#2072.02,145.27,5710.37,2039.00,138.00,5689.00
- http://people.cs.umass.edu/~brenocon/inlp2016/lectures/05,06-classif-scan.pdf
- capsules https://research.google.com/pubs/pub46351.html
- http://www.cs.toronto.edu/~fritz/absps/transauto6.pdf
- http://cseweb.ucsd.edu/~gary/cs200/s12/Hinton.pdf
10.10
- https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/
- https://github.com/esokolov/ml-course-hse/blob/master/2016-fall/lecture-notes/lecture11-dl.pdf
07.10
05.10
- emoji2 https://medium.com/huggingface/understanding-emotions-from-keras-to-pytorch-3ccb61d5a983
- attention layer: https://gist.github.com/thomwolf/e309e779a08c1ba899514d44355cd6df#file-attention_layer_keras-py
04.10
- hard sigmoid: https://stackoverflow.com/questions/35411194/how-is-hard-sigmoid-defined
- https://data.world/rickyhennessy/startup-names-and-descriptions/workspace/file?filename=startups.csv
- position attention bi-lstm: https://arxiv.org/pdf/1703.10089.pdf
- https://obilaniu6266h16.wordpress.com/2016/02/04/einstein-summation-in-numpy/
- https://arxiv.org/pdf/1512.04916.pdf
- oov: https://github.com/cheng6076/SNLI-attention/blob/master/oov_vec.py
- https://fasttext.cc/blog/2017/10/02/blog-post.html
- https://fasttext.cc/docs/en/language-identification.html
- https://teachablemachine.withgoogle.com/
- https://www.edvancer.in/machine-learning-vs-statistics/
- https://www.slideshare.net/nikhildandekar/maintaining-high-quality-user-generated-content-through-machine-learning
03.10
- https://statweb.stanford.edu/~candes/talks/Wald1.pdf
- http://aimotion.blogspot.com/2011/11/machine-learning-with-python-logistic.html
- https://arxiv.org/pdf/1705.08039.pdf
- https://medium.com/@shanif/our-data-science-workflow-b974f30a124d
- http://u.cs.biu.ac.il/~yogo/DepLing2017invited.pdf
02.10
- https://github.com/DataScienceUB/DeepLearningfromScratch
- https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211
- https://medium.com/@shanif/our-data-science-workflow-b974f30a124d
- https://www.slideshare.net/GaelVaroquaux/computational-practices-for-reproducible-science
30.09
- https://developers.google.com/machine-learning/glossary/
- https://www.slideshare.net/GaelVaroquaux/computational-practices-for-reproducible-science
- https://github.com/SSDS-Croatia/SSDS-2017
- https://sites.google.com/site/ssdatascience2017/lecture-notes
29.09
- feature selection multiple hypothesis testing: http://kelvinguu.com/posts/feature-selection-and-multiple-hypothesis-testing/
- how to do feature selection correctly: http://kelvinguu.com/posts/why-naive-cross-validation-fails-at-feature-selection/
- https://habrahabr.ru/post/326122/
- http://soloro.ru
- http://kelvinguu.com/
- http://jakob.uszkoreit.net/
- coarse to fine QA for long document: https://arxiv.org/pdf/1611.01839.pdf
- generating sentences by editing prototypes: https://arxiv.org/pdf/1709.08878.pdf
28.09
- http://ruder.io/optimizing-gradient-descent/
- https://github.com/kuza55/keras-extras/blob/master/layers/DiffForest.py
- https://arxiv.org/pdf/1702.08835.pdf
- https://docs.google.com/presentation/d/1Ze7BAiWbMPyF0ax36D-aK00VfaGMGvvgD_XuANQW1gU/edit#slide=id.p
- https://uima.apache.org/
27.09
- https://arxiv.org/pdf/1608.01238.pdf
- https://web.stanford.edu/~jurafsky/slp3/16.pdf
- http://www.aclweb.org/anthology/N12-2009
- https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
25.09
- brown cluster: https://arxiv.org/pdf/1608.01238.pdf
- word sense: http://www.cs.columbia.edu/~mcollins/courses/6998-2011/lectures/yarowsky.pdf
- http://www.derczynski.com/sheffield/papers/brown_impact.pdf
- http://people.cs.georgetown.edu/cosc572/f16/21b_dist_slides.pdf
- https://paulx-cn.github.io/blog/5th_Blog/
22.09
21.09
- rossmann nnet https://arxiv.org/pdf/1604.06737.pdf
- http://blog.kaggle.com/2016/01/22/rossmann-store-sales-winners-interview-3rd-place-cheng-gui/
- https://kaggle2.blob.core.windows.net/forum-message-attachments/102102/3454/Rossmann_nr1_doc.pdf
19.09
- memory augmented nnet for nlp: https://drive.google.com/file/d/0B9dqzboiV5u-UmxJQlJqcUl6anM/view
- kaggle quora blog: https://indatalabs.com/blog/data-science/how-to-win-kaggle-competition
18.09
- http://u.cs.biu.ac.il/~yogo/DepLing2017invited.pdf
- http://newsletter.ruder.io/issues/nlp-news-review-of-emnlp-2017-analyzing-bias-google-brain-ama-dragnn-and-allennlp-72584
17.09
- http://xrds.acm.org/blog/2017/07/power-wordnet-use-python/
- https://simons.berkeley.edu/sites/default/files/docs/5950/2017.02.01-21.15.12-simons-nlp-tutorial.pdf
- talking to machine: http://cs.stanford.edu/~pliang/papers/talking-xrds2014.pdf
- zero learning talk: https://www.youtube.com/watch?v=6O5sttckalE
16.09
15.09
14.09
13.09
- strong algos: GBT, RF, SVM for classification: https://arxiv.org/pdf/1708.05070.pdf
- https://medium.com/slalom-engineering/detecting-malicious-requests-with-keras-tensorflow-5d5db06b4f28
- https://github.com/tensorflow/workshops
- https://github.com/chuckyee/cardiac-segmentation
- real time CNN: https://github.com/lampts/face_classification/blob/master/technical_report.pdf
12.09
- https://en.wikipedia.org/wiki/White_Noise_(novel)
- hitchhiker's guide to the galaxy:
- https://www.cs.bgu.ac.il/~yoavg/uni/bloglike/baboons.html
- http://u.cs.biu.ac.il/~yogo/courses/sem2017/
11.09
- word embedding Komninos https://www.cs.york.ac.uk/nlp/extvec/
- https://ku.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=0954a17c-2702-4d8e-9412-12ae958a2790
- score distribution is better: https://arxiv.org/abs/1707.09861
- make a stable architecture: https://arxiv.org/abs/1707.06799, pretrained embedding, last layer of lstm is crucial.
- https://github.com/lanwuwei/paraphrase-dataset
- why non convex: https://github.com/lanwuwei/paraphrase-dataset
- https://www.reddit.com/r/dataisbeautiful/comments/6ykfvl/average_word_length_for_nytimes_crossword_answers/
10.09
- dilated convnet https://medium.com/@TalPerry/convolutional-methods-for-text-d5260fd5675f
- quora view: https://www.quora.com/challenges#views
09.09
- https://ydkahin.github.io/blog/views-prediction---a-quora-challenge---part-iii-eda-feature-engineering-and-more/?utm_content=buffera82c7&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
- https://github.com/Unbabel/
- https://andre-martins.github.io/docs/emnlp2017_final.pdf
- http://allennlp.org/tutorials/configuration
08.09
- https://www.eff.org/ai/metrics
- http://courses.wcupa.edu/rbove/Berenson/10th%20ed%20CD-ROM%20topics/section12_5.pdf
- percy liang: http://shrdlurn.sidaw.xyz/acl16/
- https://www.youtube.com/watch?v=mhHfnhh-pB4
- https://manning-content.s3.amazonaws.com/download/d/bcdc8c6-3f2e-4a2d-974b-487fc1da7cdf/Chollet_DLwPython_MEAP_V05_ch1.pdf
- http://ofir.io/Neural-Language-Modeling-From-Scratch/
- https://www.thoughtco.com/normal-approximation-to-the-binomial-distribution-3126589
07.09
- https://www.thoughtco.com/normal-approximation-to-the-binomial-distribution-3126589
- http://www.stat.purdue.edu/~xuanyaoh/stat350/xyJan23Lec4.pdf
- https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/mnist_client.py
- https://medium.com/towards-data-science/how-to-deploy-machine-learning-models-with-tensorflow-part-2-containerize-it-db0ad7ca35a7
- https://medium.com/towards-data-science/how-to-deploy-machine-learning-models-with-tensorflow-part-3-into-the-cloud-7115ff774bb6
- https://github.com/Vetal1977/tf_serving_example
- https://github.com/udacity/deep-learning/blob/master/semi-supervised/semi-supervised_learning_2_solution.ipynb
06.09
- https://medium.com/towards-data-science/how-to-deploy-machine-learning-models-with-tensorflow-part-2-containerize-it-db0ad7ca35a7
- https://medium.com/zendesk-engineering/how-zendesk-serves-tensorflow-models-in-production-751ee22f0f4b
- https://github.com/lampts/deep-learning-with-python-notebooks/blob/master/3.5-classifying-movie-reviews.ipynb
05.09
- ds interview: http://www.thedsinterview.com/
- 4 trends: structure is back, reconsidering word embeddings, black-box interpretability, attention: http://www.abigailsee.com/2017/08/30/four-deep-learning-trends-from-acl-2017-part-2.html
- https://github.com/UKPLab/emnlp2017-relation-extraction
- interpret rnn: https://github.com/philipperemy/tensorflow-isan-rnn
04.09
- http://theorangeduck.com/page/neural-network-not-working
- https://dzone.com/articles/natural-language-processing-adit-deshpande-cs-unde
- https://github.com/ddtm/dl-course
03.09
- http://multithreaded.stitchfix.com/blog/2017/08/31/warehouse-layouts/
- https://machinelearningmastery.com/diagnose-overfitting-underfitting-lstm-models/
02.09
- https://github.com/AlexandreRobicquet?tab=repositories
- https://pillbox.nlm.nih.gov/developer.html#images
01.09
- http://artemis-ml.readthedocs.io/en/latest/plotting.html
- https://github.com/krystianity/keras-serving
- https://github.com/Lausbert/Exermote/tree/master/ExermotePreprocessingAndTraining
31.08
- http://liufuyang.github.io/2017/04/02/just-another-tensorflow-beginner-guide-4.html
- https://github.com/Lausbert/Exermote/blob/master/ExermotePreprocessingAndTraining/trainer/exermote.py
- http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.LeaveOneOut.html
30.08
- effective tf: https://github.com/vahidk/EffectiveTensorflow
- knn and bilstm https://arxiv.org/pdf/1708.07863.pdf
- https://nlp.stanford.edu/pubs/jia2017adversarial.pdf
- https://github.com/dformoso/machine-learning-mindmap
29.08
28.08
- https://nlp.stanford.edu/courses/cs224n/2015/reports/29.pdf
- https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463?_lrsc=ce853194-65af-4e5e-a424-7d21025fd0c9
- https://blog.fineighbor.com/tensorflow-dealing-with-imbalanced-data-eb0108b10701
- https://arxiv.org/pdf/1707.05127.pdf
26.08
- https://github.com/zalandoresearch/fashion-mnist/blob/master/README.md
- https://github.com/DrMichaelWang/Kaggle_Cancer_Project/blob/master/Kaggle%20cancer%20-%20text%20key%20word%20frequency%20count_xgboost.ipynb
25.08
- http://krisztianbalog.com/
- https://medium.com/@erogol/designing-a-deep-learning-project-9b3698aef127
- https://github.com/idiap/importance-sampling
- http://krisztianbalog.com/files/talks/russir2016-el.pdf
- https://github.com/kbalog/russir2016-el
24.08
- http://blog.rtwilson.com/how-to-rescue-lost-code-from-a-jupyteripython-notebook/
- http://maxberggren.se/2017/06/18/deep-learning-vs-xgboost/
- http://beamandrew.github.io/deeplearning/2017/06/04/deep_learning_works.html
22.08
- https://gist.github.com/menshikh-iv/0c691219314da35f48f10826b6d34d97
- https://github.com/minimaxir/reactionrnn
- http://www.kdnuggets.com/2017/08/oreilly-nyc-ai-conference-highlights.html
- https://speakerdeck.com/tmylk/pycon-russia-2017-tiematichieskoie-modielirovaniie-dlia-liudiei
- http://newsletter.ruder.io/issues/nlp-news-data-selection-ml-nlp-in-esports-vqa-bias-lyric-annotations-68803
- https://github.com/fchollet/keras/releases/tag/2.0.7
21.08
- https://github.com/rasbt/python-machine-learning-book-2nd-edition
- https://github.com/sjvasquez/instacart-basket-prediction
18.08
17.08
16.08
- http://mltrainings.ru/
- asap https://github.com/ddofer/asap/wiki/Getting-Started:-A-Basic-Tutorial
- https://arxiv.org/pdf/1701.08318.pdf
- genome modeling: https://cs224d.stanford.edu/reports/jessesz.pdf
- https://www.reddit.com/r/MachineLearning/comments/6tu9gu/what_is_the_process_of_deploying_machine_learning/?st=j6ee7uoq&sh=12c17107
- https://github.com/chrisranderson/beholder
- https://github.com/rasbt/deep-learning-book
15.08
14.08
- https://github.com/experiencor/deep-viz-keras
- https://github.com/facebookresearch/SentEval
- http://machinelearningmastery.com/reproducible-results-neural-networks-keras/
- https://github.com/rasbt/deep-learning-book/blob/master/code/model_zoo/file-queues.ipynb
- https://github.com/nlml/np-to-tf-embeddings-visualiser/blob/master/save_embeddings.py
13.08
11.08
- http://chri.stophr.be/
- https://github.com/nadbordrozd/text-top-model/tree/master/ttm/keras_models
- https://tryolabs.com/blog/2017/08/10/finding-the-right-representation-for-your-nlp-data/
- https://www.mira.law/blogposts/2017/5/12/semantic-averaging-of-documents-using-word2vec-representations
10.08
09.08
08.08
- roc auc: http://www.navan.name/roc/
- https://worksheets.codalab.org/worksheets/0x50757a37779b485f89012e4ba03b6f4f/
- https://nlp.stanford.edu/pubs/jia2016recombination.pdf
07.08
- best paper ICML: https://github.com/mlresearch/v70
- https://explosion.ai/blog/prodigy-annotation-tool-active-learning
- https://github.com/brannondorsey/keras_weight_animator
- https://github.com/keveman/tensorflow-tutorial/blob/master/PTB%20Word%20Language%20Modeling.ipynb
06.08
- https://github.com/brannondorsey/keras_weight_animator
- https://github.com/pavitrakumar78/Anime-Face-GAN-Keras
- https://code.facebook.com/posts/289921871474277/transitioning-entirely-to-neural-machine-translation/
- https://prodi.gy/demo
- https://prodi.gy/docs/
04.08
- emoji transfer learning: https://arxiv.org/pdf/1708.00524.pdf
- http://deepmoji.mit.edu/
- importance sampling https://arxiv.org/pdf/1706.00043.pdf
- larochelle https://drive.google.com/file/d/0ByUKRdiCDK7-LXZkM3hVSzFGTkE/view
- bengio https://drive.google.com/file/d/0ByUKRdiCDK7-UXB1R1ZpX082MEk/view
01.08
- pca with jake http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.09-Principal-Component-Analysis.ipynb
- https://openreview.net/pdf?id=HyaF53XYx
31.07
- http://casa.disi.unitn.it/~moschitt/Teaching-slides/slides-AINLP-2016/NER&POS-AINLP.pdf
- noise in feature space: https://openreview.net/pdf?id=HyaF53XYx
- data augmentation using thesaurus: https://arxiv.org/pdf/1509.01626.pdf
- https://theneuralperspective.com/
- http://casa.disi.unitn.it/~moschitt/since2013/2015_SIGIR_Severyn_TwitterSentimentAnalysis.pdf
- https://einstein.ai/research/state-of-the-art-deep-learning-model-for-question-answering
- https://sigmoidal.io/boosting-your-solutions-with-nlp/
- http://www.fast.ai/2017/07/28/deep-learning-part-two-launch/
- https://huyenchip.com/2017/07/28/confession.html
- https://blog.slavv.com/37-reasons-why-your-neural-network-is-not-working-4020854bd607
25.07
- how to ensemble https://mlwave.com/kaggle-ensembling-guide/
- https://www.slideshare.net/TedXiao/winning-kaggle-101-dmitry-larkos-experiences
- http://togelius.blogspot.se/2017/07/some-advice-for-journalists-writing.html
- https://sadanand-singh.github.io/posts/treebasedmodels/
- regression with keras: https://www.datacamp.com/community/tutorials/deep-learning-python
24.07
- data readiness: https://arxiv.org/pdf/1705.02245.pdf
- trophy data scientist: https://peadarcoyle.wordpress.com/2017/07/23/avoiding-being-a-trophy-data-scientist/
- best paper cvpr 17: https://arxiv.org/pdf/1608.06993.pdf, https://github.com/liuzhuang13/DenseNet
- https://github.com/titu1994/DenseNet
- https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf
23.07
22.07
- https://medium.com/huggingface/state-of-the-art-neural-coreference-resolution-for-chatbots-3302365dcf30
- https://github.com/bloomberg/scatteract
- http://gree2.github.io/ocr/2017/03/08/tesseract-ocr-parser-within-tika
21.07
- https://www.youtube.com/watch?v=5sQ8-Er8tXM
- https://github.com/HouJP/kaggle-quora-question-pairs
- http://www.andreykurenkov.com/writing/a-brief-history-of-neural-nets-and-deep-learning/
20.07
19.07
- https://pjreddie.com/darknet/yolo/
- ridge lr: http://www.utstat.toronto.edu/~guerzhoy/303/lec/lec8/ridge.pdf
- https://github.com/catboost/catboost/tree/master/catboost/tutorials
18.07
- https://kiko01b.wordpress.com/2011/07/16/replace-a-word-containing-a-slash-with-sed/
- https://stackoverflow.com/questions/11392478/how-to-replace-a-string-in-multiple-files-in-linux-command-line
- https://blog.keras.io/the-limitations-of-deep-learning.html
- https://github.com/pair-code/facets
17.07
- https://medium.com/@anandr42/the-data-science-delusion-7759f4eaac8e
- https://gist.github.com/menshikh-iv/0c691219314da35f48f10826b6d34d97
- http://www.fast.ai/2016/12/08/org-structure/
- https://github.com/sarchak/MachineLearningNotebooks
- nn for ir: https://arxiv.org/pdf/1707.04242.pdf
- https://github.com/LeiG/Applied-Predictive-Modeling-with-Python
15.07
- http://www.vjsonline.org/scientist-portrait/1500039392
- https://github.com/jeongyoonlee/data-science-process-management
14.07
- large csv: http://pythondata.com/working-large-csv-files-python/
- https://arimo.com/data-science/2016/bayesian-optimization-hyperparameter-tuning/
- bo https://github.com/phvu/misc/blob/master/sf_crimes/crimes_job_nn.py
- foolbox https://arxiv.org/abs/1707.04131
- http://www.aifounded.com/aifounded/recent-evolution-of-the-qa-datasets-and-going-forward/
- https://gist.github.com/thomasjungblut/b58d70d260abf0eff1a8c447f3d07389#file-xgb_bayes_opt_cv-py
- http://www.bosatsu.net/talks/sletten-datascience.pdf
- https://github.com/dipanjanS/text-analytics-with-python/blob/master/Chapter-6/document_similarity.py
13.07
- http://static.squarespace.com/static/51156277e4b0b8b2ffe11c00/t/53ad86e5e4b0b52e4e71cfab/1403881189332/Applied_Predictive_Modeling_in_R.pdf
- https://github.com/minimaxir/predict-reddit-submission-success
- https://www.google.com/finance/company_news?q=NASDAQ%3AFB&ei=ZA5nWaCMMImFsAG8p4ewCw
12.07
- https://github.com/organisciak/Text-Mining-Course
- http://news.efinancialcareers.com/uk-en/285249/machine-learning-and-big-data-j-p-morgan?utm_content=buffer29288&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
- https://twitter.com/search?q=%23machinelearningflashcards&src=tyah
10.07
- https://github.com/Wrosinski/berlin-ml-article
- https://github.com/saulpw/visidata/blob/stable/docs/tours.rst
- social emnlp: https://twitterinadvertising.files.wordpress.com/2017/02/tweeted-about-742-times.pdf
- good pointers on nn: https://drive.google.com/file/d/0ByUKRdiCDK7-UXB1R1ZpX082MEk/view
- https://github.com/0xnurl/keras_character_based_ner
- https://www.aclweb.org/mirror/emnlp2016/proceedings/2016-emnlp-handbook.pdf
06.07
- https://nlp.stanford.edu/software/crf-faq.shtml
- Redcatlab: http://www.redcatlabs.com/2015-11-24_IES-2015_NER-from-Experts/
- embedding compression http://sei.pku.edu.cn/~moull12/paper/cikm16.pdf
- https://github.com/facebookresearch/InferSent
Maxout:
- https://github.com/philipperemy/tensorflow-maxout/blob/master/maxout.py
- https://arxiv.org/pdf/1302.4389.pdf
05.07
- working with text for social: https://de.dariah.eu/tatom/
- clickbait: https://github.com/saurabhmathur96/clickbait-detector
- http://blog.echen.me/2012/01/03/introduction-to-conditional-random-fields/
- http://nbviewer.jupyter.org/github/tpeng/python-crfsuite/blob/master/examples/CoNLL%202002.ipynb
- CRF: https://arjoonn.blogspot.com/2016/01/prerequisites-for-conditional-random.html
- NYT 1M: https://drive.google.com/file/d/0B0CbnDgKi0PyM1FEQXJRTlZtSTg/view
- https://github.com/davidsbatista/NER-English-Gigaword-LDC
- https://github.com/andreasvlachos/ALTA_ML_for_NLP
04.07
- https://www.slideshare.net/RasmusRothe/3-learnings-from-applying-deep-learning-to-real-world-problems
- pytorch vs tf: https://medium.com/@dubovikov.kirill/pytorch-vs-tensorflow-spotting-the-difference-25c75777377b
- https://github.com/Franck-Dernoncourt/NeuroNER/blob/master/trained_models/performances.md
- http://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/
- https://offbit.github.io/how-to-read/
03.07
- https://mathematical-coffees.github.io/mc07-ml/
- ran: http://www.kentonl.com/pub/llz.2017.pdf
- https://www.microsoft.com/en-us/research/wp-content/uploads/2017/06/fntir-neuralir-mitra.pdf
- https://github.com/bdhingra/ga-reader/blob/master/model/GAReader.py
- https://github.com/allenai/deep_qa/tree/master/deep_qa/layers
- Gate for QA: https://arxiv.org/pdf/1606.01549.pdf
- TWINE https://www.aclweb.org/anthology/E/E17/E17-3007.pdf
- 30 nlp interview questions: https://www.analyticsvidhya.com/blog/2017/07/30-questions-test-data-scientist-natural-language-processing-solution-skilltest-nlp/
- mlss: http://nuit-blanche.blogspot.com/2017/06/slides-machine-learning-summer-school.html
- network analysis: http://i.stanford.edu/~jure/pub/talks2/leskovec-networks-01-nodes.pdf
- dl: http://mlss.tuebingen.mpg.de/2017/speaker_slides/Ruslan1.pdf, http://mlss.tuebingen.mpg.de/2017/speaker_slides/Ruslan2.pdf
- https://offbit.github.io/how-to-read/
02.07
- http://ianozsvald.com/2017/07/01/kaggles-mercedes-benz-greener-manufacturing/
- https://github.com/atveit/GANforiPhoneWithCoreML/blob/master/GAN.ipynb
- https://www.raywenderlich.com/164213/coreml-and-vision-machine-learning-in-ios-11-tutorial
- http://www.cs.nyu.edu/shasha/papers/StatisticsIsEasyExcerpt.html
- http://www.physics.csbsju.edu/stats/
30.06
- http://yerevann.com/a-guide-to-deep-learning/
- https://github.com/stitchfix/seetd
- https://github.com/minimaxir/facebook-page-post-scraper
- https://github.com/rykov8/ssd_keras
- https://github.com/yhenon/keras-frcnn
- https://github.com/niderhoff/nlp-datasets
- http://yerevann.github.io/2016/09/21/presentation-sentence-representations-and-question-answering/
29.06
- scorecard application: https://www.linkedin.com/pulse/credit-risk-scorecard-monitoring-tracking-shailendra
- http://cds.nyu.edu/wp-content/uploads/2014/04/bertini_datascience_showcase_May12_2014.pdf
- annotation tool: https://github.com/RicardoUsbeck/QRTool
- ned dataset: https://datahub.io/dataset/reuters-128-nif-ner-corpus
28.06
- wsd: https://web.stanford.edu/class/cs224n/reports/2762042.pdf
- speech and lang processing: http://www.cs.colorado.edu/~martin/slp.html
- nlp course: http://naviglinlp.blogspot.com/2017/
- ted dunning: http://aclweb.org/anthology/J93-1003
- http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html
- ll calculation: http://ucrel.lancs.ac.uk/llwizard.html
- http://www.prooffreader.com/2014/12/most-decade-specific-words-in-billboard.html
- https://github.com/Prooffreader/data-science-blogs
- http://www.prooffreader.com/2015/05/most-characteristic-words-in-pro-and.html
- https://github.com/zafarali?tab=repositories
27.06
- http://nikolenko.livejournal.com/275253.html
- CRF survey: http://nlpx.net/archives/464
- https://github.com/LopezGG/NN_NER_tensorFlow
- https://medium.com/hockey-stick/tl-dr-bayesian-a-b-testing-with-python-c495d375db4d
- https://alexanderdyakonov.files.wordpress.com/2017/06/book_boosting_pdf.pdf
- https://github.com/backstopmedia/tensorflowbook
- cs20si with tf: http://web.stanford.edu/class/cs20si/syllabus.html
- rnn in excel https://docs.google.com/spreadsheets/d/18bkheoJbmMUqdRFrviUy_TiooSjvvpDqiti7hm2EASY/edit#gid=0
- why elu (not relu): http://www.picalike.com/blog/2015/11/28/relu-was-yesterday-tomorrow-comes-elu/
- https://medium.com/@timanglade/how-hbos-silicon-valley-built-not-hotdog-with-mobile-tensorflow-keras-react-native-ef03260747f3
- https://gist.github.com/J-DM
26.06
- is it significant? http://www.ox.ac.uk/media/global/wwwoxacuk/localsites/uasconference/presentations/P8_Is_it_statistically_significant.pdf
- PSI: http://ucanalytics.com/blogs/population-stability-index-psi-banking-case-study/
- loan credit: http://ucanalytics.com/blogs/data-visualization-case-study-banking/
- FE: https://courses.cit.cornell.edu/cs5304/Lectures/lec5_FeatureEngineering.pdf
- https://github.com/maciejkula/recommender_datasets
- EL: https://github.com/namkhanhtran/EntityLinkingRetrieval-ELR
- https://github.com/raghakot/keras-vis
- https://gh.mltrainings.ru/presentations/Semenov_TinkoffChallenge_2017.pdf
- http://ucanalytics.com/blogs/information-value-and-weight-of-evidencebanking-case/
24.06
23.06
- http://multithreaded.stitchfix.com/blog/2015/08/13/weight-of-evidence/
- WOE: https://github.com/patrick201/information_value
- https://github.com/akashgit/autoencoding_vi_for_topic_models
- https://github.com/carpedm20/variational-text-tensorflow
- AVITM https://openreview.net/pdf?id=BybtVK9lg
- https://www.hackerearth.com/practice/machine-learning/advanced-techniques/winning-tips-machine-learning-competitions-kazanova-current-kaggle-3/tutorial/
- gensim 2.2.3 https://github.com/RaRe-Technologies/gensim/releases/tag/2.2.0
- tkm quora solution: https://www.slideshare.net/tkm2261/quora-76995457
- http://yutori-datascience.hatenablog.com/
22.06
- https://github.com/lampts/kaggle-quora-solution-8th
- https://github.com/Far0n/xgbfi
- http://microposts2016.seas.upenn.edu/challenge.html
- https://github.com/wikilinks/nel/blob/master/notebooks/train.ipynb
- http://www.semantic-web-journal.net/system/files/swj1562.pdf
- https://github.com/jeniyat/TweeTime
21.06
- sentiment corpus: https://www.w3.org/community/sentiment/wiki/Datasets
- paper2code: https://github.com/daviddao/awesome-very-deep-learning
- https://handong1587.github.io/deep_learning/2015/10/09/rnn-and-lstm.html
- task benchmark https://www.eff.org/ai/metrics
- http://willwolf.io/2017/06/15/random-effects-neural-networks/
- http://colah.github.io/posts/2015-08-Backprop/
- http://sdsawtelle.github.io/blog/output/getting-started-with-tensorflow-in-jupyter.html
- https://arxiv.org/pdf/1611.05418.pdf
19.06
- attention is all you need: https://github.com/Kyubyong/transformer
- http://damiano.github.io/learning-similarity-functions-ORM/
- https://github.com/abhishekkrthakur/clickbaits_revisited
- entity filtering and topic detection: thesis-DamianoSpina.pdf
- https://alexanderdyakonov.files.wordpress.com/2017/06/book_boosting_pdf.pdf
- https://github.com/ejmeij/entity-linking-and-retrieval-tutorial
14.06
- automating FE, OneBM: https://arxiv.org/pdf/1706.00327.pdf
- imbalance sklearn: https://glemaitre.github.io/talks/2017_PyParis/#1
- feature selection: http://www.kdnuggets.com/2017/06/practical-importance-feature-selection.html
- https://groups.google.com/a/tensorflow.org/forum/#!msg/discuss/Dhy9MseSXQI/naoy_EElBAAJ
- https://github.com/curiousily
- EL and ER: https://www.dropbox.com/sh/h7fr4yfrih6tisr/Q9BU8Qshcq?lst=
13.06
- https://github.com/ageron/handson-ml
- http://ft-interactive.github.io/visual-vocabulary/
- https://phvu.net/2016/05/13/count-featurizer/
12.06
- https://www.slideshare.net/HJvanVeen/kaggle-presentation
- https://medium.com/udacity/launching-astra-fab2b76b6420
- https://medium.com/@curiousily/tensorflow-for-hackers-part-ii-building-simple-neural-network-2d6779d2f91b
- http://alexanderdyakonov.narod.ru/lpot4emu.pdf
- https://github.com/turboNinja2/Homesite/blob/master/SubmissionsKeras.py
09.06
- https://medium.com/@yoav.goldberg/an-adversarial-review-of-adversarial-generation-of-natural-language-409ac3378bd7
- https://www.slideshare.net/HJvanVeen/feature-engineering-72376750
07.06
- https://medium.com/@curiousily/tensorflow-for-hackers-part-ii-building-simple-neural-network-2d6779d2f91b
- rnn in excel: https://docs.google.com/spreadsheets/d/18bkheoJbmMUqdRFrviUy_TiooSjvvpDqiti7hm2EASY/edit#gid=316082502
- http://nlp.cs.rpi.edu/paper/sigmod2016.pdf
- http://distill.pub/2016/augmented-rnns/
- http://xren7.web.engr.illinois.edu/KDD15-ClusType_v3.pdf
- gp: https://github.com/phvu/misc/blob/master/bayesopt/gaussian_process.py
05.06
02.06
- https://github.com/kailashahirwar/cheatsheets-ai/blob/master/All%20Cheat%20Sheets.pdf
- https://docs.microsoft.com/en-us/cognitive-toolkit/Using-CNTK-with-Keras
01.06
- https://github.com/georgeiswang/Query_Classfication_LSTM
- https://www.oreilly.com/ideas/language-understanding-remains-one-of-ais-grand-challenges
- AI and NLP: https://www.xenonstack.com/blog/overview-of-artificial-intelligence-and-role-of-natural-language-processing-in-big-data
- http://ndres.me/kaggle-past-solutions/
- https://github.com/UKPLab/semeval2017-scienceie
- http://www.nada.kth.se/~ann/exjobb/jan_vandekerkhof.pdf
- https://blog.booking.com/multivariant-tests-for-performance.html
- https://www.ambiverse.com/make-your-news-smarter/
- learn to search: https://hunch.net/~l2s/merged.pdf
31.05
- https://dennisforbes.ca/#a302
- http://www.namedevelopment.com/blog/default.html
- http://www.telegraph.co.uk/finance/personalfinance/comment/4478124/The-name-game.html
- http://new.opencalais.com/wp-content/uploads/2016/01/Thomson-Reuters-Intelligent-Tagging-On-Premise-API-User-Guide.pdf
- http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf
30.05
- http://tkipf.github.io/graph-convolutional-networks/
- http://deeploria.gforge.inria.fr/thomasTalk.pdf
- graph cnn https://github.com/tkipf/gcn
- deeploria: http://deeploria.gforge.inria.fr/
- dedupe: https://github.com/dedupeio/dedupe
- http://sebastianruder.com/multi-task/index.html
- https://arxiv.org/pdf/1705.09585.pdf
- https://clgiles.ist.psu.edu/pubs/jcdl2015-name-disambiguation.pdf
29.05
- why PReLU, maxout: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture6.pdf
26.05
- https://medium.springboard.com/interesting-talks-from-pydata-london-2017-d17b06c1ed5e
- https://github.com/DistrictDataLabs/yellowbrick
- https://github.com/lucjb/pydata2017/blob/master/Multicolinearity.py
- https://github.com/cavaunpeu/dotify/blob/master/notebooks/neural_implicit_mf.ipynb
25.05
- https://www.zanaducloud.com/CC6612B2-B42A-4765-A0C8-4FDB3CEF50E2
- http://willwolf.io/2017/05/18/minimizing_the_negative_log_likelihood_in_english/
- https://github.com/cavaunpeu/dotify/blob/master/notebooks/neural_implicit_mf.ipynb
21.05
- data interview: https://github.com/talolard/Interview
- https://medium.com/@nikasa1889/a-guide-to-receptive-field-arithmetic-for-convolutional-neural-networks-e0f514068807
- https://medium.com/@TalPerry/convolutional-methods-for-text-d5260fd5675f
20.05
19.05
- https://github.com/Microsoft/LightGBM/wiki/Installation-Guide
- https://github.com/ArdalanM/pyLightGBM
18.05
17.05
16.05
- https://www.youtube.com/watch?v=HS7mObQttxU
- https://en.wikipedia.org/wiki/BLEU
- http://www.mathsisfun.com/data/quincunx.html
15.05
- https://github.com/hengluchang/Quora-Paraphrase-Question-Identification
- online w2v: https://markroxor.github.io/gensim/static/notebooks/online_w2v_tutorial.html
- https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm
- http://climberg.de/page/smith-waterman-distance-for-feature-extraction-in-nlp/
13.05
- https://blog.dataiku.com/2015/08/24/xgboost_and_dss
- https://gist.github.com/walterreade/6e20dba959277bd9af77
- https://github.com/lucjb/pydata2017/blob/master/Multicolinearity.py
- https://github.com/christophebourguignat/notebooks/blob/master/Calibration.ipynb
12.05
- https://github.com/christophebourguignat/notebooks
- https://www.kaggle.com/tqchen/understanding-xgboost-model-on-otto-data#script-save-run
- http://education.parrotprediction.teachable.com/p/practical-xgboost-in-python
- https://github.com/makeyourowntextminingtoolkit/makeyourowntextminingtoolkit
- https://docs.google.com/presentation/d/1ukZMzz4rNN0MHegTNgjwLAI-kMWL1mGZPkp1bUCVckc/edit#slide=id.g21fc752465_0_74
11.05
- https://en.wikiquote.org/wiki/X_me_no_Xs#English
- https://www.ff.umb.sk/app/cmsSiteAttachment.php?ID=2348
10.05
- A/B test common pitfalls: https://www.youtube.com/watch?v=NkQ51iyFgs0
09.05
08.05
- https://github.com/benathi/word2gm
- http://mirnazim.org/writings/python-ecosystem-introduction/
- https://speakerdeck.com/marcobonzanini/static-type-analysis-for-robust-data-products-at-pydata-london-2017
- https://github.com/konradczechowski/discopt/blob/master/discopt_general_usage.ipynb
- high order fm: https://arxiv.org/pdf/1607.07195.pdf
- https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/OperaSlides.pdf
- https://pydata.org/london2017/schedule/
- http://www.kemaswill.com/uncategorized/from-matrix-factorization-to-factorization-machines/
- https://github.com/geffy/tffm
05.05
- https://github.com/geffy/tffm
- ds handbook: https://github.com/jakevdp/PythonDataScienceHandbook
- https://github.com/bstriner/keras-tqdm
- https://github.com/src-d/wmd-relax
- https://github.com/krasch/presentations/blob/master/unit_testing_data_science.pdf
04.05
- https://www.kaggle.com/wangyijia/xgboost-tfidf-logloss-0-3/comments/code
- https://www.kaggle.com/jturkewitz/magic-features-0-03-gain/
03.05
- https://github.com/stared/keras-sequential-ascii
- https://github.com/abhishekkrthakur/clickbaits_revisited
02.05
30.04
27.04
26.04
- bm25 implementation: https://github.com/alexeygrigorev/avito-duplicates-kaggle/blob/master/bm25.py
- bm25 vs tfidf: https://lettier.github.io/posts/2016-10-25-tf-idf-vsm-vs-bm25-with-vuejs.html
- https://kkulma.github.io/2017-04-24-determining-optimal-number-of-clusters-in-your-data/
- https://www.kaggle.com/c/quora-question-pairs/discussion/32069#177710
- https://www.reddit.com/r/MachineLearning/comments/67gonq/d_batch_normalization_before_or_after_relu/?st=j1y4j36m&sh=1d708b41
25.04
- https://github.com/ChenglongChen/Kaggle_CrowdFlower/tree/master/Code/Feat
- https://dnc1994.com/2016/05/rank-10-percent-in-first-kaggle-competition-en/
- https://www.slideshare.net/HJvanVeen/feature-engineering-72376750
- http://hotgram1.filmiro.com/2017/03/11/109/6118559518814109698.pdf
24.04
- http://arogozhnikov.github.io/2016/04/28/demonstrations-for-ml-courses.html
- https://github.com/Babylonpartners/fastText_multilingual
- https://github.com/pYr0rAGE/KaggleQuoraQuestionSimilarity/blob/master/notebooks/Initial%20Analysis.ipynb
- http://aylien.com/web-summit-2015-tweets-part1
- https://github.com/pksohn/tweet-clustering
- http://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html
21.04
20.04
- https://gab41.lab41.org/batch-normalization-what-the-hey-d480039a9e3b
- https://github.com/nbgallery/nbgallery.github.io
- https://github.com/yanyang729/656_kaggle_quora_question_pair
- https://github.com/lodrice/LabelGAN
- https://github.com/bathulas/kaggle-quora/blob/master/quora.ipynb
- https://github.com/ArtistScript/Kaggle-Quora-/blob/master/kaggle/xgb.py
- https://github.com/Mustufain/Quora--Detecting-Duplicate-Questions/blob/master/Quora_Features.py
- https://github.com/codeheadshopon/Quora-Question-Pair-Classification/blob/master/SImple_Lstm_Short
19.04
- https://tryolabs.com/blog/machine-learning-deep-learning-conferences/?N
- https://gab41.lab41.org/batch-normalization-what-the-hey-d480039a9e3b
- https://gab41.lab41.org/jupyter-notebook-sharing-is-caring-5ed4831d7f71
- http://blog.smola.org/post/4110255196/real-simple-covariate-shift-correction
18.04
- http://www.abigailsee.com/2017/04/16/taming-rnns-for-better-summarization.html
- http://blog.datadive.net/selecting-good-features-part-i-univariate-selection/
- http://blog.smola.org/post/4110255196/real-simple-covariate-shift-correction
- http://www.mitpressjournals.org/doi/abs/10.1162/089976602753284446#.WPVs_VOGPdQ
- http://scikit-learn-general.narkive.com/ShZKenFK/real-simple-covariate-shift-correction-using-logistic-regression
- http://wan.poly.edu/KDD2012/docs/p168.pdf
- http://www.ml.uni-saarland.de/Publications/Hein%20-%20Binary%20Classification%20under%20Sample%20Selection%20Bias(2008).pdf
- http://www.gatsby.ucl.ac.uk/~gretton/papers/covariateShiftChapter.pdf
- SVD http://econometricsense.blogspot.com/2011/11/singular-value-decomposition-and-text.html
- Pearson vs Kendall http://www.statisticssolutions.com/correlation-pearson-kendall-spearman/
17.04
- df rolling http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html
- https://github.com/DingKe/qrnn
- https://spacy.io/docs/usage/training-ner
- https://www.tensorflow.org/versions/master/api_docs/python/tf/contrib/crf/viterbi_decode
16.04
- http://llcao.net/cu-deeplearning17/project/midterm_summarize.pdf
- https://gist.github.com/stared/dfb4dfaf6d9a8501cd1cc8b8cb806d2e
- http://www.orbifold.net/default/2016/11/25/some-feedforward-neural-networks-using-keras/
15.04
- http://blog.nikhilgarg.me/2016/05/a-million-different-lives.html
- http://www.aclweb.org/anthology/W16-16
14.04
- http://blog.nikhilgarg.me/
- https://www.slideshare.net/NikhilGarg51?utm_campaign=profiletracking&utm_medium=sssite&utm_source=ssslideview
- https://qconsf.com/sf2016/system/files/presentation-slides/scaling_quality_using_machine_learning_-_qcon_sf_2016.pdf
- pydatalondon, May: https://pydata.org/london2017/schedule/
- https://github.com/airalcorn2/RankNet
13.04
- hardmaru: https://arxiv.org/pdf/1704.03477.pdf
- https://github.com/maartenbreddels/ipyvolume
- https://pydata.org/amsterdam2017/schedule/presentation/11/
- https://github.com/godatadriven/risk-analysis
- https://github.com/godatadriven/pydata-2017-dsp-tutorial
12.04
- modern nlp: http://nbviewer.jupyter.org/github/skipgram/modern-nlp-in-python/blob/master/executable/Modern_NLP_in_Python.ipynb
- maxout: http://www-etud.iro.umontreal.ca/~goodfeli/maxout.html
- https://jamesmccaffrey.wordpress.com/2013/11/05/why-you-should-use-cross-entropy-error-instead-of-classification-error-or-mean-squared-error-for-neural-network-classifier-training/
- https://github.com/dmesquita/understanding_tensorflow_nn
- https://pub.uni-bielefeld.de/data
- gated non consecutive cnn: https://arxiv.org/pdf/1512.05726.pdf
- tf for baby: https://medium.freecodecamp.com/big-picture-machine-learning-classifying-text-with-neural-networks-and-tensorflow-d94036ac2274
- acl 16 workshop: http://www.aclweb.org/anthology/W16-16
10.04
- http://nbviewer.jupyter.org/github/skipgram/modern-nlp-in-python/blob/master/executable/Modern_NLP_in_Python.ipynb
- https://github.com/rykov8/ssd_keras/blob/master/SSD_training.ipynb
- https://vkolachalama.blogspot.in/2016/05/keras-implementation-of-mlp-neural.html
- best practice: https://arxiv.org/pdf/1704.01568.pdf
- https://www.slideshare.net/khomenko1/from-data-science-to-production-deploy-scale-enjoy-pydata-amsterdam-mar-12-2016
- https://github.com/gianlucahmd/loads_clustering/blob/master/loads_clustering.ipynb
08.04
- ffm: http://www.csie.ntu.edu.tw/~r01922136/slides/ffm.pdf
- https://medium.com/startup-grind/i-reverse-engineered-a-500m-artificial-intelligence-company-in-one-week-heres-the-full-story-d067cef99e1c
- http://www.learnbymarketing.com/950/winning-a-kaggle-competition-analysis/
- imbalance: https://silicon-valley-data-science.github.io/learning-from-imbalanced-classes/Gaussians.html
- https://www.svds.com/learning-imbalanced-classes/
- 3 idiots, ad prediction criteo: http://www.csie.ntu.edu.tw/~r01922136/kaggle-2014-criteo.pdf
- https://docs.google.com/presentation/d/1bte84MNQu3LDq5WjNMP3ZBDsMfn0eKlnwBvvKFBWVFI/edit#slide=id.g20276450fa_1_28
07.04
- https://gist.github.com/udibr
- tf sequence tagging: https://guillaumegenthial.github.io/sequence-tagging-with-tensorflow.html
- tweet2vec cluster: https://github.com/vendi12/tweet2vec_clustering
- learning to generate reviews and discovering sentiment: https://github.com/openai/generating-reviews-discovering-sentiment
- https://aclweb.org/anthology/K15-1013
- https://github.com/brmson/dataset-sts
- https://drive.google.com/drive/folders/0B-btHzfJjPnobXZ0MndjSkxkRkk
06.04
- http://pasky.or.cz/cp/poster-repl4nlp2016.pdf
- https://www.quora.com/How-do-I-learn-deep-learning-in-2-months
- non-linear transformation: https://swarbrickjones.wordpress.com/2017/03/28/cross-entropy-and-training-test-class-imbalance/#more-2486
- homedepot: https://github.com/ChenglongChen/Kaggle_HomeDepot
05.04
- https://github.com/kootenpv/tweetokenize
- http://labs.septeni-technology.jp/
- pointer LSTM: https://github.com/keon/pointer-networks
- https://rare-technologies.com/text-summarization-in-python-extractive-vs-abstractive-techniques-revisited/
- https://github.com/mattilyra/glove2h5
04.04
- http://slides.com/smerity/quora-frontiers-of-memory-and-attention#/35
- https://github.com/cesc-park/CRCN/blob/master/keras/examples/kaggle_otto_nn.py
- https://www.visme.co/make-information-beautiful/dona-wong-visualizing-financial-data/
- http://web.stanford.edu/class/cs224n/reports.html
- http://www.chioka.in/differences-between-l1-and-l2-as-loss-function-and-regularization/
03.04
- https://nlp.stanford.edu/~socherr/pa4_ner.pdf
- https://github.com/chokkan/crfsuite/blob/master/example/ner.py
- https://www.reddit.com/r/MachineLearning/comments/3dz3fl/dl_architectures_for_entity_recognition_and_other/
01.04
31.03
30.03
- deepnl: https://github.com/attardi/deepnl
- https://gist.github.com/jeremystan/c236000a4159f9d47c28784fa6693c45#file-initial_architecture-py
- Relationship Modeling network: https://pbs.twimg.com/media/C7dvymYVQAAut9_.jpg:large
- https://tech.instacart.com/deep-learning-with-emojis-not-math-660ba1ad6cdc
- Rethink RNN: https://docs.google.com/document/d/1X9f-wst8QhrCCFTWiJIz6vq1qAOlpyYAUo_kaFf0J8M/edit
- crfasrnn: https://github.com/torrvision/crfasrnn
29.03
- silicon valley ds: https://github.com/silicon-valley-data-science/RNN-Tutorial
- https://github.com/richliao/textClassifier
- https://richliao.github.io/supervised/classification/2016/12/26/textclassifier-RNN/
28.03
- https://www.dropbox.com/s/tohrsllcfy7rch4/SimpleQuestions_v2.tgz
- https://github.com/sujitpal/dl-models-for-qa
- http://allenai.org/data.html
- https://www.nervanasys.com/building-skip-thought-vectors-document-understanding/
27.03
- https://truyentran.github.io/talks/ai16-tute-part-I.pdf
- https://github.com/truyentran
- RE with LSTM in TF: https://github.com/thunlp/TensorFlow-NRE
- http://www.exegetic.biz/blog/2015/12/making-sense-logarithmic-loss/
- http://nghiaho.com
- https://liusida.github.io/2016/10/31/translate-from-tf-2-keras/
26.03
- https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf
- https://github.com/stanfordnlp/cs224n-winter17-notes/
25.03
- https://github.com/seatgeek/fuzzywuzzy
- 10 misunderstandings of the p-value in science (in Vietnamese): http://tuanvannguyen.blogspot.com/2017/03/10-hieu-lam-ve-tri-so-p-trong-khoa-hoc.html
23.03
- http://cs224d.stanford.edu/reports_2016.html
- https://github.com/hycis/bidirectional_RNN
- https://github.com/MLWave/Kaggle-Ensemble-Guide
- https://github.com/stanfordnlp/cs224n-winter17-notes
- https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/CIKM14_tutorial_HeGaoDeng.pdf
21.03
- DSSM: https://www.microsoft.com/en-us/research/project/dssm/?from=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fprojects%2Fdssm%2F
- MS NLP https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/CIKM14_tutorial_HeGaoDeng.pdf
20.03
- https://github.com/kweonwooj/kaggle_santander_product_recommendation
- bn in application: https://github.com/yskmt/kaggle-otto/tree/master/keras
- https://github.com/WenchenLi/kaggle/blob/master/otto/keras/kaggle_otto_nn.py
- https://github.com/ducha-aiki/caffenet-benchmark/blob/master/batchnorm.md
- relu before or after BN: I haven't gone back to check what they are suggesting in their original paper, but I can guarantee that recent code written by Christian applies relu before BN. It is still occasionally a topic of debate, though.
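To make the ordering question in the note above concrete, here is a minimal pure-Python sketch of the two variants on one unit. The toy batch values are made up, and `batch_norm` here is a simplified stand-in (no learned scale or shift, no running statistics), not the Keras layer.

```python
import math

def relu(xs):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in xs]

def batch_norm(xs, eps=1e-5):
    """Normalise one feature over the batch to zero mean and unit variance.

    Simplified on purpose: no learned gamma/beta, no running averages.
    """
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

# Pre-activations of a single unit over a toy batch (illustrative numbers only).
batch = [-2.0, -0.5, 0.1, 1.5, 3.0]

bn_then_relu = relu(batch_norm(batch))  # BN before the nonlinearity
relu_then_bn = batch_norm(relu(batch))  # relu before BN, as in the note above

print("BN -> relu:", bn_then_relu)
print("relu -> BN:", relu_then_bn)
```

With BN first, the next layer sees non-negative values; with relu first, it sees a zero-mean input that still contains negative values. Which one trains better is an empirical question, as the linked benchmark suggests.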
17.03
- install keras on gpu: please use --no-deps flags: https://github.com/fchollet/keras/wiki/Keras-2.0-release-notes
- quora again: https://github.com/abhishekkrthakur/is_that_a_duplicate_quora_question
- clickbait: https://github.com/abhishekkrthakur/clickbaits_revisited
16.03
- http://www.cs.cornell.edu/courses/cs474/2005fa/Handouts/advanced-qa.pdf
- https://github.com/fchollet/keras/wiki/Keras-2.0-release-notes
- https://www.slideshare.net/anirudhkoul/squeezing-deep-learning-into-mobile-phones
- https://automatedinsights.com/blog/the-python-nlp-ccosystem-a-short-and-very-opinionated-guide
- https://metamind.io/research/learning-when-to-skim-and-when-to-read
15.03
- https://github.com/rguthrie3/DeepLearningForNLPInPytorch/blob/master/Deep%20Learning%20for%20Natural%20Language%20Processing%20with%20Pytorch.ipynb
- http://pytorch.org/#pip-install-pytorch
- tweet calendar: http://ec2-54-170-89-29.eu-west-1.compute.amazonaws.com:8000//month/201703/
- https://www.cs.cornell.edu/courses/cs6740/2010sp/
- hello keras 2: I love it, https://blog.keras.io/
- how to annotate: https://docs.google.com/document/d/1caUD8h-M117pKlds8rRP8jzQ0GN41NzD9UYvog4NyuQ/edit#heading=h.ggo1tu2159da
- social health mining: http://www.cs.jhu.edu/~mdredze/code.php
- http://www.sciencedirect.com/science/article/pii/S088523081630002X
14.03
- seq2seq on tf(general) https://github.com/google/seq2seq
- sentencepiece tokenizer https://github.com/google/sentencepiece
13.03
- visual search in es: https://github.com/tuan3w/visual_search
- 9-15% of active Twitter users are bots: https://arxiv.org/pdf/1703.03107.pdf
- http://www.springer.com/gp/book/9783319472409
- https://arxiv.org/pdf/1602.04427.pdf
- Socher at LxMLS: http://lxmls.it.pt/2014/socher-lxmls.pdf
- use vgg to classify cat/dog: https://gist.github.com/embanner/6149bba89c174af3bfd69537b72bca74
- https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
10.03
09.03
- https://github.com/fastai/courses
- https://www.slideshare.net/0xdata/arno-candel-aibythebay-030617
- https://nodexlgraphgallery.org/Pages/Graph.aspx?graphID=98336
- https://medium.com/salesforce-engineering/salesforce-research-deep-learning-breakthroughs-d83c8b2ac4c3#.a9zswyhov
08.03
- CMU RL and control course: https://katefvision.github.io/
- https://www.slideshare.net/JasonKessler/turning-unstructured-content-into-kernels-of-ideas/52
- norvig ngram: http://norvig.com/ngrams/
07.03
- https://www.slideshare.net/JasonKessler/turning-unstructured-content-into-kernels-of-ideas/52
- https://arxiv.org/pdf/1703.00565.pdf
- https://jasonkessler.github.io/st-sim.html
- Dr Bao H.T JAIST: http://www.jaist.ac.jp/~bao/VIASM-SML/Lecture/L1-ML%20overview.pdf
- Khanh UMD: https://github.com/khanhptnk?tab=repositories
06.03
- https://github.com/georgeiswang/Keras_Example
- https://github.com/thomasj02/DeepLearningProjectWorkflow
- https://tensorflow.github.io/serving/docker.html
- Deep learning in NLP: http://campuspress.yale.edu/yw355/deep_learning/
05.03
- fchollet: xception https://arxiv.org/pdf/1610.02357.pdf
04.03
- https://github.com/jfsantos/TensorFlow-Book
- https://github.com/jfsantos/keras-tutorial/blob/master/notebooks/5%20-%20Improving%20generalization%20with%20regularizers%20and%20constraints.ipynb
02.03
- https://explosion.ai/blog/supervised-similarity-siamese-cnn
- https://github.com/TeamHG-Memex/eli5/blob/master/README.rst
- https://github.com/cemoody/topicsne?files=1
- http://smerity.com/articles/2017/deepcoder_and_ai_hype.html
01.03
- http://smerity.com/articles/2017/deepcoder_and_ai_hype.html
- Twitter NER annotation: https://docs.google.com/document/d/12hI-2A3vATMWRdsKkzDPHu5oT74_tG0-PPQ7VN0IRaw/edit
- WNUT 19, Japan, result: https://noisy-text.github.io/2016/pdf/WNUT19.pdf
- pytorch vs keras/tf: https://www.reddit.com/r/MachineLearning/comments/5w3q74/d_so_pytorch_vs_tensorflow_whats_the_verdict_on/
- quora duplicate question detection: accuracy 1% higher (84.8) but with 100x the params of my model: https://github.com/abhishekkrthakur/is_that_a_duplicate_quora_question/blob/master/deepnet.py
- https://github.com/chiphuyen/tf-stanford-tutorials?files=1
- pretrained fasttext on wikipedia: https://github.com/facebookresearch/fastText
28.02
- https://github.com/uclmr/emoji2vec/blob/master/TwitterClassification.ipynb
- http://blog.outcome.io/pytorch-quick-start-classifying-an-image/
- https://blog.mariusschulz.com/2014/06/03/why-using-in-regular-expressions-is-almost-never-what-you-actually-want
27.02
- random walk -> graph -> node2vec: http://www.kdd.org/kdd2016/subtopic/view/node2vec-scalable-feature-learning-for-networks
- URL2VEC: http://www.newfoundland.nl/wp/?p=112
- 5 diseases of doing science: http://www.sciencedirect.com/science/article/pii/S104898431730070X
- recommended book: https://www.amazon.com/Language-Processing-Perl-Prolog-Implementation/
- Martin Gorner, DL without a PhD: https://github.com/martin-gorner/tensorflow-mnist-tutorial
- https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/#0
- https://docs.google.com/presentation/d/18MiZndRCOxB7g-TcCl2EZOElS5udVaCuxnGznLnmOlE/pub?slide=id.p
- https://docs.google.com/presentation/d/1TVixw6ItiZ8igjp6U17tcgoFrLSaHWQmMOwjlgQY9co/pub?slide=id.p
26.02
- https://medium.com/zendesk-engineering/how-zendesk-serves-tensorflow-models-in-production-751ee22f0f4b#.diz6kjaus
- https://github.com/gkamradt/Lessons-Learned-Data-Science-Interviews/blob/master/Lessons%20Learned%20-%20Data%20Science%20Interviews.pdf
25.02
- gensim 1.0: https://rare-technologies.com/gensim-switches-to-semantic-versioning/
- https://www.slideshare.net/AhmadQamar3/using-deep-neural-networks-for-fashion-applications
24.02
- how to init weights uniform in (-b, b), from Marek's summer school notes: http://www.marekrei.com/blog/26-things-i-learned-in-the-deep-learning-summer-school/
- Beam preprocessing: https://research.googleblog.com/2017/02/preprocessing-for-machine-learning-with.html
- https://github.com/offbit/char-models/blob/master/doc-rnn2.py
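A small sketch of the uniform (-b, b) initialisation mentioned in the list above. The note does not say how b is chosen, so the bound below is an assumption: the common Glorot/Xavier choice b = sqrt(6 / (fan_in + fan_out)). The function name `uniform_init` and the layer sizes are hypothetical.

```python
import math
import random

def uniform_init(fan_in, fan_out, seed=0):
    """Draw a fan_in x fan_out weight matrix from U(-b, b).

    Assumption: b follows the Glorot/Xavier rule sqrt(6 / (fan_in + fan_out)).
    """
    b = math.sqrt(6.0 / (fan_in + fan_out))
    rng = random.Random(seed)  # seeded so the draw is reproducible
    weights = [[rng.uniform(-b, b) for _ in range(fan_out)]
               for _ in range(fan_in)]
    return weights, b

# Hypothetical layer: 300 inputs, 100 outputs.
weights, b = uniform_init(fan_in=300, fan_out=100)
print("bound b =", b)
```

The bound shrinks as layers get wider, which keeps the variance of activations roughly stable from layer to layer.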
23.02
- http://affinelayer.com/pixsrv/
- https://github.com/affinelayer/pix2pix-tensorflow#datasets-and-trained-models
22.02
- https://github.com/offbit/char-models
- https://offbit.github.io/how-to-read/
- https://hackernoon.com/learning-ai-if-you-suck-at-math-p4-tensors-illustrated-with-cats-27f0002c9b32#.xqpspe69f
- Beam search, NN tut from Quoc Le: https://cs.stanford.edu/~quocle/tutorial2.pdf
- marek sequence tagger: https://github.com/marekrei/sequence-labeler
21.02
- https://github.com/marekrei/sequence-labeler
- marek rei word + char attention: http://www.marekrei.com/blog/
- datalab: https://github.com/googledatalab/
- https://tw.pycon.org/2017/en-us/speaking/cfp/
20.02
- https://github.com/ZhitingHu/logicnn
- http://www.cs.cmu.edu/~zhitingh/data/acl16harnessing_slides.pdf
- Lample: https://arxiv.org/pdf/1603.01360.pdf, https://github.com/glample/tagger
- stacked NN LSTM: https://github.com/clab/stack-lstm-ner
- https://github.com/napsternxg/DeepSequenceClassification/blob/master/model.py
- chatbot: https://github.com/Marsan-Ma/tf_chatbot_seq2seq_antilm
- keras crf https://github.com/pressrelations/keras/blob/98b2bb152b8d472150a3fc4f91396ce7f767bed9/examples/conll2000_bi_lstm_crf.py
- Xuezhe Ma, CMU: best paper in ACL 2016, Germany https://github.com/XuezheMax/LasagneNLP
- rnn+cnn+crf: https://arxiv.org/pdf/1603.01354.pdf
- https://github.com/pth1993/vn_spam_sms_filtering/blob/master/src/sms_filtering.py
- https://data36.com/wp-content/uploads/2016/08/practical_data_dictionary_final_data36_tomimester_published.pdf
19.02
- scikit plot: https://github.com/reiinakano/scikit-plot
18.02
- really cool Francis: https://github.com/frnsys/
- ai notes: http://frnsys.com/ai_notes/ai_notes.pdf
- brilliantly wrong, ROC explanation: http://arogozhnikov.github.io/2015/10/05/roc-curve.html
- yandex MLSchool in London: https://github.com/yandexdataschool/MLatImperial2017/
17.02
- RNNs bag of applications: http://www.cs.toronto.edu/~urtasun/courses/CSC2541_Winter17/RNN.pdf
- BiMPM https://arxiv.org/pdf/1702.03814.pdf
- TextSum step by step: http://www.fastforwardlabs.com/luhn/
- https://keon.io/rl/deep-q-learning-with-keras-and-gym/
- https://medium.com/startup-grind/fueling-the-ai-gold-rush-7ae438505bc2#.ny8j80fl3
- big 5 for DS: https://www.quora.com/How-do-you-judge-a-good-Data-scientist-with-just-5-questions
- keon: https://github.com/keon/awesome-nlp
- quid: word2vec + wikipedia: https://quid.com/feed/how-quid-improved-its-search-with-word2vec-and-wikipedia?utm_content=42445351&utm_medium=social&utm_source=twitter
- https://gist.github.com/asmeurer/5843625
16.02
- market2vec: https://github.com/talolard/MarketVectors/blob/master/preparedata.ipynb
- anything2vec: https://gist.github.com/nzw0301/333afc00bd508501268fa7bf40cafe4e
- https://github.com/bradleypallen/keras-movielens-cf
- https://www.slideshare.net/t_koshikawa?utm_campaign=profiletracking&utm_medium=sssite&utm_source=ssslideview
- https://github.com/lipiji/App-DL
- http://www.slideshare.net/LimZhiYuanZane/deep-learning-for-stock-prediction
- https://github.com/kh-kim/stock_market_reinforcement_learning
- stock2vec: https://github.com/kh-kim/stock2vec
- deepwalk and word2vec: http://nadbordrozd.github.io/blog/2016/06/13/deepwalking-with-companies/
- http://m-mitchell.com/NAACL-2016/SemEval/SemEval-2016.pdf
- gandlf: https://github.com/codekansas/gandlf
- predictive on stock trading with sentiment: http://www.kdnuggets.com/2016/01/sentiment-analysis-predictive-analytics-trading-mistake.html
- https://github.com/bradleypallen/keras-emoji-embeddings
- https://github.com/bradleypallen/keras-quora-question-pairs/blob/master/README.md
- DESM: https://www.microsoft.com/en-us/research/project/dual-embedding-space-model-desm/
15.02
- sentiment analysis on Super Bowl: http://blog.aylien.com/sentiment-analysis-of-2-2-million-tweets-from-super-bowl-51/
- spacy advanced text analysis: https://github.com/JonathanReeve/advanced-text-analysis-workshop-2017/blob/master/advanced-text-analysis.ipynb
- pytorch: https://github.com/vinhkhuc/PyTorch-Mini-Tutorials
- Quora engineering: https://engineering.quora.com/Semantic-Question-Matching-with-Deep-Learning
- spaCy, bag of nns: https://explosion.ai/blog/quora-deep-text-pair-classification
- AUC 0.875 http://analyzecore.com/2017/02/08/twitter-sentiment-analysis-doc2vec/
14.02
- event detection, extraction, triggering, mention: https://github.com/anoperson/jointEE-NN
- batch renorm, because BN is sensitive to batch size and initialisation: https://arxiv.org/pdf/1702.03275.pdf
- https://github.com/bmitra-msft/Demos/blob/master/notebooks/DESM.ipynb
- nn for document ranking, Mitra, ms cntk: https://github.com/bmitra-msft/NDRM
- TFDevSummit: https://events.withgoogle.com/tensorflow-dev-summit/watch-the-videos/#content
13.02
- Quora siamese: https://github.com/erogol/QuoraDQBaseline
12.02
10.02
- kerlym: https://github.com/osh/kerlym
- ICLR 17: https://amundtveit.com/2016/11/12/deep-learning-for-natural-language-processing-iclr-2017-discoveries/
- https://github.com/spro/practical-pytorch/blob/master/seq2seq-translation/seq2seq-translation.ipynb
- all-but-the-top, pca on word2vec: https://arxiv.org/pdf/1702.01417.pdf
- https://github.com/peter3125/sentence2vec
08.02
- polarised term for document anonymisation: https://ddu1.github.io/Anonymization/
- oxford course: https://github.com/oxford-cs-deepnlp-2017/lectures
- tf fold: dynamic batching: https://research.googleblog.com/2017/02/announcing-tensorflow-fold-deep.html
- https://www.insight-centre.org/sites/default/files/publications/newhorizons_online.pdf
- https://github.com/chsasank/Traffic-Sign-Classification.keras/blob/master/Traffic%20Sign%20Classification.ipynb
07.02
- openrefine: http://alexpetralia.com/posts/2015/12/14/the-problem-with-openrefine-clean-vs-messy-data
- https://www.linkedin.com/pulse/keras-neural-networks-win-nvidia-titan-x-abhishek-thakur
- deep q learning with keras and gym: https://keon.io/rl/deep-q-learning-with-keras-and-gym/
- structured attention, Yoon Kim and Hoang Luong: https://github.com/harvardnlp/struct-attn
- understanding DL requires rethinking generalisation: https://openreview.net/pdf?id=Sy8gdB9xx
- GAN: https://github.com/osh/KerasGAN
06.02
- http://lxmls.it.pt/2016/LxMLS2016.pdf
- http://www.cs.umb.edu/~twang/file/tricks_from_dl.pdf
- https://svn.spraakdata.gu.se/repos/richard/pub/ml2016_web/LT2306_2016_example_solution.pdf
- https://chsasank.github.io/spoken-language-understanding.html
- ML4NLP: http://stp.lingfil.uu.se/~shaooyan/ml/nn.part2.pdf
- Topic Modeling for extracting key words: http://bugra.github.io/work/notes/2017-02-05/topic-modeling-for-keyword-extraction/
- Google Scraper: https://github.com/NikolaiT/GoogleScraper
- Richard Johansson: https://svn.spraakdata.gu.se/repos/richard/pub/ml2015_web/l7.pdf
- https://code.facebook.com/posts/457605107772545/under-the-hood-building-accessibility-tools-for-the-visually-impaired-on-facebook/
- l2svm outperforms softmax: https://arxiv.org/pdf/1306.0239v4.pdf
- xent vs hinge loss: http://cs231n.github.io/linear-classify/
- https://github.com/nzw0301/keras-examples/blob/master/Skip-gram-with-NS.ipynb
- model zoo pytorch: https://github.com/Cadene/tensorflow-model-zoo.torch
- quora question pair: http://www.forbes.com/sites/quora/2017/01/30/data-at-quora-first-quora-dataset-release-question-pairs/#3d052ef475cb
- Psychometric, CA and Trump: https://motherboard.vice.com/en_us/article/how-our-likes-helped-trump-win
27.1
- https://github.com/bbelderbos/Codesnippets/tree/master/python
- https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/index.htm
26.1
- https://jaan.io/food2vec-augmented-cooking-machine-intelligence/
- http://multithreaded.stitchfix.com/blog/2017/01/23/scaling-ds-at-sf-slides-from-ddtexas/
- https://docs.docker.com/docker-for-mac/
- https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/index.html#1
- https://petewarden.com/
25.1
- question duplication of Quora: https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs
- stats for hackers code: https://github.com/croach/blog/tree/master/content
- http://multithreaded.stitchfix.com/blog/2017/01/23/scaling-ds-at-sf-slides-from-ddtexas/
24.1
- wordrank: http://deliprao.com/archives/124
- code: https://bitbucket.org/shihaoji/wordrank
- https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/WordRank_wrapper_quickstart.ipynb
- https://github.com/parulsethi/gensim/blob/wordrank_wrapper/docs/notebooks/Wordrank_comparisons.ipynb
- https://rare-technologies.com/wordrank-embedding-crowned-is-most-similar-to-king-not-word2vecs-canute/
23.1
- nlp terms for novice: http://www.datasciencecentral.com/profiles/blogs/10-common-nlp-terms-explained-for-the-text-analysis-novice?utm_content=buffer172af&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
- blockchain: https://opendatascience.com/blog/what-is-the-blockchain-and-why-is-it-so-important/
- nbgrader: https://github.com/jupyter/nbgrader
- Adversarial ML: https://mascherari.press/introduction-to-adversarial-machine-learning/
- 4 questions for G. Hinton: https://gigaom.com/2017/01/16/four-questions-for-geoff-hinton/
- Debug in TF: https://wookayin.github.io/TensorflowKR-2016-talk-debugging/#1
20.1
- demystify DS: https://docs.google.com/presentation/d/1N3KhPA--cQNjF9mD4Z4IzjKKFdwq1Ff6wQ6NN102uIk/edit#slide=id.g1be386a8a6_0_21
- ML on mobile: http://alexsosn.github.io/ml/2015/11/05/iOS-ML.html
- https://www.bignerdranch.com/blog/use-tensorflow-and-bnns-to-add-machine-learning-to-your-mac-or-ios-app/
- https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/ios_examples
- https://github.com/dennybritz/sentiment-analysis
19.1
- Facebook again, pytorch: http://pytorch.org/
- https://rare-technologies.com/new-gensim-feature-author-topic-modeling-lda-with-metadata/
- pointer network: https://github.com/devsisters/pointer-network-tensorflow
18.1
- http://blog.dennybritz.com/2017/01/17/engineering-is-the-bottleneck-in-deep-learning-research/
- ml for practitioner: http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf
- write dl/nn from scratch: https://github.com/dmlc/minpy
17.1
- improve headlines with salient words and seo score: http://www-personal.umich.edu/~tdszyman/misc/nlpmj16.pdf
- text summarisation: http://www-personal.umich.edu/~tdszyman/misc/summarization15.pdf
- word embedding over time: http://www-personal.umich.edu/~tdszyman/misc/InsightSIGNLP16.pdf
- Victor, DS at Polytechnique in France: https://github.com/Vict0rSch/data_science_polytechnique
- Thien NYU: http://www.cs.nyu.edu/~thien/
- tonymooori: https://github.com/TonyMooori/studying
- learning theory: https://web.stanford.edu/class/cs229t/notes.pdf
- time series predictions: http://danielhnyk.cz/predicting-sequences-vectors-keras-using-rnn-lstm/
16.1
- Edward (Dustin Tran) is on TF already, so cool: https://arxiv.org/pdf/1701.03757v1.pdf
- keras is in tensorflow from now on; @fchollet announced it on Twitter.
- squeezenet = tiny alexnet (5MB) https://github.com/rcmalli/keras-squeezenet
- won $5k: https://medium.freecodecamp.com/recognizing-traffic-lights-with-deep-learning-23dae23287cc#.9yb31nsm4
- https://github.com/karoldvl/paper-2015-esc-convnet/blob/master/Code/Results.ipynb
15.1
- deep spell code: https://github.com/MajorTal/DeepSpell
- draw svg in jupyter: https://github.com/uclmr/egal
- sound classification with cnn: https://github.com/karoldvl/paper-2015-esc-convnet
14.1
- https://medium.com/@majortal/deep-spelling-9ffef96a24f6
- line bot + rnn + tf, vanhuyz: https://github.com/vanhuyz/line-sticker-bot
- https://github.com/Vict0rSch/deep_learning/tree/master/keras
- https://github.com/openai/pixel-cnn
- AWS Lambda: http://blog.matthewdfuller.com/p/aws-lambda-pricing-calculator.html
- deep text corrector: http://atpaino.com/2017/01/03/deep-text-correcter.html
- https://github.com/dhwajraj/deep-text-classifier-mtl
13.1
- convlstm: https://github.com/carlthome/tensorflow-convlstm-cell
- GAN and RNN: https://www.reddit.com/r/MachineLearning/comments/40ldq6/generative_adversarial_networks_for_text/
- generate sentences from continuous space: https://arxiv.org/pdf/1511.06349v2.pdf
- How to train your Gen. model: Sampling, likelihood or adversary
12.1
- https://www.raywenderlich.com/126063/react-native-tutorial
- ml practitioners: https://news.ycombinator.com/item?id=10954508
- spotify word2vec: https://douweosinga.com/projects/marconi?song1_id=45yEy5WJywhJ3sDI28ajTm&song2_id=
- https://github.com/DOsinga/marconi/blob/master/train_model.py
- True| Good | Kind | Useful | Relevant | Necessary https://www.quora.com/What-is-Triple-Filter-test-of-Socrates
- https://www.youtube.com/watch?v=ifYfJdo27_k
- student note: https://adeshpande3.github.io/adeshpande3.github.io/Deep-Learning-Research-Review-Week-3-Natural-Language-Processing
11.1
- ggplot2 in R: http://sharpsightlabs.com/blog/mapping-vc-investment/
- TF 1.0, mature. https://opendatascience.com/blog/rnns-in-tensorflow-a-practical-guide-and-undocumented-features/
- NN semantic encoder: https://github.com/pdasigi/neural-semantic-encoders/blob/master/nse.py
- DL in NN, overview: https://arxiv.org/pdf/1404.7828v4.pdf
- juergen schmidhuber: http://people.idsia.ch/~juergen/
10.1
- GDG NL: http://www.slideshare.net/RokeshJankie/introducing-tensorflow-the-game-changer-in-building-intelligent-applications
- https://github.com/ToferC/Twitter_graphing_python
- http://www.oujago.com/DL_more.html
- thiago DS at Yahoo: https://tgmstat.wordpress.com/
- deepstack playing poker: https://arxiv.org/pdf/1701.01724v1.pdf
- silly DL: https://news.ycombinator.com/item?id=13353941
- http://p.migdal.pl/2017/01/06/king-man-woman-queen-why.html
- AE for new molecule: http://www.impactjournals.com/oncotarget/index.php?journal=oncotarget&page=article&op=view&path[]=14073&pubmed-linkout=1
9.1
- xlingual embedding: https://levyomer.wordpress.com/2017/01/08/a-strong-baseline-for-learning-cross-lingual-word-embeddings-from-sentence-alignments/
- greg notebooks: https://github.com/gjreda/gregreda.com/tree/master/content/notebooks
- the periodic table of AI: http://ai.xprize.org/news/periodic-table-of-ai
- the same table of DL: http://www.deeplearningpatterns.com/doku.php/overview
- aylien text mining and analysis: Sebastian Ruder: https://arxiv.org/pdf/1609.02746v1.pdf
- DS as a freelancer from Greg Yhat: http://www.gregreda.com/2017/01/07/freelance-data-science-experience/
7.1
- how bayesian inference works: http://brohrer.github.io/how_bayesian_inference_works.html
- best vis projects in 2016: http://flowingdata.com/2016/12/29/best-data-visualization-projects-of-2016/
- https://flowingdata.com/2012/12/17/getting-started-with-charts-in-r/
5.1
- allenai biattflow: https://github.com/allenai/bi-att-flow
- fork guy: https://github.com/BinbinBian
- ICLR 17, DCN: https://arxiv.org/pdf/1611.01604v2.pdf
- victor zhong: https://github.com/vzhong/posts-notebooks
- BN, if you want gaussian, zero mean: https://kratzert.github.io/2016/02/12/understanding-the-gradient-flow-through-the-batch-normalization-layer.html
- statsnlp https://github.com/uclmr/stat-nlp-book
- sota of qa: http://metamind.io/research/state-of-the-art-deep-learning-model-for-question-answering/
4.1
- dynet: CMU neural networks in C++: https://github.com/clab
- systran: https://arxiv.org/pdf/1610.05540v1.pdf
- punctuation normalisation: http://www.statmt.org/wmt11/normalize-punctuation.perl
- GAN in keras: https://github.com/osh/KerasGAN
- reinforcement learning in keras and gym: https://github.com/osh/kerlym
- ML 101 for DE: https://drive.google.com/drive/folders/0B3bb7xB2VOUBMW1LQjVYUlJNRFU
3.1
- variational for text processing: https://github.com/carpedm20/variational-text-tensorflow
- spotify CNN music classification: https://www.dropbox.com/s/22bqmco45179t7z/thesis-FINAL.pdf
- kaggle winning solution for whale detection: https://github.com/benanne
- https://github.com/zygmuntz?tab=repositories
2.1.17
- overfitting in life: http://tuanvannguyen.blogspot.com/2016/12/over-fitting-va-y-nghia-thuc-te-trong.html
- optimal stopping problem: https://plus.maths.org/content/solution-optimal-stopping-problem
31.12
- visualisation NLP: http://www.aclweb.org/anthology/N16-1082
30.12
- zero shot translation: https://techcrunch.com/2016/11/22/googles-ai-translation-tool-seems-to-have-invented-its-own-secret-internal-language/
29.12
- Music Tagging, CRNN https://arxiv.org/pdf/1609.04243v3.pdf
- Bensound music: http://www.bensound.com/
- event detection: http://anthology.aclweb.org/C/C14/C14-1134.pdf
28.12
- NIPS 2016, embedding projector: https://arxiv.org/pdf/1611.05469.pdf
- stats learning: https://web.stanford.edu/class/cs229t/notes.pdf
- http://www.normansoft.com/blog/index.html
- Tf projector is really cool: https://github.com/normanheckscher/mnist-tensorboard-embeddings/blob/master/mnist_t-sne.py
- Who to follow on Twitter in ML/DL: https://twitter.com/DL_ML_Loop/lists/deep-learning-loop/members
- How to learn? BPTT https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b#.sunmvqmsx
27.12
- deep learning with Torch: https://github.com/soumith/cvpr2015
- T7: https://github.com/soumith/cvpr2015/blob/master/cvpr-torch.pdf
- GPOD general purpose object detector: https://github.com/EvgenyNekrasov/gpod
- mckinseys: http://www.forbes.com/sites/louiscolumbus/2016/12/18/mckinseys-2016-analytics-study-defines-the-future-machine-learning
- gumbel add noise to sigmoid: https://github.com/yandexdataschool/gumbel_lstm
- fastai wordembedding: https://github.com/fastai/courses/blob/master/deeplearning1/nbs/wordvectors.ipynb
26.12
- spotify cnn: http://benanne.github.io/2014/08/05/spotify-cnns.html
- Gated RNN https://arxiv.org/pdf/1612.08083v1.pdf
- http://www.slideshare.net/SebastianRuder/nips-2016-highlights-sebastian-ruder
- monolingual dataset WMT 2014: http://www.statmt.org/wmt14/translation-task.html
- neural turing machine: https://github.com/shawntan/neural-turing-machines
- yandex ml school HSE: https://github.com/yandexdataschool/HSE_deeplearning
24.12
- Laurent Dinh: Density estimation https://docs.google.com/presentation/d/152NyIZYDRlYuml5DbBONchJYA7AAwlti5gTWW1eXlLM/
- Swiftkey, LM: https://blog.swiftkey.com/swiftkey-debuts-worlds-first-smartphone-keyboard-powered-by-neural-networks/
- porting Theano to TF: https://medium.com/@sentimentron/faceoff-theano-vs-tensorflow-e25648c31800
- tractica: DL for retailer: https://www.tractica.com/automation-robotics/leveraging-deep-learning-to-improve-the-retail-experience/
- Effect size: are Singaporeans better at math than Vietnamese? If ES = 0.3, the two score distributions overlap by nearly 90%, so this PISA ranking tells us little.
- dracula: twitter POS utilised GATE: https://github.com/Sentimentron/Dracula/
- Business process with LSTM: https://arxiv.org/pdf/1612.02130v1.pdf
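The effect-size note above (ES = 0.3, overlap near 90%) can be checked with a quick calculation. This is a minimal sketch, assuming both score distributions are normal with equal variance and that "overlap" means the overlapping coefficient, 2 * Phi(-|d| / 2). The function names are made up for illustration.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def overlap_coefficient(d):
    """Shared area of two unit-variance normal curves whose means differ by d.

    Assumption: equal variances, so the two densities cross midway
    between the means and the shared area is 2 * Phi(-|d| / 2).
    """
    return 2.0 * normal_cdf(-abs(d) / 2.0)


# An effect size of 0.3 gives an overlap of roughly 0.88.
print(round(overlap_coefficient(0.3), 3))
```

Under these assumptions the overlap is about 88%, which matches the "near 90%" in the note.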
23.12
22.12
- https://quid.com/feed/how-quid-uses-deep-learning-with-small-data
- dl for coders: http://course.fast.ai/, notebooks here: https://github.com/fastai/courses
- encoder-decoder RNN: http://www.slideshare.net/ssuser77b8c6/reducing-the-dimensionality-of-data-with-neural-networks
- https://trello.com/b/rbpEfMld/data-science
- http://tuanvannguyen.blogspot.com/2016/12/yeu-to-nao-anh-huong-en-iem-pisa-2015.html
21.12
- https://github.com/napsternxg/TwitterNER
- news arxiv: https://news.google.com/newspapers?hl=en#F
- https://github.com/skillachie/binaryNLP
- https://github.com/skillachie/nlpArea51/blob/master/Financial_News_Text_Classification.ipynb
- http://www.kdnuggets.com/2016/12/machine-learning-artificial-intelligence-main-developments-2016-key-trends-2017.html
20.12
- http://opennmt.net
- neural relation extraction https://www.aclweb.org/anthology/P/P16/P16-1200.pdf
- claim classification: https://github.com/UKPLab/coling2016-claim-classification
- https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2016/2016_COLING_CG.pdf
19.12
- fasttext.zip https://arxiv.org/abs/1612.03651
- bi sequence classification: same SNLI, event detection: https://pdfs.semanticscholar.org/6f42/cb23262066b4034aba99bf674783ed6cac8b.pdf
- large scale contextual LSTM and NLP task: https://arxiv.org/pdf/1602.06291.pdf
- main advances in ML 2016, Xavier at Quora: https://www.quora.com/What-were-the-main-advances-in-machine-learning-artificial-intelligence-in-2016?
17.12
16.12
- tensorflow book with code: https://github.com/BinRoot/TensorFlow-Book
- trading with ML (Georgia Tech): https://www.udacity.com/course/machine-learning-for-trading--ud501
15.12
- deepbach: https://github.com/SonyCSL-Paris/DeepBach
- https://www.technologyreview.com/s/603137/deep-learning-machine-listens-to-bach-then-writes-its-own-music-in-the-same-style/
- http://www.nytimes.com/2016/12/14/magazine/the-great-ai-awakening.html?_r=0
- http://www.asimovinstitute.org/analyzing-deep-learning-tools-music/
14.12
- spacy vs nltk: https://gist.github.com/rschroll/61b20c41e984a963df2870cfc9e628ed
- psychometrics, precision marketing, privacy no longer: http://www.michalkosinski.com/
- 300+ ML projects from Stanford: http://cs229.stanford.edu/PosterSessionProgram.pdf
- NIPS 2016 code: https://www.reddit.com/r/MachineLearning/comments/5hwqeb/project_all_code_implementations_for_nips_2016/
- Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences: https://github.com/dannyneil/public_plstm
13.12
- NIPS summary: http://beamandrew.github.io/deeplearning/2016/12/12/nips-2016.html
- how to choose batch size: https://github.com/karpathy/char-rnn, https://svail.github.io/rnn_perf/, http://axon.cs.byu.edu/papers/Wilson.nn03.batch.pdf
- https://github.com/lmthang/thesis
12.12
- Relation classification (RC) via data augmentation: https://arxiv.org/abs/1601.03651
- broader twitter NER: http://www.slideshare.net/leonderczynski/broad-twitter-corpus-a-diverse-named-entity-recognition-resource
- sequence classification such as NER, POS: https://github.com/napsternxg/DeepSequenceClassification
- arctic captions: https://github.com/kelvinxu/arctic-captions/blob/master/alpha_visualization.ipynb
- COLING 2016 from 13 to 16 Dec, Japan: https://github.com/napsternxg/TwitterNER, http://coling2016.anlp.jp/
11.12
- SRL and RC: https://github.com/jiangfeng1124/emnlp14-semi, http://ir.hit.edu.cn/~jguo/papers/coling2016-mtlsrc.pdf
- https://blog.insightdatascience.com/nips-2016-day-3-highlights-robots-that-know-cars-that-see-and-more-1ec958896791
- http://www.newsreader-project.eu/files/2012/12/NWR-D5-2-1.pdf
- http://nlesc.github.io/UncertaintyVisualization/
- http://ixa2.si.ehu.es/nrdemo/demo.php
9.12
- if then learning: https://papers.nips.cc/paper/6284-latent-attention-for-if-then-program-synthesis.pdf
- reinforcement learning: https://github.com/DanielTakeshi
- NIPS 2016: https://github.com/mphuget/NIPS2016
- https://github.com/zelandiya/KiwiPyCon-NLP-tutorial
- http://www.wrangleconf.com/apac.html
- http://cs231n.github.io/aws-tutorial/
- clickbait F1 98, AUC 99, too good to be true: https://arxiv.org/pdf/1612.01340v1.pdf
- https://arxiv.org/abs/1606.04474
- https://github.com/deepmind/learning-to-learn
8.12
- hackermath: https://github.com/amitkaps/hackermath/blob/master/talk.pdf
- tensorboard: https://www.tensorflow.org/versions/master/how_tos/embedding_viz/index.html
- embedding projector: http://projector.tensorflow.org/
- dl4nlp at ukplab, Germany: https://github.com/UKPLab/deeplearning4nlp-tutorial/tree/master/2016-11_Seminar
- Filter bubble vs Info cascading, Eli Pariser: https://www.ted.com/talks/eli_pariser_beware_online_filter_bubbles
7.12
- tidy data in pandas: http://www.jeannicholashould.com/tidy-data-in-python.html
- graph db: https://blog.grakn.ai/adding-semantics-to-graph-databases-with-mindmapsdb-part-1-82022bbb3b1c
- https://github.com/mikonapoli
- reinforcement learning, open ai: http://people.eecs.berkeley.edu/~pabbeel/nips-tutorial-policy-optimization-Schulman-Abbeel.pdf
- meal description and food tagging: https://pdfs.semanticscholar.org/5f55/c5535e80d3e5ed7f1f0b89531e32725faff5.pdf
6.12
- rationale cnn [keras] https://github.com/bwallace/rationale-CNN
- churn analysis, f1 75%, lr, svm hinge: http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9849/9527
- thanapon noraset: https://northanapon.github.io/read/
- https://github.com/NorThanapon/adaptive_lm
- train general AI: https://openai.com/blog/universe/
- NIPS 2016 https://nips.cc/Conferences/2016/Schedule
- full ds notebook: https://github.com/donnemartin/data-science-ipython-notebooks
- Quoc Le, tut2: Autoencoder, CNN, RNN: http://ai.stanford.edu/~quocle/tutorial2.pdf
- Quoc Le, tut1: nonlinear classifier and backprop: http://ai.stanford.edu/~quocle/tutorial1.pdf
- Quoc Le, ex1: http://ai.stanford.edu/~quocle/exercise1.py
- https://alexanderdyakonov.wordpress.com/2016/12/04/сундуки-и-монеты/#more-4401
5.12
- semantic role labeling: https://blog.acolyer.org/2016/07/05/end-to-end-learning-of-semantic-role-labeling-using-recurrent-neural-networks/
- ml yearning: https://gallery.mailchimp.com/dc3a7ef4d750c0abfc19202a3/files/Machine_Learning_Yearning_V0.5_01.pdf
- stock embedding: https://medium.com/@TalPerry/deep-learning-the-stock-market-df853d139e02#.9q1d9hnai
- fast weights: https://github.com/ajarai
2.12
1.12
- https://gist.github.com/honnibal
- siamese lstm: https://github.com/aditya1503/Siamese-LSTM
- accuracy of the Chinese lunar calendar for predicting baby sex: http://onlinelibrary.wiley.com/doi/10.1111/j.1365-3016.2010.01129.x/abstract
- customized keras lambda: https://gist.github.com/keunwoochoi
30.11
- rnn tricks: http://www.slideshare.net/indicods/general-sequence-learning-with-recurrent-neural-networks-for-next-ml
- data mining in action: Moscow, Russia: https://github.com/vkantor/MIPT_Data_Mining_In_Action_2016
- hypo testing, birthday effect: http://www.slideshare.net/SergeyIvanov105/birthday-effect-67829860
- LUI: linguistic UI https://medium.com/swlh/a-natural-language-user-interface-is-just-a-user-interface-4a6d898e9721
- fake news is 80% accuracy better: http://www.mallikarjunan.com/verytas/how-good-are-you-at-recognizing-satire-quiz
- nampi, spain 2017
- decode thought vector: http://gabgoh.github.io/ThoughtVectors/
- unconstrained fmin: https://github.com/benfred/fmin
- neural programmer: https://github.com/tensorflow/models/tree/master/neural_programmer
- https://www.tensorflow.org/versions/master/how_tos/embedding_viz/index.html#tensorboard-embedding-visualization
29.11
- https://github.com/nyu-dl/NLP_DL_Lecture_Note
- NYU DL for NLP https://docs.google.com/document/d/1YS5QRvqMJVs9n3sK5fFjuldY7_vh42C5uUfxUGgL-Gc/edit
- http://tuanvannguyen.blogspot.com/2016/11/machine-learning-la-gi.html
- http://sebastianruder.com/cross-lingual-embeddings/
- https://docs.google.com/presentation/d/1O-Ics69y445aWuxQ_VW6SDvKT9BGl3ZXLLZDG9tUiUY/edit#slide=id.p
28.11
- event detection and deep learning: http://www.cs.nyu.edu/~thien/
- https://github.com/anoperson/NeuralNetworksForRE
- ED EE and MD with RNN and CNN: http://www.aclweb.org/anthology/P/P15/P15-2060.pdf
27.11
- http://www.slideshare.net/PyData/fang-xu-enriching-content-with-knowledge-base-by-search-keywords-and-wikidata
- https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual
26.11
- slides from mlconf sf 2016: http://www.slideshare.net/SessionsEvents/anjuli-kannan-software-engineer-google-at-mlconf-sf-2016
- http://www.slideshare.net/KenjiEsaki/kdd-2016-slide
25.11
24.11
- chinese NLP: https://github.com/taozhijiang/chinese_nlp
- not news: http://venturebeat.com/2016/11/23/twitter-cortex-team-loses-some-ai-researchers/
- sentihood: http://annotate-neighborhood.com/download/download.html, https://arxiv.org/pdf/1610.03771v1.pdf
23.11
Multithread in Theano:
- check your blas: https://raw.githubusercontent.com/Theano/Theano/master/theano/misc/check_blas.py
- http://deeplearning.net/software/theano/tutorial/multi_cores.html?highlight=multi%20co
- Theano/Theano#3239
- set OMP_NUM_THREADS=4 inside the notebook with env: https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/
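In a Jupyter notebook the `%env OMP_NUM_THREADS=4` magic sets the variable. Below is a plain-Python sketch of the same idea, assuming the OpenMP runtime reads the variable when it initializes, so it has to be set before Theano (or NumPy) is imported. `set_blas_threads` is a hypothetical helper name, not part of Theano.

```python
import os


def set_blas_threads(n_threads):
    """Set OMP_NUM_THREADS for the current process and return the stored value.

    Assumption: call this before importing Theano/NumPy, because the
    thread count is normally picked up when the OpenMP runtime starts.
    """
    os.environ["OMP_NUM_THREADS"] = str(n_threads)
    return os.environ["OMP_NUM_THREADS"]


set_blas_threads(4)
# import theano  # import only after the variable has been set
```

The check_blas.py script linked above is one way to confirm whether the setting changes the timing.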
Debug
- torch vs theano vs tf: https://www.quora.com/Is-TensorFlow-better-than-other-leading-libraries-such-as-Torch-Theano
- debug Deep Learning: https://gab41.lab41.org/some-tips-for-debugging-deep-learning-3f69e56ea134#.1ldbphlav
- negative loss: keras-team/keras#1917
- CAP: Clustering Association Prediction, stats thinking: https://www.researchgate.net/publication/310597778_Scientific_discovery_through_statistics
22.11
- stance detection: favour or against: http://isabelleaugenstein.github.io/papers/SemEval2016-Stance.pdf
- Hugo from Twitter to Google Brain, Montreal: https://techcrunch.com/2016/11/21/google-opens-new-ai-lab-and-invests-3-4m-in-montreal-based-ai-research/?sr_share=facebook
- train word2vec in gensim the right way: https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-IMDB.ipynb
21.11
- sparql in python: https://joernhees.de/blog/tag/install/
- minhash: http://mccormickml.com/2015/06/12/minhash-tutorial-with-python-code/
- beating Kaggle the easy way: http://www.ke.tu-darmstadt.de/lehre/arbeiten/studien/2015/Dong_Ying.pdf
19.11
- 10 takeaways writeup MLConf SF: https://tryolabs.com/blog/2016/11/18/10-main-takeaways-from-mlconf/
- theano summer school: https://github.com/mila-udem/summerschool2015
- gpu card for macbook pro: http://udibr.github.io/using-external-gtx-980-with-macbook-pro.html
- transfer learning using pretrained vgg, resnet for your problem: https://github.com/dolaameng/transfer-learning-lab
18.11
- wikidata sparql: https://docs.google.com/presentation/d/16HhxRH-kkxqxcyzepXT-dHrnE90yVPlfkPq3cM2UzFg/edit#slide=id.g18e33c9ee6_2_134
- unkify: https://github.com/cdg720/emnlp2016/blob/master/utils.py#L322
- http://smerity.com/articles/2016/google_nmt_arch.html
17.11
- wikidata: http://www.slideshare.net/_Emw/an-ambitious-wikidata-tutorial
- wptools: https://github.com/siznax/wptools/wiki
- google translate: https://arxiv.org/pdf/1611.04558v1.pdf
- https://arxiv.org/pdf/1611.05104v1.pdf
- https://arxiv.org/pdf/1611.01587v2.pdf
16.11
- dssm deep sem sim models: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/wsdm2015.v3.pdf
- twitter @ Singapore: http://www.straitstimes.com/singapore/twitter-eyes-local-talent-for-singapore-data-science-team
- multiple tasks of NLP: https://arxiv.org/pdf/1611.01587v2.pdf
- QUASI RNN: https://arxiv.org/pdf/1611.01576v1.pdf
15.11
- regex learning: http://dlacombejr.github.io/2016/11/13/deep-learning-for-regex.html
- recurrent + cnn for text classification: https://github.com/airalcorn2/Recurrent-Convolutional-Neural-Network-Text-Classifier
- quiver: to view convnet layer https://github.com/jakebian/quiver
- hera: to see training progress board: https://github.com/jakebian/hera
- RAISR: Rapid and Accurate Image Super Resolution https://arxiv.org/pdf/1606.01299v3.pdf
- why is machine learning hard: http://ai.stanford.edu/~zayd/why-is-machine-learning-hard.html
14.11
- event ODSC West: https://www.odsc.com/california
- MLconf SF 12 Nov, summary: https://github.com/adarsh0806/ODSCWest/blob/master/MLConf.md
- Duy Do talk: https://speakerdeck.com/duydo/elasticsearch-for-data-engineers
13.11
- barcampsaigon 2016: some good topics on Elastic Search (Duy Do), Big Data analytics (Trieu Nguyen)
- Altair https://speakerdeck.com/jakevdp/visualization-in-python-with-altair
12.11
- Applications to explore (most of them are Keras-based)
11.11
- https://github.com/wiki-ai/revscoring
- Visual OCR attention: https://github.com/da03/Attention-OCR
- startup and DL: https://github.com/lipiji/App-DL
- embed + encode + attend + predict: https://explosion.ai/blog/deep-learning-formula-nlp
- HN: https://www.cs.cmu.edu/~diyiy/docs/naacl16.pdf
10.11
9.11
- ibm researcher, LDA Gibbs sampling, doc2vec: https://github.com/jhlau
8.11
- quoc le, rnn with reinforcement learning: http://openreview.net/pdf?id=r1Ue8Hcxg
7.11
- https://github.com/vinhkhuc/MemN2N-babi-python
- similarity proximity: http://www.datasciencecentral.com/profiles/blogs/comparison-between-global-vs-local-normalization-of-tweets-and
- pycon15, elastic search: https://github.com/erikrose/elasticsearch-tutorial
6.11
04.11
- airbnb knowledge scale: https://medium.com/airbnb-engineering/scaling-knowledge-at-airbnb-875d73eff091#.5moos4eki
- R notebooks: http://rmarkdown.rstudio.com/r_notebooks.html
- dask: https://github.com/dask/dask
- dask vs celery: http://matthewrocklin.com/blog/work/2016/09/13/dask-and-celery
- dask in jupyterlab: https://learning.acm.org/webinar_pdfs/ChristineDoig_WebinarSlides.pdf
3.11
- https://hbr.org/resources/pdfs/hbr-articles/2016/11/the_state_of_machine_intelligence.pdf
- shallow learn: gensim + fasttext: https://github.com/giacbrd/ShallowLearn
- nn for sa: http://www.emnlp2016.net/tutorials/zhang-vo-t4.pdf
2.11
- mask bilstm: http://dirko.github.io/Bidirectional-LSTM