Machine learning notes

To follow:
Spark http://spark.apache.org/community.html
Sean Owen https://www.quora.com/profile/Sean-Owen

Parquet (https://parquet.apache.org/) -> Arrow (https://arrow.apache.org/)
Columnar data layout in memory (Arrow) and on disk (Parquet); benefits from pre-fetching and cache locality
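A toy sketch of the row vs. columnar layout idea, using only Python's standard `array` module rather than Arrow or Parquet themselves. Python does not expose cache behaviour, so this only illustrates the layout: each column becomes one typed, contiguous buffer. The `rows` records and the `to_columnar` helper are hypothetical.

```python
from array import array

# Hypothetical row-oriented records: each record keeps all of its fields together.
rows = [
    {"user_id": 1, "age": 34, "score": 0.91},
    {"user_id": 2, "age": 27, "score": 0.45},
    {"user_id": 3, "age": 45, "score": 0.78},
]

def to_columnar(records):
    """Hypothetical helper: pivot a list of dicts into one typed array per column."""
    columns = {}
    for name in records[0]:
        values = [r[name] for r in records]
        # 'd' = C double, 'q' = signed 64-bit int; an array stores its values contiguously
        typecode = "d" if any(isinstance(v, float) for v in values) else "q"
        columns[name] = array(typecode, values)
    return columns

cols = to_columnar(rows)

# Aggregating one column scans a single contiguous buffer instead of every record.
mean_age = sum(cols["age"]) / len(cols["age"])
```

In a real columnar engine the same layout is what lets a scan read only the columns a query needs and stream through memory sequentially.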

https://en.wikipedia.org/wiki/Stochastic_gradient_descent
https://www.coursera.org/learn/machine-learning/lecture/9zJUs/mini-batch-gradient-descent
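A minimal pure-Python sketch of mini-batch gradient descent for a one-feature linear regression, assuming a mean-squared-error loss. The function name, learning rate, batch size, epoch count and the synthetic y = 3x + 1 data are illustrative choices, not taken from the linked material.

```python
import random

def minibatch_sgd(xs, ys, lr=0.05, batch_size=4, epochs=500, seed=0):
    """Fit y ~ w*x + b by minimising mean squared error with mini-batch gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # visit the examples in a new random order each epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # Gradient of (1/m) * sum((w*x + b - y)^2) w.r.t. w and b over this batch
            grad_w = grad_b = 0.0
            for i in batch:
                err = w * xs[i] + b - ys[i]
                grad_w += 2 * err * xs[i]
                grad_b += 2 * err
            m = len(batch)
            w -= lr * grad_w / m
            b -= lr * grad_b / m
    return w, b

# Noise-free data from y = 3x + 1, so the fitted parameters are easy to check.
xs = [i / 10 for i in range(20)]
ys = [3 * x + 1 for x in xs]
w, b = minibatch_sgd(xs, ys)
```

With `batch_size=1` this degenerates to plain stochastic gradient descent, and with `batch_size=len(xs)` to full-batch gradient descent; mini-batches sit in between, trading gradient noise for fewer passes over the data.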

Explore vs. exploit problem
https://en.wikipedia.org/wiki/Multi-armed_bandit
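One common way to handle the explore/exploit trade-off is an epsilon-greedy policy. Below is a minimal sketch on a Bernoulli bandit; the arm probabilities, `epsilon`, step count and seed are illustrative assumptions, and epsilon-greedy is only one of several strategies (UCB and Thompson sampling are others).

```python
import random

def epsilon_greedy(true_probs, steps=5000, epsilon=0.1, seed=42):
    """Play a Bernoulli multi-armed bandit with an epsilon-greedy policy.

    With probability epsilon pick a random arm (explore); otherwise pick the
    arm with the highest estimated reward so far (exploit).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms      # how many times each arm was pulled
    values = [0.0] * n_arms    # running mean reward per arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        # Incremental mean update: new_mean = old_mean + (r - old_mean) / n
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return counts, values, total_reward

counts, values, total = epsilon_greedy([0.2, 0.5, 0.75])
```

Over time most pulls should go to the best arm, while the epsilon fraction of random pulls keeps the estimates for the other arms from going stale.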

CAP theorem

Deeplearning4j: https://deeplearning4j.org/ (GPUs, Spark, word2vec), by Skymind: https://skymind.io/

Chaos Monkey Army
Common architecture: circuit breakers
Latency Monkey (part of the Chaos Monkey family; injects artificial delays)
https://github.com/Netflix/SimianArmy/wiki/The-Chaos-Monkey-Army
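A minimal, single-threaded sketch of the circuit-breaker pattern (the idea behind libraries such as Netflix Hystrix, not Hystrix's actual API). The class name, state names, thresholds and the injectable `clock` are illustrative assumptions; a production version would also need thread safety.

```python
import time

class CircuitBreaker:
    """CLOSED -> OPEN after `max_failures` consecutive failures.
    While OPEN, calls fail fast until `reset_timeout` seconds have passed;
    the next call is then a HALF_OPEN trial that either closes or re-opens it."""

    def __init__(self, max_failures=3, reset_timeout=5.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so tests can control time
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"  # let this call through as a trial
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.max_failures:
                self.state = "OPEN"
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count
        self.failures = 0
        self.state = "CLOSED"
        return result
```

Failing fast while the circuit is open protects callers from piling up on a slow or dead dependency, which is exactly the failure mode Latency Monkey is meant to surface.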

Google whitepaper on Beam
https://cloud.google.com/blog/big-data/2016/02/comparing-the-dataflowbeam-and-spark-programming-models
https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison

Akka Streams
http://akka.io/docs/
http://www.cakesolutions.net/teamblogs/lifting-machine-learning-into-akka-streams

Airflow: http://airflow.datasticks.com/admin/
Job scheduler, originally from Airbnb

To read: https://code.facebook.com/posts/1671373793181703/apache-spark-scale-a-60-tb-production-use-case/

Apache NiFi

Data provenance
https://en.wikipedia.org/wiki/Provenance

Alluxio (formerly Tachyon): http://www.alluxio.org/
(memory-speed storage layer, loosely like Redis)

Kubernetes

Decouple producers from consumers
Resilience
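A minimal sketch of decoupling producers from consumers with a bounded in-process queue, using only the standard library. In a distributed system this role is usually played by a message broker; the function name, queue size, sentinel and squaring "work" here are illustrative assumptions.

```python
import queue
import threading

def run_pipeline(items, n_consumers=2, maxsize=4):
    """Producer and consumers share only a bounded queue, never each other."""
    q = queue.Queue(maxsize=maxsize)  # bounded: a slow consumer applies back-pressure
    results = []
    lock = threading.Lock()
    STOP = object()                   # sentinel telling a consumer to exit

    def producer():
        for item in items:
            q.put(item)               # blocks while the queue is full
        for _ in range(n_consumers):
            q.put(STOP)               # one sentinel per consumer

    def consumer():
        while True:
            item = q.get()
            if item is STOP:
                break
            with lock:
                results.append(item * item)  # stand-in for real work

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)

squares = run_pipeline(range(10))
```

The producer never knows how many consumers exist or how fast they are; the buffer absorbs bursts, and a full queue slows the producer down instead of overwhelming the consumers.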

https://console.cloud.google.com/projectselector/ml/models

Hive Metastore

Holden Karau and Rachel Warren. High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark

File system: GlusterFS https://www.gluster.org/

Spark on YARN: HA Spark

https://prestodb.io/

Brendan Gregg. Systems Performance: Enterprise and the Cloud

Cool visualisation and monitoring framework
http://vectoross.io/

https://github.com/jpmml/jpmml-spark

http://arturmkrtchyan.com/apache-spark-hidden-rest-api

Hystrix dashboard

Video Data
https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/09_Video_Data.ipynb
YouTube tutorials: https://www.youtube.com/playlist?list=PL9Hr9sNUjfsmEu1ZniY0XpHSzl5uihcXZ

Apache Spark partial (approximate results)
https://github.com/apache/spark/tree/master/core/src/main/scala/org/apache/spark/partial
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala

Data profiling in Python:
scipy.stats.skew
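A short example of `scipy.stats.skew`, which by default (`bias=True`) returns the biased sample skewness g1 = m3 / m2^1.5. The hand-written `skewness` helper is a hypothetical cross-check of the same formula, and the two small datasets are illustrative.

```python
from scipy.stats import skew

def skewness(values):
    """Hypothetical helper: biased sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 3, 10]  # one large value gives a long right tail
symmetric = [1, 2, 3, 4, 5]

print(skew(right_tailed), skewness(right_tailed))  # both ≈ 1.018 (positive skew)
print(skew(symmetric))                             # ≈ 0 for symmetric data
```

Positive values indicate a long right tail, negative values a long left tail; pass `bias=False` to get the sample-size-corrected estimator instead.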

To compare EC2 instance types:
http://ec2instances.info
g2 and p2 are GPU instances;
p2 are the bad boys (more powerful GPUs)

Nick Pentreath. Machine Learning with Spark

Automating Tinder
http://crockpotveggies.com/2015/02/09/automating-tinder-with-eigenfaces.html
also http://www.bernie.ai/

http://vis-www.cs.umass.edu/lfw/
Labeled Faces in the Wild: database of 13,000+ face images

Spark + Stanford CoreNLP (Sentiment)
Word2Vec
A shallow neural network learns vector representations of words, with little or no pre-processing

On chat bots and enterprise concerns

“Bots are the new apps.”

It is a bold statement but it’s supported by Microsoft, Facebook, Slack and Telegram.
So you could almost call it a new reality, which means new opportunities for all of us. Let’s dissect the issue of chat bots so you come prepared, as the land grab is starting very soon.



New era of space exploration: commercial

Yay, U.S. Congress says yes to space mining!
http://www.wired.com/2015/11/congress-says-yes-to-space-mining-no-to-rocket-regulations/

This is very important. The average person doesn’t give a shit about space. The Cold War rocket race is in the past, so governments invest comparatively little. Legislation allowing commercial exploitation motivates businesses to start a new era of exploration.

So you know what this means – mankind has just improved its scalability!

Commercial interest = more R&D investment = better propulsion, robotics, etc. for space travel = colonies on the Moon, Mars, etc., so humanity truly becomes a multi-planetary species = much improved chances for humanity to spread across the universe, keep evolving and keep doing good things (hopefully) as an intelligent race.

Super excited about this space 🙂

Messaging OS

So I wanted to share some thoughts on the latest trend: the “messaging OS”.

Not sure whether you’ve noticed, but since my business currently involves a lot of OTT communication, I’m pretty sensitive to this stuff. It also feels weird when something you’ve been telling your team and discussing internally suddenly becomes a topic of discussion on TechCrunch, in Mary Meeker’s reports, etc.
