Machine learning notes

To follow:
Spark http://spark.apache.org/community.html
Sean Owen https://www.quora.com/profile/Sean-Owen

https://parquet.apache.org/ -> https://arrow.apache.org/
in-memory (and on-disk) columnar data; concepts of pre-fetching and cache locality

https://en.wikipedia.org/wiki/Stochastic_gradient_descent
https://www.coursera.org/learn/machine-learning/lecture/9zJUs/mini-batch-gradient-descent

Exploit and Explore problem
https://en.wikipedia.org/wiki/Multi-armed_bandit
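
To make the explore/exploit trade-off concrete, here is a minimal epsilon-greedy sketch in Python. Epsilon-greedy is only one of several bandit strategies. The payout probabilities, the epsilon value and all function names are illustrative assumptions, not something taken from the links above.

```python
import random


def epsilon_greedy(payout_probs, steps=2000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli multi-armed bandit.

    payout_probs holds a hypothetical win probability per arm (assumed values).
    Returns (pull_counts, estimated_values).
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running mean reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick any arm at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=values.__getitem__)
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the mean reward for this arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


if __name__ == "__main__":
    counts, values = epsilon_greedy([0.1, 0.9])
    print("pulls per arm:", counts)
    print("estimated payouts:", [round(v, 2) for v in values])
```

With a small epsilon the agent spends most pulls on the arm that currently looks best, while still sampling the others often enough to correct a bad early estimate.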

CAP theorem

https://deeplearning4j.org/ (+GPUs +spark +word2vec ) by https://skymind.io/

Chaos Monkey Army
Common architecture: circuit breakers
Latency monkey (chaos monkey)
https://github.com/Netflix/SimianArmy/wiki/The-Chaos-Monkey-Army

Google whitepaper on Beam
https://cloud.google.com/blog/big-data/2016/02/comparing-the-dataflowbeam-and-spark-programming-models
https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison

Akka streams
http://akka.io/docs/
http://www.cakesolutions.net/teamblogs/lifting-machine-learning-into-akka-streams

http://airflow.datasticks.com/admin/
Airflow: a job scheduler that comes from Airbnb

To read: https://code.facebook.com/posts/1671373793181703/apache-spark-scale-a-60-tb-production-use-case/

Nifi

Data provenance
https://en.wikipedia.org/wiki/Provenance

Alluxio (formerly Tachyon): http://www.alluxio.org/
(like Redis)

Kubernetes

Decouple producers from consumers
Resilience

https://console.cloud.google.com/projectselector/ml/models

Hive Metastore

Holden Karau and Rachel Warren. High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark

FS: https://www.gluster.org/

Spark on Yarn: HA Spark

https://prestodb.io/

Brendan Gregg. Systems Performance: Enterprise and the Cloud

cool visualisation
http://vectoross.io/
monitoring framework

https://github.com/jpmml/jpmml-spark

http://arturmkrtchyan.com/apache-spark-hidden-rest-api

Hystrix dashboard

Video Data
https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/09_Video_Data.ipynb
Youtube tutorials: https://www.youtube.com/playlist?list=PL9Hr9sNUjfsmEu1ZniY0XpHSzl5uihcXZ

Apache Spark partial
https://github.com/apache/spark/tree/master/core/src/main/scala/org/apache/spark/partial
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala

Data profiling Python:
scipy.stats.skew
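
As a reminder of what the skewness number means, here is a small pure-Python sketch of the biased Fisher-Pearson coefficient, g1 = m3 / m2^1.5. To my understanding this is the quantity scipy.stats.skew returns with its default settings (bias=True), but treat that equivalence as an assumption and check against SciPy on real data. The function name is my own.

```python
import math


def sample_skewness(values):
    """Biased Fisher-Pearson skewness: g1 = m3 / m2**1.5."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        return math.nan  # skewness is undefined for constant data
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    print(sample_skewness([1, 2, 3, 4, 5]))   # symmetric data, skewness 0
    print(sample_skewness([1, 1, 1, 1, 10]))  # long right tail, positive skewness (about 1.5)
```

A value near zero suggests a roughly symmetric column. A large positive value points to a long right tail, which is often a hint to try a log transform before modelling.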

To check types of EC2 instances
http://ec2instances.info
g2 are GPU instances
and p2 are bad boys

Nick Pentreath. Machine Learning with Spark

Automating Tinder
http://crockpotveggies.com/2015/02/09/automating-tinder-with-eigenfaces.html
also http://www.bernie.ai/

http://vis-www.cs.umass.edu/lfw/
13,000+ face images database

Spark + Stanford CoreNLP (Sentiment)
Word2Vec
A neural network learns vector representations of words, with little or no pre-processing

CometChat vs QuickBlox

Here I compare two messaging (chat) platforms: CometChat and QuickBlox.

Founding history and HQ
CometChat was built by Inscripts, an IT services company founded in 2009 in Mumbai, India. Both its services and product businesses are operational as of May 2016. HQ is in Mumbai, India.

QuickBlox was built by Injoit, an IT services company founded in 2007 in Ukraine. It fully transformed into a product company focused on QuickBlox in 2013. The company no longer runs a services business and is solely focused on the QuickBlox platform, which is funded by its own revenues. HQ moved to London, UK in 2010.

Number of StackOverflow discussions
(index of popularity among developers):
CometChat: 114
QuickBlox: 2,522
QuickBlox has far more mentions on StackOverflow, which is a sign of a lively developer community and of many use cases being discussed around integration and customisation.

Alexa rank
(index of overall online popularity, lower = better)
CometChat: 72,960
QuickBlox: 58,257
QuickBlox ranks better than CometChat both globally and locally in India.

Production stats
(sorry, I don’t have CometChat stats, so I’m providing QuickBlox only):

  • QuickBlox servers process over 10 billion requests per month (about 4K per second), of which 320 per second are chat messages and the rest are API and presence transactions.
  • Every day 40-50 developers join the QuickBlox platform. In total the platform has over 50,000 registered app publishers, from single developers to R&D centres of large enterprises.
  • The QuickBlox platform powers over 100 enterprise customers, who use it for secure and reliable messaging in Finance, Healthcare, E-commerce, internal and B2C communication. All of these are stand-alone implementations on AWS or on-premise, under SLA from the QuickBlox tech ops team.
  • Over 3,000 apps are running live on the QuickBlox shared tier.
  • Across a few hundred EC2 instances, QuickBlox maintains between 99.8% and 99.9% uptime on a monthly basis (99.9% on enterprise).
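
The per-second and uptime figures above can be sanity-checked with a few lines of arithmetic. This is a rough sketch that assumes a 30-day month; the function names are my own. Under that assumption, 10 billion requests per month comes to roughly 3,858 per second (hence "about 4K"), chat messages are around 8% of traffic, and 99.9% / 99.8% uptime allow about 43 / 86 minutes of downtime per month.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def requests_per_second(requests_per_month):
    """Average request rate for a given monthly total."""
    return requests_per_month / (DAYS_PER_MONTH * SECONDS_PER_DAY)


def monthly_downtime_minutes(uptime_percent):
    """Minutes of downtime per month allowed by an uptime percentage."""
    minutes_per_month = DAYS_PER_MONTH * 24 * 60
    return (100.0 - uptime_percent) / 100.0 * minutes_per_month


if __name__ == "__main__":
    total_rps = requests_per_second(10_000_000_000)
    print(f"{total_rps:.0f} requests per second on average")
    print(f"{320 / total_rps:.1%} of requests are chat messages")
    for uptime in (99.8, 99.9):
        minutes = monthly_downtime_minutes(uptime)
        print(f"{uptime}% uptime allows {minutes:.1f} minutes of downtime per month")
```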

Main differences
The main differences between CometChat and QuickBlox are as follows:

Web-first vs Mobile-first: CometChat is web-based and is positioned as a chat solution for websites. Its mobile SDKs (iOS, Android) come second and are less developed.

QuickBlox has been mobile-first from day one (the first things built in 2009 were the iOS and Android SDK libraries) and has very powerful native SDKs for all mobile platforms (iOS, Android, Windows, BlackBerry) along with JavaScript for web. Numerous features that QuickBlox has implemented natively for all popular platforms are simply not matched by any competitor.

Core messaging technology stack: CometChat uses long-polling, which means users receive messages only when their client requests them. QuickBlox uses XMPP, a “true” messaging protocol also used by WhatsApp and many other messengers, meaning messages are pushed to the recipient as soon as they are sent. This is also important for presence and signalling mechanisms. XMPP is very scalable and can support hundreds of thousands of users.
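
The pull vs push distinction can be illustrated with a toy in-memory model. This is only a sketch of the two delivery styles, not how CometChat or XMPP are actually implemented; the class and method names are invented for illustration.

```python
class PullMailbox:
    """Pull model: messages wait on the server until the client asks for them."""

    def __init__(self):
        self._pending = []

    def send(self, message):
        self._pending.append(message)

    def poll(self):
        """Return and clear everything queued since the last poll."""
        delivered, self._pending = self._pending, []
        return delivered


class PushChannel:
    """Push model: the server hands each message to subscribers right away."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def send(self, message):
        for callback in self._subscribers:
            callback(message)


if __name__ == "__main__":
    mailbox = PullMailbox()
    mailbox.send("hello")
    print("pull: client sees nothing until it polls ->", mailbox.poll())

    channel = PushChannel()
    channel.subscribe(lambda message: print("push: delivered immediately ->", message))
    channel.send("hello")
```

In the pull model the delivery delay depends on how often the client asks. In the push model the client keeps a connection open and the server decides when to deliver, which is also what makes presence updates cheap.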

“Web chat” vs “Messaging OS” architecture: CometChat was built as a chat plugin for websites. QuickBlox was built as a universal data backend, initially competing with Parse etc., and it is highly extensible and customisable thanks to its powerful Users, Content, Custom Objects and Cloud Code APIs. Some customers use QuickBlox not for messaging but for general data sync across numerous mobile devices, as the platform works very well in that context. In 2011 QuickBlox added messaging (XMPP), now integrated with its Users and Custom Objects APIs. In 2013 QuickBlox added video calling, and its native iOS WebRTC library is more efficient than the official one from Google. Powerful Data + Communication modules form the so-called “Messaging OS”, where you can build whatever you want, going low-level if needed.

Plugin vs Stand-alone App: CometChat is built as a plugin. QuickBlox focuses on providing a) a ready stand-alone messenger (http://q-municate.com, http://qm.quickblox.com) with source code available for iOS, Android and JavaScript, and b) a ChatViewController, which includes messaging UI + logic for easy integration into existing apps: http://quickblox.com/developers/QuickBlox_Developers.

In summary, I would say: use CometChat if you want a chat window on your website. Use QuickBlox if you want a stand-alone web app or native mobile apps with the functionality and visual appearance seen in WhatsApp, Telegram and other state-of-the-art messaging platforms.

Nikita Andrianovich Alyabyev (Rzhavets – Shakhovo, 1942)

I promised myself that by May 9th I would learn more about the fate of my ancestors who took part in the Second World War. Here they are (those whose fate is known):

Nikita Andrianovich Alyabyev, my great-grandfather on my father's side (my grandmother's father), from Kursk Oblast. Before the war he was a forester / forest ranger; he was over 40 in 1942.
Drafted as a private rifleman. Killed near the village of Rzhavets (Belgorod Oblast, Russia) in 1942, apparently during the first Red Army counter-offensives (the Kursk-Oboyan operation).

Fyodor Pavlovich Filatov, my great-grandfather on my father's side (my grandfather's father), from Kursk Oblast. Before the war he was a bookkeeper on a collective farm. Drafted as a Red Army rifleman. Born in 1893, he was 48 when the war began. Killed in 1944 on an unnamed height (400 m) near the village of Kozichevo, Liozno district (Vitebsk Oblast, Belarus).

Taras Semyonovich Bugaev, my great-grandfather on my mother's side, from Krasnodar Krai. He went through the whole war from 1941 to 1945 and was caught in an encirclement (not captivity), from which they eventually got out, but for this he was sent to a penal battalion, and at the end of the war he was exiled to a labour camp at the Donbass mines. He never lost his spirits: afterwards he lived long and happily with his family in Yahotyn, and at 70-80 he was still riding his bicycle, smoking, playing cards and singing songs to the guitar. All his life he kept a portrait of Stalin on the wall.

Mikhail Fyodorovich Filatov, my grandfather, born in 1928, was 13 when the war began. Towards the end of the war he was already serving in the Soviet Army. Artilleryman (fire-control calculator). Officer (senior lieutenant). He served under A.V. Chapaev (son of the famous one) and later under Zhukov (after the war).

By today I have done a mini-study on Nikita Andrianovich, which I am publishing below. I have also found out more about Fyodor Pavlovich Filatov, but I need more time to finish that research; it is next on the plan.


On chat bots and enterprise concerns

“Bots are the new apps”.

It is a bold statement, but it’s backed by Microsoft, Facebook, Slack and Telegram.
So you can almost say it’s a new reality, which means new opportunities for all of us. Let’s dissect the topic of chat bots so you come prepared, as the land grab is starting very soon.


Slack sales hack – post inbound leads to chat

Setting up a Slack chat for your sales team is a good idea.

The next thing you want to do is add a couple of quick integrations. The one I did:

IFTTT, so you can forward inbound leads from Gmail into Slack

(Image: IFTTT recipe that posts e-mails with a given Gmail label to Slack)

What remains is to set up a filter in Gmail settings that applies a label (say “slack”) to e-mails generated by your lead generation (in our case, a form prospects fill in on our landing page).

Then use the IFTTT recipe above, which posts all e-mails labeled “slack” to Slack.

There you go: now all your inbound leads land in your Slack chat:

(Image: inbound leads posted to the Slack sales chat)

It’s nice that you can see the full message body once you click “Show more”, so you can respond to your leads faster.
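
If you would rather skip IFTTT and post leads from your own code, a Slack incoming webhook is one alternative. This is a different technique from the recipe above, shown only as a minimal sketch: the helper names, the message layout and the truncation length are my own assumptions, and the webhook URL is a placeholder you would replace with the one Slack generates for your workspace.

```python
import json
import urllib.request


def build_lead_payload(sender, subject, body, max_body_chars=500):
    """Build the JSON payload for a Slack incoming webhook ({"text": ...})."""
    snippet = body if len(body) <= max_body_chars else body[:max_body_chars] + "..."
    text = f"New inbound lead from {sender}\nSubject: {subject}\n{snippet}"
    return {"text": text}


def post_to_slack(webhook_url, payload, timeout=10):
    """POST the payload to the webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    payload = build_lead_payload("prospect@example.com", "Demo request", "We'd like a demo.")
    print(json.dumps(payload, indent=2))
    # Hypothetical call; the URL below is a placeholder, not a real webhook:
    # post_to_slack("https://hooks.slack.com/services/...", payload)
```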

New era of space exploration: commercial

Yay, the U.S. Congress says yes to space mining!
http://www.wired.com/2015/11/congress-says-yes-to-space-mining-no-to-rocket-regulations/

This is very important. The average person doesn’t give a shit about space. The Cold War rocket race is in the past, so governments invest next to nothing. Legislation for commercial exploitation motivates businesses to start a new era of exploration.

So you know what this means: mankind has just improved its scalability!

Commercial interest = more R&D investment = better propulsion, robotics etc for space travel = colonies on Moon, Mars etc so humanity truly becomes a multi-planetary species = much improved chances for humanity to spread across the Universe, keep evolving and doing good things (hopefully) as an intelligent race.

Super excited about this space 🙂

Messaging OS

I wanted to share some thoughts on the latest trend, the “messaging OS”.

Not sure if you noticed, but as my business is currently heavily involved in OTT communication, I’m pretty sensitive to this stuff. It also feels weird when something you’ve been telling your team and discussing internally suddenly becomes a topic of discussion on TechCrunch, in Mary Meeker reports etc.


Release self-contained Erlang executables (both Windows and Linux)

Mostly notes for my own future reference.
There are a number of ways to convert your Erlang code into a stand-alone ‘app’ package containing the whole OTP environment, starting with the standard systools and reltool. Some useful links are below.

There are basically 2 kinds of Erlang applications:
– the “pure Erlang” way is to create a release including the runtime, OTP and your application(s). Then a simple .bat script can launch the runtime with the right options. reltool or systools are made for creating these releases:
http://learnyousomeerlang.com/release-is-the-word
http://www.erlang.org/documentation/doc-1/apps/reltool/index.html
rebar can help you use these tools.
– for a single executable, not distributed, with a single app, you can create an ‘escript’ from your code, which will be launched as an executable. rebar includes the escriptize command to achieve this. As an example, you can look at the ‘averell’ web server, which is built that way:
https://github.com/jeanparpaillon/averell

or mad: http://erlang.org/pipermail/erlang-questions/2014-October/081420.html
https://synrc.com/apps/mad

http://erlang.org/pipermail/erlang-questions/2014-October/081429.html
http://howistart.org/posts/erlang/1

http://stackoverflow.com/questions/11796941/how-do-you-compile-an-erlang-program-into-a-standalone-windows-executable

Enabling mesh (ad-hoc) network on multiple Raspberry Pi’s

So I’m just going to dump my experience here, hoping it helps somebody.

This is Step 1: making your Pis able to ping each other and communicate via UDP/TCP, ad-hoc, peer to peer, without any central router. This doesn’t include making them route packets with 2 or more hops between origin and destination.
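
Once the Pis can ping each other, a quick way to confirm UDP traffic flows is a tiny send/receive script. This is only a sketch: the function names are my own, and the demo runs over loopback so it works on a single machine. On the Pis you would bind the receiver on one board and send to that board's ad-hoc address from the other (the addresses depend on your setup and are not shown here).

```python
import socket


def open_receiver(host, port):
    """Bind a UDP socket; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def send_datagram(host, port, payload):
    """Send one UDP datagram and return the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


def receive_once(sock, timeout=2.0):
    """Wait for a single datagram; raises socket.timeout if nothing arrives."""
    sock.settimeout(timeout)
    data, sender = sock.recvfrom(4096)
    return data, sender


if __name__ == "__main__":
    # Loopback demo; on the Pis, replace 127.0.0.1 with the peer's ad-hoc address.
    receiver = open_receiver("127.0.0.1", 0)
    port = receiver.getsockname()[1]
    send_datagram("127.0.0.1", port, b"hello from pi-1")
    print(receive_once(receiver))
    receiver.close()
```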

Long story short, I tried like a hundred different things and, as usual, a simple thing made it work.

It started as a weekend / hobby project to learn more about mesh networking and mesh communications. I purchased 2 x Raspberry Pi 2 Model B (below) along with all the standard stuff, including USB Wi-Fi dongles, and set sail with the goal of simply making them ping each other and eventually transfer a file in mesh / ad-hoc network mode, so that neither of them is an access point and neither is connected to any third access point.

(Image: two Raspberry Pis for the mesh network)


Scalabilly begins

I decided to start this blog as a central place to share all things around my professional interests such as building SaaS software and the related aspects of technology, marketing, team work and entrepreneurship in general.

The main thing is that I’m going to write about things that both (a) interest me and (b) could be useful to others. My interests are quite chaotic in their diversity and include SaaS, mobile messaging (XMPP, WebRTC, SIP), mesh networks, product marketing, startup engineering, but also things such as space exploration, practical psychology, computer vision, quantum mechanics, aikido and playing guitar.

The name basically comes from scalability (as in scalable software, scalable business) and rockabilly, which in this context is supposed to mean that we don’t only talk & do tech, but also have fun along the way.

This is a personal blog, I write it in my personal time and all opinions are my own. I may reference some cases from projects and businesses I’m involved with, for the benefit of both the reader and the business/project in question. This platform allows covering certain aspects in better detail, something I won’t be able to do in a company/project blog due to format or other limitations. There will be no special effort devoted to staying more unbiased and objective than I am; this blog should be treated as the post-work musings of a guy who still thinks about work and may be biased (and certainly doesn’t mind more traffic and leads coming into his projects), but honestly wants to share some insights or maybe ask questions and initiate discussion to learn more from others.

It would be nice in future to build a discussion platform here where other authors could join and write posts, upload tutorials etc but we’ll see how it goes and if I’m able to maintain steady and useful publications in the first place.