Mate1's activity feed: Cassandra, Kafka, Netty, Varnish

Posted by Hisham Thu, 08 Dec 2011 04:25:00 GMT

At Mate1 we’ve just released our activity feed feature to part of our user base. The activity feed contains events based on people’s interactions with you (they think you’re hot, like you, send you a message, etc.) and other events in the system that we think will interest you.

We push these events into Kafka as soon as they occur, from the web servers and from other parts of the system (automatic image and message review systems, message generators, customer support, etc.). We have a 3-node Kafka and Zookeeper cluster that holds most of the data for 2 weeks (some specialized topics are kept for only a few days). At the other end of the Kafka brokers we have several consumers that run in Tomcat. These consumers constantly pull data from Kafka and run some business logic on it. They might decide to drop an event or to save it into Cassandra.

The current Cassandra cluster runs on 4 nodes. Events that make it into the activity feed are grouped into 4 tiers, with tier 1 holding the most important events and tier 4 the least important. In Cassandra every user has a single row for each tier of their activity feed. A fifth and final row stores the rolled-up activity feed. This groups events by type and by user to create entries like “A, B, C, and D viewed your profile” or “A viewed your profile and liked you”. Roll-ups are done either on demand when a user logs in or periodically in the background by a roll-up daemon. We also use Cassandra to maintain all item counters (read, unread, per tier, etc.).

Cassandra is wrapped and hidden behind a Netty HTTP application that accepts requests and hands back JSON objects representing the requested parts of a user’s activity feed. We can get tiers, do paging, and mark items as read through this interface. The web servers load users’ activity feeds through this HTTP interface, which is in fact a load-balanced setup behind Varnish.

If we experience high load we can always enable caching in Varnish and avoid hitting Cassandra as much. In the future we could eliminate the web server as the middle man and fetch the feeds directly from the client using JavaScript (moving most of the work out into the browser). It’s also worth mentioning that the Kafka consumers produce data back into a Kafka topic for logging and analytics purposes. These topics are consumed, transformed (ETL’ed), and pushed into MySQL and other data stores for analytics.
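
To make the roll-up step more concrete, here is a minimal Python sketch of grouping events by type and by user into entries like the ones quoted above. It is an illustration only, not the code running at Mate1: the event type names, the PHRASES table, and the function names are all made up for the example.

```python
from collections import OrderedDict

# Hypothetical phrase table; the real event types are not listed in this post.
PHRASES = {
    "profile_view": "viewed your profile",
    "like": "liked you",
}


def join_names(items):
    """Join items as 'A', 'A and B', or 'A, B, and C'."""
    items = list(items)
    if len(items) <= 1:
        return "".join(items)
    if len(items) == 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def roll_up_by_type(events):
    """Group (actor, event_type) pairs by type, keeping first-seen order."""
    actors_by_type = OrderedDict()
    for actor, event_type in events:
        actors = actors_by_type.setdefault(event_type, [])
        if actor not in actors:
            actors.append(actor)
    return [
        f"{join_names(actors)} {PHRASES[event_type]}"
        for event_type, actors in actors_by_type.items()
    ]


def roll_up_by_user(events):
    """Group (actor, event_type) pairs by actor, keeping first-seen order."""
    types_by_actor = OrderedDict()
    for actor, event_type in events:
        types = types_by_actor.setdefault(actor, [])
        if event_type not in types:
            types.append(event_type)
    return [
        f"{actor} {join_names(PHRASES[t] for t in types)}"
        for actor, types in types_by_actor.items()
    ]


views = [(name, "profile_view") for name in "ABCD"]
print(roll_up_by_type(views))
print(roll_up_by_user([("A", "profile_view"), ("A", "like")]))
```

A real roll-up would also need to decide which grouping wins when both apply, and how to merge newly arrived events into an existing rolled-up row; the sketch leaves both out.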

Guzzler: pause, resume, seek

Posted by Hisham Thu, 01 Sep 2011 07:09:00 GMT

Guzzler can now seek to a given binary log file and position, and can start, stop, and restart (from the last good position) streaming from the binary log.

Guzzler with pattern based routing

Posted by Hisham Mon, 22 Aug 2011 03:42:00 GMT

Guzzler now publishes binlogs and allows pattern based subscriptions using "dbName.tableName.opName" with wild-card support.

dbName is the database, tableName is the table being acted upon, and opName is one of update, insert, or delete.
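
As a rough illustration of how such subscriptions could be matched, here is a small Python sketch. It is not Guzzler's code (Guzzler is written in Scala), and it assumes shell-style wild cards such as `*`, which may differ from the syntax Guzzler actually accepts. The `matches` and `route` helpers and the subscriber names are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(pattern, db_name, table_name, op_name):
    """Check a 'dbName.tableName.opName' pattern against one binlog event.

    Each of the three segments is matched separately, so a wild card in one
    segment can never swallow a dot and spill into the next one.
    """
    parts = pattern.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected dbName.tableName.opName, got {pattern!r}")
    event = (db_name, table_name, op_name)
    return all(fnmatchcase(value, part) for value, part in zip(event, parts))


def route(subscriptions, db_name, table_name, op_name):
    """Return the names of all subscribers whose pattern matches the event."""
    return [
        name
        for name, pattern in subscriptions.items()
        if matches(pattern, db_name, table_name, op_name)
    ]


# Hypothetical subscribers and patterns, for illustration only.
subscriptions = {
    "audit": "*.*.delete",
    "order_counter": "shop.orders.insert",
    "shop_mirror": "shop.*.*",
}
print(route(subscriptions, "shop", "orders", "insert"))
```

Matching segment by segment keeps the three parts of the key independent, which seems closest to the "dbName.tableName.opName" form described above.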

Guzzler: Stream MySQL binary logs and consume them with Scala actors and RabbitMQ.

Posted by Hisham Fri, 19 Aug 2011 06:16:00 GMT

Guzzler allows you to stream MySQL binary logs from a master and lets you act on them using Scala actors (consumers). Consumers are configurable in guzzler.conf along with the rest of the required parameters. Included with Guzzler is a dummy consumer and a RabbitMQ one that will push queries into a RabbitMQ server for consumption.

Consumers, either in Guzzler itself or behind RabbitMQ, can analyze the queries (Guzzler provides an SQL query parser based on ZQL) and may decide to update counters, fire off events, log messages, etc.
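
To give a feel for what a counter-updating consumer might do, here is a deliberately naive Python sketch. It is not Guzzler's API: real consumers are Scala actors and can use the ZQL-based parser, whereas this example only inspects the first keyword of each statement. The `CountingConsumer` class and `operation_of` helper are hypothetical names.

```python
from collections import Counter

# The operations we want to count; everything else is ignored.
OPERATIONS = ("insert", "update", "delete")


def operation_of(query):
    """Return the leading keyword of a statement if it is a tracked operation."""
    words = query.strip().split(None, 1)
    if not words:
        return None
    keyword = words[0].lower()
    return keyword if keyword in OPERATIONS else None


class CountingConsumer:
    """Keeps a running count of insert, update, and delete statements."""

    def __init__(self):
        self.counts = Counter()

    def consume(self, query):
        operation = operation_of(query)
        if operation is not None:
            self.counts[operation] += 1


consumer = CountingConsumer()
for statement in (
    "INSERT INTO users VALUES (1)",
    "update users set name = 'x' where id = 1",
    "SELECT 1",
    "  delete from users where id = 1",
):
    consumer.consume(statement)
print(dict(consumer.counts))
```

Looking only at the first keyword will misclassify anything less trivial (comments, multi-statement queries), which is exactly where a proper SQL parser earns its keep.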

https://github.com/mardambey/guzzler

Desktop at home

Posted by Hisham Thu, 24 Mar 2011 04:28:00 GMT

This is my current home desktop running Ubuntu/GNOME. E17 lives at work right now (=

Shrink, Spiffy, Scala!

Posted by Hisham Mon, 14 Mar 2011 05:57:00 GMT

Been writing some Scala code. It's fun, and one hell of a fantastic language. You should take a look at it some time.

Happy new year!

Posted by Hisham Sat, 01 Jan 2011 07:52:00 GMT

++code ++work ++train (=

Quick graphs with Perl / GD::Graph

Posted by Hisham Wed, 08 Jul 2009 19:05:00 GMT

So I had to quickly whip up some graphs today at work based on values coming in from one of our database tables. Nothing better than Perl and GD (GD::Graph) for a quick and effective solution.

use strict;
use DBI;
use GD::Graph::bars;
use GD::Graph::Data;

my $db_host = 'XXXXXX';
my $db_name = 'XXXXXX';   
my $db_user = 'XXXXXX';
my $db_pass = 'XXXXXX';
my $query   = "select month, undelivered from XXXXX where XXXXX";

# create labels and values for x-axis

my ($labels, $values) = get_data_from_sql($db_host, $db_name, $db_user, $db_pass, $query);

# graph and save the output

graph($labels, $values, "Month", "Undelivered", "Undelivered Messages by Month", "undelivered.png");


# run $query and return array refs of labels (column 0) and values (column 1)
sub get_data_from_sql
{
  my ($db_host, $db_name, $db_user, $db_pass, $query) = @_;

  my $dbh = DBI->connect("dbi:mysql:$db_name:$db_host", $db_user, $db_pass)
    or die "Couldn't connect to database: " . DBI->errstr;

  my $sth = $dbh->prepare($query)
    or die "Couldn't prepare statement: " . $dbh->errstr;

  $sth->execute()
    or die "Couldn't execute statement: " . $sth->errstr;
  my @row;

  my @labels = ();
  my @values = ();

  while (@row = $sth->fetchrow_array())
  {
    push @labels, $row[0];
    push @values, $row[1];
  }

  $dbh->disconnect;

  return (\@labels, \@values);
}

# draw a bar chart of $values against $labels and write it to $out_file as PNG
sub graph
{
  my ($labels, $values, $x_label, $y_label, $title, $out_file) = @_;

  my $data = GD::Graph::Data->new([$labels, $values,])
    or die GD::Graph::Data->error;

  my $my_graph = GD::Graph::bars->new();

  $my_graph->set(
  x_label => $x_label,
  y_label => $y_label,
  title   => $title,
  bar_spacing => 8,
  shadow_depth => 4,
  shadowclr => 'dred',
  transparent => 0,
  )
  or warn $my_graph->error;

  # plot once and keep the resulting GD image object
  my $gd = $my_graph->plot($data) or die $my_graph->error;
  open(my $img, '>', $out_file) or die "Couldn't open $out_file: $!";
  binmode $img;
  print $img $gd->png;
  close($img) or die "Couldn't close $out_file: $!";
}

If you do not want your data coming from an SQL statement simply fill in $labels and $values with anything you want, like:

$labels = ['Monday', 'Tuesday', 'Wednesday'];
$values = [45, 66, 89];

Pretty straightforward and handy. The graph looks like this.

EFL / E17 Python build script

Posted by Hisham Sun, 16 Nov 2008 23:05:00 GMT

So here's a quick Python script I hacked out for folks here at work to set up and install the EFL and E17 from subversion.

1000 Calorie Breakfast

Posted by Hisham Sat, 08 Nov 2008 20:34:00 GMT

After a night of solid clubbing, Wendy and I went for some breakfast at L'Avenue. While we stood in line waiting to go in, we decided to pick up a couple of pieces of cake from Premier Moisson because we were told there would be a 15 minute wait. After we finally got in, we took a quick glance at the menu and ended up ordering 3 servings of eggs Benedict for a grand total of 3 plates of fruit, 6 eggs served on bagels and buns, and 3 bowls of potatoes. The entire thing was around $50 and about 1000 calories per person. I haven't eaten anything since (it's 8:34 PM now).
