Super Confitura Man

How Super Confitura Man came to be :) Recently at TouK we had a one-day hackathon. There was no main theme for it; you could just post a project idea, gather people around it and hack on that idea for a whole day, drinks and pizza included. My main idea was to create something that would be fun to build and somehow useful to others. I figured that since Confitura was just around the corner, I could make a game that would be playable at TouK’s booth at the conference venue. This idea seemed good enough to attract Rafał Nowak @RNowak3 and Marcin Jasion @marcinjasion, two TouK employees who formed a team with me for the hackathon. ...

July 14, 2014 · 5 min · Marcin Cylke

Distributed scans with HBase

HBase is by design a columnar store that is optimized for random reads. You just ask for a row using rowId as an identifier and you get your data instantaneously. Performing a scan on part of a table or on the whole table is a completely different thing. First of all, it is sequential, which means it is rather slow, because it doesn’t use all the RegionServers at the same time. It is implemented that way to fulfil the contract of the Scan command, which has to return results sorted by key. So, how to do this efficiently? ...
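
The excerpt stops before the author's solution, so the following is only a minimal sketch of one common idea, not the method from the post: scan each region's key range in parallel and then merge the already-sorted partial results, so the "sorted by key" contract still holds. It uses an in-memory stand-in instead of a real HBase client; `REGIONS`, `scan_region` and `parallel_scan` are hypothetical names.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for a table split into regions.
# Each region holds rows sorted by key and covers a disjoint key range.
REGIONS = [
    [("row-01", "a"), ("row-02", "b")],
    [("row-10", "c"), ("row-11", "d")],
    [("row-20", "e")],
]


def scan_region(region, start_key, stop_key):
    """Scan one region sequentially, keeping rows with start_key <= key < stop_key."""
    return [(key, value) for key, value in region if start_key <= key < stop_key]


def parallel_scan(regions, start_key, stop_key):
    """Scan all regions concurrently, then merge into one key-sorted list."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        partials = list(pool.map(lambda r: scan_region(r, start_key, stop_key), regions))
    # Each partial result is already sorted, so a k-way merge preserves
    # global key order without re-sorting everything.
    return list(heapq.merge(*partials))


rows = parallel_scan(REGIONS, "row-02", "row-20")
print(rows)
```

The merge step is what keeps the result ordered; if ordering is not needed, the partial results could simply be concatenated.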

December 10, 2013 · 3 min · Marcin Cylke

Simple HBase ORM

When dealing with data stored in HBase, you quickly come to the conclusion that it is extremely inconvenient to access it via the native HBase API. It is very verbose and you always need to convert between bytes and simple types - a pain. While I was working on a project of mine, I thought: why not ease those pains and fetch real objects from HBase? And that’s how this simplistic, hackish ORM came to life. It is no match for projects like Kundera (a JPA compliant solution), or n-orm. Nevertheless, it just suits my needs :) ...

December 8, 2013 · 2 min · Marcin Cylke

Recently at storm-users

I’ve been reading through the storm-users Google Group recently. This resolution was heavily inspired by Adam Kawa’s post “Football zero, Apache Pig hero”. Since I’ve encountered a lot of insightful and very interesting information, I’ve decided to describe some of it in this post. ...

August 12, 2013 · 2 min · Marcin Cylke

Zookeeper + Curator = Distributed sync

An application developed for one of my recent projects at TouK involved multiple servers. There was a requirement to ensure failover for the system’s components. Since I already had a few separate components, I didn’t want to add more of them, and since there already was a Zookeeper ensemble running (required by one of the services), I decided to go that way with my solution. What is Zookeeper? Just a crude distributed synchronization framework. However, it implements Paxos-style algorithms (http://en.wikipedia.org/wiki/Paxos_(computer_science)) to ensure no split-brain scenarios occur. This is quite an important feature, since I don’t have to care about those kinds of problems while using it. You just need to create an ensemble of a couple of its instances to ensure high availability. It is basically a virtual filesystem, with files, directories and stuff. One could ask: why another filesystem? Well, this one is rather special, especially for distributed systems. The reason why building all the locking algorithms on top of Zookeeper is easy is its Ephemeral Nodes, which are just files that exist as long as the connection that created them exists. After it disconnects, such a file disappears. ...
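
To make the ephemeral-node idea concrete, here is a toy in-memory model of it. This is not a real Zookeeper or Curator client; `FakeZooKeeper`, its methods and the lock path are hypothetical and only illustrate why a node that vanishes with its owner's connection makes failover locking simple.

```python
class FakeZooKeeper:
    """Toy in-memory model of ephemeral nodes; not a real ZooKeeper client."""

    def __init__(self):
        self.nodes = {}  # path -> id of the session that owns the node

    def create_ephemeral(self, path, session_id):
        """Create the node if absent; return True when this call created it."""
        if path in self.nodes:
            return False
        self.nodes[path] = session_id
        return True

    def disconnect(self, session_id):
        """Drop a session: every ephemeral node it created disappears with it."""
        self.nodes = {p: s for p, s in self.nodes.items() if s != session_id}


zk = FakeZooKeeper()
LOCK = "/locks/active-instance"  # hypothetical path

first = zk.create_ephemeral(LOCK, "server-a")   # server-a becomes the active instance
second = zk.create_ephemeral(LOCK, "server-b")  # server-b stays on standby
zk.disconnect("server-a")                       # server-a crashes or loses its connection
third = zk.create_ephemeral(LOCK, "server-b")   # server-b can now take over
print(first, second, third)
```

In a real deployment the standby would watch the node and react when it disappears instead of retrying blindly.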

June 24, 2013 · 5 min · Marcin Cylke

Operational problems with Zookeeper

This post is a summary of what Kathleen Ting presented at the StrangeLoop conference. You can watch the original here: http://www.infoq.com/presentations/Misconfiguration-ZooKeeper I’ve decided to put this selection here for quick reference.
Connection mismanagement
Too many connections:
WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@247] - Too many connections from /xx.x.xx.xxx - max is 60
Running out of ZK connections? Set maxClientCnxns=200 in zoo.cfg
HBase client leaking connections? Fixed in HBASE-3777, HBASE-4773, HBASE-5466; manually close connections
Connection closes prematurely ...

March 21, 2013 · 3 min · Marcin Cylke

After WHUG meeting

Here are the slides from the talk I gave yesterday. If you have any questions, please ask.

November 30, 2012 · 1 min · Marcin Cylke

WHUG 8. Beyond Hadoop - checking other options

This coming Thursday, 29.11.2012, I will give a presentation at the Warsaw Hadoop User Group. You can confirm your attendance here: http://www.meetup.com/warsaw-hug/ And what will I be talking about? Copied from the WHUG page: Marcin will focus on how the Hadoop ecosystem works together with other tools. He will show how to process graphs simply and conveniently, and how to apply the Big Data approach in real time. He will also touch on easier ways of writing Map-Reduce algorithms. It will be a somewhat less technical (but still practical) tour of the fringes of the topics usually discussed in connection with Hadoop. ...

November 26, 2012 · 1 min · Marcin Cylke

Hadoop HA setup

With the advent of Hadoop’s 2.x version, there finally is a working High-Availability solution. Even two of those. Now it really is easy to configure and use those solutions. It no longer requires external components, like DRBD. It is all neatly packed into the Cloudera Hadoop distribution - the precursor of this solution. Read on to find out how to use it. ...

October 30, 2012 · 4 min · Marcin Cylke

Hadoop for Enterprises

Hadoop's usage as a big data processing framework has gained a lot of attention lately. Now it is not only the big players who see that they can embrace the data their sites or products are generating and develop their businesses on it. For that to happen two things are needed: the data itself and a means of processing really big amounts of it. Gathering data is relatively easy. The data is not necessarily structured, and you don't need to plan its usage at first. Just start collecting it and then you may experiment with its potential usage. If it turns out to be useless rubbish, deleting it won't be hard. But imagine the value it may contribute to your business: ...

June 18, 2012 · 7 min · Marcin Cylke

SoapUI ext libs and its weirdness

Suppose you want to add some additional jars to your SoapUI installation. It should all work OK if you put them in the bin/ext directory. It is scanned at startup, and jars found there are automatically added to the classpath. However, if you want to add some JDBC drivers and happen to be using a SoapUI version higher than 3.5.1, it is a bit more tricky. You may face this NoClassDefFoundError: An error occured [oracle/jdbc/Driver], see error log for details java.lang.NoClassDefFoundError: oracle/jdbc/Driver ...

November 2, 2011 · 2 min · Marcin Cylke

What is NoSQL good for?

... or how I ended up writing a CouchDB proof of concept app? Once upon a time I set out on a journey to discover the NoSQL land. I decided that doing simple queries wouldn't be interesting enough. That's why I chose to create an app that would be based on some NoSQL database. The main idea was to create an app that would dynamically update itself with geographic data flowing in. Since there are myriads of geo-data sets available on the internet, you can pick your favorite one and load it into your SQL database of choice. ...

September 21, 2011 · 6 min · Marcin Cylke

OVal - validate your models quickly and effortlessly!

Some time ago one of the projects at work required me to validate some Java POJOs. These were my model classes and I was creating them from incoming WebService requests. One would say that XSD would be sufficient for the task; for part of this validation, sure, it would. But there were some advanced rules XSD would not handle, or that would render the schema document very complicated. The rules I needed to express were like: (1) a person's first_name and last_name should be of appropriate length, between 2 and 20, and additionally one could pass a zero-length string just to remove the previous value; (2) the state field should contain only defined values, as in a dictionary. The latter would be doable with XSD's enumerations, but would require frequently changing schema files and redistributing them to interested parties :( The library I decided to use for this task is OVal and it turned out really nice! Read on to find out the details! ...
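
As a language-neutral illustration of the two rules above, here is a minimal plain-Python sketch. It does not use OVal or mirror its API; `ALLOWED_STATES` and `validate_person` are hypothetical names, the dictionary values are made up, and treating the 2 to 20 bound as inclusive and counted in characters is an assumption.

```python
ALLOWED_STATES = {"active", "suspended", "closed"}  # hypothetical dictionary values


def validate_person(person):
    """Return a list of human-readable violations; an empty list means valid."""
    errors = []
    for field in ("first_name", "last_name"):
        value = person.get(field, "")
        # Valid: an empty string (clears the stored value) or 2..20 characters.
        if value != "" and not 2 <= len(value) <= 20:
            errors.append(f"{field} must be empty or 2-20 characters long")
    if person.get("state") not in ALLOWED_STATES:
        errors.append("state must be one of the defined dictionary values")
    return errors


ok = validate_person({"first_name": "Jan", "last_name": "", "state": "active"})
bad = validate_person({"first_name": "J", "last_name": "Kowalski", "state": "unknown"})
print(ok)
print(bad)
```

Keeping the allowed states in data rather than in a schema is what avoids redistributing XSD files whenever the dictionary changes.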

July 14, 2011 · 4 min · Marcin Cylke

Geecon 2011 - day 2

And now for part 2 of my visit to Geecon 2011! 1. Jim Webber "Revisiting SOA for the 21st century" Now this was awesome! Jim Webber, a former ThoughtWorks employee, now Neo4j evangelist (in Neotechnology), described his views on how SOA should look. He had presented this previously, on other occasions, as his "Guerilla SOA" talk - generally he advocated for REST based services and loose contracts (stating that WSDLs are too verbose and code generation is evil). ...

May 22, 2011 · 4 min · Marcin Cylke

Geecon 2011 - day 1

Last week's Java conference, Geecon, was very interesting. It was well prepared, and gave an insight into the current Java related trends - concurrency, DSLs, polyglot programming. But not only that - there were also some pretty different talks from excellent speakers. The whole event took 4 days: University day (Wednesday), 2 regular conference days (Thursday + Friday) and hacker garden (Sunday). I decided to attend only on Thursday and Friday - no time for more. Here are the interesting things that happened during those days. ...

May 19, 2011 · 5 min · Marcin Cylke

JCE keystore and untrusted sites

Recently at work I needed to connect to a web service exposed via HTTPS. I was doing this from inside Servicemix 3.3.1, which may seem a bit limiting, but that was a requirement. Nevertheless I tried my luck with the included servicemix-http-2008.01 component. I created a simple Service Unit using that component and made a connection attempt. Unfortunately I encountered issues with the SSL negotiation. I had to dig deeper into the servicemix-http code to find out that these had something to do with my JCE keystore. Read more to find out what happened! ...

May 2, 2011 · 3 min · Marcin Cylke

Advisory Messages to the rescue

The most crucial part of software development is testing. It should assure us that our code is correct, works according to given specs, etc. There are many kinds of tests: unit tests, integration, functional. In general you should try to test the smallest possible subset of your code and be able to check the state of the objects after the test. This seems like a rather easy task, but what if you have an integration end-to-end test to perform? In most cases asserting state in an integration test is rather hard, because multiple systems have to interoperate. Let's focus on a specific situation. ...

April 1, 2011 · 3 min · Marcin Cylke

How to run multiple guest OS in QEMU?

This weekend I've been fiddling with QEMU. I installed OpenBSD on a single image and wanted to have two instances of it communicating via network. Installing the system was easy, but the networking setup was quite a pain. See how I did that... To make QEMU instances communicate with each other I needed to plug them into a "network". That's why I created a bridge to which the virtual instances would connect. I used the following script: ...

March 27, 2011 · 1 min · Marcin Cylke

Me on Hadoop on Parleys

Finally I've managed to import my WarJUG presentation to parleys.com. See for yourself :) If you have problems opening the Parleys version, try the ones uploaded to YouTube. Here is part 1: And here is part 2:
Comments
Sigvatr: I'm sorry, but I don't understand Polish :P

March 20, 2011 · 1 min · Marcin Cylke

After WarJUG

Some time ago I wrote about my Warsaw JUG presentation. I finally presented the topic yesterday. I must say I'm fairly content with how it went :) Here are some slides, and as soon as the video is available I'll post it here too. Hadoop i okolice
Comments
Sigvatr: So you wrote a nice post in English just to invite people to watch a presentation in Polish? (sorry, but my English is too weak to write in it) ...

February 23, 2011 · 1 min · Marcin Cylke