Monthly Archives: January 2014
A nice talk on designing REST JSON APIs. REST is becoming the standard style for web APIs, but there's no strict 'standard' and there are many different styles of implementation. This talk covers several guidelines worth keeping in mind.
- REST is easy for consumers, but hard for providers. One reason is that there's no standard: there's no RFC, just styles and patterns, and everyone interprets them in slightly different ways.
- Resources can be classified as instances and collections. A collection resource should be named with a plural (e.g. applications), and may have child instances.
- Behavior is defined with HTTP verbs like GET, POST, etc. The meaning of POST and PUT is not obvious; both can be used for creating and updating a resource. One note is that a PUT operation needs to be idempotent: when it's used for creating a resource, the client must supply the full state so that repeating the request produces the same result. POST is not required to be idempotent.
- (around 39:00) There are different ways to express a link to another resource. XML has a standard for this, but JSON does not. One recommendation is to just use a simple "href" attribute.
- (around 42:00) A resource expansion feature can reduce the number of requests: an optional parameter lists the additional attributes to embed in the response.
- For authentication, avoid sessions and stay stateless. Authorize based on resource contents rather than a specific URL, as URLs can change or be redirected.
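The "href" linking and resource-expansion points above can be sketched in a few lines. This is a minimal illustration, not code from the talk; the account/directory resource names and fields are hypothetical.

```ruby
require 'json'

# Render a (hypothetical) account resource as JSON.
# - Related resources are expressed as objects holding just an "href" link.
# - Passing expand: ["directory"] embeds the full child resource, so a
#   client calling GET /accounts/1?expand=directory saves a second request.
def render_account(account, expand: [])
  body = {
    "href"      => "/accounts/#{account[:id]}",
    "givenName" => account[:given_name],
    # Linked resource: by default, just an href.
    "directory" => { "href" => "/directories/#{account[:directory_id]}" }
  }
  # Resource expansion: embed the child's attributes only when requested.
  if expand.include?("directory")
    body["directory"]["name"] = account[:directory_name]
  end
  JSON.generate(body)
end
```

For example, `render_account(account)` returns only the directory link, while `render_account(account, expand: ["directory"])` embeds the directory's attributes in the same response.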
A nice episode on Riak, an Erlang-based distributed database. It talks about its feature set, how and where to apply it, and some of the technical background on replication and partitioning. The podcast page also has a lot of references to great resources, like Amazon's Dynamo paper, the CAP theorem, etc.
As Elixir comes around, Erlang may be getting more attention in the Ruby community. Though its programming model is quite different from Ruby's, the nice Ruby-like syntax considerably lowers the threshold for trying it out.
I haven't been able to get deep into this different programming model (concurrency and fault tolerance provided through processes and supervisors), but I would like to work on it in the near term using affordable Vagrant and DigitalOcean/Amazon AWS environments.
Just watched the above nice explanation of some technologies around big data.
Big data is a bit of a buzzword, but as defined in the presentation,
data so big that traditional solutions are too slow, too small, or too expensive to use.
it involves a technological leap from the traditional solutions, and there are interesting topics here that I didn't know much about.
- Recent trend: data size is increasing with less formal schemas, and more data-driven programs are appearing.
- Hadoop is the most popular solution in this field so far, with a MapReduce backend. However, converting a normal task into a MapReduce one is not trivial.
- There are other solutions like Spark. It can run 10-100x faster by removing the intermediate save/load of data that Hadoop imposes.
- NoSQL is discussed often these days, but SQL is not obsolete and is now striking back with new frameworks like Impala, Presto, etc. These tools are gathering attention by utilizing SQL's powerful and concise expressiveness. Also, some NoSQL databases (Cassandra, MongoDB) are adding query language features.
- MapReduce is not suitable for real-time event processing. Storm is taking care of this field.
- Search is one sub-domain of big data solutions. Lucene, with Solr and Elasticsearch, is taking care of this field.
- On top of MapReduce, functional-style expressions and SQL-type query languages are appearing. Big data is mathematics, and functional languages are the best tools for it.
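The MapReduce model mentioned above can be shown with the canonical word-count example. This is a toy, single-machine sketch of the programming model (not code from the talk): "map" emits `[word, 1]` pairs, the framework groups pairs by key, and "reduce" sums the counts per word.

```ruby
# Word count expressed in map/reduce style, in plain Ruby.
def word_count(lines)
  # Map phase: each line is split into [word, 1] pairs.
  mapped = lines.flat_map { |line| line.downcase.split.map { |w| [w, 1] } }
  # Shuffle phase: group the pairs by word (the key).
  grouped = mapped.group_by { |word, _| word }
  # Reduce phase: sum the counts for each word.
  grouped.map { |word, pairs| [word, pairs.sum { |_, count| count }] }.to_h
end
```

In Hadoop the map and reduce steps run in parallel across many machines, with the grouping (shuffle) handled by the framework; expressing an arbitrary task in this two-step shape is what makes the conversion non-trivial.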
After watching this presentation, I looked around the official sites of the mentioned software.
Spark provides in-memory computing, which lets it query data faster than disk-based engines like Hadoop.
Impala provides a SQL-like query engine, with 10x faster performance compared with the Hadoop-based Apache Hive. The following blog post describes the details of how Impala was 'newly' created.
Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.
Lucene (https://lucene.apache.org/core/) and Solr (http://lucene.apache.org/solr/)
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project.
HIVE, IMPALA AND PRESTO – THE WAR ON SQL OVER HADOOP
A nice comparison blog post.
Solr vs. ElasticSearch
A discussion comparing the two.
Following the previous trial (Installing Elixir using Chef), I've uploaded the initial version of the dynamo cookbook to the GitHub repo.
It deploys an Elixir Dynamo application from a git server and starts it up as a service. A very basic application is working fine in my Vagrant environment, though there's more work to do (e.g. DB migration, removing the Ruby dependency, etc.).
I was working on Chef over the holidays. I'm finally getting to grasp its concepts after a certain amount of struggle, though there's still a long way to go. I've uploaded the cookbook to install Elixir, which is what I've managed to create so far. I was originally thinking of writing a script for deploying a Dynamo web application, but I'm still stuck at the Elixir part.
The recent growth in providers (AWS, DigitalOcean, etc.) and tools (Vagrant, Docker, etc.) makes it much easier to experiment on clean servers, but you still need to face some complexities of the actual OS and distributions. I was playing around with Ubuntu and CentOS on different versions; each has different dependencies and errors, and I needed a lot of trial and error.
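For flavor, an Elixir-install recipe in Chef's Ruby DSL looks roughly like the sketch below. This is a hypothetical illustration, not the actual cookbook's contents; the version, URL, and paths are made up, and the distribution-specific dependency handling mentioned above is omitted.

```ruby
# recipes/default.rb -- hypothetical sketch of installing Elixir from a
# precompiled release. Real cookbooks need per-platform package names.
package "erlang" # Elixir runs on the Erlang VM

version = "0.12.0" # illustrative; would normally come from node attributes

remote_file "/usr/local/src/elixir-#{version}.zip" do
  source "https://github.com/elixir-lang/elixir/releases/download/v#{version}/Precompiled.zip"
  action :create_if_missing
end

bash "install elixir" do
  code <<-EOH
    unzip -o /usr/local/src/elixir-#{version}.zip -d /usr/local/elixir
    ln -sf /usr/local/elixir/bin/* /usr/local/bin/
  EOH
  not_if "which elixir"
end
```

The `not_if` guard is what makes the recipe idempotent, which is the core Chef concept I was struggling to internalize: a converge should be safe to run repeatedly.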
Just watched the above course. I was loving the Play by Play series from PeepCode, but there were no updates after the integration with Pluralsight. Finally, it seems new episodes are being added with Pluralsight's course authors.
This time, John Papa and Ward Bell pair up to develop a flight management system. It has a similar structure to Play by Play: Aaron Patterson and Corey Haines, but it's more focused on requirements discussion than on coding.
It's a relatively complex task with ambiguous requirements, but they nicely clarify the key factors through discussion, and also define the scope to focus on within the limited 2-hour timeframe.
In the coding part, a project template (the Breeze/Angular SPA template?) was applied. It was only a brief introduction, but as I don't have much experience with Visual Studio, I found it interesting to watch.