Category Archives: Book

Watching/Reading – Reactive Design Patterns – Cont.

https://www.manning.com/books/reactive-design-patterns

I'm continuing to read the book. Some additional notes:

  • Asynchronous interfaces help decouple modules and provide horizontal scalability.
  • Location transparency – By writing the application in a distributed manner, even for local processing, your application can become resilient to failure. It also makes writing tests easier.
  • Akka’s actor system (http://doc.akka.io/docs/akka/snapshot/scala/actors.html) provides a simple interface for message passing between components, which also makes them easier to test through stubs/mocks on that interface.
  • You need to work with non-determinism in distributed systems. One approach is to synchronize explicitly to avoid ordering issues. The other is to avoid or isolate computations that are affected by the order of execution.
  • Back-pressure and flow control become necessary between components. Reactive Streams (http://www.reactive-streams.org/) is one way to control flow.
  • Netflix built Chaos Monkey, which randomly disables production instances to verify system resiliency (https://github.com/Netflix/SimianArmy).
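
To make the back-pressure note a bit more concrete, here is a minimal sketch in Python (not the book's Scala). It is not the Reactive Streams API, which signals demand without blocking; it uses a blocking bounded queue as a simpler stand-in, and the name run_pipeline is my own.

```python
import queue
import threading

def run_pipeline(items, capacity=2):
    """Push items through a bounded buffer; a full buffer blocks the producer."""
    buffer = queue.Queue(maxsize=capacity)  # the bound is the back-pressure point
    received = []
    max_depth = 0

    def consumer():
        while True:
            item = buffer.get()
            if item is None:  # sentinel: no more items (so items must not contain None)
                break
            received.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()
    for item in items:
        buffer.put(item)  # blocks while the buffer is full, slowing the producer down
        max_depth = max(max_depth, buffer.qsize())
    buffer.put(None)
    worker.join()
    return received, max_depth

received, max_depth = run_pipeline(range(5), capacity=2)
```

The producer can never get more than `capacity` items ahead of the consumer, so a slow consumer slows the producer instead of letting memory grow without limit.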

This book covers various concepts and methodologies for building solid systems in a distributed manner. Also, the examples written in Scala are relatively simple and easy to understand.

Also, the following Coursera courses cover a similar topic and would help in gaining knowledge of it too.


Watching/Reading – Reactive Design Patterns

Just started reading the book (2 chapters), but it seems to contain various good insights.

Reactive systems are gaining popularity these days along with trends in computing, in both hardware (CPU/core architectures) and software (granular computing resources – virtual machines or containers).

Reactive systems provide scalability and resiliency, and they need to be designed that way from the ground up. Traditional synchronous processing fundamentally limits an application's scalability and its resilience to failures. By designing asynchronous processing on distributed computing resources into the system, failure handling becomes part of the normal scenario.

Some notes so far:

  • One difficult aspect of distributed systems is partial failure, as opposed to simple all-or-nothing failure.
  • Aiming for fault tolerance instead of just avoiding failures is the more reasonable approach, as failure can never be avoided completely. Resilience goes one step beyond tolerance: it aims to recover the original state and functionality.
  • Distributing and compartmentalizing are the only generic methodologies for protecting the whole system from failing. However, if compartments are not fully isolated, a failure can cascade and bring the whole system down (e.g. a partial failure overloads the rest of the system, which then chains into a full outage).
  • Supervision is one way to provide a resilient system. If a vending machine is broken, you call the maintenance operator instead of trying to fix it yourself.
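
The supervision idea can be sketched roughly as follows, in Python rather than the book's Scala/Akka. This is only an illustration under my own assumptions: supervise and make_flaky_worker_factory are hypothetical names, and a real supervisor (as in Akka or Erlang/OTP) does much more.

```python
def supervise(make_worker, job, max_restarts=3):
    """Run job on a fresh worker, replacing the worker whenever it crashes."""
    restarts = 0
    while True:
        worker = make_worker()  # the worker does not repair itself; it is replaced
        try:
            return worker(job), restarts
        except Exception:
            if restarts >= max_restarts:
                raise  # give up and escalate the failure to whoever called us
            restarts += 1

def make_flaky_worker_factory(failures):
    """Hypothetical factory for the example: the first `failures` workers crash."""
    state = {"remaining": failures}

    def make_worker():
        def worker(job):
            if state["remaining"] > 0:
                state["remaining"] -= 1
                raise RuntimeError("worker crashed")
            return job * 2
        return worker

    return make_worker

result, restarts = supervise(make_flaky_worker_factory(2), 21)
```

The worker contains no recovery logic at all. Deciding whether to restart or escalate lives in the supervisor, like the maintenance operator for the vending machine.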

Reading – Programming Phoenix

https://pragprog.com/book/phoenix/programming-phoenix

Just finished reading through the B2.0 release of the “Programming Phoenix” book over the weekend. I hadn’t been able to use Phoenix for some time, since pre-1.0, but this book was good material for catching up on the latest features.

The book starts with strong statements about the benefits of Phoenix and the underlying Elixir, as in the subtitle – Productive |> Reliable |> Fast. Then, the main content explains the features by gradually building up a simple and concise application called Rumbl. Building standard CRUD operations on user models backed by Ecto with a PostgreSQL DB, and implementing authentication using the model, didn’t take long and was a good exercise.

Definitely looking forward to the remaining parts, which include the WebSocket channels.

Reading – Soft Skills: The Software Developer’s Life Manual

http://www.amazon.com/Soft-Skills-software-developers-manual/dp/1617292397/

A well-written guidebook for software engineers that covers a lot of ground. Putting some distance from technical topics, it focuses on general productivity and career development. Its set of short chapters provides good starting points for a variety of topics. Also, the essence is extracted from many good books (Rework, The Power of Habit, The Willpower Instinct, etc.), which guides you to further details.

Some of my notes:

  • Employment and Career
    • One viewpoint is to consider your employer as a customer and think about what you’re offering them. Also, market your offerings to your current employer and to possible future employers.
    • In many cases hiring managers make decisions based on the social aspects of candidates rather than on technical skill itself. Blogging, a social media presence, or a referral from another engineer is important.
    • Freelancing is one way to gain control over your career, but normally you need to earn about twice your salary as an employee, considering the extra costs of running your own business. For example, if a standard hourly rate in the U.S. is 50 dollars, you may need to charge 100 dollars.
    • It’s important to specialize. If you have a specialized skill set, it’s easier to sell. For example, the author has experience building automated-testing frameworks, and can sell the cost-benefit compared with building one from scratch without specialized knowledge.
  • Professionalism and Improving Skill
    • Consistency is key to professional work. Making it a habit makes a difference.
    • Professionals need to recognize when to say no. The software engineer and client can be like a doctor and patient. Doctors don’t just do what patients ask them to do; they analyze the issue and propose appropriate methods.
    • Teaching and mentoring someone is a good way to learn, as it makes you reorganize your own understanding. Don’t be too afraid to share your knowledge confidently, even when you’re not sure whether it’s correct.
    • Become accountable to yourself without relying on external triggers for motivation. Having an internal trigger provides more consistent and controllable behavior.
    • Breaking tasks down into smaller chunks makes it easier to start working on them, as the structure of this book demonstrates.

NoSQL Distilled

Looking around at NoSQL resources, I watched/read the above ones. They give a great explanation of the advantages/disadvantages of the NoSQL approach over traditional relational databases. The following are some notes.

Notes

Application development productivity

A lot of application development effort is spent on mapping data between in-memory data structures and a relational database. A NoSQL database may provide a data model that better fits the application’s needs, thus simplifying that interaction and resulting in less code to write, debug, and evolve.

For application developers, the biggest frustration has been what’s commonly called the impedance mismatch: the difference between the relational model and the in-memory data structures.

NoSQL adoption is often driven by the ability to scale through clusters, but development productivity is also a major factor. Mapping the database to in-memory data structures has been a pain in the neck during application development. Though ORMs (Hibernate, ActiveRecord, etc.) alleviate some of it, it still takes care and effort to get both productivity and good performance.

Aggregates orientation

Aggregate orientation takes a different approach. It recognizes that often, you want to operate on data in units that have a more complex structure than a set of tuples.

As we’ll see, key-value, document, and column-family databases all make use of this more complex record.

Aggregate is a term that comes from Domain-Driven Design. In Domain-Driven Design, an aggregate is a collection of related objects that we wish to treat as a unit.

In relational databases, data is normalized and split into multiple tables. NoSQL databases instead store the primary data together with its related objects in a single item. This approach focuses on maintaining data integrity within each item, rather than on transactions that span multiple independent tables and rows, as an RDB does. This aggregate-oriented approach makes it easier to distribute the data over multiple cluster nodes.
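
As a rough illustration of the difference, here is a small Python sketch. The order data, field names and the node_for helper are all made up for this example and do not come from the book.

```python
import hashlib

# Relational style: one order is spread over several tables (lists of rows here).
orders = [{"order_id": 1, "customer_id": 7}]
order_lines = [
    {"order_id": 1, "product": "book", "qty": 2, "price": 30},
    {"order_id": 1, "product": "pen", "qty": 1, "price": 5},
]

def to_aggregate(order, lines):
    """Fold an order row and its line rows into one self-contained aggregate."""
    return {
        "order_id": order["order_id"],
        "customer_id": order["customer_id"],
        "lines": [
            {key: value for key, value in line.items() if key != "order_id"}
            for line in lines
            if line["order_id"] == order["order_id"]
        ],
    }

def order_total(aggregate):
    """Everything needed is inside the aggregate, so no join is required."""
    return sum(line["qty"] * line["price"] for line in aggregate["lines"])

def node_for(key, node_count):
    """Pick a cluster node from the aggregate key alone (simple hash sharding)."""
    digest = hashlib.sha256(str(key).encode()).hexdigest()
    return int(digest, 16) % node_count

aggregate = to_aggregate(orders[0], order_lines)
total = order_total(aggregate)
node = node_for(aggregate["order_id"], node_count=3)
```

Because the whole order lives under one key, that key alone decides which node holds it, which is what makes the data easy to distribute.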

Flexible Schema

A common statement about NoSQL databases is that since they have no schema, there is no difficulty in changing the structure of data during the life of an application. We disagree; a schemaless database still has an implicit schema that needs change discipline when you implement it

The claim that NoSQL databases are entirely schemaless is misleading; while they store the data without regard to the schema the data adheres to, that schema has to be defined by the application, because the data stream has to be parsed by the application when reading the data from the database.

NoSQL’s flexible schema lets you concentrate on the domain design, but a schemaless database still has an implicit schema that needs change discipline when you implement it. Also, as NoSQL’s aggregated data is not normalized, analyzing the data from a perspective other than the primary key is harder, and requires creating indexes (materialized views) for it. These factors need to be taken care of.
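
A tiny Python sketch of what the implicit schema means in practice. The document shapes and the read_customer function are hypothetical: the database would happily store both shapes, so the application code has to know about each of them.

```python
def read_customer(doc):
    """Normalize a stored document to the shape the current code expects."""
    if "name" in doc:
        # Older documents (assumed for this example) kept a single "name" field.
        first, _, last = doc["name"].partition(" ")
    else:
        # Newer documents split the name into two fields.
        first, last = doc["first_name"], doc["last_name"]
    return {"first_name": first, "last_name": last}

old_doc = {"name": "Ada Lovelace"}
new_doc = {"first_name": "Alan", "last_name": "Turing"}
customers = [read_customer(old_doc), read_customer(new_doc)]
```

The schema has not gone away. It has moved out of the database and into read_customer, and every change to the data structure needs the same kind of discipline there.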

Consistency and Availability

The CAP theorem states that if you get a network partition, you have to trade off availability of data versus consistency.

NoSQL advocates tout the capability of data distribution. However, there’s a trade-off between consistency and availability. Which one matters more for the provided service is often a business decision.

Reading – Seven Concurrency Models in Seven Weeks

http://pragprog.com/book/pb7con/seven-concurrency-models-in-seven-weeks

Just read through the book. Though it’s still a beta release, the major topics are covered.

It starts with the basic thread/locking mechanism (the traditional deadlock problems on shared data), and then goes through the functional programming aspects – immutable data structures, futures/promises and actors (how to divide problems into small chunks that can be executed concurrently and in parallel). This book nicely uses several languages (Java, Clojure, Elixir) to bring out the benefits of each concurrency model.

The following are some reading notes.

Concurrency and Parallelism

Concurrency sounds understandable, but it’s a little difficult to define clearly once we consider parallelism along with it. The book describes it as follows, using a quote from a presentation.

Concurrency is not Parallelism (it’s better)

Concurrency is about dealing with lots of things at once.

Parallelism is about doing lots of things at once.

Concurrency provides a way to structure a solution to solve a problem that may (but not necessarily) be parallelizable.

As indicated in the book, the concepts of concurrency and parallelism are often confused and sometimes intermixed. I’m not confident enough yet, but concurrency may be one way to structure a problem so that it can scale out, which parallel execution devices/platforms can then exploit.
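
To check my own understanding, here is a small Python sketch (not from the book) of concurrency without parallelism: two tasks are both in progress and take turns, but everything runs on a single thread. The task names and step counts are arbitrary.

```python
import asyncio
import threading

async def task(name, steps, log):
    for step in range(steps):
        log.append((name, step, threading.get_ident()))
        await asyncio.sleep(0)  # hand control back so the other task can run

async def main():
    log = []
    # Both tasks are "dealt with" at once, yet only one runs at any instant.
    await asyncio.gather(task("a", 2, log), task("b", 2, log))
    return log

log = asyncio.run(main())
names = [name for name, _, _ in log]
thread_ids = {thread_id for _, _, thread_id in log}
```

Here `names` shows the two tasks interleaving while `thread_ids` contains a single thread id. Parallelism would be the separate step of actually running such tasks on several cores at the same moment.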

Concurrency in Java

Java provides built-in thread and locking mechanisms, and some libraries are also provided to assist with concurrency.

  • The java.util.concurrent package provides utility data structures and functions to support concurrent and parallel programming (Package java.util.concurrent)
  • As indicated in the “Threads and Locks” section, ConcurrentHashMap can be used to reduce the bottleneck on shared data compared with a standard HashMap guarded by a lock.
  • Mutable state can be hidden inside library functions (e.g. SimpleDateFormat), so even simple code can cause conflicts under concurrent execution. Multi-threading requires caution.

Actors and Object-Oriented Programming

Actors are very lightweight concurrent entities that communicate with each other through message passing.

we can think of actors as the logical extension of object-oriented programming to the concurrent world. Indeed, you can think of actors as more object-oriented than objects, with stricter message passing and encapsulation.

It’s an interesting viewpoint. Method invocation corresponds to message passing. CORBA and Java’s remote method invocation used to be discussed as providing a similar concept. They didn’t go mainstream, but the recent buzz around concurrency and actors might go a different way.

Actors are explained with Elixir sample code. Its underlying Erlang model has now been brought to the JVM through Akka, and Go provides similar functionality through goroutines. The actor model could be a major driving factor for distributed systems, and could become a standard programming paradigm in the near future, partially replacing the traditional thread model.
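
The "more object-oriented than objects" point can be sketched in Python as below. This is only an approximation under my own assumptions: CounterActor and its message names are hypothetical, and an OS thread with a queue is far heavier than an Erlang process.

```python
import queue
import threading

class CounterActor:
    """An actor: private state, a mailbox, and one thread that handles messages."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # only ever touched by the actor's own thread
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message, reply_to=None):
        self._mailbox.put((message, reply_to))

    def join(self):
        self._thread.join()

    def _run(self):
        while True:
            message, reply_to = self._mailbox.get()
            if message == "increment":
                self._count += 1
            elif message == "get":
                reply_to.put(self._count)
            elif message == "stop":
                break

def send_many(actor, times):
    for _ in range(times):
        actor.send("increment")

actor = CounterActor()
senders = [threading.Thread(target=send_many, args=(actor, 100)) for _ in range(4)]
for sender in senders:
    sender.start()
for sender in senders:
    sender.join()

reply = queue.Queue()
actor.send("get", reply)
value = reply.get()
actor.send("stop")
actor.join()
```

Four threads send messages at the same time, but no lock is needed around the counter because only the actor's own thread ever reads or writes it.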

Reducers in Clojure

Reducers – A Library and Model for Collection Processing

Clojure’s reducers can provide the benefit of concurrent execution (for map-reduce type operations) along with functional simplicity.
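
The general shape of the idea, as I understand it, is to split the collection into chunks, reduce each chunk independently, and then combine the partial results. Below is a rough Python sketch of that shape. It is not Clojure's reducers API; the fold and chunked names and their parameters are my own, and CPython threads will not speed up CPU-bound work the way the JVM's fork/join pool can.

```python
import operator
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def fold(items, mapper, combine, identity, chunk_size=1000, workers=4):
    """Map and reduce each chunk independently, then combine the partial results.

    `combine` must be associative and `identity` must be its neutral element,
    otherwise the result would depend on how the items were chunked.
    """
    def reduce_chunk(chunk):
        return reduce(combine, map(mapper, chunk), identity)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(reduce_chunk, chunked(items, chunk_size))
        return reduce(combine, partials, identity)

sum_of_squares = fold(list(range(10)), lambda x: x * x, operator.add, 0, chunk_size=3)
```

The calling code stays a plain functional description (a mapper and a combining function), while the chunking and scheduling are hidden inside fold.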

Other Notes

There have been a lot of large-scale system topics lately. The distributed query system Presto may be one example. New Relic’s Rubicon project (New Relic Analytics Aims to Speed IT Problem Solving) is another interesting project for large-scale query services.

Functional programming style provides a mathematical or abstract form of calculation, and recent programming languages and libraries are bringing it into the practical world. It’ll be interesting to see how programming will be structured 5 years from now.

Remote – Office Not Required @ Book

http://37signals.com/remote/

Just finished the “Remote” book from 37signals. The above site has a nice introductory video on the 37signals case.

I sometimes think about working away from the office – from home, a coffee shop or any other place. However, in real life, everyone is packed in the office from morning to night, with “physical” meetings filling everyone’s schedule. It’s difficult for me to get out of this situation. But I found this book insightful, as it discusses working remotely from several different perspectives.

The following are some of my reading notes.

Distractions

As discussed in the book, distractions are a major issue for completing tasks that require focus. Many skilled members are constantly interrupted by colleagues asking for help, or by scheduled meetings. Then, some of them are forced to come in very early in the morning, or to work over the weekend. It’s a tough situation.

However, working remotely has different types of distractions and temptations. Some workers may lose focus, while motivated/passionate workers might go in the other direction (overwork). Freedom is not always a happy path, and it requires a certain commitment or regimen to make it work. Some ways to avoid distractions are discussed in the book, but it would still be a big concern. With this in mind, I like the idea of a more loosely implemented approach, like working from home in the morning and coming to the office in the afternoon.

Communications

We sometimes work with contractors in foreign countries, and the main communication paths are e-mail and messaging. This sometimes causes issues due to misunderstandings about the expected outputs. It may not only be about “communication”, but F2F chatting has worked as a safety net for avoiding this type of issue. It can/should be replaced with phone calls, WebEx or video calls, but it would require a certain care from members or leaders to maintain constant communication. As indicated in the book, setting up the online infrastructure or having regular F2F meet-ups would be a good approach.

Workplace

One interesting point is that there are myriad reasons why people have to, or want to, move, and they don’t necessarily have to leave the company if they’re working remotely. Maybe that’s one good reason that 37signals can hire and retain talented people from many locations. Most of the time, employees are tied to one office location, and are forced to leave the company when they have to move. It’s unfortunate.

Programming Elixir

Just trying out Elixir (http://elixir-lang.org/) using the above book and screencast, as part of my concurrency series.

It’s a functional programming language. I had been avoiding purely functional languages since I studied Lisp at school, and my only relationship with them was copy-pasting Emacs Lisp.

However, Elixir seems reasonably well structured for beginners, with the essence of Ruby, while keeping the power of the underlying Erlang VM. The official documentation is still limited, but the book “Programming Elixir” provides good guidance for understanding the interesting paradigm of this language.

Then the “Meet Elixir” screencast presented by José Valim gives good insights into the background philosophy of Elixir, which is very interesting and quite different from the standard object-oriented ones. The content is presented relatively fast, so maybe it’s good to start from the book.

The power of Elixir comes out in the concurrency part, powered by the Erlang VM. The following example (it’s from the book, though slightly modified) spawns 10000 processes, and each process counts up by one. It takes well under a second, which is interesting.

$ elixir chain.exs
time = 122ms, msg = Result is 10000

chain.exs

defmodule Chain do
  @moduledoc """
  Spawns Erlang's lightweight processes for counting up numbers.
  """

  # When a number arrives from the previous process, the 'receive' block runs with that number as 'n'.
  # It then passes 'n + 1' on to the next process.
  def counter(next_pid) do
    receive do
      n -> send(next_pid, n + 1)
    end
  end

  # Creates 'n' processes that count up to 'n'.
  def create_processes(n) do
    # Loop 'n' times to spawn processes, and return the pid of the last one spawned. It heads a
    # chain of 'n' processes that ends at the current process ('self()').
    last = Enum.reduce(1..n, self(),
             fn(_, send_to) -> spawn(Chain, :counter, [send_to]) end)

    # Send the initial number 0 to the last process.
    send(last, 0)

    # wait for the 'n' length propagation to complete back to 'self'
    receive do
      final_answer when is_integer(final_answer) ->
        "Result is #{inspect(final_answer)}"
    end
  end

  # Execute the process creation with the timer count.
  def run(n) do
    {time, msg} = :timer.tc(Chain, :create_processes, [n])
    IO.puts "time = #{div(time, 1000)} milliseconds, msg = #{msg}"
  end
end

# Spawn 10000 processes, each of which counts up by one.
Chain.run(10000)

Understanding Computation @ Book

Understanding Computation: From Simple Machines to Impossible Programs

This book covers the fundamental mechanisms of computation, like automata and Turing machines. Though it’s a kind of textbook topic, this book makes it fun by implementing interpreters for a simple language, using Ruby.

Ruby is popular for implementing DSLs, but this book goes further by implementing a programming language itself. I once learned this kind of topic while I was at university, but I’ve forgotten most of it. One of the reasons may be that I found it a little boring at the time, with just mathematical theories.

Actual code works.

There’s a lot of information covered in this book, and I haven’t been able to read through it all yet. But I’d definitely like to take the time to try out each of the examples.

Explore It! @ Book

Explore It!: Reduce Risk and Increase Confidence with Exploratory Testing

A good book that covers the exploratory testing methods along with general insights on testing.

“Exploratory Testing” is about exploring the system you’re working on. It’s not only simple functional testing, but also a discovery process for identifying whether the system satisfies the requirements. It has aspects of a creative process for finding paths and directions, and it takes strong observation skills and experience.

The fundamentals are basic testing techniques, like picking boundary conditions and invalid formats. However, it’s sometimes difficult to dig deep into everything in the system from every viewpoint. Exploring a system requires skills and insights, in order to avoid getting lost in the infinite landscape.

Other interesting points discussed were requirements meetings and conflicts between testers and developers. Testers and developers can have conflicts because their short-term goals (getting the current work done) sometimes differ, though their long-term goal is the same (making good products). As indicated in the book, discussing the feature requirements among members in the early phase is an important step for avoiding rework.

Notes

  • Exploring involves 3 factors: Target (what to test), Resource (what you bring with you) and Information (what you’re hoping to find).
  • “Observation” is one important factor in testing. Some weird noise from the hard drive could indicate a serious issue. The console and logs have plenty of information, and checking them while testing helps identify possible issues.
  • Even the smallest system has a number of variations to explore. Leave open the possibility that there is something you haven’t considered yet.
  • Think about how to validate the result. Sometimes it’s difficult, especially if you’re not an expert in the system domain. One approach is to identify the “Never and Always” rules that apply to your system (e.g. in server-based systems, a single user’s action should never make the system unavailable to other users).
  • To dig deeper into a system, one option is to list related nouns and verbs, and then combine them. It produces some combinations of actions that don’t make sense, but thinking about them stimulates your creativity to find interesting viewpoints.
  • Exploring the system well helps refine the system requirements too. One example is inconsistencies between behaviors, which are relatively easy to avoid in the early phase of development.
  • Testers who report bugs can be seen by developers as “making stuff up”, piling up unexpected behaviors through invalid scenarios. It’s because developers can see them as new requirements rather than defects (a “they were not initially indicated!” kind of response). If you don’t calibrate expectations with the team early on, you’re likely to end up arguing about the real scope of the features later.
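
The noun/verb combination technique from the notes above is easy to try out with a few lines of Python. The word lists here are made up for illustration; in practice they would come from your own system's domain.

```python
from itertools import product

def action_ideas(verbs, nouns):
    """Combine every verb with every noun to get candidate actions to explore."""
    return [f"{verb} {noun}" for verb, noun in product(verbs, nouns)]

# Hypothetical vocabulary for a small reporting system.
verbs = ["create", "delete"]
nouns = ["user", "report"]
ideas = action_ideas(verbs, nouns)
```

The list grows quickly (verbs times nouns), and some entries will be nonsense, but scanning it is a cheap way to spot actions nobody has thought about testing yet.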