Architecture & Design
IF YOU BUILD IT

... Will they Come?

by Arnon Rotem-Gal-Oz

May 2006


May 31, 2006

The Open Closed Principle


On one hand, the Open Closed Principle (OCP), defined by Bertrand Meyer nearly 20 years ago, means that classes or components should be open for extension and adaptation. On the other hand, OCP means they should be closed to avoid cascading of changes to existing clients of the code.

In his paper "Protected Variation: The Importance of Being Closed", Craig Larman says that OCP demonstrated the original intent behind the object-oriented concept of information hiding.

OCP is explained both in Larman's paper as well as by "The Open-Closed Principle" by Robert C. Martin. I won't attempt to compete with them in terms of explaining OCP's meaning, but I would like to highlight some points and issues in regard to OCP.

It is easy to see the benefit of having a class that follows this principle: When you need to add a requirement, instead of breaking dependent code (and tests) you just extend it somehow and everything is nice and dandy. Furthermore, violating OCP can result in Rigidity, Fragility, and Immobility.

But how do you do that? The obvious (and naïve) answer is inheritance. Every time something needs to change, just add a sub-class. The parent class is not changed and voilà. However, if you add sub-classes all the time you'd get "lazy classes" or freeloaders--sub-classes without a real reason for existence--not to mention a maintenance nightmare.
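To make the inheritance option concrete, here is a minimal sketch in Python. The class names (ReportExporter, CsvReportExporter) and the publish function are hypothetical, invented for illustration only; they do not come from Meyer's, Larman's, or Martin's papers.

```python
class ReportExporter:
    """Existing class that clients already depend on; it stays unchanged."""

    def export(self, rows):
        # Default behavior: one line per row, values separated by spaces.
        return "\n".join(" ".join(str(v) for v in row) for row in rows)


class CsvReportExporter(ReportExporter):
    """New requirement handled by extension, not by editing the parent."""

    def export(self, rows):
        # Same contract, different format: values separated by commas.
        return "\n".join(",".join(str(v) for v in row) for row in rows)


def publish(exporter, rows):
    # Client code written against ReportExporter works with any sub-class.
    return exporter.export(rows)


rows = [(1, "a"), (2, "b")]
plain = publish(ReportExporter(), rows)
csv_text = publish(CsvReportExporter(), rows)
```

The client (publish) and the parent class are closed to modification, while the new format arrives as an extension. The same sketch also hints at the downside: one sub-class per small change adds up quickly.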

Thus, sub-classing is an option, but we need to consider carefully where to apply it. Other (more practical) OCP-preserving steps include:

To sum up, OCP is an important principle, and keeping it results in better design. There are several practical and common steps we can take to help keep this principle and handle changes better.

Posted by Arnon Rotem-Gal-Oz at 07:16 AM


May 29, 2006

Dear Abbey Dept.


Since I have been blogging for about a year now on this and other blogs, it's time to make this blog into something with more two-way communication.

Consequently, I am going to run a little experiment for a few weeks and see how it goes.

The idea is as follows: If you have an interesting architectural or design dilemma, drop me an email at ask@rgoarchitects.com. I'll pick one issue per week and post the dilemma (anonymously) plus voice my opinion (and/or suggested solution)--and then everyone else can chime in with their comments and insight, which hopefully will shed some light on the subject.

I'd be interested to hear both your opinions on this initiative and, of course, interesting dilemmas you are facing. (Again, send your dilemmas to ask@rgoarchitects.com.)

Posted by Arnon Rotem-Gal-Oz at 05:37 AM


May 26, 2006

Object-Oriented Principles


It's sad to say, but the 7 Deadly Sins of Design are widely practiced.

One reader even commented that he has seen all of the first 6 in a single project--definitely more prevalent than we want. On many occasions, I've even seen these types of problems in designs made by senior programmers.

Trying to think of why and how this happens, it seems at least some of the problem can be traced to how object orientation is taught. Programming courses usually focus on the syntax and mechanics of OO in general, and the programming language being taught in particular. Attending class, you hear a lot about objects, classes, state, methods, and constructors. Courses also teach you about inheritance, polymorphism, and encapsulation. Still, the focus is mostly on what it is and how to do it in the (insert your favorite language here) programming language, but not on the motivations. Another problem is that courses usually talk about synthetic examples (shapes, pets, etc.), not real-world problems.

The same is true of "design courses." Often, these courses only teach UML syntax, not how to design or analyze. Or they stress design patterns and their implementation but not the motivations for the solution. For example, use cases are a functional decomposition technique, and I've seen many developers "analyze" use cases and come up with a class model where the logic of what should be a single class is fragmented into per-use-case classes.

This is a generalization, and I am sure there are some excellent courses out there. Moreover, I probably haven't seen enough courses to really pass judgment. However, I have had the chance to work with many developers and the "knowledge patterns" described above are recurrent.

Anyway, there is no point crying over spilled milk. Instead of just complaining, I thought I'd take the same path I took with the fallacies and dedicate some future posts to the principles that do lead to better designs. Specifically, these posts will talk about:

  • Protected Variation/Open Closed Principle
  • Inversion Of Control
  • Cohesion--Single Responsibility Principle
  • Interface Segregation principle
  • Design-by-Contract/Liskov substitution principle
  • Dependency Inversion principle
  • Dependency Injection
  • YAGNI

Note that these principles are not my invention. They are the work of people like Bertrand Meyer, Robert C. Martin, Barbara Liskov, Martin Fowler, Tom DeMarco, David Parnas, and probably a few others I don't know of, since some of these principles are 20 or more years old.

Posted by Arnon Rotem-Gal-Oz at 06:42 AM


May 25, 2006

The Essential Unified Process


I first heard about the "Essential Unified Process" several months ago, back when Microsoft's Visual Studio Team System was still in beta.

About that time, Microsoft and Ivar Jacobson--one of the three amigos who gave us UML--announced that Jacobson was working on a new process that would learn from the mistakes of RUP (Rational Unified Process) and other methodologies, resulting in a process that is both architecture centric and agile.

This certainly looked interesting and I became very excited. However, I began to get worried upon reading "The Essential Unified Process: An Introduction" published on Ivar's site. In particular, the paper promised that EUP will be "complete, sufficient, comprehensive, adaptive, scalable, flexible, lightweight, agile, universal" and so on and so forth.

This bothered me because my experience as an architect has shown me that the process is always a matter of trade-offs. You can't have everything and you need to determine which quality attributes are the most important, then focus on those. When something promises that it is perfect, I begin wondering what the catch is.

Well, I guess we'll find out soon enough, since the EUP is to be formally launched at the Natural History Museum in London on June 27, 2006. What's left now is to wait and see.

Posted by Arnon Rotem-Gal-Oz at 06:27 AM


May 24, 2006

Domain Specific Modeling


Domain Specific Modeling (DSM) is about distilling domain knowledge into a meta-modeling language (using a meta-meta-modeling language). The resulting language can be used to describe domain problems (with appropriate tooling). The idea is to take the resulting designs and, through the use of frameworks and code generation, create applications (with better results compared with traditional approaches).

For example, MOF is used to describe UML 2.0. A class diagram (a domain-specific language for describing classes) is used to model C# or Java classes. Through the use of a simple template you can actually generate the class definition code in the language of your choice. The raison d’être of DSMs, however, is not to create general-purpose constructs, but rather languages that can be used to generate applications (or application fragments) in specific vertical domains (for example, creating a language that can be used to create/define telecom billing applications).
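As a rough illustration of the "simple template" idea, the sketch below generates a class definition from a tiny model. The model dictionary, the template text, and the generate_class helper are all hypothetical stand-ins for what a real modeling tool would provide, and the output here is Python rather than C# or Java.

```python
from string import Template

# Hypothetical model of a class, standing in for what a class diagram captures.
model = {"name": "Invoice", "fields": ["number", "amount"]}

# Template for the generated class definition.
CLASS_TEMPLATE = Template(
    "class $name:\n"
    "    def __init__(self, $params):\n"
    "$assignments\n"
)


def generate_class(model):
    """Fill the template from the model and return the generated source."""
    params = ", ".join(model["fields"])
    assignments = "\n".join(
        f"        self.{field} = {field}" for field in model["fields"]
    )
    return CLASS_TEMPLATE.substitute(
        name=model["name"], params=params, assignments=assignments
    )


source = generate_class(model)
print(source)
```

Swapping the template text is all it takes to target another language, which is the easy part. The hard part, as discussed below, is finding the right domain model to feed it.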

There is one general attempt at achieving the DSM dream--OMG's MDA initiative. (There are tools from several vendors that support this effort. I have experience with three of them: CodaGen, Rational Suite, and I-Logix Rhapsody.) Using MDA, you define platform-independent models (PIMs) which are made domain specific using a UML profile.

Another approach is to create real domain-specific languages. There are a few vendors with offerings in this field: Honeywell (DOME), MetaCase (Method Workbench & MetaEdit+), and an open source product called GME. For more on domain-specific modeling and MetaCase's approach, see "Domain-Specific Modeling" by Risto Pohjonen and Steven Kelly.

The newest player in this market is Microsoft with its Software Factories initiative. As is common with Microsoft, its goal is to bring the idea of DSLs to the masses. For more on Microsoft's approach, see "Domain-Specific Languages & DSL Workbench" by Griffin Caprio.

Some time ago I opined on the current state of Software Factories. The gist of that post is that distilling domain knowledge to the point of making it into a language is hard in general, and Microsoft's Software Factories initiative in particular is still far from really helping in that area. It is interesting to note that Guy Ron, MCS manager in Israel, voiced similar thoughts at TechEd Israel (as reported by Roy Osherove). Steven Kelly (CTO of MetaCase, mentioned above) also blogged about that (here and here).

At the end of the day, DSLs and Domain Specific Modeling are a good direction for the industry to strive for. However, I stand by my opinion that DSLs are still far away from gaining general acceptance. One problem is proper tooling (which is getting better and better). The other, more significant problem is with synthesizing languages from a domain.

Posted by Arnon Rotem-Gal-Oz at 05:55 PM


May 22, 2006

The 7 Deadly Sins of Design


Having just finished describing the 8 fallacies of distributed computing (which, by the way, you can get in a single PDF document), I thought I'd give you another list--the 7 Deadly Sins of Software Design.

(The first four can be found in Robert C. Martin's paper Principles and Patterns*)

  1. Rigidity. Make it hard to change, especially if changes might result in ripple effects or when you don't know what will happen when you make changes.
  2. Fragility. Make it easy to break. Whenever you change something, something breaks.
  3. Immobility. Make it hard to reuse. Something is coupled to everything it uses, so when you try to take a piece of code (a class, etc.) it takes all of its dependencies with it.
  4. Viscosity. Make it hard to do the right thing. There are usually several ways to work with a design. Viscosity happens when it is hard to work with the design the way the designer intended. The results are tricks and workarounds that, many times, have unexpected outcomes (esp. if the design is also fragile).
  5. Needless Complexity. Over-design. When you overdo it; e.g. the "Swiss-Army knife" antipattern: a class that tries to anticipate every possible need. Another example is applying too many patterns to a simple problem.
  6. Needless Repetition. The same code is scattered about, which makes it error prone.

And in closing the list, the 7th Deadly Sin of Software Design is (the obvious) "Not doing any".



* While I don't intend to expand on the sins, I will be writing a few posts on the principles of good design, many of which are described in this paper.

Posted by Arnon Rotem-Gal-Oz at 01:26 PM


May 19, 2006

Distributed Computing Fallacies Explained: "The Network Is Homogeneous"


The final Distributed Computing fallacy is "The network is homogeneous."

While the first seven fallacies were coined by Peter Deutsch, I read that the eighth was added by James Gosling six years later (in 1997).

Most architects today are not naive enough to assume this fallacy. Any network, except maybe the very trivial ones, is not homogeneous. Heck, even my home network has a Linux-based HTPC, a couple of Windows-based PCs, a (small) NAS, and a Windows Mobile device--all connected by a wireless network. What's true on a home network is almost a certainty in enterprise networks. I believe that a homogeneous network today is the exception, not the rule. Even if you manage to keep your internal network homogeneous, you will hit this problem when you try to cooperate with a partner or a supplier.

Assuming this fallacy should not cause too much trouble at the lower network level, as IP is pretty much ubiquitous (e.g. even a specialized bus like InfiniBand has an IP-over-IB implementation, although it may result in suboptimal use of the non-native IP resources).

It is worthwhile to pay attention to the fact that the network is not homogeneous at the application level. The implication of this is that you have to assume interoperability will be needed sooner or later and be ready to support it from day one (or at least design where you'd add it later).

Do not rely on proprietary protocols--it would be harder to integrate them later. Do use standard technologies that are widely accepted; the most notable examples being XML or Web Services. By the way, much of the popularity of XML and Web Services can be attributed to the fact that both these technologies help alleviate the effects of the heterogeneity of the enterprise environment.
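As a small sketch of what leaning on a standard format looks like, the snippet below writes and reads an order message as XML using only Python's standard library. The message layout (an order element with item children) and the function names are made up for illustration; any platform with an XML parser could consume the same text.

```python
import xml.etree.ElementTree as ET


def order_to_xml(order_id, items):
    """Serialize an order to XML text that any platform's parser can read."""
    root = ET.Element("order", attrib={"id": str(order_id)})
    for sku, qty in items:
        ET.SubElement(root, "item", attrib={"sku": sku, "qty": str(qty)})
    return ET.tostring(root, encoding="unicode")


def xml_to_order(text):
    """Parse the XML text back into an (order_id, items) pair."""
    root = ET.fromstring(text)
    items = [(item.get("sku"), int(item.get("qty")))
             for item in root.findall("item")]
    return int(root.get("id")), items


wire = order_to_xml(42, [("ABC", 2), ("XYZ", 1)])
order_id, items = xml_to_order(wire)
```

The price of this neutrality is verbosity, which ties in with the bandwidth fallacy.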

To sum up, most architects/designers today are aware of this fallacy, which is why interoperable technologies are popular. Still, it is something you need to keep in mind, especially if you are in a situation that mandates use of proprietary protocols or transports.

Posted by Arnon Rotem-Gal-Oz at 06:53 AM


May 18, 2006

Should Architects Code: Round 3


Reading the comments on my previous two posts on whether architects should code (here and here) as well as the comments on Johanna Rothman's posts (here, here and here) leads me to a few observations:

The first apparent thing is that the issue is very loaded. Some people believe it is essential for architects to code, while others (like me) believe that their time is better spent on other issues. (That said, it seems that a small majority of the commenters think architects should code as part of the development team--at least for feedback purposes if nothing else.)

There is a wide consensus (me included) that architects should know how to code and have extensive experience in coding. It is also agreed that architects should be involved in the project--that is, not just drop off the architecture, then disengage.

I still believe that when the project is big enough (that is, big enough to warrant more than one team working on it) the project is better served by the architect getting involved in all the teams, rather than participating as a developer in one of them. If you are an architect and develop as part of the development team you are (or should be anyway) committed--meaning you need to deliver the piece of code under your responsibility at the same acceptable quality level as other developers. Which is exactly why you would be less likely to deliver on your responsibilities for the total quality of the project. I assume some of the differences in opinion can be attributed to disagreement on what software architecture is (at least when compared to design).

I also think those who think architects must code see the architect as some sort of a lead developer again. I don't buy that. The architect's role is much broader than that (see also this post by Kevin Seal, which also discusses this issue). I see a holistic view of the architect role, which is making sure the product is deliverable. This may translate to the architect coding a module or two, but it can also translate to a lot of other things. Examples from my experience as an architect include preparing initial cost estimates, iteration planning, helping with debugging and testing, solving installation problems, analyzing requirements, conducting design and code reviews, design, and prototyping (yes, that's coding but, as I said in the previous posts, that's not writing the production code and it doesn't involve having to meet deadlines etc.).

I also liked a comment by Graham Oakses on one of Johanna's posts:

My experience is that an architect is pulled between three poles--the product, the team and the client. The product pole pulls you towards managing the "conceptual integrity" of the design. The team pole pulls you towards mentoring people, helping them build skills, etc (which may mean consciously letting someone write code that you could do much better yourself). The client pole pulls you towards translating between the technical and the client domains (which is often where you get pulled into powerpoint). You need to trade these poles off differently on every project...

To sum up, the answer to "should architects code?" is, like so many things in life, "it depends."

Posted by Arnon Rotem-Gal-Oz at 06:35 AM


May 17, 2006

Distributed Computing Fallacies Explained: "Transport Cost is Zero"


On to Distributed Computing Fallacy number 7--"Transport cost is zero". There are a couple of ways you can interpret this statement, both of which are false assumptions.

One way is that going from the application level to the transport level is free. This is a fallacy since we have to do marshaling (serialize information into bits) to get data onto the wire, which both consumes computer resources and adds to the latency. Interpreting the statement this way emphasizes the "Latency is Zero" fallacy by reminding us that there are additional costs (both in time and resources).
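To get a feel for the marshaling cost, here is a small sketch that serializes a payload and measures both the time spent and the number of bytes produced. JSON is used only because it ships with Python; the payload and the marshal helper are invented for this example, and the actual timings will vary by machine.

```python
import json
import time


def marshal(payload):
    """Serialize a Python object to bytes and report how long it took."""
    start = time.perf_counter()
    data = json.dumps(payload).encode("utf-8")
    elapsed = time.perf_counter() - start
    return data, elapsed


# Hypothetical payload: 100,000 integer readings.
payload = {"readings": list(range(100_000))}
data, elapsed = marshal(payload)

# The receiving side pays a similar price to turn the bytes back into objects.
restored = json.loads(data.decode("utf-8"))

print(f"{len(data)} bytes on the wire, {elapsed * 1000:.2f} ms to serialize")
```

Neither number is zero, and both grow with the size of the message.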

The second way to interpret the statement is that setting up and running the network is free (as in cash money). This is also far from being true. There are costs--costs for buying the routers, costs for securing the network, costs for leasing the bandwidth for internet connections, and costs for operating and maintaining the network. Someone, somewhere will have to pick up the tab and pay these costs.

Imagine you have successfully built Dilbert's Google-killer search engine (maybe using the latest Web 2.0 bells-and-whistles on the UI). You will still fail if you neglect to take into account the costs that are needed to keep your service up, running, and responsive (E3 lines, datacenters with switches, SANs, etc.). The takeaway is that even in situations where you think the other fallacies are not relevant because you rely on existing solutions ("yeah, we'll just deploy Cisco's HSRP protocol and get rid of the network reliability problem"), you may still be bound by the costs of the solution and need to solve your problems using more cost-effective solutions.

Posted by Arnon Rotem-Gal-Oz at 07:47 AM


May 16, 2006

Distributed Computing Fallacies Explained: "There Is One Administrator"


The sixth Distributed Computing Fallacy is "There is one administrator". You may be able to get away with this fallacy if you install your software on small, isolated LANs (for instance, a single person IT "group" with no WAN/Internet). However, for most enterprise systems the reality is much different.

The IT group usually has different administrators, assigned according to expertise--databases, web servers, networks, Linux, Windows, MF, and the like. This is the easy situation. The problem occurs when your company collaborates with external entities (for example, connecting with a business partner), or if your application is deployed for Internet consumption and hosted by some hosting service and the application consumes external services (think Mashups). In these situations, the other administrators are not even under your control and they may have their own agendas/rules.

At this point you may say "Okay, there is more than one administrator. But why should I care?" Well, as long as everything works, maybe you don't care. You do care, however, when things go astray and there is a need to pinpoint a problem (and solve it). For example, I recently had a problem with an ASP.NET application that required full trust on a hosting service that only allowed medium trust--the application had to be reworked (since changing the hosting service was not an option) in order to work.

Furthermore, you need to understand that the administrators will most likely not be part of your development team, so you need to provide them with tools to diagnose and find problems. This is essential when the application involves more than one company ("Is it their problem or ours?"). A proactive approach is to also include tools for monitoring ongoing operations; for instance, to allow administrators to identify problems when they are small--before they become a system failure.
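One simple form such a diagnostic tool can take is a set of named checks that produce a report an administrator can read without opening the code. The sketch below is only an illustration: run_checks and both check functions are hypothetical, and the partner failure is simulated.

```python
import time


def run_checks(checks):
    """Run named check callables and return a report for administrators."""
    report = {}
    for name, check in checks.items():
        start = time.perf_counter()
        try:
            check()
            status = "ok"
        except Exception as exc:  # report the failure instead of crashing
            status = f"failed: {exc}"
        report[name] = {
            "status": status,
            "seconds": time.perf_counter() - start,
        }
    return report


def database_reachable():
    pass  # placeholder: a real check would open a connection


def partner_service_reachable():
    # Simulated failure, standing in for a real call to a partner endpoint.
    raise ConnectionError("partner endpoint timed out")


report = run_checks({
    "database": database_reachable,
    "partner": partner_service_reachable,
})
for name, result in report.items():
    print(name, result["status"])
```

A report like this answers the "their problem or ours?" question much faster than a stack trace does, and running it on a schedule gives you the proactive monitoring as well.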

Another reason to think about multiple administrators is upgrades. How are you going to handle them? How are you going to make sure that the different parts of your application (distributed, remember?) are synchronized and can actually work together; for example, does the current DB schema match the current O/R mapping and object model? Again, this problem is aggravated when third parties are involved. Can your partner continue to interop with your system after you make changes to the public contract (in an SOA)? So, for example, you need to think about backward compatibility (or maybe even forward compatibility) when designing interoperability contracts.

To sum up, when there is more than one administrator (unless we are talking about a simple system, and even that can evolve later if it is successful), you need to remember that administrators can constrain your options (administrators who set disk quotas, limited privileges, limited ports and protocols, and so on), and that you need to help them manage your applications.
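One common way to approach contract compatibility is to version the messages and tolerate differences within a major version. The sketch below assumes a made-up "major.minor" version field and a made-up order message; it is one possible policy, not a prescription.

```python
SUPPORTED_MAJOR = 2  # hypothetical: the contract version this service implements


def parse_version(text):
    """Split a 'major.minor' string into two integers."""
    major, minor = text.split(".")
    return int(major), int(minor)


def can_process(message):
    """Accept any message with the same major version.

    Older minor versions are accepted (backward compatibility); newer minor
    versions are accepted too, on the assumption that unknown fields can be
    ignored (forward compatibility).
    """
    major, _minor = parse_version(message["version"])
    return major == SUPPORTED_MAJOR


def read_order(message):
    if not can_process(message):
        raise ValueError(f"unsupported contract version {message['version']}")
    # Fields added in later minor versions get a default when absent.
    return {"id": message["id"], "currency": message.get("currency", "USD")}


old_partner = read_order({"version": "2.0", "id": 7})
new_partner = read_order(
    {"version": "2.3", "id": 8, "currency": "EUR", "extra": 1}
)
```

With a rule like this, each side can upgrade on its own schedule as long as the major version stays the same.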

Posted by Arnon Rotem-Gal-Oz at 04:13 AM


May 15, 2006

Should Architects Code: Round 2


About the same time I wrote the post on whether architects should code, saying that architects should be able to prototype but shouldn't be part of the dev team (in the sense that the architect shouldn't get coding tasks that result in production code), Johanna Rothman wrote a blog post that claimed architects must code.

Two days ago she posted a more detailed explanation of her view. I agree with most of the points she made:

  1. Architects need to participate in the project (that is, not be some outsider who just drops her architecture on the team and leaves).
  2. The best way to test a design is to code and run it.
  3. It is beneficial for architects to know how to code.
  4. It is important that architects understand the implications of their decisions on the code and developers.

I don't see how architects taking coding tasks serves the greater good, versus their monitoring teams that code and making sure all aspects of the architecture actually fit the problem and work. Again, this may work on smaller projects, but probably not on larger ones.

You may also want to look at two related posts I made in the past:
SAF Architecture Evaluation: Evaluation in Code talks about some of the ways architecture can be validated in code.
SAF Deployment: What to do when the architecture seems stable? talks about the architect's involvement in the project when they think the architecture is "finished".

A couple of points regarding the analogy Rothman uses--that is, architects who design bathrooms for hotels. First, building architects are seldom a good analogy for software architects (I once used it as well); there are far too many differences (maybe I'll blog about that sometime in the future).

This brings me to the second point. This analogy doesn't serve Rothman's point well since building architects never actually participate in laying down bricks or installing bathrooms. The fact that hotel bathrooms are not comfortable means that this quality was low on their priorities. In any event, to verify that a bathroom is usable you don't have to install it, just use it. (If you do take the analogy, you don't have to code it, just stick around to see what's going on.)

Posted by Arnon Rotem-Gal-Oz at 05:01 AM


May 12, 2006

Distributed Computing Fallacies Explained: "Topology Doesn't Change"


The fifth Distributed Computing Fallacy is about network topology. "Topology doesn't change." That's right, it doesn’t--as long as it stays in the test lab.

When you deploy an application in the wild (that is, to an organization), the network topology is usually out of your control. The operations team (IT) is likely to add and remove servers every once in a while and/or make other changes to the network ("this is the new Active Directory we will use for SSO; we're replacing RIP with OSPF and this application's servers are moving into area 51" and so on). Lastly, there are server and network faults which can cause routing changes.

When you're talking about clients, the situation is even worse. There are laptops coming and going, wireless ad-hoc networks, new mobile devices. In short, topology is changing constantly.

What does this mean for the applications we write? Simple. Try not to depend on specific endpoints or routes; if you can't avoid it, be prepared to renegotiate endpoints. Another implication is that you would want to either provide location transparency (e.g. using an ESB or multicast) or provide discovery services (e.g. Active Directory/JNDI/LDAP).
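The "be prepared to renegotiate endpoints" advice can be sketched as follows. ServiceRegistry is an in-memory stand-in for a real discovery service, and call_service, the service name, and the addresses are all invented for illustration.

```python
class ServiceRegistry:
    """In-memory stand-in for a discovery service such as LDAP or JNDI."""

    def __init__(self):
        self._endpoints = {}

    def register(self, service, endpoint):
        self._endpoints[service] = endpoint

    def lookup(self, service):
        return self._endpoints[service]


def call_service(registry, service, send, retries=1):
    """Look up the endpoint on every attempt instead of caching it forever."""
    for attempt in range(retries + 1):
        endpoint = registry.lookup(service)
        try:
            return send(endpoint)
        except ConnectionError:
            if attempt == retries:
                raise  # out of retries: let the caller see the failure


registry = ServiceRegistry()
registry.register("billing", "10.0.0.5:8080")


def send(endpoint):
    if endpoint == "10.0.0.5:8080":
        # Simulate IT moving the server while the application is running.
        registry.register("billing", "10.0.0.9:8080")
        raise ConnectionError("old endpoint is gone")
    return f"sent to {endpoint}"


result = call_service(registry, "billing", send)
```

The caller only ever names the service ("billing"), never the machine, so a topology change costs one failed attempt rather than a redeployment.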

Another strategy is to abstract the physical structure of the network. The most obvious example for this is DNS names instead of IP addresses. Recently I moved my (other) blog from one hosting service to another. The transfer went without a hitch as I had both sites up and running. Then when the DNS records were updated (it takes a day or two for the change to ripple) readers just came to the new site without knowing the routing (topology) changed under their feet.

An interesting example is moving from WS-Routing to WS-Addressing. In WS-Routing a message can describe its own routing path--this assumes that a message can know in advance the path it needs to travel, that is, that the topology doesn't change (this also causes a security vulnerability--but that's another story). The newer WS-Addressing relies on "Next Hop" routing (the way TCP/IP works), which is more robust.

Another example is routing in SQL Server Service Broker. The problematic part is that the routes need to be set inside Service Broker. This is problematic since IT now has to remember to go into Service Broker and update routing tables when the topology changes. However, to mitigate this problem the routing relies on next-hop semantics and it allows for specifying the address by DNS name.

Posted by Arnon Rotem-Gal-Oz at 05:02 AM


May 11, 2006

Distributed Computing Fallacies Explained: "The Network Is Secure"


Peter Deutsch introduced the Distributed Computing Fallacies back in 1991. You'd think that in the 15 years since then "the network is secure" would no longer be a fallacy--not because the network is now secure, but because no one would be naive enough to assume it is.

Unfortunately, that's not the case. A few days ago I began writing a report about a middleware product some vendor tried to inflict on us that has no regard whatsoever for security! That is just anecdotal evidence, however.

Statistics published at Aladdin.com show that:

For 52% of the networks the perimeter is the only defense

According to Preventsys and Qualys, 52% of chief information security officers acknowledged having a "Moat & Castle" approach to their overall network security. They admitted that once the perimeter security is penetrated, their networks are at risk. Yet, 48% consider themselves to be "proactive" when it comes to network security and feel that they have a good grasp on their enterprise's security posture. 24% felt their security was akin to Fort Knox (it would take a small army to get through), while 10% compared their network security to Swiss cheese (security holes inside and out). The remaining 14% of respondents described their current network security as being locked down on the inside, but not yet completely secured to the outside. Preventsys and Qualys also found that 46% of security officers spend more than a third of their day, and in some cases as much as 7 hours, analyzing reports generated from their various security point solutions.

In case you just landed from another planet, the network is far from being secure. Here are a few statistics to illustrate that:

"Through the continual 24x7 monitoring of hundreds of Fortune 1000 companies, Riptech has discovered several extremely relevant trends in information security. Among them:

  1. General Internet attack trends are showing a 64% annual rate of growth
  2. The average company experienced 32 attacks per week over the past 6 months
  3. Attacks during weekdays increased in the past 6 months" (from RipTech, July 8, 2002).

When I tried to find some updated incident statistics, I came up with the following (from CERT):

"Note: Given the widespread use of automated attack tools, attacks against Internet-connected systems have become so commonplace that counts of the number of incidents reported provide little information with regard to assessing the scope and impact of attacks. Therefore, as of 2004, we will no longer publish the number of incidents reported. Instead, we will be working with others in the community to develop and report on more meaningful metrics" (the number of incidents for 2003 was 137539...)

Lastly, Aladdin claims that the costs of malware for 2004 (viruses, worms, Trojans, etc.) are estimated at between $169 billion and $204 billion.

The implications of network (in)security are obvious--you need to build security into your solutions from Day 1. I mentioned in a previous blog post that security is a system quality attribute that needs to be taken into consideration starting from the architectural level. There are dozens of books that talk about security and I cannot begin to delve into all the details in a short blog post.

In essence you need to perform threat modeling to evaluate the security risks. Then, following further analysis, decide which risks should be mitigated by what measures (a tradeoff between costs, risks and their probability). Security is usually a multi-layered solution that is handled on the network, infrastructure, and application levels.

As an architect you might not be a security expert--but you still need to be aware that security is needed and the implications it may have (for instance, you might not be able to use multicast, user accounts with limited privileges might not be able to access some networked resource, etc.).
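The cost/risk/probability tradeoff can be sketched as a very crude ranking: estimate an expected loss per threat and compare it with the cost of the countermeasure. Everything in the snippet below (the Threat class, the threats, and all the numbers) is made up for illustration, and real threat modeling involves far more than this arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    probability: float      # estimated chance per year, 0..1
    impact: float           # estimated loss if it happens
    mitigation_cost: float  # estimated cost of the countermeasure

    @property
    def exposure(self):
        # Expected yearly loss if nothing is done about the threat.
        return self.probability * self.impact


def worth_mitigating(threats):
    """Return threats whose expected loss exceeds the mitigation cost,
    highest exposure first."""
    chosen = [t for t in threats if t.exposure > t.mitigation_cost]
    return sorted(chosen, key=lambda t: t.exposure, reverse=True)


# Hypothetical threats with invented estimates.
threats = [
    Threat("SQL injection", 0.5, 200_000, 15_000),
    Threat("Stolen backup tape", 0.25, 40_000, 30_000),
    Threat("Eavesdropping on LAN", 0.125, 400_000, 20_000),
]
selected = [t.name for t in worth_mitigating(threats)]
```

Here the stolen backup tape drops off the list because the countermeasure costs more than the expected loss, which is exactly the kind of decision the analysis is meant to surface.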

Posted by Arnon Rotem-Gal-Oz at 07:08 AM


May 10, 2006

Distributed Computing Fallacies Explained: "Bandwidth Is Infinite"


The next Distributed Computing Fallacy is "Bandwidth Is Infinite." This fallacy, in my opinion, is not as strong as the others. If there is one thing that is constantly getting better in relation to networks it is bandwidth.

However, there are two forces at work to keep this assumption a fallacy. One is that while the bandwidth grows, so does the amount of information we try to squeeze through it. VoIP, videos, and IPTV are some of the newer applications that take up bandwidth. Downloads, richer UIs, and reliance on verbose formats (XML) are also at work--especially if you are using T1 or lower lines. However, even when you think that this 10Gbit Ethernet would be more than enough, you may be hit with more than 3 Terabytes of new data per day (numbers from an actual project).

The other force at work to lower bandwidth is packet loss (along with frame size). This quote from http://sd.wareonearth.com/~phil/jumbo.html underscores this point very well:

In the local area network or campus environment, rtt and packet loss are both usually small enough that factors other than the above equation set your performance limit (e.g. raw available link bandwidths, packet forwarding speeds, host CPU limitations, etc.). In the WAN however, rtt and packet loss are often rather large and something that the end systems can not control. Thus their only hope for improved performance in the wide area is to use larger packet sizes.

Let's take an example: New York to Los Angeles. Round Trip Time (rtt) is about 40 msec, and let's say packet loss is 0.1% (0.001). With an MTU of 1500 bytes (MSS of 1460), TCP throughput will have an upper bound of about 6.5 Mbps! And no, that is not a window size limitation, but rather one based on TCP's ability to detect and recover from congestion (loss). With 9000 byte frames, TCP throughput could reach about 40 Mbps.

Or let's look at that example in terms of packet loss rates. Same round trip time, but let's say we want to achieve a throughput of 500 Mbps (half a "gigabit"). To do that with 9000 byte frames, we would need a packet loss rate of no more than 1x10^-5. With 1500 byte frames, the required packet loss rate is down to 2.8x10^-7! While the jumbo frame is only 6 times larger, it allows us the same throughput in the face of 36 times more packet loss.
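The equation the quote refers to is not reproduced above. Assuming it is the commonly cited approximation Throughput <= ~0.7 * MSS / (rtt * sqrt(packet loss)), the numbers in the first example can be checked with a short Python sketch. The function name, the 0.7 constant, and the 8960-byte MSS for jumbo frames (9000 minus 40 bytes of headers) are assumptions made for illustration, not something taken from the quoted page.

```python
import math

def tcp_throughput_bound_bps(mss_bytes, rtt_seconds, loss_rate, constant=0.7):
    """Approximate upper bound on steady-state TCP throughput, in bits per second.

    Assumes the approximation  constant * MSS / (rtt * sqrt(loss)).
    """
    return constant * (mss_bytes * 8) / (rtt_seconds * math.sqrt(loss_rate))

RTT = 0.040    # New York to Los Angeles round trip, from the quote
LOSS = 0.001   # 0.1% packet loss, from the quote

standard_mbps = tcp_throughput_bound_bps(1460, RTT, LOSS) / 1e6  # 1500-byte MTU
jumbo_mbps = tcp_throughput_bound_bps(8960, RTT, LOSS) / 1e6     # 9000-byte MTU (assumed MSS)

# For a fixed target throughput, the tolerable loss rate grows with the
# square of the frame size, whatever the constant is.
loss_tolerance_ratio = (9000 / 1500) ** 2

print(f"1500-byte frames: about {standard_mbps:.1f} Mbps")
print(f"9000-byte frames: about {jumbo_mbps:.1f} Mbps")
print(f"tolerable loss ratio: {loss_tolerance_ratio:.0f}x")
```

Under these assumptions the sketch lands close to the quoted 6.5 Mbps and 40 Mbps, and the square relationship accounts for the "36 times more packet loss" figure. The exact loss rates in the second example depend on the constant and frame size used, so they are not reproduced here.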

Acknowledging that bandwidth is not infinite has a balancing effect on the implications of the "Latency Is Zero" fallacy: if, acting on the realization that latency is not zero, we modeled a few large messages, bandwidth limitations direct us to limit the size of the information we send over the wire.

The main implication, then, is to consider that in the production environment of our application there may be bandwidth problems that are beyond our control. And we should bear in mind how much data is expected to travel over the wire.

The recommendation I made in my previous post--to try to simulate the production environment--holds true here as well.

Posted by Arnon Rotem-Gal-Oz at 06:50 AM  Permalink |


May 08, 2006

Distributed Computing Fallacies Explained: "Latency is Zero"


The second fallacy of Distributed Computing is the assumption that "Latency is Zero". Latency is how much time it takes for data to move from one place to another (versus bandwidth, which is how much data we can transfer during that time). Latency can be relatively good on a LAN--but latency deteriorates quickly when you move to WAN scenarios or internet scenarios.

Latency is more problematic than bandwidth. Here's a quote from a post by Ingo Rammer on latency vs. bandwidth that illustrates this:

But I think that it’s really interesting to see that the end-to-end bandwidth increased by 1468 times within the last 11 years while the latency (the time a single ping takes) has only been improved tenfold. If this wouldn’t be enough, there is even a natural cap on latency. The minimum round-trip time between two points of this earth is determined by the maximum speed of information transmission: the speed of light. At roughly 300,000 kilometers per second (3.6 * 10E12 teraangstrom per fortnight), it will always take at least 30 milliseconds to send a ping from Europe to the US and back, even if the processing would be done in real time.

You may think all is okay if you only deploy your application on LANs. However, even when you work on a LAN with Gigabit Ethernet, you should still bear in mind that the latency is much bigger than that of accessing local memory. If you assume latency is zero, you can easily be tempted to treat a call over the wire as almost like a local call--this is one of the problems with approaches like distributed objects that provide "network transparency"--luring you into making a lot of fine-grained calls to objects that are actually remote and (relatively) expensive to call.

Taking latency into consideration means you should strive to make as few calls as possible, and, assuming you have enough bandwidth (which I will talk about next time), you'd want to move as much data as possible in each of these calls. There is a nice example illustrating the latency problem and what was done to solve it in Windows Explorer here.
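To put rough numbers on the difference between many fine-grained calls and one coarse-grained call, here is a small Python sketch. It uses a deliberately simple model (every call pays one full round trip plus the time to push its payload), and the link figures and item sizes are hypothetical values chosen only for illustration.

```python
def total_transfer_seconds(num_calls, payload_bytes_per_call, rtt_seconds, bandwidth_bps):
    """Rough model: each call costs one round trip plus payload transmission time."""
    per_call = rtt_seconds + (payload_bytes_per_call * 8) / bandwidth_bps
    return num_calls * per_call

RTT = 0.030        # 30 ms round trip (hypothetical WAN link)
BANDWIDTH = 100e6  # 100 Mbps (hypothetical)
ITEMS = 1000       # number of items to fetch (hypothetical)
ITEM_SIZE = 200    # bytes per item (hypothetical)

# "Chatty": one remote call per item.
chatty = total_transfer_seconds(ITEMS, ITEM_SIZE, RTT, BANDWIDTH)
# "Chunky": a single remote call that returns all items.
chunky = total_transfer_seconds(1, ITEMS * ITEM_SIZE, RTT, BANDWIDTH)

print(f"chatty: {chatty:.3f} s, chunky: {chunky:.3f} s")
```

With these made-up figures the chatty version spends about 30 seconds almost entirely waiting on round trips, while the chunky version finishes in well under a tenth of a second. Real systems overlap calls and have other overheads, so treat this as a back-of-the-envelope estimate only.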

Another example is AJAX. The AJAX approach allows for using the dead time the users spend digesting data to retrieve more data--however, you still need to consider latency. Let's say you are working on a new shiny AJAX front-end--everything looks just fine in your testing environment. It also shines in your staging environment, passing the load tests with flying colors. The application can still fail miserably in the production environment if you fail to test for latency problems--retrieving data in the background is good, but if you can't do that fast enough the application will still stagger and be unresponsive. (You can read more on AJAX and latency here.)

You can (and should) use tools like Shunra Virtual Enterprise, Opnet Modeler and many others to simulate network conditions and understand system behavior thus avoiding failure in the production system.


Posted by Arnon Rotem-Gal-Oz at 08:22 AM  Permalink |


May 05, 2006

Distributed Computing Fallacies Explained: "The Network Is Reliable"


The first fallacy is "The network is reliable." Why is this a fallacy? Well, when was the last time you saw a switch fail? After all, even basic switches these days have MTBFs (Mean Time Between Failures) of 50,000 operating hours or more.

If your application is a mission critical 365x7 kind of application, you can just hit that failure--and Murphy will make sure it happens at the most inappropriate moment. Nevertheless, most applications are not like that. So what's the problem?

Well, there are plenty of problems: Power failures, someone trips on the network cord, all of a sudden clients connect wirelessly, and so on. If hardware isn't enough--the software can fail as well, which it does.

The situation is more complicated if you collaborate with an external partner, such as an e-commerce application working with an external credit-card processing service. Their side of the connection is not under your direct control. Lastly, there are security threats like DDoS attacks and the like.

What does that mean for your design?

On the infrastructure side, you need to think about hardware and software redundancy and weigh the risks of failure versus the required investment.

On the software side, you need to think about messages/calls getting lost whenever you send a message/make a call over the wire. For one you can use a communication medium that supplies full reliable messaging; WebsphereMQ or MSMQ, for example. If you can't use one, prepare to retry, acknowledge important messages, identify/ignore duplicates (or use idempotent messages), reorder messages (or not depend on message order), verify message integrity, and so on.
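As a rough illustration of the "retry, acknowledge, and ignore duplicates" advice, here is a minimal Python sketch. The `Receiver`, `LossyChannel`, and `send_with_retry` names are hypothetical, and the network is simulated in-process with random drops; a real implementation would also need timeouts, a delay between retries, and persistence.

```python
import random

class Receiver:
    """Processes each message ID at most once, even if it is delivered repeatedly."""
    def __init__(self):
        self.seen_ids = set()
        self.processed = []

    def handle(self, message_id, payload):
        if message_id not in self.seen_ids:  # ignore duplicates
            self.seen_ids.add(message_id)
            self.processed.append(payload)
        return "ack"  # acknowledge duplicates too, so the sender stops retrying


class LossyChannel:
    """Simulated network that randomly drops the request or the acknowledgement."""
    def __init__(self, receiver, loss_rate, rng):
        self.receiver = receiver
        self.loss_rate = loss_rate
        self.rng = rng

    def send(self, message_id, payload):
        if self.rng.random() < self.loss_rate:
            return None  # request lost on the way out
        ack = self.receiver.handle(message_id, payload)
        if self.rng.random() < self.loss_rate:
            return None  # ack lost on the way back; the sender will resend
        return ack


def send_with_retry(channel, message_id, payload, max_attempts=50):
    """Resend until an ack arrives; returns the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if channel.send(message_id, payload) == "ack":
            return attempt
    raise TimeoutError(f"no ack for message {message_id} after {max_attempts} attempts")


receiver = Receiver()
channel = LossyChannel(receiver, loss_rate=0.3, rng=random.Random(42))
attempts = [send_with_retry(channel, i, f"order-{i}") for i in range(20)]

print(f"processed {len(receiver.processed)} messages in {sum(attempts)} attempts")
```

Because a lost acknowledgement makes the sender resend a message the receiver has already handled, the duplicate check on the receiving side is what keeps each order from being processed twice.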

One note regarding WS-ReliableMessaging: The specification supports several levels of message guarantee--at most once, at least once, exactly once, and in order. You should remember, though, that it only takes care of delivering the message as long as the end nodes are up and running. It doesn't handle persistence, so for a complete solution you still need to take care of that yourself (or use a vendor solution that does it for you).

To sum up, the network is unreliable and we as software architects/designers need to address that.

Posted by Arnon Rotem-Gal-Oz at 09:33 AM  Permalink |


May 03, 2006

The Fallacies of Distributed Computing


I heard about the 8 fallacies of distributed computing a few years ago. Last year I finally tracked them down online at James Gosling's site. The fallacies, attributed to Peter Deutsch, were coined long before that--back in 1994, in fact.

  1. The network is reliable.
  2. Latency is zero.
  3. Bandwidth is infinite.
  4. The network is secure.
  5. Topology doesn't change.
  6. There is one administrator.
  7. Transport cost is zero.
  8. The network is homogeneous.

As the short preface to the fallacies on Gosling's site says, these are assumptions that almost anyone starting to build distributed systems is tempted to make. Yet all of them prove to be wrong in the long run--resulting in all sorts of troubles and pains for the solution and architects who made the assumptions. While (thankfully) I didn't assume all of them on the first couple of distributed systems I designed, I did assume some of them (like transport cost is zero and the network is reliable). To help you avoid this trouble, I will expand on each fallacy in the next few posts.

By the way, you may also want to take a look at some of the realities of distributed computing that I posted last year.

Posted by Arnon Rotem-Gal-Oz at 10:43 AM  Permalink |


May 02, 2006

Refactoring


Refactoring has become one of the most abused terms I know--right up there with SOA (which I'll probably talk about in another post).

More often than not, I hear people using the term "refactoring" as a sexy synonym for "reworking", as in redesign, rearchitecting, or even completely rewriting.

The original intent of refactoring is best described by Martin Fowler (emphasis in bold added by me):

"Refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior. Its heart is a series of small behavior preserving transformations. Each transformation (called a 'refactoring') does little, but a sequence of transformations can produce a significant restructuring. Since each refactoring is small, it's less likely to go wrong. The system is also kept fully working after each small refactoring, reducing the chances that a system can get seriously broken during the restructuring."

This is far from your typical rework--not in the sense that you cannot do major redesigns, but in the sense that you use "baby steps" to do that in order not to break the application by introducing too much change at once. Refactoring is a process, a technique--not a redesign.

By the way, TDD really shines when combined with refactoring, as the unit tests work as regression tests, assuring that the system indeed doesn't break from change to change.

Common refactorings include adding a parameter, renaming classes, extracting methods, and the like. Many of these refactorings are supported by modern IDEs; see "Refactoring with Eclipse" or "Refactoring with Visual Studio 2005's Document Outline Window".
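To make the "baby steps" idea concrete, here is a small Python sketch of one such step, an extracted method, with a check that external behavior is unchanged. The function names and the invoice example are hypothetical and not taken from Fowler's catalog.

```python
# Before: one function mixes the total calculation with the formatting.
def invoice_before(items):
    total = 0
    for price_cents, quantity in items:
        total += price_cents * quantity
    return f"Total due: {total} cents"


# After extracting a method: the calculation gets its own name,
# while the external behavior of the invoice function stays the same.
def order_total(items):
    return sum(price_cents * quantity for price_cents, quantity in items)

def invoice_after(items):
    return f"Total due: {order_total(items)} cents"


# A unit test doubling as a regression test for this single refactoring step.
sample = [(999, 2), (500, 1)]
assert invoice_before(sample) == invoice_after(sample)
print(invoice_after(sample))
```

The point is less the code itself than the rhythm: one small transformation, run the tests, then move on to the next one.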

I do wonder, however, why the refactoring catalog on Fowler's site also features major changes, like making a solution tiered, as refactorings. Making a change like this is bound to have significant impact on the application. When you cross a tier boundary you can't treat code the same way you treat it when it is local--or you start messing with the fallacies of distributed computing (a topic I'll talk about next time).

Posted by Arnon Rotem-Gal-Oz at 06:29 AM  Permalink |


May 01, 2006

TechEd: Iron Architect Contest


If you are going to attend the upcoming Microsoft TechEd in Boston in June, you may want to check out the Iron Architect Contest.

According to the blog, the object is to provide the best answers to a series of technical questions. A scenario is described, and you must answer a question about it:

"As chief software architect, Bob has hired you to architect a system that will harmonize the customer data, resolve the conflicts, scale across multiple regions, and create a user experience that fits into the lifestyle of the employees."

While the idea sounds nice, I think the question is too synthetic and simplified. A truer scenario would require much more detail (which I guess participants will make up with assumptions).

Maybe in the next contest they will appoint someone to act as a customer representative and reply to those who ask for more information. I think this would make it much more interesting :).

Nevertheless, the prize is a free certificate for Microsoft Certified Architect --a $10,000 value--so again, if you are attending you may want to play.

Posted by Arnon Rotem-Gal-Oz at 04:12 AM  Permalink |


