Tuesday, September 27, 2016

Database Schema Migration

Some thoughts: http://workingwithdevs.com/delivering-databases-migrations-vs-state/

This covers a lot of ground on the Declarative vs. Procedural question. It explains a lot of the considerations that lead to choosing a procedural schema evolution vs. a declarative schema with an implied change sequence to migrate to each new declared state.

The article calls the declarative approach "state-based" and the procedural approach "migration-based".

My 2¢ are focused on this point: 
When using a state-based solution you will most often be using a diff tool like those provided by Redgate or Visual Studio to examine the differences and generate an upgrade script. While this is a very efficient solution for most changes, with table renames and a few other types of table refactoring they can do bad things, ...
This point about table refactoring is, for me, the show-stopper. Relational theory tells me that I can map any schema to any other schema using selection, projection, and join. I can denormalize data and I can normalize again via group-by clauses. I can reduce the original schema to a sequence of object-attribute-value triples, and restructure this into any desired new schema. 
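
As a toy illustration of that last step, here's a sketch in Python. It is not what any real tool does. The function names, the "person" table, and the column names are all made up for the example, and rows are plain dictionaries rather than anything in a database.

```python
from collections import defaultdict


def to_triples(table, rows, key):
    """Flatten rows into (object, attribute, value) triples."""
    for row in rows:
        obj = (table, row[key])
        for attribute, value in row.items():
            if attribute != key:
                yield obj, attribute, value


def restructure(triples, renames):
    """Rebuild rows from triples, keeping only the attributes named in renames."""
    rows = defaultdict(dict)
    for (table, ident), attribute, value in triples:
        if attribute in renames:
            rows[ident][renames[attribute]] = value
    return dict(rows)


# Hypothetical "before" data: one wide table.
people = [
    {"id": 1, "name": "Ada", "phone": "555-0100"},
    {"id": 2, "name": "Grace", "phone": "555-0199"},
]
triples = list(to_triples("person", people, key="id"))

# Hypothetical "after" schema: names and phone numbers in separate tables.
names = restructure(triples, {"name": "full_name"})
phones = restructure(triples, {"phone": "number"})
```

The mapping itself is mechanical. What the sketch can't supply is the renames dictionary: that's the developer's intent, and nothing in the data says what it should be.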

Given enough time, a change tracking tool should be able to find a minimal-cost transformation from schema to schema. This might involve a complex search over a large state space, and it certainly involves creating costs for each alternative query plan. 

Pragmatically, I'm not sold on this being a good idea. And I'm rarely sure I even want to get involved in a fully automated solution. While a tool might be able to detect and automate a variety of simple changes, I think that developers must always vet those change scripts.
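
Here's a small sketch of why the vetting matters for a table rename. It uses Python's sqlite3 module with an in-memory database. The table names and the "naive" upgrade are my own invention; I'm not claiming any particular diff tool emits exactly this script.

```python
import sqlite3


def make_database():
    """Build a throwaway in-memory database with a populated table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customer (name) VALUES (?)", [("Ada",), ("Grace",)]
    )
    return conn


def naive_state_upgrade(conn):
    """What comparing two states can suggest: one table vanished, one appeared."""
    conn.execute("DROP TABLE customer")
    conn.execute("CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT)")


def migration_upgrade(conn):
    """What the developer actually meant: the same table under a new name."""
    conn.execute("ALTER TABLE customer RENAME TO client")


def client_rows(conn):
    """Count the rows that survived the upgrade."""
    return conn.execute("SELECT COUNT(*) FROM client").fetchone()[0]


diffed = make_database()
naive_state_upgrade(diffed)
rows_after_diff = client_rows(diffed)

migrated = make_database()
migration_upgrade(migrated)
rows_after_migration = client_rows(migrated)
```

Both databases end up with the same declared schema. Only one of them still has the data.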

In particular, the search space is emphatically not limited to select, project, and join. There are also database unload-reload, index create and drop. There are even more complex operations like creating intermediate results which aren't part of the final database structure. With proper indices, these might actually be beneficial.

In some cases, the continuous operation requirements are such that we might have two copies of a database: one being used and the other being transformed. A logger tracks transactions in the older copy and a synchronizer replicates those transactions in the new copy. After the data is moved, the customer access is moved via a feature toggle from the old database to the new database.
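
A very rough sketch of that arrangement, with dictionaries standing in for the two database copies. The class and method names are hypothetical, and a real system has concurrency, failure, and rollback concerns that this ignores entirely.

```python
class CutOver:
    """Old and new copies of a store, with a toggle deciding which one is live."""

    def __init__(self, old, convert):
        self.old = old          # the copy currently in use
        self.new = {}           # the copy being transformed
        self.convert = convert  # maps an old-format value to the new format
        self.log = []           # writes applied to the old copy
        self.use_new = False    # the feature toggle

    def write(self, key, value):
        if self.use_new:
            self.new[key] = value
        else:
            self.old[key] = value
            self.log.append((key, value))  # the logger

    def read(self, key):
        return self.new[key] if self.use_new else self.old[key]

    def bulk_transform(self):
        """Move a snapshot of the old data into the new structure."""
        for key, value in list(self.old.items()):
            self.new[key] = self.convert(value)

    def synchronize(self):
        """Replay logged writes against the new copy."""
        while self.log:
            key, value = self.log.pop(0)
            self.new[key] = self.convert(value)

    def switch(self):
        """Catch up on logged writes, then flip the toggle."""
        self.synchronize()
        self.use_new = True


store = CutOver({"a": "1"}, convert=int)
store.bulk_transform()
store.write("b", "2")          # a write that arrives mid-migration
before = store.read("b")       # still served by the old copy
store.switch()
after = (store.read("a"), store.read("b"))
```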

Semantic Drift

Also important is the issue of semantic drift. When we're making structural changes where the "before" column names match the "after" column names, then there's little chance for semantic drift. There's still some possibility, though. We can (and sometimes do) repurpose columns, preserving the original name. In some cases, we might change a database constraint without renaming the column.

In the larger case, of course, it doesn't require "‘hot-fix’ changes to QA or even production databases" to create profound semantic changes. All it takes is an app developer deciding that a column should be repurposed. There may be no structural change to the schema overall.

A non-structural change in some past release could have implications for structural change in a future release. Imagine three columns in three tables with the same names. Two started out life as simple foreign keys to the third. But one became optional, and now the semantics don't match but the names do. Automated tools are unlikely to discern the intent here. 

Conclusion?

It's all procedural migration. I'm not sure declarative ("state") tools can be trusted beyond discerning the changes and suggesting a possible migration.

Wednesday, September 21, 2016

Bad Trends and Sloppy Velocity

Read this: https://www.linkedin.com/pulse/story-points-evil-brad-black-in-the-market-

There are good quotes from Ron Jeffries on the worthlessness of story points. (I've heard this from other Agile Consultants, also.) Story Points are a political hack to make management stop measuring the future.

The future is very hard to measure. The difficulty in making predictions is one of the things which distinguishes the future from the past. There's entropy, and laws of thermodynamics, and random quantum events that make it hard to determine exactly what might happen next.

If Schrödinger's cat is alive, we'll deliver this feature in this sprint. If the cat is dead, the feature will be delayed. The unit test result is entangled with the photon that may (or may not) have killed the cat. If you check the unit tests, then the future is determined. If you don't check the unit test, the future can be in any state.

When project management becomes Velocity Dashboard Centered (VDC™), that's a hint that something may be wrong.

My suspicion on the cause of VDC?

Product Owners ("Management") may have given up on Agility, and want to make commitments to a schedule they made up on a whiteboard at an off-site meeting. Serious Commitments. The commitment has taken on a life of its own, and deliverable features aren't really being prioritized. The cadence or tempo has overtaken the actual deliverable.

It feels like the planning has slipped away from story-by-story details.

What can cause VDC?

I think it might be too many layers of management. The PO's boss (or boss's boss) has started dictating some kind of delivery schedule that is uncoupled from reality. The various bosses up in the stratosphere are writing checks their teams can't cash.

What can we do?

I don't know. It's hard to provide schooling up the food chain to the boss of the boss of the product owner. It's hard to explain to the scrum master that the story points don't much matter, since the stories exist independent of any numbering scheme.

The link above says that there's some value in ordering the stories; assigning random-ish point numbers somehow helps order them.

I reject the last part of this. The stories can be ordered without any numbers. The Agile manifesto is pretty clear on this point: talk about it. The points don't enhance the conversation. Push the story cards around on the board until you have something meaningful. Assigning numbers is silliness.

Actually. I think it's harmful.

Rolling numbers up to "senior" management isn't facilitating a conversation. It's summarizing things with empty numerosity. ("Numerosity"? Yes. Empty numerosity: applying numeric methods inappropriately. For example, averaging the day of the week on which it rains to discover something about Wednesday.)
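
To make the rain example concrete, here's a tiny sketch. The observations are made up, and numbering the days from Monday is an arbitrary choice of mine, which is rather the point.

```python
from statistics import mean

DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

# Made-up observations: it rained on two Mondays and two Fridays.
rainy_days = ["Monday", "Friday", "Monday", "Friday"]

# Replace each day with its position in the week, then average the positions.
average_position = mean(DAYS.index(day) for day in rainy_days)
average_day = DAYS[round(average_position)]
```

The arithmetic runs without complaint and lands on Wednesday, a day on which it never rained. Start the week on Sunday instead and the "average" moves.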

The Best Part (TBP™)

Irreproducibility.

On Project X, we had a velocity of 50 Story Points per Sprint. On Project U, the "same" team -- someone quit and two new people were hired -- had a velocity of 100 Story Points per Sprint. Wow! Right?

Except. Of course, the numbers were inflated because the existing folks figured the new folks would take longer to get things done. But the new folks didn't take longer. And now the team is stuck calling a 3-point story a 5-point story because the new guys are calibrated to that new range of random numbers.

So what's comparable between the "same" team on two projects? It's not actually the same people. That's out.

We can try to pretend that the projects have the "same" technology (Java 1.6 v. Java 1.8) or the same CI/CD pipeline (Ant v. Maven, Hudson v. Jenkins) or even the same overall enterprise. But practically, there's nothing repeatable about any of it. It's just empty numerosity.

Sorry. I'm not sold, yet, on the value of Story Points.

Tuesday, September 20, 2016

What was I thinking?

Check out this idiocy: https://github.com/slott56/py-false

What is the point? Seriously. What. The. Actual. Heck?

I think of it this way.

  • Languages are a cool thing. Especially programming languages where there's an absolute test -- the Turing machine -- for completeness. 
  • The Forth-like stack language is a cool thing. I've always liked Forth because of its elegant simplicity. 
  • The use of a first-class lambda construct to implement if and while is particularly elegant.
  • Small languages are fun because they can be understood completely. There are no tricky edge cases in the semantics.
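
Here's a sketch of the lambda idea from the list above. This is not code from py-false; the function names are mine, Python callables stand in for the language's lambdas, and Python truthiness stands in for its flags.

```python
def if_(stack):
    """Pop a lambda and a flag; run the lambda when the flag is true."""
    body = stack.pop()
    flag = stack.pop()
    if flag:
        body(stack)


def while_(stack):
    """Pop a body lambda and a test lambda; loop while the test leaves a true flag."""
    body = stack.pop()
    test = stack.pop()
    while True:
        test(stack)
        if not stack.pop():
            break
        body(stack)


# if: double the top of the stack when the flag is true.
stack = [10, True, lambda s: s.append(s.pop() * 2)]
if_(stack)
doubled = list(stack)

# while: count the top of the stack down to zero.
stack = [3]
stack.append(lambda s: s.append(s[-1] > 0))    # test: push a flag
stack.append(lambda s: s.append(s.pop() - 1))  # body: decrement the counter
while_(stack)
counted = list(stack)
```

Neither operator needs special syntax. Each one is an ordinary word that happens to pop code off the stack.
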
I have ½ of a working GW-Basic implementation in Python, too. It runs. It runs some programs like HamCalc sort of okay-ish. I use it to validate assumptions about the legacy code in https://github.com/slott56/HamCalc-2.1.  Some day, I may make a sincere effort to get it working.

Even languages like the one that supports the classic Adventure game are part of this small language fascination. See adventure.pdf for a detailed analysis of this game; this includes the little language you use to interact with the game.

Tuesday, September 13, 2016

On One Aspect of Design Patterns -- Flexibility

Something I forget to think about is the degree of detail or granularity of design patterns.  I have my own viewpoint and I often assume that others share it.

Here's a quote from an email describing the PLoP (Pattern Languages of Programs) patterns as quite distinct from the Gang of Four (Design Patterns: Elements of Reusable Object-Oriented Software) patterns.
In the main, the PLoP patterns are less granular than the persnickety GoF
"Design Patterns." (Classic GoF, in part, static type binding work-arounds. And
you need to talk about a "facade" pattern? Really? Although see Fowler's at it
again, coining a ™ term - "fluent API" - for Some Not Egregiously Stupid
Practice, to feed to the credulous who have never reflected on what they are doing.)
Cutting through the editorializing, the author is describing two families.

  • GoF patterns that are essentially ways to cope with static type checking in Java and C++.
  • PLoP patterns which are a little more generic and more widely applicable.

More...
"Plug-in Pattern" is a nice example. Enumerates the stuff you kinda know, with
qualities / attributes of its proposal, plus application samples / outcomes of
applying the pattern. The claims to relevance throughout are reminiscent of the
investigation behind Parnas' "Criteria for Decomposing Systems into Modules."
My habit is to assume this is pretty widely known. I assume everyone has wrestled with design patterns large and small and found that some of the GoF apply to Python, but the implementation details will differ. Dramatically. 

Look at the Singleton design pattern, for example. The concept is profound. There are times when we want stateful, global, Singleton instances. The Java or C++ technique of a small factory method which returns the one-and-only instance (or creates the one-and-only instance in the rare edge case) is extremely strange in Python. We can implement it. But why?

Module objects in Python are stateful singletons. Rather than invent a Singleton class, we can -- trivially -- just use a module. And we're done. Problem solved. No Code Written.
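
Here's a sketch of what that looks like. Normally the singleton would be an ordinary file; to keep the example in one piece I build the module object by hand and register it in sys.modules. The module name app_config and its debug attribute are hypothetical.

```python
import sys
import types

# Stand-in for a file named app_config.py containing the line "debug = False".
app_config = types.ModuleType("app_config")
app_config.debug = False
sys.modules["app_config"] = app_config


def enable_debug():
    """One part of the application changes the shared state."""
    import app_config
    app_config.debug = True


def read_debug():
    """Another part imports the 'same' module and sees the change."""
    import app_config as cfg
    return cfg.debug


state_before = read_debug()
enable_debug()
state_after = read_debug()
```

Every import statement hands back the one module object cached in sys.modules, so there is no factory method and no instance bookkeeping to write.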

The email served as a reminder that sometimes people aren't quite so flexible in their understanding of design patterns. I need to cut them some slack and guide them to seeing that there's wiggle room there. Some people feel compelled to either follow the GoF prescription or discard the GoF entirely. The pointer to PLoP and other pattern languages is a helpful nudge to be more flexible.

The point here is that patterns are a concept. Not a law.

Tuesday, August 30, 2016

Obscure Standards, Packaged Products, Latent Bugs

Read this: http://jeffq.com/blog/the-ethernet-pause-frame/

Fascinating.

A world of interconnected devices in which we place a kind of implicit trust. There's little visibility for ordinary consumers. It takes a skilled specialist to determine that there are flaws in a product.

It's not that the system is "flaky."

It's that this combination of components each with unexpected edge-case behavior is actually broken.

Tuesday, August 23, 2016

On Generator Functions, Yield and Return

Here's the question, lightly edited to remove the garbage. (Sometimes I'm charitable and call it "rambling". Today, I'm not feeling charitable about the garbage writing style filled with strange assumptions instead of questions.)

someone asked if you could have both a yield and a return in the same ... function/iterator. There was debate and the senior people said, let's actually write code. They wrote code and proved that couldn't have both a yield and a return in the same ... function/iterator. .... 
The meeting moved on w/out anyone asking the why question. Why doesn't it make sense to have both a yield and a return. ...

The impact of the yield statement can be confusing. Writing code to mess around with it was somehow unhelpful. And the shocking "proved that couldn't have both a yield and a return in the same ... function" is a serious problem.

(Or a seriously incorrect summary of the conversation; a very real possibility considering the garbage-encrusted email. Or a sign that Python 3 isn't widely-enough used and the email omitted this essential fact. And yes, I'm being overly sensitive to the garbage. But there's a better way to come to grips with reality and it involves asking questions and parsing details instead of repeating assumptions and writing garbage.)

An example


>>> def silly(n, stop=None):
...     for i in range(n):
...         if i == stop: return
...         yield i
...
>>> list(silly(5))
[0, 1, 2, 3, 4]
>>> list(silly(5, stop=3))
[0, 1, 2]

This works in both Python 3.5.1 and 2.7.10.

Some discussion

A definition with no yield is a conventional function: the parameters from some domain are mapped to a return value in some range. Each mapping is a single evaluation of the function with concrete argument values.

A definition with a yield statement becomes an iterable generator of (potentially) multiple values. The return statement changes its behavior slightly. It no longer defines the one (and only) return value. In a generator function (one that has a yield) the return statement can be thought of as if it raised the StopIteration exception as a way to exit from the generator.

As can be seen in the example above, both statements are in one function. They both work to provide expected semantics.
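
To see the StopIteration directly, step through a generator by hand with next() instead of letting list() consume it. This is a sketch; the countdown name is mine and the body mirrors the example above.

```python
def countdown(n, stop=None):
    for i in range(n):
        if i == stop:
            return  # ends the generator; no value is handed back here
        yield i


gen = countdown(3, stop=1)
first = next(gen)  # runs the body up to the first yield

finished = False
try:
    next(gen)  # resumes after the yield, reaches the bare return
except StopIteration:
    finished = True
```

A for statement or list() catches that exception quietly, which is why the return looks like nothing more than an early exit.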

The code which gets an error (a SyntaxError in Python 2.7) is this:

>>> def silly(n, stop=3):
...     for i in range(n):
...         if i == stop: return "boom!"
...         yield i


The answer to the "why?" question should -- perhaps -- be obvious at this point.  The return raises an exception; it doesn't provide a value.
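
A note for Python 3.3 and later, where a return with a value inside a generator is no longer a syntax error. The value still isn't delivered to the consumer as a result. It rides along on the StopIteration exception, as this sketch shows.

```python
def silly(n, stop=3):
    for i in range(n):
        if i == stop:
            return "boom!"
        yield i


gen = silly(5)
yielded = []
return_value = None
try:
    while True:
        yielded.append(next(gen))
except StopIteration as done:
    return_value = done.value  # the only place the "boom!" shows up

consumed = list(silly(5))  # ordinary consumers never see the return value
```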

The topic, however, remains troubling. The phrase "have both a yield and a return" is bothersome because it fails to recognize that the yield statement has a special role. The yield statement transforms the semantics of the function to make it into a different object with similar syntax.

It's not a matter of having them "both". It's a matter of having a return in a generator. This is an entirely separate and trivial-to-answer question.

A Long Useless Rant

The email seems to contain an implicit assumption. It's the notion that programming language semantics are subtle and slippery things. And even "senior people" can't get it right. Because all programming languages (other than the email sender's personal favorite) are inherently confusing. The confusion cannot be avoided.

There are times when programming language semantics are confusing.  For example, the ++ operator in C is confusing. Nothing can be done about that. The original definition is often said to be tied to the PDP-11 machine instructions, although Dennis Ritchie's own account is that the operator predates that machine. Since then... Well.... The behavior of some expressions that use it is formally undefined.  Many languages have one or more places where the semantics are "undefined" or only defined by example.

This is not one of those times.

Here's the real problem I have with the garbage aspect of the email.

If you bring personal baggage to the conversation -- i.e., assumptions based on a comparison between some other language and Python -- confusion will erupt all over the place. Languages are different. Concepts don't map from language to language very well. Yes, there are simple abstract principles which have different concrete realizations in different languages. But among the various concrete realizations, there may not be a simple mapping.

It's essential to discard all knowledge of all previous favorite programming languages when learning a new language.

I'll repeat that for the author of the email.

Don't Go To The Well With A Full Bucket.

You won't get anything.

In this specific case, the notion of "function" in Python is expanded to include two superficially similar things. The syntax is nearly identical. But the behaviors are remarkably different. It's essential to grasp the idea that the two things are different, and can't be casually lumped together as "function/iterator".
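
The difference can be checked with the standard inspect module. A sketch, with two made-up definitions that look nearly alike:

```python
import inspect


def plain(n):
    return [i for i in range(n)]


def lazy(n):
    for i in range(n):
        yield i


plain_is_generator_function = inspect.isgeneratorfunction(plain)
lazy_is_generator_function = inspect.isgeneratorfunction(lazy)

plain_result = plain(3)  # the body runs now; the result is the list
lazy_result = lazy(3)    # the body has not started; the result is a generator object
lazy_values = list(lazy_result)
```

Calling plain() evaluates the body and hands back its value. Calling lazy() evaluates nothing; it builds an object that will run the body later, one yield at a time.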

The crux of the email appears to be a failure to understand the Python language rules in any profound way.