Tuesday, October 18, 2016

Optimizing complex generator expressions [Updated]

See this: https://twitter.com/jakevdp/status/786920174595158018

The core expression is similar to this

y = (f(x) for x in L if f(x) is not None)

There are a lot of variations on the filter. The point is that the function appears twice in the above expression.

We have a number of alternatives.

  • y = filter(None, f(x) for x in L)
  • y = filter(None, map(f, L))
  • y = (x for x in map(f, L) if x)
  • y = (x for x in (f(y) for y in L) if x is not None)
  • y = (val for x in L for val in (f(x),) if val is not None)

My preference is two steps, even though I don't really have a good reason for this.

y1 = (f(x) for x in L)
y2 = (f for f in y1 if f)

The thread leads to this path: https://twitter.com/TomAugspurger/status/786922167522828289  and the idea of "Let Bindings." We could extend the language slightly to bind a variable within the confines of the generator expression.

Like this:

y = (f(x) as val for x in L if val is not None)

The as clause binds the f(x) to val so that it can be used in the if clause.

Summary: Interesting.

Tuesday, October 4, 2016

Alternatives to PowerPoint (or Keynote) for Presentations

This: https://opensource.com/business/16/9/alternatives-powerpoint

Missing from the list? The S5-based slide-show tools that are part of docutils.

The only issue with S5 is that you need to carefully review each and every page to be sure you material fits. There's no autosizing of the fonts, or other trickery to pack too much trash onto a slide.

TIL that Lync/Skype doesn't politely handle Keynote on a Mac. You must mirror displays because Lync can only share one of your displays, and it's not the one Keynote chooses to display on.

I often use Keynote because it's expedient. I sometimes use S5 to show off an entirely open-source toolchain.

Tuesday, September 27, 2016

Database Schema Migration

Some thoughts: http://workingwithdevs.com/delivering-databases-migrations-vs-state/

This covers a lot of ground on the Declarative vs. Procedural question. It explains a lot of the considerations that lead to choosing a procedural schema evolution vs. a declarative schema with an implied change sequence to migrate to each new declared state.

The article calls the declarative "state-based" and procedural approach "migration-based".

My 2¢ are focused on this point: 
When using a state-based solution you will most often be using a diff tool like those provided by Redgate or Visual Studio to examine the differences and generate an upgrade script. While this is a very efficient solution for most changes, with table renames and a few other types of table refactoring they can do bad things, ...
This point about table refactoring is, for me, the show-stopper. Relational theory tells me that I can map any schema to any other schema using selection, projection, and join. I can denormalize data and I can normalize again via group-by clauses. I can reduce the original schema to a sequence of object-attribute-value triples, and restructure this into any desired new schema. 

Given enough time, a change tracking tool should be able to find a minimal-cost transformation from schema to schema. This might involve a complex search over a large state space, and it certainly involves creating costs for each alternative query plan. 

Pragmatically, I'm not sold on this being a good idea. And I'm rarely sure I even want to get involved in a fully automated solution. While a tool might be able to detect and automate a variety of simple changes, I think that developers must always vet those change scripts.

In particular, the search space is emphatically not limited to select, project, and join. There are also database unload-reload, index create and drop. There are even more complex operations like creating intermediate results which aren't part of the final database structure. With proper indices, these might actually be beneficial.

In some cases, the continuous operation requirements are such that we might have two copies of a database: one being used and the other being transformed. A logger tracks transactions in the older copy and a synchronizer replicates those transactions in the new copy. After the data is moved, the customer access is moved via a feature toggle from the old database to the new database.

Semantic Drift

Also important is the issue of semantic drift. When we're making structural changes where the "before" column names match the "after" column names, then there's little chance for semantic drift. There's still some possibility, though. We can (and sometimes do) repurpose columns, preserving the original name. In some cases, we might change a database constraint without renaming the column.

In the larger case, of course, it doesn't require "‘hot-fix’ changes to QA or even production databases" to create profound semantic changes. All it takes is an app developer deciding that a column should be repurposed. There's may be no structural change on the schema overall. 

A non-structural change in some past release could have implications for structural change in a future release. Imagine three columns in three tables with the same names. Two started out life as simple foreign keys to the third. But one became optional, and now the semantics don't match but the names do. Automated tools are unlikely to discern the intent here. 


It's all procedural migration. I'm not declarative ("state") tools can be trusted beyond discerning the changes and suggesting a possible migration.

Wednesday, September 21, 2016

Bad Trends and Sloppy Velocity

Read this: https://www.linkedin.com/pulse/story-points-evil-brad-black-in-the-market-

There are good quotes from Ron Jeffries on the worthlessness of story points. (I've heard this from other Agile Consultants, also.) Story Points are a political hack to make management stop measuring the future.

The future is very hard to measure. The difficulty in making predictions is one of the things which distinguishes the future from the past. There's entropy, and laws of thermodynamics, and random quantum events that make it hard to determine exactly what might happen next.

If Schrödinger's cat is alive, we'll deliver this feature in this sprint. If the cat is dead, the feature will be delayed. The unit test result is entangled with the photon that may (or may not) have killed the cat. If you check the unit tests, then the future is determined. If you don't check the unit test, the future can be in any state.

When project management becomes Velocity Dashboard Centered (VDC™), that's a hint that something may be wrong.

My suspicion on the cause of VDC?

Product Owners ("Management") may have given up on Agility, and want to make commitments to a a schedule they made up on a whiteboard at an off-site meeting. Serious Commitments. The commitment has taken on a life of its own, and deliverable features aren't really being prioritized. The cadence or tempo has overtaken the actual deliverable.

It feels like the planning has slipped away from story-by-story details.

What can cause VDC?

I think it might be too many layers of management. The PO's boss (or boss's boss) has started dictating some kind of delivery schedule that is uncoupled from reality. The various bosses up in the stratosphere are making writing checks their teams can't cash.

What can we do?

I don't know. It's hard to provide schooling up the food chain to the boss of the boss of the product owner. It's hard to explain to the scrum master that the story points don't much matter, since the stories exist independent of any numbering scheme.

The link above says that there's some value in ordering the stories; assigning random-ish point numbers somehow helps order them.

I reject the last part of this. The stories can be ordered without any numbers. The Agile manifesto is pretty clear on this point: talk about it. The points don't enhance the conversation. Push the story cards around on the board until you have something meaningful. Assigning numbers is silliness.

Actually. I think its harmful.

Rolling numbers up to "senior" management isn't facilitating a conversation. It's summarizing things with empty numerosity. ("Numerosity"? Yes. Empty numerosity: applying numeric methods inappropriately. For example, averaging the day of the week on which it rains, for example, to discover something about Wednesday.)

The Best Part (TBP™)


On Project X, we had a velocity of 50 Story Points per Sprint. On Project U, the "same" team -- someone quit and two new people were hired -- had a velocity of 100 Story Points per Sprint. Wow! Right?

Except. Of course, the numbers were inflated because the existing folks figured the new folks would take longer to get things done. But the new folks didn't take longer. And now the team is stuck calling a 3-point story a 5-point story because the new guys are calibrated to that new range of random numbers.

So what's comparable between the "same" team on two projects? It's not actually the same people. That's out.

We can try to pretend that the projects have the "same" technology (Java 1.6 v. Java 1.8) or the same CI/CD pipeline (Ant v. Maven, Hudson v. Jenkins) or even the same overall enterprise. But practically, there's nothing repeatable about any of it. It's just empty numerosity.

Sorry. I'm not sold, yet, on the value of Story Points.

Tuesday, September 20, 2016

What was I thinking?

Check out this idiocy: https://github.com/slott56/py-false

What is the point? Seriously. What. The. Actual. Heck?

I think of it this way.

  • Languages are a cool thing. Especially programming languages where there's an absolute test -- the Turing machine -- for completeness. 
  • The Forth-like stack language is a cool thing. I've always liked Forth because if it's elegant simplicity. 
  • The use of a first-class lambda construct to implement if and while is particularly elegant.
  • Small languages are fun because they can be understood completely. There are no tricky edge cases in the semantics.
I have ½ of a working GW-Basic implementation in Python, too. It runs. It runs some programs like HamCalc sort of okay-ish. I use it to validate assumptions about the legacy code in https://github.com/slott56/HamCalc-2.1.  Some day, I may make a sincere effort to get it working.

Even languages like the one that supports the classic Adventure game are part of this small language fascination. See adventure.pdf for a detailed analysis of this game; this includes the little language you use to interact with the game.

Tuesday, September 13, 2016

On One Aspect of Design Patterns -- Flexibility

Something I forget to think about is the degree of detail or granularity of design patterns.  I have my own viewpoint and I often assume that others share it.

Here's a quote from an email describing the PLoP (Pattern Languages of Programs) patterns as quite distinct from the Gang of Four (Design Patterns: Elements of Reusable Object-Oriented Software) patterns.
In the main, the PLoP patterns are less granular than the persnickety GoF
"Design Patterns." (Classic GoF, in part, static type binding work-arounds. And
you need to talk about a "facade" pattern? Really? Although see Fowler's at it
again, coining a ™ term - "fluent API" - for Some Not Egregiously Stupid
Practice, to feed to the credulous who have never reflected on what they are doing.)
Cutting through the editorializing, the author is describing two families.

  • GoF patterns that are essentially ways to cope with static type checking in Java and C++.
  • PLoP patterns which are a little more generic and more widely applicable.

"Plug-in Pattern" is a nice example. Enumerates the stuff you kinda know, with
qualities / attributes of its proposal, plus application samples / outcomes of
applying the pattern. The claims to relevance throughout are reminiscent of the
investigation behind Parnas' "Criteria for Decomposing Systems into Modules."
My habit is to assume this is pretty widely known. I assume everyone has wrestled with design patterns large and small and found that some of the GoF apply to Python, but the implementation details will differ. Dramatically. 

Look at the Singleton design pattern, for example. The concept is profound. There are times when we want stateful, global, Singleton instances. The Java or C++ technique of a small factory method which returns the one-and-only instance (or creates the one-and-only instance in the rare edge case) is extremely strange in Python. We can implement it. But why?

Module objects in Python are stateful singletons. Rather than invent a Singleton class, we can -- trivially -- just use a module. And we're done. Problem solved. No Code Written.

The email served as a reminder that sometimes people aren't quite so flexible in their understanding of design patterns. I need to cut them some slack and guide them to seeing that there's wiggle room there. The email reminds me that some people feel compelled to either follow the GoF prescription or discard the GoF entirely. The reminder about PLoP and other pattern languages is a helpful reminder to be more flexible.

The point here is that patterns are a concept. Not a law.