Reliable Backup is hard to do

After working with Cosmos at Bing and other similar services, and after reading recommendations and descriptions of various backup schemes, I believe in keeping three copies of everything.

So at home we have two disk copies and a cloud copy of every file.
I thought I was covered.  Not quite.
We have a lot of old static files (photos, videos, and old docs) that we keep on a read-only disc, with a backup on a disk kept in our basement.
For active files, we used a mirrored Buffalo unit (two 3TB disks).

Well, the Buffalo HD-WL6TU3R1 device failed.  It wouldn’t boot.  When turned on, both disk access lights blinked red for a few seconds and then the unit shut off.  The manual doesn’t describe this diagnostic code, and contacting Buffalo was useless: they just told us we were out of warranty.

My recommendations:

  • Don’t buy Buffalo
  • If you do buy Buffalo, toss it immediately after the warranty period, because it is useless.

My suspicion is that the controller in the device, a Single Point of Failure, failed.

No problem, I thought: I removed the individual drives and, using my Coolmax multifunction converter, temporarily hooked them up to the PC one at a time via USB.  They were readable, but about a third of the data produced this error when being copied:
“ERROR 87 (0x00000057) Copying File . . . The parameter is incorrect”
I tried running chkdsk to repair the disk, but it failed with several errors.

So now 2 of our 3 copies are incomplete.   No problem, we use Backblaze.

In general, I love Backblaze and have recommended it to all of my family and friends because it is truly unlimited backup, reasonably quick, doesn’t seem to slow down our systems, and works smoothly, quietly, automatically behind the scenes.   The annual price is extremely reasonable for backing up our current 6TB of data.

I have previously restored a few small files from Backblaze while travelling, when I realized the file I needed was on a computer back home.
I also previously restored my 200GB music library over just a couple of days with a series of downloaded zip files.

We replaced our dead Buffalo 3TB mirror with an 8TB WD MyBook Duo mirror (two 8TB disks).  Now we have room to grow, since GoPro 4K video dumps 27GB/hour (instead of 9GB/hour for HD).

But now we had to restore the 2TB of data we had mirrored.
Backblaze provides a very reasonable alternative: for a refundable $189, they FedEx us a disk.  We ship the disk back within 30 days and they refund the $189.  We just pay return shipping!

Only 2 problems:

  • It took over a week for Backblaze to “gather” the data and then to “build” the disk, before shipping it. Backblaze recommends doing online .zip restores for files you need during the week.   We had to do a few.
  • Backblaze keeps your “data”, but not the metadata, specifically the timestamps on the files. So all of your files lose their dates and become created, modified, and accessed at exactly the same time: when the disk is built to send to you.

I find the lack of timestamps screws up a lot of things for us.  We can’t tell when a picture or video was taken unless the date happens to be inside the file’s own metadata (and most of our oldest pictures have none).  Many of our business documents exist in different versions, and knowing when a tax document, corporate motion, or other such file was created or modified is useful.

So Backblaze gives us our data, but not our timestamps.  Beware.

Ultimately we chose a multi-step solution:

  • Restore those files I had made a 4th local offline copy of 8 months ago.
  • Restore those files we could read off the broken Buffalo raw disks.
  • Restore the files we really cared about using Backblaze ZIP files instead
    (750GB downloaded over 3 days in multiple 100GB downloads)
  • Restore any missing files (about 15% of all of them) using the Backblaze data, with wrong timestamps.

Now, I think I need to create a periodic job that dumps all of the timestamp data into a file, so Backblaze will back that up.  Then, after I’ve done a restore, I can use a program to reset the timestamps to the ones I logged.  Backblaze can be a hassle.
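Something like the following minimal sketch could do it. This is just an illustration, assuming Windows-style paths: the manifest location and data root are made up, and os.utime only resets access and modification times (creation time is OS-specific and not handled here).

```python
# Minimal sketch of a periodic timestamp-dump job plus a post-restore fixer.
# The manifest path and data root below are illustrative assumptions.
import json
import os

MANIFEST = r"D:\backup\timestamps.json"  # lives on a drive the backup service covers
ROOT = r"D:\data"                        # the tree whose timestamps we want to keep

def dump_timestamps(root=ROOT, manifest=MANIFEST):
    """Walk the tree and record each file's access and modification times."""
    stamps = {}
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            stamps[path] = (st.st_atime, st.st_mtime)
    with open(manifest, "w") as f:
        json.dump(stamps, f)

def restore_timestamps(manifest=MANIFEST):
    """After a restore, re-apply the recorded times to files that exist."""
    with open(manifest) as f:
        stamps = json.load(f)
    for path, (atime, mtime) in stamps.items():
        if os.path.exists(path):
            os.utime(path, (atime, mtime))  # creation time is not reset here

if __name__ == "__main__":
    dump_timestamps()  # schedule this periodically, e.g. via Task Scheduler
```

Run the dump on a schedule so the manifest stays roughly current; after a restore, run the second function to put the modification dates back.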


Posted in software testing | 3 Comments

Software Testing: A Research Travelogue by Orso and Rothermel

An interesting survey of recent developments in software testing was given as an ACM SIGSOFT webinar:
“Software Testing: A Research Travelogue,” Alessandro Orso, Gregg Rothermel, Willem Visser.
It is based on the paper “Software Testing: A Research Travelogue (2000-2014)” by A. Orso and G. Rothermel in Proceedings of the 36th IEEE and ACM SIGSOFT International Conference on Software Engineering (ICSE 2014), FOSE Track (invited).

This 10-page paper is followed by 210 references! Most encouraging to me was the slide on Empirical Studies & Infrastructure.  I fully agree that testing is heuristic and thus must be empirically evaluated:

“• State of the art in 2005: study on 224 papers on testing (1994–2003)
None 52%, Case studies 27%, Experiments 17%, Examples 4%

Things have changed dramatically since then 

  • Empirical evaluations are almost required
  • Artifact evaluations at various conferences”

In their conclusion they also stated something I strongly believe in:

“Stop chasing full automation

  • Use the different players for what they are best at doing
    • Human: creativity
    • Computer: computation-intensive, repetitive, error-prone, etc. tasks “

I hope all professional testers are aware of the many topics they touched on:

  • Automated Test Input Generation using Dynamic Symbolic Execution,
    Search-based Testing, Random Testing, Combined Techniques.
  • Combinatorial Testing, Model-Based Testing, and Mining/Learning from Field Data
  • Regression Testing – selection, minimization, prioritization
  • Frameworks for Test Execution and Continuous Integration
Posted in software testing | Leave a comment

Thoughts on Rimby’s How Agile Teams Can Use Test Points

At the February 3 SeaSpin meeting, Eric Rimby provided a discussion-provoking thought experiment around what he termed “Test Points” (analogous to Story Points). I didn’t quite have time to snapshot his slide, but I think he finally defined them something like this:

“The number of functional test cases adequate to achieve complete coverage of boundary values and logical branches strongly correlates with effort to develop.”

He counts functional test cases specified as part of backlog grooming for a user story as “test points”.

While I’ve always had issues with counting test cases (e.g. “small” versus “large” tests and Response to How many test cases by James Christie), he at least restricted the context in which the test cases he was counting were created.  He presumed a team trained in a particular test methodology for doing boundary values and logical branches (I suggested Kaner’s Domain Testing Workbook), and that they compared notes over time. Another audience member afterwards indicated that Pex (or other tools) could probably auto-generate many of these cases. Similar to how a scrum team should get more uniform in ascribing story points over time, Eric expects that the number of functional test cases estimated by various team members for a story will become sufficiently uniform over time.
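To make the counting concrete, here is a toy sketch of what enumerating such cases for one user story might look like. The story, its input ranges, and the enumeration rule are all invented, and real boundary analysis per the Domain Testing Workbook is much richer than this; the point is only that the resulting count is the story’s “test points”.

```python
# Toy illustration of counting functional test cases ("test points") for a story.
# The story, its input ranges, and the enumeration rule are invented for illustration.

story_inputs = {
    "quantity": (1, 99),       # valid order quantity
    "discount_pct": (0, 50),   # valid discount percentage
}
logical_branches = ["gift_wrap on/off", "expedited shipping on/off"]

def boundary_values(low, high):
    """Below/at/just-above each boundary of a closed integer range."""
    return sorted({low - 1, low, low + 1, high - 1, high, high + 1})

test_cases = []
for name, (low, high) in story_inputs.items():
    test_cases.extend((name, v) for v in boundary_values(low, high))
for branch in logical_branches:
    test_cases.extend((branch, outcome) for outcome in (True, False))

print(f"test points for this story: {len(test_cases)}")  # 6 + 6 + 2 + 2 = 16
```

A team trained on the same enumeration rules should converge on similar counts for the same story, which is what would make the number comparable over time.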

While I disagree with many of the suppositions he made during his talk, I agree that the number of functional test cases estimated for a story might be a useful thing to track. Whether it correlates with anything remains to be measured. However, I think just getting teams to be better about their upfront Acceptance Test Driven Development (ATDD) as part of story definition can only help.

Abstract from How Agile Teams Can Use Test Points:
Test points are similar to story points. They can be used to estimate story size, track sprint progress, and normalize velocity across teams, among other things. Test points have some advantages that story points do not. They could be used instead of, or alongside, story points.

Posted in software testing | Leave a comment

What are Synthetics?

I attended the PNSQC Birds of a Feather session “Why do we need synthetics” by Matt Griscom. It was advertised as “The big trend in software quality is towards analytics, and its corollary, synthetics. The question is: why and how much do we need synthetics, and how does it replace the need for more traditional automation?”
I spoke with Matt briefly to understand what he meant by synthetics, because I thought it was a rare, relatively unused term.

I learned a lot from other participants at the session.  First, New Relic is trying to stake out the term!  (They describe test monitors as a Selenium-driven platform that sends data using turnkey test scripts.)

Second, I attended a great talk, which I highly recommend, by a former colleague from Bing:
Automated Synthetic Exploratory Monitoring of Dynamic Web Sites Using Selenium by Marcelo De Barros, Microsoft.

So synthetics are mainstream.  But what are synthetics?  What are not synthetics?  I had a hard time parsing synthetics as a corollary of analytics. I still do.
For me, synthetics are tests that run in production (or production-like environments) and use production monitoring for verification.
Analytics are just a method, in this context, for doing the monitoring.  To me, synthetics are artificial (synthetic) data introduced into the system.
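By that definition, a synthetic is just a scripted, artificial transaction run against the live system, with its result fed into the same monitoring that real traffic feeds. Here is a minimal sketch; the URL, the page marker, the metric names, and the StatsD collector address are assumptions for illustration, not any particular vendor’s setup.

```python
# Minimal synthetic check: run an artificial transaction against production
# and push the result into monitoring. The URL, page marker, metric names,
# and StatsD host/port below are illustrative assumptions.
import socket
import time
import urllib.request

TARGET_URL = "https://example.com/search?q=synthetic+probe"  # hypothetical endpoint
STATSD_ADDR = ("monitoring.example.com", 8125)               # hypothetical StatsD collector

def emit(metric, value):
    """Send a StatsD-style gauge over UDP to the monitoring system."""
    payload = f"{metric}:{value}|g".encode()
    socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(payload, STATSD_ADDR)

def run_synthetic_check():
    start = time.time()
    try:
        with urllib.request.urlopen(TARGET_URL, timeout=10) as resp:
            ok = resp.status == 200 and b"results" in resp.read()  # assumed page marker
    except Exception:
        ok = False
    emit("synthetic.search.latency_ms", (time.time() - start) * 1000)
    emit("synthetic.search.success", 1.0 if ok else 0.0)

if __name__ == "__main__":
    run_synthetic_check()  # schedule every few minutes from a separate probe host
```

Run from a probe host every few minutes, the success and latency gauges land next to the real-traffic metrics, so the verification is done by production monitoring rather than by a test framework’s assertions.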

I thought synthetics were almost always automated, and that A/B testing would be a type of testing where synthetics wouldn’t apply.  I was proven wrong on both counts by a single example: using Amazon’s Mechanical Turk to pay people to choose whether they like A or B!  This is manual testing and synthetic, as it is not being done by the real user base.

Maybe the problem with “synthetics” is the same problem I have with “automation”.   Even “test automation” isn’t very specific, and means many things.   I’m not sure if “synthetics” is supposed to mean synthetic data (Wikipedia since 2009), synthetic monitoring (Wikipedia since 2006 – the description also uses “synthetic testing”), or something else.

Posted in software testing | 1 Comment

How Data Science is Changing Software Testing – Robert Musson talk

I enjoyed Robert Musson’s recent presentation How Data Science is Changing Software Testing and recommend you watch it, or at least read Robert’s Presentation Slides, which don’t do it full justice but should tease you.
As the abstract stated: It will describe the new skills required of test organizations and the ways individuals can begin to make the transition.

I worked with Bob a few times while at Microsoft, and he truly has been one of the original data science testers, doing data analytics for the past decade.  He says (37:50 into the video) the tide has turned recently and he has “seen more progress in last 6 months than seen in past 10 years.”

So now I need to learn:

  • Statistics, e.g., r-value, p-value, Poisson and Gamma distributions.
    Homogeneous (non-changing) or non-homogeneous (changing) Poisson processes for reliability measurements, to get me used to time-based analysis (see the sketch below).
  • R language (open-source version of S).
    Object-oriented, with many packages for exploratory data analysis and quick linear models.
  • Python for easier data manipulation, including building dictionaries, plus packages for linear algebra.
So I can prepare for the mindset change.
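As a warm-up on the statistics item, here is the sketch mentioned above: a homogeneous Poisson model assumes one constant failure rate over the observation window, while a crude split-window check hints at whether the rate is actually changing. The failure times are made up for illustration.

```python
# Sketch: is the bug-arrival rate constant (homogeneous Poisson) or changing?
# Failure times (days since the start of observation) are made up for illustration.
failure_days = [1.2, 3.5, 4.1, 8.0, 9.7, 15.3, 22.8, 31.0, 44.5, 60.2]
T = 70.0  # total observation window in days

# Homogeneous model: one constant rate lambda = n / T
lam = len(failure_days) / T
print(f"overall rate: {lam:.3f} failures/day")

# Crude non-homogeneity check: compare the rate in each half of the window.
first = sum(1 for t in failure_days if t <= T / 2)
second = len(failure_days) - first
print(f"first-half rate: {first / (T / 2):.3f}, second-half rate: {second / (T / 2):.3f}")
# A clearly lower second-half rate suggests reliability growth; a proper
# non-homogeneous Poisson model would fit a time-varying lambda(t) instead.
```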

The mindset change is to one of information discovery vs. bug discovery.

An audience member asked how to learn, and Bob recommended a number of online courses, including statistics courses.  He specifically called out
Model Thinking – Scott Page – U. of Michigan.
I love models, but I might also start with

Posted in software testing | 2 Comments

Testing Magazines

Last week I listened to the webinar State of Software Testing. Tea Time With Testers approached testing experts Jerry Weinberg and Fiona Charles to review the results of their ‘State of Software Testing’ survey for 2013.  Mostly the experts indicated the questions were poor and thus the results irrelevant.  But one side comment caught my attention, since they both agreed:

every tester should read at least one test magazine a month.

While I do that, I recently asked a colleague, and he said no and wasn’t even aware of the many free online test magazines available.  Thus this post, to list several test magazines to choose from.  Note, I consider this an addition to the poll question I’ve often heard, which I also agree with: have you read at least one book on software testing, and better, have you read at least one book on software testing every year?

So now my belief is that professional testers, as part of their continuing education, should read at least one book a year and one magazine a month about software testing.  Maybe follow some blogs also; many of the magazine sites have associated blogs.

Since coming to, I discovered two of these magazines due to posts by Reena Mathews.     Not in a priority order, just a count:

  1. Tea Time With Testers
  2. Automated Software Testing
  3. Better Software
  4. Professional Tester
  5. Testing Trapeze
  6. Software Test & Quality Assurance
  7. Testing Experience
  8. Testing Circus
  9. Testing Magazine

Non-free magazines:

  10. ASQ Software Quality Professional

Blog examples:

There are many sites that list lots of software testing blogs.

Posted in software testing | 1 Comment

Putting Lean Principles to Work for Your Agile Teams – Sam McAfee’s talk

An interesting talk, Putting Lean Principles to Work for Your Agile Teams, by Sam McAfee at the Bay Area Agile Leadership Network.

While, as one commenter put it, there was nothing new, it was interesting to me to see it all strung together across a one-year transformation.
Basically, starting with a team that followed many Agile practices, he described changes using Lean that transformed, or even eliminated the need for, some of those Agile practices.

Initial agile practices: Collocated teams with daily stand-ups.  Pair programming 95% of the time amongst the 16-17 engineers.  Two-week sprints with shipping at the end.  Test Driven Development (TDD) & Continuous Integration (CI).  Engineers estimate in story points.

With all this, they still had stories stuck or blocked for long periods of time.
So apply the Theory of Constraints (book: The Goal by Eli Goldratt, or the more recent IT-oriented version, The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win by Gene Kim, Kevin Behr, and George Spafford).
They also needed more visible progress, so they used a Kanban board.
Kanban: use buffers to throttle upstream demand and reduce cycle time.  Reduce Work In Progress (WIP): Ready, Doing, Done.
The bottleneck was sometimes deployment; they had a manual deployment process.
So use Continuous Deployment (CD): automate deployment and reduce the cycle time of deploying value to customers to as close to zero as possible.

dev -> continuous integration -> test in cloud -> deploy & monitor (system health, . . .)

Note: the Theory of Constraints assumes a single, stable bottleneck.

With knowledge work, the bottlenecks bounce around.
Kanban lays down constraints more systematically.

Typically the delay is in handoffs between roles, or in back-and-forth between roles.

==> use tighter feedback loops to reduce stuck stories.

CD allows a change to the fixed-length sprint structure. Sprint planning meetings continue, but fixed sprints become superfluous because of continuous deployment.

Not all released features produced the Key Performance Indicators (KPIs) they wanted.  So use Innovation Accounting from the book The Lean Startup.
Use the build-measure-learn loop to reduce the amount of uncertainty in new product development or process innovation.
Build the smallest KPI change you can ship and run experiments.

Small, lightweight experiments could be costly using full TDD and pair-programmed development.  Short-lived prototyping code may not need TDD.  Also consider pairing a designer with an engineer to create experiments.  Use experiments to validate business needs.

A change from Agile to Lean was dropping story points in favor of Statistical Process Control (SPC) a la Shewhart and Deming.
Assume most stories flow normally, and analyze why the outliers fall outside the control range: what makes those work items special?  Estimation effort is focused on the risky (outlier) areas.
The team used an electronic Kanban board (not Sam’s first choice of way to do it), which collected data that was fortuitously used for an SPC control chart.
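A rough sketch of the kind of analysis that control chart supports, using a standard Shewhart individuals (XmR) chart. The cycle times below are invented; the 2.66 factor is the usual individuals-chart constant applied to the average moving range.

```python
# Sketch of a Shewhart individuals (XmR) chart on story cycle times; the data is invented.
from statistics import mean

cycle_times = [3, 4, 2, 5, 4, 3, 6, 4, 3, 5, 4, 2, 4, 35, 4]  # days per story

center = mean(cycle_times)
moving_ranges = [abs(b - a) for a, b in zip(cycle_times, cycle_times[1:])]
ucl = center + 2.66 * mean(moving_ranges)   # standard individuals-chart upper limit
lcl = max(center - 2.66 * mean(moving_ranges), 0)

print(f"center {center:.1f} days, control limits [{lcl:.1f}, {ucl:.1f}]")
for i, ct in enumerate(cycle_times):
    if ct > ucl or ct < lcl:
        # "Special cause" stories: spend the estimation/analysis effort here,
        # rather than pointing every story.
        print(f"story #{i} took {ct} days -- what made it special?")
```

Most stories stay inside the limits and need no discussion; the flagged ones are where the estimation and analysis time goes.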

Not all was rosy.  The CEO would make urgent requests to reprioritize, which slowed the original work down.  How to do the tradeoff?  Measure Cost of Delay:
compare the cost of delay for what you are doing now vs. what the CEO wants now.

Cost of Delay: quantify the impact of delay on total life-cycle profits for each of your projects.  Delay typically shifts when you start recognizing revenue, without shifting the end of life.
How to get costs?  A spreadsheet of conversion rates, traffic, etc. from Finance.
This assumes you know the cost of production: engineering time spent and the cost of payroll.
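A toy comparison under that model, with every figure invented: if revenue starts at launch and the end-of-life date is fixed, delaying a launch simply removes weeks of revenue, so the cost of delay is roughly weekly value times weeks delayed.

```python
# Toy cost-of-delay comparison (all figures invented): if revenue starts at launch
# and the end-of-life date is fixed, delaying launch simply removes weeks of revenue.
def cost_of_delay(weekly_value, weeks_delayed):
    return weekly_value * weeks_delayed

current_feature = {"weekly_value": 20_000, "weeks_remaining": 2}   # what the team is doing now
ceo_request     = {"weekly_value": 35_000, "weeks_to_build": 3}    # the urgent interruption

# Option A: finish current work first, so the CEO request waits 2 weeks.
delay_cost_a = cost_of_delay(ceo_request["weekly_value"], current_feature["weeks_remaining"])
# Option B: switch now, so the current feature waits the 3 weeks the request takes.
delay_cost_b = cost_of_delay(current_feature["weekly_value"], ceo_request["weeks_to_build"])

print(f"delay cost of making the CEO wait:  ${delay_cost_a:,.0f}")
print(f"delay cost of parking current work: ${delay_cost_b:,.0f}")
# On these made-up numbers, parking the current work costs less in delay
# ($60,000 vs $70,000), so the interruption wins this time.
```

The numbers are fake, and this ignores context-switching cost, but it shows how the "CEO interrupt" question becomes an arithmetic comparison instead of an argument.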

Quantify risk using data, not intuition, to model and validate risk factors.
Books by Hubbard: How to Measure Anything and The Failure of Risk Management.
Quantify the risk along the funnel: traffic -> converted user -> paid-user retention.
“All other risk” (without data) is just hand waving.
Use Monte Carlo simulations to find which parts are most sensitive to risk.
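A bare-bones version of that kind of simulation, with all distributions and figures invented: draw traffic, conversion, and retention from ranges rather than point estimates, run the funnel many times, and look at the spread of outcomes.

```python
# Bare-bones Monte Carlo on the traffic -> paid-conversion -> retention funnel.
# All distributions and figures are invented for illustration.
import random

def simulate_revenue(rng):
    traffic    = rng.gauss(100_000, 15_000)   # monthly visitors
    conversion = rng.uniform(0.01, 0.03)      # visitor -> paid user
    retention  = rng.uniform(0.60, 0.90)      # paid users still around next month
    revenue_per_user = 10.0                   # fixed for simplicity
    return max(traffic, 0) * conversion * retention * revenue_per_user

rng = random.Random(42)
outcomes = sorted(simulate_revenue(rng) for _ in range(10_000))

p10, p50, p90 = (outcomes[int(len(outcomes) * q)] for q in (0.10, 0.50, 0.90))
print(f"monthly revenue: p10 ${p10:,.0f}  median ${p50:,.0f}  p90 ${p90:,.0f}")
# Sensitivity in practice: hold two inputs at their medians, vary the third,
# and see which one widens the p10-p90 spread most -- that is where the real risk sits.
```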

Sam’s summary:

Continuous Deployment
Optional pair programming (E/E or D/E pairs)
Optional TDD & Continuous Integration
Use experiments to validate business needs
Use historical data to provide estimates and assess risks.

Change daily stand-ups from “what I did, what I’m doing, and what I’m blocked on” to talking about the flow of work.

Move KPIs in the right direction.
Make many small bets.

But don’t believe what I wrote!   Watch the video with the graphics for more detailed descriptions.
Video of Sam McAfee’s Putting Lean Principles to Work for Your Agile Teams talk.
To visit Bay Area Agile Leadership Network, go here:

Posted in software testing | Leave a comment