Category Archive: technology

Shortcomings in Dropbox’s response to losing user data

Update: Dropbox came through with a CSV containing the date, status, and full path to the file. Basically everything I asked for in the support request, and in just under 24 hours.

Dropbox recently discovered that their software lost user data. If you were affected, you probably received an email that started out like this:

We’re reaching out to let you know about a Selective Sync issue that affected a small number of Dropbox users. Unfortunately, some of your files were deleted when the Dropbox desktop application was shut down or restarted while you were applying Selective Sync settings.

This sucks. It sucks for users, and it sucks for Dropbox. I know the gut-wrenching cliff-dive of emotion that comes when you discover you’ve lost user data. I get it, but when you screw up like this, you have to go above and beyond in assisting recovery. Dropbox’s response hasn’t been horrible, but there’s more they can do to help users recover files.

I back up my data. I have a local Time Machine backup that goes back about 18 months, and I use Backblaze for remote backups. Dropbox indicates that the files were deleted about 7 months ago. Theoretically this means I should be able to recover any file that Dropbox lost.

Unfortunately, the format in which Dropbox provides lost file information makes recovery a very tedious process. In the notification email, Dropbox provides a link to their website where you can view details about the lost files. This is what the resulting page looks like:

review-notice

Note: The black boxes are redactions intended to keep at least some semblance of privacy with regard to the information I’m sharing here.

Let’s tally up our losses:

616 - 385 + 27 = 258
36 - 19 + 3    =  20
          Total: 278

Knowing the damage, we must dive into the next step: recovery. In order to recover a file, you need, at a minimum, the full name of the file that was lost. Ideally, you also know the full path to the file. Let’s see what details Dropbox has provided. Clicking the “385/616 files” link brings up this page:

lost-files-detail

A browsable interface. This helps me get oriented with what was lost, but I would very much prefer not to dig through a nested list of folders to identify almost three hundred lost files. In an effort to find a more comprehensive and machine-parsable list, my eye was drawn to the “231 files” link. This is what I get:

lost-files-list

This is, frankly, a disaster. From the perspective of an IT person attempting recovery:

  1. The filenames are incomplete (note the ellipses marked with red boxes).
  2. There is no way to download this list.
  3. There is no file path information provided.
  4. The deletion date is a relative date.

As near as I can tell, Dropbox has provided no means for IT staff to access a list of lost files that is useful for automated recovery. I submitted a support request to Dropbox, explaining much of the same information as I have posted here. If you were affected by the selective sync data loss issue, I’d appreciate it if you would send a similar request. The more requests they receive, the more likely we are to get something usable for mass-recovery. Here is what I sent:

I recently received notification that a couple hundred of our files were lost due to a selective sync issue. What’s done is done. We build software too, so I understand how much this must suck for your team, but we’re working on recovery, and I think Dropbox can do more to help.

We have backups of all our lost files, so I can perform a recovery, but the format Dropbox has provided information in isn’t usable by our IT department. A browsable format is nice for end-users, but from an IT recovery perspective, we could use the following in some machine-parseable format (JSON, CSV, TSV, etc.):

  • File name
  • Dropbox path to file
  • Status (un-recovered, restored, restored wrong version)
  • Date deleted (the actual date, not a human-friendly relative date)

This list would benefit internal IT departments, as well as end-users who contact an IT support professional to assist with the recovery of lost files.

Thanks,
Brad
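
With a list in that shape, picking out what still needs restoring takes only a few lines. Here is a minimal Python sketch. The column names (path, status, deleted_at) and the status value "un-recovered" are assumptions made for illustration, not the actual headers in the CSV Dropbox sent, and the sample rows are made up.

```python
import csv
import io


def unrecovered_paths(csv_text, status_value="un-recovered"):
    """Return the paths of files that still need to be restored from backup.

    Assumes hypothetical columns named 'path', 'status' and 'deleted_at'.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["path"] for row in reader if row["status"] == status_value]


# Made-up sample rows, only to show the shape of the input.
SAMPLE = """path,status,deleted_at
/Projects/a.txt,un-recovered,2014-03-01
/Projects/b.txt,restored,2014-03-01
/Photos/c.jpg,un-recovered,2014-03-02
"""

if __name__ == "__main__":
    for path in unrecovered_paths(SAMPLE):
        print(path)
```

The resulting list of full paths is the kind of thing you can hand to a backup tool's restore command, which is the whole point of asking for paths rather than truncated file names.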

Easy HTTPS/SSL for development servers: WEBrick, lighttpd, Python SimpleHTTPServer, Pow, etc

If you’re developing web applications, you’ve no doubt noticed the common practice of including a development web server with your framework of choice. This is a convenience provided to allow developers to get going quickly with a framework, but the built-in tools are usually limited solutions. Unfortunately, you may run into the limitations of the built-in web server, even when you’re just getting started.

One of the common shortcomings of these built-in web servers is a lack of SSL. Web servers like Nginx and Apache both provide easy to use SSL out of the box, but setting up these web servers to serve short-lived, ad hoc applications in varying frameworks is a bit too much overhead for my liking. Feeling that pain, I set out to find a simple way to proxy SSL requests to any web server.

The solution is an application called stunnel. Stunnel is self-described as an SSL wrapper. With stunnel, you can create SSL tunnels defined by a simple configuration file. It was purpose-built for the application I was seeking, and much, much more. Fortunately, it’s available through Homebrew:

    brew install stunnel

The default install of stunnel from Homebrew contains only an example configuration, so you’ll need to add a live one:

    cat << 'EOF' > /usr/local/etc/stunnel/stunnel.conf
    pid        = /tmp/stunnel.pid
    setuid     = nobody
    setgid     = nobody
    foreground = yes
    client     = no
    [https]
    cert       = /usr/local/etc/stunnel/stunnel.pem
    accept     = 443
    connect    = 4567
    EOF

This config listens on port 443 (the standard HTTPS port), then attempts a connection on port 4567 (the default Sinatra port). You can (and should) change the connect port to whatever your development server requires.

The certificate specified is a self-signed certificate that is generated by the Homebrew formula for stunnel. The author’s name and details are used for the cert, but since I only use this for development, I don’t let it bother me.

With the config in place, you simply start stunnel from any bash prompt. Sudo is required because stunnel will listen on 443 (a privileged port). Keep in mind that if you already have a server listening on 443, you’ll encounter an error, so shut down any daemons listening on 443 before you do this.

    sudo stunnel

Stunnel will stay running in your terminal, outputting log information, much like your application’s development server. If you invoke stunnel and you end up back at a bash prompt, you’ve encountered an error. Once stunnel is running, switch over to your application root and start your development web server.

With both running, you should be able to visit https://localhost/ and see your application. Note that because this configuration uses a self-signed certificate, your web browser will give you a security warning. You can dismiss this warning.
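
If you would rather check the tunnel from a script than a browser, the same "dismiss the warning" step has to be done in code. The following is a rough Python sketch, not part of the stunnel setup itself. The function names are made up for illustration, and disabling verification like this is only reasonable against your own development machine.

```python
import ssl
import urllib.request


def dev_tls_context():
    """Build a TLS context that accepts a self-signed certificate.

    Only suitable for local development: it disables all verification.
    """
    context = ssl.create_default_context()
    context.check_hostname = False  # must be turned off before verify_mode
    context.verify_mode = ssl.CERT_NONE
    return context


def fetch_status(url="https://localhost/"):
    """Request the URL through the tunnel and return the HTTP status code."""
    with urllib.request.urlopen(url, context=dev_tls_context(), timeout=5) as response:
        return response.status
```

Calling fetch_status() while both stunnel and your development server are running should return your application's status code. A connection error usually means one of the two isn't listening.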

At the risk of sounding dumb, I’ll point out that you should NOT use the typical port for your application server when attempting to use an HTTPS connection. I initially requested https://localhost:4567/, which of course generated an error:

    [2014-04-21 13:49:29] ERROR bad Request-Line `\x16\x03\x01\x00?\x01\x00\x00?\x03\x03SUZ)Bo?8}?ϯ?L??Gb???]\x0F\x7F.b'.
    localhost - - [21/Apr/2014:13:49:29 EDT] "\x16\x03\x01\x00?\x01\x00\x00?\x03\x03SUZ)Bo?8}?ϯ?L??Gb???]\x0F\x7F.b" 400 351
    [2014-04-21 13:49:29] ERROR bad URI `\x17??ne?M?????3?\x00\x00J\x00??$?#?'.

That’s what SSL encrypted traffic looks like to an out-of-the-box WEBrick server, and obviously WEBrick doesn’t understand SSL… That’s what landed me here in the first place. If you see this, just drop the port specifier from your URL and try again. You should see console output in the terminal window where stunnel is running. If you don’t, check your stunnel configuration’s ‘accept’ directive to make sure it’s listening on the HTTPS default, 443.
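
The \x16\x03\x01 at the start of that log output is a handy tell. A TLS record begins with a content type byte, where 0x16 means "handshake", followed by a two-byte version whose first byte is 0x03. A small Python sketch of that check follows; the function name is made up for illustration.

```python
def looks_like_tls_handshake(first_bytes):
    """Return True if the bytes look like the start of a TLS handshake record.

    A TLS record begins with a content type byte (0x16 means handshake)
    followed by a two-byte protocol version whose major byte is 0x03.
    """
    return (
        len(first_bytes) >= 3
        and first_bytes[0] == 0x16
        and first_bytes[1] == 0x03
    )


if __name__ == "__main__":
    # The first bytes of the request WEBrick logged above.
    print(looks_like_tls_handshake(b"\x16\x03\x01\x00"))
    # An ordinary plain-text HTTP request line.
    print(looks_like_tls_handshake(b"GET / HTTP/1.1\r\n"))
```

So whenever a plain HTTP server complains about a request line beginning with \x16\x03, a TLS client is talking to a port that doesn't speak TLS.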

Why you should never use “Identify with email”

I recently read an article advocating a method of authentication using email. My initial reaction was that the usability of such a solution would be very poor for a couple of reasons:

  • It’s bad UX to have users jumping between applications/locations to accomplish a linear task.
  • Email protocols don’t guarantee timely delivery; SMTP retry intervals are frequently specified in minutes, not seconds.

Besides the poor UX, there is a major security concern: email transport is unequivocally insecure.

With a shallow understanding of email, it appears that it might be secure, but that assumption would be wrong. The vast majority of email providers specify some form of encryption on the connection between client and server, both for receiving (IMAP/POP) and for sending (SMTP) email. However, this encryption is only used between you and your mail server because credentials are exchanged during these operations. Once the mail is handed off to your mail server, the email must be transported to the recipient mail server. This transfer to other servers is often unencrypted. As the auth provider, you could explicitly request encryption for SMTP relay to the destination server, but there is no guarantee that it will be accepted. What do you do when a user can’t receive secure email? Also, you have no way of knowing how the email will be relayed once you hand it off to the destination MX.
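
To make the "no guarantee" point concrete, here is a rough Python sketch of how a sender could ask a destination mail server whether it offers STARTTLS. The function names are made up for illustration, and mx_host stands for whatever MX hostname you choose to probe.

```python
import smtplib


def advertises_starttls(ehlo_extensions):
    """Return True if a server's EHLO extension list includes STARTTLS."""
    return any(name.strip().lower() == "starttls" for name in ehlo_extensions)


def mx_offers_starttls(mx_host, port=25, timeout=10):
    """Connect to a mail server and report whether it offers STARTTLS.

    mx_host is a placeholder for the MX hostname you want to probe.
    """
    with smtplib.SMTP(mx_host, port, timeout=timeout) as server:
        server.ehlo()
        # esmtp_features is filled in by ehlo() with the advertised extensions.
        return advertises_starttls(server.esmtp_features.keys())
```

Even when this returns True, it only describes the first hop. It says nothing about how the message travels after the destination MX accepts it.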

Authentication over HTTPS offers an explicit guarantee that A) the identity of the server is verified by a third party, and B) the information being transported over the network cannot be read between here and there. Email cannot satisfy either of these important requirements, therefore you should avoid it as a primary means of authentication.

Apache Bench and Gnuplot: you’re probably doing it wrong

Update: After some reflection and good feedback on Hacker News, I should be clear that the graph output by the first method isn’t “wrong”, but rather (very) commonly misunderstood. I plan on doing a future blog post where I look at both graphs in more detail and try to find better ways to visualize the data.

I should know. I’m a domain expert in “doing it wrong”. I’ve also seen this done in more than one place, so I’m going to memorialize my incompetence with Gnuplot in this blog post.

Does this invocation look familiar? This is a simple example of an Apache Bench command that will run a benchmark against the specified server.

ab   -n 400 -c 10 -g apache-1.tsv  "http://example.com"
^[1] ^[2]   ^[3]  ^[4]             ^[5]

I’ll break it down.

  1. ab: The Apache Bench benchmarking utility
  2. -n 400: The number of requests for your benchmark (yes, this one is crazy small; it’s so people don’t hammer example.com by copy pasting)
  3. -c 10: Number of concurrent requests
  4. -g apache-1.tsv: This tells ab to write a “gnuplot-file” (that’s right from the man page)
  5. "http://example.com": The URL you want to benchmark against

If you’ve used Apache Bench much, you’ve probably discovered the -g switch already. The man page tempts you with the suggestion that “This file can easily be imported into packages like Gnuplot”. The words “easily” and “Gnuplot” should not be spoken in the same sentence. This is not because Gnuplot is a bad tool; it is that Gnuplot is so insanely powerful that few people understand how to use it. Up until this afternoon, I was misusing it as well.

You’ve probably generated a graph from your Apache Bench Gnuplot file that looks something like this:

Sequence plot

This is the graph you get if you treat the gnuplot file provided by Apache Bench as a log file. This graph is probably not showing what you think it does. Hint: this is not response (ms) over chronologically ordered requests.

The problem is, the file output by Apache Bench is not a log file, it’s a data file. Log files are (generally) written serially, so we can treat the sequence of the lines synonymously with time sequence. Let me stop here for a second and make this clear…

The file output by the -g switch of Apache Bench is NOT in time sequence order. It is sorted by ttime (total time).

Yikes. Yes, I am embarrassed that it has taken me this long to realize that, but I’m not alone, so here we are.
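
You can confirm the ordering for yourself without Gnuplot. The Python sketch below parses a file in the column layout that ab -g writes (starttime, seconds, ctime, dtime, ttime, wait), checks which column it is sorted by, and re-sorts it chronologically. The sample rows are made up, and the function names are mine, not anything ab provides.

```python
import csv
import io


def read_ab_rows(tsv_text):
    """Parse the tab-separated file written by `ab -g` into a list of dicts."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def is_sorted_by(rows, column):
    """Return True if rows are in non-decreasing order of an integer column."""
    values = [int(row[column]) for row in rows]
    return values == sorted(values)


def chronological(rows):
    """Return the rows re-sorted by the 'seconds' (unix timestamp) column."""
    return sorted(rows, key=lambda row: int(row["seconds"]))


# Made-up sample in the column layout ab writes; not real benchmark data.
SAMPLE = (
    "starttime\tseconds\tctime\tdtime\tttime\twait\n"
    "Mon Apr 21 13:49:31 2014\t1398102571\t1\t40\t41\t39\n"
    "Mon Apr 21 13:49:29 2014\t1398102569\t2\t55\t57\t54\n"
    "Mon Apr 21 13:49:30 2014\t1398102570\t1\t89\t90\t88\n"
)

if __name__ == "__main__":
    rows = read_ab_rows(SAMPLE)
    print("sorted by ttime:  ", is_sorted_by(rows, "ttime"))
    print("sorted by seconds:", is_sorted_by(rows, "seconds"))
```

Running the same two checks against your own file is a quick way to see whether line order can be trusted as time order.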

The other problem is that the common invocations of gnuplot found scattered around the Internet mistreat this file. Below is a very common example, but even this includes some elements that common scripts miss. The key to spotting the error is in the plot line. The sequence-based graph (above) was created using a Gnuplot script that looks like this:

# Let's output to a jpeg file
set terminal jpeg size 500,500
# This sets the aspect ratio of the graph
set size 1, 1
# The file we'll write to
set output "graphs/sequence.jpg"
# The graph title
set title "Benchmark testing"
# Where to place the legend/key
set key left top
# Draw gridlines oriented on the y axis
set grid y
# Label the x-axis
set xlabel 'requests'
# Label the y-axis
set ylabel "response time (ms)"
# Tell gnuplot to use tabs as the delimiter instead of spaces (default)
set datafile separator '\t'
# Plot the data
plot "data/testing.tsv" every ::2 using 5 title 'response time' with lines
exit

Let’s look at the plot line in more detail:

plot "data/testing.tsv" every ::2 using 5 title 'response time' with lines
^[1] ^[2]               ^[3]      ^[4]    ^[5]                  ^[6]

All of what I’m about to summarize is available in much more detail in the gnuplot help pages, but here’s the tl;dr version:

  1. plot draws a data series on the graphing area
  2. This is the path to the input datafile
  3. Think of every as in “take every X rows”, only a lot more powerful; in our case, ::2 means start at the second row
  4. using allows you to specify which columns to use; we’ll use column 5, ttime (total time)
  5. An explicit title for the series we’re plotting
  6. Plot with lines

The problem is that we haven’t specified any x-axis ordering value. We’ve only specified that the values in column 5 should be used. Gnuplot will happily render our series for us, using each row’s position in the file as its x value.

The resulting graph has always puzzled me, and my bewilderment caused me to discard it and rely only on the data output by the Apache Bench report, rather than the graphs. If your graphs look anything like this, you’re looking at a request time distribution plot, with longer requests to the right, and shorter requests to the left. Hence the S-curve. It’s a fun graph, but it’s probably not telling you what you think it is.
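
Read that way, the graph is actually useful: because the rows are sorted by ttime, a position along the x-axis corresponds to a percentile. Here is a short Python sketch of that mapping using the nearest-rank method. The function name and the sample values are made up for illustration.

```python
import math


def percentile_from_sorted(sorted_values, percent):
    """Nearest-rank percentile of a list that is already sorted ascending."""
    if not sorted_values:
        raise ValueError("no values")
    # Rank is 1-based: the smallest rank covering `percent` of the values.
    rank = max(1, math.ceil(percent * len(sorted_values) / 100))
    return sorted_values[rank - 1]


# Made-up response times in ms, sorted ascending like the ttime column.
TTIMES = [12, 15, 15, 18, 22, 30, 41, 57, 90, 250]

if __name__ == "__main__":
    for p in (50, 90, 100):
        print(p, percentile_from_sorted(TTIMES, p))
```

The steep right-hand tail of the S-curve is just the slowest few percent of requests.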

Most people want to look at response (in ms) over time (seconds, serially). The good news is, everything we need is in the data file. Let’s look at a better Gnuplot script:

# Let's output to a jpeg file
set terminal jpeg size 500,500
# This sets the aspect ratio of the graph
set size 1, 1
# The file we'll write to
set output "graphs/timeseries.jpg"
# The graph title
set title "Benchmark testing"
# Where to place the legend/key
set key left top
# Draw gridlines oriented on the y axis
set grid y
# Specify that the x-series data is time data
set xdata time
# Specify the *input* format of the time data
set timefmt "%s"
# Specify the *output* format for the x-axis tick labels
set format x "%S"
# Label the x-axis
set xlabel 'seconds'
# Label the y-axis
set ylabel "response time (ms)"
# Tell gnuplot to use tabs as the delimiter instead of spaces (default)
set datafile separator '\t'
# Plot the data
plot "data/testing.tsv" every ::2 using 2:5 title 'response time' with points
exit

We’ll look at the plot line in more detail:

plot "data/testing.tsv" every ::2 using 2:5 title 'response time' with points
^[1] ^[2]               ^[3]      ^[4]      ^[5]                  ^[6]

I’m going to repeat myself some here, but I want this to be clear, so here we go.

  1. plot draws a data series on the graphing area
  2. This is the path to the input datafile
  3. Think of every as in “take every X rows”, only a lot more powerful; in our case, ::2 means start at the second row
  4. using allows you to specify which columns to use; we’ll use columns 2 and 5, seconds (unix timestamp) and ttime (total time)
  5. An explicit title for the series we’re plotting
  6. Plot with points (lines won’t work, as we’ll see below)

The resulting graph looks like this:

Time series plot

Yeah, I know. Right now, you’re scratching your head trying to figure out just what the hell you’re looking at. This is not the pretty line graph we thought we were going to get.

Here’s the thing: unless you’re testing something unrealistically simplistic, your response times aren’t going to be well represented by a single line at any given time.

Keep in mind a few things:

  • We told Apache Bench to use concurrency 10
  • Apache Bench is going to go all out
  • The granularity of our time data is 1 second

The result is a graph that shows stacks of points. The graph above is difficult to read, because it’s rendered in such a small area. The version below is rendered at 1280 x 720. Give it a click to view the large version.

timeseries-1280x720

When you look at this graph, it becomes much clearer what’s going on. At each second of the test, there is a distribution of response times. The benchmark results I plotted are from one of the most complex JSON endpoints in our application. I also turned off caching so that the result would be computed each time. This made the data complex, but isn’t that what we’re after? If we can’t infer meaning from complex data, what good are our tools? Looking at this graph, I can see the distribution of response times, as well as the upper and lower boundaries.

The scatterplot style graph is what I plan on using for now, but I think a candlestick chart or heatmap would probably be the best representation. I’m going to continue to work on the Gnuplot script until I have something really refined, so keep an eye out for updates!
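
A candlestick-style chart needs one summary per second rather than one point per request. As a starting point, here is a Python sketch that groups (seconds, ttime) pairs and computes the minimum, median and maximum for each second. The function name and sample pairs are made up, and this is just one possible way to prepare the data, not a finished Gnuplot workflow.

```python
import statistics
from collections import defaultdict


def summarize_by_second(samples):
    """Group (second, response_ms) pairs into {second: (min, median, max)}."""
    buckets = defaultdict(list)
    for second, response_ms in samples:
        buckets[second].append(response_ms)
    return {
        second: (min(times), statistics.median(times), max(times))
        for second, times in sorted(buckets.items())
    }


# Made-up (seconds, ttime) pairs, standing in for columns 2 and 5.
SAMPLES = [(100, 40), (100, 55), (100, 90), (101, 35), (101, 61)]

if __name__ == "__main__":
    for second, summary in summarize_by_second(SAMPLES).items():
        print(second, summary)
```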

Comments welcome on Hacker News

Rubygems, HTTPS, and jRuby OpenSSL LoadError

UPDATE: It seems there was enough clamor over the unannounced change that a reversion to HTTP was warranted. The advice below isn’t needed currently, but may become relevant in the future, should HTTPS end up back in the mix without a suitable workaround.

Rubyists have been dropping by #rubygems this morning experiencing issues with jRuby and gem install. The issue is related to Rubygems.org’s switch to forcing HTTPS on connections to their Gem repository. Specifically, there is a recursive issue with jruby-openssl when trying to install Gems from Rubygems.org: you need jruby-openssl to install jruby-openssl.

If you attempt to install a Gem in any version of jRuby prior to 1.7 that doesn’t already have jruby-openssl installed, you will receive a LoadError that looks something like this:

Exception `LoadError' at /home/username/.rubies/jruby-1.5.1/lib/ruby/site_ruby/shared/jruby/openssl/autoloads/ssl.rb:8 - OpenSSL::SSL requires the jruby-openssl gem

The trouble is, you’ll get this message even if you’re trying to install the jruby-openssl gem. Uh oh.

The root of the problem comes down to two factors:

  1. Prior to jRuby 1.7, OpenSSL support was not included in jRuby core because of crypto export constraints
  2. Rubygems.org recently switched to forcing HTTPS as a means of increasing security in the rubygems ecosystem

The jRuby devs have worked through the crypto export issue, so updating to jRuby 1.7 will solve the problem; you no longer need jruby-openssl, period. If you’re stuck on an older version of jRuby and need to get gem install working again, you can use this horrible, hacky workaround:

Download the jruby-openssl .gem file (using something like wget/curl) and install it from the local file like so:

wget http://production.cf.rubygems.org/gems/jruby-openssl-0.7.7.gem
gem install --local ./jruby-openssl-0.7.7.gem

Be sure to replace the version number with one compatible with your version of jRuby. Also, understand that there are no guarantees that the URL scheme above will remain in place. The download URL is an implementation detail rather than a documented API, so it may change. The long-term solution is to move to jRuby 1.7.

He’s no Don Draper

Anyone who has seen Don Draper’s iconic Carousel speech knows that nostalgia is a terribly effective agent for emptying consumers’ pockets. Apparently, a reader at Daring Fireball saw a correlation between Don’s work and a recent advertisement for Internet Explorer.

Take a moment to watch both the Carousel speech and the Internet Explorer ad before you move ahead. I’ll wait.

I don’t doubt that the agency responsible for the advertisement had this in mind when they scripted this piece. Unfortunately, the ad falls flat for me.

I grew up in the 90s. I saw a lot of things I remember fondly when I watched the ad, if not with a chuckle at the absurdity of the 90s aesthetic. I did feel connected with the images, but why didn’t I feel connected to the product?

Don Draper tells us we should be nostalgic, but not because we have a strong sentimental attachment to film slides. We feel what we do because of what the Carousel delivers. We insert our slides, dim the lights, and we are taken back to “a place where we know we are loved”.

Sob.

Unfortunately, yesterday’s Internet is gone. Internet Explorer cannot bring it back. Therefore, the product fails to deliver on the promise of the ad. That, I think, is the disconnect, and it’s the reason the ad falls flat for me.

Why be evil.rb?

Caius Durling shared some “hax” with the Ruby community that inspired some discussion over at Hacker News. I commented there, but it seemed like a decent topic for a blog post. One of the examples illustrates a means to define a method-local variable in the argument definition list:

  def output name=((default=true); "caius")
    "name: #{name.inspect} -- default: #{default.inspect}"
  end

  output() # => "name: \"caius\" -- default: true"
  output("fred") # => "name: \"fred\" -- default: nil"

Have a closer look at the method definition line. This code works because of the way parentheses are handled. Much like in math, when parentheses are encountered, we evaluate from the inside out. Fire up an IRB session and run this code:

((default=true); "caius")
default.inspect

The return value given for the first line is “caius”, which gets assigned to “name” in our argument list, but you can also see that default is set to “true”. Using the semicolon statement separator only works because we’ve wrapped the whole thing in parentheses. That’s why the return value is “caius”. We leverage Ruby’s last-line return value feature.

This might all seem trivial to you if you’ve been programming in Ruby for a while, but therein lies the crux of the discussion. If we only ever wrote code for ourselves, this would be a non-argument. If we expect other people to read and use our code, we should write in a way that is easy to interpret.

Like many viewpoints, the “wrongness” of this example is not black & white; it’s shades of gray. On one hand, you have the “anything that will eval is valid Ruby” view, and on the other you have the “If it’s not immediately obvious to a beginner, you shouldn’t do it” view. There may be better ways to express those two sides of the matter, but that’s the general idea.

The problem with this code (from the latter viewpoint) is that it crams too much program logic into the argument definitions. This example uses parentheses to force the evaluation of default=true; "caius" in the argument definition list. That’s only two statements, but it violates some common expectations:

  1. We generally expect argument definitions to be clear and readable, so that method definitions are self documenting (to some degree); this approach clutters the argument definitions.

  2. We expect argument definitions to sometimes assign default values.

  3. We expect program logic to appear in the body of a method, or to be DRY’d up in separate methods.

In this way, the example is not “incorrect” but awkward. To borrow an idea from the literate programming camp, I’d say that just because you can write awkward sentences with valid grammar, it doesn’t mean you should.
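
For contrast, here is what the less awkward sentence might look like, with the "was the default used?" logic moved into the method body. This sketch is in Python rather than Ruby, and the _MISSING sentinel is a name introduced only for illustration. It also reports False rather than nil when a name is passed, so it is an analogue of the example, not a translation.

```python
_MISSING = object()  # hypothetical sentinel meaning "no argument was passed"


def output(name=_MISSING):
    """Report the name and whether the default value was used."""
    default = name is _MISSING
    if default:
        name = "caius"
    return f"name: {name!r} -- default: {default!r}"


if __name__ == "__main__":
    print(output())
    print(output("fred"))
```

The argument list now only declares a default, and the branching lives in the body where a reader expects it.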

Side note: In Caius’ defense, he does say never to use these.

You have no idea what Steve Jobs would do

The iOS 6 maps kerfuffle has the Steve Jobs prognosticators out in force, all crying the same old song: “Steve would have never let this happen!”

Really, folks? Really?

Very few people (fewer than I can count on one hand) ever demonstrated any ability to understand what was going on inside Steve Jobs’ head. The vast majority of the tech punditry flat out disagreed with him. Most common geeks foamed at the mouth in rage over some of Steve Jobs’ actual decisions. Remember the whole “no native apps” on the original iPhone? How about the time Steve Jobs went all Jules Winnfield on Adobe Flash?

The only person who demonstrated any ability to understand Steve Jobs’ reasoning on anything more than a superficial level was John Gruber, and the community over at Hacker News (a place full of really smart geeks) waits with bated breath to tear his articles apart.

The truth is, none of us have any clue whether Steve Jobs would have released iOS 6 Maps in this state. What we do know is that he was hoppin’ mad over Android, and put the wheels in motion on the Apple/Google separation long before his passing.

This whole thing really has nothing to do with anything. It’s just a plea. A plea to stop invoking the name of a man whose time was cut short in an effort to add credibility to your argument. Yes, iOS Maps suck, but they suck regardless of what Steve Jobs might, or might not, have done.

Cable television subscription rates falling; but where are they going?

BGR reports that 400,000 cable and satellite television subscribers ditched their service this year. This apparent declining trend is backed up by the graph over at NCTA (NCTA is a cable provider trade group). You can see from that graph that the number of cable television subscribers peaked in 2001. So where are these folks getting their entertainment?

Even more interesting is the graph of cable internet subscribers over a similar time period. It looks like cable internet started taking off at about the time television subscribers peaked. The graph below uses the data from NCTA/SNL Kagan.

Just go ahead and assume everyone knows your password

Fast on the heels of the LinkedIn password leak, eHarmony also announced a password disclosure. Today, Last.fm is suggesting that users should change their passwords.

Not good.

Right now, a lot of you are thinking, “Who cares if hackers have access to my Last.fm account?” I agree. What’s the worst they can do there? Scrobble some music to your timeline, leading others to believe you have poor taste? Oh the horror!

But what if you happen to use the same password for LinkedIn/eHarmony/Last.fm as you do your email? Even if you’re one of those people who have no secrets, consider that your email is the key to a large part of your life. If you forget your bank password, how do you reset it? That’s right, through your email.

You might recall your friendly neighborhood IT guy mentioning something about secure passwords, and for years you’ve gotten by with the ol’ “yeah, yeah, I’m listening” response, but things are starting to get pretty serious. These types of breaches are becoming far more common. LinkedIn aren’t a bunch of schmucks. They’ve got a good team full of smart people, but security is hard. Security is ridiculously, insanely, absurdly, strikingly (is that even a word?) hard. Even the best are going to fail sometimes.

So what can you do? You can lend these guys a hand. Having your Last.fm account compromised isn’t a very big deal if your Last.fm password is different from all your other passwords. Keeping track of a unique password for every website you use sucks. I know that, you know that, and even the security guys know that, but it sucks less than having someone initiate a bank transfer for your entire life savings to an offshore bank that refuses to cooperate with the FBI. Let’s not find out what that’s like, eh?
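
Coming up with a different password for every site is the easy part; remembering them is what hurts. As a rough illustration, here is a Python sketch that generates an independent random password per site. The function names and the choice of 20 characters are assumptions for the example, not a recommendation from any particular tool.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=20):
    """Return a random password drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def passwords_for(sites, length=20):
    """Return a dict mapping each site name to its own generated password."""
    return {site: generate_password(length) for site in sites}


if __name__ == "__main__":
    for site, password in passwords_for(["linkedin", "eharmony", "lastfm"]).items():
        print(site, password)
```

Nobody is going to memorize output like that, which is exactly why the next step matters.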

In the meantime, get yourself some tools to help you out. I like 1Password. It works on PCs, Macs, iPhones, iPod Touches, iPads, and Android phones. I’ve been using it for a couple of years now, and I’m not sweating any of these disclosures.

EDIT: Another friend of mine recommends LastPass. I’ve been happy with 1Password, so I don’t have any reason to stray, but if you’re not feeling 1Password for any reason LastPass is probably worth a shot.