Geekness – closer to the world

Geeky at the Lake of Zurich

Google Treasure Hunt 2008 – My Solutions

Google Treasure Hunt 2008

If you are
[ ] a Google Fanboy
[ ] a Computer Science Student
[ ] a geek
[ ] going to school with me
you have probably heard of this year’s Google Treasure Hunt, a “a puzzle contest designed to test your problem-solving skills in computer science, networking, and low-level UNIX trivia”.

The hunt started a few days back. In the first quiz you had to solve a math “problem”. Thanks to Rouven and the Internet we could solve it and answer it correct. And the result is an incredible high number which is unpronounceable 🙂

The second quiz was a little bit easier.

Unzip the archive, then process the resulting files to obtain a numeric result. You’ll be taking the sum of lines from files matching a certain description, and multiplying those sums together to obtain a final result. Note that files have many different extensions, like ‘.pdf’ and ‘.js’, but all are plain text files containing a small number of lines of text.

Sum of line 1 for all files with path or name containing jkl and ending in .txt
Sum of line 4 for all files with path or name containing zzz and ending in .rtf
Hint: If the requested line does not exist, do not increment the sum.

Multiply all the above sums together and enter the product below.

(Keep in mind, the exact question will change once you generate your question!)

I know there is a solution by using some shell commands and pipe them, but even after two hours of trying and guessing I could not find a good solution. So I started my trustworthy Java IDE (= Eclipse) and hacked some lines of code. Do not worry, I will publish the source code as soon as the next quiz is live. I do not want to take your fun out of it 🙂
If you have a solution with a shell script, please post it below.

And now I can not wait for the third quiz. What could it be?

You can try it for yourself. And please comment below if you find a better solution or are having troubles 🙂

15 thoughts on “Google Treasure Hunt 2008 – My Solutions

  1. Well done on the puzzles. I did the recursive folder / file problem probably in a stupid way but I used OSX find and just looked for all the .rtf files in that folder. I then changed the colour of the label for the file if it met the criteria. Managed to get the right answer in around 2 minutes so I’m happy. I know you could probably write some code using regex or whatever to do this but as there weren’t that many files that met the criteria then I thought, what the heck, do it the old fashioned way!!

    Scared to see what comes next though (puzzle 3)

    Best wishes,

    Mark

  2. Hi,

    This simple script will print the numbers you want to add:

    find | grep “jkl.*\.txt”$ | while read FN; do echo 1 | ed $FN 2>/dev/null; done

    find | grep “zzz.*\.rtf”$ | while read FN; do echo 4 | ed $FN 2>/dev/null; done

  3. I like the “ed” part, but I did it with sed, and added the numbers. This is undoubtedly sub-optimal, for example the trade-off between multiple greps and no wildcards; and it seems like there should be a better way to add them.

    find | grep stu | grep \\.pdf$ | while read f; do sed -n 5p $f; done | xargs echo | tr ‘ ‘ + | bc

  4. Good solutions you posted mates! I also thought a bit of shell scripting would do the trick but I decided to code in Ruby + a bit of system(find) calls 🙂
    I did a little different than Haamed Gheibi and Larry, I used:
    find . -ipath “*foo*” -iname “*.pdf” instead of 2 greps.

    Attila Oláh, great job you did with the python script with 9 seconds. My solution is in C++:
    $ ./robot 2000 2000
    41582842586492499863454284232567680870623295758655672546336190968078144973974817078647658399100984261358856682110156655244981205051951527142682940330889793015484813900040121787457432517134924047880480501255466223259341818610562785588729724275019125829310979945256241601844529666172332858280867323497586481135457609230673167422612902661504196849981699351706858878094066263005477044082551941345586522024308759745408078283893842245115998069313875567846001930842335393667126793740710919718326194663469075446022269146352109718745585058004834971180679651548130936303251447263141916712695987929642664128772828140826395066158487442807235207656061110330432905699477436853397157094201366393092777134588950887817449048649811286839443512587136877011761708312525231720077261212139447797863538128167795146640851810800406375718156578943357706573961535132065528380851668468768734843681478292522901543478326293663545898502472037831721033835326043730096830737199546527388904299168707667019959258259344722695151735471964953220187681023380996615718145241783309631781647843904056322654114413609025031832543574320997532983867687985705448055057475955029664405427578211111064192312240045083183062752759630345186798202023360000

    real 0m11.507s
    user 0m10.462s
    sys 0m1.031s

    By the way, you can check my blog for Ruby and C++ solutions to both problems.

    Regards.

  5. By the way, Mário, when I write out my solution (or at least convert it to a base 10 string) it takes a bit longer than 9 seconds. Those 9 secs are just for the calculation.

    I think Larry still has the fastest solution for this one (: Nice to see that so many of you are interested.

    Anyone dares to sum the first 1,000,000,000 decimal digits of pi? I have an open contest over here

  6. My solution to the robots problem has not yet been posted here:

    echo ‘define f(x) {if(x <= 1) return(1); return(f(x-1)*x);} f(2000-1+2000-1)/(f(2000-1)*f(2000-1))’ | time bc

    This 2000×2000 grid takes 0.55s on my computer, including printing to the screen.

    A bit of analysis before coding helped me tremendously. For the robot to travel from the top left corner to the bottom right corner, it executes a series of downward and rightward moves. If the grid is R x D spaces in size, then it must make R-1 rightward moves and D-1 downward moves, in any order, for a total of (R-1)+(D-1) moves. If the downward and rightward moves were unique, then there would be a total of (R-1+D-1)! possibilities. However, all downward moves are the same, and all rightward moves are the same, so we need to divide the possibilities by the number of ways of arranging the downward moves and by the number of ways of arranging the rightward moves. The final answer, then, is ((R-1+D-1)!/((R-1)!*(D-1)!).

    The standard unix utility “bc” does a marvelous job at computing with arbitrarily large integers, but the man page does not indicate that it has a built-in factorial function. However, the man page includes a user-defined factorial function as an example (for us lazy people). A quick copy-and-paste from the man page followed by a simple expression to invoke the newly defined function gives the answer.

    This bc script could be substantially improved by changing the function to include a parameter for the ending point of the product series and possibly by making it non-recursive. You may notice that dividing a factorial by a smaller factorial is the same as simply multiplying the numbers from the smaller plus one up to the larger. That is, if N! = prod(1..N), then N!/M! = prod(1..N)/prod(1..M) = prod(M+1..N) (where M<N).

    I didn’t bother improving the script since it was already rather fast, and I was only going to use it once. Even if it took ten seconds (or even 100) to solve my 57×23 problem from Google, it would not have been worth improving in order to use it only once. There are always trade-offs between run-time efficiency and programmer-time efficiency. If this were to be used a lot, I would probably compare a C or C++ program (using appropriate arbitrary-precision libraries) to this script, but I would still keep this script to use in the regression tests.

  7. wow, you really have put together a great resource for the treasure hunt. I hope other people have put the same amount of energy and thinking in their solutions as you did.
    Thanks for all your input!

  8. Well, after I polished the C++ algorithm a bit more, I just reduced it to 10 seconds in the case of the 2000×2000 grid.
    Larry, your solution is undoubtedly the best and the fastest one, using Math analysis of the problem instead of hammers and axes like I did 🙂
    According to my algorithm which is based on a matrix of calculations, I believe that if a square grid (like 2000×2000) is used instead of a rectangular one, the algorithm becomes much faster since we just need to analyse the diagonal upper or lower part of the square, as the number of paths to reach the goal are symmetric. This is easily shown by replacing D with R at Lary’s formula.
    Great work everyone!

  9. I don’t see a C# implementation for puzzle 2 (took me 5 minutes when ripping example code straight off the MSDN site – a nice short cut).

    Puzzle 1 took me for ever to get my head around – have a look at my “battle royale” blog (Software Developer vs Sys Admin). Sys Admin leads 2 to 1 – his solutions are so ugly, but he solves everything so fast.

  10. puzzle 2:
    Instead of writing (or finding) multiple lines of code (C# or otherwise) to do simple tasks, I chose to use pre-existing utilities, e.g. “find”. If one knows how to use these simple, yet powerful, already-tested utilities then simple things like recursively finding all the file names in a directory tree become trivial (typing four characters, less than one second of my time). Maybe it’s the difference between Unix emphasis and Microsoft emphasis; having a good set of tools to get things done. Is your Sys Admin friend a Unix sysadmin?

    For puzzle 1, some thought to simplify the problem (avoiding premature coding syndrome), and again some knowledge of simple utilities, e.g. “bc”, made solving the problem incredibly faster than coding a robot simulation and finding or writing and using a reliable high-precision integer math library. See my description above.

    There are many valid ways to approach any problem.

    BTW, I am a software developer.

    – Larry Fenske

  11. Larry,

    I agree with the find vs. C# (I couldn’t be bothered getting off my development machine and using a Linux box). My Sys Admin friend sits at a Linux box all day (but has real problems checking his email and printing documents on our MS LAN 😉 ). I forgot that I have Cygwin installed… there goes my justification for coding… oops…

    “Premature Coding Syndrome” PCS – is that one yours? (I haven’t heard the particular term before), but definitely describes my issue with puzzle 2 (I try my best not to do this in my professional life though!) My problem was “cool, a robot, I know robotics, Alife, AI – I can do this” – where as the solution is just simple permutation mathematics and dealing with large integers.

    Call yourself a software developer – you know too many UNIX commands! I bet you un-jam photocopiers and don’t call IT support when you kick the monitor cable out of your computer!

    -Flights

  12. ha ha ha Well, I do at least *try* to un-jam photocopiers, and I did set up both my computer and another person’s when I joined this company (immediately replacing Microsoft Windows with Linux).

    “Premature Coding Syndrome” — Come to think of it I don’t think I have heard that phrase before I just used it.

    Tell your Sys Admin friend that Ubuntu (at least) has decent printer management, and Thunderbird with the Lightning plug-in are good for interfacing with Microsoft Exchange Server, if IMAP is available. I’ve had problems using Evolution.

    – Larry Fenske

  13. My solution to the first puzzle was similar to Larry’s, except I did the factorial computation iteratively instead of recursively. That reduces a lot of overhead and speeds things up.

    The second puzzle was trivial with find, grep, head and tail… I just did it on the command line.

    The third puzzle just took some eyeballing and was even quicker than the first two.

    The fourth took some time, but that’s covered in another blog entry here.

Comments are closed.