Never seen this done before.
I contacted a Bank of America representative over the telephone, and they needed to authenticate me before we could get down to business. I was asked my name, and was then asked to confirm a series of details. I was not asked to state these facts myself - the representative stated them, and asked me to confirm each one.
A silly way to verify a person's identity, one would think. Anyone could pass themselves off as me if they had my name and my account number - and if they simply confirmed every detail that the CSR gave them. However, the CSR did make one minor mistake while stating the details - my phone number was off by one digit, and I corrected them promptly. Looking back, it is quite clear that the mistake was deliberate. This way, I never said any of my personal details out loud (which would have been terrible in public), and I pretty much authenticated myself by correcting one random mistake that they chose to make.
September 11, 2011
April 03, 2011
"Quadratic space will kill you faster than quadratic time"
Learnt it the hard way. Ouch.
(Prof Steven Skiena: Lecture on homology)
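A back-of-the-envelope illustration of why (my numbers, not from the lecture): at n = 100,000, a quadratic-time algorithm performs about 10^10 cheap operations - slow, but it finishes - while a quadratic-space algorithm needs about 10^10 table entries, roughly 80 GB at 8 bytes each, which simply does not fit in memory.

    # Rough arithmetic, not from the lecture: time hurts, space kills.
    n = 100_000
    ops = n * n              # ~1e10 operations: minutes of CPU time
    table_bytes = n * n * 8  # ~80 GB for a dense n x n table of doubles
    print("%.1e ops, %.0f GB" % (ops, table_bytes / 1e9))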
March 20, 2011
A distributed pipeline for processing text
Usually, Hadoop is the way to go.
However, I have joined a project that has been underway for more than a year, and its processes have been written mostly in an ad-hoc way - shell scripts, Python, and standalone Java programs. Converting each of these to mappers and reducers would have been an arduous task.
I decided to rewrite the pipeline in SCons. In many ways, this pipeline resembles a conventional build: there are dependencies, and newer functionality/processing is usually added to the later stages of the pipeline. Luckily, SCons accepts regular Python functions as "Builders", which I hooked up to XML-RPC calls, and we soon had SCons running the pipeline on multiple servers (just five, actually - that's all we'd get for our pipeline). The file system is an NFS share, which simplifies things a great deal.
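A minimal sketch of the shape this took - the stage names, worker URLs, and the "process" RPC method are illustrative, not our actual code (written here in Python 3 style; at the time we were on Python 2 and xmlrpclib):

    # SConstruct (sketch with hypothetical names; Environment and
    # Builder are provided globally by SCons inside an SConstruct)
    import itertools
    import xmlrpc.client  # xmlrpclib on the Python 2 we actually ran

    # The five worker machines, each running the same XML-RPC server.
    WORKERS = ['http://node%d:8000' % i for i in range(1, 6)]
    _next_worker = itertools.cycle(WORKERS)

    def rpc_action(target, source, env):
        """SCons action: ship one build step to a worker over XML-RPC.

        Paths are passed as plain strings because every machine sees
        the same NFS share, so the worker reads and writes them directly.
        """
        proxy = xmlrpc.client.ServerProxy(next(_next_worker))
        # 'process' is the generic entry point on the worker; it
        # returns 0 on success, which SCons treats as a passing action.
        return proxy.process(env['STAGE'], str(source[0]), str(target[0]))

    env = Environment(BUILDERS={'Stage': Builder(action=rpc_action)})

    # Later stages depend on earlier ones the way object files depend
    # on sources; SCons reruns only what changed.
    env.Stage('corpus.tok', 'corpus.raw', STAGE='tokenize')
    env.Stage('corpus.tag', 'corpus.tok', STAGE='pos-tag')

Running scons -j5 then keeps up to five steps in flight at once, one per worker, and SCons's dependency tracking ensures only stale files get reprocessed.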
Python, however, has been a bit on the slow side. Also, invoking the Java VM every time you need to process a file feels like too much overhead. So while the pipeline is functional, and processes the corpus much faster than before (5-6 hours vs. 20+ earlier), we are considering rewriting the XML-RPC server in Java. The standalone programs can easily be folded into the server implementation, and invoking shell scripts from Java shouldn't be very different from invoking them from Python - things should only improve. I wonder, however, if I should have written this in Hadoop to start with.
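For reference, the worker end today is a thin Python wrapper around subprocess - this is the piece that would become a Java server. Again a sketch, with made-up stage commands:

    # worker.py -- runs on each node. Each stage is just a command line
    # over files on the NFS share; forking a shell script or a fresh
    # JVM per call is exactly where the per-file overhead comes from.
    import subprocess
    from xmlrpc.server import SimpleXMLRPCServer  # SimpleXMLRPCServer in Python 2

    STAGES = {
        'tokenize': ['./tokenize.sh'],
        'pos-tag':  ['java', '-jar', 'tagger.jar'],
    }

    def process(stage, src, dst):
        """Run one pipeline stage; return 0 on success so SCons is happy."""
        return subprocess.call(STAGES[stage] + [src, dst])

    server = SimpleXMLRPCServer(('0.0.0.0', 8000), allow_none=True)
    server.register_function(process)
    server.serve_forever()

A long-lived Java server could load the taggers and other Java components once and keep them resident, so the per-file JVM start-up cost would disappear while the XML-RPC interface stays the same.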