parsing – Data collection task – Education Career Blog

I have data that follows this kind of patten:

ID      Name1     Name2    Name3    Name4    .....
41242   MCJ5X     TUAW     OXVM4    Kcmev 1
93532   AVEV2     WCRB3    LPAQ 2   DVL2
.
.
.

As of now this is just format in a spreadsheet and has about 6000 lines. What I need to do is to create a new row for each Name after Name1 and associate that with the ID on its current row. For example, see below:

ID      Name1 
41242   MCJ5X     
41242   TUAW    
41242   OXVM4    
41242   Kcmev 1
93532   AVEV2     
93532   WCRB3    
93532   LPAQ 2   
93532   DVL2

Any ideas how I could do this? I feel like this shouldn’t be too complicated but not sure of the best approach. Whether a script or some function I’d really appreciate the help.

,

If possible, you might want to use a csv file. These files are plain-text and most spreadsheet programs can open/modify them (I know Excel and the OpenOffice version can). If you go with this approach, your algorithm will look something like this:

read everything into a string array
create a 1 to many data structure (maybe a Dictionary<string, List<string>> or list of (string, string) tuple types)

loop over each line of the file
splice the current line on the ','s and loop over those
if this is the first splice, add a new item to the 1 to many data structure with the current splice as the Id
otherwise, add this splice to the "many" (name) part of the last item in the data structure

create a new csv file or open the old one for writing
output the "ID, Name1" row
loop over each 1-many item in the data collection
loop over the many items in the current 1-many item
output the 1 (id) + "," + current many item (current name)

You could do this in just about any language. If its a one-time use script then Python, Ruby, or Powershell (depending on platform) would probably be a good choice.

Leave a Comment