Monday, October 26, 2015

Simple bash script to import data into MongoDB

import-data.sh file:
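#!/bin/bash
# Import every CSV file in the NYSE download directory into db.nyse.
_dfiles=/Users/computer/Downloads/NYSE/*.csv
for f in $_dfiles
do
  # --headerline takes the field names from the first row of each CSV file.
  # --jsonArray only applies to JSON input, so it is dropped here.
  mongoimport --type csv -d db -c nyse --headerline --file "${f}"
done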
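
A slightly more defensive variant of the loop, as a rough sketch only: it skips the case where no CSV file matches and stops at the first failed import. The function name import_csv_dir is made up for this note; the database and collection names (db, nyse) are the ones from the script above.

```bash
#!/bin/bash
# import_csv_dir is a hypothetical helper name, not part of MongoDB.
import_csv_dir() {
  local dir="$1" f count=0
  for f in "$dir"/*.csv; do
    # If nothing matches, the glob stays literal, so skip non-files.
    [ -f "$f" ] || continue
    mongoimport --type csv -d db -c nyse --headerline --file "$f" || {
      echo "import failed: $f" >&2
      return 1
    }
    count=$((count + 1))
  done
  echo "imported $count file(s)"
}
```

It would be called as import_csv_dir /Users/computer/Downloads/NYSE. Wrapping the loop in a function keeps the directory out of the script body, which makes it easier to reuse for another download folder.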

Hadoop setup tips

How to set HADOOP_HOME so the hadoop command can be used directly. Open ~/.profile:
vim ~/.profile
and add:
export HADOOP_HOME=/usr/local/bin/hadoop-2.7.1
export PATH=$PATH:${HADOOP_HOME}/bin

alias hstart="/usr/local/bin/hadoop-2.7.1/sbin/start-dfs.sh;/usr/local/bin/hadoop-2.7.1/sbin/start-yarn.sh";
alias hstop="/usr/local/bin/hadoop-2.7.1/sbin/stop-yarn.sh;/usr/local/bin/hadoop-2.7.1/sbin/stop-dfs.sh";

Why do I have to source ~/.profile every time?
Because ~/.bash_profile exists: a bash login shell reads ~/.bash_profile first and, if it finds it, never reads ~/.profile.
Solution:

echo "source ~/.profile" >> ~/.bash_profile
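
Running that echo more than once adds the same line again each time. A small sketch of an idempotent version follows; ensure_line is a made-up helper name, not a standard command.

```bash
#!/bin/bash
# ensure_line is a hypothetical helper: append a line to a file
# only if the file does not already contain exactly that line.
ensure_line() {
  local line="$1" file="$2"
  touch "$file"
  # -F: fixed string, -x: match the whole line, -q: no output.
  grep -qxF -- "$line" "$file" || echo "$line" >> "$file"
}
```

Usage would be: ensure_line 'source ~/.profile' ~/.bash_profile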

Hadoop: localhost 9000 connection refused

Symptom: connections to localhost:9000 are refused. First check whether the port is reachable and whether anything is listening on it:
telnet localhost 9000
netstat -lpten | grep 9000
If a process is listening on port 9000, netstat returns a matching line.


Finally, the reasons:
core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/bin/hadoop-2.7.1/tmp</value>
</property>
</configuration>
1. The closing </configuration> tag was missing.
2. The tmp directory (the hadoop.tmp.dir value) didn't exist.
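
Both problems can be caught before starting the services. The following is only a sketch under a few assumptions: check_core_site is a made-up helper name, xmllint (from libxml2) is used for the well-formedness check only if it happens to be installed, and the <value> line is assumed to sit directly below the hadoop.tmp.dir <name> line, as in the file above.

```bash
#!/bin/bash
# check_core_site is a hypothetical helper, not part of Hadoop.
check_core_site() {
  local conf="$1" tmp_dir
  # xmllint exits non-zero on malformed XML, e.g. a missing closing tag.
  if command -v xmllint >/dev/null 2>&1; then
    xmllint --noout "$conf" || { echo "malformed XML: $conf" >&2; return 1; }
  fi
  # Assumption: <value> is on the line right after the <name> line.
  tmp_dir=$(grep -A1 '<name>hadoop.tmp.dir</name>' "$conf" |
    sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
  if [ -z "$tmp_dir" ]; then
    echo "hadoop.tmp.dir is not set in $conf" >&2
    return 1
  fi
  if [ ! -d "$tmp_dir" ]; then
    echo "missing directory: $tmp_dir (create it with mkdir -p)" >&2
    return 1
  fi
  echo "ok: $conf"
}
```

With the layout used in this post it would be run as: check_core_site "$HADOOP_HOME/etc/hadoop/core-site.xml"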

Summary:
1. Check network.
2. Check the Hadoop configuration (hadoop-env.sh/core-site.xml/hdfs-site.xml/mapred-site.xml.template).
3. Always stop the Hadoop services when you are done, or you will always get a swap file.
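
For step 1, the port check can also be scripted without telnet or netstat. This is a minimal sketch that relies on bash's built-in /dev/tcp redirection (a bash feature, not a real device file); port_open is a made-up helper name.

```bash
#!/bin/bash
# port_open is a hypothetical helper: it succeeds if a TCP connection
# to host:port can be opened, and fails otherwise.
port_open() {
  local host="$1" port="$2"
  # The redirection runs in a subshell, so the socket closes when it exits.
  (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}
```

Usage: port_open localhost 9000 && echo "listening" || echo "refused"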