Archive for March 23rd, 2010

Howto import a UTF8 textbook (book) in MySQL table / A simple step by step guide through on howto import books in MySQL

Tuesday, March 23rd, 2010

I was looking forward to import a textbook I own to MySQL in order to be able later to easily manipulate the text with MySQL queries. After some time spend on trying hard. Here is the steps I took to import the textbook:

1. First we create necessery database and set default charset to UTF8:

freebsd$ mysql -u root -p
# create database textbook in mysql and change it to your likings
mysql$ CREATE database "textbook";
mysql$ use textbook;
mysql$ SET NAMES UTF8;

2. Then we CREATE Necessery database that will use further as a table to import the textbook into:
Below we create the table “textbook” with one column “sentence”

mysql$ CREATE table textbook (sentence varchar(5000);

3. Now we import the book:

mysql$ LOAD DATA INFILE '/path/to/file/textbook.txt' INTO TABLE textbook;

4. Last it might be a good idea to add some extra numbered column to be able to track the lines of the textbook as below:

mysql$ ALTER table textbook ADD COLUMN ID INT NOT NULL auto_increment FIRST, ADD PRIMARY KEY(ID);

In my case the book was in cyrillic and after I’ve taken the above steps I didn’t have any problems with cyrillic letters in the table.
Of course the above method is a bit dump since it’s not flexible enough and doesn’t track the textbook parts or titles, however it’s still a good way to store example on how to store text data in mysql table and could help somebody further in his journey in learning MySQL and next to that serving some simple daily SQL taks.

Fix weird problem with Apache default encoding (AddDefaultCharset) not taking effect on FreeBSD with Apache version (2.0.63)

Tuesday, March 23rd, 2010

I encountered a weird problems a couple of minutes before. I tried to put some text files written in UTF-8 which contain cyrillic in the file names as well as the content the files has in my Apache directory listing. Trying that produced a lot of “alien symbols” for both file names and file content.
I have to admit that wasn’t exactly what I expected. I’ve checked my /usr/local/etc/apache/httpd.conf just to find out I already have the:

AddDefaultCharset utf-8

within in the configuration, I thought that somehow Apache could have difficulties applying that rules for the custom Virtualhost and therefore give the directive a try locally in the VirtualHost. Grievously that didn’t produced any positive result. That’s why I decided to check if somebody has experienced the same weird behavior and I stumbled on this wondeful blog post .
In short on the plug and pray’s blog … Wait and moment plug and pray? Yes you read that correctly it’s plug and pray :), on that blog it’s explained that newer versions of Apache 2.2.4 as this is the version the blogger talks about as well as the latest Apache release has an issue with AddDefaultCharset not working anymore. Fortunately the issue with the AddDefaultCharset not taking effect in Apache no more is easily solved. All that needs to be done is including the:

IndexOptions Charset=UTF-8

Within Apache conf file. In my case on FreeBSD I had to include that in /usr/local/etc/apache2/httpd.conf after which all worked perfectly fine and I can see all my cyrillic symbols showing anew.