Posts Tagged ‘tag’

How to convert file content encoded in windows-cp1251 charset to UTF-8 (with iconv) to be delivered properly encoded to browsing end clients

Wednesday, May 16th, 2012

windows-cp1251 bulgarian to UTF-8 / Encoding Communication Decoding Communication Funny Picture

I have a bunch of old html files all encoded in the historically obsolete Windows-cp1251. Windows-CP1251 used to be common used 7 years ago and therefore still big portions of the web content in Bulgarian / Russian Cyrillic is still transferred to the end users in this encoding.

This was just before the "UTF-8 revolution", where massively people started using UTF-8,
Well it was clear the specific national country text encoding standards will quickly be moved by to UTF-8 – Universal Encoding format which abbreviation stands for (Unicode Transformation Format).

Though UTF-8 was clear to be "the future", many web developers mostly because of their incompetency or using an old sources of learning how to writen in HTML continued to use windows-cp1251 in HTMLs. I'm even convinced, there are still developers out there who are writting websites for Bulgarian / Russian / Macedonian customers using obsolete encodings …

The smarter developers of those accustomed to windows-cp1251, KOI-8R etc. etc., were using the meta tag to specify the type of charset of the web page content with:

<meta http-equiv="content-type" content="text/html;charset=windows-cp1251">

or

<meta http-equiv="content-type" content="text/html;charset=koi-8r">

Anyhow, still many devs even didn't placed the windows-cp1251 in the head of the HTML …

The result for the system administrator is always a mess – a lot of webpages that are showing like unreadable signs and tons of unhappy customers.
As always the system administrator is considered responsible, for the programmer mistakes :). So instead of programmers fix their bad cooking, the admin has to fix it all!

One quick work around me as admin has applied to failing to display pages in Cyrillic using the Windows-cp1251 character encoding was to force windows-cp1251 as a default encoding for the whole virtualhost or Apache directory with Apache directives like:

<VirtualHost *:80>
ServerAdmin some_user@some_host.com
DocumentRoot /var/www/html
AddDefaultCharset windows-cp1251
ServerName the_host_name.com
ServerAlias www.the_host_name.com
....
....
<Directory>
AddDefaultCharset windows-cp1251
>/Directory>
</VirtualHost>

Though this mostly would, work there are some occasions, where only a particular html files from all the content served by Apache is encoded in windows-cp1251, if most of the content is already written in UTF-8, this could be a big issues as you cannot just change the UTF-8 globally to windows-cp1251, just because few pages are written in archaic encoding….
Since most of the content is displayed to the client by Apache (as prior explained) just fine, only particular htmls lets's ay single.html, single2.html etc. etc. are displayed with some question marks or some non-human readable "hieroglyphs".

Below is a screenshot from two pages returned to my browser in wrongly set htmls charset:

Improper Windows CP1251 encoding with Apache set to serve UTF-8 encoding questiomarks

Improper Windows CP1251 delivered page in UTF-8 browser view

Apache returns cp1251 in some non-UTF8 wrong encoding (webserver improperly served cyrillic encoding)

Improperly served encoding CP1251 delivered by Apache in non-utf-8 encoding

When this kind of issues occur, the only solution is to simply login to the server and use iconv command to convert all files returning unreadable content from whatever the non UTF-8 encoding is lets say in my case Bulgarian typeset of cp1251 to UTF-8

Here is how the iconv command to convert between windows-cp1251 to utf-8 the two sample files named single1.html and single2.html

server:/web# /usr/bin/iconv -f WINDOWS-1251 -t UTF-8 single1.html > single1.html.utf8
server:/web# mv single1.html single1.html.bak;
server:/web# mv single1.html.utf8 single1.html
server:/web# /usr/bin/iconv -f WINDOWS-1251 -t UTF-8 single2.html > single2.html.utf8
server:/web# mv single2.html single2.html.bak;
server:/web# mv single2.html.utf8 single2.html

I always, make copies of the original cp1251 encoded files (as you see mv single1.html single1.html.bak), because if something goes wrong with convertion I can easily revert back.

If there are 10 files with consequential numbers naming they can be converted using a short for loop, like so:

server:/web# for i $(seq 1 10); do
/usr/bin/iconv -f WINDOWS-1251 -t UTF-8 single$i.html > single$i.html.utf8;mv single$i.html single$i.html.bak
mv single$i.html.utf8 single$i.html
done

Just as earlier mentioned if single1.html, single2.html … has in the html <head>:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">

You should open, each of the files in question and wipe out the line either by hand or use sed to wipe it in one loop if it has to be done for lets say 10 files named (single{1..10})

server:/web# for i in $(seq 1 10); do
sed '/<meta http-equiv="Content-Type" content="text\/html; charset=windows-1251>/d' single$i.txt > single$i.txt.new;
mv single$i.txt single$i.txt.bak;
mv single$i.txt.new single$i.txt

Well now,

HasciiCAM supposed to stream ASCII video over the network on GNU / Linux

Tuesday, May 22nd, 2012

Richard M. Stallman (RMS) Face portrait rendered in ASCII art from a video with hasciicam
To continue with my lately ASCII centered articles I found hasciicam
hasciicam is a program to stream ASCII video over the network on Linux and probably can be easily made working on FreeBSDtoo.

The project concept is interesting in a matter of fun (play) point of view, however not too usable as we all know ASCII character looking faces doesn't look too pretty.

Below is the Debian (Squeeze) package description:

noah:~# apt-cache show hasciicam|grep -i description -A 7
Description: (h)ascii for the masses: live video as text
Hasciicam makes it possible to have live ASCII video on the web. It
captures video from a tv card and renders it into ascii, formatting the
output into an html page with a refresh tag or in a live ASCII window or
in a simple text file as well, giving the possibility to anybody that has a
bttv card, a Linux box and a cheap modem line to show a live ASCII video
feed that can be browsable without any need for plugin, java etc.
Homepage: http://ascii.dyne.org/

On hasciicam Project webpage is it is stated as a hardware you need to have:
 

"As hardware you need to have a webcam or a videocard supported by "video 4 linux", most of the gear you can buy around should work well."

To install and test it I run:

noah:~# apt-get --yes install hasciicam

Though it is stated on the project website supposed to work display video fine with most 'linux ready' webcams, it didn't with this very standard one.

Here is the exact WebCamera model as identified to the kernel:

noah:~# dmesg|grep -i camera
[ 1.433661] usb 2-2: Product: USB2.0 Camera
[ 10.107840] uvcvideo: Found UVC 1.00 device USB2.0 Camera (1e4e:0102)
[ 10.110660] input: USB2.0 Camera as /devices/pci0000:00/0000:00:1d.7/usb2/2-2/2-2:1.0/input/input11

By the way, I use the very same CAM daily on for Skype video calls as well as the Camera is working with no problems to save video or pictures inside Cheese

Here is the exact WebCamera model as identified to the kernel:

noah:~# dmesg|grep -i camera
[ 1.433661] usb 2-2: Product: USB2.0 Camera
[ 10.107840] uvcvideo: Found UVC 1.00 device USB2.0 Camera (1e4e:0102)
[ 10.110660] input: USB2.0 Camera as /devices/pci0000:00/0000:00:1d.7/usb2/2-2/2-2:1.0/input/input11

The just installed deb has one binary file only /usr/bin/hasciicam. To test it with the camera I issued:

noah:~# hasciicam -d /dev/video0
HasciiCam 1.0 - (h)ascii 4 the masses! - http://ascii.dyne.org
(c)2000-2006 Denis Roio < jaromil @ dyne.org >
watch out for the (h)ASCII ROOTS

Device detected is /dev/video0
USB2.0 Camera
1 channels detected
max size w[640] h[480] - min size w[48] h[32]
Video capabilities:
VID_TYPE_CAPTURE can capture to memory
!! error in ioctl VIDIOCGMBUF: : Invalid argument

Unfortunately as you see from the output, it failed to detect the web camera model.
The exact camera besides its kernel detection naminf is a cheap external USB 2.0 (fake brand / nonanem) "universal" Web PC Camera (SUPER .3mega pixel)

For those who have a further interest in building and installing hasciicam on other Linux platforms than Debian and Ubuntu or whoever wants to look in the code check check Project webpage is. For those who are less of programmers (like me) the project is written in C programming language and uses aa-lib in order to render the video to ASCII.

On the site you will notice two totally schizophrenic looking pictures of presumably the project head developer …

hasciiart video streamed ASCII screenshot of some crazy looking guy smoking marijuanna or smth

As I read in man hasciicam manual page it's said to be able to generate ascii plain text and html files as well as directly to write the output to console, which later probably can be streamed via the network.
Pitily as it didn't detect my camera I couldn't make some testing of its network capabilities.

A Streaming of ASCII couuld be done through pushing the .html output to a webserver and setting a php or javascript to loop through and refresh the browser over the uploaded files every sec or so.

Also I assume the ASCII video output saved in plain console could be streamed via netcat or some tiny scripted perl or bash script and directly observed via a telnet or ssh connection.
One playful way I can think of checking a stored video without the use of FTP is to login via ssh and do:

$ ssh someuser@somehost
$ watch -n 1 "cat video-ascii.html"

🙂

Well something disturbing about hasciicam from a (purely Christian point of view) is it was developed by some kind of non profit organization called RastaSoft on the project website, some of its authors has written JAH BLESS.

As I didn't succeeded seeing it working, I'll be interested to hear if someone who red this article and give it a try can report the web camera model used.

Automatic blog posts tagging in wordpress blog 3.1 with (auto-tags) / wp plugin to increase Search Engine ranking

Monday, April 4th, 2011

There are plenty of articles, on how to increase search engine ranking in wordpress and I’m sure this article might be not that interesting but still I thought it might be nice to mention about this 3 wordpress plugins Auto-Tags, SEO Slugs and Platinium SEO Pack which will help you increase your traffic.

Let me say a few words for each of the 3 plugins:

1. Auto-tags
Below is the description of the plugin directly taken from the plugin website http://wordpress.org/extend/plugins/auto-tag/

This plugin uses the Yahoo.com and tagthe.net APIs to find the most relevant keywords
from the content of your post, and then adds them as tags.
New for version 0.2: an options page allows to choose how many tags are
retrieved from each service The tag adding is fully automatic,
so if you're using a plugin like feedwordpress to display RSS feeds
on your blog as posts, everything will get done as the feed
posts are published. No user intervention necessary!

Here are the installation instructions for auto-tags:

debian:~# cd /var/www/blog/wp-content/plugins
debian:/var/www/wp-content/plugins:# wget https://www.pc-freak.net/files/auto-tag.0.4.6.zip
100%[================================>] 14,325 45.3K/s in 0.3s

2011-04-04 12:30:17 (45.3 KB/s) – `auto-tag.0.4.6.zip’ saved [14325/14325]
debian:/var/www/wp-content/plugins:# unzip auto-tag.0.4.6.zip

In the above example my wordpress installation is in /var/www/blog/ , if your wordpress is installed in another directory location change to the respective directory.

To activate the Plugin go to:

Plugins -> Auto Tags
Press over Activate to activate the plugin.

To configure the Auto-tags plugin navigate to:

Settings -> Auto tags plugin

Auto tags Screen options

Therein you can configure the number of post tags to be retrieved from Yahoo, tagthe.net. The settings also allows you to disable certain tags you don’t want to appear in your post tags from the field, Remove those tags (comma separated)

The plugin also has an option called Append tags to the ones that already exist which on my wordpress 3.1 installation doesn’t work

After ending up your desired configuration simply press the Update Options button.

Now each time you type a new post in your wordpress blog, a tags related to the post will automatically be included.
Based on this tags Search engines will easily find content that relates to your blog tags and thus your page indexing will get better.