Reading text from images with the Google Cloud Vision API

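This is pretty much at the "I gave it a try" level, but I'm writing it down partly as a note to myself.

What I wanted to do was read text out of an image. For that, I'm using Google Cloud Vision.

I'll say up front that almost all of the programming can be done with the sample code Google provides, so nothing here is original. The part I found confusing was enabling the API on Google Cloud, so I wanted my notes to cover that too (lol).
So I'll walk through it with that part included.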

① If you don't have a Google Cloud account, create one.
Google Cloud Console

② Set up the Vision API.
Read the page below and follow the steps as written.
https://cloud.google.com/vision/docs/setup

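Along the way there are some tedious bits, such as creating a private key and setting an environment variable that points to it, but just do them.

You also install the Cloud SDK. This takes quite a while.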

③ Once you've finished ②, have a look at the URL below.

https://cloud.google.com/vision/docs/quickstart-client-libraries

Start from around the "Install the client library" section. I'm going with Python. (`･ω･´)

pip install --upgrade google-cloud-vision

That's the command to run.
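If you want to confirm the install actually landed in the Python environment you are about to use, here is a minimal sketch using only the standard library (Python 3.8+). The helper name installed_version is my own, not part of any Google library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a pip package, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "google-cloud-vision" is the pip package name used in the install command.
    found = installed_version("google-cloud-vision")
    if found is None:
        print("google-cloud-vision is not installed in this environment")
    else:
        print(f"google-cloud-vision {found} is installed")
```

This is handy when you have several Python installations and pip installed into a different one than the interpreter running your script.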

And with that, we can finally run a sample!

Google kindly provides a sample that reads in the cat photo below. Much appreciated~

https://raw.githubusercontent.com/googleapis/python-vision/master/samples/snippets/quickstart/resources/wakeupcat.jpg

import io
import os

# Imports the Google Cloud client library
from google.cloud import vision

# Instantiates a client
client = vision.ImageAnnotatorClient()

# The name of the image file to annotate
file_name = os.path.abspath('resources/wakeupcat.jpg')

# Loads the image into memory
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

image = vision.Image(content=content)

# Performs label detection on the image file
response = client.label_detection(image=image)
labels = response.label_annotations

print('Labels:')
for label in labels:
    print(label.description)

But when I run the code above, I get this error:

Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see...

Whoa, whoa, whoa~ hold on a minute~

is what I wanted to say, but calmly googling it turns up the information below.

https://stackoverflow.com/questions/45501082/set-google-application-credentials-in-python-project-to-use-google-api

Add the following line to the code.

os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="D:/vision_api_key/key.json"
# The path above is just a sample; replace everything after D:/ with the location of your own key file.
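Since a wrong path here just brings the same credentials error back, a quick sanity check on the key file can save some head-scratching. The following is only a sketch, assuming the key is a service-account JSON key as created in the setup guide (those files carry "type", "client_email" and "private_key" fields). The function name check_key_file is hypothetical and my own.

```python
import json
import os


def check_key_file(path):
    """Return a list of problems found with a service-account key file (empty if none)."""
    if not os.path.isfile(path):
        return [f"no file at {path}"]
    try:
        with open(path, encoding="utf-8") as key_file:
            data = json.load(key_file)
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        return [f"could not read {path} as JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = []
    # Assumption: the key is a service-account key, not another credential type.
    if data.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    for field in ("client_email", "private_key"):
        if not data.get(field):
            problems.append(f'missing "{field}"')
    return problems


if __name__ == "__main__":
    key_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not key_path:
        print("GOOGLE_APPLICATION_CREDENTIALS is not set")
    else:
        issues = check_key_file(key_path)
        print("key file looks fine" if not issues else "\n".join(issues))
```

Run it after setting the environment variable; it never prints the key contents, only what looks wrong.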

Next, you get an error saying that billing is not enabled.

Enable billing on Google Cloud. Follow the link below.
https://cloud.google.com/billing/docs/how-to/modify-project?hl=ja&visit_id=637623682365584203-1106306741&rd=1#confirm_billing_is_enabled_on_a_project

Phew... did it finally run?

Here is the code in full.

import io
import os

# Imports the Google Cloud client library
from google.cloud import vision

os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="D:/vision_api_key/key.json"

# Instantiates a client
client = vision.ImageAnnotatorClient()

# The name of the image file to annotate
file_name = os.path.abspath('wakeupcat.jpg')

# Loads the image into memory
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

image = vision.Image(content=content)

# Performs label detection on the image file
response = client.label_detection(image=image)
labels = response.label_annotations

print('Labels:')
for label in labels:
    print(label.description)

When I run it, I get:

Labels:
 Cat
 Window
 Felidae
 Carnivore
 Jaw
 Ear
 Small to medium-sized cats
 Window blind
 Gesture
 Whiskers

That's what shows up, and ooh~ you can tell the image was read in!
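The sample only prints each label's description, but each label annotation also carries a score field. As a rough sketch, here is one way to show the score and drop low-confidence labels. format_labels and min_score are hypothetical names of mine, and the SimpleNamespace objects are stand-ins with made-up scores so the snippet runs without calling the API.

```python
from types import SimpleNamespace


def format_labels(labels, min_score=0.0):
    """Return 'description (score)' strings for labels at or above min_score."""
    return [
        f"{label.description} ({label.score:.2f})"
        for label in labels
        if label.score >= min_score
    ]


# Stand-in objects so the sketch runs offline.
# These scores are invented for illustration, not real API output.
fake_labels = [
    SimpleNamespace(description="Cat", score=0.95),
    SimpleNamespace(description="Window", score=0.60),
]
for line in format_labels(fake_labels, min_score=0.8):
    print(line)
```

In the real script you would pass response.label_annotations instead of fake_labels.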

Er, but the sample above is for attaching labels to an image.

Our goal was to read the text written in an image!

This time we use the sample here.

https://cloud.google.com/vision/docs/ocr?hl=ja

This time I'm using the image below. It's a receipt from a convenience store I went to one day.

So I change the code above a little. It uses a method called text_detection().


import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="D:/vision_api_key/key.json"

def detect_text(path):
    """Detects text in the file."""
    from google.cloud import vision
    import io
    client = vision.ImageAnnotatorClient()

    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision.Image(content=content)

    response = client.text_detection(image=image)
    texts = response.text_annotations
     
    return texts

texts = detect_text("receipt.jpg")
print('\n"{}"'.format(texts[0].description))

The result looks like this.

"横浜北幸2丁
 神奈川県横浜市西区北幸 2丁目9-
 目店
 電話: 045-324-5611
 レジ#4
 領 収書
 7PLメンスヒ*オレSイ化粧水ソープ
 スターハックスアイスチャイティーラテ
 7Pセフンフレット3枚入
 009
 00乙米
 *88
 小 計 (税抜8%)
 消費税等( 8%)
 小計 (税抜10%)
 消費税等(10%)
 合 計
 (税率 8%対象
 (税率10%対象
 (内消費税等8%
 (内消費税等10%
 nanaco支払
 お買上明細は上記のとおりです。
 []マークは軽滅税率対象です。 nanaco番号 nanaco残高 今回ポイント ポ”イント残高 伝票番号 半288 半23 009夫 09夫 ギ971 半311) (099
 半23)
 (09夫
 工L6夫
 
 ****4717
 ギ3, 357
 LTLヤ
 4 P
 240P
 210-621-454-5134
 Thet x帆術海戦
 プレゼントキャンペーン
 セブンネットでお買い物&セブンイレブン受取りで
 応募者から抽選で711名様に
 呪術迴戰ォリジナルQUOカード
 をプレゼント
 thopping
 *判>11
 vG-MーC4ト
 O芥見下々/菜社,呪術圈戦割作载同会
 "

Hmm, well~ that's a bit disappointing!

is how it feels. That said, the boilerplate phrases seem to have been read mostly correctly.
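One caveat about indexing texts[0] directly: if the API finds no text at all, text_annotations comes back empty and texts[0] raises an IndexError. Below is a minimal sketch of a guard for that, plus a check of response.error.message, which as far as I can tell is the pattern Google's OCR sample uses. The helper names full_text and raise_on_api_error are my own (hypothetical), and the SimpleNamespace objects are stand-ins so the snippet runs without the API.

```python
from types import SimpleNamespace


def raise_on_api_error(response):
    """Raise if the response carries an API-level error message."""
    if response.error.message:
        raise RuntimeError(response.error.message)


def full_text(texts):
    """Return the full detected text, or an empty string when nothing was found."""
    if not texts:
        return ""
    # The first annotation holds the whole detected text;
    # the remaining ones are the individual pieces.
    return texts[0].description


# Stand-ins for real API responses, for illustration only.
ok_response = SimpleNamespace(
    error=SimpleNamespace(message=""),
    text_annotations=[SimpleNamespace(description="hello\nworld")],
)
empty_response = SimpleNamespace(
    error=SimpleNamespace(message=""),
    text_annotations=[],
)

for response in (ok_response, empty_response):
    raise_on_api_error(response)
    print(repr(full_text(response.text_annotations)))
```

In detect_text() you would call raise_on_api_error(response) right after client.text_detection(...) and print full_text(texts) instead of texts[0].description.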
