Press Relase / News Release Distribution Service [@Press]

Note: This page is a machine translation of the Japanese original and is provided for reference only.
In the event of any discrepancy between this page and the original, the original shall prevail. Click here for the original text.

Transcription service Data Green has released new content "ChatGPT and Transcription".

Aladdin, Inc

Aladdin, Inc. (headquartered in Chuo-ku, Fukuoka City, Fukuoka Prefecture; Yoshinao Nagahama, CEO), operator of Data Green (https://www.data-green.jp/), which provides audio and video data transcription, has released new content, "ChatGPT and Transcription.


ChatGPT and Text Transcription


ChatGPT and Transcription

https://www.data-green.jp/chatgpt/


ChatGPT is gaining attention for its ability to quickly generate detailed, non-natural answers to questions in a wide range of fields.

*It has been pointed out that some of the content may not be factual.


Since ChatGPT is text-driven, it cannot stand alone to do the transcription work for you; you will need to use another speech recognition system such as Whisper.

Like ChatGPT, Whisper, developed by OpenAI, is a speech recognition model that takes voice data as input data, analyzes it, and converts the results into text data.

As is the case with other AI speech recognizers, automatic speech transcription often produces very unnatural, almost sutra-like text, with no punctuation, no differentiation between speakers in a multi-person dialogue, etc.


[About transcription by AI

https://www.data-green.jp/ai/


Transcription by Transcript

https://www.data-green.jp/transcript/


Speech Recognition and Transcription

https://www.data-green.jp/speech_recognition/


[Dialect and transcription]

https://www.data-green.jp/dialect/


This is where ChatGPT, which excels in natural language generation, comes in.

By handing over the entire text that has been transcribed by Whisper and asking them to correct errors, insert punctuation appropriately, etc., you can have the transcribed text rewritten with improved readability.

Since transcription using Whisper is trained on data collected from the Web, it shows a high percentage of correct answers for general conversations and topics.

However, the correct response rate tends to decrease for specific technical terms and technical topics such as medical terminology and university lectures.

The AI can be improved by adding specific training data for technical terms.


<Comparison test

In the case of very good sound quality, other AI voice recognition systems, including Whisper, do not have a bad transcription accuracy rate, so we will compare the results of transcription using "data with poor sound quality" and "voice data with loud noises such as environmental sounds," which are difficult to transcribe automatically.


Transcription comparison test No. 1 (data with high noise and poor sound quality)

[Transcription results by Whisper]

Good morning.

Since I broke my comac before, I am not in very good shape.


[Transcription results by Data Green]

It has been a while.

I am not in very good shape since I broke my eardrum before.


※Documents heard in Japanese are written in English. Please understand that point.


The sound quality is so poor that it is difficult to hear clearly, but I was able to hear "It has been a while since I broke my eardrum.

As for "komak," Whisper seems to have misrecognized it because there was noise between "ma" and "ku.


Transcription comparison test No. 2 (speech data with loud ambient noise)

[Transcription result by Whisper]

No one came alone, even the leader of the group.


[Transcription result by Data Green]

The leader was like, "No one's here to give me compliments," and the surrounding voices were loud.


※Documents heard in Japanese are written in English. Please understand that point.


The surrounding voices are so loud that "no one will give me compliments" and "getting into a groove" are difficult to hear, and Whisper misrecognizes them.


Even if you use AI speech recognition to automatically transcribe data with poor sound quality or noise/environmental sounds like this, the quality will be inadequate.

Experienced human verification and correction are essential.


Even if ChatGPT is used for correction, it cannot handle, for example, jargon that has not been generalized or the latest news terminology.

To improve transcription accuracy, it is also important to handle expertise and context appropriately. Especially on specialized topics, a combination of human knowledge is required.



■About Data Green

Data Green provides "highly accurate transcription" by combining voice data analysis technology with the wealth of experience and know-how of our skilled writers.

We can also provide low-cost, 24/7 transcription services for long-duration transcription of data with poor sound quality or highly specialized audio that cannot be handled by AI voice recognition.

We have also acquired the Privacy Mark and ISO27001 (ISMS) certification, the international standard for information security management systems, so you can rely on us to transcribe highly confidential audio data.


Data Green for transcription and transcription services

https://www.data-green.jp/


Features of Data Green

https://www.data-green.jp/#feature


Types of transcription (de-bubbling, transcribing, and editing)

https://www.data-green.jp/#type


[Uses of transcription (interviews, speeches, meetings, interviews, court cases, etc.)]

https://www.data-green.jp/#use


Transcription fees and costs]

https://www.data-green.jp/#price


Transcription data proofreading service

https://www.data-green.jp/proofreading/


Examples of transcription deliverables

https://www.data-green.jp/#example


Data Green's clients' voices

https://www.data-green.jp/customer/wada.html


Transcription and Transcription Special List]

https://www.data-green.jp/recommend/


What is Transcription (History of Transcription)

https://www.data-green.jp/mojiokoshi/


How to reduce the cost of transcription and transcription services.

https://www.data-green.jp/quality_accuracy/


Checklist for improving the quality of audio data

https://www.data-green.jp/pdf/check_list.pdf


How to deal with special formats

https://www.data-green.jp/format/


Recommended voice recorders and microphones for smartphones

https://www.data-green.jp/voice_recorder/


Apps for transcription and transcription help】】

https://www.data-green.jp/app/


Recording and recording of web conferencing and online meetings

https://www.data-green.jp/recording/


Characteristics of text in terms of media types

https://www.data-green.jp/media/


Sentence endings (honorifics and regular forms)

https://www.data-green.jp/sentence_end/


About distortion of notation

https://www.data-green.jp/orthographical_variants/


About filler words

https://www.data-green.jp/filler_word/


Transcription using OCR and Google Drive

https://www.data-green.jp/ocr/


YouTube subtitles and transcription

https://www.data-green.jp/captions/


Transcription and Transcription Glossary

https://www.data-green.jp/word/



Company Profile

Company name: Aladdin, Inc.

Transcription business: DATA GREEN

Data recovery business: Data Rescue Center *1

Registered trademarks: Data Green, Data Rescue Center, Data Rescue, etc.

Patents held: Patent No. 4090494, Patent No. 4236689, Patent No. 5512470

Phone: 092-720-6633 (main)

Head Office : 3F High Hills Building, 1-5-6 Yakuin, Chuo-ku, Fukuoka City, Fukuoka Prefecture, Japan

Capital : 90 million yen

Establishment : May 31, 2002

Representative : Yoshinao Nagahama, President

Certifications: Privacy Mark, ISO27001 (ISMS)

Member organization: 

Japan Data Recovery Association (Executive Director)

https://www.draj.or.jp/

General Incorporated Association Transcription Utilization Promotion Council

 (regular member)

https://mojiokoshi.or.jp/mojiokoshi/

Fukuoka Lawyers Cooperative Association (special agent)

https://fukubenkyo.jp/


Data Green Logo for Transcription and Transcription Services


1 Data recovery business

Data Rescue Center [Official] Data Recovery

https://www.rescue-center.jp/

Data Rescue Center [Official] Twitter

https://twitter.com/DRC_JP

Data Rescue Center [Official] Column

http://blog.rescue-center.jp/


Image

Logo Image